Introducing pipe, The Automattic Machine Learning Pipeline

One of the main projects I’ve been working on over the past year is pipe, a generalized machine learning pipeline. It serves the entire company and helps Automatticians seamlessly build and deploy machine learning models that predict the likelihood of a given event, e.g., installing a plugin, purchasing a plan, or churning… Read more on data.blog

November 20, 2018 · Yanir Seroussi

Reflections on remote data science work

It’s been about a year and a half since I joined Automattic as a remote data scientist. This is the longest I’ve been in one position since finishing my PhD in 2012. This is also the first time I’ve worked full-time with a fully-distributed team. In this post, I briefly discuss some of the top pluses and minuses of remote work, based on my experience so far. + Flexible hours...

November 3, 2018 · Yanir Seroussi

Defining data science in 2018

I got my first data science job in 2012, the year Harvard Business Review announced data scientist to be the sexiest job of the 21st century. Two years later, I published a post on my then-favourite definition of data science, as the intersection between software engineering and statistics. Unfortunately, that definition became somewhat irrelevant as more and more people jumped on the data science bandwagon – possibly to the point of making data scientist useless as a job title....

July 22, 2018 · Yanir Seroussi

Engineering Data Science at Automattic

A post I’ve written on applying some software engineering best practices to data science projects: Most data scientists have to write code to analyze data or build products. While coding, data scientists act as software engineers. Adopting best practices from software engineering is key to ensuring the correctness, reproducibility, and maintainability of data science projects. This post describes some of our efforts in the area… Read more on data.blog

March 20, 2018 · Yanir Seroussi

Advice for aspiring data scientists and other FAQs

Aspiring data scientists and other visitors to this site often repeat the same questions. This post is the definitive collection of my answers to such questions (which may evolve over time). How do I become a data scientist? It depends on your situation. Before we get into it, have you thought about why you want to become a data scientist? Hmm… Not really. Why should I become a data scientist?...

October 15, 2017 · Yanir Seroussi

My 10-step path to becoming a remote data scientist with Automattic

About two years ago, I read the book The Year without Pants, which describes the author’s experience leading a team at Automattic (the company behind WordPress.com, among other products). Automattic is a fully-distributed company, which means that all of its employees work remotely (hence pants are optional). While the book discusses some of the challenges of working remotely, the author’s general experience was very positive. A few months after reading the book, I decided to look for a full-time position after a period of independent work....

July 29, 2017 · Yanir Seroussi

Exploring and visualising reef life survey data

Last year, I wrote about the Reef Life Survey (RLS) project and my experience with offline data collection on the Great Barrier Reef. I found that using auto-generated flashcards with an increasing level of difficulty is a good way to memorise marine species. Since publishing that post, I have improved the flashcards and built a tool for exploring the aggregate survey data. Both tools are now publicly available on the RLS website....

June 3, 2017 · Yanir Seroussi

Customer lifetime value and the proliferation of misinformation on the internet

Suppose you work for a business that has paying customers. You want to know how much money your customers are likely to spend to inform decisions on customer acquisition and retention budgets. You’ve done a bit of research, and discovered that the figure you want to calculate is commonly called the customer lifetime value. You google the term, and end up on a page with ten results (and probably some ads)....

January 8, 2017 · Yanir Seroussi

Ask Why! Finding motives, causes, and purpose in data science

Some people equate predictive modelling with data science, thinking that mastering various machine learning techniques is the key that unlocks the mysteries of the field. However, there is much more to data science than the What and How of predictive modelling. I recently gave a talk where I argued the importance of asking Why, touching on three different topics: stakeholder motives, cause-and-effect relationships, and finding a sense of purpose. A video of the talk is available below....

September 19, 2016 · Yanir Seroussi

If you don’t pay attention, data can drive you off a cliff

You’re a hotshot manager. You love your dashboards and you keep your finger on the beating pulse of the business. You take pride in using data to drive your decisions rather than shooting from the hip like one of those old-school 1950s bosses. This is the 21st century, and data is king. You even hired a sexy statistician or data scientist, though you don’t really understand what they do. Never mind, you can proudly tell all your friends that you are leading a modern data-driven team....

August 21, 2016 · Yanir Seroussi