If you don't think about your modelling context, you're gonna have a bad time.

Use your human brain to avoid artificial intelligence disasters

Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think about the context and consequences of applying AI.

November 22, 2021 · Yanir Seroussi

Many is not enough: Counting simulations to bootstrap the right way

Previously, I encouraged readers to test different approaches to bootstrapped confidence interval (CI) estimation. Such testing can be done by relying on the definition of CIs: Given an infinite number of independent samples from the same population, we expect a ci_level CI to contain the population parameter in exactly ci_level percent of the samples. Therefore, we run “many” simulations (num_simulations), where each simulation generates a random sample from the same population and runs the CI algorithm on the sample....

August 24, 2020 · Yanir Seroussi
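
The simulation idea in the excerpt above can be sketched roughly as follows. This is a minimal illustration rather than the post's actual code: the names ci_level and num_simulations come from the excerpt, while the normal population, the percentile bootstrap as the CI algorithm under test, and the helper names percentile_bootstrap_ci and estimate_coverage are assumptions made for the example.

```python
import random
import statistics


def percentile_bootstrap_ci(sample, ci_level=0.95, num_resamples=200, rng=random):
    """Percentile bootstrap CI for the mean (an assumed stand-in for the CI algorithm)."""
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(num_resamples)
    )
    alpha = (1 - ci_level) / 2
    low = means[int(alpha * num_resamples)]
    high = means[min(num_resamples - 1, int((1 - alpha) * num_resamples))]
    return low, high


def estimate_coverage(num_simulations=200, sample_size=50, ci_level=0.95, seed=0):
    """Fraction of simulations whose CI contains the known population mean."""
    rng = random.Random(seed)
    population_mean = 0.0  # assumed population: normal with mean 0 and sd 1
    hits = 0
    for _ in range(num_simulations):
        # Each simulation draws a fresh sample from the same population.
        sample = [rng.gauss(population_mean, 1.0) for _ in range(sample_size)]
        low, high = percentile_bootstrap_ci(sample, ci_level, rng=rng)
        hits += low <= population_mean <= high
    return hits / num_simulations


coverage = estimate_coverage()
print(f"Observed coverage: {coverage:.3f} (nominal: 0.95)")
```

With a finite num_simulations, the observed coverage is itself a noisy estimate, so a gap from the nominal level may reflect simulation noise rather than a flaw in the CI algorithm.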

Software commodities are eating interesting data science work

The passage of time makes wizards of us all. Today, any dullard can make bells ring across the ocean by tapping out phone numbers, cause inanimate toys to march by barking an order, or activate remote devices by touching a wireless screen. Thomas Edison couldn’t have managed any of this at his peak—and shortly before his time, such powers would have been considered the unique realm of God....

January 11, 2020 · Yanir Seroussi

A day in the life of a remote data scientist

Earlier this year, I gave a talk titled A Day in the Life of a Remote Data Scientist at the Data Science Sydney meetup. The talk covered similar ground to a post I published on remote data science work, with additional details on my daily schedule and projects, some gifs and Sydney jokes, heckling by the audience, and a Q&A session. I managed to watch it a few months ago without cringing too much, so it’s about time to post it here....

December 11, 2019 · Yanir Seroussi

Bootstrapping the right way?

Bootstrapping the right way is a talk I gave earlier this year at the YOW! Data conference in Sydney. You can now watch the video of the talk and have a look through the slides. The content of the talk is similar to a post I published on bootstrapping pitfalls, with some additional simulations. The main takeaways shared in the talk are: Don’t compare single-sample confidence intervals by eye. Use enough resamples (15K?...

October 6, 2019 · Yanir Seroussi

Hackers beware: Bootstrap sampling may be harmful

Bootstrap sampling techniques are very appealing, as they don’t require knowing much about statistics and opaque formulas. Instead, all one needs to do is resample the given data many times, and calculate the desired statistics. Therefore, bootstrapping has been promoted as an easy way of modelling uncertainty to hackers who don’t have much statistical knowledge. For example, the main thesis of the excellent Statistics for Hackers talk by Jake VanderPlas is: “If you can write a for-loop, you can do statistics”....

January 7, 2019 · Yanir Seroussi
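
For readers unfamiliar with the technique, the "resample and recalculate" loop described in the excerpt above looks roughly like the sketch below. The data values and the helper name bootstrap_statistics are made up for illustration. This is the naive approach that the post's title warns about, not a recommended recipe.

```python
import random
import statistics


def bootstrap_statistics(data, statistic, num_resamples=1000, seed=0):
    """Apply `statistic` to many resamples (with replacement) of `data`."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(num_resamples)]


# Hypothetical observations, invented for this example.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]

boot_medians = bootstrap_statistics(data, statistics.median)
# The spread of the resampled medians is a rough estimate of the median's standard error.
standard_error = statistics.stdev(boot_medians)
print(f"Sample median: {statistics.median(data)}, bootstrap SE: {standard_error:.3f}")
```

The loop is easy to write, which is the appeal. Whether the resulting estimates are trustworthy for a given sample size and statistic is a separate question.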

The most practical causal inference book I’ve read (is still a draft)

I’ve been interested in the area of causal inference for the past few years. In my opinion, it’s more exciting and relevant to everyday life than more hyped data science areas like deep learning. However, I’ve found it hard to apply what I’ve learned about causal inference to my work. Now, I believe I’ve finally found a book with practical techniques that I can use on real problems: Causal Inference by Miguel Hernán and Jamie Robins....

December 24, 2018 · Yanir Seroussi

Reflections on remote data science work

It’s been about a year and a half since I joined Automattic as a remote data scientist. This is the longest I’ve been in one position since finishing my PhD in 2012. This is also the first time I’ve worked full-time with a fully-distributed team. In this post, I briefly discuss some of the top pluses and minuses of remote work, based on my experience so far. + Flexible hours...

November 3, 2018 · Yanir Seroussi

Defining data science in 2018

I got my first data science job in 2012, the year Harvard Business Review announced data scientist to be the sexiest job of the 21st century. Two years later, I published a post on my then-favourite definition of data science, as the intersection between software engineering and statistics. Unfortunately, that definition became somewhat irrelevant as more and more people jumped on the data science bandwagon – possibly to the point of making data scientist useless as a job title....

July 22, 2018 · Yanir Seroussi

Engineering Data Science at Automattic

A post I’ve written on applying some software engineering best practices to data science projects: Most data scientists have to write code to analyze data or build products. While coding, data scientists act as software engineers. Adopting best practices from software engineering is key to ensuring the correctness, reproducibility, and maintainability of data science projects. This post describes some of our efforts in the area… Read more on data.blog

March 20, 2018 · Yanir Seroussi