Check out the About page for a general intro and contact options. Browse recent posts below, and get notified about new posts by subscribing to the mailing list at the bottom of any page or on TinyLetter.

Building useful machine learning tools keeps getting easier: A fish ID case study
Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.

Analysis strategies in online A/B experiments: Intention-to-treat, per-protocol, and other lessons from clinical trials
Epidemiologists analyse clinical trials to estimate the intention-to-treat and per-protocol effects. This post applies their strategies to online experiments.

Use your human brain to avoid artificial intelligence disasters
Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think about the context and consequences of applying AI.

Migrating from WordPress.com to Hugo on GitHub + Cloudflare
My reasons for switching from WordPress.com to Hugo on GitHub + Cloudflare, along with a summary of the solution components and migration process.

Some highlights from 2020
My track record of posting here has been pretty poor in 2020, partly because of a bunch of content I've contributed elsewhere. In general, my guiding principle for posting is to only add stuff I'd want to read or cite, e.g., because I haven't seen it discussed elsewhere. Well, no one has compiled a meta-post of my public work from 2020 (that I know of), so it's finally time to publish it myself....

Many is not enough: Counting simulations to bootstrap the right way
Previously, I encouraged readers to test different approaches to bootstrapped confidence interval (CI) estimation. Such testing can be done by relying on the definition of CIs: Given an infinite number of independent samples from the same population, we expect a ci_level CI to contain the population parameter in exactly ci_level percent of the samples. Therefore, we run "many" simulations (num_simulations), where each simulation generates a random sample from the same population and runs the CI algorithm on the sample....
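As a rough illustration of the procedure described in this excerpt, here is a minimal sketch that counts how often a CI contains the known population parameter. Only ci_level and num_simulations come from the excerpt. The standard normal population, the percentile bootstrap as the CI algorithm, and every other name (percentile_bootstrap_ci, estimate_coverage, sample_size, num_resamples) are assumptions made for illustration, not the post's actual code.

```python
import random
import statistics


def percentile_bootstrap_ci(sample, ci_level, num_resamples, rng):
    """Percentile bootstrap CI for the mean (an assumed CI algorithm)."""
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(num_resamples)
    )
    alpha = (100 - ci_level) / 100
    low_idx = int((alpha / 2) * (num_resamples - 1))
    high_idx = int((1 - alpha / 2) * (num_resamples - 1))
    return means[low_idx], means[high_idx]


def estimate_coverage(num_simulations, sample_size, ci_level, num_resamples, seed=0):
    """Return the percentage of simulations whose CI contains the true mean."""
    rng = random.Random(seed)
    population_mean = 0.0  # assumption: standard normal population
    hits = 0
    for _ in range(num_simulations):
        # Each simulation draws a fresh sample from the same population.
        sample = [rng.gauss(population_mean, 1.0) for _ in range(sample_size)]
        low, high = percentile_bootstrap_ci(sample, ci_level, num_resamples, rng)
        hits += low <= population_mean <= high
    return 100 * hits / num_simulations


if __name__ == "__main__":
    coverage = estimate_coverage(
        num_simulations=200, sample_size=30, ci_level=95, num_resamples=200
    )
    print(f"Observed coverage: {coverage:.1f}% (nominal: 95%)")
```

The observed coverage is itself an estimate that varies with num_simulations, so a small number of simulations gives only a noisy picture of how far a CI algorithm is from its nominal level.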

Software commodities are eating interesting data science work
The passage of time makes wizards of us all. Today, any dullard can make bells ring across the ocean by tapping out phone numbers, cause inanimate toys to march by barking an order, or activate remote devices by touching a wireless screen. Thomas Edison couldn't have managed any of this at his peak, and shortly before his time, such powers would have been considered the unique realm of God....

A day in the life of a remote data scientist
Earlier this year, I gave a talk titled A Day in the Life of a Remote Data Scientist at the Data Science Sydney meetup. The talk covered similar ground to a post I published on remote data science work, with additional details on my daily schedule and projects, some gifs and Sydney jokes, heckling by the audience, and a Q&A session. I managed to watch it a few months ago without cringing too much, so it's about time to post it here....

Bootstrapping the right way?
Bootstrapping the right way is a talk I gave earlier this year at the YOW! Data conference in Sydney. You can now watch the video of the talk and have a look through the slides. The content of the talk is similar to a post I published on bootstrapping pitfalls, with some additional simulations. The main takeaways shared in the talk are: Don't compare single-sample confidence intervals by eye. Use enough resamples (15K?...

How to Increase Retention and Revenue in 1,000 Nontrivial Steps
One of the main projects I worked on last year. Recently, Automattic created a Marketing Data team to support marketing efforts with dedicated data capabilities. As we got started, one important question loomed for me and my teammate Demet Dagdelen: What should we data scientists do as part of this team? Read more on data.blog