Customer lifetime value and the proliferation of misinformation on the internet

There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.

January 8, 2017

Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptions

Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time.

May 14, 2016

Why you should stop worrying about deep learning and deepen your understanding of causality instead

Causality is often overlooked, but it is far more relevant to most data scientists than deep learning.

February 14, 2016

The joys of offline data collection

Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey.

January 24, 2016

The hardest parts of data science

Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data.

November 23, 2015

Miscommunicating science: Simplistic models, nutritionism, and the art of storytelling

Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.

October 19, 2015

The wonderful world of recommender systems

An overview of the field and its common paradigms, plus a debunking of five common myths about recommender systems.

October 2, 2015

Learning about deep learning through album cover classification

Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.

July 6, 2015

Hopping on the deep learning bandwagon

To become proficient at solving data science problems, you need to get your hands dirty. Here, I used album cover classification to learn about deep learning.

June 6, 2015

First steps in data science: author-aware sentiment analysis

I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.

May 2, 2015

My PhD work

An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.

March 30, 2015

Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2)

My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams).

February 11, 2015

Is thinking like a search engine possible? (Yandex search personalisation – Kaggle competition summary – part 1)

Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).

January 29, 2015

Stochastic Gradient Boosting: Choosing the Best Number of Iterations

Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.

December 29, 2014

Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)

Summary of a Kaggle competition to forecast bulldozer sale prices, where I finished 9th out of 476 teams.

November 19, 2014

Greek Media Monitoring Kaggle competition: My approach

Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams.

October 7, 2014

Bandcamp recommendation and discovery algorithms

The recommendation backend for my BCRecommender service for personalised Bandcamp music discovery.

September 19, 2014

How to (almost) win Kaggle competitions

Summary of a talk I gave at the Data Science Sydney meetup with ten tips on almost-winning Kaggle competitions.

August 24, 2014

Kaggle competition tips and summaries

Pointers to all my Kaggle advice posts and competition summaries.

April 5, 2014