Predictive Modelling

Customer lifetime value and the proliferation of misinformation on the internet

There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.

Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptions

Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time.

Why you should stop worrying about deep learning and deepen your understanding of causality instead

Causality is often overlooked but is of much higher relevance to most data scientists than deep learning.

The joys of offline data collection

Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey.

The hardest parts of data science

Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data.

Miscommunicating science: Simplistic models, nutritionism, and the art of storytelling

Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.

The wonderful world of recommender systems

Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems.

Learning about deep learning through album cover classification

Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.

Hopping on the deep learning bandwagon

To become proficient at solving data science problems, you need to get your hands dirty. Here, I used album cover classification to learn about deep learning.

First steps in data science: author-aware sentiment analysis

I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.

My PhD work

An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.

Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2)

My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams).

Is thinking like a search engine possible? (Yandex search personalisation – Kaggle competition summary – part 1)

Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).

Stochastic Gradient Boosting: Choosing the Best Number of Iterations

Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.

Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)

Summary of a Kaggle competition to forecast bulldozer sale price, where I finished 9th out of 476 teams.

Greek Media Monitoring Kaggle competition: My approach

Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams.

Bandcamp recommendation and discovery algorithms

The recommendation backend for my BCRecommender service for personalised Bandcamp music discovery.

How to (almost) win Kaggle competitions

Summary of a talk I gave at the Data Science Sydney meetup with ten tips on almost-winning Kaggle competitions.

Kaggle competition tips and summaries

Pointers to all my Kaggle advice posts and competition summaries.