## Customer lifetime value and the proliferation of misinformation on the internet

There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.

Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time.

Causality is often overlooked, but it is much more relevant to most data scientists than deep learning.

Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey.

Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data.

Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.

An overview of the field and its common paradigms, debunking five common myths about recommender systems.

Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.

To become proficient at solving data science problems, you need to get your hands dirty. Here, I used album cover classification to learn about deep learning.

I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.

An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.

My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams).

Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).

Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.

Summary of a Kaggle competition to forecast bulldozer sale price, where I finished 9th out of 476 teams.

Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams.

The recommendation backend behind BCRecommender, my service for personalised Bandcamp music discovery.

A summary of a talk I gave at the Data Science Sydney meetup, with ten tips on almost-winning Kaggle competitions.

Pointers to all my Kaggle advice posts and competition summaries.