Machine Learning

AI/ML lifecycle models versus real-world mess

The real world of AI/ML doesn’t fit into a neat diagram, so I created another diagram and a maturity heatmap to model the mess.

Is your tech stack ready for data-intensive applications?

Questions to assess the quality of tech stacks and lifecycles, with a focus on artificial intelligence, machine learning, and analytics.

Dealing with endless data changes

Quotes from Demetrios Brinkmann on the relationship between MLOps and DevOps, where MLOps adds the ability to manage changes that come from data.

Assessing a startup's data-to-AI health

Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front.

AI does not obviate the need for testing and observability

It’s easy to prototype with AI, but production-grade AI apps require even more thorough testing and observability than traditional software.

Artificial intelligence, automation, and the art of counting fish

Discussing the use of AI to automate underwater marine surveys as an example of the uneven distribution of technological advancement.

Questions to consider when using AI for PDF data extraction

Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents.

Two types of startup data problems

Classifying startups as ML-centric or non-ML is a helpful exercise to uncover the data challenges they’re likely to face.

Avoiding AI complexity: First, write no code

Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach.

Transfer learning applies to energy market bidding

An interesting approach to energy market bidding for storage assets, showing that training on New York data is transferable to Queensland.

Supporting volunteer monitoring of marine biodiversity with modern web and data tools

Summarising the work Uri Seroussi and I did to improve Reef Life Survey’s Reef Species of the World app.

Google's Rules of Machine Learning still apply in the age of large language models

Despite the excitement around large language models, building with machine learning remains an engineering problem with established best practices.

ChatGPT is transformative AI

My perspective after a week of using ChatGPT: This is a step change in finding distilled information, and it’s only the beginning.

Causal Machine Learning is off to a good start, despite some issues

Reviewing the first three chapters of the book Causal Machine Learning by Robert Osazuwa Ness.

Building useful machine learning tools keeps getting easier: A fish ID case study

Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.

Use your human brain to avoid artificial intelligence disasters

Overview of a talk I gave at a deep learning course, framing AI ethics as the need for humans to think about the context and consequences of applying AI.

My work with Automattic

Back-dated meta-post that gathers my posts on Automattic blogs into a summary of the work I’ve done with the company.

Defining data science in 2018

Updating my definition of data science to match changes in the field. It is now broader than before, but its ultimate goal is still to support decisions.

Why you should stop worrying about deep learning and deepen your understanding of causality instead

Causality is often overlooked but is of much higher relevance to most data scientists than deep learning.

Miscommunicating science: Simplistic models, nutritionism, and the art of storytelling

Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.

The wonderful world of recommender systems

Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems.

Learning about deep learning through album cover classification

Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.

Hopping on the deep learning bandwagon

To become proficient at solving data science problems, you need to get your hands dirty. Here, I used album cover classification to learn about deep learning.

First steps in data science: author-aware sentiment analysis

I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.

My PhD work

An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.

Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2)

My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams).

Is thinking like a search engine possible? (Yandex search personalisation – Kaggle competition summary – part 1)

Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).

Stochastic Gradient Boosting: Choosing the Best Number of Iterations

Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.