Stay alert! Security is everyone's responsibility

Questions to assess the security posture of a startup, focusing on basic hygiene and handling of sensitive data.

July 1, 2024

Is your tech stack ready for data-intensive applications?

Questions to assess the quality of tech stacks and lifecycles, with a focus on artificial intelligence, machine learning, and analytics.

June 24, 2024

Dealing with endless data changes

Quotes from Demetrios Brinkmann on the relationship between MLOps and DevOps, where MLOps adds the ability to manage changes that come from data.

June 22, 2024

How to avoid startups with poor development processes

Questions that prospective data specialists and engineers should ask about development processes before accepting a startup role.

June 3, 2024

Assessing a startup's data-to-AI health

Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front.

April 22, 2024

AI does not obviate the need for testing and observability

It’s easy to prototype with AI, but production-grade AI apps require even more thorough testing and observability than traditional software.

April 15, 2024

Questions to consider when using AI for PDF data extraction

Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents.

March 11, 2024

Avoiding AI complexity: First, write no code

Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach.

February 26, 2024

Nudging ChatGPT to invent books you have no time to read

Getting ChatGPT Plus to elaborate on possible book content and produce a PDF cheatsheet, with the goal of learning about its capabilities.

February 12, 2024

Future software development may require fewer humans

Reflecting on an interview with Jason Warner, CEO of poolside.

February 6, 2024

Supporting volunteer monitoring of marine biodiversity with modern web and data tools

Summarising the work Uri Seroussi and I did to improve Reef Life Survey’s Reef Species of the World app.

November 29, 2023

You don't need a proprietary API for static maps

For many use cases, libraries like cartopy are better than the likes of Mapbox and Google Maps.

November 21, 2023

Lessons from reluctant data engineering

Video and summary of a talk I gave at DataEngBytes Brisbane on what I learned from doing data engineering as part of every data science role I had.

October 25, 2023

Google's Rules of Machine Learning still apply in the age of large language models

Despite the excitement around large language models, building with machine learning remains an engineering problem with established best practices.

September 21, 2023

Was data science a failure mode of software engineering?

Yes, data science projects have suffered from classic software engineering mistakes, but the field is maturing with the rise of new engineering roles.

June 30, 2023

How hackable are automated coding assessments?

Exploring the hackability of speed-based coding tests, using CodeSignal’s Industry Coding Framework as a case study.

May 26, 2023

Building useful machine learning tools keeps getting easier: A fish ID case study

Lessons learned building a fish ID web app with and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.

March 20, 2022

My work with Automattic

Back-dated meta-post that gathers my posts on Automattic blogs into a summary of the work I’ve done with the company.

October 7, 2021

Software commodities are eating interesting data science work

Being a data scientist can sometimes feel like a race against software commodities that replace interesting work. What can one do to remain relevant?

January 11, 2020

Bootstrapping the right way?

Video and summary of a talk I gave at YOW! Data on bootstrap estimation of confidence intervals.

October 6, 2019

Hackers beware: Bootstrap sampling may be harmful

Bootstrap sampling has been promoted to hackers without much statistical knowledge as an easy way of modelling uncertainty. But things aren’t that simple.

January 7, 2019

Exploring and visualising Reef Life Survey data

Web tools I built to visualise Reef Life Survey data and assist citizen scientists in underwater visual census work.

June 3, 2017

Is Data Scientist a useless job title?

It seems like anyone who touches data can call themselves a data scientist, which makes the title useless. The work they do can still be useful, though.

August 4, 2016

Migrating a simple web application from MongoDB to Elasticsearch

Migrating BCRecommender from MongoDB to Elasticsearch made it possible to offer a richer search experience to users at a similar cost, among other benefits.

November 4, 2015

The wonderful world of recommender systems

Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems.

October 2, 2015


Migrating my web apps away from due to reliability issues. Self-hosting is a better solution.

July 31, 2015

First steps in data science: author-aware sentiment analysis

I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.

May 2, 2015

Automating bulk data imports

A script for importing data into the Parse backend-as-a-service.

January 15, 2015

What is data science?

Data science has been a hot term in the past few years. Still, there isn’t a single definition of the field. This post discusses my favourite definition.

October 23, 2014

Building a recommender system on a shoestring budget (or: BCRecommender part 2 – general system layout)

Iterating on my BCRecommender service with the goal of keeping costs low while providing a valuable music recommendation service.

September 7, 2014