Questions to consider when using AI for PDF data extraction

Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents.

March 11, 2024

Two types of startup data problems

Classifying startups as ML-centric or non-ML is a helpful exercise to uncover the data challenges they’re likely to face.

March 4, 2024

Avoiding AI complexity: First, write no code

Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach.

February 26, 2024

Nudging ChatGPT to invent books you have no time to read

Getting ChatGPT Plus to elaborate on possible book content and produce a PDF cheatsheet, with the goal of learning about its capabilities.

February 12, 2024

Future software development may require fewer humans

Reflecting on an interview with Jason Warner, CEO of poolside.

February 6, 2024

New decade, new tagline: Data & AI for Impact

Shifting focus to ‘Data & AI for Impact’, with more startup-related content, increased posting frequency, and deeper audience engagement.

January 19, 2024

Artificial intelligence was a marketing term all along – just call it automation

Replacing ‘artificial intelligence’ with ‘automation’ is a useful trick for cutting through the hype.

October 6, 2023

Google's Rules of Machine Learning still apply in the age of large language models

Despite the excitement around large language models, building with machine learning remains an engineering problem with established best practices.

September 21, 2023

Was data science a failure mode of software engineering?

Yes, data science projects have suffered from classic software engineering mistakes, but the field is maturing with the rise of new engineering roles.

June 30, 2023

How hackable are automated coding assessments?

Exploring the hackability of speed-based coding tests, using CodeSignal’s Industry Coding Framework as a case study.

May 26, 2023

Remaining relevant as a small language model

Bing Chat recently quipped that humans are small language models. Here are some of my thoughts on how we small language models can remain relevant (for now).

April 21, 2023

ChatGPT is transformative AI

My perspective after a week of using ChatGPT: This is a step change in finding distilled information, and it’s only the beginning.

December 11, 2022

Causal Machine Learning is off to a good start, despite some issues

Reviewing the first three chapters of the book Causal Machine Learning by Robert Osazuwa Ness.

September 12, 2022

Building useful machine learning tools keeps getting easier: A fish ID case study

Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.

March 20, 2022

Use your human brain to avoid artificial intelligence disasters

Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think about the context and consequences of applying AI.

November 22, 2021

Defining data science in 2018

Updating my definition of data science to match changes in the field. It is now broader than before, but its ultimate goal is still to support decisions.

July 22, 2018

My PhD work

An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.

March 30, 2015