The rise of greedy robots

Given the impressive advancement of machine intelligence in recent years, many people have been speculating on what the future holds when it comes to the power and roles of robots in our society. Some have even called for regulation of machine intelligence before it’s too late. My take on this issue is that there is no need to speculate – machine intelligence is already here, with greedy robots already dominating our lives....

March 20, 2016 · Yanir Seroussi

Why you should stop worrying about deep learning and deepen your understanding of causality instead

Everywhere you go these days, you hear about deep learning’s impressive advancements. New deep learning libraries, tools, and products get announced on a regular basis, making the average data scientist feel like they’re missing out if they don’t hop on the deep learning bandwagon. However, as Kamil Bartocha put it in his post The Inconvenient Truth About Data Science, 95% of tasks do not require deep learning. This is obviously a made up number, but it’s probably an accurate representation of the everyday reality of many data scientists....

February 14, 2016 · Yanir Seroussi

The joys of offline data collection

Many modern data scientists don’t get to experience data collection in the offline world. Recently, I spent a month sailing down the northern Great Barrier Reef, collecting data for the Reef Life Survey project. In addition to being a great diving experience, the trip helped me obtain general insights on data collection and machine learning, which are shared in this article. The Reef Life Survey project Reef Life Survey (RLS) is a citizen scientist project, led by a team from the University of Tasmania....

January 24, 2016 · Yanir Seroussi

This holiday season, give me real insights

Merriam-Webster defines an insight as an understanding of the true nature of something. Many companies seem to define an insight as any piece of data or information, which I would call a pseudo-insight. This post surveys some examples of pseudo-insights, and discusses how these can be built upon to provide real insights. Exhibit A: WordPress stats This website is hosted on wordpress.com. I’m generally happy with WordPress – though it’s not as exciting and shiny as newer competitors, it is rock-solid and very feature-rich....

December 8, 2015 · Yanir Seroussi

The hardest parts of data science

Contrary to common belief, the hardest part of data science isn’t building an accurate model or obtaining good, clean data. It is much harder to define feasible problems and come up with reasonable ways of measuring solutions. This post discusses some examples of these issues and how they can be addressed. The not-so-hard parts Before discussing the hardest parts of data science, it’s worth quickly addressing the two main contenders: model fitting and data collection/cleaning....

November 23, 2015 · Yanir Seroussi

Migrating a simple web application from MongoDB to Elasticsearch

Bandcamp Recommender (BCRecommender) is a web application that serves music recommendations from Bandcamp. I recently switched BCRecommender’s data store from MongoDB to Elasticsearch. This has made it possible to offer a richer search experience to users at a similar cost. This post describes the migration process and discusses some of the advantages and disadvantages of using Elasticsearch instead of MongoDB. Motivation: Why swap MongoDB for Elasticsearch? I’ve written a few posts in the past on BCRecommender’s design and implementation....

November 4, 2015 · Yanir Seroussi

Miscommunicating science: Simplistic models, nutritionism, and the art of storytelling

I recently finished reading the book In Defense of Food: An Eater’s Manifesto by Michael Pollan. The book criticises nutritionism – the idea that one should eat according to the sum of measured nutrients while ignoring the food that contains these nutrients. The key argument of the book is that since the knowledge derived using food science is still very limited, completely relying on the partial findings and tools provided by this science is likely to lead to health issues....

October 19, 2015 · Yanir Seroussi

The wonderful world of recommender systems

I recently gave a talk about recommender systems at the Data Science Sydney meetup (the slides are available here). This post roughly follows the outline of the talk, expanding on some of the key points in non-slide form (i.e., complete sentences and paragraphs!). The first few sections give a broad overview of the field and the common recommendation paradigms, while the final part is dedicated to debunking five common myths about recommender systems....

October 2, 2015 · Yanir Seroussi

You don’t need a data scientist (yet)

The hype around big data has caused many organisations to hire data scientists without giving much thought to what these data scientists are going to do and whether they’re actually needed. This is a source of frustration for all parties involved. This post discusses some questions you should ask yourself before deciding to hire your first data scientist. Q1: Do you know what data scientists do? Somewhat surprisingly, there are quite a few companies that hire data scientists without having a clear idea of what data scientists actually do....

August 24, 2015 · Yanir Seroussi

Goodbye, Parse.com

Over the past year, I’ve been using Parse‘s free backend-as-a-service and web hosting to serve BCRecommender (music recommendation service) and Price Dingo (now-closed shopping comparison engine). The main lesson: You get what you pay for. Despite some improvements, Parse remains very unreliable, and any time saved by using their APIs and SDKs tends to be offset by having to work around the restrictions of their sandboxed environment. This post details some of the issues I faced and the transition away from the service....

July 31, 2015 · Yanir Seroussi