Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2)

This is the second and last post summarising my team’s solution for the Yandex search personalisation Kaggle competition. See the first post for a summary of the dataset, evaluation approach, and some thoughts about search engine optimisation and privacy. This post discusses the algorithms and features we used. To quickly recap the first post, Yandex released a 16GB dataset of query & click logs. The goal of the competition was to use this data to rerank query results such that the more relevant results appear before less relevant results....

February 11, 2015 · Yanir Seroussi

Is thinking like a search engine possible? (Yandex search personalisation – Kaggle competition summary – part 1)

About a year ago, I participated in the Yandex search personalisation Kaggle competition. I started off as a solo competitor, and then added a few Kaggle newbies to the team as part of a program I was running for the Sydney Data Science Meetup. My team didn’t do too badly, finishing 9th out of 194 teams. As is usually the case with Kaggle competitions, the most valuable part was the lessons learned from the experience....

January 29, 2015 · Yanir Seroussi

Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)

Messy data, buggy software, but all in all a good learning experience... Early last year, I had some free time on my hands, so I decided to participate in yet another Kaggle competition. Having never done any price forecasting work before, I thought it would be interesting to work on the Blue Book for Bulldozers competition, where the goal was to predict the sale price of auctioned bulldozers. I did alright, finishing 9th out of 476 teams....

November 19, 2014 · Yanir Seroussi

Greek Media Monitoring Kaggle competition: My approach

A few months ago I participated in the Kaggle Greek Media Monitoring competition. The goal of the competition was multilabel classification of texts scanned from Greek print media. Despite not having much time due to travelling and other commitments, I managed to finish 6th (out of 120 teams). This post describes my approach to the problem. Data & evaluation: The data consists of articles scanned from Greek print media in May-September 2013....

October 7, 2014 · Yanir Seroussi

How to (almost) win Kaggle competitions

Last week, I gave a talk at the Data Science Sydney Meetup group about some of the lessons I learned through almost winning five Kaggle competitions. The core of the talk was ten tips, which I think are worth putting in a post (the original slides are here). Some of these tips were covered in my beginner tips post from a few months ago. Similar advice was also recently published on the Kaggle blog – it’s great to see that my tips are in line with the thoughts of other prolific kagglers....

August 24, 2014 · Yanir Seroussi

Kaggle competition tips and summaries

Over the years, I’ve participated in a few Kaggle competitions and written a bit about my experiences. This page contains pointers to all my posts, and will be updated if/when I participate in more competitions.
General advice posts:
10 Steps to Success in Kaggle Data Science Competitions (guest post on KDNuggets)
How to (almost) win Kaggle competitions
Kaggle beginner tips
Solution posts:
Greek Media Monitoring Multilabel Classification [6th/120] – multi-label classification of pre-tokenised texts
Personalised Web Search Challenge [9th/194] – reranking web search results in a personalised manner
Blue Book for Bulldozers [9th/476] – forecasting auction sale price of bulldozers
ICFHR 2012 – Arabic Writer Identification Competition [3rd/42] – classifying handwritten texts by the identity of the writer (Kaggle blog post)
EMC Data Science Global Hackathon (Air Quality Prediction) [6th/110] – forecasting levels of air pollutants (Kaggle forum post)

April 5, 2014 · Yanir Seroussi