Stochastic Gradient Boosting: Choosing the Best Number of Iterations

In my summary of the Kaggle bulldozer price forecasting competition, I mentioned that part of my solution was based on stochastic gradient boosting. To reduce runtime, the number of boosting iterations was set by minimising the loss on the out-of-bag (OOB) samples, scoring each sample only with the trees for which it was out-of-bag. This approach was motivated by a bug in scikit-learn, where the OOB loss estimate was calculated on the in-bag samples, meaning that it always improved (and was thus useless for setting the number of iterations)... A sketch of this OOB-based selection appears below.

December 29, 2014 · Yanir Seroussi
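For readers who want to try the iteration-selection idea, here is a minimal sketch using scikit-learn's `GradientBoostingRegressor`, which exposes per-iteration OOB loss improvements via `oob_improvement_` when `subsample < 1.0`. It assumes a scikit-learn version in which the OOB bug described above has been fixed; the synthetic dataset and hyperparameters are illustrative stand-ins, not the competition setup.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data as a stand-in for the bulldozer dataset.
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)

# subsample < 1.0 makes the boosting stochastic and enables OOB estimates.
model = GradientBoostingRegressor(n_estimators=500, subsample=0.5, random_state=0)
model.fit(X, y)

# oob_improvement_[i] is the reduction in OOB loss at iteration i, so the
# cumulative sum peaks at the iteration where the estimated OOB loss is lowest.
best_n_iterations = int(np.argmax(np.cumsum(model.oob_improvement_)) + 1)
print(f"Best number of iterations by OOB loss: {best_n_iterations}")
```

Taking the argmax of the cumulative OOB improvement is equivalent to picking the iteration with the lowest estimated OOB loss. OOB estimates tend to be pessimistic and may suggest fewer iterations than cross-validation would, but they come almost for free from a single fit, which is the runtime advantage mentioned above.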

Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)

Messy data, buggy software, but all in all a good learning experience... Early last year, I had some free time on my hands, so I decided to participate in yet another Kaggle competition. Having never done any price forecasting work before, I thought it would be interesting to work on the Blue Book for Bulldozers competition, where the goal was to predict the sale price of auctioned bulldozers. I did alright, finishing 9th out of 476 teams....

November 19, 2014 · Yanir Seroussi