In sports analytics, people often debate which sports are easier to predict. The truth is that there is no clear answer. Because this area overlaps with sports betting, many companies use proprietary predictive models, so we don’t really know what they are doing. Also, anyone who builds a good predictive model has a stronger incentive to use it for profit than to publish it in an academic journal.

In this two-part series, I discuss findings from research I did on the feasibility of predicting outcomes in the most popular sports. This part covers football and basketball.

Football (soccer) predictions


Football is by far the most popular sport in the world, so it only makes sense that there is a great deal of interest in building predictive and statistical models for it. In terms of academic papers, a search on Google Scholar produces more than 5000 results, after excluding the terms “american” (to filter out the NFL), “strains” (to take out injury-related papers) and “video” (to take out video analysis). I’ve also done some work in this area myself: Using Twitter to predict Premier League outcomes.

In any case, predicting football games for profit is 100% feasible. These are some companies that are doing it:

I have also had many private conversations with people who claimed they were making a good profit from football betting. One individual claimed he had made a 100% return on investment, although I was never able to verify this.


Basketball predictions

Basketball is another very popular sport. A Google Scholar search returns around 2000 results on predictive models in basketball.

Kaggle hosts an annual competition for predicting the NCAA tournament. The latest competition can be found here.

A paper in the workshop for sports analytics at ECML 2013 also worked on the same problem: Predicting NCAAB match outcomes using ML techniques – some results and lessons learned. Their results suggested a ceiling of around 75% accuracy that can’t be surpassed using the available data. Of course, it is unclear how much an algorithm would improve with more detailed stats. Also, the authors did not use any ensemble methods, which have become very popular in machine learning competitions.

The authors attributed the ceiling partly to the inherent role that chance plays in the final outcome. Another paper pursued this concept further: Exploring chance in NCAA basketball.
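
To make the idea of a chance-driven ceiling concrete, here is a minimal sketch in Python. It is not the method used in either paper. It assumes that every game has a "true" probability that the favourite wins, and that even a perfect model can do no better than always picking the favourite. The function names and the uniform spread of win probabilities between 0.5 and 1.0 are assumptions made purely for illustration; that spread happens to give a ceiling near 75%, but it is not an estimate taken from real NCAA data.

```python
import random


def accuracy_ceiling(win_probs):
    """Expected accuracy of a perfect model that always picks the true favourite.

    win_probs: hypothetical true probabilities that a given team wins each game.
    """
    return sum(max(p, 1 - p) for p in win_probs) / len(win_probs)


def simulate_oracle_accuracy(win_probs, seed=0):
    """Play each game once at random and score the 'always pick the favourite' rule."""
    rng = random.Random(seed)
    correct = 0
    for p in win_probs:
        team_wins = rng.random() < p      # chance decides the actual result
        predicted_win = p >= 0.5          # the oracle backs the favourite
        correct += (team_wins == predicted_win)
    return correct / len(win_probs)


# Assumption for illustration only: favourites win with a probability
# drawn uniformly between 0.5 (coin flip) and 1.0 (certain win).
rng = random.Random(42)
win_probs = [rng.uniform(0.5, 1.0) for _ in range(20000)]

ceiling = accuracy_ceiling(win_probs)
simulated = simulate_oracle_accuracy(win_probs)
print(f"theoretical ceiling: {ceiling:.3f}")
print(f"simulated accuracy:  {simulated:.3f}")
```

Under these assumptions, no amount of extra data or modelling skill pushes accuracy past the ceiling, because the remaining errors come from upsets rather than from the model.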

So, what is the verdict? Yes, it is possible to predict basketball games. Whether it is possible to make money from that, I am not sure, since I didn’t find any papers discussing it or any companies claiming to make money by beating the odds. Also, it is important to understand that there are different tournaments (NCAA, NBA, Euroleague), so a model that is successful in one might not work in another.

I also tried to find good papers on NBA predictions, but I couldn’t find anything worth reporting. I believe that, because sports betting is illegal in the USA, there is limited interest in building predictive models for the NBA.
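
The gap between predicting well and making money can be illustrated with a little arithmetic. The sketch below is in Python and assumes decimal odds, where a winning bet returns the stake multiplied by the odds. The function names and the example numbers are hypothetical, and the bookmaker's margin is ignored for simplicity. A bet only has positive expected profit when the model's win probability is higher than the probability implied by the odds.

```python
def implied_probability(decimal_odds):
    """Win probability implied by decimal odds (bookmaker margin ignored)."""
    return 1.0 / decimal_odds


def expected_profit(p_win, decimal_odds, stake=1.0):
    """Expected profit of one bet, given the model's estimated win probability."""
    return stake * (p_win * decimal_odds - 1.0)


# Hypothetical example: a favourite priced at decimal odds of 1.25.
odds = 1.25
print(implied_probability(odds))    # the odds imply an 80% chance of winning
print(expected_profit(0.75, odds))  # model says 75%: negative expected profit
print(expected_profit(0.85, odds))  # model says 85%: positive expected profit
```

In this example, a model that is right 75% of the time on such favourites still loses money on average, because the odds already price in an 80% chance. Beating the odds requires being more accurate than the bookmaker, not just accurate.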