Discuss the Effectiveness and the Ethics of the Application of Data
Analytics to Football Statistics
By George Stevens
To most people, statistics in football are about as simple as it gets: number of shots, shots
on target, possession, corners, red and yellow cards and so on. Many people would say that,
in a game where anything can happen and luck can play a big part, these statistics don’t
show anything, and even commentators and pundits regularly dismiss the importance of
statistics whenever an anomalous result happens. Considering the basic nature of these
figures, those who don’t trust them may have a point, but the considerable investment made
by top-level clubs in football analytics suggests that statistics are important for more than
just a summary or a soundbite.
One application of football statistics is to use them to create a prediction model. Whether
you’re a typical fan trying to get one over on your friends by predicting results, or someone
who wants to make serious money in the football betting industry, a foolproof prediction
model is a very attractive idea. Anyone able to create such a model would have endless
possibilities open to them, but in reality this is a fanciful idea and no such model has been
created. It is, however, easier to find a model that fares slightly better than random,
uneducated guesses. Given that every football match has three possible outcomes (a home
win, an away win and a draw), the probability of guessing the outcome of a match purely at
random is one third, so a model that gets more than a third of results correct could be
considered successful.
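The one-third baseline can be checked with a short calculation. The sketch below is
illustrative only: the outcome frequencies are made-up numbers rather than real league data,
and the function name is hypothetical. It shows that a guesser who picks uniformly among the
three outcomes is right one third of the time whatever the true frequencies are.

```python
def random_guess_accuracy(outcome_freqs):
    """Expected accuracy when each outcome is guessed with equal probability."""
    n = len(outcome_freqs)
    # The guess matches a given outcome with probability 1/n,
    # weighted by how often that outcome actually occurs.
    return sum(freq * (1.0 / n) for freq in outcome_freqs.values())


# Hypothetical outcome frequencies (assumed for illustration; they sum to 1).
freqs = {"home win": 0.45, "draw": 0.25, "away win": 0.30}
baseline = random_guess_accuracy(freqs)
print(round(baseline, 4))
```

With these assumed frequencies, always picking the most common outcome would already beat
the random baseline, so one third is a fairly low bar for a model to clear.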
Some people would consider the outcome that the bookmakers rate as most likely to be a
reasonable prediction. This can be backed up by the data in the Mirror article “Backing the
favourite: But in the Premier League do they always win?” [1]. According to this article, the
bookmakers’ favourite in Premier League matches wins 54.7% of the time, a much better
success rate than random guessing would achieve. On the face of it this seems a fairly high
success rate; however, it would also be logical to suggest that a football fan with a good
understanding of the Premier League would guess results correctly more often than
someone guessing at random. We also need to consider that bookmakers’ odds aren’t there
to help us. Bookmakers need to make a profit, so the odds that they give for matches have
to reflect that. If a betting company had a very successful prediction model, would it always
base its odds on that? I think that if it had a model that predicted results correctly 80% of the
time, for example, then basing its odds on that model would reduce the amount of money it
makes. As soon as customers caught on to the fact that the result the bookmakers rate as
most likely is almost always the correct one, they would put significant amounts of money
on the bookmakers’ favourite, knowing that they would win almost all the time.
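To make the point about profit concrete, here is a minimal sketch of how a margin shows up
in a set of odds. It assumes decimal odds, where the implied probability of an outcome is
1 divided by its odds. The odds themselves are hypothetical, not taken from any bookmaker
or from the Mirror article, and the function name is illustrative.

```python
def implied_probabilities(decimal_odds):
    """Return raw implied probabilities, the bookmaker's margin and
    margin-free probabilities for one match."""
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    total = sum(raw.values())  # exceeds 1 when a margin is built in
    # Proportional rescaling is one simple convention for removing the margin.
    fair = {outcome: p / total for outcome, p in raw.items()}
    return raw, total - 1.0, fair


# Hypothetical decimal odds for a single match (assumed for illustration).
odds = {"home win": 1.80, "draw": 3.60, "away win": 4.50}
raw, margin, fair = implied_probabilities(odds)
favourite = min(odds, key=odds.get)  # shortest odds mark the favourite
print(favourite, round(margin, 3))
```

The implied probabilities add up to more than one, and that excess is the bookmaker’s
margin. This is why the odds are not a pure statement of what the bookmaker thinks will
happen.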
Another increasingly popular method of using statistics to predict the outcome of a match is
to compare the expected goals of the two teams involved. Expected goals, a statistic
popularised by analysts such as the football statistician and blogger Michael Caley [2], is
the probability, based on historical averages, of a particular shot going in. It takes many
factors into account to determine the probability of a player or team scoring from a chance.
These include:
• Shot location e.g. angle, distance
• Shot type e.g. header, foot, free kick
• Assist type e.g. cross, through-ball, rebound, pass
• Speed of attack e.g. counter, possession, ground covered quickly or slowly, long ball
• Dribble by the attacking player e.g. beating the keeper or a defender
• Set play or open play
These per-shot probabilities can then be summed to calculate the expected goals scored
and the expected goals conceded for a particular team. The graph from 11tegen11.net [3],
based on Opta data, shows how different types of shot statistics correlate with future
performance over the course of the season.
From this we can see that, of all the shot statistics, the expected goals ratio is the best
predictor of future performance.
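As a rough sketch of how team-level figures follow from per-shot probabilities, the example
below sums hypothetical shot probabilities into expected goals for and against, then
computes a ratio. The ratio is assumed here to be expected goals for divided by the sum of
expected goals for and against. The shot values and function names are invented for
illustration and are not Opta data.

```python
def expected_goals(shot_probs):
    """Total expected goals: the sum of each shot's scoring probability."""
    return sum(shot_probs)


def expected_goals_ratio(xg_for, xg_against):
    """Share of the total expected goals that belongs to the team (assumed definition)."""
    return xg_for / (xg_for + xg_against)


# Hypothetical per-shot scoring probabilities for one team over a match.
shots_for = [0.35, 0.08, 0.12, 0.05, 0.40]
shots_against = [0.10, 0.06, 0.04]

xg_for = expected_goals(shots_for)
xg_against = expected_goals(shots_against)
ratio = expected_goals_ratio(xg_for, xg_against)
print(round(xg_for, 2), round(xg_against, 2), round(ratio, 3))
```

A ratio above 0.5 means the team created better chances than it allowed. Under this
assumed definition, a team that keeps doing so over many matches would be expected to
finish higher in the table.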
However, this isn’t a perfect formula: there are anomalies. Two teams could play each
other, one with high expected goals and the other with low expected goals, and the team
with the lower expected goals could still win the game. This does happen, but people tend to
be shocked when such anomalous results come up. Take, for example, the match in which
Barcelona, the best team in the world at the time, lost 2-1 to a mediocre Celtic team.
Even though the UEFA match statistics [4] suggest Barcelona should have beaten Celtic
very comfortably, they didn’t. But the fact that the result was such a surprise provides
support for the expected goals model as a good predictor of match outcomes. The model
does also miss out some vital pieces of information, such as the proximity of the defender,
the player taking the shot (when looking at a team rather than an individual) and good
chances that don’t end in shots on goal.
Overall, for a model in its early stages, it appears to be a generally good one for predicting
the outcomes of matches and player and team performances, and it helps bookmakers
remain the devil.
The use of expected goals isn’t limited to making predictions, though. Looking at expected
goals statistics can be a very good way of measuring the performance of either a team or a
player. A team or player regularly outscoring their expected goals could be seen as
overperforming based on the shots on goal that they are getting, and vice versa. There are
several benefits to knowing whether someone is under- or overperforming with regard to
their expected goals. Say, for example, you have a player who has scored six goals in their
last ten games. This is a more than respectable tally, but the goals-to-games ratio alone
does not take into account how many shots they took or how easy the chances they scored
from were, so you can’t really deduce whether they are doing well or are just scoring all the
really easy chances that they have had over that time. If we introduced expected goals into
this scenario, we would have a much better idea of whether this was a player performing
well or just performing as well as any average player would given the same chances.
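The six-goal scenario can be sketched as follows. Both players are hypothetical, as are the
per-shot probabilities and the function name. The point is only to show how the same goal
tally can correspond to very different expected goals.

```python
def finishing_summary(shots):
    """shots is a list of (scoring_probability, scored) pairs.
    Returns goals, expected goals and the difference between them."""
    goals = sum(1 for _, scored in shots if scored)
    xg = sum(prob for prob, _ in shots)
    return goals, xg, goals - xg


# Hypothetical player A: ten easy chances, six of them scored.
player_a = [(0.6, True)] * 6 + [(0.6, False)] * 4
# Hypothetical player B: thirty hard chances, six of them scored.
player_b = [(0.1, True)] * 6 + [(0.1, False)] * 24

for name, shots in (("A", player_a), ("B", player_b)):
    goals, xg, diff = finishing_summary(shots)
    print(name, goals, round(xg, 2), round(diff, 2))
```

Both players have six goals, but player A has scored roughly what an average finisher
would from those chances, while player B has scored about twice the expected number.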
An example that demonstrates how expected goals can be used to judge a player’s
performance is Gareth Bale in the 2012/2013 Premier League season. Many said that
season that, if it weren’t for Gareth Bale, Tottenham would have finished much lower than
they did. His performance was so good that it earned him the PFA player of the year award
and a transfer to Real Madrid, breaking the world record transfer fee. Some people said that
while he was good, he may not have been worth £75million, but if we take a closer look at
his goal scoring performance in that season, we can see that he was phenomenal. From the
166 chances that Bale had in that season, he scored twenty goals, a very good total,
especially for a winger. When you consider the quality of these chances by looking at the
expected goals, you can see that his performance was even more outstanding than it first
appears. The graph from the blog post “Running a Simple Simulation with Excel” by Mark
Taylor [5] shows the percentage likelihood of achieving different goal totals based on the
expected goals from Bale’s chances.
As the graph shows, the chance of scoring twenty goals from those chances is very low, so
we can deduce that Bale was significantly overperforming and therefore regularly scoring
difficult chances, which emphasises his quality that season.
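The likelihood of each goal total can be sketched in a few lines. Taylor’s post [5] runs a
simulation in Excel; the version below swaps in an exact calculation instead, treating
every shot as an independent event with its own scoring probability. Only the count of 166
chances comes from the text above. The equal 0.07 probability per shot is an assumption
made purely for illustration, not Bale’s real shot data, and the function names are
hypothetical.

```python
def goal_total_distribution(shot_probs):
    """Exact probability of each goal total, treating shots as independent.
    Entry k of the returned list is the probability of scoring exactly k goals."""
    dist = [1.0]  # before any shot: zero goals with certainty
    for p in shot_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)  # this shot is missed
            new[k + 1] += prob * p      # this shot is scored
        dist = new
    return dist


def prob_at_least(dist, goals):
    """Probability of scoring at least the given number of goals."""
    return sum(dist[goals:])


# Assumed input: 166 chances, each given the same illustrative probability.
shot_probs = [0.07] * 166
dist = goal_total_distribution(shot_probs)
print(round(sum(shot_probs), 2), prob_at_least(dist, 20))
```

Under these assumed probabilities the expected total is under twelve goals, and reaching
twenty or more is a rare outcome. That is the shape of argument the graph makes with Bale’s
real chances.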
Knowing whether a player is over- or underperforming would also be useful in a club’s
scouting network. Scouts would be able to advise clubs about potential transfers and give
detailed information about a player’s goal scoring performance, which could help clubs
secure bargains in the transfer market and avoid overpaying for players whose goal scoring
performance isn’t as good as it looks.
This isn’t the perfect statistic, though: there are limitations to using expected goals to
analyse performance. As well as the limitations mentioned earlier, this statistic can only be
used to analyse attacking performance. For example, a defensive player could be
underperforming according to this method, but if goal scoring isn’t really their priority
anyway, are they really underperforming? Furthermore, to analyse player performance
properly using this method, it would have to be applied over a fairly long period of time,
because it’s not unusual for a player to score lucky goals that you wouldn’t expect them to
score, and these would distort the perceived performance over a short time period.
Obviously this isn’t the only statistic clubs use. In fact, clubs use such a huge range of
statistics that the majority of Premier League clubs have a partnership with the football data
analytics company ProZone, and in 2014 Arsenal invested over £2million in buying a similar
company, StatDNA, for their analytics [6]. At first some managers were unconvinced by the
use of data analytics. According to the Wired article “The winning formula: data analytics
has become the latest tool keeping football teams one step ahead” by Joao Medeiros [7], in
2005 Southampton hired a ProZone consultant, Simon Wilson, to advise the team. The
manager at the time, Harry Redknapp, was not a fan. On one occasion, Wilson gave a
briefing to the team and the manager to try to help them before a match, which
Southampton subsequently lost 3-2. On the team bus, Redknapp turned to Wilson and said,
"I'll tell you what, next week, why don't we get your computer to play against their computer
and see who wins?".
The same article [7] explains how there were several managers who did embrace data
analytics. One of the first major cases of this was Sam Allardyce in 2000. At the time he
was managing Bolton Wanderers, who were in the second tier of English football (then the
First Division, now the Championship). Whilst ProZone weren’t sure whether a team of
Bolton’s stature would be able to afford them, they thought that if it worked well for a club in
Bolton’s position, it would give even better publicity than if it were to work for a club like
Arsenal or Manchester United that was already successful. That season, Bolton were
promoted to the Premier League, where Sam Allardyce based his game plan around the
statistics provided. For example, he insisted on his players taking in-swinging corner kicks
and crosses, as a larger proportion of goals are scored this way, but also made sure to
practise defending them in training. This led to Bolton having a consistent record of
top-eight Premier League finishes between 2003 and 2007 and also led to them qualifying
for the UEFA Cup (now the Europa League) for the first time in their history.
Nowadays, the use of analytics is hugely important for football clubs, but it doesn’t come
cheap, which raises the question: is it fair that this in-depth data is only available to clubs
that can afford it? On the one hand, you can’t expect companies such as ProZone and
StatDNA to give away for free the data they’ve collected, and you could argue that clubs are
just using their budget for the good of the team in the same way as when buying players.
On the other hand, this could further widen the gap between big clubs and small clubs. It
could become a vicious circle for the less wealthy clubs. Clubs that can’t afford the best
data analytics could be at a disadvantage, and if they struggle to progress because of this,
they probably won’t see a large enough increase in revenue to be able to afford the
analytics, so will continue to struggle.
All in all, while there are still some sceptics when it comes to the use of statistics in football,
their growing importance cannot be denied. If clubs like Arsenal are prepared to pay over
£2million to acquire data analytics companies, and the other Premier League clubs are
using companies like ProZone, then it doesn’t look like it’s going to die down either. Whilst
there are issues related to the fairness of richer clubs being able to afford the best data
analytics, these clubs are just using the money they have available to them, and there are
many bigger issues related to money in football than the use of data analytics. Overall, I
think that the growing interest from fans, coupled with the increasing use of big data in
football, means that the use of statistics is only going to become more popular, and as a
mathematics student who is passionate about football, I’m more than OK with it.
References
1. Backing the favourite: But in the Premier League do they always win?
By David Dubas-Fisher
http://www.mirror.co.uk/sport/football/news/backing-favourite-premier-league-always-4475089
Accessed: 04/04/2016
2. Let’s Talk about Expected Goals
By Michael Caley
http://cartilagefreecaptain.sbnation.com/2015/4/10/8381071/football-statistics-expected-goals-michael-caley-deadspin
Accessed: 10/03/2016
3. The Best Predictor for Future Performance is Expected Goals
By “11tegen11”
http://11tegen11.net/2015/01/05/the-best-predictor-for-future-performance-is-expected-goals/
Accessed: 05/05/2016
4. Celtic vs Barcelona 07/11/2012
http://www.uefa.com/uefachampionsleague/season=2013/matches/round=2000347/match=2009541/postmatch/statistics/
Accessed: 06/04/2016
5. Running a Simple Simulation with Excel
By Mark Taylor
http://thepowerofgoals.blogspot.co.uk/2016/01/running-simple-simulation-with-excel.html
Accessed: 06/04/2016
6. Arsenal’s ‘secret’ signing: Club buys £2m revolutionary data company
By David Hytner
http://www.theguardian.com/football/2014/oct/17/arsenal-place-trust-arsene-wenger-army-statdna-data-analysts
Accessed: 04/04/2016
7. The winning formula: data analytics has become the latest tool keeping
football teams one step ahead
By Joao Medeiros
http://www.wired.co.uk/magazine/archive/2014/01/features/the-winning-formula
Accessed: 03/04/2016
