The document discusses sport analytics in soccer and what it aims to achieve. It can be used to rate and predict player and team performance, understand the effect of important player attributes on outcomes, and potentially predict the outcome of games. While some large clubs and competitions are starting to use analytics, it is still not widely used or transparently explained. The document also discusses building models using machine learning and large datasets of player attributes and statistics to rate players, compare to expert ratings, and predict performance and outcomes.
1. MoneyBall - Sport Analytics - Soccer
What really is Sport Analytics?
Is it the prediction of the outcome of a game?
Is it the performance prediction of a specific player or a team?
Is it a way to build a (new) strategy for the upcoming competition?
Is it a way to rate a specific player, rank that player, and buy or sell that player?
Is it a way to connect the players to the fans and to the brands and sponsors?
Is it a way to evaluate and understand the effect of social media?
Of course, not all teams use analytical tools. In addition, presenting and explaining
analytical results clearly and transparently to coaches, managers, and players is not an
easy task. Soccer analytics is becoming more accepted in some big clubs and big
competitions. However, it is still a small part of the puzzle, and its use depends on the
managers and on each club and its culture. Furthermore, even analysts are not
transparent or candid about what is under the hood or what their real analysis is. In other
words, they do not reveal their secret recipe. In this project, we have used a very limited
number of player attributes that are easy and inexpensive to gather, both to reverse
engineer the most advanced rating and performance indices and to propose a more
robust and simpler model for future player rating and performance prediction. The
program runs on Spark in a cloud environment and can be used for terabyte- to
petabyte-scale data spanning multiple years, with thousands of players and hundreds of
attributes.
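As a rough illustration of what reverse engineering a rating from a few attributes could look like, the sketch below fits a weighted sum of attributes to a set of ratings by ordinary least squares. It is not the project's actual model: the attribute choice, the player values, and the hidden formula are all invented for the example, and the helper names (solve_linear_system, fit_least_squares) are hypothetical.

```python
# Hypothetical sketch: recover the weights behind a "black-box" rating by
# ordinary least squares. All numbers below are invented for illustration.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(rows, targets):
    """Return the weights that minimize squared error, via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy players: [bias term, Speed, Finishing, Low Pass] (invented values).
players = [
    [1.0, 80.0, 70.0, 65.0],
    [1.0, 60.0, 85.0, 75.0],
    [1.0, 90.0, 55.0, 60.0],
    [1.0, 70.0, 75.0, 90.0],
    [1.0, 65.0, 60.0, 80.0],
    [1.0, 85.0, 90.0, 70.0],
]
# Assumption: pretend the hidden expert formula is a plain weighted sum.
hidden_weights = [5.0, 0.2, 0.5, 0.3]
expert_ratings = [sum(w * x for w, x in zip(hidden_weights, p)) for p in players]

recovered = fit_least_squares(players, expert_ratings)
print([round(w, 3) for w in recovered])
```

Because the toy ratings are generated from an exact weighted sum, the fit recovers the hidden weights almost exactly. Real expert ratings are unlikely to be this clean, so the residual error of such a fit would itself be informative.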
6. What's Sport-Soccer Analytics?
Player Rating and Performance
Expert vs. Machine Learning Player Rating and Performance
What are the most important player attributes linked to performance?
What criteria do experts use when they evaluate players? Is there a way to reverse
engineer their criteria?
What attributes are important for each specific position?
Can we use the ratings and the players’ attributes to predict the outcome of a game?
Can we aggregate the players’ ratings to come up with a team rating?
Is there a way to correlate the team rating to the outcome of the game?
Is the expert rating influenced by the individual/team rating or by the outcome of the
game?
Can we predict the outcome of a new game given the past performance of the
players?
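Two of the questions above, aggregating player ratings into a team rating and correlating it with the result, can be sketched in a few lines. This is only one possible reading: the plain average used as the aggregate, the goal difference used as the outcome, and every number in the matches list are assumptions made up for the example, so the printed correlation says nothing about real soccer.

```python
from math import sqrt


def team_rating(player_ratings):
    """Aggregate player ratings into one team rating (plain average, an assumption)."""
    return sum(player_ratings) / len(player_ratings)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented matches: (home player ratings, away player ratings, home minus away goals).
matches = [
    ([7.2, 6.8, 7.5], [6.1, 6.4, 6.0], 2),
    ([6.0, 6.3, 5.9], [7.0, 7.1, 6.8], -1),
    ([6.9, 7.0, 6.7], [6.8, 6.9, 7.1], 0),
    ([7.8, 7.4, 7.1], [6.5, 6.2, 6.6], 3),
    ([6.2, 6.0, 6.4], [6.9, 7.3, 7.0], -2),
]

# Rating gap between the two teams in each match, and the matching goal difference.
rating_gaps = [team_rating(home) - team_rating(away) for home, away, _ in matches]
goal_diffs = [goals for _, _, goals in matches]

r = pearson(rating_gaps, goal_diffs)
print(f"correlation between rating gap and goal difference: {r:.2f}")
```

Other aggregates, such as a position-weighted average or the minimum rating in the back line, could be swapped into team_rating without changing the rest of the sketch.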
9. What's Sport-Soccer Analytics?
Soccer Analytics: Modeling of soccer before, during, and after the game using scientific
techniques to match or predict a set of outcomes.
Expert Player Rating: Player performance is rated by an expert. These ratings are a
black box based on the expert's latent knowledge and experience and can’t be
precisely defined.
Soccer Performance Analytics: A tool to help players, coaches, and managers
quantitatively assess player and team performance, improve both, and design a set of
winning strategies for upcoming game(s).
Soccer analytics using advanced analytics and visualization tools, such as machine
learning and network analytics, is becoming more popular and will increasingly be
used for performance comparison and prediction of game outcomes.
11. How to predict and model the overall performance and ratings of the players?
• Companies such as OPTA, Prozone, Amisco, and WhoScored are now able to
collect rich soccer data.
• For sport-soccer analytics, a rich data set is used that contains more than 210 player
attributes, including 198 performance statistics. To calculate the overall
performance and ratings of the players, some or all of the following player attributes
are used. Some of the most advanced expert ratings include the Capello Index, the
Castrol Index, and WhoScored.com. These ratings include a player’s rating for each
match or cumulative ratings.
• For classification, regression, and clustering, there are many machine learning models
that can be used. For classification and regression, one can use linear models (SVMs,
logistic regression, linear regression), naive Bayes, Regression by Discretization using
J48, Additive Regression with Decision Stump, decision trees, ensembles of trees
(Random Forests and Gradient-Boosted Trees), isotonic regression, Multilayer
Perceptron, and RBF Network. For clustering, one can use k-means, affinity
propagation, Agglomerative Clustering (Ward, Average, and Complete),
Gaussian mixture, power iteration clustering (PIC), and latent Dirichlet allocation (LDA).
Furthermore, one can use dimensionality reduction such as singular value
decomposition (SVD) and principal component analysis (PCA) to reduce the feature
space.
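To make one of the clustering options above concrete, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. It is illustrative only: the two features, the player values, and the choice to seed the centroids with the first k points are assumptions made for reproducibility, and at the data sizes described in this project a library implementation would normally be used instead.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def k_means(points, k, iterations=20):
    """Lloyd's algorithm; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # Converged: centroids did not move.
        centroids = new_centroids
    return labels, centroids


# Invented (Attacking Prowess, Defensive Prowess) pairs for six players.
players = [[88, 40], [45, 85], [90, 45], [42, 88], [85, 38], [48, 90]]
labels, centroids = k_means(players, k=2)
print(labels)
print(centroids)
```

On this toy input the two clusters separate attack-minded from defence-minded players, which hints at how clustering could group players by role before any position-specific rating model is fitted.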
13. Overall Rating and Performance Index based on Player’s Attributes
• Nationality, Club, League, Age, Height, Strong Foot, Position (GK, CB, RB, LB, DM,
CM, RM, LM, AM, RW, LW, SS, CF)
• Attacking Prowess, Ball Control, Dribbling, Low Pass, Lofted Pass, Finishing
• Place Kicking, Swerve, Header, Defensive Prowess, Ball Winning, Kicking Power,
Speed, Explosive Power, Body Balance, Jump, Stamina, Goalkeeping, Saving, Form,
Injury Resistance, Weak Foot Use, Weak Foot Accuracy, Trickster, Mazing Run,
Speeding Bullet, Incisive Run, Long Ball Expert, Early Cross, Long Ranger
• Scissors Feint, Flip Flap, Marseille Turn, Sombrero, Cut Behind & Turn, Scotch Move,
Long Range Drive, Knuckle Shot, Acrobatic Finishing, First-time Shot, One-touch Pass,
Weighted Pass, Pinpoint Crossing, Outside Curler, Low Punt Trajectory, Long Throw,
GK Long Throw, Man Marking, Track Back, Captaincy, Super-sub, Fighting Spirit
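Before any of these attributes can feed a rating model, they have to be turned into numbers. The sketch below shows one plausible encoding: a one-hot vector for the position, scaled numeric attributes, and 0/1 flags for skills. The record layout, the small subsets of attributes and skills, the 0 to 100 scale, and the function name encode_player are all assumptions for illustration, not the project's actual schema.

```python
# Position codes as listed in the attribute overview.
POSITIONS = ["GK", "CB", "RB", "LB", "DM", "CM", "RM", "LM", "AM", "RW", "LW", "SS", "CF"]
# Small illustrative subsets; a real model would use many more columns.
NUMERIC_ATTRIBUTES = ["Speed", "Finishing", "Low Pass"]
SKILLS = ["Long Throw", "Man Marking", "Captaincy"]


def encode_player(player):
    """Turn one player record (hypothetical dict layout) into a flat feature vector."""
    # One-hot encode the position: exactly one entry is 1.0.
    one_hot = [1.0 if player["position"] == pos else 0.0 for pos in POSITIONS]
    # Assumption: numeric attributes are on a 0-100 scale, so divide by 100.
    numeric = [player["attributes"][name] / 100.0 for name in NUMERIC_ATTRIBUTES]
    # Skills are present or absent, so encode them as 0/1 flags.
    skills = [1.0 if name in player["skills"] else 0.0 for name in SKILLS]
    return one_hot + numeric + skills


# Invented example record.
example = {
    "position": "CF",
    "attributes": {"Speed": 85, "Finishing": 90, "Low Pass": 70},
    "skills": {"Captaincy"},
}
vector = encode_player(example)
print(len(vector), vector)
```

Categorical fields with many values, such as Nationality or Club, would likely need a different treatment (for example grouping rare values) to keep the feature space manageable.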