The “insights” of SAC Smart Discovery unraveled!

Wouldn’t it be great to have a tool that does our hard searches for correlations for us and comes up with the brightest insights available? Well, SAP states that the Smart Discovery add-on of SAP Analytics Cloud (SAC) is such an innovation. But is it really like that? Does it really come up with analysis outcomes nobody else would come up with? And are those analyses actually of high quality? I was really happy to put that to the test. I have invested a lot of my time in analyzing data in R, and this is what I will use to “unravel” Smart Discovery (well, to the best of my abilities and in my opinion, that is).

Some background:

The analysis I set up is meant to show what actually influences the Value of a soccer player. The main reason for starting it was the urge to predict a player’s value (so a player’s worth). To do this, I used the FIFA 2019 dataset and predicted values based on a training part consisting of 60% of the full set. Predicting the correct value turned out to be far more difficult than I thought: only 1286 of a total of 7372 predictions were correct, which is just about 17% of all cases. Of course this is kind of logical, as many factors that influence a player’s value are not included in this fairly limited dataset…

Nevertheless, it made me curious and I wanted to find out WHAT influences a player’s value and find the relations. This is an analysis for which I hoped SAC Smart Discovery would provide me with outcomes beyond the ones I had already created over time…

The Smart Discovery Outcome:

In order to run the Smart Discovery of SAP Analytics Cloud (SAC), you select a measure or dimension you want to know “more about”, as SAP states it. Using a player’s Value as the “I want to know more about” setting, exactly 1 page is created (in SAP’s “selling story” it is actually 4…), and it looks like the image below. My remarks are already included in orange.

So there are actually some really interesting and fair aspects in the Smart Discovery outcome above:

  • Summary information: the total value doesn’t say much, though the max value is 118.500.000,00, which belongs to Neymar Jr because of his transfer to Paris Saint-Germain, and the min value (0) shows the wide range of values.
  • Upper right chart: of the 18.206 players, 16.677 have a value of 0, which means they can be picked up for free. Could be interesting if this is what you are aiming for.
  • Bottom left chart: a correlation (association in SAP’s words) between the total wages paid and the “most valuable clubs”. There is a fair relation, as more valuable players are paid a higher salary (= wage).

But as my short analysis in the above picture already shows, there are 2 charts that really raised my eyebrows: “Value by Preferred Foot” and “Association between Jersey Number and Value by Club”. Let’s dive into both of them and see if those charts actually present the correct data…

Is “Value by Preferred Foot” actually presenting the correct information?

Apparently it depends on your preferred foot (so whether you are a lefty or a righty) whether you have a high value as a player. I would think “both” would be the most valuable, though SAC states that a player is more valuable when he is right footed. But is this actually true? Or is this chart nothing more than a misleading visual?

The chart insinuates that when you are a “righty” your value as a player is way higher. Just looking at the chart and thinking it through logically, I concluded that the value presented is the SUM of values. Not the correct way of presenting an analysis of Value by Preferred Foot if you ask me. It is generally known that there are more right footed players than left footed ones. Let’s use the analysis in RStudio using R (which is my backup for this whole document) to show whether this is correct:

  • Left: 4201
  • Right: 13894
  • Unassigned: 111

So in this case it would be logical to calculate an average of the player’s value based on their preferred foot. And this results in the following:

Preferred Foot   Average Value   Difference L vs R
Left                 2.591.279           + 220.213
Right                2.371.066
Unassigned             153.243

So actually it can be said that on average you are more valuable as a left footed player, as the numbers above don’t lie…
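
To make the SUM versus average point concrete, here is a minimal sketch in Python (not the R analysis behind this document). It only reuses the counts and average values listed above and approximates each group’s total as count times average; the variable names are my own, and the totals are approximations because the averages are rounded.

```python
# Player counts and average values per preferred foot, copied from the figures above.
counts = {"Left": 4201, "Right": 13894, "Unassigned": 111}
averages = {"Left": 2_591_279, "Right": 2_371_066, "Unassigned": 153_243}

# Approximate total value per group: count x average (the averages are rounded,
# so these are not the exact sums that SAC would show).
approx_totals = {foot: counts[foot] * averages[foot] for foot in counts}

# Which group "wins" depends entirely on the aggregation that is used.
largest_sum = max(approx_totals, key=approx_totals.get)
largest_mean = max(averages, key=averages.get)

for foot in counts:
    print(f"{foot:<11} n={counts[foot]:>6}  mean={averages[foot]:>10,}  sum~{approx_totals[foot]:>15,}")
print("Largest sum: ", largest_sum)
print("Largest mean:", largest_mean)
```

The group with the largest total is simply the group with by far the most players, while the average points the other way.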

Note that the chart in SAC emphasizes that “Position ST” influences the chart the most. Just for your information: ST stands for striker. So let’s see whether this is merely a logical aspect or whether Smart Discovery actually came up with an interesting point.

In total there are 27 positions, ranging from goalkeeper to striker and covering all the possible positions on the field. Of all those positions, the ST position has the most players (namely 2145). And when looking at the top 10 most valuable players on the position ST, the names of those players make it even more clear why they have such an influence on the chart:

Name                Value        Preferred Foot   Position
H. Kane             83.500.000   Right            ST
Cristiano Ronaldo   77.000.000   Right            ST
R. Lewandowski      77.000.000   Right            ST
S. Aguero           64.500.000   Right            ST
M. Icardi           64.500.000   Right            ST
R. Lukaku           62.500.000   Left             ST
G. Bale             60.000.000   Left             ST
C. Immobile         52.000.000   Right            ST
A. Lacazette        45.000.000   Right            ST
M. Depay            42.000.000   Right            ST

So SAC insinuates that ST has a high influence on the chart, and it is not incorrect. The “only” part it missed is that there are more Strikers than, for example, Right Attacking Midfielders (RAM). This, together with the valuable players in the list of strikers, is why they have the highest influence on the Value. The position with the highest average value is not the Striker, as can be seen in the chart below created in RStudio using ggplot, but the LF position (Left Forward), which includes only 15 players, among whom are Hazard, Dybala and Iniesta.

Due to the low number of players, it is not seen as a large influence on the value, but looking at the average value, this is a position that needs to be taken into account. And also in this case: the total list of LF players contains only 3 left footed players…

Though it must be said: SAP’s chart is correct. The total SUM of value for right footed players is way bigger, but this is obvious as there are more right footed players. Also the influence of Strikers is the largest, but this one is also logical, as this is the position that includes most players. But the question is: does the chart bring you any insights? Or is it the analysis we just did?

Let’s dive into the more complex chart to analyze: the correlation between jersey number and value…

Is the association really there (and correct) between Jersey Number, Club and Value?

The analysis on a so-called “relation” (or association as SAP refers to it) between Jersey Number, Club and the value is a little bit trickier. When I first saw this chart I was like: “hell no, no way you are more valuable when playing with number 10”.

The first thing that really surprises me is that Smart Discovery actually summed up the Jersey Numbers. This means that when a Club has all its players playing with Jersey Numbers over 30 instead of low numbers like 2, it will move up along the x-axis. So we are not even looking at an association between Jersey Number and Value, if I might say so. But leaving this first strange aspect aside, let’s conduct an analysis to see whether there is some sort of truth in this chart. In doing so I go by the title of the chart rather than by the data points. So I will be looking for the association between Jersey Number and Value by Club.
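
A small sketch in Python illustrates why a summed Jersey Number says little about a club. The two squads below are hypothetical and made up purely for illustration (they are not taken from the FIFA dataset), and the helper `club_point` is my own name for the aggregation as I understand it from the chart: one dot per club, with both measures summed.

```python
# Hypothetical squads as (jersey number, player value) pairs.
# These clubs and numbers are invented for illustration only.
squads = {
    "Club A": [(2, 1_000_000), (5, 1_000_000), (10, 1_000_000)],
    "Club B": [(32, 1_000_000), (35, 1_000_000), (40, 1_000_000)],
}

def club_point(players):
    """Return (sum of jersey numbers, sum of values) for one club."""
    jersey_sum = sum(number for number, _ in players)
    value_sum = sum(value for _, value in players)
    return jersey_sum, value_sum

for club, players in squads.items():
    x, y = club_point(players)
    print(f"{club}: summed jersey numbers = {x}, summed value = {y:,}")
```

Both hypothetical clubs are worth exactly the same, yet they end up far apart on the x-axis, only because one squad happens to wear higher numbers.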

Apparently Real Madrid is the most valuable club, as it is the highest dot on the y-axis. To check this, I lined up the top 10 most valuable clubs in RStudio using the ggplot R package, generating the following chart:

A club’s value is calculated as the sum of all its players’ values. So the values in the R chart are equal to the values on the y-axis of the SAC chart. The higher a club is on the y-axis, the more valuable it is.

In order to know if a player is more valuable when they wear a certain number, it is good to look at the top 4 players (Neymar Jr, De Bruyne, Messi and Hazard) who are shown in the chart below. Those highlighted players are the players with a value over 90.000.000, and with that the 4 most valuable players.

Alright, knowing the names of the top 4 players, we can have a look at the Jersey Numbers.

Name           Jersey Number   Value         Overall   Club
Neymar Jr      10              118.500.000   92        Paris Saint-Germain
L. Messi       10              110.500.000   94        FC Barcelona
K. De Bruyne   7               102.000.000   91        Manchester City
E. Hazard      10               93.000.000   91        Chelsea

When you see the list like this, it seems that when you play with Jersey Number 10, you are very valuable. Or rather: the most valuable players appear to be playing mostly with number 10. How you see the relation depends on how you phrase it… But does that mean that all players who play with number 10 are as valuable? No, of course not. This list of the bottom 10 players with Jersey Number 10 confirms it:

Name            Jersey Number   Value    Overall   Club
M. Etxeberria   10                   0   74        No Club
I. Kovacs       10                   0   73        No Club
S. Nakamura     10                   0   72        Jubilo Iwata
J. Campos       10                   0   71        No Club
B. Nivet        10                   0   71        ESTAC Troyes
A. De Jong      10                   0   59        No Club
B. Singh        10                   0   58        No Club
R. Cretaro      10              40.000   57        Sligo Rovers
Ryan Yong Gi    10              50.000   58        Vegalta Sendai
K. Brennan      10              60.000   60        St. Patrick’s Athletic

The names in this list do not ring a bell (at least not to me), but they all play with Jersey Number 10. So it is obvious that the Jersey Number does not influence your value as a player. Though when playing for an important club from the Top 10, it is possible that the Jersey Number is related to the value. Or is it the other way around? Does the player with the high value choose his Jersey Number? In that case it would be the player’s personal influence, rather than the Jersey Number itself, that goes together with the player’s value. Playing with Jersey Number 10 for a club like VVV Venlo doesn’t get you a value of 118.500.000 like Neymar Jr has.

And a list of the correlations calculated with the Spearman method (the dataset has large outliers, hence the Spearman method) shows there is close to no correlation (SAP’s association) between a player’s value and his Jersey Number:

Variable        Spearman correlation with Value
Overall          0.9163082
Wage             0.7839799
Reactions        0.7507812
Potential        0.7455360
Ball Control     0.7375126
Composure        0.7012314

Jersey Number   -0.1779670
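
For readers who want to see what the Spearman method does, here is a minimal, self-contained sketch in Python. It is not the R code behind the numbers above (in R, the built-in `cor()` function offers `method = "spearman"`); the function names and the toy data are my own assumptions. Spearman is simply the Pearson correlation computed on ranks, which is why a single extreme value has far less impact.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data (hypothetical): y always rises with x, but the last value is an extreme outlier.
toy_x = [1, 2, 3, 4, 5]
toy_y = [10, 20, 30, 40, 100_000_000]

print("Pearson: ", round(pearson(toy_x, toy_y), 3))
print("Spearman:", round(spearman(toy_x, toy_y), 3))
```

On the toy data, the outlier drags the Pearson value down even though y always rises with x, while Spearman only looks at the ordering.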

On top of that, Jersey Numbers are often linked to the position on the field, so suggesting there is a relation between Jersey Number and value is rather strange. That said, as players in the football industry have more and more to do with branding and merchandise, they lately prefer to keep their Jersey Number the same, even when they switch clubs. So the personal influence of a player on his Jersey Number is only growing, BUT only to the point where the CLUB actually thinks the player is valuable enough to obtain his preferred Jersey Number.

Based on the chart provided by SAC Smart Discovery, even more analyses are possible, and there are many more things I could analyze to show the “accuracy” of the chart provided. Though with this start, I think I made a fair point about not always believing what you see at first sight…

Conclusion

To provide yourself with some basic insights, SAC Smart Discovery can be helpful, though I would recommend not following it without a second thought. Smart Discovery is not a human: whatever data you drag into this part of the tool, it treats that data the way it always does, and what it actually does in the background remains a black box. A measure is a measure, a dimension is a dimension, and that’s it. Make sure you know your data before just “accepting” what SAC Smart Discovery brings back to you. As you can see, it is not always what it looks like!

With that, I need to point out that the charts created in SAC are not incorrect, though they do not add a lot of value to the analysis. The charts are basic, though the titles can be very misleading (look at the one with the association between Jersey Number and Value…). In order to create the “perfect SAP selling speech Smart Discovery outcome”, data has to be set up in a very specific way (like SAP did for their code jams in order to show the usefulness of Smart Discovery). In many cases, though, the data YOU use differs, and that makes Smart Discovery less useful than suggested.

But please, if we differ in opinion, I would like to invite you to exchange thoughts and dig deeper into it together. Also, if you would like to receive the R analysis that backs up this document, don’t hesitate to contact me at d.ambaum@jugo.nl.