Football & Data Analysis

Podcast, 59 minutes


Jamie Cook joins us for a look at his long-running side project, where he uses data analytics to gain insight into football matches.



Transcript

Scott: So, Jamie is a data engineer, and we brought him on today to talk about some of the stuff he's been up to with regard to predicting fantasy football. I think it's hopefully a really nice way to understand how data can be applied to something that's pretty cool and accessible to talk about, and hopefully we can move some of the conversation on to how it translates to business as well. Great to have you here.

Jamie: Thanks for inviting me.

Scott: Not at all. Before we start, it's worth calling out that we've got some listeners in the States, so we are talking about soccer today, right? English football?

Jamie: Yes.

Scott: Cool, yeah, just before we go down that rabbit hole. So, do you want to give us a quick five-minute intro to your background, what you've been up to, and what you're doing with this project as well?

Jamie: Yeah, definitely. So I'm Jamie, I'm a data engineer. I've been working with data for many years in different forms, starting with manufacturing data, which is where I really cut my teeth, and that's what introduced me to the world of statistical analysis and what you can actually do with data. Through the various places I have worked, in different industries, I've always tried to apply the same sort of logic. I look at data as a source of insights, and a way to make decision making easier, because you're doing it based on facts rather than opinion. One of the biggest problems is everyone's got an opinion, but if you can back that up with facts, it makes life that bit easier. So I am a massive advocate of data-driven decision making. That's one of the reasons why I started my side project on the football data. I actually used it as a way of learning new technologies as well. In various places I've worked, I've looked at the technologies we should be using, and then I've used my side project to prove what can be done. I then brought that into the business to say, why don't we try this? Very few companies actually start looking at predictive analytics and using data to predict what's going to happen, because most of the time everyone's just looking at what happened. Then they're trying to predict what's going to happen based on what they did in the last quarter. I wanted to see whether there was some science behind that, so I created my side project, which has now been going for probably 12 years, and it's followed me around. At every company I've ever worked in, I use every opportunity there is to talk about it. A lot of people don't like football, or soccer, but when they start to see what we're doing with the data, they start to get interested. So yeah, that is a very brief intro.

Scott: That's really cool. Why fantasy football to begin with? Was it something you'd always been interested in, or was it just a topic you thought you could apply it to nicely?

Jamie: It's a sport where I believe you can use the data to actually help you. I started by looking at just normal football. I was looking at football results to see if there was something I could do to help me predict them. In the early days my big plan was to come up with this tool, which I called the predictor, which would basically give me an edge if I ever wanted to place a bet. I wanted to see how the bookies were coming up with the odds, so I just started looking at the data behind it. But the problem with sports data is there's always a bias, because you're always drawn to certain things. It was in the early days that I suddenly realised that if you don't remove that bias, your data is never going to be good for decision making. So then I started looking at the data based on the parameters I wanted, rather than looking at two teams. So rather than looking at Liverpool v Man United, I would look at team A versus team B. The thing with football is there's just so much data around it: the squads, time of the match, day of the match, injuries, who the referee is. It just jumped out at me as a sport which had lots of data, but not too many people were actually analysing it.

Scott: Yeah that's really cool.

Oli: There's often an emotional bias, I guess, with this kind of stuff, especially when it comes to football, which is such an emotional and passionate sport, especially with the fan base. So I can imagine that, like you said, taking away that Liverpool v Man United factor and just looking at it as team A versus team B, from an analytical point of view as to who's the best team and who comes out stronger, is probably the best way to analyse this kind of data. Do you find that the software you've created for your side project can be adapted to anything other than football, or have you built it specifically around football?

Jamie: There's two parts to it. The first part, the original part, was the predictor, and that was based on teams. It's looking at variables, so it can be adapted to any sport which has A versus B. The second part is purely for fantasy football. That's where I combine the two sets of data and then just drill down. Quite often people have asked me why I have only done it on football. Over the years I've looked at it from the perspective of, okay, what about cricket? What about rugby? But the problem is these other sports aren't so rich with their data, and if you take rugby, for argument's sake, the same teams are going to be successful year in, year out. There's not too much change. So there wasn't much scope to see how I could give myself an advantage if I was playing a fantasy game or whatever. I've even looked into some of the American sports. When you look at American sports like the NFL, they're doing a lot more than what I'm doing, and a lot more than what we actually do over here. I'm looking at pulling in more data streams and then just seeing how the model performs at a team level.

Oli: Where do you get your data streams from then? Is it many different sources, or is it just from one source?

Jamie: I use quite a few different sources. There's one which I can't for the life of me remember the name of. There's one website that produces CSV files, so I just get those. Then I use the Premier League APIs. The problem with the Premier League APIs is they change them on quite a frequent basis. Because you've got no control over it, you spend the first couple of weeks changing things. They change the API names, they change the data that's coming back, and they generally strip out data because they are commercialising what they're doing. There are other streams which you have to pay for, so the data sources are getting narrower and narrower. There are quite a few different data sources that you can use. Some are paid, some are free. It really depends on how far you want to go into it.

Oli: Right. Do you find that there's a difference between the free sources and the paid sources, in terms of credibility?

Jamie: Yes! Also, especially with football being a global sport, some of the paid ones will give you access to every single league there is. The free ones will give you access to only one or two leagues from one or two countries, and you're only allowed to make so many calls per day. Even with the data that I get, there are some feeds which I classify as real-time data, or real-time-ish, because I run those feeds every 15 minutes. Most of mine only run a couple of times a week anyway.

Oli: What's the difference between running real-time data, and doing it say weekly for instance?

Jamie: When I started looking at the APIs coming from the Premier League, the thing is they work in game weeks. Players only get points when they play, so that's only once a game week. But there are other activities that happen, especially with fantasy football, where you need to transfer players in and out. So in a game week they will have total transferred in and total transferred out, and in any game week you can see how popular a player is. But the problem is that's a cumulative number for the whole week. So when I was doing my real-time-ish feed, what I wanted to do was see what was happening over a smaller window, to see if I could use that to gain an advantage. One of the things that happens when players get transferred in and out is that it drives their transfer value. Player prices can go up and down based on a couple of other things, but transfers is one of them. So I have a Python job that runs, goes to the API and brings the data down, and I put it into a SQL database every 15 minutes. Then I look at the change in the numbers across that 15-minute window, and that tells me what's been happening in those 15 minutes. Then I can aggregate it up throughout the day, so I can see which players are hot and which are cold. I can decide whether I want to transfer them in before their prices go up, or wait to see what happens. Because if there's an international friendly, for argument's sake, you may transfer someone in to get the best price, and then they get injured and you've lost the advantage. So it's basically just trying to use any source of data you can to give you the slightest advantage.
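
A minimal sketch of the 15-minute differencing idea described above, not Jamie's actual code. It assumes each poll stores a cumulative "transfers in" total per player; the table name, column names, player names and numbers are all hypothetical, and an in-memory SQLite database stands in for the real SQL database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical cumulative snapshots, one row per player per 15-minute poll:
# (player, snapshot time, total transfers in so far this game week)
SNAPSHOTS = [
    ("Player A", "2020-11-16 09:00", 1000),
    ("Player A", "2020-11-16 09:15", 1040),
    ("Player A", "2020-11-16 09:30", 1100),
    ("Player B", "2020-11-16 09:00", 500),
    ("Player B", "2020-11-16 09:15", 505),
    ("Player B", "2020-11-16 09:30", 507),
]


def window_deltas(rows):
    """Return (player, time, transfers in since the previous snapshot)."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    conn.execute(
        "CREATE TABLE snapshot (player TEXT, taken_at TEXT, transfers_in INTEGER)"
    )
    conn.executemany("INSERT INTO snapshot VALUES (?, ?, ?)", rows)
    # Pair every snapshot with the one immediately before it for the same
    # player, then subtract the cumulative totals to get the window's activity.
    query = """
        SELECT cur.player, cur.taken_at, cur.transfers_in - prev.transfers_in
        FROM snapshot AS cur
        JOIN snapshot AS prev
          ON prev.player = cur.player
         AND prev.taken_at = (
             SELECT MAX(p.taken_at) FROM snapshot AS p
             WHERE p.player = cur.player AND p.taken_at < cur.taken_at
         )
        ORDER BY cur.player, cur.taken_at
    """
    deltas = conn.execute(query).fetchall()
    conn.close()
    return deltas


def daily_totals(deltas):
    """Aggregate the per-window deltas up to one number per player per day."""
    totals = defaultdict(int)
    for player, taken_at, delta in deltas:
        totals[(player, taken_at[:10])] += delta  # first 10 chars = YYYY-MM-DD
    return dict(totals)


if __name__ == "__main__":
    for row in window_deltas(SNAPSHOTS):
        print(row)
    print(daily_totals(window_deltas(SNAPSHOTS)))
```

The first snapshot for each player has nothing to be compared against, so it produces no delta; a real job would also need to handle the cumulative counter resetting at the start of a new game week.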

Oli: Yeah, definitely. With the transfer windows, from my understanding of when I've played Fantasy Premier League, there's a period before the games when you can make transfers. So do you find that the best data to use is almost immediately after the game on Sunday, when the week is closed off and transfers reopen? Or is there a specific time when this data is most valuable?

Jamie: At this moment in time, I have not got a definitive answer to that. I've tried different tactics. What I do is keep my list of transfers for the next four or five weeks, because you're looking at the fixtures and the difficulty of those fixtures, and you're looking for other things that are happening, like international breaks. So sometimes I will transfer as soon as the game week is closed off. The problem you've got with early transfers is if they have got a game, whether it's a European game or an international, then they could get injured. But the advantage of doing it early is that any price increases don't catch you out. There are many schools of thought, and this is where Twitter is great as well, because some people say transfer early and some people say transfer late. I just find that I will transfer when I feel it's right. So I look at the data, monitor it, and see what's happening. If you look at this international break, for argument's sake, so many players have been injured or have got Covid that it's really thrown everything into disarray.

Oli: Yeah that's quite interesting. How has Covid affected the data then?

Jamie: I don't think Covid's affected the data, apart from when a player gets it. If you look at the first part of last season up to lockdown, so March, and then when they resumed, if you compare the form pre-lockdown and post-lockdown, there was definitely a downturn in clean sheets. So defenders and goalkeepers would get fewer points, with more goals being scored. Even at the start of this year they were getting record numbers of goals. I still think that the fans play a bigger part than a lot of people give them credit for. If you're a big club, like if you're playing at Anfield, you've got a crowd behind you. If you're not a top-six side and you go there, the crowd's going to get on your back, so that could have an adverse effect on the players. With no crowd, it's almost like a friendly, so it's almost evened out. Hence why, if you look at it this year, everyone is beating everyone. So it's played havoc with the data, from the perspective that there are other things to take into consideration. Because there are no crowds, you can't really say of any game that they're definitely going to win.

Oli: I heard you say Anfield there, so I presume you're a Liverpool fan, right?

Jamie: I am.

Oli: Good man. So do you think that your data can factor in the emotional response that the crowds can give, for instance? I know it can't, to a certain extent, factor in the amount of emotion and drive that it will give an individual player. But over a period of time, when for instance Liverpool play at Anfield in front of a crowd like the Kop, and they've won consistently at Anfield over, I can't remember how many games now, maybe that data can get factored in. Is that something that you can take on board?

Jamie: Yeah, there's definitely a home and away split, or there was up until Covid. Not so much this season, and I believe it's the lack of crowds that's causing that. Another piece of data which I've started to collect, which is slightly more manual because it's not in any feed, is the actual crowd sizes. That just gives me another variable to work with. So then I will have the day of the game, the time of the game, the crowd, the referee. There are different things I can put in there. It's creating an ever longer list of variables that we can look at to try and get a more favourable outcome from the data.

Oli: It's crazy!

Scott: What do you do when there's a period like this, where we've got Covid and this year's been pretty strange in terms of the stats? What do you do with that data going forward? Would you use that data set, or would it throw things off so you'd throw it away, or would you use something like the crowd size to factor it in? How would you work that into the future?

Jamie: Once this is over, I can go back and analyse it in more detail, and we would use it to see what effect it really did have. One of the effects it's having is that where there are more games in a short period of time, you're starting to get more injuries. So all of a sudden you're seeing data sets growing, because injuries are becoming more prevalent. With fantasy football, when you collect that data, they will tell you whether someone's playing or not, and give you the reason. So you would get a reason saying he's injured because he's got a muscle strain, but now you also get that he's not playing because he's in self-isolation. So it's starting to enrich the data. But whenever we're looking at the data, even though I've got the full data set over many years, I try to narrow it down as well. So when we're looking at form, we may only look at the last four to six games, home and away, to see how players are doing. If you take this season, for argument's sake, and you take Spurs, with Harry Kane and Son, those two are the top two scorers, but a large portion of Son's points have come away from home rather than at home. So when you start looking at the data, it's starting to throw up things that you wouldn't have expected, because you would expect home points to be more than away points. Then you'd also start to see certain players not doing so well over a short space of time, whereas for three or four away games on the trot someone was doing really well. But that's really contentious, because when you look at home and away form like that, it is very temporary. It's something you can't prove over a long period of time.
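
As a rough illustration of narrowing form down to the last few games and splitting it home and away, here is a minimal sketch. The record layout (`player`, `gameweek`, `venue`, `points`), the default window of five games and the sample numbers are assumptions for the example, not taken from Jamie's model.

```python
from collections import defaultdict


def recent_form(games, last_n=5):
    """Average points per venue over each player's most recent `last_n` games.

    `games` is a list of dicts with hypothetical keys: player, gameweek,
    venue ("H" for home, "A" for away) and points.
    """
    by_player = defaultdict(list)
    for game in games:
        by_player[game["player"]].append(game)

    form = {}
    for player, rows in by_player.items():
        # Keep only the most recent games, ordered by game week.
        recent = sorted(rows, key=lambda g: g["gameweek"])[-last_n:]
        split = {}
        for venue in ("H", "A"):
            points = [g["points"] for g in recent if g["venue"] == venue]
            # None signals "no games at this venue in the window".
            split[venue] = sum(points) / len(points) if points else None
        form[player] = split
    return form


# Made-up numbers for a player who scores better away than at home.
SAMPLE = [
    {"player": "Player X", "gameweek": 1, "venue": "H", "points": 2},
    {"player": "Player X", "gameweek": 2, "venue": "A", "points": 10},
    {"player": "Player X", "gameweek": 3, "venue": "H", "points": 3},
    {"player": "Player X", "gameweek": 4, "venue": "A", "points": 12},
    {"player": "Player X", "gameweek": 5, "venue": "H", "points": 2},
    {"player": "Player X", "gameweek": 6, "venue": "A", "points": 9},
]

if __name__ == "__main__":
    print(recent_form(SAMPLE, last_n=4))
```

With a window this short, each average rests on only two or three games, which is the caveat raised above: a split like this is temporary and cannot be proven over a long period.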

Oli: Do you find that you can see patterns within this data for different players or for different teams? Are there any usual patterns in the rise and dip of form, for instance, or anything like that?

Jamie: There are certain patterns. Over the last three seasons, two players spring to mind. It was two seasons ago when Solskjaer first came in, and then Pogba really started playing well. Last year, when they bought Bruno, he did really well when he first came in. The difference between those was that one was a new manager and one was a new player in the team, so you're starting to see influences which are affecting a player. Going back to Pogba, there's the new manager bounce, where everyone asks, is it true or is it not? Same group of players, and they play well for a short period of time. They do very well, but then they drift off, and their long-term stats are pretty much the same as what they would have been. But you could get a short-term advantage when a new manager comes in by picking a player from that team, because for some reason they play out of their skins. So there are certain patterns, but generally you see it after the second game week, based on the fact that all of a sudden he's doing well.

Oli: I'm guessing these kinds of patterns can almost predict the mentality of a player as well. You're never going to be able to fully say what type of mentality each player has, but I guess this kind of data can help you understand it. Can you see that from these patterns?

Jamie: What I'm starting to do is look at additional data sources outside football, to see if they have an effect on the players. So, for argument's sake, if you look at the news at the moment, where you've got Rashford, how much of that is giving him a lift or not? He's quite up and down in his performances anyway. So we're starting to look at getting feeds from Twitter, and seeing what the sentiment is for these players based on the non-football activities as well as the football activities, to see if we can draw a pattern from that and see if outside influences are affecting their in-game play. It's one of those things. Once you start, you start looking at what data you can get. As I always say to people, I've built the whole solution on my own. It's a full end-to-end solution, and as with any software solution, it takes time to create. So I've got a backlog as long as my arm of stuff I want to put in there, but it's just a case of having the time to do it. Because the more variables, the bigger the data set, and the more data points, the more accurate the insights are going to be.

Oli: Yeah, absolutely. I'm guessing this is one of those things that can just continue to grow and grow. You can bring in so many different sources of data to analyse every section of football, every game, and how every player interacts. Do you use any sort of AI technology or tool to help you with the processing of this?

Jamie: No, not AI, but there was some predictive stuff that we were using within Python, to see if we could take all these variables and output either a predicted score or the likelihood that they were going to do well. In fantasy football you've got xG and xA, which are expected goals and expected assists, which everyone really looks at because that's bait. The fact is, the companies that come out with xG and xA are getting the premium APIs, or data from the really good stats websites, and so they're getting the players' movements on the pitch, seeing where they were on the pitch when they took a shot, and that feeds into their xG and xA. That's way beyond what I've got at the moment, but we did come up with a couple of different predictive Python algorithms to see if we could predict how many points a player would score, based on the variables that we had.

Scott: That's amazing, that's really cool. There's just so much data to think about, and it's incredible the level that some of these go to, I guess. You said you've been doing this for 12 years now, is that right?

Jamie: Yeah, probably over 12 years.

Scott: That's amazing. Has it evolved much in that time? Has it just been a case of getting more data on top, or has the software underneath changed along the way as well?

Jamie: It's evolved massively. If I think back to 12 years ago when I first started looking at it, I was simply using Excel. I was just filling stuff in on an Excel spreadsheet to see what it would give me. I worked at Motorola for 16 years, and they devised Six Sigma, so they were very much into their statistical analysis. They were very much a company driven by the idea that the more data you have, the better your insights can be, and the more successful you can be. So that's why I first started looking at it. My first iteration was literally just an Excel spreadsheet, then I moved on to Access databases, because then I could build myself a little front end where I could input stuff. Then I moved to SQL Server, and at one point I was using Integration Services to extract the data, but again I was using flat files. Then we were putting that into Analysis Services cubes, with Reporting Services sat on the front end. That was good fun, and it taught me a lot. Again, the progression in the technology was based on where I was going with my career. Then at one point I removed all the Analysis Services stuff, because I believed that was just too much for what I needed. Then it was just SQL Server databases, still using Integration Services, and then I replaced Integration Services with Python for my ETL processes. It was on the cloud for about a year and a bit, so I built a solution in Azure. I was using VMs, again running Python on VMs with SQL Server databases, and that would just run in the background. Now I'm back on Python and SQL Server, just the two main parts, with Power BI as the front end. Power BI is the visualisation tool, as you saw the other week. So it's come quite a way, and a lot of that is based on the fact that I wanted to learn the technology. So I used the side project as a way of learning new technology in a safe environment, because if I messed it up, I only messed my own stuff up, which really allowed me to experiment more.

Scott: That's really cool! How about the cloud stuff, was that a benefit to what you were doing? You said you've moved off it. Were there reasons for moving off? Was it just cost?

Jamie: Yeah. The company I was working for at the time, we were creating an e-commerce website, so I had an MSDN licence. So I had about £150 a month to spend on Azure, and they said I could do what I liked with it, because it was my subscription. That allowed me to explore more: VMs, PaaS databases, Functions, Logic Apps, Stream Analytics. Strangely enough, I actually built a solution in Azure for one of the companies I was working for, based on what I had built with my football data. That was a real-time data collection system, so I used exactly the same components, but obviously I was just looking at a different place. So it really enabled me to learn a skill and then use the skill.

Scott: Yeah, what a great way to do it, absolutely. One thing I wanted to ask, Jamie: I hear a lot that correlation doesn't necessarily translate to causation. Is that right? I guess you must see that in some of the data you're looking at. How much of a factor is it, and is it easy to spot without knowing what you're looking at, and without knowing the domain of football as well as you do?

Jamie: I think you can use that to push the narrative you want to produce. It's like when they say there are more shark attacks when they're selling more ice creams on the beach. You can use that as a way of giving your narrative an edge. So you've always got to be careful of it, and it is quite easy to be caught out by it. Football's quite hard because there are so many variables, and so many things that can affect something, that you almost feel like asking, okay, if I change a variable, what would the outcome be? So you're testing your data, because you want to make sure that the correct information is there. We will use Liverpool as an example, only because they're particularly bad in defence at the moment, and they're conceding loads of goals. Is it because Van Dijk is injured? Most people say yeah, that's definitely what's happening, but if you look at the latter part of last season they were conceding all the time, and that was post-Covid. So is Van Dijk not being in defence causing a major issue, or is it the fact there are no fans that's causing the issue? This is what I was saying earlier about Twitter. Twitter's a wonderful place for putting whatever you want on there. You can create a following based on your narrative. You can say, right, Van Dijk's not playing, so Liverpool are going to do really badly in defence. So everyone starts selling. He comes back, but that's not going to change it, because something else was causing it: they were bad when he was there. So it's one of those, I'm forever checking the data and seeing if there's something else which could be giving me the true story, so that the insights I'm getting out are giving me an advantage, rather than what I'm believing being just what everyone else is saying.

Oli: So you find social media quite a credible data source then?

Jamie: There's two main types of people on social media. I love social media, I love Twitter for the football stuff. But you see a lot of accounts that claim to be ITK, which is "in the know". So on a Thursday or Friday you will get these ITK accounts which will basically say such and such isn't playing this weekend. So everyone gets rid of that player and buys another player in, and guess what, it's not true. So there's the two types of people. There are the ones that really want to help people, and then there's the ones that really want to scupper people, so I take it with a pinch of salt. There are some really good accounts, by the way, some absolutely fantastic ones which are really credible and really useful, but there are a lot of people that almost want to trip people up.

Oli: Yeah, I can imagine. So it's almost that Wikipedia approach, isn't it? You don't quite know whether what you're getting is true or not. I can imagine, like with anything though, it's about testing the data from that source to see if it is credible, and then if it is you continue to use it, and you can build up that trust.

Jamie: Yeah, because at one point I was collecting data from Twitter when I was on Azure, because I could do it on a regular basis. I was using it to create sentiment analysis based on players and what people were saying, but I could never get it to give me anything solid. So I decided that was one data source I wasn't going to use at that moment in time. I will revisit it, but what I wanted to do was build up some type of credibility rating as well as a sentiment analysis. So you can see who's saying what, and compare what they were saying before it happened with what actually happened. Then you come up with a credibility rating, based on the fact that, say, 90 percent of what this person said didn't come true, so I'm not going to listen to them, or 90 percent did come true, so I can listen to them. Again, it's just expanding the data set to make it better. It's a bit like when people buy these hot tips for horse racing. How many of them are really hot tips?
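
The credibility rating is described here as an idea rather than something already built, so the following is only a sketch of one simple way it could work: score each account by the share of its past claims that came true. The account names and the `(account, came_true)` record shape are hypothetical.

```python
from collections import Counter


def credibility(claims):
    """Return {account: fraction of that account's claims that came true}.

    `claims` is an iterable of (account, came_true) pairs, where came_true
    is a bool recorded once the outcome of the claim is known.
    """
    made = Counter()
    correct = Counter()
    for account, came_true in claims:
        made[account] += 1
        if came_true:
            correct[account] += 1
    return {account: correct[account] / made[account] for account in made}


# Made-up history: one account right 9 times in 10, another right once in 10.
HISTORY = (
    [("@reliable_itk", True)] * 9 + [("@reliable_itk", False)]
    + [("@wind_up_itk", True)] + [("@wind_up_itk", False)] * 9
)

if __name__ == "__main__":
    print(credibility(HISTORY))
```

A fuller version would probably want a minimum number of claims before trusting a score, since an account with one lucky call would otherwise rate as perfectly credible.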

Oli: Yeah, absolutely. Are there any data sources that you're interested in exploring that you haven't yet? Is there anything top of your list that you really want to start using, or start exploring to see how credible and how useful that data would be?

Jamie: It's the social media. This is something I really want to go back and revisit. I think that could really lend itself to some good information, because it goes back to credibility. If one person is saying something, the chances are it may not be true. If you've got a thousand people saying something, the chances are that it's more likely to be true. So it's using the power of the crowd. I definitely want to revisit that and see what I can gain from it. There are other things as well, like weather data. I've been going back and forth with the weather data for quite a while, because I do believe it plays a part. Again, I haven't been able to prove it or disprove it yet, so that's definitely another area. I looked at the data for a while and I want to go back to it, but the free APIs from the weather people are not brilliant. So it's a case of weighing up whether I do want to start paying for some of these APIs, and how much it would cost against what I'm getting out of it. Because if it's not too expensive, then it's worth paying.

Oli: I'm quite interested in how the transfer window and transfers affect the data, how it's used, and the end result of using it for Fantasy Premier League, for instance, when doing your transfers. So how does the transfer window affect that? Do you find much difference in the data coming in, and can you measure it from social media, for instance? I know there are a few accounts that have that information beforehand, so does that factor into your data, or is it just as and when the player comes in?

Jamie: Within my data model, the players are the equivalent of a slowly changing dimension. For most of the year a player will just have a single entry in a table, because they're at a single team. But if players move, either after the season has started or in the January transfer window, they will have two entries. So I can start seeing what they did before against what they did after. There are two players that spring to mind who have definitely improved from moving to other teams, and the stats back that up. Another thing to consider is where they're going from and to, in relation to what's happening with that team at that moment in time. In the January transfer window, you generally find that players are almost brought in as a panic buy, if you like, whereas the players that are bought at the start of the season are more likely to have been brought in because there's a long-term future for them. That's not always the case, because sometimes they will buy a player who can't be released until January, so that's different from just buying a player in January because you've got an injury crisis. So you generally do see a change in the data, based on certain types of players. Again, it depends on the positions they're going into. Sometimes a player will be bought from one club by another and be played out of position. So the data gives them a slightly inflated points total based on where they are, because they're playing further forward, for argument's sake.
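
For readers unfamiliar with the term, a slowly changing dimension keeps history by adding a new row when an attribute changes instead of overwriting the old one. The sketch below assumes the common "type 2" variant with validity dates; the transcript only says a moved player ends up with two entries, so the `valid_from` and `valid_to` fields, the class and function names, and the sample player are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PlayerRow:
    """One version of a player in the dimension table (hypothetical layout)."""
    player: str
    team: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current row


def transfer(rows: List[PlayerRow], player: str, new_team: str, on: date) -> None:
    """Close the player's current row and append a new one for the new team."""
    for row in rows:
        if row.player == player and row.valid_to is None:
            row.valid_to = on
    rows.append(PlayerRow(player, new_team, on))


def team_on(rows: List[PlayerRow], player: str, day: date) -> Optional[str]:
    """Look up which team the player belonged to on a given day."""
    for row in rows:
        if (
            row.player == player
            and row.valid_from <= day
            and (row.valid_to is None or day < row.valid_to)
        ):
            return row.team
    return None


if __name__ == "__main__":
    dim = [PlayerRow("Player X", "Team A", date(2019, 8, 1))]
    transfer(dim, "Player X", "Team B", date(2020, 1, 15))  # January window move
    print(len(dim))                                      # two entries now
    print(team_on(dim, "Player X", date(2019, 12, 1)))   # before the move
    print(team_on(dim, "Player X", date(2020, 2, 1)))    # after the move
```

Joining match stats to the row that was valid on the match date is what makes the before-and-after comparison possible without the old team's figures being credited to the new one.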

Oli: Right, okay. Just as an example, take Diogo Jota last season at Wolves. Everybody could see he was a good player; I don't know if your data ever showed that he was probably a good player. From a statistical analysis he was, and I know that Liverpool actually use that model of statistical analysis for their transfers. I know he had good stats beforehand, but coming into a better team, with players who have higher stats around him, do you often see those stats become elevated when those players come into that new team?

Jamie: With Jota that was quite an interesting one, because he was brought in to be the fourth player in the front three. So from a premiership perspective, he was always going to get less game time than the front three, but he's been bought for the long term because of his age. He just so happens to have actually lit up Europe, and because he's in a team with better players he's playing... There's two things which I do when I'm looking at fantasy football: the eye test and the stats test. So when you look at the eye test, the problem with the eye test is, like with Jota, you're looking at the European games, not the Premier League. So what's happening is you're basically saying, well, actually his stats are not as good as someone else's, but when you see him play you think, well, he is that much better. So it's one of those where you're combining what you're seeing on the pitch against what you're seeing on paper. Then it's a case of his price is lower, so he's worth bringing in because he will get game time. He won't get full matches, but then his points per million is going to be that much better, because when he does come in he does score. That's one example where, if you just went by the stats, you could miss out.

Oli: Yeah. I'm also thinking about Chicharito back in the day for Man United, that super sub sort of thing, you know, goals per game time for instance. If someone else was injured and he played a full game, he didn't get as many goals as when he was brought on as that super sub; for the last 10 minutes he was scoring more goals. So do you often see your data throwing out those kinds of patterns as well?

Jamie: You can see those patterns. Because you get the bonus point system within fantasy league, what I actually do is monitor the bonus points against the non-bonus points. So if a player plays 60 minutes they're going to get bonus points. So super subs are gonna lose out in certain areas, but not in others. So then it's a case of, well, actually their price range becomes the pinch point of whether you're gonna go for it, because if you've got a player that comes on and is very explosive, it may be worth having him in certain fixtures. So then you would look at the fixture difficulty, where you can say, well, he's not gonna start because this is their best starting lineup which they always start with. So then you'll start to look at the minutes that the players are having, and then you're saying, well, actually in these fixtures I believe he's going to come on. So the game could be tight, and that's why he's going to come on. So then you would pick him. So it's a combination: you're looking at the points, you're looking at the bonus point system, you're looking at how many goals he's scoring, how many assists he's getting for the minutes he's getting. Some of the data which I look at is almost like the player form. Now the player form goes down into the data a lot lower, because if you look at headline figures, you would always choose the same players week in, week out. The problem with the data, or rather the problem with how you interpret the data, is if someone's on a hundred points and the next person's on 70. The person on 100 points could have a lot of bad games before the person on 70 catches him. So again, when I look at the data, not only do I look at the headline figures, I also window my data to say, right, okay, what did he do in the last four game weeks? What did he do in the previous four game weeks? 
So you've got a rolling window where you can actually start to see what people are doing. Stuff like super subs comes into it, because any player that's owned by less than three percent I classify as a differential, and that's one of those unwritten rules. Now what you generally find is the players that are subs are generally in that three percent, so then with someone like Chicharito, you would actually look at: is he a differential, who are United going to be playing, is he worth getting in? Because sometimes you need that difference.
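Editor's note: a rough sketch of the windowing and the three percent rule described above. It assumes a plain list of points per game week and an ownership percentage per player; the function names and all the numbers are hypothetical, chosen only to show how a higher headline total can hide worse recent form.

```python
DIFFERENTIAL_THRESHOLD_PCT = 3.0  # "owned by less than three percent"


def window_totals(points, size=4):
    """Return (last `size` game weeks, the `size` game weeks before that)."""
    recent = points[-size:]
    previous = points[-2 * size:-size]
    return sum(recent), sum(previous)


def is_differential(ownership_pct):
    """Flag a player owned by fewer than three percent of managers."""
    return ownership_pct < DIFFERENTIAL_THRESHOLD_PCT


# Hypothetical points per game week, oldest first.
player_a = [15, 14, 13, 12, 2, 2, 1, 3]   # strong start, poor recent form
player_b = [2, 3, 4, 5, 8, 9, 10, 11]     # lower total, improving form

for name, points in (("Player A", player_a), ("Player B", player_b)):
    recent, previous = window_totals(points)
    print(name, "total:", sum(points), "last 4:", recent, "previous 4:", previous)

print("2.4% owned is a differential:", is_differential(2.4))
```

On the headline total Player A looks better, but the two four-week windows show Player B trending up and Player A trending down, which is the distinction the rolling window is meant to surface.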

Oli: Cool. I mean there's many different fantasy premier leagues out there. I find that often their pricing structure is different, and how they work is different. Do you find that your model fits all of them, or is it just one particular fantasy premier league?

Jamie: It really only fits the Fantasy Premier League at the moment. There was a group of us that created a league in the Premier League last year, and when the Premier League stopped, one of them said let's go and join the Bundesliga. So we went in the Bundesliga, and that is run totally differently, in the fact you could change your captain mid game week as long as they haven't played. The problem is the rules that apply are different. Now with the actual way that the data comes in, they're all done on game weeks. So in one respect it does fit them all, but the way you use the data and interpret it and play the game will be different because the rules are different. Some leagues will give you points based on accuracy of passing, or how many interceptions you made. So they're more complex. The Premier League one, a lot of people say it's not the best, but I think it gives me the most enjoyment. It's probably one of the biggest outlets because it's got over seven and a half million global players. It just fits in that area where there's loads of data, it's easy to play, the rules are quite straightforward, and there's a great community out there. So I've not really tried to fit what I've got with other leagues.

Oli: Cool. So obviously the whole point is the data is based around the Premier League, but then do you factor in the fact that some of these teams within the Premier League, and the players in the Premier League, also play international friendlies, Champions League, Europa League, FA Cup? So how do you track that?

Jamie: I literally just manually input the games which were played, because it's like that old adage that the teams that play in the Europa League always do badly the following weekend. So I'm just recording what teams have played when, and what type of competition it's in. The thing is, even though the data which I collect for fantasy football is only on the Premier League, I collect data from about 20 leagues globally, and the good thing with tracking 20 leagues globally, including the European leagues, is that when I start putting in the data for European competitions, that gives me how those other teams are doing. Now with the real football which is being tracked from 20 different leagues, I'm only looking at the headline figures, so what was the score. I do keep home team, away team, half-time score, full-time score, and obviously half-time result and full-time result. There's also data from, I think, about 10 different bookmakers on what odds they gave that match. So I can actually look at what the bookmakers were offering against the actual results, to see their accuracy, which then helps me look at the teams which are most likely to do well based on historical data. Then that feeds down into when I'm playing fantasy football, because then I can actually start saying that these are the teams which I need to watch out for. Anyone that watches any form of Premier League will know that it's probably only seven or eight teams you're going to pick most of your players from anyway; the rest of them are a little bit hit and miss.
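Editor's note: one simple way to compare bookmaker odds against actual results, as described above, is to check how often the bookmaker's favourite matched the full-time result. The sketch below assumes decimal odds for home win, draw and away win; the field names, the hit-rate measure and every figure in the sample data are hypothetical, not Jamie's actual method.

```python
def match_result(home_goals, away_goals):
    """Full-time result as 'H' (home win), 'D' (draw) or 'A' (away win)."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


def implied_probabilities(odds):
    """Convert decimal odds to probabilities, normalised to sum to 1."""
    raw = {outcome: 1.0 / price for outcome, price in odds.items()}
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


def favourite(odds):
    """The outcome with the shortest (lowest) decimal odds."""
    return min(odds, key=odds.get)


def favourite_hit_rate(matches):
    """Share of matches where the bookmaker's favourite actually happened."""
    hits = sum(
        1
        for m in matches
        if favourite(m["odds"]) == match_result(m["home_goals"], m["away_goals"])
    )
    return hits / len(matches)


# Hypothetical matches: scores and odds are made up for illustration.
matches = [
    {"home_goals": 2, "away_goals": 0, "odds": {"H": 1.5, "D": 4.0, "A": 6.0}},
    {"home_goals": 1, "away_goals": 1, "odds": {"H": 2.2, "D": 3.2, "A": 3.4}},
    {"home_goals": 0, "away_goals": 3, "odds": {"H": 5.0, "D": 4.0, "A": 1.6}},
    {"home_goals": 3, "away_goals": 1, "odds": {"H": 4.5, "D": 3.8, "A": 1.8}},
]

print("Favourite hit rate:", favourite_hit_rate(matches))
print("Implied probabilities, match 1:", implied_probabilities(matches[0]["odds"]))
```

Run per bookmaker and per league, the same hit rate would give a basic accuracy comparison; normalising the implied probabilities removes the bookmaker's margin so the three outcomes sum to one.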

Oli: Yeah, definitely, definitely. You'll probably get one or two of the odd players that someone might pick and, you know, get lucky. Generally, I think, what was my one last week? I think it's Pukki from Norwich. He just came in and scored a ton of goals from day one. I know he was good in the Championship. Do you ever factor that in then, like players coming through from the Championship? If they've done well, will they do well in the Premier League, or is it just that as soon as that player comes from the Championship into the Premier League, it's kind of just new player data?

Jamie: It is new player data from the perspective of what we're getting, but I do record some data from when they were in the Championship based on what they've got. I also do it at a team level. So again, I've got 20 leagues' worth of data from the last 30 years. So if we just look at the Premier League, that enables me to actually look at when a team comes up, how well do they do against other teams, and when do they start to drop off, if they do drop off; some teams are just bad from the start. So not only are we looking at the new team data but also the players that are coming up. Hence why I went for Patrick Bamford from game week one. He had a great season last season. But the reason I do that is not so much for fantasy football; it was when I was building a predictor tool and I was looking at, if you were going to place a bet, when should you place a bet. It goes back to my days working in Motorola with statistical analysis. So I'd look at the outliers and I readjusted it every year. I believe the outliers in the premiership are the first six and the last six games of the season, which gives you 26 games which are more predictable than the other 12. The reason being, teams coming up have generally got a slight advantage over other teams because they may have brought in new players, which means those teams are slightly different, which means that it takes the premiership teams from the previous season that are playing them a little bit of time to get used to them. Also, in the first six games of the season player fitness isn't so great; after six games players are sort of at their peak fitness, though really it could take slightly more than that. Then at the end of the season, a lot of things have already been decided, so the only teams that are really fighting for it are the ones in the relegation zone, so you start to get results which you wouldn't expect. 
So in the last six games of the season, if I was going to place a bet, I would probably look at results which you wouldn't expect, because those games are more likely to come up with a weird result - but even if you look at this season for argument's sake, Liverpool have conceded seven, City have conceded five, and that's in that first window, but they conceded against teams that hadn't come up. But there's the data: it's not just telling me things from a player perspective and from a team perspective, it's also giving me a time perspective, so it's basically highlighting the fact there are outliers - so watch out for them because they will catch you out. Mind you, if I had put money on Liverpool getting beat 7-2 by Aston Villa, I'd probably be a rich man by now.

Oli: Yeah, painful that, yeah. But I can imagine that this year is actually probably quite interesting from a data analysis point of view because it's so different from every other year, and I think obviously Covid has a massive part to play in that. Like you said, with the crowd not actually being at the game and not having a presence at the game, it's almost like that home/away advantage has been taken away. Have you found that your data has almost completely changed compared to all the other years? You could probably, I don't know, guess if a team came to Aston Villa, for instance - hopefully they'll get beaten four or five nil - but now that home and away feeling's being taken away. You don't have Anfield, you don't have the Kop. At Old Trafford, for instance, you don't have the crowd there, so I can imagine that data is now changing quite a lot from what you're seeing.

Jamie: Yeah, because you would normally put a higher weight on a home team, but this season, if you look at the first few games, it was the away teams that had the advantage. When you watch them on TV, it's like watching a friendly. There's no crowd, the artificial noise is abysmal, and if you watch it without the artificial noise, where you can hear the players, it's almost got that "oh, I'm watching a friendly" feel. So I'm not sure if that is actually relayed on to the players, so that rather than being absolutely buzzing to play a home game in front of their home supporters, they're not there, so it really sort of levels the field, and that has been shown where the home advantage hasn't been there.

Scott: There's so many factors, aren't there, and I would imagine part of the 12 years working on this has just been understanding, like looking at the data and then taking all these facts in. There's just too much to think about, from the sounds of it.

Jamie: Yeah, and that's the interesting thing, because with the football data you never get to a point where it's perfect, because with football, every season things change. The squads change, managers change, and so just when you think you've got to a point where you think, right, this is really going to work now, something happens and something changes it. The thing with fantasy football is you never see the same people winning it year after year. You see the same people in the top one percent, but there is just so much to take into account, and I think that's the problem, and everything that you take into account will change because teams change, you know. And football is quite cyclic, because a team will stay at the top for quite a while and then they will start to get worse as their manager and their players get older, and then there'll be new kids on the block. At the moment I think football's great because everyone is beating everyone: you look at how good Spurs are, you look at how good Man City are, you look at how good Liverpool are, there's almost like a riches of teams. So it's not like when Arsenal and Man United were winning everything - United predominantly.

Oli: How have you found that the evolution of software over the past few years has impacted it? I know you were brushing on this earlier, like you started out in Excel and now you're here. I mean, I feel like within the last five years especially, software is just booming, especially all this sort of freemium software that we're getting at the moment as well, and there must be like a sheer quantity of it coming in, and then, you know, it's going to evolve more and more where AI is going to get more involved. How do you see that impacting the solution that you have at the moment? Do you see it making it better in a way, or different in any way?

Jamie: I think it would definitely be better, but I think it's not just the software, because the thing is that we're using things like Python. It's free to use, and there's a big community out there, and people start sharing ideas with each other, and this is where social media becomes really good, because then people do start to share information, and you can make whatever you've got good. Because it is one of those things - there is the prestige of doing well in the fantasy league, but it's not like you're going to win millions, and so people will collaborate. When I talk to different people about it and what I'm doing, they sort of say "so have you looked at this, have you looked at that". So I think the software is becoming easier to use, and there's better packages out there; with Python, for argument's sake, you use VS Code or PyCharm. Then there's more things to share: there's loads of things on GitHub where people are sharing - they're making their stuff accessible, so it's becoming more and more collaborative. The thing with football in the data science world is, you know, professional football clubs are now starting to employ data analysts and data scientists, and so it's becoming more prevalent, in the fact that the more data you have the more you can do. But I think it is also a case of the more data you have, the more you can flood yourself with too much data, and then you start to have data which isn't meaningful.

Scott: Do you think that the importance of data engineering and data science is being realised by companies at the moment? Is that trend starting to take hold now?

Jamie: I still don't think companies are using data correctly. So if I go back to when I started at Motorola back in the early 90s, everything was measured, but it was measured to actually get some insights from it. So we wouldn't just be measuring for the sake of measuring. We would know what we wanted to do - we wanted to use that data to become more efficient, and so we would. And we saw sites using statistical process control, whereby your higher and lower limits would change based on what was going through - because if you had one item being made and it failed, then that would be a 100% failure, but that's not a true 100% failure as such; it's just that the numbers are so low. And I think over the years companies have just collected data, but they're not collecting it to actually say we are collecting this data because of whatever. So I think that's why, however long ago it was, ten years ago, everyone had to have a data lake, so Hadoop was massive. So it wasn't a case of what you were collecting, it was how much you were collecting. So people just collected petabytes of data - but why? Because you were collecting petabytes of data. It's that data literacy to actually look at what you're collecting to see what it's going to tell you, and that's, I think, the difference between data-driven companies and data companies that are not driven by their insights. So I think we're coming to an interesting time, and I think over recent years the new job titles have come out, like data engineers and data scientists. 
Being a data engineer, one of the problems you've got is that, you know, I need to move data from here to here so that someone else can do something with it - but I also need to understand what they're going to do with it so that I can produce the best data set for them, so when they come to do it, it minimises their time. Because they don't want to be doing any form of data engineering. So I think you end up with silos, where the data engineer will do something and then pass it over, rather than working in a horizontal work stream whereby you put together a team of people to go and get some data, to move the data, transform the data, visualise the data. If you work in a horizontal where everyone understands what they're doing and why they're doing it, it then becomes a really good data-driven insight.
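Editor's note: the statistical process control point earlier in this answer, where the limits move with the volume going through, can be illustrated with a p-chart for failure rates. Jamie does not name a specific chart, so the p-chart, the three-sigma width, the function name and the 5% long-run failure rate below are all assumptions made for this sketch.

```python
import math


def p_chart_limits(p_bar, n, sigmas=3.0):
    """Lower and upper control limits for a failure proportion.

    p_bar is the long-run failure rate and n is the number of items in
    the sample; the band widens as n shrinks. Limits are clipped to [0, 1].
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    spread = sigmas * math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - spread), min(1.0, p_bar + spread)


P_BAR = 0.05  # hypothetical long-run failure rate

for n in (1, 50, 1000):
    lower, upper = p_chart_limits(P_BAR, n)
    print(f"n={n:>4}: limits {lower:.3f} to {upper:.3f}")
```

With a thousand items the band is narrow, so a small shift in the failure rate stands out. With a single item the band covers most of the possible range (and the underlying approximation is poor at that size), which is one way of expressing why one failure out of one is not treated as a true 100% failure.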

Scott: Yeah, that's some really good insight, because I mean you see a lot of these jobs floating around at the moment, like you say, with data science being the hot topic. But I think it's probably still being reserved for the large companies, isn't it - or the ones that are, like you say, truly data-driven at their core. It'll be an interesting few years to see how that plays out, I guess.

So, I think we're probably going to be wrapping things up here. Anything else you wanted to cover off, Jamie, before we close down for today?

Jamie: No, I think we've covered it all off. I hope that was insightful, I hope you got something out of it.

Scott: Totally, a super interesting project, and it's really good to see how data can be used on something that's really accessible and easy for a lot of us to understand as well. But it sounds like a ton of work's gone into it, and it'll be really interesting to see where that goes over the next 12 years as well.

Jamie: Yeah, there's one thing I will say, just touching on what you just said about data scientists and stuff. When I speak to people and they want to make their next step in their career, they say, I want to learn the cloud. When are you going to learn the cloud? Well, I don't know what to do - and then you say, well, do a side project. If you just do a learning course on Azure for argument's sake, you are just going to be doing something which is out of a textbook, and it may or may not interest you, but it's different if you actually read that textbook and then you do a side project. And I always say to people, if you do something like sentiment analysis on Twitter, you will then learn how to work with APIs, how to actually consume data from those APIs and put it into a data source - you could use SQL Server, you could use Postgres, you could use Cosmos DB - and then you would learn how to visualise that data. You then start to build a full appreciation of what data is and how you can use it, and that's why, like, people are amazed I've spent 12 years working on my side project, but I wouldn't be here if it wasn't for my side project. It enabled me to actually show people, and it's amazing how many interviews I've been in where my side project has come up. In certain instances, I've been in an interview and they do the normal questions and then they sort of say to you, "yeah, you can answer these questions, but can you really do it?" Because most of the time I've got my laptop on me, I've been in interviews and I've said I've done something, and then they've said, can we see it, so I said yeah. I've actually gone through it there in the interview, so it's been a really useful tool to not just learn a technology but show you can actually use it.

Scott: Absolutely, yeah. I couldn't agree more - I think with so many of these projects, I know in the past I've struggled with side projects when it sometimes seems like there's no point to what you're creating, but like you say, sometimes the point should just be to learn something new - that should be a good enough reason just to go grab something off the shelf and use it.
