An Ambitious Project

Mithrandir

Key Player
Good afternoon friends.

As some of you are aware, I'm currently a graduate student in computer science, concentrating in data analytics and machine learning. Most of you probably know that I'm very interested in footie data and have enjoyed doing exploratory analysis in the past.

For my graduate degree, I need to complete a "capstone" research project. Given that, it seemed logical to base my project off something that I actually give a shit about, which led to the formulation of my ambitious project.

I want to create a system that can automatically scan a footie stats database in order to collect relevant data and store the information. From there, the system will use the data combined with manually generated metrics derived from that data to develop a predictive expected goals scored vs expected goals conceded model. I'll probably start with standard regression and see how that turns out. A more advanced machine learning model will probably do better, but that will be developed later down the line. Using some other machine learning method (Bayesian Deep Learning?) is almost certainly going to be necessary in the long run to outperform human predictors.
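To make the "standard regression" starting point concrete, here is a minimal sketch, not the actual project code. It assumes a single hypothetical feature (shots on target) and made-up match numbers; the function names `fit_simple_regression` and `predict` are purely illustrative. One least-squares line is fitted for goals scored and another for goals conceded.

```python
from statistics import mean

def fit_simple_regression(xs, ys):
    """Least-squares line y = intercept + slope * x for a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def predict(model, x):
    """Predicted goals for feature value x, clipped at zero (goals cannot be negative)."""
    intercept, slope = model
    return max(0.0, intercept + slope * x)

# Made-up data for six matches (assumption, not real stats).
shots_on_target_for = [5, 2, 7, 1, 4, 6]
goals_scored = [2, 0, 3, 0, 1, 2]
shots_on_target_against = [3, 6, 2, 5, 4, 1]
goals_conceded = [1, 2, 0, 2, 1, 0]

attack_model = fit_simple_regression(shots_on_target_for, goals_scored)
defence_model = fit_simple_regression(shots_on_target_against, goals_conceded)

# Hypothetical inputs for an upcoming match.
expected_scored = predict(attack_model, 4)
expected_conceded = predict(defence_model, 3)
print(f"expected scored: {expected_scored:.2f}, expected conceded: {expected_conceded:.2f}")
```

A real version would use many more features and matches, and a count model such as Poisson regression may suit goals better than a straight line, but the shape of the pipeline (fit on stored data, predict the next match) would be the same.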

Once I get to this point, it's time to heavily test and improve the model on new, actual matches. I envision getting the model workable, and then potentially entering my system into the Prediction League (next year, not this year), to see how it performs relative to knowledgeable football fans (that's you lot), with an eye towards continued improvement of the predictive model.

If I get to that point, I would like to add an additional module to the system. I would like to add a system that can scan and record the information from betting sites about each game the system has predicted. From there, it would compare this betting data with the predictive model and make recommendations to maximize expected winnings.
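As a rough illustration of the comparison step, the sketch below computes the expected profit of a bet from a model probability and a bookmaker's decimal odds, then lists only the outcomes with positive expected profit. The probabilities, odds, and the names `expected_profit` and `recommend` are all assumptions made up for the example, not part of any existing system.

```python
def expected_profit(prob_win, decimal_odds, stake=1.0):
    """Expected profit: win (odds - 1) * stake with probability p, lose the stake otherwise."""
    return prob_win * (decimal_odds - 1.0) * stake - (1.0 - prob_win) * stake

def recommend(model_probs, bookmaker_odds):
    """Return (outcome, expected profit) pairs with positive expected profit, best first."""
    edges = {
        outcome: expected_profit(model_probs[outcome], bookmaker_odds[outcome])
        for outcome in model_probs
    }
    positive = [(outcome, ev) for outcome, ev in edges.items() if ev > 0]
    return sorted(positive, key=lambda pair: pair[1], reverse=True)

# Hypothetical model output and hypothetical bookmaker prices for one match.
model_probs = {"home": 0.50, "draw": 0.27, "away": 0.23}
bookmaker_odds = {"home": 2.20, "draw": 3.40, "away": 3.60}

for outcome, ev in recommend(model_probs, bookmaker_odds):
    print(f"{outcome}: expected profit per unit staked = {ev:.3f}")
```

The recommendation is only as good as the model's probabilities: if they are poorly calibrated, a bet that looks like positive expected value is just model error.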

I'm hoping to have the standard regression model ready for next season. To get to this point, I'm going to have to brush up on my Python a bit to create a couple of good web crawlers and program the predictive model itself. This will take some time, since I still am working full-time, but having something in place by next summer seems very doable. Once that model is in place, I will work on developing a better machine learning model alongside it.

I apologize for the random, technical, meandering post. I've been thinking about this for a while and wanted to type out my thoughts. As I mentioned, it's definitely an ambitious project, but I think it's possible to do with enough time.
 

ivoralljack

Grizzled Veteran
Staff member
Hugely interesting, Mith. It's way above my sphere of knowledge but I wish you every success with your project. And, of course, I'm sure there'd be no shortage of offers on here should you want assistance with anything related to football.
 

Jackflash

Midfield General
Staff member
Mammoth task Mith, but wish you every success with it, pretty sure this will be of interest to our resident statistician Crojack.
Once again the very best for a successful project.
 

CroJack

Key Player
...statistician Crojack.
Statistician CroJack? No, mate, I'm definitely not a statistician. Just a fan who doesn't want to be biased. Statistics is good when it comes to analysing teams' and players' performances and predicting trends, but, in my opinion, nothing can beat watching football games. There are still some important things statistics can't do, like measuring the real quality of chances, shots, blocks, saves, positioning... just to name a few things.
 

CroJack

Key Player
I want to create a system that can automatically scan a footie stats database in order to collect relevant data and store the information. From there, the system will use the data combined with manually generated metrics derived from that data to develop a predictive expected goals scored vs expected goals conceded model.
That's great mate :).

I'm writing an article on a different xG (expected goals) model than the current ones. My model is not based on statistical data (Opta's shots database) but on exact data, something like goal line technology.
The story behind my decision to take up the challenge and create a new xG model is this one:

A week ago I asked people who do some serious football statistics analysis what this Ayew chance is worth in xG:

IMG_20190906_193423.png

The answer I got was interesting. According to Infogol, Ayew's chance has an xG of 0.05, and according to Wyscout an xG of 0.15. The numbers for some other xG models are 0.08 and 0.09. All of this is telling me that they are guessing. A difference of 0.10 xG (10 percentage points) between the lowest and the highest number is simply too much. Why?

Let's say the Swans have five chances like Ayew's per game. 5 x 46 games is 230 chances, and a 0.10 gap on each of them adds up to 23 xG per season. That is too big an error margin. In other words, the current xG models are crap.
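The arithmetic above can be checked in a few lines. This sketch uses only the four xG values quoted in the post and the assumed five similar chances per game over a 46-game season; it takes the worst case where every chance differs by the full spread between the highest and lowest model.

```python
# xG values quoted for the same chance by four different models.
xg_estimates = [0.05, 0.15, 0.08, 0.09]

# Spread between the most and least generous model for this one chance.
spread = max(xg_estimates) - min(xg_estimates)

# Assumption from the post: five chances of this kind per game, 46 league games.
chances_per_game = 5
games_per_season = 46
chances_per_season = chances_per_game * games_per_season

# Worst case: the models disagree by the full spread on every such chance.
season_gap = chances_per_season * spread

print(f"spread per chance: {spread:.2f} xG")
print(f"chances per season: {chances_per_season}")
print(f"gap over a season: {season_gap:.0f} xG")
```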

Another interesting thing about the current xG models is that they all calculate xG from shots. And that's a huge flaw. Let's say a player is in an excellent position to shoot, but he hesitates and gets closed down by two opposition defenders. In that case, the chance has gone and never appears in the xG data. I'd argue that teams produce much more xG per game than the current models register.

I think that people who have created different xG models have overlooked the most obvious and most important thing when it comes to xG calculations.
 

Mithrandir

Key Player
My model is not based on statistical data (Opta's shots database) but on exact data, something like goal line technology.
I completely agree that this is a better method. I think the ideal prediction system would access a proposed database that held the video recordings for all matches and built up its data holistically and thus developed its model as an observationally data-driven model.

I would love to try to design something like that at some point actually. It would be much more of a massive undertaking, but you could really experiment with some interesting learning algorithms in a system like that. I don't think I have access to the proper computational and storage resources necessary for that. I'd probably have to work something out with a cloud service. That's if I could even get access to a bunch of match footage.

But yes I agree with you, the ideal prediction system is observationally data-driven, rather than raw flat data-driven.
 

Ladygargar

Fox in the Box
Staff member
Good luck with all your endeavours my friend - I’m chuffed to see you back up there where you belong; even if I don’t have a clue about your ambitions I am forever on your side 🥰
 

Ladygargar

Fox in the Box
Staff member
Things have been interesting, testing, and busy, thank you for asking. I’m juggling caring, kids and business, and my respite is all things Swans. I’m tickled pink to see the squad doing as well as they are, and having had a meeting with Trevor Birch I’m assured that we’ve turned an important corner. I can’t imagine having been invited to meet with the previous chairman. I brought Birch some issues which affected our elder and more vulnerable fans, and I have to say I was incredibly impressed by the gentleman. He’s a can-do guy, and I respect that. We can do incredibly well under his leadership: he’s engaged, he’s interested and he’s invested in our club. Suffice to say I’m a fan. He’s made of different stuff to what we’ve been used to, and I see great things ahead of us. I hope you are well and enjoying our successes so far and those yet to come 🙏🙏🙏
 