Let’s look at how we could model what happens in a football match. We want a model that is mathematically simple without being too wrong.
One way is to let $X$ and $Y$ be the goals scored by each team, and then assume that they are independent and Poisson distributed.
$$ X \sim \text{Pois}(\lambda_X), \ Y \sim \text{Pois}(\lambda_Y) $$
I believe I’ve actually seen this in some paper I read a long time ago – I’ll add a link in this blog post if I find it. For now, let’s look at the motivation. Why could this be a reasonable model?
Assume that in any minute (or indeed every second) one of three things happens: (1) The home team scores, (2) no team scores, (3) the away team scores. Add the following assumptions:
- We split the game into $N$ different parts, sized so that the probability of a goal is the same in every part.
- The $N$ parts are independent. This means that the game is played exactly the same way if it’s 1-0 in the 90th minute as if it’s 0-0 after 10 minutes. (This assumption is clearly wrong, but let’s stick with it for now.)
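Now assume that the expected number of goals for the home team and the away team are $\lambda_X$ and $\lambda_Y$ respectively. Since each part contains at most one goal, the home team scores in any given part with probability $\lambda_X / N$, so $X$ is binomially distributed with $N$ trials, and likewise for $Y$. If we let $N$ go to infinity, the binomial distribution converges to the Poisson distribution, so $X$ and $Y$ must be Poisson distributed with these mean values.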
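To spell out the limit for the home team (the same argument works for $Y$ with $\lambda_Y$): with $N$ parts and a scoring probability of $\lambda_X / N$ in each, the probability of exactly $k$ goals is binomial, and it can be rearranged like this:

$$ P(X = k) = \binom{N}{k} \left(\frac{\lambda_X}{N}\right)^k \left(1 - \frac{\lambda_X}{N}\right)^{N-k} = \frac{\lambda_X^k}{k!} \cdot \frac{N(N-1)\cdots(N-k+1)}{N^k} \cdot \left(1 - \frac{\lambda_X}{N}\right)^{N} \left(1 - \frac{\lambda_X}{N}\right)^{-k} $$

For fixed $k$, as $N \to \infty$ the second factor tends to $1$, the third to $e^{-\lambda_X}$ and the fourth to $1$, which leaves the Poisson probability:

$$ P(X = k) \to \frac{\lambda_X^k}{k!} e^{-\lambda_X} $$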
Before thinking about better models, I think it would be interesting to check how good this model is. In particular:
- Can we find statistics that show that this model is wrong (and quantify how wrong it is)?
- This model should give a relation between the odds for home/draw/away and the number of goals scored. How accurate is that relation?
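To make the second question a bit more concrete, here is a rough sketch of how the model turns expected goals into home/draw/away probabilities. It only assumes what is stated above (independent Poisson goals). The function names, the cut-off `max_goals` and the example values 1.5 and 1.1 are my own choices for illustration, not fitted to any data.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def outcome_probabilities(lam_home, lam_away, max_goals=20):
    """Return (home win, draw, away win) probabilities.

    Assumes independent Poisson goal counts. Scores above max_goals
    are ignored, which is a negligible truncation for realistic
    football scoring rates.
    """
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        p_x = poisson_pmf(x, lam_home)
        for y in range(max_goals + 1):
            p = p_x * poisson_pmf(y, lam_away)  # independence of X and Y
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away


# Illustrative expected goals only (hypothetical values).
home, draw, away = outcome_probabilities(1.5, 1.1)
print(f"home {home:.3f}, draw {draw:.3f}, away {away:.3f}")
```

Bookmaker odds are roughly the inverse of these probabilities (before the bookmaker's margin), so in principle one could compare them with actual goal statistics.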
I’ll explore some of this in a future blog post.