Arkansas is a better defensive team right now than the season numbers indicate. That’s liable to put a monkey wrench in their simulations. As we showed against UF, we can switch to a defense that looks different from the defense that tallied most of our season numbers. It’s a tight matchup, but I wouldn’t put much faith in those simulations on the way to Vegas. If you picked straight chalk last year in the first round, you would have gone 26-6, with one miss being an #8-#9 matchup. So, 27-5 is no great accomplishment.
I’ve got a model that has a surprisingly good correlation (0.74) between predicted margin and actual scoring margin in our games. It picks the winner right in 30 of our 34 games. Three of the misses are in the last eight games, all predicted losses that turned out to be wins. My hypothesis is that our season numbers underestimate how good we are now, and that recent improvement in our level of play throws off the model relative to our season numbers.
I download the season stats of all our opponents and then train logistic regression models in R on every combination of three stats to classify our wins and losses, irrespective of margin of victory. I take the scores from the most accurate models and save those models to get a matchup score by averaging the individual scores of the models. I then use that averaged matchup score along with location of the game (H, A, N) and the Massey composite computer ranking of the opponent to predict the margin in the game with a simple linear regression. That composite model is then run on the stats of future opponents for a prediction.
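For anyone who wants to tinker with the idea, here is a rough sketch of that pipeline. It is in Python rather than R, it uses a bare-bones gradient-descent logistic fit as a stand-in for R's glm, and every number in it is synthetic. The names (`fit_logistic`, `matchup_scores`, `design`), the `top_k=5` cutoff, and the use of training accuracy to rank models are assumptions for illustration, not the actual code behind the numbers above.

```python
import itertools
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, steps=500, lr=0.1, l2=0.01):
    """Logistic regression by plain gradient descent (intercept is penalized too)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = Xb.T @ (_sigmoid(Xb @ w) - y) / len(y) + l2 * w
        w -= lr * grad
    return w

def matchup_scores(stats, won, top_k=5):
    """Fit one win/loss model per 3-stat combination, keep the top_k by
    training accuracy, and average their predicted win probabilities."""
    z = (stats - stats.mean(axis=0)) / stats.std(axis=0)  # standardize each stat
    results = []
    for combo in itertools.combinations(range(z.shape[1]), 3):
        cols = list(combo)
        w = fit_logistic(z[:, cols], won)
        p = _sigmoid(np.column_stack([np.ones(len(z)), z[:, cols]]) @ w)
        acc = np.mean((p >= 0.5) == (won == 1))
        results.append((acc, combo, w, p))
    results.sort(key=lambda r: r[0], reverse=True)
    kept = results[:top_k]
    return np.mean([r[3] for r in kept], axis=0), kept

def design(score, location, massey_rank):
    """Columns: intercept, matchup score, home flag, away flag, Massey rank.
    Neutral site is the baseline location."""
    loc = np.asarray(location)
    return np.column_stack(
        [np.ones(len(score)), score, loc == "H", loc == "A", massey_rank]
    )

# Synthetic season: 34 games, 6 opponent stats (all values made up).
rng = np.random.default_rng(0)
n_games = 34
stats = rng.normal(size=(n_games, 6))
margin = 8 * stats[:, 0] - 5 * stats[:, 1] + rng.normal(scale=6, size=n_games)
won = (margin > 0).astype(float)
location = rng.choice(["H", "A", "N"], size=n_games)
massey_rank = rng.integers(1, 300, size=n_games)

score, kept = matchup_scores(stats, won)
X = design(score, location, massey_rank)
coef, *_ = np.linalg.lstsq(X, margin, rcond=None)  # simple linear regression
pred = X @ coef
print("training correlation:", np.corrcoef(pred, margin)[0, 1])
print("winners picked:", int(np.sum((pred > 0) == (margin > 0))), "of", n_games)
```

To score a future opponent you would reuse the stored stat means, standard deviations, and kept weights, then push the averaged score through the same linear model. Note that everything here is measured on the training games, so the correlation and pick rate it prints are optimistic.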
In theory the model run from the Arkansas perspective gives some indication of how Arkansas has played against a team like Butler. However, the numbers have to be crunched from the opponent’s perspective as well in order to see how they have played against a team like Arkansas. Strangely the correlations I get for Arkansas are the highest I get for any team so far. The Arkansas model, which passes all the standard tests of statistical significance on training, indicates that we play well against a team like Butler and predicts a win. However, the model also predicts a win from the Butler perspective, but the Butler model passes none of the tests for statistical significance on training. When I average the predictions weighted by the correlation scores, I get a tossup. Both teams apparently possess qualities, according to their season stat lines, that the other has exploited on the season. One problem, however, may be that Big East and SEC stats just don’t have one-to-one correspondence. The conference stats are surprisingly different.
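The correlation-weighted average of the two perspectives can be sketched like this. The function name `blend_margins` and the Butler numbers are made up for illustration; only the 0.74 correlation comes from the Arkansas model described above. The one detail that matters is flipping the sign of the Butler model's margin so both predictions are expressed from the Arkansas side.

```python
def blend_margins(margin_a, corr_a, margin_b, corr_b):
    """Blend two models' predictions into one margin from team A's side.

    margin_a: model A's predicted margin for team A (positive = A wins)
    margin_b: model B's predicted margin for team B (positive = B wins)
    corr_a, corr_b: each model's training correlation, used as weights
    """
    # Model B's margin is from B's side, so negate it before averaging.
    return (corr_a * margin_a + corr_b * (-margin_b)) / (corr_a + corr_b)

# Hypothetical inputs: both models like their own team.
blended = blend_margins(margin_a=4.0, corr_a=0.74, margin_b=5.0, corr_b=0.55)
print(round(blended, 2))
```

When both models favor their own team, the flipped signs mostly cancel, which is how two "win" predictions end up as a tossup.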
Vegas lines have a standard error of about 10 points against the spread in college basketball. This method is just something that I pieced together the last few weeks. My guess is that you have to use some serious elbow grease to do better than Vegas. It may be that random elements in games like officiating and hot and cold performances just won’t allow a significantly lower error than 10 points. That’s the noise limit. Most NCAAT matchups in which the seeds are fairly close are within 10 points on average, which would explain why predicting the tournament is so difficult. I don’t think any computer models are going to give you a lot of definitive answers with high confidence. Anybody claiming so is probably trying to generate clicks or fees.
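To put the 10-point noise in perspective, here is a small sketch that turns a predicted margin into a win probability. It assumes the prediction errors are roughly normal with a standard deviation of 10 points, which is my simplification of the figure above, not something any book or model publishes; `win_probability` is a made-up helper name.

```python
import math

def win_probability(predicted_margin, error_sd=10.0):
    """P(actual margin > 0) if actual = predicted + Normal(0, error_sd) noise."""
    z = predicted_margin / error_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for m in (1, 3, 5, 10):
    print(f"favored by {m:>2}: {win_probability(m):.2f}")
```

Under that assumption a 3-point favorite wins only about 62% of the time, so a bracket full of close seeds is going to produce plenty of misses no matter how good the model is.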
I give Arkansas the slight edge based on the accuracy of the Arkansas model, but let’s just say I’m not putting any skin in the game based on my model’s results. I don’t take the results too seriously, but I do believe that the model underestimating Arkansas lately is a good sign that the computer models in general probably underestimate Arkansas. The errors for my Butler model lately are predicted wins that turned out to be losses.
I use an analytic predictor from Ed Feng, who typically gets a great read; it can be found on a site called powerrank.com. I find the sites are trending towards Butler, as did Vegas, and I get that we don’t match up well.
It was there before your Nostradamus commentary and has been there since Wednesday. I did not feel the need to share; I just put it up today, post game, to point out that we met our match and the outcome was expected by analytics. Do what you will with that info. Analytics did not get the AZ drilling. Certainly not perfect, but I was shocked how many times we were predicted to lose. Most computer models had us losing more than winning.