For a couple of weeks now, you have all put up with my ramblings about things like points per possession and listened to me throw around weird-looking abbreviations like eFG% and TS%. The reaction to the statistical analysis and the overwhelming number of charts has been positive, and I truly appreciate the warm welcome that I have received.
In addition to these fancy statistical terms, you have also seen me refer to what I call "my ratings." I call them "my ratings" because I don't have a fancy name for them. But that's neither here nor there. More to the point, it occurred to me that I have yet to really explain what my ratings consist of. So that's what this post is going to do.
And I'll warn you right now: this may get a little more technical than you care for. I recognize this is not for everybody, so unless you really care about the math behind my ratings, feel free to hit the back arrow.
My interest in statistical analysis started with baseball. I purchased MLB.TV for the first time in 2008, and when I did that I decided I wanted to start reading more about baseball, too. Eventually, I stumbled upon sabermetric-oriented blogs like FanGraphs, Beyond the Box Score, U.S.S. Mariner, and Lookout Landing, and my interest was piqued. (Yes, I'm a Seattle Mariners fan. No, I do not want to talk about it.) Once I was well-versed in advanced baseball metrics, I immediately became interested in advanced metrics in other sports, namely college football and basketball. Eventually, I got tired of just reading and wanted to calculate these numbers myself, which led me to start blogging and, later, to throw charts and stuff up on Twitter.
Now, I'm telling you about the baseball background because a large part of how I format my ratings is borrowed from baseball. If you look up player stats on FanGraphs, you will see that some stats have a "+" symbol next to their names. The "+" symbol means the stat is adjusted (for a variety of things, like ballparks) and put on a scale where 100 is the league average, with anything above 100 being above average and anything below 100 being below average. You may recognize this from the Pre-Game Franalysis posts that I put up. For example, here's a chart from before the Michigan game:
I always give the quick primer on how to read this chart, so I won't go into much detail here. But, basically, this chart says that Iowa's offense was 7% above the national average in effective field goal percentage (eFG%), 13% above average in turnover rate, 25% above average in offensive rebounding rate, and 25% above average in free throw rate before the Michigan game. I like scaling the numbers like this for previews because if I just told you Iowa had a 52.1% eFG% on the year, you might wonder if that was good or bad. I mean, what's the context? With the ratings you see above, you know right away if that number is good or bad because the comparison to the Division I average is built into it.
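For readers who want to see where numbers like these come from, here's a minimal Python sketch of the four factors for an offense, using the standard Dean Oliver-style definitions. The post doesn't spell out exactly which variants the ratings use (some sources, for instance, use free throws made rather than attempted in the free throw rate), so treat these definitions as assumptions. The function name and the box score totals in the usage lines are hypothetical.

```python
def four_factors(fgm: int, fga: int, fg3m: int, fta: int,
                 orb: int, opp_drb: int, tov: int, poss: float) -> dict:
    """Offensive four factors from team totals.

    Assumed definitions (not confirmed by the post):
      eFG%   = (FGM + 0.5 * 3PM) / FGA   -> made threes get 1.5x credit
      TO%    = TOV / possessions         -> lower is better
      OR%    = ORB / (ORB + opponent DRB)
      FTRate = FTA / FGA
    """
    return {
        "eFG%": (fgm + 0.5 * fg3m) / fga,
        "TO%": tov / poss,
        "OR%": orb / (orb + opp_drb),
        "FTRate": fta / fga,
    }


# Hypothetical single-game totals, for illustration only.
factors = four_factors(fgm=25, fga=55, fg3m=7, fta=20,
                       orb=10, opp_drb=24, tov=12, poss=66.5)
for name, value in factors.items():
    print(f"{name}: {value:.3f}")
```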
Furthermore, the chart title says that my ratings ranked Iowa's offense at #9 in the country (before the Michigan game), while the Wolverine defense was ranked #44 in the country. How did I come up with that? Well, let's talk about the overall team ratings.
The four factors are important, but I like to think of them as more descriptive than anything. They tell you just how a team is doing what they're doing. If a team is terrible at offense or defense, you can look at the four factors and explain why that unit is playing so badly. However, if I want to figure out just how good or bad a team's offense or defense is, the best way to do that is by looking at their offensive efficiency (OE) and their defensive efficiency (DE). A team's OE and DE are just the number of points they are scoring or giving up per 100 possessions; essentially, points per possession (PPP) multiplied by 100. Anyway, my ratings are based on the OE and DE of each team.
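As a rough illustration of OE and DE, here's a Python sketch. The post doesn't say how possessions are counted, so the estimate below is an assumption: the common box-score formula FGA - ORB + TOV + 0.475 * FTA, where 0.475 approximates the share of free throw attempts that end a possession (0.44 is another weight you'll see). The function names and the game totals in the usage lines are hypothetical.

```python
def estimate_possessions(fga: float, orb: float, tov: float, fta: float,
                         ft_weight: float = 0.475) -> float:
    """Box-score possession estimate (assumed formula, not from the post).

    A possession ends with a shot the offense doesn't rebound, a turnover,
    or (approximately) a trip to the free throw line.
    """
    return fga - orb + tov + ft_weight * fta


def efficiency(points: float, possessions: float) -> float:
    """Points per 100 possessions: OE for points scored, DE for points allowed."""
    return 100.0 * points / possessions


# Hypothetical game: 55 FGA, 10 offensive rebounds, 12 turnovers, 20 FTA, 80 points.
poss = estimate_possessions(fga=55, orb=10, tov=12, fta=20)
oe = efficiency(80, poss)
print(f"{poss:.1f} possessions, OE = {oe:.2f} ({oe / 100:.2f} PPP)")
```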
So, let's start by taking a look at offense:
Here's a screenshot of what I see when I open my spreadsheet to the "Offense" tab. You will see that everything is placed on the scale where 100 is the national average, and you will also see the Hawkeyes ranked #7 here, at 15% above average on offense. The two far-right columns both list OE: the first is each team's OE not adjusted for strength of schedule, while the far-right column (the one with the "+" symbol) is each team's OE adjusted for strength of schedule (OE+). So, with no adjustment, Iowa's OE is 12% above average; adjusting for their opponents this year raises it to 15% above average, which works out to an OE+ of 119.54, or 1.20 PPP.
Now, here's my "Defense" tab:
Using the Hawkeyes as an example again, my ratings have their defense ranked #14 overall, at 11% above average without the opponent adjustment added in and 12% above average after making the adjustment. That comes out to a 91.88 adjusted defensive efficiency (DE+) or 0.92 PPP allowed.
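To make the 100-is-average scale concrete, here's a minimal Python sketch that turns adjusted efficiencies into index numbers. Two assumptions: the national average efficiency is 104.3 (the figure quoted later in this post), and the defensive index flips the percentage difference (2 - DE/average), since allowing fewer points is better. The post doesn't say which inversion the ratings use; this one happens to reproduce Iowa's 115 and 112, whereas average/DE would give about 113.5 on defense. The function names are hypothetical.

```python
NATIONAL_AVG_EFF = 104.3  # D-I average points per 100 possessions (from the post)


def offense_index(oe: float, avg: float = NATIONAL_AVG_EFF) -> float:
    """Offensive index: 100 = average; higher efficiency is better."""
    return 100.0 * oe / avg


def defense_index(de: float, avg: float = NATIONAL_AVG_EFF) -> float:
    """Defensive index: 100 = average; allowing fewer points scores higher.

    Assumption: the percentage gap below average is added to 100, so a
    defense allowing 12% fewer points than average rates about 112.
    """
    return 100.0 * (2.0 - de / avg)


print(round(offense_index(119.54)))  # Iowa OE+ 119.54 -> 115
print(round(defense_index(91.88)))   # Iowa DE+ 91.88  -> 112
```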
Knowing the OE+ and DE+ is nice because it allows me to put together interesting (to me, anyway) charts like this:
Here are all of the Big Ten teams plotted by this year's OE+ and DE+ ratings as of Tuesday, February 11th. Notice how Northwestern is the only team in the conference that has a below average unit this year (their offense), but they have helped make up for that with a very good defense. The Big Ten is tough this year, man.
Now, let's put this all together:
Again, the numbers above that have the "+" symbol next to them are adjusted for strength of schedule, and the overall strength of a team is measured by their "Total+" rating. Iowa's Total+ rating is 114, or 14% above the national average, which is just the average of Iowa's adjusted OE (115) and DE (112). That's good for #4 in the nation.
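The Total+ step is simple enough to show in a few lines of Python. This sketch averages the two index values, as described above, and rounds half up; averaging the rounded 115 and 112 gives 113.5, which rounds to the 114 reported for Iowa. Whether the ratings average the rounded or the unrounded values isn't stated, so that detail (and the hypothetical function names) are assumptions.

```python
import math


def total_plus(oe_index: float, de_index: float) -> float:
    """Overall rating: plain average of the adjusted offensive and defensive indexes."""
    return (oe_index + de_index) / 2.0


def round_half_up(x: float) -> int:
    # Python's built-in round() uses banker's rounding (round half to even);
    # this helper always rounds .5 up for non-negative ratings.
    return math.floor(x + 0.5)


print(round_half_up(total_plus(115, 112)))  # Iowa -> 114
```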
Okay, so now you know how I come up with the rankings you see in the Pre-Game Franalysis posts. Let's wrap this up by going over how I come up with win probabilities and projected scores.
Going back to baseball, I use the log 5 method for calculating win probabilities for individual games. To explain it as simply as I can: I first calculate each team's Pythagorean Expected Win Percentage, which tells me a team's expected win percentage based on their OE+ and DE+. For an Iowa example, the Hawkeyes' expected win percentage is 0.954 based on their 119.54 OE+ and their 91.88 DE+. That means Iowa's estimated true talent level is closer to that of a 0.954 team than to the 0.750 win percentage they actually have this year. Again, the Big Ten is cannibalistic this year.
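Here's a minimal Python sketch of a Pythagorean expectation computed on efficiencies. The exponent is the big assumption: the post doesn't state one, but 11.5 (a value used in some college basketball ratings) reproduces the 0.954 quoted for Iowa from its 119.54 OE+ and 91.88 DE+. The function name is hypothetical.

```python
def pythagorean_win_pct(oe_plus: float, de_plus: float,
                        exponent: float = 11.5) -> float:
    """Expected win percentage against an average D-I team.

    exponent=11.5 is an assumption chosen because it matches the post's
    Iowa figure; larger exponents spread teams further from .500.
    """
    scored = oe_plus ** exponent
    allowed = de_plus ** exponent
    return scored / (scored + allowed)


print(f"{pythagorean_win_pct(119.54, 91.88):.3f}")  # Iowa -> 0.954
```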
Once I've calculated the expected win percentage for all Division I teams, I can use the log 5 method to estimate how often a team like Iowa would be projected to beat any Division I opponent, based on both teams' expected win percentages. For example, let's look at Iowa's remaining regular season schedule:
Looking at Iowa's next game against Penn State, we see that, even on the road, Iowa has a win probability of 87.58%. That is because of the large difference between the two teams' expected win percentages: 0.954 for Iowa vs. 0.694 for Penn State. Since the game is at Penn State, the numbers are slightly adjusted to favor the home team, which means that if this game were at Carver-Hawkeye Arena, Iowa would have a win probability north of 90%.
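The log 5 formula itself is short. This Python sketch computes the neutral-floor probability only; the post applies a home-court adjustment on top of it but doesn't describe that adjustment, so none is modeled here. With Iowa at 0.954 and Penn State at 0.694, the neutral-floor figure comes out around 90.1%, which sits between the 87.58% road figure and the "north of 90%" home estimate above.

```python
def log5(p_a: float, p_b: float) -> float:
    """Bill James' log5: chance team A beats team B on a neutral floor.

    p_a and p_b are each team's expected win percentage against an
    average opponent (e.g., from a Pythagorean expectation).
    """
    return (p_a - p_a * p_b) / (p_a + p_b - 2.0 * p_a * p_b)


print(f"{log5(0.954, 0.694):.1%}")  # Iowa vs. Penn State, neutral floor -> 90.1%
```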
Finally, let's talk about calculating the projected scores for individual games. To do this, I have to project the expected OE+ (xOE+) and expected tempo (xTempo) for each team. (As in advanced baseball stats, an "x" in front of a stat's name stands for "expected.") The xOE+ for each team is calculated by multiplying team A's OE+ (scaled to 100) by team B's DE+ (scaled to 100) and then by the league average efficiency (104.3). So, while Iowa's OE+ is 119.54 on the season, it is expected to have an xOE+ of 111.04 on the road against Penn State's DE+ of 103.79 (1% above average). On the other side of the ball, Iowa's DE+ on the season is 91.88 and the Nittany Lions' OE+ is 111.45 (7% above average), so Penn State is projected to have an xOE+ of 96.45 against the Hawkeyes at home.
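Here's a minimal Python sketch of the xOE+ formula as described: express each unit relative to the 104.3 average, multiply, and scale back up. Because it applies no home-court or other game-specific adjustment, plugging in the season numbers gives about 118.96 for Iowa and 98.18 for Penn State, which differ from the game-specific figures quoted above. The function name is hypothetical.

```python
NATIONAL_AVG_EFF = 104.3  # D-I average points per 100 possessions (from the post)


def expected_oe(off_oe_plus: float, opp_de_plus: float,
                avg: float = NATIONAL_AVG_EFF) -> float:
    """Expected offensive efficiency (xOE+) for one side of a matchup.

    Multiplicative model: (offense / avg) * (opposing defense / avg) * avg.
    No home-court adjustment is applied (an intentional simplification).
    """
    return (off_oe_plus / avg) * (opp_de_plus / avg) * avg


print(f"{expected_oe(119.54, 103.79):.2f}")  # Iowa offense vs. Penn State defense
print(f"{expected_oe(111.45, 91.88):.2f}")   # Penn State offense vs. Iowa defense
```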
Now, we need to know the xTempo in order to get a projected score. Iowa's tempo is 6% faster than the national average this season, while Penn State's is 1% above the national average. Multiplying those numbers together with the national average of 67.10 possessions per game this season, the numbers project about 71 possessions between the Hawkeyes and the Nittany Lions. Thus, if we multiply each team's xOE+ by the number of possessions and divide by 100, we get the projected score based on 71 possessions. So, Iowa's 111.04 * 71 possessions / 100 = 79 points. As for Penn State, 96.45 * 71 possessions / 100 = 69 points.
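The tempo and score arithmetic can be sketched the same way. This Python version assumes each team's tempo is expressed as a ratio to the 67.10 national average (1.06 for Iowa's 6% faster pace, 1.01 for Penn State). With those rounded ratios the product comes out near 71.8 possessions; the projected points then follow from each side's xOE+. The function names are hypothetical.

```python
NATIONAL_AVG_TEMPO = 67.10  # D-I average possessions per game (from the post)


def expected_tempo(tempo_ratio_a: float, tempo_ratio_b: float,
                   avg: float = NATIONAL_AVG_TEMPO) -> float:
    """Expected possessions: each team's pace relative to average, times the average."""
    return tempo_ratio_a * tempo_ratio_b * avg


def projected_points(x_oe: float, possessions: float) -> float:
    """Points = expected points per 100 possessions * possessions / 100."""
    return x_oe * possessions / 100.0


print(f"{expected_tempo(1.06, 1.01):.1f} possessions")  # Iowa 6% fast, Penn State 1% fast
print(round(projected_points(111.04, 71)))              # Iowa at 71 possessions -> 79
```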
Really, I don't calculate win probabilities and projected scores for games any differently than Kenpom does. The main difference is that I calculate OE+ and DE+ differently than he does. And, for the record, I'm not claiming, and I have never claimed, that my numbers are better than his. I always use Kenpom's numbers as a reference point, especially since this is my first year of calculating my own college basketball ratings. I just started calculating these because I'm weird and because I can only read so much about advanced statistics before I want to start diving into them myself. I should probably seek help, honestly.
Anyway, that's more than enough background on how I calculate the things that you see in the game previews. If you made it this far, congratulations! And, I'm sorry. Hopefully, you got something out of this!