Imagination has a great deal to do with winning.
– Mike Krzyzewski
While I realize that it may be sacrilege to quote Coach K while I write this from the city of Charlotte, I have been taught that wisdom has no boundaries. Just like winning…
I also offer to do another sacrilegious thing and edit the quote above – Imagination, and Data, have a great deal to do with winning (for those of you who know me, I am clearly biased).
Our tryst with sports started with one very romantic idea – pick the one sporting event in America where passions run high, play is pure, Cinderella becomes more than a fairytale, and data (yes data) is the secret weapon for many an amateur bracket builder – and maybe, just maybe, we could bring some semblance of order to this thing called March Madness.
And of course, as with every audacious act we have attempted, we wanted to do it with the best there is at the helm.
When Tim Chartier, one of the foremost minds (and hearts, for those lucky enough to be close to the man) in Math our generation has seen, decided to become our Chief Researcher this past summer, that very romantic idea started taking shape as a tangible data analytics endeavor.
While we had successfully applied OPTIMUS, our uber-sexy and powerful analytics platform, to problems ranging from Anti-Money Laundering to Population Health, we were still not sure how it would do when force-fed lots of sports data. Especially because if we screwed up, we risked inviting the wrath of many a basketball aficionado.
So we put together the essentials – a crack team (Sarah, Caitlin, Grant), ‘big’ data (14+ years of play by play data from NCAA March Madness), and the most complex of our machine learning engines (who knew that an at-scale, real-time relationship discovery & parallel inference software was actually perfect for analyzing patterns of college basketball play…NINE QUINTILLION OF THEM).
It helped that Dr. Chartier had a pre-established reputation as one of the most enviable minds to run numbers on March Madness and an incredibly keen mathematical understanding of the game (honed as a numbers coach for the awesome Davidson College basketball team).
What we added to this intuition was a lot more data and, for the very first time, the ability to analyze and view complete seasons, across conferences and time, all at once.
“You don’t play against opponents, you play against the game of basketball.”—Bobby Knight
The most interesting part of the game of basketball, like any sport, is watching an upset in progress. Almost preternaturally, I think, we are predisposed to root for the underdog (unless I am the outlier here).
So we made our first ‘prediction’ the hardest – find patterns that identified and characterized “Cinderella” teams (please read Glass Slippers for a really cool peek into our software).
If you know anything about March Madness (aka “The Dance”), you know that every year there is one low-seeded team that ruins everybody’s bracket in the first round: the Cinderella team. Lehigh (15) vs. Duke (2) in 2012, anyone?
With enough data and a way to analyze it all simultaneously, we had a strong feeling that OPTIMUS would start recognizing any and all common attributes across all upset teams since 2002.
What we found was fascinating: seed, conference, number of tournament wins, bid (or ‘berth’) type, tempo, offensive efficiency, defensive efficiency, and score differential were some of the most important factors that helped predict (and identify) upset teams.
Our research further revealed the following profiles to be typical of upset teams:
- Teams in smaller conferences that have important out-of-conference wins, or close losses
- Teams that had a win and a loss to the same opponent within one season
- Teams that struggled against less competitive opponents, but won against higher-ranked teams
- Low-seeded teams with strengths that higher-seeded teams lack
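To make one of these profiles concrete, here is a minimal sketch of how you might flag teams that both beat and lost to the same opponent within one season. It is not how OPTIMUS works; the `(season, winner, loser)` record layout, the `split_series` helper, and the team names are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical game records: (season, winner, loser). Team names are placeholders.
GAMES = [
    (2014, "Team A", "Team B"),
    (2014, "Team B", "Team A"),
    (2014, "Team A", "Team C"),
    (2014, "Team C", "Team D"),
    (2015, "Team D", "Team C"),
]


def split_series(games):
    """Map (season, team) to the opponents that team both beat and lost to that season."""
    beat = defaultdict(set)     # (season, team) -> opponents defeated
    lost_to = defaultdict(set)  # (season, team) -> opponents lost to
    for season, winner, loser in games:
        beat[(season, winner)].add(loser)
        lost_to[(season, loser)].add(winner)

    result = {}
    for key, opponents in beat.items():
        # Keep only opponents that appear on both sides of the ledger.
        both = opponents & lost_to.get(key, set())
        if both:
            result[key] = both
    return result


splits = split_series(GAMES)
```

Keying on season as well as team keeps a win in one year and a loss in the next from counting as a split.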
Having backed opinion with facts, we then applied our sophisticated inference algorithms to instantaneously create any tournament, team, or ranking permutation we wanted.
We were able to ask OPTIMUS things like, “Show me teams that beat the more competitive teams and struggled against the lower-ranked teams during the tournament” OR, “Show me all close out-of-conference games.” This type of analysis, when applied to all 14 years of data, exposed natural similarities between teams and patterns of play.
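For a feel of what a question like “Show me all close out-of-conference games” boils down to, here is a rough sketch written as a plain filter. Everything in it is an assumption for illustration: the `Game` fields, the 5-point definition of “close,” and the placeholder records. It says nothing about how OPTIMUS answers the question internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    """One game in a hypothetical schema; the field names are assumptions."""
    team: str
    opponent: str
    team_conf: str
    opponent_conf: str
    team_pts: int
    opponent_pts: int


def close_out_of_conference(games, max_margin=5):
    """Return games between different conferences decided by at most max_margin points."""
    return [
        g for g in games
        if g.team_conf != g.opponent_conf
        and abs(g.team_pts - g.opponent_pts) <= max_margin
    ]


# Placeholder records, not real results.
GAMES = [
    Game("Team A", "Team B", "Conf X", "Conf Y", 70, 68),  # out of conference, 2-point game
    Game("Team A", "Team C", "Conf X", "Conf X", 61, 60),  # close, but same conference
    Game("Team D", "Team B", "Conf Z", "Conf Y", 55, 80),  # out of conference, blowout
]

close_games = close_out_of_conference(GAMES)
```

The `max_margin` threshold is the judgment call here; widening or narrowing it changes which games count as close.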
We have since decided to translate these ‘discoveries’ into something we are calling TRESATA TOURNEY TIPS (debuting with this blog). While they will appear in many places (like the NCAA March Madness site – Tresata Tourney Tips), they will always be bared first right here, on our website.
We plan to offer a lot more fun tips as this year’s madness gets underway, including plans to share with you our picks for upsets (by round) and track all this new data for years to come…
As the best romances are never fleeting ….
INAUGURAL TRESATA TOURNEY TIPS (to whet your appetite)
- IT’S ALL ABOUT STAMINA. Keep these stats in mind for teams seeded 10 or higher (that is, a seed number of 10 or above): only 2 such teams have ever won 4 games in the tournament (2.3%) and only 4 have ever won 3 games (4.5%). None of these teams had a seed number above 12.
- DOES THE GLASS SLIPPER FIT? Consider this: 1/3 of Cinderella teams were ranked within the Top 30 offensively. 55% were ranked within the Top 50.
- DO TEAMS OUTSIDE THE TOP 3 SEEDS EVER WIN? In the last 14 years, every champion except one was a 1, 2, or 3 seed. The only exception was Connecticut, which was seeded 7.
- CONFERENCE CALL. Every winner has come from one of the 8 strongest conferences. For teams in weaker conferences (conference RPI > 10), it is more difficult to tell how well they will play against stronger teams. To get a better sense of their strength as a team, look at the out-of-conference games that they play at the beginning of the season. Even if they lose, a close game or a low point total for the other team may indicate that they are a potential Cinderella team.
- WHERE WILL MY UPSETS BE? 76% of upsets are by 10, 11, or 12 seeds (27% by 12 seeds alone).
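If you want to run this kind of tally on your own bracket history, a percentage like the one in the last tip is just a count by seed divided by the total. Here is a minimal sketch, assuming you already have a list holding the seed of the winning team in each upset. The sample list below is made up, so its output will not match the figures above.

```python
from collections import Counter


def upset_share_by_seed(upset_seeds):
    """Return {seed: fraction of all upsets won by that seed}."""
    counts = Counter(upset_seeds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seed: n / total for seed, n in sorted(counts.items())}


# Made-up sample: the seed of the winning team in each of ten hypothetical upsets.
UPSET_SEEDS = [12, 11, 10, 12, 13, 15, 11, 12, 14, 13]

shares = upset_share_by_seed(UPSET_SEEDS)
# Combined share for the 10, 11, and 12 seeds.
share_10_to_12 = sum(shares.get(seed, 0.0) for seed in (10, 11, 12))
```

What counts as an upset is left to whoever builds the list; the function only does the counting.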