THE SECRET TO ANY WINNING BRACKET, AS WE ALL KNOW, IS 99% DATA ANALYSIS, AND 1% LADY LUCK!
Okay, we might have that slightly twisted (maybe 2% Lady Luck)…but while we do not control when the curvy lady may sing, what we can do (with great success) is use data to better understand the so-called unearthly phenomena that mysteriously result in the emergence of bracket busters…ones that only manifest themselves in the third month on the third rock from the sun and just in time for The Dance.
So what does that look like?
Well, while we cannot share with you the “nitty gritties” (technical term) of all the “smart stuff” we do, we did want to give you a sneak peek into some of the analysis that led us to create the “Tresata Tourney Tips.”
We first collected, then curated, all NCAA game data from 2002 to 2015. That means we basically had data from every NCAA Division I basketball game from the past 14 seasons: over 70,000 regular-season and tourney games during that time span. Yes, some call it ‘big’ data; for us…well, data is data.
THE SET UP
Once locked and loaded into OPTIMUS (yes, the same uber-sexy, ultra-powerful analytics system Tresata built), we were able to deliver a capability quite unique – complete seasons across conferences and time could be analyzed, computed and viewed all at once.
We then worked with Dr. Chartier to come up with ways to uncover predictive patterns using this completely new relationship inference engine.
With enough data and a way to visualize it all simultaneously, OPTIMUS helped in identifying the common attributes of upset teams that have emerged since 2002.
THE FOUR POINTER
We found that factors like seed, conference, number of tournament wins, bid (or ‘berth’) type, tempo, offensive efficiency, defensive efficiency, and score differential are key to identifying an upset team. While not revolutionary, the ability to pivot against (see the interplay of) all those factors at the SAME time is where OPTIMUS transcends to the magical beast it really is.
OPTIMUS-powered analysis started revealing the following clues to picking upset teams:
- Teams in smaller conferences that have important out-of-conference wins, or close losses
- Teams that had a win and a loss to the same opponent within one season
- Teams that struggled against less competitive opponents, but won against higher ranked teams
- Low-seeded teams with strengths that higher-seeded teams lacked
With these ideas now confirmed, we started constructing more complex algorithms to predict which types of teams are likely to produce bracket upsets. Here’s how OPTIMUS does it:
- “OPTIMUS, Show me all the upset teams since 2002.”
Out of 4,751 teams since 2002, this is only a handful. Here, color indicates the number of tournament wins with green being the most wins and red being only 1 win. We see immediately that there were only 2 teams that won 4 games in the March Madness Tournament: VCU in 2011 and George Mason in 2006. Similarly, there were only 4 teams that won 3 games in the tournament: Kent in 2002, Dayton in 2014, Missouri in 2002, and Davidson in 2008. How can we further interrogate these “Cinderella” teams to, of course, prevent complete bracket destruction? Let’s look at the tournament games these “true upset teams” won.
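To make that first query a little more concrete, here is a minimal Python sketch of the kind of filter it implies. The record layout, the placeholder team names, and the cutoffs (seed 10 or worse, at least one tournament win) are all assumptions for illustration; this is not how OPTIMUS defines or finds an upset team.

```python
from typing import NamedTuple

class TeamSeason(NamedTuple):
    team: str
    season: int
    seed: int           # tournament seed, 1 (best) to 16
    tourney_wins: int   # games won in that year's tournament

def upset_teams(rows, min_seed=10, min_wins=1):
    """Return low-seeded teams with at least `min_wins` tournament wins,
    ordered from most wins to fewest."""
    hits = [r for r in rows if r.seed >= min_seed and r.tourney_wins >= min_wins]
    return sorted(hits, key=lambda r: r.tourney_wins, reverse=True)

# Made-up rows for illustration only, not real results.
SAMPLE = [
    TeamSeason("Team A", 2010, 11, 4),
    TeamSeason("Team B", 2010, 2, 3),
    TeamSeason("Team C", 2012, 10, 3),
    TeamSeason("Team D", 2012, 14, 0),
    TeamSeason("Team E", 2013, 12, 1),
]

if __name__ == "__main__":
    for r in upset_teams(SAMPLE):
        print(r.team, r.season, f"seed {r.seed}", f"{r.tourney_wins} wins")
```

Raising `min_wins` to 3 or 4 narrows the list to the deep-run teams discussed above.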
- “OPTIMUS, Show me the games true upset teams won.”
Immediately we see the tournament networks of the upset teams in question: Dayton, George Mason, Davidson, VCU, Kent, and Missouri. Since we are now interested in specific games, ORION edges include “score-differential” labels, showing us whether the games were massive blowouts or last-second nail-biters. By looking at the individual tournament matchups, we can determine the type of teams that our “Cinderella” teams were capable of defeating. Did their opponents struggle defensively? Was the tempo of the game too fast or too slow? Could the amount of experience or height of the players determine their fates?
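For readers who think in data structures, a game network like this can be sketched as a directed graph whose edges carry a score-differential label. The Python below is only an illustration under our own assumptions: the class and method names are hypothetical, the games are made up, and the winner-to-loser edge direction is chosen so that a team's losses show up as incoming edges.

```python
from collections import defaultdict

class GameGraph:
    """Directed graph of games: each edge points from winner to loser
    and is labeled with the score differential."""

    def __init__(self):
        self._out = defaultdict(list)  # team -> [(beaten opponent, margin)]
        self._in = defaultdict(list)   # team -> [(opponent lost to, margin)]

    def add_game(self, winner, loser, margin):
        self._out[winner].append((loser, margin))
        self._in[loser].append((winner, margin))

    def wins(self, team):
        """Outgoing edges: games this team won."""
        return list(self._out.get(team, []))

    def losses(self, team):
        """Incoming edges: games this team lost."""
        return list(self._in.get(team, []))

def build_sample():
    # Made-up games for illustration only.
    g = GameGraph()
    g.add_game("Team A", "Team B", 12)  # comfortable win
    g.add_game("Team C", "Team A", 3)   # close loss for Team A
    g.add_game("Team A", "Team D", 1)   # nail-biter
    return g

if __name__ == "__main__":
    g = build_sample()
    print("wins:", g.wins("Team A"))
    print("losses:", g.losses("Team A"))
```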
Maybe, for teams from historically smaller, less-competitive conferences, such as Dayton and Davidson, it would be more revealing to look at the in-season games they lost.
- “OPTIMUS, Show me all the games Davidson lost in the 2008 season.”
Davidson had 6 in-season losses (indicated by the incoming edges): West Michigan, Charlotte, UCLA, NC State, Duke, and UNC. Also, we notice that none of these games were true blowouts. Davidson had only one loss by more than 10 points, and 3 of the 6 losses were to historically better teams in better conferences (UCLA, Duke, and UNC). Can we really judge a 10 seed without considering close in-season losses like this? Not to mention this was the year of Steph Curry…
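The "how bad were the losses?" question boils down to a small summary over loss margins. Here is a hedged Python sketch: the function name, the 10-point blowout cutoff as a parameter, and the sample margins are our own assumptions, and the numbers are invented rather than Davidson's actual scores.

```python
def loss_profile(losses, blowout_margin=10):
    """Summarize a team's losses.

    losses: list of (opponent, points_lost_by) pairs.
    A loss counts as a blowout when the margin exceeds `blowout_margin`.
    """
    margins = [margin for _, margin in losses]
    return {
        "losses": len(margins),
        "blowouts": sum(1 for m in margins if m > blowout_margin),
        "avg_margin": sum(margins) / len(margins) if margins else 0.0,
    }

# Made-up margins for illustration only.
SAMPLE_LOSSES = [("Opp 1", 4), ("Opp 2", 12), ("Opp 3", 1), ("Opp 4", 6)]

if __name__ == "__main__":
    print(loss_profile(SAMPLE_LOSSES))
```

A team with many losses but few blowouts is exactly the profile the Davidson example points to.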
Taking this further, let’s consider a new type of Cinderella team: teams that both won against and lost to the same opponent.
- “OPTIMUS, Show me Nevada’s 2004 in-season games.”
Nevada lost to 8 teams: Connecticut, Hawaii, Boise St, Rice, Portland, UTEP, SMU, and Pacific. However, they re-played 6 of these teams… and won. This type of “Cinderella” team has the potential to win; they are capable of doing so. Is Nevada deserving of the 10-seed based on losses they have proven aren’t reflective of their ability?
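Spotting these "lost, then won the rematch" teams is a simple pass over a season's games in date order. The Python sketch below shows one way to do it; the function name and the input format are assumptions, and the sample schedule is made up rather than Nevada's.

```python
def avenged_losses(games):
    """Find opponents a team lost to and later beat in the same season.

    games: list of (opponent, won) pairs in chronological order.
    Returns (avenged, lost_to) as two sorted lists of opponent names.
    """
    lost_to = set()
    avenged = set()
    for opponent, won in games:
        if not won:
            lost_to.add(opponent)
        elif opponent in lost_to:
            # A win over a team that beat us earlier in the season.
            avenged.add(opponent)
    return sorted(avenged), sorted(lost_to)

# Made-up schedule for illustration only.
SAMPLE_GAMES = [
    ("Opp 1", False),  # loss...
    ("Opp 2", True),
    ("Opp 1", True),   # ...avenged in the rematch
    ("Opp 3", False),
    ("Opp 2", False),  # won first, lost later: not avenged
]

if __name__ == "__main__":
    avenged, lost_to = avenged_losses(SAMPLE_GAMES)
    print(f"avenged {len(avenged)} of {len(lost_to)} losses:", avenged)
```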
This brings up Nevada’s sister “Cinderella” team: the team that thrives against better teams but struggles against weaker ones.
- “OPTIMUS, Show me VCU’s 2011 tournament games.”
Now, the node color represents tournament seed, with green being the largest and red being the smallest. In the 2011 tournament, VCU (11) beat Kansas (1), Purdue (3), and Georgetown (6) by at least 10 points each. However, in the Sweet Sixteen they beat Florida State (10) by only 1 point in overtime. Is there something about VCU that explains this?
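One rough way to put numbers on that pattern is to compare a team's average winning margin against much better seeds with its margin against similarly seeded teams. The Python sketch below does that; the function name, the 3-seed-line gap, and the sample wins are assumptions made for illustration, not VCU's actual results or an OPTIMUS metric.

```python
from statistics import mean

def margins_by_seed_gap(own_seed, wins, min_gap=3):
    """Average winning margin, split by how much better the opponent's seed was.

    wins: list of (opponent_seed, margin) pairs for games the team won.
    An opponent counts as "much better" when its seed is at least
    `min_gap` lines better (numerically lower) than `own_seed`.
    """
    much_better, similar = [], []
    for opp_seed, margin in wins:
        bucket = much_better if own_seed - opp_seed >= min_gap else similar
        bucket.append(margin)
    return {
        "vs_much_better_seeds": mean(much_better) if much_better else None,
        "vs_similar_seeds": mean(similar) if similar else None,
    }

# Made-up wins for a hypothetical 12 seed, for illustration only.
SAMPLE_WINS = [(5, 9), (4, 14), (11, 2)]

if __name__ == "__main__":
    print(margins_by_seed_gap(12, SAMPLE_WINS))
```

A team whose margin against much better seeds exceeds its margin against similar seeds fits the "thrives against better teams" profile.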
- “OPTIMUS, Show me how to survive March Madness…”
The at-scale, real-time analysis made possible by OPTIMUS makes it clear that “Cinderella” teams are so much more than just a seed, in a good or bad conference with one star player.
Collecting, curating, and analyzing every piece of data that defines a team is the only way to beat all brackets and maybe, just maybe, win Warren Buffett’s challenge!
AND…THE ONLY WAY OF KNOWING IF THE GLASS SLIPPER REALLY FITS…