#fancystats: Where do we go from here?

July 18, 2014, by


I want to preface this article with the fact that I am not a mathematician or statistician. I’m a lawyer. In fact, they lied to us in law school and told us we wouldn’t have to do math once we were out practicing. So even if you love my ideas, I have no real skill set to design or implement them. This is purely for conceptual discussion purposes.

Ok, with that out of the way, I wanted to talk about #fancystats for a minute. It’s becoming clear that organizations around the league are starting to recognize the usefulness and momentum that these types of statistics have, evidenced by more and more front offices disclosing their emphasis on integrating them into their management processes.

However, I think we can all agree that the concepts and statistical methodologies are rudimentary at best at this point. It’s also completely understandable. Baseball has led the way in the revolution of statistical analysis, but it has a massive advantage over all other sports: each play happens in a vacuum, and at most there are 2-4 players involved in any given play. This level of isolation makes it incredibly convenient to look at individual performance within that play and assign value to it. The causal relationship between the players on the field is limited, and unlike in hockey, plays that happened minutes earlier have very little bearing on what you are measuring.

Facing this difficulty, hockey #fancystats have focused on logical and relevant concepts, such as possession. Focusing on possession makes perfect sense on its face: if you have the puck and are shooting at the opposing goal, the other team cannot, by definition, be doing the same. If they aren’t possessing and shooting, they certainly are not scoring, which is a good thing.

An entire paradigm of statistical analysis has cropped up around this concept: Corsi, Fenwick, QualComp, Zone Starts, etc. Put together, these stats begin to paint a picture of what is going on when an individual player is on the ice. They are complicated calculations and relationships built around fairly simple concepts that make logical sense.
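
For anyone newer to these terms, here is a rough sketch of how the two headline numbers fall out of shot-attempt counts. Corsi counts every shot attempt (on goal, missed, and blocked), while Fenwick leaves out the blocked ones. The class name, the function name, and the on-ice totals below are all made up for illustration; none of it comes from a real tracking site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotAttempts:
    """Shot attempts by one side while a given player is on the ice."""
    on_goal: int  # shots on goal, goals included
    missed: int   # attempts that missed the net
    blocked: int  # attempts blocked before reaching the net

    @property
    def corsi(self) -> int:
        # Corsi counts every shot attempt.
        return self.on_goal + self.missed + self.blocked

    @property
    def fenwick(self) -> int:
        # Fenwick is Corsi without the blocked attempts.
        return self.on_goal + self.missed


def share_pct(count_for: int, count_against: int) -> float:
    """Percentage of all attempts that belonged to the player's team."""
    total = count_for + count_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * count_for / total


# Hypothetical on-ice totals for one player (illustrative numbers only).
attempts_for = ShotAttempts(on_goal=30, missed=12, blocked=8)
attempts_against = ShotAttempts(on_goal=25, missed=10, blocked=15)

corsi_for_pct = share_pct(attempts_for.corsi, attempts_against.corsi)
fenwick_for_pct = share_pct(attempts_for.fenwick, attempts_against.fenwick)

print(f"Corsi For%:   {corsi_for_pct:.1f}")
print(f"Fenwick For%: {fenwick_for_pct:.1f}")
```

With these invented totals the player breaks even by Corsi but comes out ahead by Fenwick, simply because more of the opposition's attempts were blocked. That is the kind of context the two numbers are meant to separate.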

So, in contemplating where #fancystats might go from here, I started thinking about inverse relationships and context (welcome to my brain). Certain stats will fall out of relevance (PDO is already under attack), some will gain context and refinement, and others will emerge as useful tools. Conceptually speaking, this raises certain questions about weighted value and the value of efficiency.

What I mean by that is, sure, when you possess the puck, it can only be a good thing. It’s a good skill to have. Now, is it time to look deeper into the context of that possession? How do you measure counterattacking skills, and how much value would you place on them? Neutral zone turnovers leading to offensive chances? For defensemen, the efficiency of the first pass in exiting the zone? The rate at which a forechecker either creates a turnover or forces an extended shift for the defensive unit, even if his team isn’t possessing the puck? For that matter, true possession not tied to shot count?

In addition to the difficulty of quantifying these concepts, how do you assign value to them? Can the central possession calculation itself provide more context than just shots for and against? I believe that the answers to these questions are the cocoon that hockey statistical analysis must climb out of in order to really be able to quantify this game in a truly meaningful way. There are plenty of guys out there more mathematically adept than I who I am confident can answer the bell.

Now, don’t even get me started on goaltender analysis. I actually read an article the other day advocating for the demise of GAA. I was down with this premise, so I read on. Then I realized he was advocating for it because it doesn’t tell us anything that save percentage doesn’t. He described GAA as save percentage, plus noise. Not to rag on the author here; he is at least attempting a move in the right direction. But save percentage is itself just the most basic of mathematical calculations, plus noise.
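
To show what "save percentage, plus noise" means in arithmetic terms, here is a minimal sketch. It assumes GAA and save percentage are computed over the same shots and minutes, in which case GAA is just shots against per 60 minutes times the share of shots that go in. The function names are my own, and the shot rates are invented; the .930 / 2.80 pairing is the one raised in the comments below.

```python
def goals_against_average(shots_against_per_60: float, save_pct: float) -> float:
    """GAA implied by a shot rate and a save percentage (0.920 means 92.0%)."""
    if not 0.0 <= save_pct <= 1.0:
        raise ValueError("save_pct must be between 0 and 1")
    return shots_against_per_60 * (1.0 - save_pct)


def implied_shots_per_60(gaa: float, save_pct: float) -> float:
    """Shot rate a goalie must be facing to post this GAA at this save percentage."""
    if not 0.0 <= save_pct < 1.0:
        raise ValueError("save_pct must be at least 0 and below 1")
    return gaa / (1.0 - save_pct)


# Two hypothetical goalies with the same .920 save percentage
# behind defenses that allow different shot volumes.
sheltered = goals_against_average(shots_against_per_60=25.0, save_pct=0.920)
shelled = goals_against_average(shots_against_per_60=35.0, save_pct=0.920)
print(f"sheltered: {sheltered:.2f} GAA, shelled: {shelled:.2f} GAA")

# The .930 save percentage with a 2.80 GAA mentioned in the comments.
print(f"implied shots against per 60: {implied_shots_per_60(2.80, 0.930):.1f}")
```

Identical goaltending produces different GAAs here purely because of shot volume, which is the "noise" in question. Whether a goalie influences that volume himself is a separate argument.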

We have talked around here about quality shot save percentage and other various ways forward for goalie analysis, but I’m coming to the belief that until we are able to measure positional efficacy and use it as the foundation of a quality-shot type analysis, we will never really have a meaningful statistical look at the position.

So, that’s what I think. How about you? You hear that, sports mathematicians? Go make this happen, and you will have a much more educated and well-analyzed game. Plus it will piss off idiot beat writers who don’t want to have to use their brains. Sounds like a plan to me.


  1. SalMerc says:

    As an avid Fantasy baseball enthusiast, I can agree that sabermetrics are many steps ahead of fancystats. For those of you who do not know what they are, here is the definition: Sabermetrics is the term for the empirical analysis of baseball, especially baseball statistics that measure in-game activity.

    That being said, the analysis of baseball is easier, as the individuals’ tasks can be “measured” more readily than in hockey. My suspicion is that fancy stats should morph into more of a group stat than an individual stat. A certain “pair of defensemen” has a defensive quotient lower than the average. The right wings on a team may average more PPG than the left wings. Why? That is where I expect detailed investigation and new stats to emerge. A team with primarily lefty centers should provide RWs with better passes. Does the stat prove it? Can we measure the effectiveness of centers on a team when grouped with A, B or C, with A being grinders, B being speed guys and C being sharpshooters?

    I can see these types of stats emerging, allowing GMs to assess the make-up of their team and to pinpoint the “type” of player (center or RW) they need to push the PPG upwards.

    Just some ideas…

  2. Dave says:

    QoC is under attack as well. QoT has been proven to have the larger effect on performance. QoC more or less gives us a look at deployment.

    I’m at work, so I can’t access my bookmarks at the moment, but there are some other stats that are being developed:

    dCorsi: Expected vs actual Corsi, based on QoT, QoC, and ZS

    Adjusted SV% (I think that’s what it’s called): A more advanced SV% stat that looks at expected vs actual based on PP time, PK time, and team quality.

    Zone Adjusted Corsi/Fenwick: Adjusted for zone starts.

    dCorsi has some kinks to work out, but I think that will eventually be our best method of evaluating how a player is performing against expectations based on deployment and quality of minutes.

    There’s a long way to go here, but that’s because we don’t have access to a lot of information. Baseball makes every single play available online. Hockey only makes shot attempts, teammates/opponents on ice, and a few others available. The ones creating these stats are using what they have available to the best of their abilities.

    The true “surge” will come when more information is made available.

    • Seahorse says:

      Keeping the necessary stats is the real revolution. Making the algorithms is the easy part. Soccer can track every player’s distance, location and pass percentage. And tennis knows if the ball is in or out. The technology is there. It’s about want, and the market needs to be there to provide a service.

  3. Rob says:

    Is there an encyclopedia of the definition of all the terms for a beginner?

  4. Puck Luck @Centerman21 says:

    I read that article on the way GAA should not be used as a stat for goaltenders. I won’t get into where it came from, but it’s all drivel. As it stands, we don’t have much statistical analysis of a goaltender’s production. A goalie can have a good .930 SV% yet a 2.80 GAA, which isn’t very good. This shows the team allows a lot of shots against.
    How about that regressed PDO? Not sure I agree with the name, but it’s a tool nonetheless. I think it could be called rel PDO or PDO rel QoT. I like what it shows.

    • Dave says:

      GAA is largely team based, much like ERA in baseball. It’s useful, but not a telling stat.

      Haven’t done much research on regressed PDO yet. Will likely get into researching more available stats when the season gets closer.

  5. MikeD says:

    Until accurate telematics are installed at virtually all arenas, #fancystats will be very limited in hockey. http://regressing.deadspin.com/mlb-announces-revolutionary-new-fielding-tracking-syste-1534200504
    This type of real time accurate tracking is what will be necessary in order for #fancystats to be meaningful. Hockey simply moves too fast and has too many simultaneously moving parts for anyone to track the sport in a meaningful way without computers and sensors.

  6. Ray says:

    I am quite familiar with sabermetrics and have been doing this stuff for many, many years. And I am actually a mathematician. I have a question and also an expression of a clear need. The need is simple. The stats need to be put together. Poor Corsi but high QoC and lots of DZ starts versus good Corsi with poor QoC and lots of OZ starts. One has to be able to compare these players.

    The question is whether or not Corsi and Fenwick actually work. I want to compare 5 vs 5 Corsi to +/-. Most of the defects of these two metrics are the same – zone starts and how good the other players on the ice are. There are two differences that I can see. Corsi uses more data and therefore is less plagued by small sample size. Put simply, Corsi does a better job of measuring what it is trying to measure than +/- does. But +/- also has an advantage: it is actually measuring what we really care about, not something that seems important.

    In baseball, ERA is more valued than W-L record. Why? There have been actual studies. If you want to predict a starting pitcher’s W-L record in 2015, ERA in 2014 does a better job than W-L in 2014.

    So, what does a better job of predicting +/- next season – this season’s Corsi or this season’s +/-?
    Of course, you have to correct +/- for icetime, but that is a simple matter. As far as I can see, the stat you want is the one that best predicts future +/-.

    • Dave says:

      Relating to your need, that’s what dCorsi is trying to do. Put all of that together into expected Corsi vs actual.

      In regards to whether or not they work: They do. There are studies done that show puck possession directly leads to playoff teams (I think only one team under 50% Corsi made the playoffs in the BTN era).

      In 5v5 close situations, you usually see teams in the top-10 make runs in the playoffs. It’s why the Rangers were such a dark horse this year, and why I picked them to make it to the SCF.

      • Ray says:

        Of course, +/- correlates even better with making the playoffs, I would imagine. I can’t imagine a stat which predicts how much a team should score can beat one that counts goals.

        Here, however, is a very good test. Which stat correlates best with playoff success – in-season +/- or in-season Corsi? I.e., does the number of goals we think a team should have scored at the earlier time predict playoff success better than the number they actually scored?

        This is a tough test for Corsi though as basing things on an entire team cuts down the sample size error and disproportionately benefits +/-. If Corsi wins this test over several seasons, it is a clear winner.

        Thanks for the note on dCorsi.

        • Dave says:

          +/- is a relatively useless stat. I don’t even bother looking at it.

          • Ray says:

            A defenseman pinches four times, twice it leads to a shot on goal, once nothing happens, and the fourth results in an odd man rush the other way. Corsi says he’s a good player, +/- says he’s a bad player because one odd man rush is more important than two average shots.

            I think you are overreacting. Some people wanted to believe that +/- was the holy grail and it is not. However, there is a difference between being imperfect and being useless.

            Baseball stats are more advanced and what holds there is that wins follow from runs. It seems likely that hockey wins come from goals. And that is precisely what +/- measures.

            The statement that Corsi is better than +/- because Corsi measures puck possession and +/- is useless is not scientific. I’m not saying that Corsi isn’t better than +/-, just saying that it needs to be demonstrated.

            • SalMerc says:

              Getting our heads out of the math books for a second, I ask you what measurement can tell a coach/GM who should play with whom. Hockey, more than other sports, is 3 + 2 working together. Putting 5 plus-side guys on the ice does not directly make the team better or the score lopsided in your favor. Is there an “O-line” measurement or a “D-line” measurement?

              • Dave says:

                There’s no measurement for that. Like all stats, you can’t just look at the paper and put a team together. There’s always a blend of stats (quantitative) and “eye test” (qualitative) analysis.

            • Dave says:

              Not really overreacting. GMs have stated that they don’t look at +/- anymore (it was in one of Elliotte Friedman’s 30 Thoughts articles).

  7. Ray says:

    Concerning GAA and Save %, it is naive to presume that a goaltender has no influence on the number of shots. Justin, you can probably list a dozen ways that the tender can have an effect. Goalies prone to mishandling the puck get more low-percentage shots. With some goalies (Brodeur, Price), you don’t want to let them handle the puck, so you shoot less. Does the goalie give up rebounds, clear the puck nicely, or hold it? And on and on.

    GAA really does measure more than save percentage does. It is unfortunately more influenced by the tendency of the defense to give up many shots. It is not really influenced by whether the defense makes blunders leading to golden opportunities any more than save pct is.

  8. Bloomer says:

    Winning puck battles and creating turnovers is a measure of success for NHL teams. You don’t just need fast-skating, skilled players; you also need players who play strong along the boards. The Kings beat the Rangers in the Stanley Cup Final because they won the battles in the corners and the neutral zone.

    • Dave says:

      No one is saying it’s not. As mentioned above in the comments, that kind of data isn’t available to us.

  9. mikeyyy says:

    I’m a data guy.

    I do data analysis, and in big data we do what’s called meta tagging the data. It provides more granularity to the data itself so that trends pop out more.

    The problem has always been garbage in, garbage out.

    We need the base measurements to show statistical significance, and to have a level of quality that lends itself to data manipulation, grouping, and advanced algorithmic analysis.

    Unfortunately there isn’t enough of that, i.e., shot quality, passing accuracy, time to clear the zone. The list goes on and on.

    The important thing is that there needs to be more categorization of the data to give it meaning. We haven’t yet discovered, as the A’s did, those drivers and what they affect.

    Once the data is at a quality that can expose certain statistical trends, then we can rely more on it. At this point while fancy stats lets you see things from a macro level, the micro level which lets us evaluate individuals is not there yet.

    To me it’s like plus minus. Nice stat to know, but no high level of statistical significance can be proven.

  10. Snake says:

    One point that hasn’t been made by the other sabermetrics guys is that there is a big difference between descriptive and predictive stats. To be fair, I haven’t looked at FancyStats other than what I read here. That said, using stats to tell a story of what happened after the fact is far different from predicting the future.

    Hockey is a read and react game. FancyStats have some value, but I really don’t think they add much of anything that you don’t know just from watching the game. As pretty much everyone said, this ain’t baseball. 🙂
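
For anyone who wants to try the test Ray proposes in comment 6 (does this season's Corsi or this season's +/- do a better job of predicting next season's +/-?), here is a minimal sketch of one way to set it up, using a plain Pearson correlation as the yardstick. That choice of yardstick is an assumption on my part, as are the function names. Every number below is an invented placeholder, not real player data, so the printed values say nothing about which stat actually wins.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def compare_predictors(
    candidates: Dict[str, Sequence[float]], target: Sequence[float]
) -> Dict[str, float]:
    """Correlate each candidate stat from season N with the target from season N+1."""
    return {name: pearson(values, target) for name, values in candidates.items()}


# Placeholder values for six hypothetical players (NOT real data).
# +/- is expressed per 60 minutes to correct for ice time, as Ray suggests.
this_season = {
    "corsi_for_pct": [54.0, 51.5, 48.0, 46.5, 52.0, 49.0],
    "plus_minus_per_60": [0.6, -0.2, 0.1, -0.5, 0.3, -0.4],
}
next_season_plus_minus_per_60 = [0.4, 0.1, -0.1, -0.4, 0.2, -0.1]

results = compare_predictors(this_season, next_season_plus_minus_per_60)
for name, r in results.items():
    print(f"{name}: r = {r:.3f}")
```

A real version would need several seasons of players with meaningful ice time. It would also need some handling of zone starts and teammates, since those affect both stats.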