Unlike many other parts of the entertainment industry, PC gaming is remarkably opaque when it comes to sales data. Valve Software’s Steam marketplace represents a significant portion of all PC game sales, but historically that data has been difficult or impossible to see and even more difficult to independently verify.
For the briefest of moments this week, we had a fabulously clear snapshot of the number of games Steam has actually sold. We still have the image, but it’s the last one we’re likely to get any time soon. Just a few days after a window into their organization was opened, Valve slammed it shut again.
Here’s how it all went down.
The story begins in 2014 when Ars Technica’s Kyle Orland performed an experiment. He sampled the stream of publicly available, vaguely anonymized data that Valve sent out about its millions of users. He called it the “Steam Gauge” and wrote about his methodology in a lengthy feature article.
Blogger and podcaster Sergey Galyonkin, who now works for Epic Games, read Orland’s story and set about building a website to make that data available to everyone. It was called Steam Spy, and Polygon wrote about it first in 2015. For roughly three years, the public had access to a steady stream of estimated player counts and sales histories for every single game on Steam. Galyonkin himself became a kind of minor celebrity, even delivering a talk at the Game Developers Conference about what he’d learned by parsing the data.
In April of this year, all that changed. Valve announced that it had changed how it broadcast user data. Many conjectured that the decision was made in order to comply with the European Union’s General Data Protection Regulation (GDPR), which became enforceable in May of this year. Regardless of Valve’s reasoning, the change made Galyonkin’s service far less accurate than it had been in the past.
In response, an enterprising programmer and game designer named Tyler Glaiel proposed a new method. With the stream of user data that Valve had traditionally shared turned off, he surmised that you could instead use a game’s achievement data to do roughly the same thing. If a given percentage of owners of a given game had unlocked a particular achievement, you could work backward and uncover the number of people who owned the game.
On Medium, Glaiel explained his process:
This was brought up with a dev group I’m in, and it was quickly pointed out that if you get achievement data through steam’s API, you get 16 digits of precision instead! I set out to try and replicate barter.vg’s algorithm based on the description of it on their site, “Calculated by finding the lowest number of player that produces whole numbers of players for each achievement (percent achieved * all players)”.
So I got it working, with a simple brute force. Checked every possible whole number of sales up to a cap, and multiplied it by the achievement percentages. None of them exactly hit a whole number, so I had to set a threshold for what counts as a “whole number”. It worked for most games with less than a million sales.
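To make the brute-force idea concrete, here is a minimal Python sketch of the approach Glaiel describes. It is not his actual code: the function name `estimate_owners`, the `cap` and `tolerance` defaults, and the sample figures are all assumptions made up for illustration.

```python
def estimate_owners(percentages, cap=1_000_000, tolerance=1e-6):
    """Return the smallest owner count n (1..cap) for which every
    achievement percentage implies a near-whole number of players,
    or None if no candidate qualifies.

    `tolerance` is the hypothetical threshold for what counts as a
    "whole number"; the value here is an illustrative guess.
    """
    fractions = [p / 100.0 for p in percentages]
    for n in range(1, cap + 1):
        # n is plausible only if every achievement maps to a whole player count.
        if all(abs(f * n - round(f * n)) <= tolerance for f in fractions):
            return n
    return None


# Made-up example: a game with 4,321 owners and three achievements
# unlocked by 3,000, 1,234 and 57 of them.
true_owners = 4321
unlock_counts = [3000, 1234, 57]
percentages = [100.0 * count / true_owners for count in unlock_counts]

print(estimate_owners(percentages))
```

The search returns the lowest candidate that fits every achievement at once, which is why more achievements and more digits of precision both make the estimate more trustworthy.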
With further refinement, Glaiel was able to provide data that was more accurate than Steam Spy had ever been. So, Galyonkin spun up his servers and got to work using that new method. He even worked with Ars Technica to check his work. The result is a massive CSV file containing sales data for every game on Steam that offers its users achievements.
But that’s the last time that the public will be able to use that particular exploit. Just a few days ago, Valve started rounding achievement data to the nearest whole number. Without those 16 digits of precision, Glaiel’s method falls apart.
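A short Python sketch shows why rounding is fatal to this kind of estimate. It assumes percentages arrive as whole numbers, as described above; the function name `smallest_consistent_count` and the sample values are hypothetical. Once every percentage is a whole number, the smallest owner count that yields whole players for each achievement can never exceed 100, no matter how many people actually own the game.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def smallest_consistent_count(whole_percentages):
    """Smallest n for which (percentage / 100) * n is a whole number
    for every whole-number percentage in the list."""
    # Each percentage becomes an exact fraction such as 69/100.
    denominators = [Fraction(p, 100).denominator for p in whole_percentages]
    # The answer is the least common multiple of the reduced denominators.
    return reduce(lambda a, b: a * b // gcd(a, b), denominators, 1)


# Made-up rounded achievement percentages for some game.
rounded = [69, 29, 1]
print(smallest_consistent_count(rounded))
```

Because every reduced denominator divides 100, the result always divides 100 as well, so the figure says nothing about real sales.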
Aaaaand Valve killed the achievement user numbers trick faster than you can say “GDPR was never the issue with SteamSpy” https://t.co/bHRXTgxRuq
— Rami Ismail (@tha_rami) July 4, 2018
For a list of the 1,000 best-selling games on Steam as of this week, generated by Galyonkin and using Glaiel’s novel method, head over to Ars Technica.