Live Games Have Evolving Performance
Running a live social game can be a lot like playing Jenga: the more moves you make, the higher the chance the whole thing comes crashing down. Earlier this year, noting an increase in user complaints, we realized that the web game I work on was threatening to collapse.
Years ago, we implemented a company-standard performance tracking system. It generated an immense amount of logging, our numbers looked reasonable, and we had scarier fish to deal with, so we stopped monitoring performance closely. This was a huge mistake, and we had to act quickly to remedy it.
Once we took a deeper look into performance, we were in for a rude awakening. Even though the game ran well for us in our development environment, it was running like a dog in the real world. The numbers we saw were significantly worse than expected.
Using profiling tools, we discovered that while the game ran fine before our ‘sale’ page was displayed, it ran poorly at any point after the page had been displayed.
The problem was actually two bugs combined with a user flow issue.
The first bug was that the sale page had been built in a horribly inefficient manner. The second bug was worse. Due to a flaw in our asset loader, we were never removing dialogs from memory after we closed them, so they just sat around using up resources.
And the user flow turned it from bad to terrible. When we run sales, one of the first things we do is show the sale page to the player, and since the bugs combined to negatively impact performance after the dialog was shown, we were effectively killing performance for the entire play session whenever we were running sales!
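To make the loader flaw concrete, here is a minimal sketch of the kind of cleanup that was missing. It is not our actual loader; `Dialog`, `DialogManager`, and `dispose()` are hypothetical names chosen for illustration. The point is only that closing a dialog has to both release its resources and drop the reference to it.

```typescript
// Hypothetical names throughout: Dialog, DialogManager, and dispose() are
// illustrative and are not taken from the game's real asset loader.
interface Dialog {
  readonly id: string;
  dispose(): void; // frees textures, listeners, and other held resources
}

class DialogManager {
  private readonly open = new Map<string, Dialog>();
  private readonly create: (id: string) => Dialog;

  constructor(create: (id: string) => Dialog) {
    this.create = create;
  }

  show(id: string): Dialog {
    // Reuse the dialog if it is already open instead of building a second copy.
    let dialog = this.open.get(id);
    if (dialog === undefined) {
      dialog = this.create(id);
      this.open.set(id, dialog);
    }
    return dialog;
  }

  close(id: string): boolean {
    const dialog = this.open.get(id);
    if (dialog === undefined) {
      return false;
    }
    // A leak like the one described above amounts to skipping these two steps:
    // the dialog disappears from the screen but stays alive in memory.
    dialog.dispose();
    this.open.delete(id);
    return true;
  }

  // Number of dialogs currently held in memory; a cheap value to log and watch.
  get liveCount(): number {
    return this.open.size;
  }
}
```

Logging something like `liveCount` over the course of a session is one way a leak of this sort could show up in tracking data before players start complaining.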
Once we understood the problem, it was not difficult to resolve. We fixed the loader issue and optimized the sale page. But the longer-term effect on our game came not from those code fixes but from the mental reset they gave us:
1) Performance is never a solved problem. I don’t care how well your game ran when you launched it. As you add features and update content, you will need to keep revisiting and fixing performance.
2) Any Key Performance Indicator that isn’t actively monitored is degrading. Ok, maybe not always. But you have to assume it is unless you can actively prove it isn’t.
3) Automatic tracking and alert systems are essential. If you rely on manually verified numbers, you will miss the moment they shift.
4) It never makes sense until it does. Don’t dismiss data points that make no sense but keep coming up. You might just be looking at them wrong.
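As an illustration of points 2 and 3, an automated check can be very small. The sketch below is an assumption about how such a check might look, not a description of our alert system: `KpiSample`, `KpiRule`, and `findDegradedKpis` are hypothetical names, and it assumes KPIs where higher is better (such as frame rate).

```typescript
// Hypothetical sketch: the type names, KPI names, and thresholds are
// assumptions made for illustration. Assumes "higher is better" KPIs.
interface KpiSample {
  name: string;
  value: number; // e.g. average frames per second for the latest window
}

interface KpiRule {
  name: string;
  baseline: number; // value considered healthy
  maxDropPct: number; // alert when the value falls more than this % below baseline
}

function findDegradedKpis(samples: KpiSample[], rules: KpiRule[]): string[] {
  // Keep only the most recent value reported for each KPI.
  const latest = new Map<string, number>();
  for (const sample of samples) {
    latest.set(sample.name, sample.value);
  }

  const alerts: string[] = [];
  for (const rule of rules) {
    const value = latest.get(rule.name);
    if (value === undefined) {
      // A KPI that stopped reporting is treated as a problem, not as "fine".
      alerts.push(`${rule.name}: no data`);
      continue;
    }
    const dropPct = ((rule.baseline - value) / rule.baseline) * 100;
    if (dropPct > rule.maxDropPct) {
      alerts.push(`${rule.name}: ${dropPct.toFixed(1)}% below baseline`);
    }
  }
  return alerts;
}
```

Run on a schedule and wired to whatever notification channel the team already reads, a check like this flags both a KPI that has dropped and one that has silently stopped reporting.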
Previous Performance Tracking Dashboard
1) Whole session average performance
2) Rolling 30 second average frame rate
3) Percentile breakdowns to identify outliers
4) Session time variance breakdowns to identify performance degradation over time
5) Game sub section specifics to identify troubled areas of the game
6) A/B performance comparison capability for new content roll outs
7) Various game state comparisons such as in or out of fullscreen mode
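Two of the items above, the rolling 30 second average frame rate and the percentile breakdowns, are simple enough to sketch. This is a minimal illustration under our own assumptions rather than the dashboard's real code: `RollingFrameRate` and `percentile` are hypothetical names, timestamps are assumed to be in milliseconds, and the percentile uses the nearest-rank method.

```typescript
// Hypothetical sketch; class and function names are illustrative only.
class RollingFrameRate {
  private readonly frameTimesMs: number[] = [];
  private readonly windowMs: number;

  constructor(windowMs: number = 30_000) {
    this.windowMs = windowMs;
  }

  // Record the time (in milliseconds) at which a frame was rendered and
  // drop any frames that have fallen out of the rolling window.
  recordFrame(timestampMs: number): void {
    this.frameTimesMs.push(timestampMs);
    const cutoff = timestampMs - this.windowMs;
    let first = this.frameTimesMs[0];
    while (first !== undefined && first <= cutoff) {
      this.frameTimesMs.shift();
      first = this.frameTimesMs[0];
    }
  }

  // Average frames per second across the frames still inside the window.
  averageFps(): number {
    const count = this.frameTimesMs.length;
    const first = this.frameTimesMs[0];
    const last = this.frameTimesMs[count - 1];
    if (first === undefined || last === undefined || last <= first) {
      return 0; // not enough data to compute a rate yet
    }
    return ((count - 1) * 1000) / (last - first);
  }
}

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error("cannot take a percentile of an empty sample");
  }
  if (p < 0 || p > 100) {
    throw new RangeError("p must be between 0 and 100");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] as number;
}
```

Applied to per-session frame rates, a low percentile such as `percentile(sessionFps, 10)` describes the players having the worst experience, which a whole-population average tends to hide.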
Updated Performance Tracking Dashboard
Your players’ experience is the lifeblood of your product, and you should treat it with the same level of caution that you give your personal finances. Check it regularly and assume it is broken unless you can actively prove otherwise.