arutaki's personal site

Does Dota 2 DPC point matter?

I can say with 100% confidence that Dota 2 is my favorite game ever. I spent more than 2000 hours playing the game before I realized that not playing the game 8 hours nonstop do wonder to my perpetual back pain. Now that my primary computer is a Mac, I somehow feel bad about abusing my keyboard that stops me from PC gaming altogether. Despite all that, I still follow the pro scene religiously.

Dota 2 is not the biggest esports around. But it is notable for the annual The International (TI) which boast itself as esports tournament with the largest prize pool. In 2021, they awarded $40,018,400 for 90 players in 18 competing teams. If you look at the list of esports tournaments with highest prize pool, it is dominated by TI of various years eclipsing Fortnite which is a much bigger game from the size of its player base.

Even with such a large prize pool, many fans are concerned about the longevity of Dota 2 esports scene due to the players’ welfare. The nature of Dota 2 pro scene is a feast or famine one. The champion team earns around 45% of the prize pool compare to that 13% earned by the runner up. For other teams who are at the bottom rank in the tournament, they earn as low as 0.25% if they could not get pass through the group stage. It’s difficult to say whether the amount of money is sufficient or not as Dota 2 pro players came from different part of the worlds with different living standards, but we can all agree that there is such a different disparity even for those who compete at the most prestigious tournament in the field.

For the past 5 years, Valve as the developer and distributor of the game introduced Dota Pro Circuit (DPC). There have been two different systems of DPC, but essentially it is a way for teams to earn points and tournaments/leagues prize money throughout the year to gain DPC points. Teams that manage to acquire the most points in a year will get direct access to attend TI while others need to go for a lengthy qualifier to qualify. Teams that are more consistent throughout the year will generally have better chance to attend TI. Although this may vary due to the difference of region competitiveness and uneven DPC points awarded by achievements at the Majors (and Minors for the first two DPC).

Considering how significant doing well at TI is to the economy of the team, getting there in the first place is a top priority for most teams. Only the top team per region is guaranteed a place in TI through regional qualifiers. There is also wild card in some year but of course teams would not want to rely on it to get to TI. Therefore, it is pretty reasonable to assume that teams are trying their best to acquire DPC points as much as possible. But remember, getting to TI alone is not enough. They also need to be the top teams there to reap the most benefit of TI. It is natural to think that DPC points is a good proxy on how well a team would do at TI. After all, they have been doing well for the whole year, why not now?

To examine this question, I decided to plot teams’ TI achievement with respect to their earned DPC points. I take the data from when DPC started, with 4 TIs from 2018-2022 minus 2020 that was cancelled due to the pandemic. Since the data are not so many, I use Python’s request and Pandas’ get url command to manually extract the tables of DPC points from Liquipedia website. I also used Wikipedia to extract TI achievement because the get_url command does not read the TI table from Liquipedia (or maybe I am wrong? Let me know if you find differently).

In my mind, I wanted to have a single table where I have the name of the teams competing, the year they compete in, the DPC points they received, as well as their TI achievement. As any other data analysis project, there are necessary cleanings to do. Firstly, we limit the teams who compete in TIs. Many teams who competed in the DPC did not manage to go to TI, hence I omit them. Secondly, some organizations change name over the years. Just making sure that entries like ‘Thunder Predator’ and ‘Thunder Awaken’ refer to the same team. Also when Forward Gaming played in 2019 under the Newbee banner. Finally, there are also teams who did not earn any DPC points yet managed to go to TI in the older DPC system. In this case, I assigned them with 0 DPC points.

Once I got the table, the next step is to determine how to compare the two variables. One glaring issue about DPC and TI is that the system and the total prize change every year. Therefore, we are going to normalize them. The earned DPC points is normalized as “Relative Points”. A team’s relative points is calculated by dividing their earned DPC points divided by all the DPC points awarded to all teams that year. You can either portray TI achievement with ranks (1st, 2nd, etc.) or the prize wins. Due to tournament system, some teams will share the same rankings (and prize money) which I think add another problem of assigning the appropriate numerical value to the ranks for regression purpose. Therefore, I stick with using the fraction of winning allocated to them out of the total prize pool. I also use a simple linear regression to observe the relationship between them.

TI

Based on the simple plot above, the thing that catch my eyes first is how unevenly distributed the data points are. Most of the data points are clumped at the bottom left of the graph with a few of them spread sparsely in the bottom right or upper left of the graph. The uneven spread in the y-axis is easily explained due to the feast of famine nature of the TI winning distribution we talked earlier. The horizontal spread is more interesting. All data points with more than 0.1 relative points show that they are from TI 2018 and TI 2019. These two TIs use the older DPC system. And it seems that the system allows the good teams to dominate the DPC much more significantly compared to the new system.

Let us talk about the relationship then. The red line in the graph is the linear regression of these data points. I got a very small R2 = 0.0004. It shows roughly that the DPC points earned is not correlated with the team’s TI achievement. Interestingly, none of the teams that win TI ranks particularly high during the DPC circuit. The few DPC dominator data points even seem to bomb during TI. The most reasonable explanation would be teams that perform consistently throughout the year lose their competitive advantage during the “final exam”. Whether due to burn out or getting figured out by other teams. For whatever reason, I would still advise no one to not try hard to get into TI due to the risk of not joining the tournament at all.

One caveat in this analysis is that, maybe linear regression is not the right tools in this case. Considering how TI winner earns astronomically larger than most other participants, an “outlier” data is a feature in TI. I considered to do linear regression on the logarithmic value of their earnings. But then it will be meaningless to compare achievements in different years for the value is not normalized. Let me know if you have better suggestion. It is very likely that I have not done my research enough on proper statistical analysis.

The code and the notebook is available at my Github and Kaggle. You can see them in my “About” section.