Dataset Analysis: Olympics (COMM-202 Final Project)

The next Summer Olympics are coming up later this year. So, what can we learn from the past? In this post, we look at data from the 2020 Olympic Games, which were held in 2021 due to COVID-19. The dataset includes information such as athlete names and birthdates, participating countries, and medals won. Using this information, we will evaluate the 2020 Olympics and make some predictions about the upcoming Games. You can further explore the data using this link.

First, we created a bubble chart to show the popularity of the different events at a glance. The Olympics are made up of many different sporting events, but as one can see, some have many more competitors than others. For example, the bubble for Athletics (which had over 2,000 athletes) is much larger than that of Cycling BMX Freestyle (which had only 32 athletes). If someone wanted to become an Olympian as easily as possible, they might try for one of the less popular sports (smaller bubbles), since they would likely face fewer competitors.

Second, we wanted to see which countries won the most medals. We created a bar graph of the ten countries with the most medals overall, counting gold, silver, and bronze medals together. It is possible, though unlikely, that a country left off this graph won more gold medals than one that appears on it, because the ranking is based on total medals. As seen on the graph, the countries with the most medals have relatively high populations. Each country’s total is also split roughly into thirds among the three medal types. For example, the USA won a total of 113 medals. If we divide that total by 3 (the number of medal types), we get 37.67, which is close to the USA’s count for each medal type. From this we can conclude that each medal type makes up roughly a third of the medals the USA won in these Olympics.
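The rough-thirds check above can be sketched in a few lines of Python. This is only an illustration, not how the graph was built: the function name `medal_shares` is made up for this example, and the gold/silver/bronze split of 39/41/33 is taken from the official Tokyo 2020 medal table rather than from the text of this post (only the total of 113 is stated here).

```python
def medal_shares(gold, silver, bronze):
    """Return each medal type's share of a country's total medals."""
    total = gold + silver + bronze
    counts = {"gold": gold, "silver": silver, "bronze": bronze}
    return {medal: round(count / total, 3) for medal, count in counts.items()}


# USA at Tokyo 2020: 39 gold, 41 silver, 33 bronze (113 total).
# The per-type counts are an assumption drawn from the official medal table.
usa_shares = medal_shares(39, 41, 33)

for medal, share in usa_shares.items():
    print(f"{medal}: {share:.1%}")
```

Each share lands within a few percentage points of one third, which is what "roughly split into thirds" means here.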

Next, we created a color-coded map slider on the left to highlight the number of medals each country won relative to the others. In the map, countries with more medal wins are shown in darker blue. From this, we can observe that larger countries tended to do better, as would be expected. One exception might be Australia, which has a significantly smaller population than the other countries shown in darker blue (CITE). We also see that India, one of the world’s most populous countries, won even fewer medals than smaller European nations. The darker-shaded countries most likely have not only a larger number of athletes (due to their population), but also more resources, coaching, etc. available to them. Differences in these resources could be one reason why Australia and India are outliers. The map on the right shows specifically the relative number of gold medals each country won. Here, we see that most countries lose shading, whereas China and the U.S. remain very dark, having a larger share of the gold medals.

To further explore the correlation between population and medals, we made a pictogram of how many athletes each country sent to the Olympics. This visualization is restricted to the ten countries with the most total medals, the same group seen in the bar graph displaying medals per country. We have seen that countries with a higher population generally send more athletes to the Olympics, but how do the numbers of athletes compare to the medals won? The USA, which had the highest number of total medals, also had the most athletes in the Olympics at 634. The country with the second-most athletes is Japan, but it is fifth in total medals. The country with the second-most medals, China, had the fourth-most athletes in these Olympics. So, countries with more athletes generally win more medals, but we cannot predict their exact placement from athlete counts alone.

We were also curious whether countries with more athletes actually tended to do better in the Olympics. That is, did they tend to win proportionately more medals? As shown on the previous maps, countries like the US, China, and Britain tended to win more medals; they also had some of the highest numbers of athletes participating in the Games. Since we had already hypothesized that large Olympic participation and success could be attributed to the availability of better training, rather than just the availability of athletes, we wondered if there would be a correlation between how many athletes a country had and its ratio of medals won to total competitors. However, as this scatterplot shows, there is no significant relationship between these two statistics. In fact, countries with fewer athletes actually tended to have a higher ratio, simply because each athlete who won a medal counted for more. For instance, the highest ratios came from countries that sent only 2-5 athletes and happened to have one or more medal wins. Overall, it appears that the number of medals a country wins is roughly proportional to its number of athletes, while the medals-to-athletes ratio does not increase with team size.
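To make the small-team effect concrete, here is a minimal sketch of the medals-to-athletes ratio. The USA figures (113 medals, 634 athletes) come from this post; the three-athlete team is a hypothetical example, not a row from the dataset, and `medals_per_athlete` is a name chosen for this illustration.

```python
def medals_per_athlete(medals, athletes):
    """Ratio of medals won to athletes sent; athletes must be positive."""
    if athletes <= 0:
        raise ValueError("a team needs at least one athlete")
    return medals / athletes


# (medals, athletes) per team. The small team is hypothetical.
teams = {
    "USA": (113, 634),
    "Hypothetical small team": (1, 3),
}

ratios = {name: medals_per_athlete(m, a) for name, (m, a) in teams.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f} medals per athlete")
```

A single medal among three athletes already gives a ratio almost twice that of the USA, which is why very small delegations dominate the top of the scatterplot.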

Another factor that might influence someone’s chance of being a successful Olympian is age. Because athletes are usually younger, we wondered at what age most of them competed at the Olympic level. In this first visualization, we created a series of box plots to analyze the ages of athletes from the top 10 countries. As can be seen, most are in their 20s. The Netherlands had a slightly higher average age, whereas Japan’s was slightly lower. Australia, notably, also had several older outliers. Based on this chart, if someone wanted to train for the Olympics, and they were from or moving to a top 10 country, they would likely want to be ready for whichever Olympics falls during their mid-20s. But if they are a little older, they might stand more of a chance if they lived in a country like the Netherlands or Australia.

To expand upon this evaluation, we created a treemap grouped first by age and then divided into male and female. This treemap includes every age represented in these Olympics, the oldest being 67 and the youngest being 12. Some athletes did not have a birth date in this dataset, so their data is not included in this visualization. This data is consistent with that shown in the box plots: the most common ages are in the 20s. The most common athlete age overall is 24, with other ages in the 20s close behind. As for the split between male and female athletes, it appears to be fairly equal. This is most likely because most disciplines in the Olympics are split into male and female categories for competition. There are a few sports that do not make this distinction, like equestrian, but there are so few that they have little effect on the overall split of male and female athletes.
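For readers who want to reproduce the age counts from the birthdates, the steps described above can be sketched roughly as follows. This is an assumption about one reasonable approach, not a record of how the treemap was produced: ages are measured on the day of the Tokyo opening ceremony (23 July 2021), the birthdates listed are hypothetical placeholders rather than real athletes, and `age_on` is a helper name invented for this example.

```python
from collections import Counter
from datetime import date

# Reference date: the Tokyo 2020 opening ceremony. Using this date is an assumption.
GAMES_START = date(2021, 7, 23)


def age_on(birth_date, on_date=GAMES_START):
    """Age in whole years on the given date."""
    years = on_date.year - birth_date.year
    # Subtract one year if the birthday has not yet happened by on_date.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Hypothetical birthdates; None stands for an athlete with no birth date on record.
birth_dates = [
    date(1997, 3, 14),
    date(1996, 12, 1),
    None,
    date(1997, 7, 22),
    date(2000, 7, 24),
]

# Athletes without a birth date are skipped, as in the treemap.
ages = [age_on(b) for b in birth_dates if b is not None]
age_counts = Counter(ages)

most_common_age, count = age_counts.most_common(1)[0]
print(f"Most common age: {most_common_age} ({count} athletes)")
```

Counting ages this way, rather than subtracting birth years alone, avoids overstating the age of athletes whose birthday falls after the start of the Games.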

Our evaluation of the 2020 Olympics data gives us insight into what predictions can be made for the upcoming 2024 Olympics. We can expect the US to have a large number of both athletes and medals (maybe even the largest), with China and Russia also ranking highly. We can also predict that athletes in their mid-20s might have the most success. While we do not know what the upcoming Games will hold, this data analysis provides a starting point for what we might expect to see this summer.