ATP Tennis Record Analysis
Use data from https://github.com/JeffSackmann/tennis_atp to analyze ATP players’ records.
Perform all data analysis using pandas.
Tasks
1. Using the GitHub Repository
- Clone the repository from GitHub.
- Carefully read the documentation in the repository to understand the meaning of each dataset and field.
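Once the repository has been cloned (for example with git clone https://github.com/JeffSackmann/tennis_atp), the CSV files can be loaded straight into pandas. This is a minimal sketch that assumes the clone sits in a local directory passed as repo_dir; load_players and load_matches are hypothetical helper names, not part of the repository.

```python
from pathlib import Path

import pandas as pd

# Hypothetical helpers: the names are illustrative, not part of tennis_atp.


def load_players(repo_dir: str) -> pd.DataFrame:
    """Read atp_players.csv from a local clone of the tennis_atp repository."""
    return pd.read_csv(Path(repo_dir) / "atp_players.csv")


def load_matches(repo_dir: str, pattern: str = "atp_matches_202[1-4].csv") -> pd.DataFrame:
    """Stack every yearly match file whose name matches the glob pattern."""
    # The character class [1-4] selects only the 2021 to 2024 tour-level files;
    # differently named files (e.g. qual_chall or futures) do not match.
    files = sorted(Path(repo_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {repo_dir}")
    # ignore_index gives the combined frame a fresh 0..n-1 index.
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
```

After cloning into the default tennis_atp directory, players = load_players("tennis_atp") and matches = load_matches("tennis_atp") give the two frames used in the tasks below.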
2. Analyze atp_players.csv
- What columns (fields) are included in this file?
- How many total players are listed?
- Among all players, how many have incomplete dob (date of birth) information?
- Group players by ioc (nationality). Which countries have the most players?
- How many players are from TPE and from TWN, respectively?
- Among the players from TPE, what surnames are included?
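The section 2 questions can all be answered from the players table in one pass. This is a minimal sketch, assuming the file has dob, ioc and name_last columns (check the header against the repository documentation) and that dob is stored as a YYYYMMDD number. Treating a dob as incomplete when it is missing or does not parse as a real calendar date (for example a partial value such as 19900000) is an assumption about what "incomplete" means; summarize_players is a hypothetical helper.

```python
import pandas as pd


def summarize_players(players: pd.DataFrame, top_n: int = 10) -> dict:
    """Hypothetical helper answering the section 2 questions for atp_players.csv."""
    # Assumes dob holds YYYYMMDD numbers; read_csv makes it float64 when some
    # values are missing, so normalise to nullable integers before formatting.
    dob_int = pd.to_numeric(players["dob"], errors="coerce").astype("Int64")
    # Assumption: "incomplete" = missing, or not a valid YYYYMMDD calendar date.
    parsed = pd.to_datetime(dob_int.astype(str), format="%Y%m%d", errors="coerce")
    ioc = players["ioc"]
    tpe_mask = ioc == "TPE"
    return {
        "columns": list(players.columns),
        "total_players": len(players),
        "incomplete_dob": int(parsed.isna().sum()),
        # Players per ioc code, largest groups first.
        "top_countries": players.groupby("ioc").size().sort_values(ascending=False).head(top_n),
        "tpe_count": int(tpe_mask.sum()),
        "twn_count": int((ioc == "TWN").sum()),
        # Distinct surnames among TPE players, alphabetically.
        "tpe_surnames": sorted(players.loc[tpe_mask, "name_last"].dropna().unique()),
    }
```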
3. Analyze atp_matches_202[1-4].csv
- How many matches have TPE players won and lost?
- How many distinct tournament levels (tourney_level) are there?
- In how many matches is the winner’s height greater than, equal to, or less than the loser’s height?
- For each tourney_level, analyze the relationship between height and match outcome.
- Round player ages (.round()) and:
  - Use value_counts() to count the ages of winners.
  - Compute win rates by player age.
- List the top 10 players with the most wins.
- List the top 10 players with the highest win rates.
- Create a table showing the number of matches between players of different ages:
  - The index should be the winner’s age, and the columns should be the loser’s age.
  - Each cell should represent the number of matches played.
  - Only consider players aged 19–28.
- Create another table showing the win rate between players of different ages:
  - Index = player’s age; columns = opponent’s age; values = win rate.
  - Identify which age groups have win rates above 49% against every age group (within ages 19–28).
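For the TPE win/loss and height questions, a minimal sketch follows. It assumes the match files expose winner_ioc, loser_ioc, winner_ht, loser_ht and tourney_level columns (check the repository documentation); tpe_record and height_by_level are hypothetical helper names. A TPE-versus-TPE match counts once as a win and once as a loss, and matches missing either player's height are left out of the comparison.

```python
import numpy as np
import pandas as pd

ORDER = ["winner taller", "same height", "winner shorter"]


def tpe_record(matches: pd.DataFrame) -> tuple:
    """Hypothetical helper: (wins, losses) for TPE players across all matches."""
    # Assumes winner_ioc / loser_ioc columns; a TPE-vs-TPE match adds one to each total.
    wins = int((matches["winner_ioc"] == "TPE").sum())
    losses = int((matches["loser_ioc"] == "TPE").sum())
    return wins, losses


def height_by_level(matches: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: taller/equal/shorter winner counts per tourney_level."""
    # Matches with a missing height on either side cannot be compared.
    known = matches.dropna(subset=["winner_ht", "loser_ht"])
    diff = known["winner_ht"] - known["loser_ht"]
    label = pd.Series(
        np.select([diff > 0, diff == 0], ORDER[:2], default=ORDER[2]),
        index=known.index,
        name="height",
    )
    table = pd.crosstab(known["tourney_level"], label)
    # Fix the column order and add zero columns for outcomes that never occur.
    return table.reindex(columns=ORDER, fill_value=0)
```

matches["tourney_level"].nunique() gives the number of distinct levels, table.sum() gives the overall greater/equal/less counts, and table.div(table.sum(axis=1), axis=0) turns each level's row into shares, which makes levels with very different match counts comparable.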
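The age and top-player questions reduce to value_counts on rounded ages and on player names. This minimal sketch assumes winner_age, loser_age, winner_name and loser_name columns; age_and_player_stats is a hypothetical helper, and the 20-match minimum used for the win-rate ranking is a hypothetical threshold, added so that a player with a single win does not top the list at 100%.

```python
import pandas as pd


def age_and_player_stats(matches: pd.DataFrame, min_matches: int = 20, top_n: int = 10) -> dict:
    """Hypothetical helper: rounded-age counts, win rate by age, top-N players."""
    # Round to whole years (halves round to even, as in NumPy); Int64 keeps
    # missing ages as <NA> instead of failing the integer cast.
    w_age = matches["winner_age"].round().astype("Int64")
    l_age = matches["loser_age"].round().astype("Int64")
    winner_age_counts = w_age.value_counts().sort_index()

    # Win rate per age = wins / (wins + losses) over every match at that age.
    by_age = pd.DataFrame({"wins": w_age.value_counts(), "losses": l_age.value_counts()})
    by_age = by_age.fillna(0).astype(int).sort_index()
    by_age["win_rate"] = by_age["wins"] / (by_age["wins"] + by_age["losses"])

    # Per-player record keyed by name for readability; winner_id / loser_id
    # would be safer if two players share a name.
    players = pd.DataFrame({
        "wins": matches["winner_name"].value_counts(),
        "losses": matches["loser_name"].value_counts(),
    }).fillna(0).astype(int)
    players["matches"] = players["wins"] + players["losses"]
    players["win_rate"] = players["wins"] / players["matches"]

    top_wins = players.sort_values("wins", ascending=False).head(top_n)
    # Hypothetical minimum-match filter so one-off winners do not dominate.
    eligible = players[players["matches"] >= min_matches]
    top_win_rate = eligible.sort_values(["win_rate", "wins"], ascending=False).head(top_n)
    return {
        "winner_age_counts": winner_age_counts,
        "win_rate_by_age": by_age,
        "top_wins": top_wins,
        "top_win_rate": top_win_rate,
    }
```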
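Both age-versus-age tables can be derived from a single crosstab of rounded winner age against rounded loser age. In this sketch, win_rate.loc[a, b] is the share of matches a player aged a won against opponents aged b, computed as counts[a, b] / (counts[a, b] + counts[b, a]); the diagonal is therefore exactly 0.5 wherever two players of the same age met. Age pairs that never met come out as NaN and, as an assumption, are skipped when checking the 49% condition. age_tables is a hypothetical helper, and the winner_age / loser_age column names are assumed.

```python
import pandas as pd


def age_tables(matches: pd.DataFrame, lo: int = 19, hi: int = 28, threshold: float = 0.49):
    """Hypothetical helper: match-count and win-rate tables for rounded ages lo..hi."""
    w = matches["winner_age"].round()
    l = matches["loser_age"].round()
    # Keep matches where both players fall inside the window; NaN ages fail between().
    keep = w.between(lo, hi) & l.between(lo, hi)
    w = w[keep].astype(int).rename("winner_age")
    l = l[keep].astype(int).rename("loser_age")
    ages = list(range(lo, hi + 1))

    # counts.loc[a, b]: matches in which a winner aged a beat a loser aged b.
    counts = pd.crosstab(w, l).reindex(index=ages, columns=ages, fill_value=0)

    # A player aged a won counts[a, b] and lost counts[b, a] against age b.
    played = counts + counts.T
    # 0 / 0 gives NaN for age pairs that never met.
    win_rate = (counts / played).rename_axis(index="age", columns="opponent_age")

    # Assumption: only ages actually met are checked; ages with no matches are excluded.
    met = played > 0
    beats_all = ((win_rate > threshold) | ~met).all(axis=1) & met.any(axis=1)
    strong_ages = [int(a) for a in beats_all[beats_all].index]
    return counts, win_rate, strong_ages
```

Deriving the win-rate table from the counts table, rather than counting a second time, guarantees that win_rate.loc[a, b] + win_rate.loc[b, a] equals 1 for every pair of ages that met.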
4. Your Own Questions