Python Data Analysis Basics
- Preparation
- Exercise 1-GroupBy
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv).
- Step 3. Assign it to a variable called drinks.
- Step 4. Which continent drinks more beer on average?
- Step 5. For each continent print the statistics for wine consumption.
- Step 6. Print the mean alcohol consumption per continent for every column
- Step 7. Print the median alcohol consumption per continent for every column
- Step 8. Print the mean, min and max values for spirit consumption.
- Exercise 2-Occupation
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user).
- Step 3. Assign it to a variable called users.
- Step 4. Discover what is the mean age per occupation
- Step 5. Discover the Male ratio per occupation and sort it from the most to the least
- Step 6. For each occupation, calculate the minimum and maximum ages
- Step 7. For each combination of occupation and gender, calculate the mean age
- Step 8. For each occupation present the percentage of women and men
- Exercise 3-Regiment
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Create the DataFrame with the following values:
- Step 3. Assign it to a variable called regiment.
- Step 4. What is the mean preTestScore from the regiment Nighthawks?
- Step 5. Present general statistics by company
- Step 6. What is the mean of each company's preTestScore?
- Step 7. Present the mean preTestScores grouped by regiment and company
- Step 8. Present the mean preTestScores grouped by regiment and company without hierarchical indexing
- Step 9. Group the entire dataframe by regiment and company
- Step 10. What is the number of observations in each regiment and company
- Step 11. Iterate over a group and print the name and the whole data from the regiment
- Conclusion
Preparation
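The datasets for the exercises are available at the link below. Download them and work from local copies where possible, because the links given in the individual exercises may not open.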
https://github.com/justmarkham/pandas-videos/tree/master/data
Exercise 1-GroupBy
Introduction:
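GroupBy can be summarized as split-apply-combine: split the data into groups, apply a function to each group, and combine the results into one object.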
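To make the three stages concrete, here is a minimal sketch that carries them out by hand and compares the result with the one-line `groupby` form. The toy DataFrame is a made-up assumption for illustration and is not the drinks dataset.

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the idea.
df = pd.DataFrame({
    "continent": ["EU", "EU", "AS", "AS", "AF"],
    "beer_servings": [100, 200, 10, 30, 50],
})

# Split: iterating over a groupby yields one (name, sub-DataFrame) pair per group.
# Apply: take the mean of beer_servings inside each sub-DataFrame.
pieces = {
    name: group["beer_servings"].mean()
    for name, group in df.groupby("continent")
}

# Combine: collect the per-group results into a single Series.
manual = pd.Series(pieces, name="beer_servings")

# groupby performs the same three stages in one expression.
built_in = df.groupby("continent")["beer_servings"].mean()

print(manual)
print(built_in)
```

Both Series hold the same per-continent means; the one-line form is what the steps below use.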
Step 1. Import the necessary libraries
The code is as follows:
import pandas as pd
Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv).
Step 3. Assign it to a variable called drinks.
The code is as follows:
drinks = pd.read_csv('drinks.csv', sep=',')  # sep must be passed by keyword in pandas 2.x; ',' is also the default
drinks
The output is as follows:
| | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
---|---|---|---|---|---|---|
0 | Afghanistan | 0 | 0 | 0 | 0.0 | AS |
1 | Albania | 89 | 132 | 54 | 4.9 | EU |
2 | Algeria | 25 | 0 | 14 | 0.7 | AF |
3 | Andorra | 245 | 138 | 312 | 12.4 | EU |
4 | Angola | 217 | 57 | 45 | 5.9 | AF |
5 | Antigua & Barbuda | 102 | 128 | 45 | 4.9 | NaN |
6 | Argentina | 193 | 25 | 221 | 8.3 | SA |
7 | Armenia | 21 | 179 | 11 | 3.8 | EU |
8 | Australia | 261 | 72 | 212 | 10.4 | OC |
9 | Austria | 279 | 75 | 191 | 9.7 | EU |
10 | Azerbaijan | 21 | 46 | 5 | 1.3 | EU |
11 | Bahamas | 122 | 176 | 51 | 6.3 | NaN |
12 | Bahrain | 42 | 63 | 7 | 2.0 | AS |
13 | Bangladesh | 0 | 0 | 0 | 0.0 | AS |
14 | Barbados | 143 | 173 | 36 | 6.3 | NaN |
15 | Belarus | 142 | 373 | 42 | 14.4 | EU |
16 | Belgium | 295 | 84 | 212 | 10.5 | EU |
17 | Belize | 263 | 114 | 8 | 6.8 | NaN |
18 | Benin | 34 | 4 | 13 | 1.1 | AF |
19 | Bhutan | 23 | 0 | 0 | 0.4 | AS |
20 | Bolivia | 167 | 41 | 8 | 3.8 | SA |
21 | Bosnia-Herzegovina | 76 | 173 | 8 | 4.6 | EU |
22 | Botswana | 173 | 35 | 35 | 5.4 | AF |
23 | Brazil | 245 | 145 | 16 | 7.2 | SA |
24 | Brunei | 31 | 2 | 1 | 0.6 | AS |
25 | Bulgaria | 231 | 252 | 94 | 10.3 | EU |
26 | Burkina Faso | 25 | 7 | 7 | 4.3 | AF |
27 | Burundi | 88 | 0 | 0 | 6.3 | AF |
28 | Cote d'Ivoire | 37 | 1 | 7 | 4.0 | AF |
29 | Cabo Verde | 144 | 56 | 16 | 4.0 | AF |
... | ... | ... | ... | ... | ... | ... |
163 | Suriname | 128 | 178 | 7 | 5.6 | SA |
164 | Swaziland | 90 | 2 | 2 | 4.7 | AF |
165 | Sweden | 152 | 60 | 186 | 7.2 | EU |
166 | Switzerland | 185 | 100 | 280 | 10.2 | EU |
167 | Syria | 5 | 35 | 16 | 1.0 | AS |
168 | Tajikistan | 2 | 15 | 0 | 0.3 | AS |
169 | Thailand | 99 | 258 | 1 | 6.4 | AS |
170 | Macedonia | 106 | 27 | 86 | 3.9 | EU |
171 | Timor-Leste | 1 | 1 | 4 | 0.1 | AS |
172 | Togo | 36 | 2 | 19 | 1.3 | AF |
173 | Tonga | 36 | 21 | 5 | 1.1 | OC |
174 | Trinidad & Tobago | 197 | 156 | 7 | 6.4 | NaN |
175 | Tunisia | 51 | 3 | 20 | 1.3 | AF |
176 | Turkey | 51 | 22 | 7 | 1.4 | AS |
177 | Turkmenistan | 19 | 71 | 32 | 2.2 | AS |
178 | Tuvalu | 6 | 41 | 9 | 1.0 | OC |
179 | Uganda | 45 | 9 | 0 | 8.3 | AF |
180 | Ukraine | 206 | 237 | 45 | 8.9 | EU |
181 | United Arab Emirates | 16 | 135 | 5 | 2.8 | AS |
182 | United Kingdom | 219 | 126 | 195 | 10.4 | EU |
183 | Tanzania | 36 | 6 | 1 | 5.7 | AF |
184 | USA | 249 | 158 | 84 | 8.7 | NaN |
185 | Uruguay | 115 | 35 | 220 | 6.6 | SA |
186 | Uzbekistan | 25 | 101 | 8 | 2.4 | AS |
187 | Vanuatu | 21 | 18 | 11 | 0.9 | OC |
188 | Venezuela | 333 | 100 | 3 | 7.7 | SA |
189 | Vietnam | 111 | 2 | 1 | 2.0 | AS |
190 | Yemen | 6 | 0 | 0 | 0.1 | AS |
191 | Zambia | 32 | 19 | 4 | 2.5 | AF |
192 | Zimbabwe | 64 | 18 | 4 | 4.7 | AF |
193 rows × 6 columns
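The `NaN` entries in the continent column all belong to North American countries (Bahamas, Belize, USA and so on), which suggests that the file encodes North America as the string `NA`. `NA` is one of the markers that `read_csv` treats as missing by default, so those rows are dropped from every `groupby('continent')` result. The sketch below shows the effect on a small inline CSV and one possible way around it using `keep_default_na=False`. The three-row excerpt is an assumption made for illustration, not the real file.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt laid out like drinks.csv.
csv_text = (
    "country,beer_servings,continent\n"
    "Bahamas,122,NA\n"
    "Albania,89,EU\n"
    "USA,249,NA\n"
)

# Default parsing: the string "NA" is read as a missing value (NaN).
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False disables the default missing-value markers,
# so "NA" survives as an ordinary string and forms its own group.
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default["continent"].tolist())
print(kept.groupby("continent")["beer_servings"].mean())
```

Note that `keep_default_na=False` also stops empty fields from being read as missing, so it is best suited to files without genuine gaps. The outputs in this article were produced with the default parsing, which is why only five continents appear in them.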
Step 4. Which continent drinks more beer on average?
The code is as follows:
drinks.groupby('continent').beer_servings.mean()
The output is as follows:
continent
AF 61.471698
AS 37.045455
EU 193.777778
OC 89.687500
SA 175.083333
Name: beer_servings, dtype: float64
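Reading the answer off the table works here, but it can also be extracted programmatically. As a small sketch, the Series below is rebuilt from the values printed above (so it runs without the CSV file); `sort_values` ranks the continents and `idxmax` returns the label of the largest mean.

```python
import pandas as pd

# Per-continent beer means, copied from the output above.
beer_mean = pd.Series(
    {
        "AF": 61.471698,
        "AS": 37.045455,
        "EU": 193.777778,
        "OC": 89.687500,
        "SA": 175.083333,
    },
    name="beer_servings",
)

# Rank the continents from the highest to the lowest mean.
ranked = beer_mean.sort_values(ascending=False)

# idxmax returns the index label of the largest value.
top_continent = beer_mean.idxmax()

print(ranked)
print(top_continent)
```

On the real data, the same two calls can be chained directly onto `drinks.groupby('continent').beer_servings.mean()`.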
Step 5. For each continent print the statistics for wine consumption.
The code is as follows:
drinks.groupby('continent').wine_servings.describe()
The output is as follows:
continent | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
AF | 53.0 | 16.264151 | 38.846419 | 0.0 | 1.0 | 2.0 | 13.00 | 233.0 |
AS | 44.0 | 9.068182 | 21.667034 | 0.0 | 0.0 | 1.0 | 8.00 | 123.0 |
EU | 45.0 | 142.222222 | 97.421738 | 0.0 | 59.0 | 128.0 | 195.00 | 370.0 |
OC | 16.0 | 35.625000 | 64.555790 | 0.0 | 1.0 | 8.5 | 23.25 | 212.0 |
SA | 12.0 | 62.416667 | 88.620189 | 1.0 | 3.0 | 12.0 | 98.50 | 221.0 |
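`describe()` always returns the same fixed set of eight statistics. When only some of them are needed, `agg` with a list of function names is one alternative. The sketch below compares the two on a made-up toy DataFrame (an assumption for illustration, not the drinks data).

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the idea.
df = pd.DataFrame({
    "continent": ["EU", "EU", "EU", "SA", "SA"],
    "wine_servings": [54, 312, 191, 221, 8],
})

grouped = df.groupby("continent")["wine_servings"]

# describe(): count, mean, std, min, quartiles and max for every group.
full = grouped.describe()

# agg(): only the statistics that are named explicitly.
selected = grouped.agg(["count", "mean", "max"])

print(full)
print(selected)
```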
Step 6. Print the mean alcohol consumption per continent for every column
The code is as follows:
drinks.groupby('continent').mean(numeric_only=True)  # skip the non-numeric country column
The output is as follows:
continent | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
---|---|---|---|---|
AF | 61.471698 | 16.339623 | 16.264151 | 3.007547 |
AS | 37.045455 | 60.840909 | 9.068182 | 2.170455 |
EU | 193.777778 | 132.555556 | 142.222222 | 8.617778 |
OC | 89.687500 | 58.437500 | 35.625000 | 3.381250 |
SA | 175.083333 | 114.750000 | 62.416667 | 6.308333 |
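The table contains only the four numeric columns; the text column `country` has no meaningful mean. In recent pandas versions (2.0 and later, to the best of my knowledge) the grouped mean no longer drops such columns silently, so they have to be excluded explicitly. The sketch below shows two ways to do that on a made-up toy DataFrame (an assumption for illustration, not the full dataset).

```python
import pandas as pd

# Hypothetical toy data with one text column besides the grouping key.
df = pd.DataFrame({
    "country": ["Albania", "Andorra", "Angola"],
    "beer_servings": [89, 245, 217],
    "wine_servings": [54, 312, 45],
    "continent": ["EU", "EU", "AF"],
})

# Option 1: ask pandas to ignore non-numeric columns.
means_a = df.groupby("continent").mean(numeric_only=True)

# Option 2: select the columns to average explicitly.
means_b = df.groupby("continent")[["beer_servings", "wine_servings"]].mean()

print(means_a)
print(means_b)
```

Selecting the columns explicitly is a little more verbose, but it documents exactly which columns end up in the result.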