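Data for this analysis: https://www.kaggle.com/osmi/mental-health-in-tech-survey. This project analyzes survey data on the mental health of tech workers. The data is in CSV format; part of it looks like this:
Requirement: count the men and women with mental health problems in each country.
Code: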
import pandas as pd

# Read the CSV into a DataFrame
df = pd.read_csv('./survey.csv')
# Add a helper column (count) used for counting rows per gender
df['count'] = 0
# Select people treated as having a mental health problem
# (answered 'Yes' to mental_health_consequence);
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
df = df[df['mental_health_consequence'] == 'Yes'].copy()
# Normalize case so that 'Male' and 'male' are treated the same
df['Gender'] = df['Gender'].str.lower()
# Keep only rows where Gender is 'male' or 'female'
df2 = df[(df['Gender'] == 'male') | (df['Gender'] == 'female')]
# Count rows per (Country, Gender)
group = df2['count'].groupby([df2['Country'], df2['Gender']]).count()
print(group)
The output is as follows:
Country         Gender
Australia       female     3
                male       5
Belgium         male       2
Bulgaria        male       1
Canada          female     3
                male      11
Colombia        male       1
Croatia         male       1
Finland         male       2
France          male       1
Georgia         male       1
Germany         female     1
                male       4
Greece          male       2
Hungary         female     1
India           female     1
                male       2
Ireland         female     1
                male      10
Israel          male       1
Italy           male       1
Japan           male       1
Netherlands     male       3
New Zealand     male       2
Philippines     male       1
Portugal        male       1
Russia          male       1
Singapore       male       1
Slovenia        male       1
South Africa    male       1
Spain           female     1
Switzerland     male       1
United Kingdom  female     5
                male      33
United States   female    33
                male      86
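The counting step above can also be written without the helper count column. The following is only a minimal sketch, assuming the same three column names; the rows in `toy` are made up for illustration and are not taken from survey.csv, and `count_by_country_gender` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy rows standing in for survey.csv (not real survey data).
toy = pd.DataFrame({
    "Country": ["Canada", "Canada", "Canada", "India", "India"],
    "Gender": ["Male", "female", "M", "male", "Female"],
    "mental_health_consequence": ["Yes", "Yes", "Yes", "No", "Yes"],
})

def count_by_country_gender(df):
    """Count 'Yes' rows per (Country, Gender), keeping only male/female."""
    yes = df[df["mental_health_consequence"] == "Yes"].copy()
    # Lowercase so 'Male' and 'male' fall into the same group.
    yes["Gender"] = yes["Gender"].str.lower()
    # Values such as 'm' are dropped, as in the original filter.
    yes = yes[yes["Gender"].isin(["male", "female"])]
    # size() counts rows per group, so no helper column is needed.
    return yes.groupby(["Country", "Gender"]).size()

counts = count_by_country_gender(toy)
print(counts)

# One row per country, one column per gender; missing pairs become 0.
wide = counts.unstack(fill_value=0)
print(wide)
```

The wide layout makes countries with no rows for one gender explicit as 0 instead of leaving the row out, which can be easier to compare across countries.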
Requirement 2: compute the average age of people with mental health problems in each country.
# Keep ages strictly between 0 and 100, then average Age per country
df3 = df[(df['Age'] > 0) & (df['Age'] < 100)]
group2 = df3['Age'].groupby(df3['Country']).mean()
print(group2)
Output 2:
Country
Australia         31.500000
Bahamas, The       8.000000
Belgium           30.000000
Bulgaria          26.000000
Canada            29.875000
Colombia          26.000000
Croatia           43.000000
Finland           27.000000
France            26.000000
Georgia           20.000000
Germany           32.000000
Greece            36.500000
Hungary           27.000000
India             24.000000
Ireland           35.272727
Israel            27.000000
Italy             37.000000
Japan             49.000000
Netherlands       33.000000
New Zealand       36.750000
Philippines       31.000000
Portugal          27.000000
Russia            28.000000
Singapore         39.000000
Slovenia          19.000000
South Africa      61.000000
Spain             30.000000
Switzerland       30.000000
United Kingdom    31.571429
United States     33.582353
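A per-country mean can rest on very few rows, so it may help to print the number of rows next to each average. The following is a minimal sketch of that idea, assuming the same `Country` and `Age` columns; the rows in `toy`, the helper name `mean_age_by_country`, and its `low`/`high` parameters are hypothetical and only mirror the age filter used above.

```python
import pandas as pd

# Hypothetical toy rows (not real survey data); -1 is an invalid age.
toy = pd.DataFrame({
    "Country": ["Canada", "Canada", "India", "India", "Japan"],
    "Age": [30, 40, 24, -1, 49],
})

def mean_age_by_country(df, low=0, high=100):
    """Mean age and row count per country for ages strictly between low and high."""
    valid = df[(df["Age"] > low) & (df["Age"] < high)]
    # Named aggregation: one output column per keyword argument.
    return valid.groupby("Country")["Age"].agg(mean_age="mean", n="count")

summary = mean_age_by_country(toy)
print(summary)
```

With the row count `n` in the same table, an average based on a single respondent is easy to tell apart from one based on many.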