An Analysis of College Students' Dating Experience

## 01 Data import
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Raw string so that every backslash in the Windows path is taken literally; the file is GBK-encoded
df = pd.read_csv(r'D:\pytest\mkc\singledata\data.csv', encoding='gbk')
print(df.columns)
df.head(2)
Index([u'恋爱次数', u'年级', u'性别', u'追过人数', u'被追人数', u'每周自习时间', u'每周娱乐时间', u'每周睡觉时间', u'每周运动时间', u'每月话费', u'学生组织个数', u'班干部', u'党员', u'足球', u'篮球', u'乒乓球', u'羽毛球', u'跑步', u'台球', u'唱歌', u'主持', u'舞蹈', u'乐器', u'其他才艺', u'家乡', u'成绩水平', u'生活费_百元', u'寝室同学情况', u'身高', u'体重', u'眼镜', u'颜值'], dtype='object')
恋爱次数 | 年级 | 性别 | 追过人数 | 被追人数 | 每周自习时间 | 每周娱乐时间 | 每周睡觉时间 | 每周运动时间 | 每月话费 | ... | 乐器 | 其他才艺 | 家乡 | 成绩水平 | 生活费_百元 | 寝室同学情况 | 身高 | 体重 | 眼镜 | 颜值
05大三2437245510451一线城市125116856.0不戴眼镜4
14大一156104091001一线城市3920115847.0戴眼镜9

2 rows × 32 columns
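
The loading step can be tried without the original file. The following is only a minimal sketch, assuming Python 3 and a recent pandas: the two-row `sample_text` is made up for illustration and merely borrows three column names from the listing above. It encodes the text as GBK and reads it back with `encoding='gbk'`, the same option used for data.csv.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for data.csv (not the real survey data).
sample_text = "恋爱次数,年级,性别\n5,大三,女\n4,大一,女\n"

# Encode as GBK to mimic how the original file is stored on disk.
raw_bytes = sample_text.encode("gbk")

# read_csv accepts a binary buffer; encoding tells it how to decode the bytes.
demo = pd.read_csv(io.BytesIO(raw_bytes), encoding="gbk")

print(demo.columns.tolist())
print(demo.shape)
```

If the encoding does not match the file, the Chinese column names typically fail to decode or come out garbled, so printing the columns right after loading is a cheap sanity check.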

## 02 Data preprocessing
df.info()
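### Value replacement
To make the analysis easier, Chinese values are converted to English words; values are replaced as they are needed.
# Assign the result back instead of calling replace(..., inplace=True) on a column selection, which recent pandas versions may not propagate to df
df[u'性别'] = df[u'性别'].replace([u'女', u'男'], ['female', 'male'])
df[u'年级'] = df[u'年级'].replace([u'大一', u'大二', u'大三', u'大四'], ['freshman', 'sophomore', 'junior', 'senior'])
df[u'眼镜'] = df[u'眼镜'].replace([u'戴眼镜', u'不戴眼镜'], ['wear', 'not_wear'])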
df
恋爱次数 | 年级 | 性别 | 追过人数 | 被追人数 | 每周自习时间 | 每周娱乐时间 | 每周睡觉时间 | 每周运动时间 | 每月话费 | ... | 乐器 | 其他才艺 | 家乡 | 成绩水平 | 生活费_百元 | 寝室同学情况 | 身高 | 体重 | 眼镜 | 颜值
05juniorfemale2437245510451一线城市125116856.0not_wear4
14freshmanfemale156104091001一线城市3920115847.0wear9
23seniormale13141650201003三线城市8520118078.0not_wear8
32seniorfemale03101460201005农村3010116854.0wear5
41juniormale0321542201001一线城市5045118285.0not_wear10
51sophomoremale12278497384县级市3030117767.0wear7
61juniorfemale25323575505农村5010116157.0wear7
75seniorfemale08841636302二线城市4540117665.0wear5
84sophomoremale2327257611454县级市2621117577.0wear8
94seniorfemale1735225710365农村3515116553.0wear5
104seniorfemale310507449402二线城市5020116854.0wear6
114seniorfemale16204603494县级市6020116350.0not_wear6
124sophomorefemale043030490601一线城市3030116645.0wear10
134seniorfemale062919556401一线城市4314115946.0not_wear8
143seniormale3336572503三线城市2521117888.0wear10
153seniormale532019444792二线城市3625117290.0wear5
163seniorfemale16362060141003三线城市4232116048.0not_wear5
173seniormale471020567502二线城市7520117057.0not_wear5
183seniorfemale15027600504县级市1320115853.0not_wear8
193seniorfemale0524805602二线城市30100116050.0not_wear5
203seniorfemale111350604673三线城市4020115750.0wear6
213seniormale351532634712二线城市2114117968.0not_wear8
223sophomoremale62202060101002二线城市9125118393.0wear6
232seniorfemale041710587603三线城市1022116657.0wear7
242seniorfemale022094961003三线城市1520116057.5wear8
252juniormale24547018154县级市289118075.0wear4
262juniorfemale01030103510283三线城市3020117050.0wear5
272juniorfemale01239562381一线城市5015116848.0not_wear6
282juniormale1330195041002二线城市6016118075.0wear5
292juniormale21212218412二线城市3914117066.0wear7
2630freshmanfemale05435493312二线城市10030016350.0wear8
2640freshmanfemale0083503383三线城市10013016453.0wear10
2650freshmanfemale1172441383三线城市3115015952.5not_wear5
2660sophomorefemale031010600503三线城市015015348.0not_wear5
2670seniormale4320125016805农村607016555.0not_wear8
2680juniormale0031253814372二线城市710017360.0wear7
2690freshmanmale21206497394县级市4720015549.0not_wear7
2700freshmanmale0068522304县级市2015017552.0wear10
2710seniorfemale001420503403三线城市5015015650.0wear5
2720seniorfemale133417505635农村309015649.0not_wear10
2730freshmanmale011011505804县级市10015017160.0wear8
2740sophomoremale00363561215农村6610015549.0not_wear10
2750sophomorefemale0314145571004县级市80100016757.0not_wear8
2760seniorfemale1435175571003三线城市4725016148.0not_wear9
2770juniorfemale053021580311一线城市49016752.0wear5
2780seniorfemale109165701005农村10015015847.0wear2
2790sophomorefemale0572552502二线城市2010016853.0not_wear5
2800sophomoremale102020460362二线城市5020018161.0not_wear2
2810freshmanfemale05103568202二线城市134016545.0not_wear7
2820sophomorefemale1328555203三线城市5022017053.0not_wear5
2830sophomorefemale04201452201一线城市508016055.0not_wear8
2840juniorfemale603214623474县级市4023016265.0wear7
2850seniorfemale25144249121002二线城市29100016160.0wear7
2860seniormale1050164271002二线城市30290189101.0wear5
2870seniorfemale021015502232二线城市50100016550.5wear7
2880sophomoremale22301502603三线城市7318017668.0wear8
2890seniorfemale024017552503三线城市20100016049.0wear4
2900sophomoremale001020560502二线城市55100018490.0wear3
2910sophomoremale02514504582二线城市5012017564.0not_wear5
2920sophomorefemale002416516202二线城市40100016264.0wear5

293 rows × 32 columns
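
The same value replacement can also be written with one dictionary per column, which keeps each Chinese value next to its English label. This is a sketch under assumptions, not the original notebook code: it assumes Python 3 and a recent pandas, and the three-row `demo` frame and the `value_maps` name are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the survey table; only the column names and labels mirror the post.
demo = pd.DataFrame({
    "性别": ["女", "男", "女"],
    "眼镜": ["戴眼镜", "不戴眼镜", "戴眼镜"],
})

# One mapping per column: original value -> English label.
value_maps = {
    "性别": {"女": "female", "男": "male"},
    "眼镜": {"戴眼镜": "wear", "不戴眼镜": "not_wear"},
}

# Series.replace returns a new Series, so assign it back to the column.
for column, mapping in value_maps.items():
    demo[column] = demo[column].replace(mapping)

print(demo)
```

Values that are missing from a mapping are left unchanged by `replace`, so a forgotten category stays visible in the output instead of silently turning into NaN.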

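## 03 Descriptive analysis
1. Number of relationships
# count() gives the non-null count of every column per group; picking 年级 yields the number of students for each value of 恋爱次数
grouped = df.groupby([u'恋爱次数']).count()[u'年级']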
grouped
恋爱次数
0     82
1    110
2     54
3     27
4     18
5      2
Name: 年级, dtype: int64
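
The `groupby(...).count()[...]` pattern used above is one of several ways to get a frequency table. The sketch below, assuming Python 3 and a recent pandas, compares it with `size()` and `value_counts()` on a made-up six-row frame (the rows are hypothetical; only the column names come from the data).

```python
import pandas as pd

# Hypothetical rows for illustration only.
demo = pd.DataFrame({
    "恋爱次数": [0, 1, 1, 2, 0, 1],
    "年级": ["大一", "大二", "大三", "大四", "大一", "大二"],
})

# count() counts non-null cells per column within each group; 年级 is just one column to read from.
by_count = demo.groupby("恋爱次数").count()["年级"]

# size() counts rows per group and does not depend on any other column.
by_size = demo.groupby("恋爱次数").size()

# value_counts() sorts by frequency, so sort_index() restores the 0, 1, 2 order.
by_value_counts = demo["恋爱次数"].value_counts().sort_index()

print(by_count.tolist(), by_size.tolist(), by_value_counts.tolist())
```

The three results agree here because 年级 has no missing values; if it did, `count()` would come out lower than `size()` for the affected groups.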

print('Mean number of relationships: %s' % df[u'恋爱次数'].mean())
print('Median number of relationships: %s' % df[u'恋爱次数'].median())
grouped.plot(kind = 'bar', color = 'gray', align = 'center')
plt.xlabel('Times')
plt.ylabel('counts(person)')
Mean number of relationships: 1.30034129693
Median number of relationships: 1.0
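
The mean and median printed above can be cross-checked from the frequency table alone. The sketch below, assuming Python 3 and NumPy, rebuilds the 293 individual answers from the counts shown for 恋爱次数 and recomputes both statistics; the variable names are illustrative.

```python
import numpy as np

# Frequency table copied from the groupby output above: number of relationships -> number of students.
counts = {0: 82, 1: 110, 2: 54, 3: 27, 4: 18, 5: 2}

# Expand the table back into one value per student.
values = np.repeat(list(counts.keys()), list(counts.values()))

mean = values.mean()        # 381 relationships in total over 293 students
median = np.median(values)  # the 147th of the 293 sorted values

print(len(values), mean, median)
```

The mean sits above the median because a small number of students report four or five relationships, which pulls the average up while the middle value stays at 1.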
grouped1 = df.groupby([u'被追人数']).count()[u'年级']
grouped1
被追人数
0     75
1     47
2     33
3     44
4     17
5     31
6     13
7      5
8      6
9      3
10    19
Name: 年级, dtype: int64
print('Mean number of pursuers: %s' % df[u'被追人数'].mean())
print('Median number of pursuers: %s' % df[u'被追人数'].median())
grouped1.plot(kind = 'bar', color = 'gray', align = 'center')
plt.xlabel('Times')
plt.ylabel('counts(person)')
Mean number of pursuers: 2.88737201365
Median number of pursuers: 2.0
grouped2 = df.groupby([u'追过人数']).count()[u'年级']
grouped2
追过人数
0     134
1      82
2      44
3      18
4       6
5       3
6       4
7       1
10      1
Name: 年级, dtype: int64
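print('Mean number of people pursued: %s' % df[u'追过人数'].mean())
print('Median number of people pursued: %s' % df[u'追过人数'].median())
# Plot grouped2 (追过人数), not grouped1, which holds the 被追人数 counts
grouped2.plot(kind = 'bar', color = 'gray', align = 'center')
plt.xlabel('Times')
plt.ylabel('counts(person)')
Mean number of people pursued: 1.03754266212
Median number of people pursued: 1.0

## Exploratory analysis (1)
1. Number of relationships and gender

First, count the male/female split.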
grouped0 = df.groupby([u'性别']).count()[u'年级']
grouped0
性别
female    149
male      144
Name: 年级, dtype: int64
grouped0.plot(kind = 'bar', color = 'r')
plt.ylabel('numbers')
grouped3 = df.groupby([u'恋爱次数', u'性别']).count()[u'年级']
grouped3
恋爱次数  性别
0     female    38
      male      44
1     female    54
      male      56
2     female    29
      male      25
3     female    11
      male      16
4     female    15
      male       3
5     female     2
Name: 年级, dtype: int64
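# grouped3 is a single Series indexed by (恋爱次数, 性别), so plotting it directly cannot stack or color by gender.
# Unstacking 性别 into columns gives one stacked, separately colored series per gender.
grouped3.unstack(u'性别').fillna(0).plot(kind = 'bar', stacked = True, color = ['g', 'b'])
plt.xlabel('Times')
plt.ylabel('counts(person)')
plt.legend()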
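
For a stacked bar chart split by gender, pandas needs one column per gender rather than a two-level index. The sketch below, assuming Python 3 and a recent pandas, rebuilds the 恋爱次数 × 性别 counts printed above as a Series and reshapes them with `unstack`; the plotting call is left as a comment so the snippet runs without a display, and the name `table` is illustrative.

```python
import pandas as pd

# Counts copied from the groupby output above; (5, male) does not occur in the data.
index = pd.MultiIndex.from_tuples(
    [(0, "female"), (0, "male"), (1, "female"), (1, "male"),
     (2, "female"), (2, "male"), (3, "female"), (3, "male"),
     (4, "female"), (4, "male"), (5, "female")],
    names=["恋爱次数", "性别"],
)
grouped3 = pd.Series([38, 44, 54, 56, 29, 25, 11, 16, 15, 3, 2], index=index, name="年级")

# Move the 性别 level into columns; the missing (5, male) cell becomes NaN, so fill it with 0.
table = grouped3.unstack("性别").fillna(0).astype(int)
print(table)

# With matplotlib available, each column becomes one colored layer of the stack:
# table.plot(kind="bar", stacked=True, color=["g", "b"])
```

Each row of `table` is one bar and each column one layer, so the legend entries come from the column names and no manual `label` is needed.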
grouped4 = df.groupby([u'被追人数', u'性别']).count()[u'年级']
grouped4
被追人数  性别
0     female    25
      male      50
1     female    12
      male      35
2     female    15
      male      18
3     female    20
      male      24
4     female    12
      male       5
5     female    27
      male       4
6     female    11
      male       2
7     female     4
      male       1
8     female     5
      male       1
9     female     3
10    female    15
      male       4
Name: 年级, dtype: int64
# Unstack 性别 into columns so that each gender is drawn as its own colored, stacked series
grouped4.unstack(u'性别').fillna(0).plot(kind="bar", stacked=True, color=['r', 'g'])
plt.xlabel('Times')
plt.ylabel('counts(person)')
plt.legend()
grouped5=df.groupby([u'追过人数',u'性别']).count()[u"年级"]
grouped5
追过人数  性别
0     female    92
      male      42
1     female    37
      male      45
2     female    16
      male      28
3     female     2
      male      16
4     female     1
      male       5
5     male       3
6     female     1
      male       3
7     male       1
10    male       1
Name: 年级, dtype: int64
# Unstack 性别 into columns so that each gender is drawn as its own colored, stacked series
grouped5.unstack(u'性别').fillna(0).plot(kind="bar", stacked=True, color=['r', 'g'])
plt.xlabel('Times')
plt.ylabel('counts(person)')
plt.legend()
grouped5=df.groupby([u'被追人数',u'颜值']).count()[u"年级"]
grouped5
被追人数  颜值
0     0      8
      1      3
      2      6
      3      4
      4      3
      5     20
      6      8
      7      7
      8      4
      9      4
      10     8
1     2      1
      4      7
      5     16
      6      6
      7      9
      8      4
      9      3
      10     1
2     2      1
      4      4
      5      7
      6      6
      7      5
      8      5
      9      4
      10     1
3     3      1
      4      2
      5     12
            ..
5     3      1
      4      2
      5      5
      6      4
      7      8
      8      5
      9      2
      10     3
6     0      1
      5      1
      6      1
      8      7
      9      3
7     5      3
      6      1
      7      1
8     5      4
      6      1
      9      1
9     7      2
      10     1
10    1      1
      3      1
      4      1
      5      3
      6      1
      7      2
      8      1
      9      5
      10     4
Name: 年级, Length: 74, dtype: int64
# One bar per (被追人数, 颜值) pair; the index has no gender level here, so the 'female' label and the legend are dropped
grouped5.plot(kind="bar", color=['r', 'g'])
plt.xlabel('Times&appearance')
plt.ylabel('counts(person)')
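## 04 Exploratory analysis (2)
How do the students who have never pursued anyone and have never been pursued differ from the others?
zero=df[df[u"恋爱次数"]==0] # select the students whose number of relationships is 0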
zero.head(5)
恋爱次数 | 年级 | 性别 | 追过人数 | 被追人数 | 每周自习时间 | 每周娱乐时间 | 每周睡觉时间 | 每周运动时间 | 每月话费 | ... | 乐器 | 其他才艺 | 家乡 | 成绩水平 | 生活费_百元 | 寝室同学情况 | 身高 | 体重 | 眼镜 | 颜值
760seniormale005045700703三线城市3013117059.0wear0
770juniormale0054563503三线城市107117170.0wear0
780juniorfemale0064601303三线城市6012116061.0wear5
790seniorfemale3216115610381一线城市318116250.0wear6
800seniormale003525425405农村5510117572.0wear5

5 rows × 32 columns
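
The filter above selects on 恋爱次数, while the section's question is about students who never pursued anyone and were never pursued. Selecting on those two columns could look like the sketch below; it assumes Python 3 and a recent pandas, and the four rows are hypothetical, with only the column names taken from the data.

```python
import pandas as pd

# Hypothetical rows for illustration only.
demo = pd.DataFrame({
    "恋爱次数": [0, 0, 2, 0],
    "追过人数": [0, 1, 0, 0],
    "被追人数": [0, 0, 3, 2],
})

# Build one boolean mask per condition, then combine them element-wise with &.
never_pursued = demo["追过人数"] == 0
never_been_pursued = demo["被追人数"] == 0
neither = demo[never_pursued & never_been_pursued]

print(neither)
```

Masks are combined with `&` rather than `and`; when the comparisons are written inline, each one needs its own parentheses, as in `demo[(demo["追过人数"] == 0) & (demo["被追人数"] == 0)]`.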

1. Dating experience and gender ratio

First, count the gender split among the students with no dating experience.

grouped7=zero.groupby([u"性别"]).count()[u"年级"]
grouped7.plot(kind="bar",color="gray")
plt.xlabel('sex')
plt.ylabel('counts(people)')
grouped8 = zero.groupby([u'年级']).count()[u'性别']
grouped8
年级
freshman     19
junior       12
senior       33
sophomore    18
Name: 性别, dtype: int64
grouped8.plot(kind = 'bar', color = 'gray')
plt.xlabel('grade')
plt.ylabel('counts(people)')
### Male height statistics
grouped9 = df[df[u"性别"] == "male"]
grouped9
恋爱次数 | 年级 | 性别 | 追过人数 | 被追人数 | 每周自习时间 | 每周娱乐时间 | 每周睡觉时间 | 每周运动时间 | 每月话费 | ... | 乐器 | 其他才艺 | 家乡 | 成绩水平 | 生活费_百元 | 寝室同学情况 | 身高 | 体重 | 眼镜 | 颜值
23seniormale13141650201003三线城市8520118078.0not_wear8
41juniormale0321542201001一线城市5045118285.0not_wear10
51sophomoremale12278497384县级市3030117767.0wear7
84sophomoremale2327257611454县级市2621117577.0wear8
143seniormale3336572503三线城市2521117888.0wear10
153seniormale532019444792二线城市3625117290.0wear5
173seniormale471020567502二线城市7520117057.0not_wear5
213seniormale351532634712二线城市2114117968.0not_wear8
223sophomoremale62202060101002二线城市9125118393.0wear6
252juniormale24547018154县级市289118075.0wear4
282juniormale1330195041002二线城市6016118075.0wear5
292juniormale21212218412二线城市3914117066.0wear7
312sophomoremale203525512403三线城市207117462.0not_wear4
332juniormale18181856121004县级市5220117566.0wear9
342seniormale117245161001一线城市5127116962.0wear4
352seniormale3303565602二线城市40100118073.0wear10
362seniormale22202491415农村519117258.0wear7
382seniormale208305011003三线城市1822118368.0wear9
441seniormale26505547542二线城市214118780.0wear8
451juniormale21165634754县级市5824116851.0wear4
481freshmanmale1021594254县级市5016117157.0wear5
491juniormale122511497494县级市708117465.0not_wear7
521freshmanmale111812535493三线城市4814118061.0wear6
551juniormale14501401453三线城市215118472.0wear4
571juniormale1120235010402二线城市4010118362.0not_wear7
601freshmanmale1020104061003三线城市8010118290.0not_wear8
621juniormale01288538503三线城市6219116858.0wear7
631seniormale33930566255农村1009117675.0wear3
681sophomoremale12922704512二线城市6340117052.0wear9
701sophomoremale13406423503三线城市4520118380.0wear6
2303juniormale61108636301一线城市10020017555.0wear5
2323seniormale10114050101002二线城市5060017364.0wear5
2333seniormale135040540622二线城市111018572.0wear10
2342juniormale2024366414403三线城市5012018462.0wear6
2362freshmanmale1133105411403三线城市3415017488.0wear4
2381sophomoremale0221144010202二线城市1015018565.0wear5
2411freshmanmale03618437202二线城市7410018172.0wear10
2421freshmanmale0114175518501一线城市10020018671.0wear5
2431seniormale20015556402二线城市116018585.0wear4
2441freshmanmale222031401301一线城市9824018065.0wear6
2451seniormale2115495910353三线城市5810018186.0wear9
2461juniormale13107561363三线城市5020017065.0wear5
2471juniormale341133704164县级市10025017562.0wear10
2491sophomoremale102031561504县级市5020018160.0wear10
2551seniormale10502560303三线城市5512017679.0wear2
2571seniormale11728421303三线城市3010018073.0wear6
2581seniormale121050550302二线城市8015017985.0wear5
2590freshmanmale0120165061002二线城市5015017961.0not_wear8
2620freshmanmale002027548363三线城市1006017259.0wear0
2670seniormale4320125016805农村607016555.0not_wear8
2680juniormale0031253814372二线城市710017360.0wear7
2690freshmanmale21206497394县级市4720015549.0not_wear7
2700freshmanmale0068522304县级市2015017552.0wear10
2730freshmanmale011011505804县级市10015017160.0wear8
2740sophomoremale00363561215农村6610015549.0not_wear10
2800sophomoremale102020460362二线城市5020018161.0not_wear2
2860seniormale1050164271002二线城市30290189101.0wear5
2880sophomoremale22301502603三线城市7318017668.0wear8
2900sophomoremale001020560502二线城市55100018490.0wear3
2910sophomoremale02514504582二线城市5012017564.0not_wear5

144 rows × 32 columns

age_train_p=grouped9[u"身高"]
age_train_p
age_train_p[age_train_p < 170]
34     169
45     168
62     168
81     168
99     169
121    130
122    168
138    163
173    169
202    164
213    167
267    165
269    155
274    155
Name: 身高, dtype: int64
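ages=np.arange(150,200,5) # height bin edges in cm: 150, 155, ..., 195, i.e. 5 cm per bin (the original note gives 153 as the smallest and 192 as the largest height)

age_cut=pd.cut(age_train_p,ages) # values to bin, bin edges; heights outside (150, 195], such as the 130 cm entry, get no bin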

age_cut_grouped=age_train_p.groupby(age_cut).count()

age_cut_grouped.plot(kind="bar",color="gray")

plt.xlabel('height')

plt.ylabel('counts(person)')
<matplotlib.text.Text at 0xbbf8c88>

[Figure: bar chart of male height counts per 5 cm bin]
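
How `pd.cut` treats the bin edges matters for values such as the 130 cm entry. The sketch below, assuming Python 3 and a recent pandas, uses the same edges as above on a handful of heights; 130, 155, 169 and 192 are taken from the outputs in this post, while 150 and 170 are added to show the edge behavior.

```python
import numpy as np
import pandas as pd

heights = pd.Series([130, 150, 155, 169, 170, 192])
edges = np.arange(150, 200, 5)  # 150, 155, ..., 195

# Bins are right-closed by default: (150, 155], (155, 160], ...
# A value equal to the lowest edge (150) or below it (130) gets no bin and becomes NaN.
binned = pd.cut(heights, edges)

# Count the heights per bin; observed=False keeps the empty bins in the result.
counts = heights.groupby(binned, observed=False).count()

print(binned)
print(counts)
```

Passing `include_lowest=True` to `pd.cut` would put 150 into the first bin; values such as 130 still need a wider range of edges or separate handling.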

df.iloc[121]
恋爱次数          2
年级           大三
性别            男
追过人数          0
被追人数         10
每周自习时间        0
每周娱乐时间        4
每周睡觉时间       50
每周运动时间        5
每月话费        100
学生组织个数        1
班干部           是
党员            否
足球            否
篮球            否
乒乓球           否
羽毛球           否
跑步            否
台球            否
唱歌            是
主持            是
舞蹈            否
乐器            是
其他才艺          否
家乡        2二线城市
成绩水平         20
生活费_百元      100
寝室同学情况        0
身高          130
体重           30
眼镜         不戴眼镜
颜值           10
Name: 121, dtype: object

Group by height

grouped11 = df[u'性别'].groupby(df[u'身高'])
grouped11.count()
身高
130     2
150     1
153     1
154     4
155     4
156     9
157     3
158     9
159     3
160    18
161    10
162    11
163    13
164     6
165    12
166     7
167     5
168    24
169     4
170    19
171     4
172    15
173     8
174     6
175    20
176     6
177     6
178     7
179    10
180     9
181    10
182     4
183    11
184     4
185     3
186     1
187     2
189     1
192     1
Name: 性别, dtype: int64

Group by gender and compute height statistics for each gender.

Available methods include:
mean: average
max: maximum
min: minimum
count: number of values

grouped11 = df[u'身高'].groupby(df[u'性别'])
grouped11.mean()
性别
女    163.006711
男    175.861111
Name: 身高, dtype: float64
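
The methods listed above can also be requested in a single call with `agg`. The sketch below assumes Python 3 and a recent pandas; the five heights are made up for illustration and only the column names come from the data.

```python
import pandas as pd

# Hypothetical rows for illustration only.
demo = pd.DataFrame({
    "性别": ["female", "male", "female", "male", "female"],
    "身高": [160, 180, 168, 175, 156],
})

# Passing a list of function names returns one column per statistic and one row per gender.
stats = demo["身高"].groupby(demo["性别"]).agg(["mean", "max", "min", "count"])

print(stats)
```

Looking at `count`, `min` and `max` next to the mean makes it easier to spot groups whose average is distorted by an implausible value, such as the 130 cm entry seen earlier.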

Ouch, that one stings, bro.
