WPBC Breast Cancer Prognosis Dataset (I): Data Analysis Source Code Roundup

Introduction

 Using Jupyter Notebook, I ran some descriptive statistical analyses on the WPBC prognosis dataset. The source code follows; corrections from readers are welcome.

# Imports
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
data = pd.read_csv('data_2.csv')
data.head()
       id  outcome  time  radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean  compactness_mean  concavity_mean  ...
0  119513        N    31        18.02     27.600000          117.50     1013.0         0.094890          0.103600          0.1086  ...
1    8423        N    61        17.99     22.300979          122.80     1001.0         0.118400          0.142642          0.3001  ...
2  842517        N   116        21.37     17.440000          137.50     1373.0         0.088360          0.118900          0.1255  ...
3  843483        N   123        11.42     20.380000           77.58      386.1         0.102774          0.142642          0.2414  ...
4  843584        R    27        20.29     14.340000          135.10     1297.0         0.100300          0.132800          0.1980  ...

   ...  perimeter_worst  area_worst  smoothness_worst  compactness_worst  concavity_worst  concave points_worst  symmetry_worst  fractal_dimension_worst  diameter of the excised tumor in centimeters  Lymph node status
0  ...           139.70      1436.0          0.119500           0.192600           0.3140                0.1170        0.267700                  0.08113                                           5.0                  5
1  ...           184.60      2019.0          0.162200           0.665600           0.7119                0.2654        0.460100                  0.11890                                           3.0                  2
2  ...           159.10      1949.0          0.118800           0.344900           0.3414                0.2032        0.433400                  0.09067                                           2.5                  0
3  ...            98.87       567.7          0.143921           0.364567           0.6869                0.2575        0.322251                  0.17300                                           2.0                  0
4  ...           152.20      1575.0          0.137400           0.205000           0.4000                0.1625        0.236400                  0.07678                                           3.5                  0

5 rows × 35 columns

# Inspect the column labels
col = data.columns
print(col)
Index(['id', 'outcome', 'time', 'radius_mean', 'texture_mean',
       'perimeter_mean', 'area_mean', 'smoothness_mean', 'compactness_mean',
       'concavity_mean', 'concave points_mean', 'symmetry_mean',
       'fractal_dimension_mean', 'radius_se', 'texture_se', 'perimeter_se',
       'area_se', 'smoothness_se', 'compactness_se', 'concavity_se',
       'concave points_se', 'symmetry_se', 'fractal_dimension_se',
       'radius_worst', 'texture_worst', 'perimeter_worst', 'area_worst',
       'smoothness_worst', 'compactness_worst', 'concavity_worst',
       'concave points_worst', 'symmetry_worst', 'fractal_dimension_worst',
       'diameter of the excised tumor in centimeters', 'Lymph node status'],
      dtype='object')
# Separate the target (outcome) from the features to analyze
y = data.outcome
x = data.drop(['id', 'outcome', 'time'], axis=1)
x.head()
   radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean  compactness_mean  concavity_mean  concave points_mean  symmetry_mean  fractal_dimension_mean  ...
0        18.02     27.600000          117.50     1013.0         0.094890          0.103600          0.1086              0.07055         0.1865                0.063330  ...
1        17.99     22.300979          122.80     1001.0         0.118400          0.142642          0.3001              0.14710         0.2419                0.078710  ...
2        21.37     17.440000          137.50     1373.0         0.088360          0.118900          0.1255              0.08180         0.2333                0.060100  ...
3        11.42     20.380000           77.58      386.1         0.102774          0.142642          0.2414              0.10520         0.2597                0.062743  ...
4        20.29     14.340000          135.10     1297.0         0.100300          0.132800          0.1980              0.10430         0.1809                0.058830  ...

   ...  perimeter_worst  area_worst  smoothness_worst  compactness_worst  concavity_worst  concave points_worst  symmetry_worst  fractal_dimension_worst  diameter of the excised tumor in centimeters  Lymph node status
0  ...           139.70      1436.0          0.119500           0.192600           0.3140                0.1170        0.267700                  0.08113                                           5.0                  5
1  ...           184.60      2019.0          0.162200           0.665600           0.7119                0.2654        0.460100                  0.11890                                           3.0                  2
2  ...           159.10      1949.0          0.118800           0.344900           0.3414                0.2032        0.433400                  0.09067                                           2.5                  0
3  ...            98.87       567.7          0.143921           0.364567           0.6869                0.2575        0.322251                  0.17300                                           2.0                  0
4  ...           152.20      1575.0          0.137400           0.205000           0.4000                0.1625        0.236400                  0.07678                                           3.5                  0

5 rows × 32 columns

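# Descriptive statistics: count the samples in each outcome class
ax = sns.countplot(x=y)  # pass the Series by keyword; newer seaborn releases treat the first positional argument as `data`
counts = y.value_counts()
N, R = counts['N'], counts['R']  # look up by label instead of relying on the sort order
print('Number of recur:', R)
print('Number of nonrecur:', N)
Number of recur: 46
Number of nonrecur: 148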

[Figure output_4_1.png: count plot of the outcome classes; image not available]
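
The counts printed above show that the two classes are unbalanced. As a minimal sketch, assuming only those printed counts (148 non-recurrent, 46 recurrent), the class shares can be obtained with `value_counts(normalize=True)`. The Series `y_demo` is a stand-in rebuilt from the counts, not the real `outcome` column.

```python
import pandas as pd

# Stand-in for the outcome column, rebuilt from the printed counts
# (assumption: the column contains no labels other than 'N' and 'R')
y_demo = pd.Series(['N'] * 148 + ['R'] * 46, name='outcome')

# normalize=True returns fractions of the total instead of raw counts
shares = y_demo.value_counts(normalize=True)
print(shares.round(3))
```

A little under a quarter of the cases are recurrent, which is worth keeping in mind when reading the split violin plots further down.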

# Numerical descriptive statistics
des = x.describe()
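
Note that `des` is assigned but never displayed. Here is a small sketch of how the summary could be inspected, using only the `radius_mean` and `texture_mean` values of the five rows shown by `x.head()` above, so the numbers illustrate the call and do not describe the full dataset. The names `x_demo` and `des_demo` are hypothetical.

```python
import pandas as pd

# First five rows of two features, copied from the x.head() output
x_demo = pd.DataFrame({
    'radius_mean': [18.02, 17.99, 21.37, 11.42, 20.29],
    'texture_mean': [27.600000, 22.300979, 17.440000, 20.380000, 14.340000],
})

# Transposing puts one feature per row, which is easier to scan with 32 features
des_demo = x_demo.describe().T
print(des_demo[['mean', 'std', 'min', 'max']])
```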
# Plot a correlation heatmap

# Columns 0-2 of data are id, outcome and time, so the feature groups start at index 3
feature_mean = list(data.columns[3:13])
feature_se = list(data.columns[13:23])
feature_worst = list(data.columns[23:33])
feature_other = list(data.columns[33:35])
corr = data[feature_mean].corr()
plt.figure(figsize=(14, 14))
sns.heatmap(corr, annot=True)
<matplotlib.axes._subplots.AxesSubplot at 0x26a5d0e56d8>

[Figure output_6_1.png: correlation heatmap; image not available]
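
With many features, an annotated heatmap gets hard to read. One possible complement, sketched here under assumptions, is to list the feature pairs whose absolute correlation exceeds a threshold. The helper name `top_correlated_pairs`, the 0.9 threshold, and the toy values are all illustrative and are not part of the original analysis.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(df, threshold=0.9):
    """Return feature pairs with |Pearson r| >= threshold, strongest first."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle: each pair appears once, diagonal is dropped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs[pairs >= threshold].sort_values(ascending=False)

# Toy data (made up): perimeter is roughly 2*pi*radius, texture is unrelated
toy = pd.DataFrame({
    'radius_mean': [1.0, 2.0, 3.0, 4.0, 5.0],
    'perimeter_mean': [6.3, 12.5, 18.9, 25.1, 31.4],
    'texture_mean': [5.0, 1.0, 4.0, 2.0, 3.0],
})
pairs = top_correlated_pairs(toy)
print(pairs)
```

On the real data the same call could be applied to `data[feature_mean]` or to the full feature frame `x`.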

# Data visualization
# Comparative plots: violin plots of the first ten features (the *_mean group), split by outcome
data_dia = y
data = x
data_n_2 = (data - data.mean()) / data.std()  # standardize each feature (z-score)
data = pd.concat([y, data_n_2.iloc[:, 0:10]], axis=1)
data = pd.melt(data, id_vars='outcome',
               var_name='features',
               value_name='value')
plt.figure(figsize=(10, 10))
sns.violinplot(x='features', y='value', hue='outcome', data=data, split=True, inner='quart')
plt.xticks(rotation=90)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_7_1.png: split violin plots of the standardized *_mean features by outcome; image not available]

# Draw box plots of the same features
plt.figure(figsize=(10,10))
sns.boxplot(x='features',y='value',hue='outcome',data=data)
plt.xticks(rotation=90)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_8_1.png: box plots of the standardized *_mean features by outcome; image not available]

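# Same plots for the next ten features (the *_se group)
data_dia = y
data = x
data_n_2 = (data - data.mean()) / data.std()  # standardize each feature (z-score)
data = pd.concat([y, data_n_2.iloc[:, 10:20]], axis=1)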
data=pd.melt(data,id_vars='outcome',
            var_name='features',
            value_name='value')
plt.figure(figsize=(10,10))
sns.violinplot(x='features',y='value',hue='outcome',data=data,split=True,inner='quart')
plt.xticks(rotation=90)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_9_1.png: split violin plots of the standardized *_se features by outcome; image not available]

# Draw box plots of the same features
plt.figure(figsize=(10,10))
sns.boxplot(x='features',y='value',hue='outcome',data=data)
plt.xticks(rotation=90)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_10_1.png: box plots of the standardized *_se features by outcome; image not available]

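# Same plots for the next ten features (the *_worst group)
data_dia = y
data = x
data_n_2 = (data - data.mean()) / data.std()  # standardize each feature (z-score)
data = pd.concat([y, data_n_2.iloc[:, 20:30]], axis=1)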
data=pd.melt(data,id_vars='outcome',
            var_name='features',
            value_name='value')
plt.figure(figsize=(10,10))
sns.violinplot(x='features',y='value',hue='outcome',data=data,split=True,inner='quart')
plt.xticks(rotation=90)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_11_1.png: split violin plots of the standardized *_worst features by outcome; image not available]

# Draw box plots of the same features
plt.figure(figsize=(10,10))
sns.boxplot(x='features',y='value',hue='outcome',data=data)
plt.xticks(rotation=90)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_12_1.png: box plots of the standardized *_worst features by outcome; image not available]

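# Same plots for the two remaining features (tumor diameter and lymph node status)
data_dia = y
data = x
data_n_2 = (data - data.mean()) / data.std()  # standardize each feature (z-score)
data = pd.concat([y, data_n_2.iloc[:, 30:32]], axis=1)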
data=pd.melt(data,id_vars='outcome',
            var_name='features',
            value_name='value')
plt.figure(figsize=(10,10))
sns.violinplot(x='features',y='value',hue='outcome',data=data,split=True,inner='quart')
plt.xticks(rotation=90)
(array([0, 1]), <a list of 2 Text xticklabel objects>)

[Figure output_13_1.png: split violin plots of standardized tumor diameter and lymph node status by outcome; image not available]

# Draw box plots of the same features
plt.figure(figsize=(10,10))
sns.boxplot(x='features',y='value',hue='outcome',data=data)
plt.xticks(rotation=90)
(array([0, 1]), <a list of 2 Text xticklabel objects>)

[Figure output_14_1.png: box plots of standardized tumor diameter and lymph node status by outcome; image not available]
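
The four plotting blocks above repeat the same standardize, concat, and melt steps with a different column slice each time. A hedged sketch of how that preparation could be factored into one helper follows. The name `standardized_long` and the toy frame are hypothetical; the existing `sns.violinplot` and `sns.boxplot` calls would then take the returned frame as `data`.

```python
import pandas as pd

def standardized_long(x, y, start, stop):
    """Z-score every column of x, keep columns start:stop, and melt to long format."""
    z = (x - x.mean()) / x.std()  # same standardization as above (sample std, ddof=1)
    wide = pd.concat([y, z.iloc[:, start:stop]], axis=1)
    return pd.melt(wide, id_vars=y.name, var_name='features', value_name='value')

# Toy stand-ins (made-up values) for the feature frame x and the outcome Series y
x_toy = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0],
                      'b': [10.0, 20.0, 20.0, 30.0],
                      'c': [0.5, 0.1, 0.9, 0.3]})
y_toy = pd.Series(['N', 'R', 'N', 'N'], name='outcome')

long_df = standardized_long(x_toy, y_toy, 0, 2)
print(long_df)
```

For example, `standardized_long(x, y, 10, 20)` would rebuild the frame used for the second group of plots.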


Author: 胸中有数-数分版