WPBC Breast Cancer Prognosis Dataset (I): Summary of Data Analysis Source Code

胸中有数-数分版

Tags: python, data analysis, data visualization, WPBC dataset, breast cancer prognosis

Article link: https://blog.csdn.net/Zengmeng1998/article/details/106338615

Introduction

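I used Jupyter Notebook to run some descriptive statistical analysis on the WPBC (Wisconsin Prognostic Breast Cancer) dataset. The source code follows; corrections from readers are welcome.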

# Import conventions
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load the data
data=pd.read_csv('data_2.csv')
data.head()

	id	outcome	time	radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	...	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave points_worst	symmetry_worst	fractal_dimension_worst	diameter of the excised tumor in centimeters	Lymph node status
0	119513	N	31	18.02	27.600000	117.50	1013.0	0.094890	0.103600	0.1086	...	139.70	1436.0	0.119500	0.192600	0.3140	0.1170	0.267700	0.08113	5.0	5
1	8423	N	61	17.99	22.300979	122.80	1001.0	0.118400	0.142642	0.3001	...	184.60	2019.0	0.162200	0.665600	0.7119	0.2654	0.460100	0.11890	3.0	2
2	842517	N	116	21.37	17.440000	137.50	1373.0	0.088360	0.118900	0.1255	...	159.10	1949.0	0.118800	0.344900	0.3414	0.2032	0.433400	0.09067	2.5	0
3	843483	N	123	11.42	20.380000	77.58	386.1	0.102774	0.142642	0.2414	...	98.87	567.7	0.143921	0.364567	0.6869	0.2575	0.322251	0.17300	2.0	0
4	843584	R	27	20.29	14.340000	135.10	1297.0	0.100300	0.132800	0.1980	...	152.20	1575.0	0.137400	0.205000	0.4000	0.1625	0.236400	0.07678	3.5	0

5 rows × 35 columns

# Inspect the column labels
col=data.columns
print(col)

Index(['id', 'outcome', 'time', 'radius_mean', 'texture_mean',
       'perimeter_mean', 'area_mean', 'smoothness_mean', 'compactness_mean',
       'concavity_mean', 'concave points_mean', 'symmetry_mean',
       'fractal_dimension_mean', 'radius_se', 'texture_se', 'perimeter_se',
       'area_se', 'smoothness_se', 'compactness_se', 'concavity_se',
       'concave points_se', 'symmetry_se', 'fractal_dimension_se',
       'radius_worst', 'texture_worst', 'perimeter_worst', 'area_worst',
       'smoothness_worst', 'compactness_worst', 'concavity_worst',
       'concave points_worst', 'symmetry_worst', 'fractal_dimension_worst',
       'diameter of the excised tumor in centimeters', 'Lymph node status'],
      dtype='object')

# Separate the target (outcome) from the features; id and time are not used as features
y=data.outcome
x=data.drop(['id','outcome','time'],axis = 1)
x.head()

	radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	concave points_mean	symmetry_mean	fractal_dimension_mean	...	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave points_worst	symmetry_worst	fractal_dimension_worst	diameter of the excised tumor in centimeters	Lymph node status
0	18.02	27.600000	117.50	1013.0	0.094890	0.103600	0.1086	0.07055	0.1865	0.063330	...	139.70	1436.0	0.119500	0.192600	0.3140	0.1170	0.267700	0.08113	5.0	5
1	17.99	22.300979	122.80	1001.0	0.118400	0.142642	0.3001	0.14710	0.2419	0.078710	...	184.60	2019.0	0.162200	0.665600	0.7119	0.2654	0.460100	0.11890	3.0	2
2	21.37	17.440000	137.50	1373.0	0.088360	0.118900	0.1255	0.08180	0.2333	0.060100	...	159.10	1949.0	0.118800	0.344900	0.3414	0.2032	0.433400	0.09067	2.5	0
3	11.42	20.380000	77.58	386.1	0.102774	0.142642	0.2414	0.10520	0.2597	0.062743	...	98.87	567.7	0.143921	0.364567	0.6869	0.2575	0.322251	0.17300	2.0	0
4	20.29	14.340000	135.10	1297.0	0.100300	0.132800	0.1980	0.10430	0.1809	0.058830	...	152.20	1575.0	0.137400	0.205000	0.4000	0.1625	0.236400	0.07678	3.5	0

5 rows × 32 columns

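# Descriptive statistics: class counts for the outcome
ax=sns.countplot(x=y)  # pass the Series by keyword; recent seaborn versions reject it positionally
counts=y.value_counts()
N,R=counts['N'],counts['R']  # look up by label instead of relying on the sort order
print('Number of recur:',R)
print('Number of nonrecur:',N)

Number of recur: 46
Number of nonrecur: 148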

[Figure output_4_1.png: count plot of the outcome classes]
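
The counts above point to an imbalanced outcome. As a minimal sketch, the class shares can be read off with `value_counts(normalize=True)`. The Series `y_demo` below is a hypothetical stand-in rebuilt from the printed counts, not the actual `data_2.csv`:

```python
import pandas as pd

# Hypothetical stand-in for data.outcome, rebuilt from the printed counts.
y_demo = pd.Series(["N"] * 148 + ["R"] * 46, name="outcome")

# Relative frequency of each class; indexing by label avoids relying on sort order.
proportions = y_demo.value_counts(normalize=True)
recur_share = proportions["R"]
nonrecur_share = proportions["N"]

print(f"recur: {recur_share:.3f}, nonrecur: {nonrecur_share:.3f}")
```

Roughly one case in four is a recurrence, which is worth keeping in mind when reading the split violin plots later on, since the two halves are drawn from groups of very different size.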

# Numerical descriptive statistics of the features
des=x.describe()
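
`describe()` summarizes all cases together. A possible extension, sketched here on a hypothetical toy frame (the column names mirror the dataset, the values are made up), is to compare the summary per outcome class with `groupby`:

```python
import pandas as pd

# Hypothetical toy frame; values are invented for illustration only.
demo = pd.DataFrame({
    "outcome": ["N", "N", "N", "R", "R"],
    "radius_mean": [18.0, 12.0, 15.0, 20.0, 22.0],
    "texture_mean": [27.0, 20.0, 22.0, 14.0, 18.0],
})

# Overall summary, transposed so that each feature is one row.
overall = demo.drop(columns="outcome").describe().T

# Per-class mean and standard deviation of every feature.
by_class = demo.groupby("outcome").agg(["mean", "std"])

print(overall[["mean", "std", "min", "max"]])
print(by_class)
```

With the real data the same two calls would apply to `x` and to `pd.concat([y, x], axis=1)` respectively.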

# Plot the correlation heatmap

# Columns 0-2 are id, outcome and time, so the feature groups start at index 3
feature_mean=list(data.columns[3:13])
feature_se=list(data.columns[13:23])
feature_worst=list(data.columns[23:33])
feature_other=list(data.columns[33:35])
corr=data[feature_mean].corr()
plt.figure(figsize=(14,14))
sns.heatmap(corr,annot=True)

<matplotlib.axes._subplots.AxesSubplot at 0x26a5d0e56d8>

[Figure output_6_1.png: annotated correlation heatmap of the feature_mean columns]
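
A heatmap is easy to scan but hard to quote from. As a rough sketch, the strongest pairs can also be listed numerically by keeping the strict upper triangle of the correlation matrix. The toy data and the 0.9 cutoff below are assumptions for illustration, not values from the original analysis:

```python
import numpy as np
import pandas as pd

# Hypothetical toy features: perimeter is derived from radius, so the two correlate strongly.
rng = np.random.default_rng(0)
radius = rng.normal(15.0, 3.0, size=200)
demo = pd.DataFrame({
    "radius_mean": radius,
    "perimeter_mean": 2 * np.pi * radius + rng.normal(0.0, 1.0, size=200),
    "texture_mean": rng.normal(20.0, 4.0, size=200),
})

corr = demo.corr()

# Keep only the strict upper triangle: each pair appears once and the diagonal is dropped.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().dropna().sort_values(ascending=False)

# The 0.9 cutoff is an arbitrary illustrative threshold.
strong_pairs = pairs[pairs.abs() > 0.9]
print(strong_pairs)
```

On the real data, `corr` from the heatmap cell could be passed through the same three lines.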

# Data visualization
# Split violin plots comparing the two outcome classes (features 0-9, the *_mean group)
data=x
data_n_2=(data-data.mean())/(data.std())  # standardize each feature (z-score)
data=pd.concat([y,data_n_2.iloc[:,0:10]],axis=1)
# Reshape to long form: one row per (case, feature) pair
data=pd.melt(data,id_vars='outcome',
            var_name='features',
            value_name='value')
plt.figure(figsize=(10,10))
sns.violinplot(x='features',y='value',hue='outcome',data=data,split=True,inner='quart')
plt.xticks(rotation=90)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_7_1.png: split violin plots of standardized features 0-9, by outcome]
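
The standardize, concat and melt steps are repeated for every feature group in this post. One way to avoid the repetition is a small helper; `standardize_and_melt` is a hypothetical name introduced here, not part of the original notebook, and the toy data is made up:

```python
import pandas as pd

def standardize_and_melt(features, target, start, stop):
    """Z-score every column, keep columns start:stop, and reshape to long form."""
    z = (features - features.mean()) / features.std()  # pandas std uses ddof=1
    wide = pd.concat([target, z.iloc[:, start:stop]], axis=1)
    return pd.melt(wide, id_vars=target.name,
                   var_name="features", value_name="value")

# Hypothetical toy data standing in for x and y.
x_demo = pd.DataFrame({
    "radius_mean": [10.0, 12.0, 14.0, 16.0],
    "texture_mean": [20.0, 22.0, 21.0, 25.0],
    "area_mean": [300.0, 450.0, 600.0, 800.0],
})
y_demo = pd.Series(["N", "R", "N", "R"], name="outcome")

long_demo = standardize_and_melt(x_demo, y_demo, 0, 2)
print(long_demo)
```

With such a helper, each plotting cell would reduce to one call such as `standardize_and_melt(x, y, 10, 20)` followed by the seaborn call.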

# Draw box plots of the same features
plt.figure(figsize=(10,10))
sns.boxplot(x='features',y='value',hue='outcome',data=data)
plt.xticks(rotation=90)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_8_1.png: box plots of standardized features 0-9, by outcome]
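
Box plots draw points beyond 1.5 times the interquartile range (the default whisker length) as individual outliers. A minimal sketch of counting those points per feature is shown below; it uses hypothetical toy values and ignores the split by outcome for brevity:

```python
import pandas as pd

# Hypothetical toy values; 40.0 is planted as an obvious outlier.
demo = pd.DataFrame({
    "radius_mean": [10.0, 11.0, 12.0, 13.0, 14.0, 40.0],
    "texture_mean": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0],
})

q1 = demo.quantile(0.25)
q3 = demo.quantile(0.75)
iqr = q3 - q1

# A value is flagged when it falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
outside = (demo < q1 - 1.5 * iqr) | (demo > q3 + 1.5 * iqr)
outlier_counts = outside.sum()
print(outlier_counts)
```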

# Same comparison for features 10-19 (the *_se group)
data=x
data_n_2=(data-data.mean())/(data.std())  # standardize each feature (z-score)
data=pd.concat([y,data_n_2.iloc[:,10:20]],axis=1)
data=pd.melt(data,id_vars='outcome',
            var_name='features',
            value_name='value')
plt.figure(figsize=(10,10))
sns.violinplot(x='features',y='value',hue='outcome',data=data,split=True,inner='quart')
plt.xticks(rotation=90)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_9_1.png: split violin plots of standardized features 10-19, by outcome]

# Draw box plots of the same features
plt.figure(figsize=(10,10))
sns.boxplot(x='features',y='value',hue='outcome',data=data)
plt.xticks(rotation=90)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_10_1.png: box plots of standardized features 10-19, by outcome]

# Same comparison for features 20-29 (the *_worst group)
data=x
data_n_2=(data-data.mean())/(data.std())  # standardize each feature (z-score)
data=pd.concat([y,data_n_2.iloc[:,20:30]],axis=1)
data=pd.melt(data,id_vars='outcome',
            var_name='features',
            value_name='value')
plt.figure(figsize=(10,10))
sns.violinplot(x='features',y='value',hue='outcome',data=data,split=True,inner='quart')
plt.xticks(rotation=90)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_11_1.png: split violin plots of standardized features 20-29, by outcome]

# Draw box plots of the same features
plt.figure(figsize=(10,10))
sns.boxplot(x='features',y='value',hue='outcome',data=data)
plt.xticks(rotation=90)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), <a list of 10 Text xticklabel objects>)

[Figure output_12_1.png: box plots of standardized features 20-29, by outcome]

# Same comparison for the last two features (tumor diameter and lymph node status)
data=x
data_n_2=(data-data.mean())/(data.std())  # standardize each feature (z-score)
data=pd.concat([y,data_n_2.iloc[:,30:32]],axis=1)
data=pd.melt(data,id_vars='outcome',
            var_name='features',
            value_name='value')
plt.figure(figsize=(10,10))
sns.violinplot(x='features',y='value',hue='outcome',data=data,split=True,inner='quart')
plt.xticks(rotation=90)

(array([0, 1]), <a list of 2 Text xticklabel objects>)

[Figure output_13_1.png: split violin plots of standardized tumor diameter and lymph node status, by outcome]

# Draw box plots of the same features
plt.figure(figsize=(10,10))
sns.boxplot(x='features',y='value',hue='outcome',data=data)
plt.xticks(rotation=90)

(array([0, 1]), <a list of 2 Text xticklabel objects>)

[Figure output_14_1.png: box plots of standardized tumor diameter and lymph node status, by outcome]
