Dataset: Cervical Cancer Prediction

Machine Learning

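Age, Number of sexual partners, Number of pregnancies, Smokes, Smokes (years), Smokes (packs/year), Hormonal contraceptives, Hormonal contraceptives (years), STDs (number), Cancer (yes/no)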
18410000000
15110000000
34110000000
5254137371301
463400011500
42320000000
51361343.40000
26130001200
45150000001
443?11.2669729092.80000
44340001200
27130001800
454600011000
44220001500
43250000000
403200011500
414300010.2500
43380001300
422?0001720
402?0000000
432400011500
413400011010
401100010.2520
401200011501
40330001300
44310000000
39520000000
39240000000
3731130.040000
376100010.2500
413300012200
402200011900
373511.2669729090.51320212811010
373300010.500
382200010.500
37350001100
39240001100
37?100010.5800
39140000000
36230001200
37?50000000
372?0000010
363311.2669729092.41900
36330000000
3733112611300
361400010.2500
3623???1500
40230000000
41240000000
373300010.2500
363400010.2500
36210000000
365300010.500
353411890000
361300011100
414500011500
35540000000
3336171.610.2500
35260001100
35340001720
34330001500
3535119191400
35110001500
331200011220
382400011601
37330001300
34330001500
3642121210000
35320000020
355200010.3300
364300011500
34?3000???0
342200010.3300
35220001200
35221150.32???0
35220001100
34550001100
33310001200
35361132.61710
352300010.500
331300010.1600
31331160.80000
32220001200
32320001800
36230000010
34110001400
36220001500
355200011000
32320001200
33120000000
3616000???0
35350001400
33430001800
331200010.500
33320000000
3142000???0
355?1151511400
35120001200
3115000???0
33320001900
343700010.0800
30540000010
3132000???0
383400011000
332300011000
34441820000
304200011300
3351140.51320212812.28220052100
322200011000
3221000??00
32310001800
32721195.7???0
30320001920
23520001200
343400010.6600
34330000000
31320000000
3033??????0
302200011100
28330001300
33140000000
30530001600
31340001400
295?0001220
303400011200
313411010000
31220000000
29240000000
29110000000
30260001500
28260001500
30?31223.30010
302200010.0800
33421143.510.0800
303300011220
27320000000
3123000??00
29430001100
29110000000
302300010.500
30120000000
2031000???0
313311612???0
2812000???0
30210001100
313610.50.0251600
304300010.500
29450000010
33350001100
3135000???0
28220001300
30120000000
26320001500
26220001200
27230001600
28421112.751600
27120001700
29?20001500
32220000000
27220001700
28230001500
28220001600
28130001300
28540001300
29?2000???0
30320001510
29120001700
272?0000000
29430000000
18340001500
263300010.2510
262211.2669729090.5132021281400
29220001900
29330000010
2835112120000
26130001100
2711000???0
27320001100
2744000???0
30120001900
26220001100
27210000000
253400010.2500
28230000000
261200010.2500
282?0001900
25152???0020
283311261710
27520001301
252200015?0
28830001100
29320001200
27240001200
26101190.513202128???0
254300011.500
26220001600
28330000000
265?0001100
27120000000
27220000000
26720000000
2644120.20000
2722171.41330
282200010.4220
21120000000
253200010.0800
28220001100
26320001100
274100010.6700
25320001900
25540001300
27110000000
2633155??10
28361142.11720
30130001400
251100010.0800
293311021600
26120001400
28340001100
2341140.5132021281200
25?2000???0
25?20001300
3013000??10
231200010.1600
24240000010
2863000???0
26110000000
2543170.70020
3123000???0
27340001600
23210001100
26350000000
23310001100
242200010.1600
25?2000???0
252100010.500
24230001410
2421000??10
29450001600
2832180.81800
24130001300
24210001600
301100010.0800
2561000???0
25330000000
22110001100
28220001700
231200010.6700
25220001700
223100010.2500
252200010.2500
231200010.500
2421161.21800
23120000000
25310000000
252200010.2500
2441000???0
22310000020
215100010.4200
23811107.51800
28240001500
21210000000
23220000000
232711.2669729090.513202128???0
2222151.251600
21??0000000
21210001100
2223000???0
23230001300
26120000000
21110001100
2432000???0
23??000???0
24320000000
24210001500
21310000000
21430001300
20110000000
22220000000
2221??????0
23220000000
2141000???0
21110001400
25330001300
2211000???0
26220001200
21420000000
22330001100
232211.2669729090.5132021281500
26221331200
21121150.751400
23120001100
222300010.0800
21110000000
23120000000
23110001720
24310000000
2112???0000
241300010.2500
2331000???0
23230001500
20?10000000
2343110.11700
25320000000
23?2000???0
205110.50.5132021280000
263100010.6600
2311000???0
25230001500
2342000??20
232100010.5800
24440001400
23230000000
24530001100
21120001300
20320001400
2422000???0
201100010.2500
23210001600
2312000???0
22310001100
2221000???0
20220001100
222200013?0
33?40001300
201100010.2510
211100010.2500
20210000000
2041000???0
231500010.2500
2111000???0
271400012.28220052100
31150001100
22340001500
21220001100
29240000001
22120001700
482713281700
2053143???0
202100010.0800
19310001100
1911000???0
212100010.500
2011120.2???0
18110000000
224100010.5?0
1941000???0
192200010.500
19120000000
21320001100
2012000???0
19420001400
19410000000
2122000???0
45240000000
19340000020
18110000000
2222000??00
21110001200
37240001400
24120001300
204100010.1600
201200010.4200
181100010.3300
19320000000
202?00010.7500
193100010.2500
19110000000
20?10001300
18110000000
21410000000
2441192.251100
215311.2669729090.51320212810.7500
232200010.500
21521210010
22130001100
41320001420
202413310.500
18210000000
21210000000
1831120.00310.5800
1932000???0
25510000000
192300010.5800
18120000020
21330001100
1823173.51100
22210000000
2033140.20010
17110000000
202100010.0800
17?2000???0
19110000000
19310000000
202111370020
18110000000
1831130.45??00
19220001320
18?2000???0
211300010.500
172100010.3300
21130001500
183100010.5800
17310000000
18220000000
1721000???0
17110000000
17220000040
1722000???0
17210000000
202300010.4200
17110000000
1821000???0
35350000000
2031110.150000
18320000000
18120000000
19?2000???0
17120000020
1941000???0
182100010.2500
263?00010.3300
182210.50.0510.3300
17220000000
20210000020
16210000000
18711551220
20110001400
1611110.250000
181200010.0800
194100010.4200
18120000000
31310000001
16110001100
203200010.2500
17210000000
1631000???0
16110000000
20320000000
18520001100
15110000000
15110000000
16110000000
1511000???0
181100010.1600
1621000???0
1831161.20000
17310000000
173100010.2510
18110001100
162200010.6600
1611???0000
1711000???0
1511000???0
15110000000
18120000010
16110000000
153111.2669729090.513202128???0
16210000000
14210000000
14?1000???0
15320000000
17510000000
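The `?` entries in the rows above mark missing values. The following is a minimal sketch of how such markers can be turned into NaN and counted with pandas. The column names and the three rows are made up for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset file.
raw = io.StringIO(
    "Age,Partners,Pregnancies,Smokes\n"
    "18,4,1,0\n"
    "44,3,?,1\n"
    "37,?,1,0\n"
)

# na_values='?' turns every '?' cell into NaN, so the columns parse as numbers
# instead of strings.
df = pd.read_csv(raw, na_values='?')

# Count the missing cells in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Counting missing cells per column before choosing a fill strategy shows which features would be most affected by that choice.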
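For this task, we can use the Z-score method to detect and handle outliers. The steps are as follows:

1. Load the dataset

First, load the UCI cervical cancer dataset, which can be downloaded from https://archive.ics.uci.edu/ml/datasets/Cervical+cancer+%28Risk+Factors%29#. After downloading, use pandas to read it into a DataFrame:

```python
import pandas as pd

# '?' marks a missing value in this file, so read it as NaN
data = pd.read_csv('risk_factors_cervical_cancer.csv', na_values='?')
```

2. Preprocess the data

Next, preprocess the data. All feature values are converted to a numeric type, and because the dataset contains missing values, they are filled with 0 using fillna. The conversion comes first so that any value that cannot be parsed becomes NaN and is filled as well:

```python
# Convert every column to numeric; unparseable values become NaN
data = data.apply(pd.to_numeric, errors='coerce')
# Fill the missing values with 0
data = data.fillna(0)
```

3. Detect and handle outliers

Outlier detection and handling with the Z-score method works as follows:

- For each feature, compute its mean and standard deviation;
- For each sample, compute its Z-score;
- Delete every sample whose absolute Z-score exceeds the threshold, or replace the value with the mean.

The code below deletes such samples:

```python
import numpy as np
from scipy import stats

threshold = 3  # Z-score cutoff

for col in data.columns:
    if col == 'Dx':
        continue
    # A constant column has zero standard deviation, so its Z-score is undefined
    if data[col].std() == 0:
        continue
    # Use the absolute value so that both unusually high and low values count
    z = np.abs(stats.zscore(data[col]))
    data = data[z <= threshold]
```

This code iterates over every feature except `Dx`, computes the Z-score of each sample with stats.zscore, and removes the samples whose absolute Z-score exceeds the threshold. Constant columns are skipped, because their Z-scores would be NaN and the filter would otherwise discard every row.

The result is the dataset after outlier handling.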
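The steps above mention replacing an outlier with the mean as an alternative to deleting the row. A minimal sketch of that variant follows. The helper name `replace_outliers_with_mean` and the sample values are hypothetical, and using the mean of the remaining (non-outlier) values is an assumption made here so that the outlier does not distort its own replacement.

```python
import pandas as pd


def replace_outliers_with_mean(series: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return a copy in which values with |Z-score| > threshold are replaced
    by the mean of the remaining values."""
    std = series.std(ddof=0)  # population standard deviation
    if std == 0 or pd.isna(std):
        # Constant or empty column: the Z-score is undefined, nothing to replace.
        return series.copy()
    z = (series - series.mean()) / std
    is_outlier = z.abs() > threshold
    inlier_mean = series[~is_outlier].mean()
    # mask() replaces the entries where the condition is True.
    return series.mask(is_outlier, inlier_mean)


# Made-up values for illustration: nineteen typical entries and one extreme one.
ages = pd.Series([20.0] * 19 + [200.0])
cleaned = replace_outliers_with_mean(ages)
print(cleaned.tail(3))
```

Unlike the deletion approach, this keeps the number of rows unchanged, which can matter when the dataset is small or when the other columns of an affected row are still informative.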
