2021-03-19: task2_Data Analysis

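This post walks through the data analysis process: loading the data science and visualization libraries, checking the dataset for missing values and anomalies, plotting the distribution of the prediction target, and generating a detailed data report with pandas_profiling to help understand the characteristics of the data.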

Code Examples

  1. Load the data science and visualization libraries
import warnings
warnings.filterwarnings('ignore')
# import missingno as msno
import pandas as pd
from pandas import DataFrame,Series
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
!pip install missingno
Requirement already satisfied: missingno in e:\app\anaconda\envs\python3.7\lib\site-packages (0.4.2)
Requirement already satisfied: numpy in e:\app\anaconda\envs\python3.7\lib\site-packages (from missingno) (1.19.4)
Requirement already satisfied: seaborn in e:\app\anaconda\envs\python3.7\lib\site-packages (from missingno) (0.11.1)
Requirement already satisfied: matplotlib in e:\app\anaconda\envs\python3.7\lib\site-packages (from missingno) (3.3.3)
Requirement already satisfied: scipy in e:\app\anaconda\envs\python3.7\lib\site-packages (from missingno) (1.6.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in e:\app\anaconda\envs\python3.7\lib\site-packages (from matplotlib->missingno) (2.4.7)
Requirement already satisfied: cycler>=0.10 in e:\app\anaconda\envs\python3.7\lib\site-packages (from matplotlib->missingno) (0.10.0)
Requirement already satisfied: pillow>=6.2.0 in e:\app\anaconda\envs\python3.7\lib\site-packages (from matplotlib->missingno) (8.1.0)
Requirement already satisfied: python-dateutil>=2.1 in e:\app\anaconda\envs\python3.7\lib\site-packages (from matplotlib->missingno) (2.8.1)
Requirement already satisfied: kiwisolver>=1.0.1 in e:\app\anaconda\envs\python3.7\lib\site-packages (from matplotlib->missingno) (1.3.1)
Requirement already satisfied: six in e:\app\anaconda\envs\python3.7\lib\site-packages (from cycler>=0.10->matplotlib->missingno) (1.15.0)
Requirement already satisfied: pandas>=0.23 in e:\app\anaconda\envs\python3.7\lib\site-packages (from seaborn->missingno) (1.2.0)
Requirement already satisfied: pytz>=2017.3 in e:\app\anaconda\envs\python3.7\lib\site-packages (from pandas>=0.23->seaborn->missingno) (2020.5)
  2. Load the training set and the test set
train_data = pd.read_csv('./train.csv')
train_data.head()
id heartbeat_signals label
0 0 0.9912297987616655,0.9435330436439665,0.764677... 0.0
1 1 0.9714822034884503,0.9289687459588268,0.572932... 0.0
2 2 1.0,0.9591487564065292,0.7013782792997189,0.23... 2.0
3 3 0.9757952826275774,0.9340884687738161,0.659636... 0.0
4 4 0.0,0.055816398940721094,0.26129357194994196,0... 2.0
test_data = pd.read_csv('./testA.csv')
test_data.head()
id heartbeat_signals
0 100000 0.9915713654170097,1.0,0.6318163407681274,0.13...
1 100001 0.6075533139615096,0.5417083883163654,0.340694...
2 100002 0.9752726292239277,0.6710965234906665,0.686758...
3 100003 0.9956348033996116,0.9170249621481004,0.521096...
4 100004 1.0,0.8879490481178918,0.745564725322326,0.531...
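As the previews above show, each `heartbeat_signals` cell is a single comma-separated string rather than a numeric column. The following is a minimal sketch of one way to turn those strings into a numeric matrix. It uses a small made-up DataFrame in place of `train.csv`, the helper name `parse_signals` is hypothetical, and it assumes every row holds the same number of values.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train.csv; these signal values are made up for illustration.
toy = pd.DataFrame({
    "id": [0, 1],
    "heartbeat_signals": ["0.99,0.94,0.76,0.0", "1.0,0.5,0.25,0.0"],
    "label": [0.0, 2.0],
})

def parse_signals(series):
    """Convert comma-separated strings into a 2-D float array.

    Assumes every row contains the same number of values.
    """
    rows = [np.array([float(x) for x in s.split(",")]) for s in series]
    return np.vstack(rows)

signals = parse_signals(toy["heartbeat_signals"])
print(signals.shape)  # (number of rows, values per signal)
```

With the real data, the same helper could be applied to `train_data["heartbeat_signals"]`, provided the equal-length assumption holds.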
  • Inspect the first and last rows
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
pd.concat([train_data.head(), train_data.tail()])
id heartbeat_signals label
0 0 0.9912297987616655,0.9435330436439665,0.764677... 0.0
1 1 0.9714822034884503,0.9289687459588268,0.572932... 0.0
2 2 1.0,0.9591487564065292,0.7013782792997189,0.23... 2.0
3 3 0.9757952826275774,0.9340884687738161,0.659636... 0.0
4 4 0.0,0.055816398940721094,0.26129357194994196,0... 2.0
99995 99995 1.0,0.677705342021188,0.22239242747868546,0.25... 0.0
99996 99996 0.9268571578157265,0.9063471198026871,0.636993... 2.0
99997 99997 0.9258351628306013,0.5873839035878395,0.633226... 3.0
99998 99998 1.0,0.9947621698382489,0.8297017704865509,0.45... 2.0
99999 99999 0.9259994004527861,0.916476635326053,0.4042900... 0.0
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
pd.concat([test_data.head(), test_data.tail()])
id heartbeat_signals
0 100000 0.9915713654170097,1.0,0.6318163407681274,0.13...
1 100001 0.6075533139615096,0.5417083883163654,0.340694...
2 100002 0.9752726292239277,0.6710965234906665,0.686758...
3 100003 0.9956348033996116,0.9170249621481004,0.521096...
4 100004 1.0,0.8879490481178918,0.745564725322326,0.531...
19995 119995 1.0,0.8330283177934747,0.6340472606311671,0.63...
19996 119996 1.0,0.8259705825857048,0.4521053488322387,0.08...
19997 119997 0.951744840752379,0.9162611283848351,0.6675251...
19998 119998 0.9276692903808186,0.6771898159607004,0.242906...
19999 119999 0.6653212231837624,0.527064114047737,0.5166625...
train_data.shape
(100000, 3)
test_data.shape
(20000, 2)
train_data.describe()
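The summary at the top mentions checking for missing values and looking at the distribution of the prediction target. A minimal sketch of those two checks is given here on a small made-up DataFrame, not the real `train.csv`. The column names match the previews above, but the values and the missing entry are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for train_data; values (and the missing entry) are made up.
toy = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "heartbeat_signals": ["0.9,0.1", "1.0,0.2", None, "0.8,0.3"],
    "label": [0.0, 2.0, 0.0, 3.0],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Number of samples per class, ordered by label value
label_counts = toy["label"].value_counts().sort_index()
print(label_counts)
```

On the real data, the same calls would be `train_data.isnull().sum()` and `train_data["label"].value_counts()`.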