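This chapter introduces data restructuring: reorganizing the data, based on our understanding of it, into a form that is easier to work with.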
import numpy as np
import pandas as pd
Chapter 2: Data Restructuring
Merging data
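Task 2: Use the concat method to merge the data horizontally, pairing train-left-up.csv with train-right-up.csv and train-left-down.csv with train-right-down.csv to give two tables; then stack those two tables vertically into a single table and save it.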
# Load the four pieces of the training data
text_left_up = pd.read_csv("data02/train-left-up.csv")
text_left_down = pd.read_csv("data02/train-left-down.csv")
text_right_up = pd.read_csv("data02/train-right-up.csv")
text_right_down = pd.read_csv("data02/train-right-down.csv")
# Place the two upper pieces side by side (axis=1 concatenates columns)
text_up = pd.concat([text_left_up, text_right_up], axis=1)
text_up.head()
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 3.0 | Braund, Mr. Owen Harris | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2.0 | 1.0 | 1.0 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3.0 | 1.0 | 3.0 | Heikkinen, Miss. Laina | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4.0 | 1.0 | 1.0 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
| 4 | 5.0 | 0.0 | 3.0 | Allen, Mr. William Henry | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
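The integer columns in the output above are displayed as floats (1.0, 0.0, 3.0). One plausible cause, assuming the left and right files do not have the same number of rows, is that concat with axis=1 aligns rows on the index and fills the gaps with NaN, which forces integer columns to float. The sketch below illustrates this on small hypothetical frames (`left`, `right` and their values are made up for illustration and are not the Titanic files).

```python
import pandas as pd

# Toy stand-ins for the left and right halves (hypothetical data).
left = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})
right = pd.DataFrame({"Fare": [7.25, 71.2833, 7.925, 53.1]})  # one extra row

# axis=1 places the frames side by side, aligning rows on the index labels.
wide = pd.concat([left, right], axis=1)

print(wide.shape)                 # (4, 3): the union of both indexes is kept
print(wide["PassengerId"].dtype)  # float64: the NaN in row 3 forces an upcast
```

If the row counts of the real files match, this effect does not occur and another explanation would be needed.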
# Do the same for the two lower pieces
text_down = pd.concat([text_left_down, text_right_down], axis=1)
text_down.head()
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |
| 1 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
| 2 | 442 | 0 | 3 | Hampe, Mr. Leon | male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
| 3 | 443 | 0 | 3 | Petterson, Mr. Johan Emil | male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
| 4 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |
# Stack the upper and lower tables vertically (the default axis=0)
text = pd.concat([text_up, text_down])
text.head()
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 3.0 | Braund, Mr. Owen Harris | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2.0 | 1.0 | 1.0 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3.0 | 1.0 | 3.0 | Heikkinen, Miss. Laina | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4.0 | 1.0 | 1.0 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
| 4 | 5.0 | 0.0 | 3.0 | Allen, Mr. William Henry | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
text.to_csv('result.csv')
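A vertical concat keeps each input's original row labels, so the combined table can contain repeated index values, and to_csv writes that index out as an extra unnamed column by default. The following is a minimal sketch of the two relevant options, `ignore_index=True` and `index=False`, on hypothetical toy frames (`top` and `bottom` are illustrative names, not variables from this chapter).

```python
import pandas as pd

# Hypothetical miniature versions of the upper and lower tables.
top = pd.DataFrame({"PassengerId": [1, 2], "Sex": ["male", "female"]})
bottom = pd.DataFrame({"PassengerId": [440, 441], "Sex": ["male", "female"]})

stacked = pd.concat([top, bottom])                         # keeps the original labels
renumbered = pd.concat([top, bottom], ignore_index=True)   # relabels rows 0..n-1

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# With no path, to_csv returns the CSV text; index=False omits the index column.
csv_text = renumbered.to_csv(index=False)
print(csv_text.splitlines()[0])   # PassengerId,Sex
```

Whether to renumber depends on whether the original row labels carry meaning; here PassengerId already identifies each row.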
Task 4: Complete Tasks 2 and 3 using the DataFrame's own join and append methods (append exists only in pandas versions before 2.0).
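# join aligns the two frames on their index (a left join by default)
text_up = text_left_up.join(text_right_up)
text_down = text_left_down.join(text_right_down)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
text = pd.concat([text_up, text_down])
text.head()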
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
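In this output PassengerId, Survived and Pclass are integers, unlike the floats in the concat result. A likely reason is that DataFrame.join performs a left join on the index by default, so only the rows of the left frame are kept and its columns never receive NaN. The sketch below shows that behaviour on hypothetical toy frames (the names and values are made up for illustration).

```python
import pandas as pd

# Hypothetical halves with different row counts.
left = pd.DataFrame({"PassengerId": [1, 2, 3]})
right = pd.DataFrame({"Fare": [7.25, 71.2833, 7.925, 53.1]})

# join defaults to how="left": the result has exactly the left frame's rows.
joined = left.join(right)

print(joined.shape)                                            # (3, 2)
print(pd.api.types.is_integer_dtype(joined["PassengerId"]))    # True
```

Passing `how="outer"` to join would keep the extra row from `right`, matching what concat with axis=1 does.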
Task 5: Complete Tasks 2 and 3 using pandas' merge method and the DataFrame append method (append exists only in pandas versions before 2.0).
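# Merge on the row index of both frames rather than on a key column
text_up = pd.merge(text_left_up, text_right_up, left_index=True, right_index=True)
text_down = pd.merge(text_left_down, text_right_down, left_index=True, right_index=True)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
text = pd.concat([text_up, text_down])
text.head()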
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
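pd.merge with left_index=True and right_index=True matches rows by index label, and its default `how="inner"` keeps only labels present in both frames. The sketch below contrasts the default with `how="outer"` on hypothetical toy frames (made-up names and values), which may help when deciding between merge, join and concat.

```python
import pandas as pd

# Hypothetical halves with different row counts.
left = pd.DataFrame({"PassengerId": [1, 2, 3]})
right = pd.DataFrame({"Fare": [7.25, 71.2833, 7.925, 53.1]})

# Default inner merge: only index labels found in both frames survive.
inner = pd.merge(left, right, left_index=True, right_index=True)
# Outer merge: every label from either frame is kept, with NaN for the gaps.
outer = pd.merge(left, right, left_index=True, right_index=True, how="outer")

print(len(inner))  # 3
print(len(outer))  # 4
```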
Looking at the data from another angle
The stack function
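# stack pivots the columns into an inner index level, giving a long Series
text2 = text.stack()
text2.to_csv('result2.csv')
# Read back without index_col, both index levels become unnamed columns
df2 = pd.read_csv('result2.csv')
df2.head()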
|  | Unnamed: 0 | Unnamed: 1 | 0 |
|---|---|---|---|
| 0 | 0 | PassengerId | 1 |
| 1 | 0 | Survived | 0 |
| 2 | 0 | Pclass | 3 |
| 3 | 0 | Name | Braund, Mr. Owen Harris |
| 4 | 0 | Sex | male |
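The two "Unnamed" columns above are the two levels of the index that stack produces: the original row label and the original column name. As a rough sketch on a hypothetical two-row frame (`df` and its values are made up for illustration), the code below shows the stacked structure, how unstack reverses it, and how passing `index_col=[0, 1]` to read_csv restores the two-level index when reading the file back.

```python
import io
import pandas as pd

# Hypothetical miniature table.
df = pd.DataFrame({"Pclass": [3, 1], "Sex": ["male", "female"]})

# stack returns a Series indexed by (row label, column name).
long = df.stack()
print(long.index.nlevels)     # 2
print(long.loc[(0, "Sex")])   # male

# unstack pivots the inner index level back into columns.
restored = long.unstack()
print(restored.columns.tolist())  # ['Pclass', 'Sex']

# Round-trip through CSV text, telling read_csv which columns form the index.
csv_text = long.to_csv()
back = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])
print(back.shape)             # (4, 1)
```

Newer pandas releases may emit a FutureWarning for stack about a changed implementation; the result for a frame without missing values, as here, is the same.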