2020.08.23 Datawhale Team Study: Data Analysis 03, Data Restructuring 01


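This chapter introduces data restructuring: reorganizing the data, based on our understanding of it, into a form that is easier to work with.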

import numpy as np
import pandas as pd

Chapter 2: Data Restructuring

Merging Data

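Task 2: Use the concat method to merge train-left-up.csv with train-right-up.csv horizontally (and likewise the two "down" files), giving two tables, then concatenate those two tables vertically into a single table and save it. The task statement names the saved table result_up; the code below writes the final table to result.csv.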

text_left_up = pd.read_csv("data02/train-left-up.csv")
text_left_down = pd.read_csv("data02/train-left-down.csv")
text_right_up = pd.read_csv("data02/train-right-up.csv")
text_right_down = pd.read_csv("data02/train-right-down.csv")
text_up = pd.concat([text_left_up,text_right_up],axis = 1)
text_up.head()
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1.0          0.0       3.0     Braund, Mr. Owen Harris                            male    22.0  1.0    0.0    A/5 21171         7.2500   NaN    S
1  2.0          1.0       1.0     Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1.0    0.0    PC 17599          71.2833  C85    C
2  3.0          1.0       3.0     Heikkinen, Miss. Laina                             female  26.0  0.0    0.0    STON/O2. 3101282  7.9250   NaN    S
3  4.0          1.0       1.0     Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1.0    0.0    113803            53.1000  C123   S
4  5.0          0.0       3.0     Allen, Mr. William Henry                           male    35.0  0.0    0.0    373450            8.0500   NaN    S
text_down = pd.concat([text_left_down,text_right_down],axis = 1)
text_down.head()
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  440          0         2       Kvillner, Mr. Johan Henrik Johannesson             male    31.0  0      0      C.A. 18723        10.500   NaN    S
1  441          1         2       Hart, Mrs. Benjamin (Esther Ada Bloomfield)        female  45.0  1      1      F.C.C. 13529      26.250   NaN    S
2  442          0         3       Hampe, Mr. Leon                                    male    20.0  0      0      345769            9.500    NaN    S
3  443          0         3       Petterson, Mr. Johan Emil                          male    25.0  1      0      347076            7.775    NaN    S
4  444          1         2       Reynaldo, Ms. Encarnacion                          female  28.0  0      0      230434            13.000   NaN    S
text = pd.concat([text_up,text_down])
text.head()
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1.0          0.0       3.0     Braund, Mr. Owen Harris                            male    22.0  1.0    0.0    A/5 21171         7.2500   NaN    S
1  2.0          1.0       1.0     Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1.0    0.0    PC 17599          71.2833  C85    C
2  3.0          1.0       3.0     Heikkinen, Miss. Laina                             female  26.0  0.0    0.0    STON/O2. 3101282  7.9250   NaN    S
3  4.0          1.0       1.0     Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1.0    0.0    113803            53.1000  C123   S
4  5.0          0.0       3.0     Allen, Mr. William Henry                           male    35.0  0.0    0.0    373450            8.0500   NaN    S
text.to_csv('result.csv')
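The two uses of concat above can be tried on a toy example. The sketch below is only an illustration: the four small frames are hypothetical stand-ins for the CSV pieces, not the Titanic data.

```python
import pandas as pd

# Hypothetical stand-ins for the four CSV pieces (not the real data).
left_up = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
right_up = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})
left_down = pd.DataFrame({"PassengerId": [3, 4], "Survived": [1, 1]})
right_down = pd.DataFrame({"Sex": ["female", "female"], "Age": [26.0, 35.0]})

# axis=1 aligns rows by index label and places the columns side by side.
up = pd.concat([left_up, right_up], axis=1)
down = pd.concat([left_down, right_down], axis=1)

# The default axis=0 stacks rows; each piece keeps its own index labels,
# so the labels 0 and 1 each appear twice in the result.
stacked = pd.concat([up, down])

# ignore_index=True renumbers the rows 0..n-1 instead.
renumbered = pd.concat([up, down], ignore_index=True)

print(stacked.shape)
print(list(stacked.index))
print(list(renumbered.index))
```

Note that to_csv also writes the index as the first column by default; passing index=False leaves it out of the saved file.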

Task 4: Use the DataFrame's own join and append methods to complete Tasks 2 and 3

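# Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0;
# on newer versions use pd.concat([text_up, text_down]) instead.
text_up = text_left_up.join(text_right_up)
text_down = text_left_down.join(text_right_down)
text = text_up.append(text_down)
text.head()
# text.to_csv('result.csv')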
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1.0    0.0    A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1.0    0.0    PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0.0    0.0    STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1.0    0.0    113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0.0    0.0    373450            8.0500   NaN    S
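As a small sketch of what join does here, the example below uses hypothetical toy frames (not the course data). DataFrame.join performs a left join on the index by default, so rows missing from the right-hand frame are filled with NaN, and an integer column that receives NaN is converted to float. That may be one reason a count column can print as 1.0 rather than 1. The last lines show pd.concat doing the row-wise job of append, which is the option available on pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical toy frames: the right-hand one has one row fewer.
left = pd.DataFrame({"PassengerId": [1, 2, 3]})
right = pd.DataFrame({"SibSp": [1, 0]})

# join defaults to how="left" and matches rows by index label.
joined = left.join(right)

# Row 2 has no partner in `right`, so SibSp is NaN there and the
# whole column becomes float64.
print(joined["SibSp"].dtype)

# Row-wise combination without DataFrame.append: split the frame in two
# and put it back together with pd.concat.
top = joined.iloc[:2]
bottom = joined.iloc[2:]
combined = pd.concat([top, bottom])
print(combined.equals(joined))
```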

Task 5: Use the pandas merge method and the DataFrame append method to complete Tasks 2 and 3

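# how="inner" is merge's default; left_index/right_index match rows by index label.
text_up = pd.merge(text_left_up, text_right_up, left_index=True, right_index=True)
text_down = pd.merge(text_left_down, text_right_down, left_index=True, right_index=True)
# DataFrame.append requires pandas < 2.0; later versions need pd.concat([text_up, text_down]).
text = text_up.append(text_down)
text.head()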
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1.0    0.0    A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1.0    0.0    PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0.0    0.0    STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1.0    0.0    113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0.0    0.0    373450            8.0500   NaN    S
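To see how merging on the index behaves when the two sides do not cover the same rows, here is a minimal sketch on hypothetical toy frames with deliberately mismatched indexes. By default merge keeps only index labels present on both sides (an inner join); how="outer" keeps every label from either side.

```python
import pandas as pd

# Hypothetical frames whose index labels only partly overlap.
left = pd.DataFrame({"PassengerId": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"Sex": ["male", "female", "female"]}, index=[1, 2, 3])

# Default how="inner": only labels 1 and 2 exist on both sides.
inner = pd.merge(left, right, left_index=True, right_index=True)

# how="outer": labels 0..3 all survive; missing cells become NaN.
outer = pd.merge(left, right, left_index=True, right_index=True, how="outer")

# join with how="inner" selects the same rows as the default merge above.
via_join = left.join(right, how="inner")

print(list(inner.index))
print(sorted(outer.index))
print(list(via_join.index))
```

When both files cover exactly the same rows, as the task intends, the inner merge, the left join, and concat with axis=1 should all produce the same table.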

Looking at the Data from a Different Angle

The stack function

text2 = text.stack()
text2.to_csv('result2.csv')
df2 = pd.read_csv('result2.csv')
df2.head()
   Unnamed: 0  Unnamed: 1   0
0  0           PassengerId  1
1  0           Survived     0
2  0           Pclass       3
3  0           Name         Braund, Mr. Owen Harris
4  0           Sex          male
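The output above can be read as follows: stack turns the column names into an inner index level, so each (row label, column name) pair gets its own line, and the two "Unnamed" columns are those two index levels after the CSV round trip. The sketch below reproduces this on a hypothetical two-row frame, using an in-memory buffer instead of result2.csv, and reads the index back with index_col.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the merged table.
df = pd.DataFrame({"PassengerId": [1, 2], "Sex": ["male", "female"]})

# stack() returns a Series indexed by (row label, column name).
stacked = df.stack()
print(stacked.loc[(0, "Sex")])

# unstack() pivots the inner level back into columns.
restored = stacked.unstack()
print(list(restored.columns))

# Round trip through CSV text: the two index levels have no names, so a
# plain read_csv would label them "Unnamed: 0" and "Unnamed: 1".
buf = io.StringIO()
stacked.to_csv(buf)
buf.seek(0)

# index_col=[0, 1] turns those two columns back into a MultiIndex.
back = pd.read_csv(buf, index_col=[0, 1])
print(back.shape)
```

Recent pandas 2.x releases may print a FutureWarning for stack() because its implementation is being replaced; the warning does not change the result for this toy frame.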