1.理论部分
1.重命名函数rename
reviews.rename(columns={'points': 'score'})
reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'})
reviews.rename_axis("wines", axis='rows').rename_axis("fields", axis='columns')
2.合并函数有concat(),join()和merge(),其中merge的功能可被join代替
concat函数将相同的数据类型数据合并
canadian_youtube = pd.read_csv("../input/youtube-new/CAvideos.csv")
british_youtube = pd.read_csv("../input/youtube-new/GBvideos.csv")
pd.concat([canadian_youtube, british_youtube])
join将部分相同的数据类型的数据合并
left = canadian_youtube.set_index(['title', 'trending_date'])
right = british_youtube.set_index(['title', 'trending_date'])
left.join(right, lsuffix='_CAN', rsuffix='_UK')
2.实践部分
1.region_1
and region_2
are pretty uninformative names for locale columns in the dataset. Create a copy of reviews
with these columns renamed to region
and locale
, respectively.
renamed = reviews.rename(columns={'region_1':'region','region_2':'locale'})
2.Set the index name in the dataset to wines
.
reindexed = reviews.rename_axis("wines",axis = 'index')
3.合并两个数据文件
gaming_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/g/gaming.csv")
gaming_products['subreddit'] = "r/gaming"
movie_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/m/movies.csv")
movie_products['subreddit'] = "r/movies"combined_products = pd.concat([gaming_products, movie_products])
Create a DataFrame
of products mentioned on either subreddit.
combined_products = pd.concat([gaming_products, movie_products])
4.合并
powerlifting_meets = pd.read_csv("../input/powerlifting-database/meets.csv")
powerlifting_competitors = pd.read_csv("../input/powerlifting-database/openpowerlifting.csv")
Both tables include references to a MeetID
, a unique key for each meet (competition) included in the database. Using this, generate a dataset combining the two tables into one.
powerlifting_combined = powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))