Data Science [3]: Basic DataFrame Operations (Part 2)

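Google API

Use the Google Books API (https://developers.google.com/books/docs/overview) to fetch the data. In practice this means building a query URL and sending a GET request. Depending on your network, you may need a proxy or VPN to reach the API.

Example: query the books for a given topic

Note: json.loads() returns a Python object (here a dict), not a JSON string.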

import requests
import json

"""
    Google Books Api
    See: https://developers.google.com/books/
"""

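def get(topic=""):
    BASEURL = 'https://www.googleapis.com/books/v1/volumes'

    # Pass the topic via params so requests URL-encodes spaces and special characters
    response = requests.get(BASEURL, params={'q': topic})

    # On success, parse the JSON body into a Python dict
    if response.status_code == 200:
        return json.loads(response.content.decode('utf-8'))

    # Otherwise return the raw Response so the caller can inspect the error
    return response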

python = get("Python")
data_science = get("Data Science")
data_analytics = get("Data Analysis")
machine_learning = get("Machine Learning")
deep_learning = get("Deep Learning")
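
The URL-building and parsing steps can be tried without touching the network. The following is a minimal sketch that uses only the standard library. build_books_url and parse_books_response are hypothetical helper names, and the sample payload is made up for illustration rather than copied from a real API response.

```python
import json
from urllib.parse import urlencode

BASEURL = "https://www.googleapis.com/books/v1/volumes"


def build_books_url(topic):
    """Return the query URL for a topic, with the topic URL-encoded."""
    return BASEURL + "?" + urlencode({"q": topic})


def parse_books_response(text):
    """Parse a JSON response body; json.loads returns plain Python objects."""
    return json.loads(text)


# Made-up payload that only mimics the top-level shape used in this post
sample_body = '{"kind": "books#volumes", "items": [{"id": "x1"}]}'
parsed = parse_books_response(sample_body)

print(build_books_url("Data Science"))
print(type(parsed).__name__, len(parsed["items"]))
```

Encoding the query matters for topics such as "Data Science" that contain a space.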

Converting JSON to a DataFrame

Use the pandas json_normalize function, which flattens nested JSON records into columns.


import pandas as pd

def json2df(book_json):
    # Each element of book_json['items'] becomes one row
    return pd.json_normalize(book_json['items'])

python_df = json2df(python)
data_science_df = json2df(data_science)
data_analytics_df = json2df(data_analytics)
machine_learning_df = json2df(machine_learning)
deep_learning_df = json2df(deep_learning)

python_df.to_csv("python.csv", index=False)
data_science_df.to_csv("data_science.csv", index=False)
data_analytics_df.to_csv("data_analytics.csv", index=False)
machine_learning_df.to_csv("machine_learning.csv", index=False)
deep_learning_df.to_csv("deep_learning.csv", index=False)
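
To see what json_normalize does to nested records, here is a small sketch on hand-written data. The records are made up and only imitate the shape of book_json['items']; they are not real API output.

```python
import pandas as pd

# Made-up records shaped like entries of book_json['items']
items = [
    {"id": "a1", "volumeInfo": {"title": "First Book", "authors": ["A. Author"]}},
    {"id": "b2", "volumeInfo": {"title": "Second Book"}},  # no authors key
]

flat_df = pd.json_normalize(items)

# Nested keys become dotted column names; missing keys become NaN
print(sorted(flat_df.columns))
print(flat_df["volumeInfo.authors"].isna().tolist())
```

This is where dotted column names such as volumeInfo.title come from.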

Renaming DataFrame columns

Use the DataFrame's rename method with a mapping from old column names to new ones.


python_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
data_science_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
data_analytics_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
machine_learning_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
deep_learning_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
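
The same mapping is repeated for every DataFrame above. One possible way to avoid the repetition is to keep the mapping in a single dict and apply it in a loop, sketched below on made-up stand-in frames. COLUMN_MAP and frames are names introduced here for illustration.

```python
import pandas as pd

COLUMN_MAP = {"volumeInfo.title": "Title", "volumeInfo.authors": "Authors"}

# Made-up stand-ins for the per-topic DataFrames
frames = {
    "Python": pd.DataFrame(
        {"id": ["a1"], "volumeInfo.title": ["T1"], "volumeInfo.authors": [["A"]]}
    ),
    "Deep Learning": pd.DataFrame(
        {"id": ["b2"], "volumeInfo.title": ["T2"], "volumeInfo.authors": [["B"]]}
    ),
}

# Without inplace=True, rename returns a new DataFrame and leaves the input untouched
renamed = {topic: df.rename(columns=COLUMN_MAP) for topic, df in frames.items()}

print(list(renamed["Python"].columns))
print(list(frames["Python"].columns))
```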


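Adding a column to a DataFrame

Assign values to a new column label to add the column, then use the concat function to combine the per-topic DataFrames into one.
Example: add a "Topic" column to each topic's DataFrame.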

python_df['Topic'] = pd.Series(['Python']*python_df.shape[0])
data_science_df['Topic'] = pd.Series(['Data Science']*data_science_df.shape[0])
data_analytics_df['Topic'] = pd.Series(['Data Analysis']*data_analytics_df.shape[0])
machine_learning_df['Topic'] = pd.Series(['Machine Learning']*machine_learning_df.shape[0])
deep_learning_df['Topic'] = pd.Series(['Deep Learning']*deep_learning_df.shape[0])

all_df = pd.concat([python_df, data_science_df, data_analytics_df, machine_learning_df, deep_learning_df])
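
As an alternative sketch, a scalar can be assigned to the new column directly, since pandas broadcasts it to every row. The example also shows that concat keeps each frame's original row labels unless ignore_index=True is passed. The two small frames are made up for illustration.

```python
import pandas as pd

# Made-up stand-ins for two per-topic DataFrames
python_demo = pd.DataFrame({"Title": ["T1", "T2"]})
ml_demo = pd.DataFrame({"Title": ["T3"]})

# A scalar is broadcast to every row of the new column
python_demo["Topic"] = "Python"
ml_demo["Topic"] = "Machine Learning"

# Default concat keeps the original row labels, so labels can repeat
stacked = pd.concat([python_demo, ml_demo])
# ignore_index=True renumbers the rows from 0
renumbered = pd.concat([python_demo, ml_demo], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

Repeated row labels are harmless for the filters used later in this post, but they matter if rows are later selected by label.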

Exporting a DataFrame to CSV

Use the DataFrame's to_csv method.

all_df.to_csv("all_topics.csv", index=False)
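
One thing to keep in mind when saving these frames: CSV stores text only, so a column holding Python lists (such as Authors) comes back as strings when the file is read again. The sketch below demonstrates this on a made-up one-row frame, using an in-memory buffer instead of a file, and recovers the list with ast.literal_eval.

```python
import ast
import io

import pandas as pd

# Made-up frame with a list-valued column, like Authors in this post
demo_df = pd.DataFrame({"Title": ["T1"], "Authors": [["A. Author", "B. Writer"]]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
demo_df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# The list was written as its text representation and is read back as a string
raw_cell = loaded.loc[0, "Authors"]
restored = ast.literal_eval(raw_cell)

print(type(raw_cell).__name__, restored)
```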

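Filtering DataFrame rows by value

Using the .str accessor with string methods

Example: get all rows whose title contains "data"

# Lowercase the titles first so the match is case-insensitive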
contain_data_df = all_df[all_df['Title'].str.lower().str.contains("data")]
contain_data_df.head()
| | kind | id | etag | selfLink | Title | volumeInfo.subtitle | Authors | volumeInfo.publishedDate | volumeInfo.description | volumeInfo.industryIdentifiers | ... | volumeInfo.categories | saleInfo.listPrice.amount | saleInfo.listPrice.currencyCode | saleInfo.retailPrice.amount | saleInfo.retailPrice.currencyCode | saleInfo.buyLink | saleInfo.offers | accessInfo.epub.acsTokenLink | Topic | accessInfo.pdf.acsTokenLink |
| 7 | books#volume | 6omNDQAAQBAJ | 8i8xnUEyo14 | https://www.googleapis.com/books/v1/volumes/6o... | Python Data Science Handbook | Essential Tools for Working with Data | [Jake VanderPlas] | 2016-11-21 | For many researchers, Python is a first-class ... | [{'type': 'ISBN_13', 'identifier': '9781491912... | ... | [Computers] | 59.99 | USD | 59.99 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | NaN | Python | NaN |
| 0 | books#volume | vfi3DQAAQBAJ | Q0KYt+x/bgk | https://www.googleapis.com/books/v1/volumes/vf... | R for Data Science | Import, Tidy, Transform, Visualize, and Model ... | [Hadley Wickham, Garrett Grolemund] | 2016-12-12 | "This book introduces you to R, RStudio, and t... | [{'type': 'ISBN_13', 'identifier': '9781491910... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Data Science | NaN |
| 1 | books#volume | TFpVDwAAQBAJ | MqNMwqcnbUk | https://www.googleapis.com/books/v1/volumes/TF... | Data Science | NaN | [John D. Kelleher, Brendan Tierney] | 2018-04-13 | A concise introduction to the emerging field o... | [{'type': 'ISBN_13', 'identifier': '9780262535... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Data Science | NaN |
| 2 | books#volume | 6omNDQAAQBAJ | 86otz4UmKRE | https://www.googleapis.com/books/v1/volumes/6o... | Python Data Science Handbook | Essential Tools for Working with Data | [Jake VanderPlas] | 2016-11-21 | For many researchers, Python is a first-class ... | [{'type': 'ISBN_13', 'identifier': '9781491912... | ... | [Computers] | 59.99 | USD | 59.99 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | NaN | Data Science | NaN |
| 3 | books#volume | xb29DwAAQBAJ | QKJ7stkk3Ac | https://www.googleapis.com/books/v1/volumes/xb... | Introduction to Data Science | Data Analysis and Prediction Algorithms with R | [Rafael A. Irizarry] | 2019-11-20 | Introduction to Data Science: Data Analysis an... | [{'type': 'ISBN_13', 'identifier': '9781000708... | ... | [Mathematics] | NaN | NaN | NaN | NaN | NaN | NaN | http://books.google.com/books/download/Introdu... | Data Science | http://books.google.com/books/download/Introdu... |

5 rows × 52 columns
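
A variation on the filter above, sketched on a made-up Series: str.contains accepts case=False for a case-insensitive match and na=False to treat missing titles as non-matches, so the result is a clean boolean mask even when some titles are missing.

```python
import pandas as pd

# Made-up titles, including one missing value
titles = pd.Series(["Python Data Science Handbook", "Deep Learning", None, "BIG DATA"])

# case=False ignores letter case; na=False maps missing titles to False
mask = titles.str.contains("data", case=False, na=False)

print(mask.tolist())
print(titles[mask].tolist())
```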

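Using map with a lambda expression

Example: keep every row in which the first or second word of some author's name starts with "E". The expression below assumes that every Authors cell is a list and that every name has at least two words; a missing value or a one-word name raises an exception.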


author_e_df = all_df[all_df['Authors'].map(lambda row: any(map(lambda x: x.split()[0][0]=='E' or x.split()[1][0]=='E', row)))]
author_e_df.head()
| | kind | id | etag | selfLink | Title | volumeInfo.subtitle | Authors | volumeInfo.publishedDate | volumeInfo.description | volumeInfo.industryIdentifiers | ... | volumeInfo.categories | saleInfo.listPrice.amount | saleInfo.listPrice.currencyCode | saleInfo.retailPrice.amount | saleInfo.retailPrice.currencyCode | saleInfo.buyLink | saleInfo.offers | accessInfo.epub.acsTokenLink | Topic | accessInfo.pdf.acsTokenLink |
| 7 | books#volume | xDszEAAAQBAJ | ju27MhIAQrM | https://www.googleapis.com/books/v1/volumes/xD... | Build a Career in Data Science | NaN | [Emily Robinson, Jacqueline Nolis] | 2020-03-06 | Summary You are going to need more than techni... | [{'type': 'ISBN_13', 'identifier': '9781638350... | ... | [Computers] | 28.99 | USD | 28.99 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | http://books.google.com/books/download/Build_a... | Data Science | NaN |
| 7 | books#volume | fBPEAgAAQBAJ | zEKoyMUn6e8 | https://www.googleapis.com/books/v1/volumes/fB... | Beginning Statistics with Data Analysis | NaN | [Frederick Mosteller, Stephen E. Fienberg, Rob... | 2013-11-20 | This introduction to the world of statistics c... | [{'type': 'ISBN_13', 'identifier': '9780486782... | ... | [Mathematics] | 24.95 | USD | 14.72 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | http://books.google.com/books/download/Beginni... | Data Analysis | http://books.google.com/books/download/Beginni... |
| 3 | books#volume | NP5bBAAAQBAJ | evXnYOuFPGY | https://www.googleapis.com/books/v1/volumes/NP... | Introduction to Machine Learning | NaN | [Ethem Alpaydin] | 2014-08-29 | The goal of machine learning is to program com... | [{'type': 'ISBN_13', 'identifier': '9780262028... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Machine Learning | NaN |
| 4 | books#volume | AGQ4DQAAQBAJ | 4KxWueVGUyI | https://www.googleapis.com/books/v1/volumes/AG... | Machine Learning | The New AI | [Ethem Alpaydin] | 2016-10-07 | A concise overview of machine learning—compute... | [{'type': 'ISBN_13', 'identifier': '9780262529... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Machine Learning | NaN |
| 5 | books#volume | LrT4DwAAQBAJ | RCyLWjQWJQQ | https://www.googleapis.com/books/v1/volumes/Lr... | Introduction to Deep Learning | NaN | [Eugene Charniak] | 2019-01-29 | A project-based guide to the basics of deep le... | [{'type': 'ISBN_13', 'identifier': '9780262039... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Deep Learning | NaN |

5 rows × 52 columns
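
The nested lambda above is compact but fragile. A more defensive variant, sketched below, moves the test into a named function that tolerates missing author lists and one-word names. author_token_starts_with is a hypothetical helper introduced here, and the demo frame is made up; like the original expression, it only inspects the first two words of each name.

```python
import pandas as pd


def author_token_starts_with(authors, letter="E"):
    """True if the first or second word of any author name starts with letter."""
    if not isinstance(authors, list):  # e.g. NaN when a book lists no authors
        return False
    return any(
        part.startswith(letter)
        for name in authors
        for part in name.split()[:2]  # slicing never raises on one-word names
    )


# Made-up data covering the edge cases
demo = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Authors": [["Ethem Alpaydin"], ["Stephen E. Fienberg"], ["Madonna"], float("nan")],
})

mask = demo["Authors"].map(author_token_starts_with)
print(demo[mask]["Title"].tolist())
```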
