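Data Science [3]: Basic DataFrame Operations (Part 2)
Google API
Use the Google Books API (https://developers.google.com/books/docs/overview) to fetch the data; in practice this means building a query URL and requesting it. Access may require a VPN or proxy in some regions.
Example: search for books on a given topic
Note: json.loads() returns the parsed JSON as a Python object (here, a dict).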
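import json

import pandas as pd
import requests


def get(topic=""):
    """
    Google Books API
    See: https://developers.google.com/books/
    """
    BASEURL = 'https://www.googleapis.com/books/v1/volumes'
    headers = {'Content-Type': 'application/json'}
    # Pass the query via `params` so requests URL-encodes it (e.g. the space in "Data Science").
    response = requests.get(BASEURL, params={'q': topic}, headers=headers)
    if response.status_code == 200:
        # Parse the JSON body into a Python dict.
        return json.loads(response.content.decode('utf-8'))
    # On a non-200 status, the raw Response object is returned instead of a dict.
    return response

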
python = get("Python")
data_science = get("Data Science")
data_analytics = get("Data Analysis")
machine_learning = get("Machine Learning")
deep_learning = get("Deep Learning")
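As a rough, offline illustration of the two steps inside get(), the sketch below builds the query URL with the standard library's urllib.parse.urlencode and parses a response body with json.loads. The helper name build_url and the sample_body string are made up for this example; sample_body is a heavily trimmed stand-in for a real response that only mirrors the items / volumeInfo nesting used later in this post, so no network access is needed.

```python
import json
from urllib.parse import urlencode

BASEURL = 'https://www.googleapis.com/books/v1/volumes'


def build_url(topic):
    # urlencode escapes characters that are not URL-safe; a space becomes '+'.
    return BASEURL + "?" + urlencode({'q': topic})


# Hypothetical, trimmed response body used only for illustration.
sample_body = '{"items": [{"id": "abc", "volumeInfo": {"title": "Example Book"}}]}'

url = build_url("Data Science")
parsed = json.loads(sample_body)

print(url)
print(type(parsed).__name__)  # json.loads gives back a plain Python dict
print(parsed['items'][0]['volumeInfo']['title'])
```

Because the result is an ordinary dict, the list of books is reachable as parsed['items'], which is the part that gets converted to a DataFrame in the next step.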
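Converting JSON to a DataFrame
Use the pd.json_normalize function, which flattens nested JSON fields into columns with dotted names such as volumeInfo.title.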
def json2df(book_json):
    # Each entry in book_json['items'] is one book; nested keys become dotted column names.
    return pd.json_normalize(book_json['items'])


python_df = json2df(python)
data_science_df = json2df(data_science)
data_analytics_df = json2df(data_analytics)
machine_learning_df = json2df(machine_learning)
deep_learning_df = json2df(deep_learning)
python_df.to_csv("python.csv", index=False)
data_science_df.to_csv("data_science.csv", index=False)
data_analytics_df.to_csv("data_analytics.csv", index=False)
machine_learning_df.to_csv("machine_learning.csv", index=False)
deep_learning_df.to_csv("deep_learning.csv", index=False)
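To see what the flattening does, here is a minimal sketch on two hand-written records. The records are hypothetical and only imitate the shape of entries in book_json['items']; the second one deliberately lacks an authors key to show where NaN values in the resulting tables can come from.

```python
import pandas as pd

# Hypothetical records shaped like entries of book_json['items'].
items = [
    {"id": "a1", "volumeInfo": {"title": "Book One", "authors": ["Ann Lee"]}},
    {"id": "b2", "volumeInfo": {"title": "Book Two"}},  # no "authors" key
]

flat_df = pd.json_normalize(items)

# Nested keys are joined with a dot; list values are kept as Python lists.
print(flat_df.columns.tolist())
# A key missing from one record shows up as a missing value in that row.
print(flat_df["volumeInfo.authors"].isna().tolist())
```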
Renaming DataFrame columns
Use the DataFrame's rename method.
python_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
data_science_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
data_analytics_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
machine_learning_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
deep_learning_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
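The five rename calls share the same mapping, so one possible way to cut the repetition is to keep the mapping in a single dict and apply it in a loop. The sketch below uses two tiny made-up DataFrames in place of the real per-topic ones; the names RENAMES, frames, and renamed are introduced here only for illustration.

```python
import pandas as pd

RENAMES = {'volumeInfo.title': 'Title', 'volumeInfo.authors': 'Authors'}

# Hypothetical stand-ins for two of the per-topic DataFrames.
frames = {
    'python': pd.DataFrame({'volumeInfo.title': ['A'], 'volumeInfo.authors': [['X Y']]}),
    'deep_learning': pd.DataFrame({'volumeInfo.title': ['B'], 'volumeInfo.authors': [['Z W']]}),
}

# Without inplace=True, rename returns a new DataFrame and leaves the original untouched.
renamed = {name: df.rename(columns=RENAMES) for name, df in frames.items()}

print(renamed['python'].columns.tolist())
print(frames['python'].columns.tolist())
```

Returning new DataFrames instead of using inplace=True keeps the raw API column names available if they are needed again later.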
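Adding a column to a DataFrame
Assign a Series to a new column name to add the column, then use the pd.concat function to combine the per-topic DataFrames into one.
Example: add a "Topic" column to each topic's DataFrame.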
python_df['Topic'] = pd.Series(['Python']*python_df.shape[0])
data_science_df['Topic'] = pd.Series(['Data Science']*data_science_df.shape[0])
data_analytics_df['Topic'] = pd.Series(['Data Analysis']*data_analytics_df.shape[0])
machine_learning_df['Topic'] = pd.Series(['Machine Learning']*machine_learning_df.shape[0])
deep_learning_df['Topic'] = pd.Series(['Deep Learning']*deep_learning_df.shape[0])
all_df = pd.concat([python_df, data_science_df, data_analytics_df, machine_learning_df, deep_learning_df])
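A shorter variant of the same idea, sketched on made-up two-column DataFrames: assigning a plain string broadcasts it to every row, so building a Series of repeated values is not required. The sketch also shows that pd.concat keeps each input's row labels by default, which is why the index values in the outputs further down repeat; passing ignore_index=True renumbers the rows instead.

```python
import pandas as pd

# Hypothetical stand-ins for two of the per-topic DataFrames.
python_df = pd.DataFrame({'Title': ['A', 'B']})
deep_learning_df = pd.DataFrame({'Title': ['C']})

# Assigning a scalar fills the whole column with that value.
python_df['Topic'] = 'Python'
deep_learning_df['Topic'] = 'Deep Learning'

kept = pd.concat([python_df, deep_learning_df])                      # original labels kept
fresh = pd.concat([python_df, deep_learning_df], ignore_index=True)  # rows renumbered

print(kept.index.tolist())
print(fresh.index.tolist())
```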
Exporting a DataFrame to CSV
Use the to_csv method.
all_df.to_csv("all_topics.csv", index=False)
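One thing to keep in mind when reading such a file back: CSV cells are text, so list-valued columns like Authors come back as strings. The sketch below demonstrates this on a one-row made-up DataFrame, writing to an in-memory buffer instead of a file; using ast.literal_eval to recover the list is one possible approach, not something the original workflow does.

```python
import ast
import io

import pandas as pd

# Hypothetical one-row DataFrame with a list-valued column.
df = pd.DataFrame({'Title': ['Book One'], 'Authors': [['Ann Lee', 'Bob Ray']]})

# Write to an in-memory buffer so the example leaves no file behind.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

raw_value = loaded.loc[0, 'Authors']    # the list's text representation
restored = ast.literal_eval(raw_value)  # parse the text back into a list

print(type(raw_value).__name__)
print(restored)
```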
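Filtering DataFrame rows by value
Use the .str accessor with string methods
Example: get all rows whose Title contains "data"
# Lowercase the titles first so the match is case-insensitive.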
contain_data_df = all_df[all_df['Title'].str.lower().str.contains("data")]
contain_data_df.head()
| | kind | id | etag | selfLink | Title | volumeInfo.subtitle | Authors | volumeInfo.publishedDate | volumeInfo.description | volumeInfo.industryIdentifiers | ... | volumeInfo.categories | saleInfo.listPrice.amount | saleInfo.listPrice.currencyCode | saleInfo.retailPrice.amount | saleInfo.retailPrice.currencyCode | saleInfo.buyLink | saleInfo.offers | accessInfo.epub.acsTokenLink | Topic | accessInfo.pdf.acsTokenLink |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | books#volume | 6omNDQAAQBAJ | 8i8xnUEyo14 | https://www.googleapis.com/books/v1/volumes/6o... | Python Data Science Handbook | Essential Tools for Working with Data | [Jake VanderPlas] | 2016-11-21 | For many researchers, Python is a first-class ... | [{'type': 'ISBN_13', 'identifier': '9781491912... | ... | [Computers] | 59.99 | USD | 59.99 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | NaN | Python | NaN |
0 | books#volume | vfi3DQAAQBAJ | Q0KYt+x/bgk | https://www.googleapis.com/books/v1/volumes/vf... | R for Data Science | Import, Tidy, Transform, Visualize, and Model ... | [Hadley Wickham, Garrett Grolemund] | 2016-12-12 | "This book introduces you to R, RStudio, and t... | [{'type': 'ISBN_13', 'identifier': '9781491910... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Data Science | NaN |
1 | books#volume | TFpVDwAAQBAJ | MqNMwqcnbUk | https://www.googleapis.com/books/v1/volumes/TF... | Data Science | NaN | [John D. Kelleher, Brendan Tierney] | 2018-04-13 | A concise introduction to the emerging field o... | [{'type': 'ISBN_13', 'identifier': '9780262535... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Data Science | NaN |
2 | books#volume | 6omNDQAAQBAJ | 86otz4UmKRE | https://www.googleapis.com/books/v1/volumes/6o... | Python Data Science Handbook | Essential Tools for Working with Data | [Jake VanderPlas] | 2016-11-21 | For many researchers, Python is a first-class ... | [{'type': 'ISBN_13', 'identifier': '9781491912... | ... | [Computers] | 59.99 | USD | 59.99 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | NaN | Data Science | NaN |
3 | books#volume | xb29DwAAQBAJ | QKJ7stkk3Ac | https://www.googleapis.com/books/v1/volumes/xb... | Introduction to Data Science | Data Analysis and Prediction Algorithms with R | [Rafael A. Irizarry] | 2019-11-20 | Introduction to Data Science: Data Analysis an... | [{'type': 'ISBN_13', 'identifier': '9781000708... | ... | [Mathematics] | NaN | NaN | NaN | NaN | NaN | NaN | http://books.google.com/books/download/Introdu... | Data Science | http://books.google.com/books/download/Introdu... |
5 rows × 52 columns
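The same filter can also be written with the case and na parameters of str.contains, which may be a little more robust when some titles are missing: a missing title would otherwise produce a missing value in the mask rather than True or False. The titles_df below is a made-up example, not data from the API.

```python
import pandas as pd

# Hypothetical titles, including one missing value.
titles_df = pd.DataFrame(
    {'Title': ['Python Data Science Handbook', 'Deep Learning', None, 'DATA Mining']}
)

# case=False makes the match case-insensitive; na=False treats missing titles as non-matches.
mask = titles_df['Title'].str.contains('data', case=False, na=False)
matches = titles_df[mask]

print(mask.tolist())
print(matches['Title'].tolist())
```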
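map with a lambda expression
Example: select all rows where any author's first or last name starts with E. The code below checks the first two words of each author's name.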
author_e_df = all_df[all_df['Authors'].map(lambda row: any(map(lambda x: x.split()[0][0]=='E' or x.split()[1][0]=='E', row)))]
author_e_df.head()
| | kind | id | etag | selfLink | Title | volumeInfo.subtitle | Authors | volumeInfo.publishedDate | volumeInfo.description | volumeInfo.industryIdentifiers | ... | volumeInfo.categories | saleInfo.listPrice.amount | saleInfo.listPrice.currencyCode | saleInfo.retailPrice.amount | saleInfo.retailPrice.currencyCode | saleInfo.buyLink | saleInfo.offers | accessInfo.epub.acsTokenLink | Topic | accessInfo.pdf.acsTokenLink |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | books#volume | xDszEAAAQBAJ | ju27MhIAQrM | https://www.googleapis.com/books/v1/volumes/xD... | Build a Career in Data Science | NaN | [Emily Robinson, Jacqueline Nolis] | 2020-03-06 | Summary You are going to need more than techni... | [{'type': 'ISBN_13', 'identifier': '9781638350... | ... | [Computers] | 28.99 | USD | 28.99 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | http://books.google.com/books/download/Build_a... | Data Science | NaN |
7 | books#volume | fBPEAgAAQBAJ | zEKoyMUn6e8 | https://www.googleapis.com/books/v1/volumes/fB... | Beginning Statistics with Data Analysis | NaN | [Frederick Mosteller, Stephen E. Fienberg, Rob... | 2013-11-20 | This introduction to the world of statistics c... | [{'type': 'ISBN_13', 'identifier': '9780486782... | ... | [Mathematics] | 24.95 | USD | 14.72 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | http://books.google.com/books/download/Beginni... | Data Analysis | http://books.google.com/books/download/Beginni... |
3 | books#volume | NP5bBAAAQBAJ | evXnYOuFPGY | https://www.googleapis.com/books/v1/volumes/NP... | Introduction to Machine Learning | NaN | [Ethem Alpaydin] | 2014-08-29 | The goal of machine learning is to program com... | [{'type': 'ISBN_13', 'identifier': '9780262028... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Machine Learning | NaN |
4 | books#volume | AGQ4DQAAQBAJ | 4KxWueVGUyI | https://www.googleapis.com/books/v1/volumes/AG... | Machine Learning | The New AI | [Ethem Alpaydin] | 2016-10-07 | A concise overview of machine learning—compute... | [{'type': 'ISBN_13', 'identifier': '9780262529... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Machine Learning | NaN |
5 | books#volume | LrT4DwAAQBAJ | RCyLWjQWJQQ | https://www.googleapis.com/books/v1/volumes/Lr... | Introduction to Deep Learning | NaN | [Eugene Charniak] | 2019-01-29 | A project-based guide to the basics of deep le... | [{'type': 'ISBN_13', 'identifier': '9780262039... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Deep Learning | NaN |
5 rows × 52 columns
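The one-line lambda above assumes every Authors value is a list and every name has at least two words; a single-word name would raise an IndexError and a missing value a TypeError. A more defensive sketch is shown below. The helper has_initial and the books DataFrame are made up for illustration, and the helper treats the final word of a name as the last name, which is an assumption and differs from the original check on the second word (so a middle initial such as "E." no longer counts).

```python
import pandas as pd


def has_initial(authors, letter='E'):
    """Return True if any author's first or last word starts with `letter`."""
    if not isinstance(authors, list):
        return False  # covers NaN for books with no listed authors
    for name in authors:
        words = name.split()
        if words and (words[0][0] == letter or words[-1][0] == letter):
            return True
    return False


# Hypothetical rows: a first-name match, a single-word name, a missing value,
# a non-match, and a last-name match.
books = pd.DataFrame({
    'Title': ['T1', 'T2', 'T3', 'T4', 'T5'],
    'Authors': [
        ['Emily Robinson', 'Jacqueline Nolis'],
        ['Madonna'],
        float('nan'),
        ['Jake VanderPlas'],
        ['Sam Evans'],
    ],
})

author_e_df = books[books['Authors'].map(has_initial)]
print(author_e_df['Title'].tolist())
```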