DataCamp Intermediate Python study notes (001)

小小白的新手之路

Category: datacamp学习笔记 (DataCamp study notes). Tags: data analysis, indexing

Article link: https://blog.csdn.net/weixin_42002280/article/details/107624217

DataCamp Data Scientist track: Intermediate Python study notes (001)

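Personal note: I first came across Python in January 2017, and my learning since then has been on and off: learn, forget, relearn. A few days ago I picked the basics back up through DataCamp and realized it was worth writing the key points down as a systematic set of notes. That is how these notes came about. I also plan to write study notes for SQL later. Studying alone is hard! Keep going, and keep writing!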

1.Dictionaries & Pandas
2.Logic, Control Flow and Filtering
3.Loops
4.Functions to study in more depth

1.Dictionaries & Pandas
1.1 Basic dictionary operations
1.1.1 Motivation for dictionaries (looking up values by list index)

Hint:

Use the index() method on countries to find the index of 'germany'. Store this index as ind_ger.
Then use ind_ger to print the matching entry of capitals.

Code

# Definition of countries and capital
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
# Get index of 'germany': ind_ger
ind_ger=countries.index('germany')
# Use ind_ger to print out capital of Germany
print(capitals[ind_ger])
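
The two-list lookup only works while both lists stay in the same order. As a small sketch of why dictionaries are the natural next step, the two lists can be zipped into one mapping so that no index is needed. The name capital_of is my own and does not come from the course.

```python
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']

# Two-step lookup: find the position, then index the second list
ind_ger = countries.index('germany')
capital_via_lists = capitals[ind_ger]

# One-step lookup: pair the lists up into a dictionary
# (capital_of is a hypothetical name used only for this illustration)
capital_of = dict(zip(countries, capitals))
capital_via_dict = capital_of['germany']

print(capital_via_lists, capital_via_dict)
```

Both lookups return the same capital; the dictionary version simply skips the intermediate index.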

1.1.2 Access dictionary (creating a dictionary and reading from it)

Hint:
Example: europe['france']

Check out which keys are in europe by calling the keys() method on europe. Print out the result.
Print out the value that belongs to the key 'norway'.

Code

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
# Print out the keys in europe
print(europe.keys())
# Print out value that belongs to key 'norway'
print(europe['norway'])

Output (from an older Python version; since Python 3.7, keys() lists the keys in insertion order)

dict_keys(['norway', 'spain', 'france', 'germany'])
oslo
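
Square brackets assume the key exists. The following minimal sketch, which goes slightly beyond the exercise, contrasts that with the standard dict.get() method, which returns a default instead of raising. The variable names are my own.

```python
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}

# On Python 3.7+ the keys come back in insertion order
key_list = list(europe.keys())

# Square brackets raise KeyError when the key is missing
try:
    europe['italy']
    missing_raised = False
except KeyError:
    missing_raised = True

# .get() returns None (or a default you supply) instead of raising
italy_capital = europe.get('italy')
italy_with_default = europe.get('italy', 'unknown')

print(key_list, missing_raised, italy_capital, italy_with_default)
```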

1.1.3 Dictionary Manipulation (1) (adding entries)

Hint:

Add the key 'italy' with the value 'rome' to europe.
To assert that 'italy' is now a key in europe, print out 'italy' in europe.
Add another key:value pair to europe: 'poland' is the key, 'warsaw' is the corresponding value.

Code

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
# Add italy to europe
europe['italy']='rome'
# Print out italy in europe
print('italy' in europe)
# Add poland to europe
europe['poland']='warsaw'
# Print europe
print(europe)

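1.1.4 Dictionary Manipulation (2) (updating and deleting entries)

Hint:

Update the value stored under an existing key (the capital of Germany is listed as 'bonn' and should be 'berlin').
Delete a key-value pair from the dictionary (the 'australia' entry does not belong in europe).

Code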

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'bonn',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw',
          'australia':'vienna' }
# Update capital of germany
europe['germany']='berlin'
# Remove australia
del europe['australia']
# Print europe
print(europe)
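
Besides item assignment and del, the built-in dict methods update() and pop() cover the same two operations. This is a small sketch on a shortened version of the dictionary; the variable names removed and removed_again are my own.

```python
europe = {'germany': 'bonn', 'norway': 'oslo', 'australia': 'vienna'}

# update() can change (or add) several entries in one call
europe.update({'germany': 'berlin'})

# pop() removes a key and hands back the value that was stored there
removed = europe.pop('australia')

# Supplying a default avoids KeyError when the key is already gone
removed_again = europe.pop('australia', None)

print(europe, removed, removed_again)
```

pop() is handy when the removed value is still needed; del is enough when it is not.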

1.1.5 Dictionariception (nested dictionaries)

Hint:

Use chained square brackets to print the capital of France.
Create a new dictionary.
Nest the new dictionary inside the first one.

Code

# Dictionary of dictionaries
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }
# Print out the capital of France
print(europe['france']['capital'])
# Create sub-dictionary data
data={'capital':'rome','population':59.83}
# Add data to europe under key 'italy'
europe['italy']=data
# Print europe
print(europe)

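Output (from an older Python version; since Python 3.7 the countries print in insertion order, with italy last)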

paris
{'italy': {'population': 59.83, 'capital': 'rome'}, 'norway': {'population': 5.084, 'capital': 'oslo'}, 'spain': {'population': 46.77, 'capital': 'madrid'}, 'france': {'population': 66.03, 'capital': 'paris'}, 'germany': {'population': 80.62, 'capital': 'berlin'}}
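
Once the values are themselves dictionaries, looping with items() is a convenient way to read them back. A minimal sketch on a shortened europe dictionary; the flat mapping capital_of is a hypothetical name added for illustration.

```python
europe = {'spain': {'capital': 'madrid', 'population': 46.77},
          'france': {'capital': 'paris', 'population': 66.03}}
europe['italy'] = {'capital': 'rome', 'population': 59.83}

# .items() yields (country, inner_dict) pairs
for country, info in europe.items():
    print(country, info['capital'], info['population'])

# Flatten to a country -> capital mapping with a dict comprehension
capital_of = {country: info['capital'] for country, info in europe.items()}
print(capital_of)
```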

1.2 pandas DataFrame basics
1.2.1 Dictionary to DataFrame (1) (converting a dictionary to a DataFrame)

Hint:
Use pd.DataFrame() to turn your dict into a DataFrame called cars.

Code

# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
# Import pandas as pd
import pandas as pd
# Create dictionary my_dict with three key:value pairs: my_dict
my_dict={'country':names,'drives_right':dr,'cars_per_cap':cpc}
# Build a DataFrame cars from my_dict: cars
cars=pd.DataFrame(my_dict)
# Print cars
print(cars)

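Output (older pandas sorted the columns alphabetically; current versions keep the dictionary's key order: country, drives_right, cars_per_cap)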

   cars_per_cap        country  drives_right
0           809  United States          True
1           731      Australia         False
2           588          Japan         False
3            18          India         False
4           200         Russia          True
5            70        Morocco          True
6            45          Egypt          True

1.2.2 Dictionary to DataFrame (2) (adding row labels to a DataFrame)

Hint:
Workflow: lists → dictionary → DataFrame
Specify the row labels by setting cars.index equal to row_labels.

Code

import pandas as pd
# Build cars DataFrame
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
cars = pd.DataFrame(cars_dict)
print(cars)
# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
# Specify row labels of cars
cars.index=row_labels
# Print cars again
print(cars)

Output

   cars_per_cap        country  drives_right
0           809  United States          True
1           731      Australia         False
2           588          Japan         False
3            18          India         False
4           200         Russia          True
5            70        Morocco          True
6            45          Egypt          True
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
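
Assigning cars.index afterwards is what the exercise asks for. As an alternative sketch, the pd.DataFrame() constructor also accepts index= and columns= arguments, so the labels and the column order can be fixed in one step.

```python
import pandas as pd

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# index= sets the row labels at construction time;
# columns= selects and orders the dictionary keys explicitly
cars = pd.DataFrame(
    {'country': names, 'drives_right': dr, 'cars_per_cap': cpc},
    index=row_labels,
    columns=['cars_per_cap', 'country', 'drives_right'],
)
print(cars)
```

Passing columns= explicitly also makes the printed layout independent of the pandas version.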

1.2.3 CSV to DataFrame (1) (reading a CSV file into a DataFrame)

Hint:
pd.read_csv()

Code

import pandas as pd
#Import the cars.csv data: cars
cars=pd.read_csv('cars.csv')
#Print out cars
print(cars)

Output (the row labels land in an ordinary column named Unnamed: 0)

  Unnamed: 0  cars_per_cap        country  drives_right
0         US           809  United States          True
1        AUS           731      Australia         False
2        JPN           588          Japan         False
3         IN            18          India         False
4         RU           200         Russia          True
5        MOR            70        Morocco          True
6         EG            45          Egypt          True

1.2.4 CSV to DataFrame (2) (setting the row index while reading the CSV)

Hint:
Use the first column as the row index.

Code

# Import pandas as pd
import pandas as pd
# Fix import by including index_col
cars = pd.read_csv('cars.csv',index_col=0)
# Print out cars
print(cars)

Output

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
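
The cars.csv file itself is not included in these notes, so the sketch below uses an in-memory stand-in to show the effect of index_col. The CSV text is an assumption: three rows written to match the tables above, with an empty header for the first column.

```python
import io
import pandas as pd

# Stand-in for cars.csv (assumed layout: the first column has an empty header)
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
"""

# Without index_col the labels become a regular column called 'Unnamed: 0'
plain = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column becomes the row index
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())
print(indexed.index.tolist())
```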

1.3 pandas DataFrame filtering
1.3.1 Square Brackets (1)

Hint:

Use single square brackets to print out the country column of cars as a Pandas Series.
Use double square brackets to print out the country column of cars as a Pandas DataFrame.
Use double square brackets to print out a DataFrame with both the country and drives_right columns of cars, in this order.

Code

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars)
### output
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
# Print out country column as Pandas Series
print(cars['country'])
### output
US     United States
AUS        Australia
JPN            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
# Print out country column as Pandas DataFrame
print(cars[['country']])
### output
           country
US   United States
AUS      Australia
JPN          Japan
IN           India
RU          Russia
MOR        Morocco
EG           Egypt
# Print out DataFrame with country and drives_right columns
print(cars[['country','drives_right']])
### output
           country  drives_right
US   United States          True
AUS      Australia         False
JPN          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True
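
The printed outputs look similar, so here is a small check of what single and double brackets actually return. The three-row cars frame is a shortened copy of the data above, rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588],
     'country': ['United States', 'Australia', 'Japan'],
     'drives_right': [True, False, False]},
    index=['US', 'AUS', 'JPN'],
)

as_series = cars['country']      # single brackets -> Series (1-dimensional)
as_frame = cars[['country']]     # double brackets -> one-column DataFrame (2-dimensional)

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```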

1.3.2 Square Brackets (2) (selecting specific rows of a DataFrame)

Hint:

Select the first 3 observations from cars and print them out.
Select the fourth, fifth and sixth observation, corresponding to row indexes 3, 4 and 5, and print them out.

Code

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars)
###output
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

# Print out first 3 observations
print(cars[0:3])
###output
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
# Print out fourth, fifth and sixth observation
print(cars.iloc[[3,4,5]])
###output
     cars_per_cap  country  drives_right
IN             18    India         False
RU            200   Russia          True
MOR            70  Morocco          True

1.3.3 loc and iloc (1) (selecting rows by row position or row label)

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

Hint:
The following two commands give the same result:
cars.loc['RU']
cars.iloc[4]
Both return a pandas.core.series.Series.

In [8]: cars.loc['RU']
Out[8]: 
cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object

In [9]: cars.iloc[4]
Out[9]: 
cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object

In [10]: type(cars.loc['RU'])
Out[10]: pandas.core.series.Series

Hint:
The following two commands give the same result (a one-row DataFrame):
cars.loc[['RU']]
cars.iloc[[4]]

In [5]: cars.loc[['RU']]
Out[5]: 
    cars_per_cap country  drives_right
RU           200  Russia          True

In [6]: type(cars.loc[['RU']])
Out[6]: pandas.core.frame.DataFrame

In [7]: cars.iloc[[4]]
Out[7]: 
    cars_per_cap country  drives_right
RU           200  Russia          True

Hint:
The following two commands give the same result:
cars.loc[['RU', 'AUS']]
cars.iloc[[4, 1]]

In [13]: cars.loc[['RU', 'AUS']]
Out[13]: 
     cars_per_cap    country  drives_right
RU            200     Russia          True
AUS           731  Australia         False

In [14]: cars.iloc[[4, 1]]
Out[14]: 
     cars_per_cap    country  drives_right
RU            200     Russia          True
AUS           731  Australia         False
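
To convince myself that the label-based and position-based forms really match, the sketch below rebuilds the cars table inline and compares them with DataFrame.equals(). Index.get_loc() is used to translate a label into its position; the variable names are my own.

```python
import pandas as pd

cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
     'country': ['United States', 'Australia', 'Japan', 'India',
                 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)

# Translate a row label into its integer position
ru_position = cars.index.get_loc('RU')

row_series = cars.loc['RU']       # single label -> Series
row_frame = cars.loc[['RU']]      # list of labels -> DataFrame

# Label-based and position-based selection return equal results
same_rows = cars.loc[['RU', 'AUS']].equals(cars.iloc[[4, 1]])

print(ru_position, type(row_series).__name__, type(row_frame).__name__, same_rows)
```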

1.3.4 loc and iloc (2) (selecting a region of several rows and columns)

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

Hint:
The following two commands give the same result:
cars.loc['IN', 'cars_per_cap']
cars.iloc[3, 0]

In [1]: cars.loc['IN', 'cars_per_cap']
Out[1]: 18

In [2]: cars.iloc[3, 0]
Out[2]: 18

Hint:
The following two commands give the same result:
cars.loc[['IN', 'RU'], 'cars_per_cap']
cars.iloc[[3, 4], 0]

In [3]: cars.loc[['IN', 'RU'], 'cars_per_cap']
Out[3]: 
IN     18
RU    200
Name: cars_per_cap, dtype: int64

In [4]: cars.iloc[[3, 4], 0]
Out[4]: 
IN     18
RU    200
Name: cars_per_cap, dtype: int64

Hint:
The following two commands give the same result:
cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]
cars.iloc[[3, 4], [0, 1]]

In [5]: cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]
Out[5]: 
    cars_per_cap country
IN            18   India
RU           200  Russia

In [6]: cars.iloc[[3, 4], [0, 1]]
Out[6]: 
    cars_per_cap country
IN            18   India
RU           200  Russia

# Print out drives_right value of Morocco
print(cars.loc[['MOR'],['drives_right']])
#output
     drives_right
MOR          True
# Print sub-DataFrame
print(cars.loc[['RU','MOR'],['country','drives_right']])
#output
     country  drives_right
RU    Russia          True
MOR  Morocco          True

1.3.5 loc and iloc (3) (selecting one or more columns across all rows)

Hint:
The following two commands give the same result:
cars.loc[:, 'country']
cars.iloc[:, 1]

In [2]: cars.loc[:, 'country']
Out[2]: 
US     United States
AUS        Australia
JPN            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object

In [3]: cars.iloc[:, 1]
Out[3]: 
US     United States
AUS        Australia
JPN            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object

Hint:
The following two commands give the same result:
cars.loc[:, ['country', 'drives_right']]
cars.iloc[:, [1, 2]]

In [4]: cars.loc[:, ['country','drives_right']]
Out[4]: 
           country  drives_right
US   United States          True
AUS      Australia         False
JPN          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True

In [5]: cars.iloc[:, [1, 2]]
Out[5]: 
           country  drives_right
US   United States          True
AUS      Australia         False
JPN          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True

In [6]: # Print out drives_right column as Series
... print(cars.loc[:,'drives_right'])
###output
US      True
AUS    False
JPN    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool

In [7]: # Print out drives_right column as DataFrame
... print(cars.loc[:,['drives_right']])
###output
     drives_right
US           True
AUS         False
JPN         False
IN          False
RU           True
MOR          True
EG           True

In [8]: # Print out cars_per_cap and drives_right as DataFrame
... print(cars.loc[:,['cars_per_cap','drives_right']])
###output
     cars_per_cap  drives_right
US            809          True
AUS           731         False
JPN           588         False
IN             18         False
RU            200          True
MOR            70          True
EG             45          True

2. Logic, Control Flow and Filtering
2.1 Boolean operators with array
2.1.1 Boolean operators with Numpy (using boolean operators to compare arrays)

Hint:
To use these operators with Numpy, you will need np.logical_and(), np.logical_or() and np.logical_not(). Here's an example on the my_house and your_house arrays from before to give you an idea:

# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house>18.5,my_house<10))
#output
[False  True False  True]
# Both my_house and your_house smaller than 11
print(np.logical_and(my_house<11,your_house<11))
#output
[False False False  True]
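
The element-wise results are easy to get wrong by eye, so here is a self-contained re-run of both comparisons. It also shows the operator forms | and &, which NumPy supports for boolean arrays as an alternative to np.logical_or() and np.logical_and(); the result variable names are my own.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Function forms, as used in the course
either = np.logical_or(my_house > 18.5, my_house < 10)
both_small = np.logical_and(my_house < 11, your_house < 11)

# Operator forms; the parentheses are required because | and &
# bind more tightly than the comparison operators
either_op = (my_house > 18.5) | (my_house < 10)
both_small_op = (my_house < 11) & (your_house < 11)

print(either)
print(both_small)
```

Only the last room is smaller than 11 in both houses, so the second result has a single True.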

2.2 Filtering pandas DataFrames (filtering a DataFrame by value conditions)
Data overview

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars)
#output
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

2.2.1 Driving right
Goal: select the countries that drive on the right

Step 1:

Extract the drives_right column as a Pandas Series and store it as dr.

In [4]: dr=cars['drives_right']
In [5]: dr
Out[5]: 
US      True
AUS    False
JPN    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool

Step 2:

Use dr, a boolean Series, to subset the cars DataFrame. Store the resulting selection in sel.

In [6]: # Use dr to subset cars: sel
... sel=cars[dr]

In [7]: sel
Out[7]: 
     cars_per_cap        country  drives_right
US            809  United States          True
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

Convert the code on the right to a one-liner that calculates the variable sel as before.
The steps above can be collapsed into a single line:

In [1]: sel = cars[cars['drives_right']]
   ...: print(sel)
##output
     cars_per_cap        country  drives_right
US            809  United States          True
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

2.2.2 Cars per capita (1)
Goal: select the countries whose cars_per_cap value is greater than 500

Steps:

Select the cars_per_cap column from cars as a Pandas Series and store it as cpc.
Use cpc in combination with a comparison operator and 500. You want to end up with a boolean Series that's True if the corresponding country has a cars_per_cap of more than 500 and False otherwise. Store this boolean Series as many_cars.
Use many_cars to subset cars, similar to what you did before. Store the result as car_maniac.
Print out car_maniac to see if you got it right.

Code

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Create car_maniac: observations that have a cars_per_cap over 500
cpc=cars['cars_per_cap']
many_cars=cpc>500
car_maniac=cars[many_cars]
# Print car_maniac
print(car_maniac)

Output

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False

2.2.3 Cars per capita (2) (filtering on several conditions)

Task:
Use the code sample in the hint below to create a DataFrame medium, that includes all the observations of cars that have a cars_per_cap between 100 and 500.
Print out medium.
Hint:
Remember about np.logical_and(), np.logical_or() and np.logical_not(). The course's code sample (note that its thresholds are 10 and 80, not the ones the task asks for):
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 10, cpc < 80)
medium = cars[between]

Code

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Import numpy, you'll need this
import numpy as np
# Create medium: observations with cars_per_cap between 100 and 500
cpc=cars['cars_per_cap']
between=np.logical_and(cpc<500,cpc>100)
medium=cars[between]
# Print medium
print(medium)

Output

    cars_per_cap country  drives_right
RU           200  Russia          True

A look at the intermediate variables

In [2]: cpc
Out[2]: 
US     809
AUS    731
JPN    588
IN      18
RU     200
MOR     70
EG      45
Name: cars_per_cap, dtype: int64

In [3]: between
Out[3]: 
US     False
AUS    False
JPN    False
IN     False
RU      True
MOR    False
EG     False
Name: cars_per_cap, dtype: bool
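
The same filter can be written with the & operator on boolean Series instead of np.logical_and(). The sketch below rebuilds the relevant columns inline (so it does not need cars.csv) and checks that both spellings select the same rows; medium_np and medium_op are names I made up for the comparison.

```python
import numpy as np
import pandas as pd

cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
     'country': ['United States', 'Australia', 'Japan', 'India',
                 'Russia', 'Morocco', 'Egypt']},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)
cpc = cars['cars_per_cap']

# Function form, as in the exercise
medium_np = cars[np.logical_and(cpc > 100, cpc < 500)]

# Operator form; each comparison needs its own parentheses
medium_op = cars[(cpc > 100) & (cpc < 500)]

print(medium_op)
```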

3. Loops
3.1 Loop over Numpy array
3.1.1 Loop over Numpy array (printing each element of a 1D or 2D array)

In [3]: np_baseball[:6]
Out[3]: 
array([[ 74, 180],
       [ 74, 215],
       [ 72, 210],
       [ 72, 210],
       [ 73, 188],
       [ 69, 176]])

In [4]: np_height[:6]
Out[4]: array([74, 74, 72, 72, 73, 69])

# Import numpy as np
import numpy as np
# For loop over np_height (1D array)
for x in np_height:
    print('%s inches'%x)

# For loop over np_baseball (2D array): np.nditer visits every single element,
# in the order the array is laid out in memory
for i in np.nditer(np_baseball):
    print(i)
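
Whether np.nditer() walks a 2D array row by row or column by column depends on the array's memory layout and on the order argument. The sketch below uses the first three rows of np_baseball shown above, built with np.array() (which gives a row-major array), to compare the traversal orders; the list names are my own.

```python
import numpy as np

# First three rows of np_baseball from the output above
np_baseball = np.array([[74, 180], [74, 215], [72, 210]])

# Looping over the array directly yields one whole row at a time
rows = [row.tolist() for row in np_baseball]

# np.nditer yields single elements; for this row-major array
# the default order goes row by row
default_order = [int(x) for x in np.nditer(np_baseball)]

# order='F' requests column-major traversal: all heights, then all weights
column_order = [int(x) for x in np.nditer(np_baseball, order='F')]

print(rows)
print(default_order)
print(column_order)
```

If a loop prints a whole column first, the array was probably stored column-major, or order='F' was requested.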

3.2 Loop over DataFrame
3.2.1 Loop over DataFrame1

Hint: iterate over the rows by their index labels
for lab, row in cars.iterrows():

Data overview

In [2]: cars
Out[2]: 
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

Code

# Iterate over rows of cars
for lab,row in cars.iterrows():
    print(lab)
    print(row)

Output

US
cars_per_cap              809
country         United States
drives_right             True
Name: US, dtype: object
AUS
cars_per_cap          731
country         Australia
drives_right        False
Name: AUS, dtype: object
JPN
cars_per_cap      588
country         Japan
drives_right    False
Name: JPN, dtype: object
IN
cars_per_cap       18
country         India
drives_right    False
Name: IN, dtype: object
RU
cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object
MOR
cars_per_cap         70
country         Morocco
drives_right       True
Name: MOR, dtype: object
EG
cars_per_cap       45
country         Egypt
drives_right     True
Name: EG, dtype: object

3.2.2 Add column (1) (adding a column with a loop)

Use a for loop to add a new column, named COUNTRY, that contains an uppercase version of the country names in the "country" column. You can use the string method upper() for this.
To see if your code worked, print out cars. Don't indent this code, so that it's not part of the for loop.

Code

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Code for loop that adds COUNTRY column
for lab,row in cars.iterrows():
    cars.loc[lab,'COUNTRY']=row['country'].upper()
# Print cars
print(cars)

Output

     cars_per_cap        country  drives_right        COUNTRY
US            809  United States          True  UNITED STATES
AUS           731      Australia         False      AUSTRALIA
JPN           588          Japan         False          JAPAN
IN             18          India         False          INDIA
RU            200         Russia          True         RUSSIA
MOR            70        Morocco          True        MOROCCO
EG             45          Egypt          True          EGYPT

3.2.3 Add column (2) (adding a column with apply())

Use apply() to build the same column without an explicit loop.

Code

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Use .apply(str.upper)
cars["COUNTRY"] = cars["country"].apply(str.upper)
print(cars)

Output

     cars_per_cap        country  drives_right        COUNTRY
US            809  United States          True  UNITED STATES
AUS           731      Australia         False      AUSTRALIA
JPN           588          Japan         False          JAPAN
IN             18          India         False          INDIA
RU            200         Russia          True         RUSSIA
MOR            70        Morocco          True        MOROCCO
EG             45          Egypt          True          EGYPT

4.Functions to study in more depth

How to use apply()
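
As a starting point for that, here is a minimal sketch of Series.apply() on a shortened country column. apply() calls the given function once per element; for plain string operations pandas also provides the vectorized .str accessor, which gives the same result here. The variable names are my own.

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan']},
    index=['US', 'AUS', 'JPN'],
)

# apply() calls the function once for every element of the Series
via_apply = cars['country'].apply(str.upper)

# Any one-argument function works, for example the built-in len
name_lengths = cars['country'].apply(len)

# The .str accessor offers the same string operation in vectorized form
via_str = cars['country'].str.upper()

print(via_apply.tolist(), name_lengths.tolist())
```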

That's all for now; I'll keep improving these notes!
