【pandas cookbook学习笔记】第一、二章

本文介绍了MySQL中的存储过程和游标的使用。存储过程用于保存和重复执行复杂的SQL操作,可以处理业务规则和智能逻辑。游标则允许在存储过程中逐行处理数据,特别适用于循环和条件判断。同时,文中提到了如何创建和使用游标,并展示了在数据处理中游标的运用。此外,还探讨了数据类型和Pandas库在数据分析中的应用,包括DataFrame和Series的数据操作、数据信息查看以及数据处理方法。
摘要由CSDN通过智能技术生成

2021/3/8

《MySQL必知必会》

【180/254】
存储过程(procedure)被保存在服务器上以供使用。
存储过程(有点像MySQL的函数)的创建、删除、调用。
存储过程并不显示结果,而是把结果返回给你指定的变量。
它不能通过一个参数返回多个行和列;
MySQL中的变量全部都是以@开头,展示变量用select @variable1_name,@variable2_name;
存储过程的价值体现在业务规则和智能处理。
declare total decimal(8,2)定义变量total;

create procedure odertotal(
	in onumber int,
	in taxable boolean, #布尔值
	out ototal decimal(8,2)
) comment 'Obtain order total, optionally adding tax'
#comment值不是必需的,但如果给出,将在show procedure status的结果中显示
begin
	-- declare variable for total
	declare total decimal(8,2); #定义变量,及其数据类型
	--- declare taxrate percentage 
	declare taxrate int default 6; #可选默认值为6
	-- get the order total
	select sum(item_price*quantity)
	from orderitems
	where order_num=onumber
	into total;
	-- is this taxable?
	if taxable then 
		-- yes,so add taxrate to the total
		select total+(total/100*taxrate) into total;
	end if;
	-- and finally, save to our variable
	select total into ototal;
end;

2021/3/9

《MySQL必知必会》

【187/254】

  • 为什么使用游标?游标是什么?
    在这里插入图片描述

游标只存在于存储过程和函数,当存储过程处理完成后,游标消失。

  • 使用游标
    在这里插入图片描述

  • declare语句次序:它在游标或句柄之前,句柄必须在游标之后定义,否则报错。

create procedure processorders()
begin 
	-- declare local variable
	declare done boolean default 0;
	declare o int;
	declare t decimal(8,2);
	-- declare the cursor
	declare odernumbers cursor
	for 
	select oder_num from orders;
	-- declare contiune handler
	declare continue handler for sqlstate '02000' set done=1;
	-- create a table to store the results
	create table if not exist odertotals # 如果它不存在的话,创建一个新表
		(oder_num int , total decimal(8,2));
	-- open the cursor
	open ordernumbers;
	-- loop through all rows
	repeat
		-- get oder number
		fetch odernumbers into o
		-- get the total for this order
		call ordertotal(o,1,t);
		-- insert order and total into ordertotals
		insert into ordertotals(order_num,total)
		values(o,t);
		-- end of looop
		until done end repeat;
	-- close the cursor 
	close odernumbers;
end;
# 此存储过程不返回数据,但是创建并填充了另一个表
# 用select语句查看
select * from odertotals;

【194/254】
触发器(trigger)背景:有些处理需要在某个表发生更改时自动进行;
在这里插入图片描述
其他MySQL语句不支持触发器。

Pandas Cookbook

DataFrame : a two-diemensional dataset;
Series : a one-diemensional dataset;

  • method chaining
    在这里插入图片描述
    by convention 按照惯例;
    alignment 队列;
    在这里插入图片描述
  • dataframe取前几行,取后几行
    在这里插入图片描述
    在这里插入图片描述
    在这里插入图片描述
    数据类型:
    在这里插入图片描述
    在这里插入图片描述
    展示数据信息的一些方法:
    在这里插入图片描述
    .dtypes attribute will show the column as an object( or O on the series)
    Almost all of pandas data types are built from Numpy.
    .info method prints the data type information in addition to the count of not-null values , also lists the amount of memory used by the DataFrame

2021/3/10

a=[1,2,3]
b=['apple','pear','nut']
data={'iid':a,'fruit':b}
import pandas as pd
data=pd.DataFrame(data)
data.info() # .info() used for DataFrame, not list
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   iid     3 non-null      int64 
 1   fruit   3 non-null      object
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
a.info()
Traceback (most recent call last):
  File "D:\PyCharm2020\python2020\lib\site-packages\IPython\core\interactiveshell.py", line 3427, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-11-ea9be6f27180>", line 1, in <module>
    a.info()
AttributeError: 'list' object has no attribute 'info'
data.dtypes
Out[12]: 
 # .info() prints information on the screen, but .dtypes attribute returns a Pandas series if you needed to use the data
iid       int64
fruit    object
dtype: object
data_dtypes=data.dtypes
type(data_dtypes)
Out[14]: pandas.core.series.Series
data_info=data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   iid     3 non-null      int64 
 1   fruit   3 non-null      object
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
type(data_info)
Out[16]: NoneType
#--------------------------------------------

a single column of data is a series.

# how to select one column from one datafarme
data.iid 
# method1 : use attribute access to accomplish the same task
Out[17]: 
0    1
1    2
2    3
Name: iid, dtype: int64
data["iid"] 
# method2 : pass a column name as a string to the indexing operator to select a series of data
Out[18]: 
0    1
1    2
2    3
Name: iid, dtype: int64
# .loc and .iloc attributes to pull out a Series
# .loc pulls out a Series by column name , or say , label-based
data.loc[:,'fruit']
Out[19]: 
0    apple
1     pear
2      nut
Name: fruit, dtype: object
# .iloc pulls out a Series by positon , or say , positional-based
data.iloc[:,1]
Out[20]: 
0    apple
1     pear
2      nut
Name: fruit, dtype: object

monospace font : 等宽字体

data['fruit'].index
Out[21]: RangeIndex(start=0, stop=3, step=1)
data['fruit'].dtype
Out[22]: dtype('O')
data['fruit'].dtypes
Out[23]: dtype('O')
data['fruit'].name
Out[24]: 'fruit'
data['fruit'].size
Out[25]: 3
type(data['fruit'])
Out[26]: pandas.core.series.Series
data['fruit'].apply(type).unique() # chain unique method onto the result
Out[27]: array([<class 'str'>], dtype=object)
data['fruit'].apply(type) # look at a Series that has the type of every member
Out[28]: 
0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
Name: fruit, dtype: object

follow Python naming conventions
the index operator : [

data.head(2)
Out[29]: 
   iid  fruit
0    1  apple
1    2   pear
data.sample(n=2)
Out[30]: 
   iid fruit
1    2  pear
2    3   nut
data.fruit.sample(2)
Out[31]: 
0    apple
1     pear
Name: fruit, dtype: object
data.fruit.value_counts()
# calculate the frequencies
Out[32]: 
pear     1
nut      1
apple    1
Name: fruit, dtype: int64
data.fruit.shape # .shape attributions
Out[33]: (3,)
len(data.fruit) # the built-in len function
Out[34]: 3
data.fruit.unique() # .unique method
Out[35]: array(['apple', 'pear', 'nut'], dtype=object)
data.fruit.count() # not return the count of items, rather the number of non-missing values
Out[36]: 3
# basic summary statistics 
data.iid.min()
Out[37]: 1
data.iid.max()
Out[38]: 3
data.iid.std()
Out[39]: 1.0
data.iid.mean()
Out[40]: 2.0
data.iid.median()
Out[41]: 2.0

be rounded to the nearest thousand : 圆整到千

data.fruit.isna()
# .isna method can be used to determine whether each individual value is missing or not, and its result is a Series
Out[44]: 
0    False
1    False
2    False
Name: fruit, dtype: bool
# to remove the entries in Series element with missing values, use the .dropna method
data.fruit.value_counts(normalize=True)
# setting normalize parameter to True for the relative frequencies
Out[45]: 
pear     0.333333
nut      0.333333
apple    0.333333
Name: fruit, dtype: float64
data.fruit.hasnans
# whether there exists missing values in the Series
Out[46]: False
data.fruit.notna()
Out[47]: 
0    True
1    True
2    True
Name: fruit, dtype: bool

在这里插入图片描述
作用在Series上的operator,做向量化的运算,而不是操作循环。
在这里插入图片描述

  • 相比于符号算子,调用方法的好处:
    在这里插入图片描述
    在这里插入图片描述
    在这里插入图片描述
  • dunder
    在这里插入图片描述
# chaining Series methods
data.fruit.values_count().head(2).T
# .T propetry means transpose
# count the number of missing values 
data.fruit.isna().sum()
# pandas treats False and True as 0 and 1,and .sum() returns the number of missing values
  • 对含缺失值的列做数据类型转换
    在这里插入图片描述

2021/3/12

[64/627]

# insert one new column into a specific location in a DataFrame with the .insert method
# it takes the integer position of the new column as its first argument, the name of the new column as its second, the values as the third 
# .get_loc index method to find the integer location of the column name
# .insert method modifies the calling DataFrame in-place, so there won't be an assignment statement, and it also returns None.
#Goal: the profit = gross-budget, and insert it after gross column
profit_index=movies.columns.get_loc("gross")+1
movies.insert(loc=profit_index,column="profit",value=movies["gross"]-movies["budget"])

select multiple DataFrame columns

# pass in a list with a single item, then we get back a DataFrame
# pass in a string with the column name, then we get back a Series
type(data["cone2"])
Out[22]: pandas.core.series.Series
type(data[["cone2"]])
Out[23]: pandas.core.frame.DataFrame
#---
# use .loc to put out a column by name
data
Out[26]: 
   id cone
0   1    a
1   2    b
2   3    c
3   4    d
type(data.loc[:,["cone"]])
Out[27]: pandas.core.frame.DataFrame
type(data.loc[:,"cone"])
Out[28]: pandas.core.series.Series


在这里插入图片描述

  • you’d better save all your column names to a list variable first, and then choose them .
    在这里插入图片描述
  • selecting columns with methods
# use the .select_dtypes method to select only the integer columns
movies.select_dtypes(include="int").head() # take out the first 5 rows
movies.select_dtypes(include="number").head() # select all the numeric columns
movies.select_dtypes(include=["int","object"]) # select the integer and string columns
movies.select_dtypes(exclude="float") # exclude only floating-point columns
# an alternative method to select columns is with .filter method, and search column names or index labels based on which parameter is used 
movies.filter(like="fb") # something like ambiguous search

在这里插入图片描述
在这里插入图片描述
在这里插入图片描述

  • .select_dtypes() 中 数据类型的详细解释
    在这里插入图片描述
  • 给字段排序的一些规矩
    在这里插入图片描述
    在这里插入图片描述
    [75/627]
c1=["cone"]
c2=["ctwo"]
c0=["id"]
c=(c0+c2+c1)
c
Out[35]: ['id', 'ctwo', 'cone']
type(c)
Out[36]: list
data[c] # movies[['movie_title','director_name']] creates a new DataFrame
Out[37]: 
   id    ctwo cone #new order
0   1   apple    a
1   2  banana    b
2   3    pear    c
3   4   peach    d
data
Out[38]: 
   id cone    ctwo
0   1    a   apple
1   2    b  banana
2   3    c    pear
3   4    d   peach
  • Summarizing a DataFrame
    在这里插入图片描述
    some details about .shape etc as follows:
    在这里插入图片描述

在这里插入图片描述
关于聚集函数中的 skipna参数
在这里插入图片描述

  • Chaining DataFrame methods
    One of the keys to method chaining is to know the exact object being returned each step of the chain.In pandas, this will nearly always be a DataFrame, Series, or scalar value.
    使用.isnull() .sum() any()判断一个数据框里面有没有空值,有几个空值.
    在这里插入图片描述
    在这里插入图片描述
    在这里插入图片描述
    parenthese n.括号
    对于某些聚集方法无法对含缺失值的列做聚集,解决策略是把缺失值补上(如果是字符串,补成“ ”),再call method.
    在这里插入图片描述

  • DataFrame operation
    when an arithmetic or comparison operator is used with a DataFrame, each value of each column gets the operations applied to it .
    if the DataFrame does not contain homogeneous data(e.g. contains both numeric and object data types), then the operation is likely to fail.
    关于数据圆整:
    在这里插入图片描述
    在这里插入图片描述
    检测两个数据框是否相等:
    在这里插入图片描述
    the .equals method determines if all the elements and indexes between two DataFrames are exactly the same and returns a Boolean.

  • Comparing missing values
    在这里插入图片描述
    missing values do not compare equally with one another.
    thus, instead of using == to find missing numbers, use the .isna method
    比较两个数据框的正确姿势:
    在这里插入图片描述

  • Transposing the direction of a DataFrame operation
    some details about axis parameter:
    在这里插入图片描述
    在这里插入图片描述

在这里插入图片描述
在这里插入图片描述

# use .loc to select rows based on the index label 
# if there were still missing values, we could have run the .fillna(0) method to fill all the remaining values with 0
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值