2021/3/8
《MySQL必知必会》
【180/254】
存储过程(procedure)被保存在服务器上以供使用。
存储过程(有点像MySQL的函数)的创建、删除、调用。
存储过程并不显示结果,而是把结果返回给你指定的变量。
它不能通过一个参数返回多个行和列;
MySQL中的变量全部都是以@
开头,展示变量用select @variable1_name,@variable2_name;
存储过程的价值体现在业务规则和智能处理。
declare total decimal(8,2)
定义变量total;
create procedure odertotal(
in onumber int,
in taxable boolean, #布尔值
out ototal decimal(8,2)
) comment 'Obtain order total, optionally adding tax'
#comment值不是必需的,但如果给出,将在show procedure status的结果中显示
begin
-- declare variable for total
declare total decimal(8,2); #定义变量,及其数据类型
--- declare taxrate percentage
declare taxrate int default 6; #可选默认值为6
-- get the order total
select sum(item_price*quantity)
from orderitems
where order_num=onumber
into total;
-- is this taxable?
if taxable then
-- yes,so add taxrate to the total
select total+(total/100*taxrate) into total;
end if;
-- and finally, save to our variable
select total into ototal;
end;
2021/3/9
《MySQL必知必会》
【187/254】
- 为什么使用游标?游标是什么?
游标只存在于存储过程和函数,当存储过程处理完成后,游标消失。
-
使用游标
-
declare
语句次序:它在游标或句柄之前,句柄必须在游标之后定义,否则报错。
create procedure processorders()
begin
-- declare local variable
declare done boolean default 0;
declare o int;
declare t decimal(8,2);
-- declare the cursor
declare odernumbers cursor
for
select oder_num from orders;
-- declare contiune handler
declare continue handler for sqlstate '02000' set done=1;
-- create a table to store the results
create table if not exist odertotals # 如果它不存在的话,创建一个新表
(oder_num int , total decimal(8,2));
-- open the cursor
open ordernumbers;
-- loop through all rows
repeat
-- get oder number
fetch odernumbers into o
-- get the total for this order
call ordertotal(o,1,t);
-- insert order and total into ordertotals
insert into ordertotals(order_num,total)
values(o,t);
-- end of looop
until done end repeat;
-- close the cursor
close odernumbers;
end;
# 此存储过程不返回数据,但是创建并填充了另一个表
# 用select语句查看
select * from odertotals;
【194/254】
触发器(trigger)背景:有些处理需要在某个表发生更改时自动进行;
其他MySQL语句不支持触发器。
Pandas Cookbook
DataFrame : a two-diemensional dataset;
Series : a one-diemensional dataset;
- method chaining
by convention 按照惯例;
alignment 队列;
- dataframe取前几行,取后几行
数据类型:
展示数据信息的一些方法:
.dtypes
attribute will show the column as an object( or O on the series)
Almost all of pandas data types are built from Numpy.
.info
method prints the data type information in addition to the count of not-null values , also lists the amount of memory used by the DataFrame
2021/3/10
a=[1,2,3]
b=['apple','pear','nut']
data={'iid':a,'fruit':b}
import pandas as pd
data=pd.DataFrame(data)
data.info() # .info() used for DataFrame, not list
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 iid 3 non-null int64
1 fruit 3 non-null object
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
a.info()
Traceback (most recent call last):
File "D:\PyCharm2020\python2020\lib\site-packages\IPython\core\interactiveshell.py", line 3427, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-11-ea9be6f27180>", line 1, in <module>
a.info()
AttributeError: 'list' object has no attribute 'info'
data.dtypes
Out[12]:
# .info() prints information on the screen, but .dtypes attribute returns a Pandas series if you needed to use the data
iid int64
fruit object
dtype: object
data_dtypes=data.dtypes
type(data_dtypes)
Out[14]: pandas.core.series.Series
data_info=data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 iid 3 non-null int64
1 fruit 3 non-null object
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
type(data_info)
Out[16]: NoneType
#--------------------------------------------
a single column of data is a series.
# how to select one column from one datafarme
data.iid
# method1 : use attribute access to accomplish the same task
Out[17]:
0 1
1 2
2 3
Name: iid, dtype: int64
data["iid"]
# method2 : pass a column name as a string to the indexing operator to select a series of data
Out[18]:
0 1
1 2
2 3
Name: iid, dtype: int64
# .loc and .iloc attributes to pull out a Series
# .loc pulls out a Series by column name , or say , label-based
data.loc[:,'fruit']
Out[19]:
0 apple
1 pear
2 nut
Name: fruit, dtype: object
# .iloc pulls out a Series by positon , or say , positional-based
data.iloc[:,1]
Out[20]:
0 apple
1 pear
2 nut
Name: fruit, dtype: object
monospace font
: 等宽字体
data['fruit'].index
Out[21]: RangeIndex(start=0, stop=3, step=1)
data['fruit'].dtype
Out[22]: dtype('O')
data['fruit'].dtypes
Out[23]: dtype('O')
data['fruit'].name
Out[24]: 'fruit'
data['fruit'].size
Out[25]: 3
type(data['fruit'])
Out[26]: pandas.core.series.Series
data['fruit'].apply(type).unique() # chain unique method onto the result
Out[27]: array([<class 'str'>], dtype=object)
data['fruit'].apply(type) # look at a Series that has the type of every member
Out[28]:
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
Name: fruit, dtype: object
follow Python naming conventions
the index operator : [
data.head(2)
Out[29]:
iid fruit
0 1 apple
1 2 pear
data.sample(n=2)
Out[30]:
iid fruit
1 2 pear
2 3 nut
data.fruit.sample(2)
Out[31]:
0 apple
1 pear
Name: fruit, dtype: object
data.fruit.value_counts()
# calculate the frequencies
Out[32]:
pear 1
nut 1
apple 1
Name: fruit, dtype: int64
data.fruit.shape # .shape attributions
Out[33]: (3,)
len(data.fruit) # the built-in len function
Out[34]: 3
data.fruit.unique() # .unique method
Out[35]: array(['apple', 'pear', 'nut'], dtype=object)
data.fruit.count() # not return the count of items, rather the number of non-missing values
Out[36]: 3
# basic summary statistics
data.iid.min()
Out[37]: 1
data.iid.max()
Out[38]: 3
data.iid.std()
Out[39]: 1.0
data.iid.mean()
Out[40]: 2.0
data.iid.median()
Out[41]: 2.0
be rounded to the nearest thousand : 圆整到千
data.fruit.isna()
# .isna method can be used to determine whether each individual value is missing or not, and its result is a Series
Out[44]:
0 False
1 False
2 False
Name: fruit, dtype: bool
# to remove the entries in Series element with missing values, use the .dropna method
data.fruit.value_counts(normalize=True)
# setting normalize parameter to True for the relative frequencies
Out[45]:
pear 0.333333
nut 0.333333
apple 0.333333
Name: fruit, dtype: float64
data.fruit.hasnans
# whether there exists missing values in the Series
Out[46]: False
data.fruit.notna()
Out[47]:
0 True
1 True
2 True
Name: fruit, dtype: bool
作用在Series上的operator,做向量化的运算,而不是操作循环。
- 相比于符号算子,调用方法的好处:
- dunder
# chaining Series methods
data.fruit.values_count().head(2).T
# .T propetry means transpose
# count the number of missing values
data.fruit.isna().sum()
# pandas treats False and True as 0 and 1,and .sum() returns the number of missing values
- 对含缺失值的列做数据类型转换
2021/3/12
[64/627]
# insert one new column into a specific location in a DataFrame with the .insert method
# it takes the integer position of the new column as its first argument, the name of the new column as its second, the values as the third
# .get_loc index method to find the integer location of the column name
# .insert method modifies the calling DataFrame in-place, so there won't be an assignment statement, and it also returns None.
#Goal: the profit = gross-budget, and insert it after gross column
profit_index=movies.columns.get_loc("gross")+1
movies.insert(loc=profit_index,column="profit",value=movies["gross"]-movies["budget"])
select multiple DataFrame columns
# pass in a list with a single item, then we get back a DataFrame
# pass in a string with the column name, then we get back a Series
type(data["cone2"])
Out[22]: pandas.core.series.Series
type(data[["cone2"]])
Out[23]: pandas.core.frame.DataFrame
#---
# use .loc to put out a column by name
data
Out[26]:
id cone
0 1 a
1 2 b
2 3 c
3 4 d
type(data.loc[:,["cone"]])
Out[27]: pandas.core.frame.DataFrame
type(data.loc[:,"cone"])
Out[28]: pandas.core.series.Series
- you’d better save all your column names to a list variable first, and then choose them .
- selecting columns with methods
# use the .select_dtypes method to select only the integer columns
movies.select_dtypes(include="int").head() # take out the first 5 rows
movies.select_dtypes(include="number").head() # select all the numeric columns
movies.select_dtypes(include=["int","object"]) # select the integer and string columns
movies.select_dtypes(exclude="float") # exclude only floating-point columns
# an alternative method to select columns is with .filter method, and search column names or index labels based on which parameter is used
movies.filter(like="fb") # something like ambiguous search
.select_dtypes()
中 数据类型的详细解释
- 给字段排序的一些规矩
[75/627]
c1=["cone"]
c2=["ctwo"]
c0=["id"]
c=(c0+c2+c1)
c
Out[35]: ['id', 'ctwo', 'cone']
type(c)
Out[36]: list
data[c] # movies[['movie_title','director_name']] creates a new DataFrame
Out[37]:
id ctwo cone #new order
0 1 apple a
1 2 banana b
2 3 pear c
3 4 peach d
data
Out[38]:
id cone ctwo
0 1 a apple
1 2 b banana
2 3 c pear
3 4 d peach
- Summarizing a DataFrame
some details about.shape
etc as follows:
关于聚集函数中的 skipna
参数
-
Chaining DataFrame methods
One of the keys to method chaining is to know the exact object being returned each step of the chain.In pandas, this will nearly always be a DataFrame, Series, or scalar value.
使用.isnull()
.sum()
any()
判断一个数据框里面有没有空值,有几个空值.
parenthese n.括号
对于某些聚集方法无法对含缺失值的列做聚集,解决策略是把缺失值补上(如果是字符串,补成“ ”),再call method.
-
DataFrame operation
when an arithmetic or comparison operator is used with a DataFrame, each value of each column gets the operations applied to it .
if the DataFrame does not contain homogeneous data(e.g. contains both numeric and object data types), then the operation is likely to fail.
关于数据圆整:
检测两个数据框是否相等:
the.equals
method determines if all the elements and indexes between two DataFrames are exactly the same and returns a Boolean. -
Comparing missing values
missing values do not compare equally with one another.
thus, instead of using==
to find missing numbers, use the.isna
method
比较两个数据框的正确姿势:
-
Transposing the direction of a DataFrame operation
some details aboutaxis
parameter:
# use .loc to select rows based on the index label
# if there were still missing values, we could have run the .fillna(0) method to fill all the remaining values with 0