Python

張恬簡_

Tags: web crawler

Article link: https://blog.csdn.net/MEAAphrorAQUA/article/details/109272372

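Introduction
Python is free, open-source software. Its source code and the reference interpreter, CPython, are distributed under the Python Software Foundation License, which is compatible with the GPL (GNU General Public License). Python's syntax is concise and clear, and one of its distinctive features is that whitespace indentation is mandatory for delimiting statement blocks.

Python has a rich and powerful set of libraries. It is often nicknamed a glue language because it can easily connect modules written in other languages (especially C/C++). A common pattern is to build a quick prototype of a program in Python (sometimes even its final interface) and then rewrite the parts with special requirements in a more suitable language. For example, the graphics rendering module of a 3D game has very high performance requirements, so it can be rewritten in C/C++ and wrapped as an extension library that Python can call. Note that extension libraries can raise platform issues, because some do not provide a cross-platform implementation.

On July 20, IEEE published its 2017 programming language ranking, with Python in first place.

In March 2018, the language's author announced on a mailing list that support for Python 2.7 would end on January 1, 2020. Users who want Python 2.7 support after that date need to pay a commercial vendor.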

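import math
math.sqrt(81)
9.0
Random functions
choice(seq)
import random
# works on any non-empty sequence (indexable, with a length)
random.choice(range(10))
randrange([start,] stop[, step])
random.randrange(100)
# start, stop (exclusive), step
random.randrange(0,100,2)
random()
# returns a float in the interval [0.0, 1.0)
random.random()
seed([x])
random.seed(123)
shuffle(lst)
# shuffles the list in place
lst=[1,2,3,4,5]
random.shuffle(lst)
uniform(x,y)
# returns a random float between x and y
random.uniform(1,100)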
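
The calls above can be combined into a short sketch of what seed() is for. The variable names (first, second, deck) are illustrative only. Seeding with the same value should replay the same pseudo-random sequence, and shuffle() works in place rather than returning a new list.

```python
import random

# Seeding makes the pseudo-random sequence repeatable.
random.seed(123)
first = [random.randrange(0, 100, 2) for _ in range(5)]

random.seed(123)
second = [random.randrange(0, 100, 2) for _ in range(5)]

print(first == second)  # True: same seed, same sequence

# shuffle() reorders the list in place and returns None.
deck = [1, 2, 3, 4, 5]
result = random.shuffle(deck)
print(result, sorted(deck))  # None [1, 2, 3, 4, 5]
```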

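Trigonometric functions
sin(x)
sine of x, where x is in radians
# sine of 30°
import math
math.sin(math.pi/6)
cos(x)
cosine of x, where x is in radians
# cosine of 90°
import math
math.cos(math.pi/2)
tan(x)
tangent of x, where x is in radians
# tangent of 45°
import math
math.tan(math.pi/4)
asin(x)

arc sine of x, in radians
# arc sine of 0.5
import math
math.asin(0.5)
acos(x)
arc cosine of x, in radians
atan(x)
arc tangent of x, in radians
hypot(x,y)
Euclidean norm, sqrt(x*x + y*y)
import math
# the 3-4-5 right triangle: legs 3 and 4, hypotenuse 5
math.hypot(3,4)
degrees(x)
converts radians to degrees
radians(x)
converts degrees to radians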
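
The functions listed without an example (acos, atan, degrees, radians) can be sketched as follows. The variable names are made up for illustration, and the results are rounded because floating-point values are only approximately exact.

```python
import math

angle_deg = 60
angle_rad = math.radians(angle_deg)      # degrees -> radians
cosine = math.cos(angle_rad)             # approximately 0.5
back_deg = math.degrees(math.acos(0.5))  # approximately 60 degrees
quarter = math.degrees(math.atan(1))     # approximately 45 degrees

print(round(cosine, 6), round(back_deg, 6), round(quarter, 6))
```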

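Strings
Slicing
var1 = "Python Runoob"
print(var1[1:5])
Escape characters
Escape sequence    Description
\ (at end of line)    line continuation
\\    backslash
\'    single quote
\"    double quote
\a    bell
\b    backspace
\e    escape (not recognized in Python string literals; use \x1b)
\000    null character
\n    newline
\v    vertical tab
\t    horizontal tab
\r    carriage return
\f    form feed
\ooo    character with octal value ooo, for example \012 is a newline
\xhh    character with hexadecimal value hh, for example \x0a is a newline
\other    any other character is output unchanged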
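Logical operators
Operator    Expression    Description    Example
and    x and y    AND: returns x if x is falsy (0 or False), otherwise y    8 and 9 returns 9
or    x or y    OR: returns y if x is falsy (0 or False), otherwise x    8 or 9 returns 8
not    not x    NOT: returns True if x is falsy (0 or False), otherwise False    not 0 returns True
Membership operators
Operator    Description    Example
in    returns True if the value is found in the container, otherwise False    "a" in "abc" returns True
not in    returns True if the value is not found in the container, otherwise False    1 not in [2,3] returns True
Identity operators
Operator    Description    Example
is    returns True if both operands refer to the same object, otherwise False    None is None returns True
is not    returns True if the operands refer to different objects, otherwise False    [] is not [] returns True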
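
A small sketch of the three operator groups above. The lists a, b and c are illustrative names. The point is that and/or return one of their operands, and that is checks identity while == checks value.

```python
# and/or return one of their operands, not necessarily a bool.
print(8 and 9)   # 9
print(0 and 9)   # 0
print(8 or 9)    # 8
print(0 or 9)    # 9
print(not 0)     # True

# == compares values; is compares object identity.
a = [1, 2]
b = [1, 2]
c = a
print(a == b, a is b, a is c)  # True False True

# Membership test.
print(1 not in [2, 3])         # True
```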
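Functions
format()

# fill the placeholders in order
"a{}cd{}f1{}345{}7".format("b","e",2,6)
isalnum()

# True if the string is non-empty and every character is a letter or a digit
"Python Runoob".isalnum()

False
isalpha()

# True if the string is non-empty and every character is a letter (the space makes this False)
"Python Runoob".isalpha()

False
isdigit()

# True if every character is a digit
"123".isdigit()

True
lower()

# uppercase to lowercase
"SKHGJSDOHG".lower()

skhgjsdohg
upper()

# lowercase to uppercase
"skhgjsdohg".upper()

SKHGJSDOHG

max(str)

# largest letter
max("ABCDEFG")

G

min(str)

# smallest letter
min("ABCDEFG")

A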

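count(str,beg,end)

# number of occurrences; beg is the start index, end is the end index (exclusive)
"skhgjsdohg".count("s",0,3)

1
"skhgjsdohg".count("s",0,6)

2

find

# index of the first match, or -1 if not found; beg is the start index, end is the end index (exclusive)
"skhgjsdohg".find("h",0,3)

2

join(seq)

# use the string as the separator between the elements
"%".join(["a","b","c","d"])

'a%b%c%d'

len(str)

# length of the string
len("skhgjsdohg")

10

replace()

# replace a substring
"skhgjsdohg".replace("s","S")

'SkhgjSdohg'

split(sep, maxsplit)

# split the string into a list on the separator
"skh gjsd ohg".split(" ")

['skh', 'gjsd', 'ohg']

strip([chars])

# remove leading and trailing whitespace, or the leading and trailing characters given in chars
" skh gjsd ohg ".strip(" ")

'skh gjsd ohg'
"skh gjsd ohg".strip("s")

'kh gjsd ohg'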

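Method    Description
string.capitalize()    Capitalizes the first character of the string
string.center(width)    Returns a new string of length width with the original string centered and padded with spaces
string.count(str, beg=0, end=len(string))    Returns the number of times str occurs in string; if beg or end is given, counts only within that range
bytes.decode(encoding='UTF-8', errors='strict')    Decodes the bytes using encoding; on error raises a ValueError (UnicodeDecodeError) by default, unless errors is 'ignore' or 'replace'. In Python 3 this is a method of bytes, not of str
string.encode(encoding='UTF-8', errors='strict')    Encodes string using encoding; on error raises a ValueError (UnicodeEncodeError) by default, unless errors is 'ignore' or 'replace'
string.endswith(obj, beg=0, end=len(string))    Checks whether the string ends with obj; if beg or end is given, checks only within that range. Returns True or False
string.expandtabs(tabsize=8)    Converts the tab characters in string to spaces; the default tab size is 8
string.find(str, beg=0, end=len(string))    Checks whether str is contained in string; if beg and end are given, checks only within that range. Returns the starting index if found, otherwise -1
string.format()    Formats the string
string.index(str, beg=0, end=len(string))    Same as find(), except that it raises an exception (ValueError) if str is not in string
string.isalnum()    Returns True if string has at least one character and all characters are letters or digits, otherwise False
string.isalpha()    Returns True if string has at least one character and all characters are letters, otherwise False
string.isdecimal()    Returns True if string contains only decimal digit characters, otherwise False. In Python 2 this method exists only on unicode objects
string.isdigit()    Returns True if string contains only digits, otherwise False
string.islower()    Returns True if string contains at least one cased character and all cased characters are lowercase, otherwise False
string.isnumeric()    Returns True if string contains only numeric characters, otherwise False
string.isspace()    Returns True if string contains only whitespace, otherwise False
string.istitle()    Returns True if string is title-cased (see title()), otherwise False
string.isupper()    Returns True if string contains at least one cased character and all cased characters are uppercase, otherwise False
string.join(seq)    Joins all the elements of seq (which must be strings) into a new string, using string as the separator
string.ljust(width)    Returns a new string of length width with the original string left-aligned and padded with spaces
string.lower()    Converts all uppercase characters in string to lowercase
string.lstrip()    Removes leading whitespace from string
str.maketrans(intab, outtab)    Creates a character translation table. In the simplest two-argument form, the first argument is a string of the characters to convert and the second is a string of the corresponding target characters
max(str)    Returns the largest letter in str
min(str)    Returns the smallest letter in str
string.partition(str)    A bit like a combination of find() and split(): splits string at the first occurrence of str into a 3-tuple (string_pre_str, str, string_post_str). If string does not contain str, then string_pre_str == string
string.replace(str1, str2, num=string.count(str1))    Replaces str1 with str2 in string; if num is given, replaces at most num occurrences
string.rfind(str, beg=0, end=len(string))    Like find(), but searches from the right
string.rindex(str, beg=0, end=len(string))    Like index(), but searches from the right
string.rjust(width)    Returns a new string of length width with the original string right-aligned and padded with spaces
string.rpartition(str)    Like partition(), but searches from the right
string.rstrip()    Removes trailing whitespace from string
string.split(str="", num=string.count(str))    Splits string using str as the separator; if num is given, performs at most num splits
string.splitlines([keepends])    Splits at line boundaries ('\r', '\r\n', '\n') and returns a list of the lines. Line breaks are dropped if keepends is False and kept if it is True
string.startswith(obj, beg=0, end=len(string))    Checks whether the string starts with obj and returns True or False. If beg and end are given, checks only within that range
string.strip([obj])    Performs both lstrip() and rstrip() on string
string.swapcase()    Swaps the case of the characters in string
string.title()    Returns a title-cased string: every word starts with an uppercase letter and the remaining letters are lowercase (see istitle())
string.translate(table)    Converts the characters of string using the table built by str.maketrans(). In Python 2 the signature was translate(table, del=""), with the characters to delete passed in del
string.upper()    Converts the lowercase letters in string to uppercase
string.zfill(width)    Returns a string of length width with the original string right-aligned and padded with zeros on the left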
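
A few of the methods in the table, sketched under Python 3. The sample string and the translation table are illustrative choices, not part of the original notes.

```python
s = "hello world"

print(s.capitalize())            # Hello world
print(s.title())                 # Hello World
print(s.center(15, "*"))         # **hello world**
print(s.partition(" "))          # ('hello', ' ', 'world')
print(s.startswith("hello"))     # True
print("42".zfill(5))             # 00042
print("a\nb\r\nc".splitlines())  # ['a', 'b', 'c']

# str.maketrans builds the table that str.translate consumes.
table = str.maketrans("lo", "01")
translated = s.translate(table)
print(translated)                # he001 w1r0d

# encode() gives bytes; decode() turns the bytes back into a str.
data = s.encode("utf-8")
print(data.decode("utf-8") == s)  # True
```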
Lists
len(obj)
# number of elements in the list
len([1,2,3])

3

max(obj)
# largest value
max([1,2,3])

3

min(obj)
# smallest value
min([1,2,3])

1

list()
# convert a tuple to a list
list((1,2,3))

[1, 2, 3]
list("abc")

['a', 'b', 'c']

append(obj)
# append one element to the list
lst = [1,2,3]
lst.append(4)
lst

[1, 2, 3, 4]

lst = [1,2,3]
lst.append([4,5])
lst

[1, 2, 3, [4, 5]]

count(obj)
# number of times an element occurs in the list
lst = [1,2,3,1]
lst.count(1)

2

extend(seq)
# append all the elements of another list
olst = [1,2,3]
nlst = [4,5]
olst.extend(nlst)
olst

[1, 2, 3, 4, 5]

index(obj)
# position of the first occurrence of an element
lst = [1,2,3,1]
lst.index(1)

0

insert(index,obj)
# insert an element at the given position
lst = [1,2,3]
lst.insert(1,1.5)
lst

[1, 1.5, 2, 3]

pop([index])
# remove and return the element at the given index; the default is the last element
lst = [1,2,3,4,5,6]
lst.pop(0)
lst

[2, 3, 4, 5, 6]

remove(obj)
# remove the first occurrence of a value
lst = [1,1,2,3,4,5,6]
lst.remove(1)
lst

[1, 2, 3, 4, 5, 6]

reverse()
# reverse the list in place
lst = [1,1,2,3,4,5,6]
lst.reverse()
lst

[6, 5, 4, 3, 2, 1, 1]

sort(key=None, reverse=False)
# ascending order
lst = [3,6,2,9,6,7,0,9]
lst.sort()
lst

[0, 2, 3, 6, 6, 7, 9, 9]

# descending order
lst = [3,6,2,9,6,7,0,9]
lst.sort(reverse=True)
lst

[9, 9, 7, 6, 6, 3, 2, 0]

clear()
# empty the list
lst = [3,6,2,9,6,7,0,9]
lst.clear()
lst

[]

copy()
# shallow copy: the new list is a different object, so changing it does not change the original list
olst = [3,6,2,9,6,7,0,9]
nlst = olst.copy()

Create, read, update, delete
# read
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5, 6, 7]

print("list1[0]: ", list1[0])
print("list2[1:5]: ", list2[1:5])

# add
lst = []  # empty list (named lst so that the built-in list is not shadowed)
lst.append('Google')  # use append() to add elements
lst.append('Runoob')

print(lst)
['Google', 'Runoob']

# delete
list1 = ['physics', 'chemistry', 1997, 2000]

print(list1)
del list1[2]
print("After deleting value at index 2 : ")
print(list1)
Tuples
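
The notes leave this section empty, so here is a minimal sketch of the basics, assuming only built-in behavior. The names point, single and error are illustrative. A tuple behaves like a list that cannot be modified after creation.

```python
point = (3, 4)
x, y = point                 # unpacking
print(point[0], len(point))  # 3 2

# Concatenation builds a new tuple; the original is unchanged.
print(point + (5,))          # (3, 4, 5)

# A one-element tuple needs a trailing comma.
single = (1,)
print(type(single).__name__)  # tuple

# Tuples are immutable: item assignment raises TypeError.
try:
    point[0] = 10
except TypeError as exc:
    error = str(exc)
    print("cannot modify a tuple:", error)
```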
Dictionaries
add
staff = {'name':'bob','age':25}
staff['sex'] = 'm'
staff

{'name': 'bob', 'age': 25, 'sex': 'm'}

del
staff = {'name':'bob','age':25}
del staff['age']
staff

{'name': 'bob'}

update
staff = {'name':'bob','age':25}
staff['age']=1
staff

{'name': 'bob', 'age': 1}

select
staff = {'name':'bob','age':25}
staff['age']

25

clear
staff = {'name':'bob','age':25}
staff.clear()
staff

{}

delete
staff = {'name':'bob','age':25}
del staff

json
# convert between a dict and a JSON string
import json
staff = {'name':'bob','age':25}
s = json.dumps(staff)
s

'{"name": "bob", "age": 25}'

json.loads(s)

{'name': 'bob', 'age': 25}

dict.copy()
# shallow copy
staff = {'name':'bob','age':25}
staff.copy()
staffcopy = staff.copy()
id(staff),id(staffcopy)

(92578008, 92506656)

dict.fromkeys(seq,val)
# create a dict that uses the elements of seq as keys and val as every value
dict.fromkeys(range(10),1)

dict.get()
# look up a value by key; returns the default if the key does not exist
staff = {'name':'bob','age':25}
staff.get("sex","none")

'none'

dict.items()
# returns an iterable view of (key, value) tuples
staff = {'name':'bob','age':25}
for k,v in staff.items():
    print(k,v)

name bob
age 25

key in dict
# returns True if the key exists, otherwise False
staff = {'name':'bob','age':25}
"sex" in staff

False

dict.keys()
# returns all the keys
staff = {'name':'bob','age':25}
staff.keys()

dict_keys(['name', 'age'])

dict.values()
# returns all the values
staff = {'name':'bob','age':25}
staff.values()

dict_values(['bob', 25])

dict.setdefault(key,default)
# add the key with the default value; does nothing if the key already exists
staff = {'name':'bob','age':25}
staff.setdefault("sex","m")
staff

{'name': 'bob', 'age': 25, 'sex': 'm'}

dict.update(dict2)
# merge nstaff into ostaff
ostaff = {'name':'bob','age':25}
nstaff = {'name':'bob','sex':'m'}
ostaff.update(nstaff)
ostaff

{'name': 'bob', 'age': 25, 'sex': 'm'}

dict.pop(key,default)
# remove the key-value pair for the key and return its value; returns default if the key does not exist
staff = {'name':'bob','age':25}
staff.pop("age","")

25

dict.popitem()
# remove and return the last inserted key-value pair (Python 3.7+; the pair was arbitrary in older versions)
staff = {'name':'bob','age':25}
staff.popitem()

('age', 25)

Sets
Set operations
set1 = set("abcxyzcba")
set2 = set("idea")

Symbol    Description    Expression    Result

-    difference    set1-set2    {'b', 'c', 'x', 'y', 'z'}
|    union    set1|set2    {'a', 'b', 'c', 'd', 'e', 'i', 'x', 'y', 'z'}
&    intersection    set1&set2    {'a'}
^    symmetric difference (drops the shared elements)    set1^set2    {'b', 'c', 'd', 'e', 'i', 'x', 'y', 'z'}
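
The operators in the table also have method equivalents. This sketch reuses set1 and set2 from above and sorts the results, since sets have no defined order when printed. The name unique is illustrative.

```python
set1 = set("abcxyzcba")
set2 = set("idea")

# Each operator has an equivalent method.
print(set1 - set2 == set1.difference(set2))            # True
print(set1 | set2 == set1.union(set2))                 # True
print(set1 & set2 == set1.intersection(set2))          # True
print(set1 ^ set2 == set1.symmetric_difference(set2))  # True

print(sorted(set1 & set2))  # ['a']
print(sorted(set1 - set2))  # ['b', 'c', 'x', 'y', 'z']

# A common use: remove duplicates from a list.
unique = sorted(set([3, 1, 3, 2, 1]))
print(unique)               # [1, 2, 3]
```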
Conditional and loop statements
if…elif…else
if condition:
    do
elif condition:
    do
else:
    do

while
cai = int(input("1-100"))
key = 20
while cai != key:
    if cai > key:
        print("Too high")
    else:
        print("Too low")
    cai = int(input("1-100"))
print("Correct")

for
a = ['a','b','c']
for e in a:
    print(e)

a
b
c

break
a = ['a','b','c']
for e in a:
    if e == "b":
        print(e)
        break
    print("no")

no
b

continue
a = [1,2,3,4]
for e in a:
    if e >= 3:
        continue
    print(e)

1
2
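Symbol    Description
<    less than
<=    less than or equal to
>    greater than
>=    greater than or equal to
==    equal to
!=    not equal to
in    member of
not in    not a member of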
Custom functions
def
def function_name(parameters):
    statements

Example:
def afunc(sex):
    if sex == 0:
        return "man"
    else:
        return "woman"
afunc(1)

def function_name(*parameters):
    statements

Example:
def average(*args):
    s = 0
    for i in args:
        s += i
    return s/len(args)
average(1,2,3,4,5)

lambda
Regular function:
def fun(x,y):
    return x*y
fun(1,2)
lambda:
fun = lambda x,y: x*y
fun(1,2)
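
Lambdas are mostly useful as short inline arguments. A minimal sketch with made-up sample data (staff, by_age and doubled are illustrative names):

```python
staff = [("bob", 25), ("amy", 31), ("tom", 19)]

# Use a lambda as the sort key: order the tuples by age.
by_age = sorted(staff, key=lambda person: person[1])
print(by_age)   # [('tom', 19), ('bob', 25), ('amy', 31)]

# Use a lambda with map() to transform every element.
doubled = list(map(lambda n: n * 2, [1, 2, 3]))
print(doubled)  # [2, 4, 6]
```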

Global variables
name = "a"  # global variable
def changeName():
    global name  # with global, the assignment changes the module-level name
    name = "b"
changeName()
name

Classes
Class
class Human:
    age = 1
    def language(self):
        return "CH"
H = Human()

print("Human's age:", H.age, "Human's language:", H.language())

Construction Class
Inheritance
class ChildClass(ParentClass):
    pass

private
class Human:
    __age = 0

Human.__age

AttributeError: type object 'Human' has no attribute '__age'
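
A sketch that puts inheritance and the double-underscore attribute together. The Student subclass and the get_age method are hypothetical additions for illustration. The AttributeError above comes from name mangling: inside the class body, __age is stored as _Human__age.

```python
class Human:
    __age = 0  # name-mangled: stored as _Human__age

    def language(self):
        return "CH"

    def get_age(self):
        # Inside the class, __age resolves to the mangled name.
        return self.__age


class Student(Human):  # Student inherits from Human
    def language(self):
        return "EN"    # overrides the parent method


s = Student()
print(s.language(), s.get_age())  # EN 0
print(hasattr(Human, "__age"))    # False: no attribute with that exact name
print(Human._Human__age)          # 0: the mangled name is still reachable
```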

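Modules
import module  # import a module
from module import function  # import one function from a module
from module import func1,func2  # import several functions from a module
from module import *  # import every public name from a module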
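Files
file_object = open(file_name[, access_mode][, buffering])

Mode    Description
r    read-only; pointer at the start; the default mode
rb    binary read-only; pointer at the start; for non-text files
r+    read and write; pointer at the start
rb+    binary read and write; pointer at the start; for non-text files
w    write-only; overwrites an existing file or creates a new one
wb    binary write-only; overwrites or creates; for non-text files
w+    read and write; overwrites or creates
wb+    binary read and write; overwrites or creates; for non-text files
a    append; pointer at the end; creates the file if it does not exist
ab    binary append; pointer at the end; creates the file if it does not exist
a+    read and append; pointer at the end; creates the file if it does not exist
ab+    binary read and append; pointer at the end; creates the file if it does not exist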
open()
fo = open("D:/foo.txt","wb")
print(fo.name,fo.closed,fo.mode)

close()
fo = open("D:/foo.txt","wb")
fo.close()

# a with block closes the file automatically
with open("D:/foo.txt","wb") as fo:
    fo.write(b"hello")

write()
with open("D:/foo.txt","w") as fo:
    fo.write("www.baidu.com!\naaa!\n")

read()
# read at most 10 characters
with open("D:/foo.txt","r+") as fo:
    data = fo.read(10)
print(data)

www.baidu.

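tell()
# get the current position in the file
# "w+" is used because a file opened with "w" cannot be read
with open("D:/foo.txt","w+") as fo:
    fo.write("www.baidu.com!\naaa!\n")
    fo.seek(0)  # go back to the start before reading

    line = fo.read()
    print("Data read: %s" % line)

    # get the current position in the file
    pos = fo.tell()
    print("Current position: %d" % pos)

seek()
# move the file pointer
# seek(offset[, whence])
# offset: the number of bytes to move
# whence: optional, default 0. The reference point for offset: 0 is the start of the file, 1 is the current position, 2 is the end of the file. Text-mode files only accept a non-zero offset with whence 0.

# reset the read pointer to the start of the file

with open("D:/foo.txt","w+") as fo:
    fo.write("www.baidu.com!\naaa!\n")

    fo.seek(0, 0)
    line = fo.read()
    print("Data read: %s" % line)

    fo.seek(0, 0)
    line = fo.read()
    print("Data read: %s" % line)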

rename()
import os
os.rename("D:/foo.txt","D:/too.txt")

remove()
import os
os.remove("D:/too.txt")

Directories
mkdir()
import os
os.mkdir("D:/too")

rmdir()
# fails if the directory still contains files
import os
os.rmdir("D:/too")

walk()
# traverse directories and files
import os
os.walk("D:/too")

# delete every file and subdirectory under a directory
# topdown=False visits the deepest directories first, so each one is empty when rmdir runs
for root, dirs, files in os.walk("D:/too", topdown=False):
    for fname in files:
        os.remove(os.path.join(root, fname))
    for dirc in dirs:
        os.rmdir(os.path.join(root, dirc))
os.rmdir("D:/too")

getcwd()
os.getcwd()

Exception handling
Example
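
The notes give no example here, so the following is only a sketch of the try/except/else/finally structure. The function safe_divide and its behavior are hypothetical, chosen to show each clause once.

```python
def safe_divide(a, b):
    """Divide a by b, returning None when b is zero."""
    try:
        result = a / b
    except ZeroDivisionError:
        # Runs only when the division by zero is attempted.
        return None
    except TypeError as exc:
        # Re-raise as a different exception, keeping the original cause.
        raise ValueError("both arguments must be numbers") from exc
    else:
        # Runs only when the try block raised nothing.
        return result
    finally:
        # Always runs, whether or not an exception occurred.
        print("safe_divide finished")


print(safe_divide(6, 3))  # 2.0
print(safe_divide(1, 0))  # None
```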

numpy
Attributes
a.ndim
# number of dimensions
import numpy as np

a.shape
# size of each dimension

a.size
# total number of elements

a.dtype
# element type

a.itemsize
# size of each element in bytes
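
The attributes above, sketched on one sample array. The dtype is fixed to int32 here (an illustrative choice) so that dtype and itemsize are the same on every platform.

```python
import numpy as np

# A 2 x 3 x 4 array of 32-bit integers.
a = np.arange(24, dtype=np.int32).reshape((2, 3, 4))

print(a.ndim)      # 3
print(a.shape)     # (2, 3, 4)
print(a.size)      # 24
print(a.dtype)     # int32
print(a.itemsize)  # 4
```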

np.arange(n)
# ndarray of the integers from 0 to n-1
import numpy as np
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.ones(shape)
# array filled with ones
import numpy as np
np.ones((3,4))

array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])

np.zeros(shape, dtype=np.int32)
# array of int32 zeros
import numpy as np
np.zeros((3,4),dtype=np.int32)

array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])

np.full(shape,val)
# array filled with val
import numpy as np
np.full((2,3),6)

array([[6, 6, 6],
[6, 6, 6]])

np.eye(n)
# n x n identity matrix
import numpy as np
np.eye(5)

array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])

np.ones_like(arr)
# array of ones with the same shape as arr
import numpy as np
a = np.eye(5)
np.ones_like(a)

array([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])

np.zeros_like(arr)
# array of zeros with the same shape as arr
import numpy as np
a = np.eye(5)
np.zeros_like(a)

array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])

np.full_like(arr,val)
# array filled with val, with the same shape as arr
import numpy as np
a = np.eye(5)
np.full_like(a,6)

array([[6., 6., 6., 6., 6.],
[6., 6., 6., 6., 6.],
[6., 6., 6., 6., 6.],
[6., 6., 6., 6., 6.],
[6., 6., 6., 6., 6.]])

np.linspace(start, stop, num)
# num evenly spaced values over the interval, both endpoints included
import numpy as np
np.linspace(1,9,3)

array([1., 5., 9.])

Changing array dimensions
a.reshape(shape)
# returns the data arranged in the given shape (here 3 rows by 9 columns)
import numpy as np
a=np.arange(27)
a.reshape(3,9)

array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26]])

a.resize(shape)
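
a.resize has no example in these notes. The sketch below contrasts it with reshape, using illustrative arrays a, b and c: reshape returns a new array object and leaves the original shape alone, while resize changes the array in place and returns None.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)       # returns a reshaped array; a keeps its shape
print(a.shape, b.shape)   # (6,) (2, 3)

c = np.arange(6)
result = c.resize((3, 2))  # changes c itself and returns None
print(result, c.shape)     # None (3, 2)
```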

a.swapaxes(ax1,ax2)
# swap two axes
import numpy as np
a=np.arange(27).reshape(3,9)
a.swapaxes(0,1)

array([[ 0, 9, 18],
[ 1, 10, 19],
[ 2, 11, 20],
[ 3, 12, 21],
[ 4, 13, 22],
[ 5, 14, 23],
[ 6, 15, 24],
[ 7, 16, 25],
[ 8, 17, 26]])

a.flatten()
# flatten the array to one dimension
import numpy as np
a=np.arange(24).reshape((2,3,4))
a.flatten()

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,17, 18, 19, 20, 21, 22, 23])

Array type conversion
a.astype(new_type)
import numpy as np
a=np.arange(24).reshape((2,3,4))
a.astype(float)

a.tolist()
import numpy as np
a=np.arange(24).reshape((2,3,4))
a.tolist()

Array indexing and slicing
One-dimensional slicing
# arr[start:stop(exclusive):step]
import numpy as np
a = np.arange(10)
a[1:4:2]

Multidimensional indexing and slicing
# indexing
# arr[i, j, k]: one index per dimension
import numpy as np
a=np.arange(24).reshape((2,3,4))
a[1,2,3]

23

# slicing
# arr[start:stop:step] for each dimension
import numpy as np
a=np.arange(24).reshape((2,3,4))
a[:,:,::2]

array([[[ 0,  2],
        [ 4,  6],
        [ 8, 10]],

       [[12, 14],
        [16, 18],
        [20, 22]]])

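Random number functions
np.random.rand(d0, d1, ..., dn)
# random floats in [0, 1) with the given shape
import numpy as np
np.random.rand(2,3)

np.random.randn(d0, d1, ..., dn)
# samples from the standard normal distribution
import numpy as np
np.random.randn(2,3)

np.random.randint(low, high, shape)
# random integer or integer array in [low, high)
import numpy as np
np.random.randint(1,10,(2,3))

Statistical functions
np.sum
# sum
import numpy as np
a=np.arange(24).reshape((2,3,4))
np.sum(a)
np.sum(a,axis=1)

np.mean
# mean of all elements, or along an axis
import numpy as np
a=np.arange(24).reshape((2,3,4))
np.mean(a)
np.mean(a,axis=1)

np.std
# standard deviation

np.var
# variance

np.min
# minimum
import numpy as np
a=np.arange(24).reshape((2,3,4))
np.min(a)
np.min(a,axis=1)

np.max
# maximum
import numpy as np
a=np.arange(24).reshape((2,3,4))
np.max(a)
np.max(a,axis=1)

np.argmin
# index of the minimum in the flattened (one-dimensional) array

np.argmax
# index of the maximum in the flattened (one-dimensional) array

np.median()

# median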
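
The functions listed without code (std, var, argmin, argmax, median) can be sketched on a small illustrative array. The values are chosen so the results are easy to check by hand: the mean is 5, so the variance is 58/6.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])

variance = np.var(a)     # mean of the squared deviations from the mean
deviation = np.std(a)    # square root of the variance
print(variance, deviation)

# Without an axis, argmin/argmax index into the flattened array [3 1 2 9 7 8].
print(np.argmin(a))          # 1
print(np.argmax(a))          # 3
print(np.argmax(a, axis=1))  # [0 0]

print(np.median(a))          # 5.0
```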

Plotting
import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython magic; leave this line out of a plain script
%matplotlib inline

x = np.linspace(0,2*np.pi,100)
y = np.sin(x)

plt.plot(x,y)
plt.axhline(0,linestyle="--")

pandas
import pandas as pd

Importing data
pd.read_csv(filename)
c = pd.read_csv("D://a.csv",encoding="UTF-8")

pd.read_table(filename)
c = pd.read_table("D://a.csv")

pd.read_excel(filename)
c = pd.read_excel("D://BTC-buy.xls")

pd.read_sql(query,connection_object)
[Note] Requires pymysql; install it from the command line with pip install pymysql

import pymysql
conn = pymysql.Connect(
    host='127.0.0.1',
    port=3306,
    user='root',
    passwd='root',
    db='test',
    charset='utf8mb4'
)
conn
query = "select * from salse"
df = pd.read_sql(query,conn)
df
conn.close()

pd.read_json()
# recent pandas versions expect a path or file-like object, so the string is wrapped in StringIO
from io import StringIO
json_str = '{"name":["jack","tom"],"age":[18,20]}'
df = pd.read_json(StringIO(json_str))
df

pd.read_html(url)
url="http://quote.stockstar.com/"
dfs = pd.read_html(url,attrs={'id':'table1'})
dfs[0]

pd.read_clipboard()
# clipboard contents:
a,1
b,2
c,3
df = pd.read_clipboard(sep=",")

pd.DataFrame(dict)
dict1 = {"name":["jack","tom"],"age":[18,20]}
df = pd.DataFrame(dict1)
df

Exporting data
df.to_csv(filename)
dict1 = {"name":["jack","tom"],"age":[18,20]}
df = pd.DataFrame(dict1)
df.to_csv("D://csv1.csv",index=False)

df.to_excel(filename)
dict1 = {"name":["jack","tom"],"age":[18,20]}
df = pd.DataFrame(dict1)
df.to_excel("D://xls1.xls",index=False)

df.to_sql(table_name,connection_object)
import pandas as pd
from sqlalchemy import create_engine
conn = create_engine('mysql+pymysql://root:root@localhost:3306/test?charset=utf8mb4')

df.to_sql("BTC_buy",con=conn,if_exists='append',index=False)

df.to_json(filename)
import pandas as pd
dict1 = {"name":["jack","tom"],"age":[18,20]}
df = pd.DataFrame(dict1)
df.to_json("D://json1.json")

Creating objects
pd.DataFrame(np.random.rand(row,column),columns=list)
# create a DataFrame
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df

pd.Series(list)
# create a Series from an array
import numpy as np
arr = np.array([1,2,3,4])
s = pd.Series(arr)
s

pd.date_range()
# use dates as the index
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.index = pd.date_range("2018-5-28",periods=df.shape[0])
df

Viewing and inspecting data
df.head(n)
# first n rows of the DataFrame
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.head(2)

df.tail(n)
# last n rows of the DataFrame
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.tail(2)

df.shape
# number of rows and columns
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.shape

df.info()
# index type, column data types and memory usage
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.info()

df.describe()
# summary statistics for the numeric columns
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.describe()

s.value_counts(dropna=False)
# unique values of a Series and their counts
conn = pymysql.Connect(
    host='127.0.0.1',
    port=3306,
    user='root',
    passwd='root',
    db='test',
    charset='utf8mb4'
)
conn
# the column name 位置 means "location"
pd.read_sql("select 位置 from salse",conn)["位置"].value_counts()

conn.close()

df.apply(pd.Series.value_counts)
# unique values and their counts for every column of the DataFrame
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.apply(pd.Series.value_counts)
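
Random floats rarely repeat, so value_counts is easier to see on categorical data. A minimal sketch with a made-up DataFrame (df and counts are illustrative names): a value missing from a column shows up as NaN in the combined result.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y", "x"],
                   "b": ["y", "y", "z"]})

# Counts for a single column.
print(df["a"].value_counts())

# Counts for every column at once; the rows are the distinct values.
counts = df.apply(pd.Series.value_counts)
print(counts)
```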

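Selecting data
df[col]
# select one column by name, returned as a Series
conn = pymysql.Connect(
    host='127.0.0.1',
    port=3306,
    user='root',
    passwd='root',
    db='test',
    charset='utf8mb4'
)
conn
query = "select * from salse"
df = pd.read_sql(query,conn)
# the column names are Chinese: 位置 is location, 价格 is price, 成交量 is volume
df['位置']

df[[col1,col2]]
# select several columns, returned as a DataFrame
df[['位置','价格']]

s.iloc[0]
# select by integer position
df.iloc[0]
df['位置'].iloc[0]

s.loc['index_one']
# select by index label
df['价格'].loc[1]

df.iloc[0,:]
# first row
df.iloc[0,:]

df.iloc[0,0]
# first element of the first column
df.iloc[0,0]

df.loc['index_one','column_one']
# select by index label and column name
df.loc[0,'成交量']

df.ix[0,'column_one']
# one column's value in the first row; .ix was removed in pandas 1.0, so use .loc with the first index label
df.loc[df.index[0],'成交量']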
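
The selection calls above need a database to run. Here is a self-contained sketch with a made-up DataFrame (the column names price and volume and the labels d1 to d3 are illustrative) showing label-based .loc next to position-based .iloc, which together cover what .ix used to do.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 12.5, 11.0],
                   "volume": [100, 250, 180]},
                  index=["d1", "d2", "d3"])

by_label = df.loc["d1", "volume"]       # row label, column name
by_position = df.iloc[0, 1]             # row position, column position
mixed = df.loc[df.index[0], "volume"]   # row position turned into a label

print(by_label, by_position, mixed)

# A whole row as a Series, and several columns as a DataFrame.
print(df.iloc[0, :])
print(df[["price", "volume"]].shape)    # (3, 2)
```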

Data cleaning
df.columns=['a','b','c']
# rename the columns
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.columns=['val1','val2']
df

df.isnull()
# check for null values in the DataFrame; returns a boolean DataFrame
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.isnull()

df.notnull()
# check for non-null values in the DataFrame; returns a boolean DataFrame
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.notnull()

df.dropna()
# drop every row that contains a null value
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.dropna()

df.dropna(axis=1)
# drop every column that contains a null value
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.dropna(axis=1)

df.dropna(axis=1,thresh=n)
# drop every column that has fewer than n non-null values
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
n = 3
df.dropna(axis=1,thresh=n)

df.fillna(x)
# replace every null value in the DataFrame with x
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
x = 0
df.fillna(x)

df.rename(columns=lambda x:x+'1')
# rename all the columns with a function
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.rename(columns=lambda x:x+'1')

df.rename(columns={'old_name':"new_name"})
# rename selected columns
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.rename(columns={'b':'c'})

df.set_index('column_one')
# use a column as the index
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.set_index('a')

df.rename(index=lambda x:x+1)
# rename all the index labels with a function
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df.rename(index=lambda x:x+1)

s.replace(1,'one')
# replace every value equal to 1 with 'one'
s = pd.Series([1,2,3,1])
s.replace(1,'one')

s.replace([1,3],['one','three'])
# replace every 1 with 'one' and every 3 with 'three'
s = pd.Series([1,2,3,1])
s.replace([1,3],['one','three'])

s.astype(float)
# change the data type of a Series to float
import numpy as np
df = pd.DataFrame({"name":["A","B"],"score":[1,2]})
df.score = df.score.astype(float)
df.info()

s.fillna(s.mean())
# fill the null values of a column with that column's mean
import numpy as np
df = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
df['a'] = df['a'].fillna(df['a'].mean())
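
Random data contains no nulls, so the effect of the mean fill is not visible there. A sketch with hand-placed NaN values (the data is made up): mean() skips NaN by default, so column a has mean 2.0 and column b has mean 3.0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [np.nan, 2.0, 4.0]})

# Fill one column with its own mean.
df["a"] = df["a"].fillna(df["a"].mean())

# Fill every column with its own mean in one call.
filled = df.fillna(df.mean())
print(filled)
```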