Basic pandas operations
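Before starting, a quick word about a file format we will use, CSV.
Characteristics of the CSV format:
1. Plain text
2. The fields of every record are separated by the same delimiter (usually a comma)
3. Every record has the same sequence of fields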
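To make the three points concrete, here is a minimal sketch that parses CSV text held in a string instead of a file. The variable names csv_text and df_demo are made up for illustration, and the three rows are copied from the table used later in these notes.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV: plain text, comma-delimited,
# every record has the same six fields in the same order.
csv_text = """id,name,class,age,math_score,English_sccore
0,zhao,1,16,90,87
1,qian,2,15,88,90
2,sun,3,16,69,89
"""

# read_csv accepts a file-like object as well as a path
df_demo = pd.read_csv(io.StringIO(csv_text))
print(df_demo.shape)          # (rows, columns)
print(list(df_demo.columns))  # field names taken from the header line
```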
pandas
Importing tabular data
import pandas as pd
abs_path = "C:/Users/14037/Desktop/pandastest.csv"
df = pd.read_csv(abs_path)
print(df)
id name class age math_score English_sccore
0 0 zhao 1 16 90 87
1 1 qian 2 15 88 90
2 2 sun 3 16 69 89
3 3 li 1 15 89 91
4 4 zhou 3 17 59 85
5 5 wu 2 16 83 90
6 6 wang 2 16 83 87
Use head() to get the first few rows
print(df.head())  # returns the first five rows by default; pass a number, e.g. df.head(3), to change that
id name class age math_score English_sccore
0 0 zhao 1 16 90 87
1 1 qian 2 15 88 90
2 2 sun 3 16 69 89
3 3 li 1 15 89 91
4 4 zhou 3 17 59 85
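A small sketch of head() with an explicit row count, plus its counterpart tail() for the last rows. The DataFrame is rebuilt here from the table above so the snippet runs without the CSV file; first_three and last_two are illustrative names.

```python
import pandas as pd

# Same data as pandastest.csv, rebuilt inline so the snippet is self-contained
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6],
    "name": ["zhao", "qian", "sun", "li", "zhou", "wu", "wang"],
    "class": [1, 2, 3, 1, 3, 2, 2],
    "age": [16, 15, 16, 15, 17, 16, 16],
    "math_score": [90, 88, 69, 89, 59, 83, 83],
    "English_sccore": [87, 90, 89, 91, 85, 90, 87],
})

first_three = df.head(3)  # first 3 rows instead of the default 5
last_two = df.tail(2)     # tail() works the same way from the end
print(first_three)
print(last_two)
```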
Column names and index information
print(df.columns)  # column names
print(df.index)  # index
Index(['id', 'name', 'class', 'age', 'math_score', 'English_sccore'], dtype='object')
RangeIndex(start=0, stop=7, step=1)
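The .loc[] indexer looks up a record by its index label and shows all of its fields (note the square brackets: .loc is an indexer, not a function call)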
print(df.loc[0])
id 0
name zhao
class 1
age 16
math_score 90
English_sccore 87
Name: 0, dtype: object
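Besides a whole row, .loc[] can also take a column label or a list of row labels. The following is a rough sketch on the first three rows of the table; row, cell and subset are illustrative names.

```python
import pandas as pd

# First three records of the table above
df = pd.DataFrame({
    "id": [0, 1, 2],
    "name": ["zhao", "qian", "sun"],
    "age": [16, 15, 16],
    "math_score": [90, 88, 69],
})

row = df.loc[0]           # one row, returned as a Series
cell = df.loc[0, "name"]  # one value: row label 0, column "name"
subset = df.loc[[0, 2]]   # several rows, returned as a DataFrame
print(cell)
print(subset)
```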
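Filtering data
1> Simple selection. For example, select the math scores. Any column can be selected by name this way as long as the name is a valid Python identifier; a column named class, for instance, has to be written df['class'].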
print(df.math_score)
0 90
1 88
2 69
3 89
4 59
5 83
6 83
Name: math_score, dtype: int64
2> Comparing a column with a value gives a boolean Series (True where the condition holds)
print(df.math_score>80)
0 True
1 True
2 False
3 True
4 False
5 True
6 True
Name: math_score, dtype: bool
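As a quick check of the comparison above, this sketch recomputes the mask from the seven math scores and counts the matches. The Series is rebuilt inline; mask is an illustrative name.

```python
import pandas as pd

# Math scores copied from the table above
math_score = pd.Series([90, 88, 69, 89, 59, 83, 83], name="math_score")

mask = math_score > 80   # boolean Series, one value per row
print(mask.tolist())
print(int(mask.sum()))   # True counts as 1, so the sum is the number of matches
```

Both scores of 83 are above 80, so rows 5 and 6 are True and five rows match in total.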
3> Use the boolean Series to filter rows; the result comes back as a table (a DataFrame)
print(df[df.math_score>85])
id name class age math_score English_sccore
0 0 zhao 1 16 90 87
1 1 qian 2 15 88 90
3 3 li 1 15 89 91
4> Complex filtering: combine conditions with &, wrapping each condition in parentheses
print(df[(df.math_score>85) & (df.English_sccore>=90)])
id name class age math_score English_sccore
1 1 qian 2 15 88 90
3 3 li 1 15 89 91
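The same pattern extends to "or" (|) and "not" (~). A minimal sketch on the same data, with either and not_high as illustrative names; the parentheses are needed because & , | and ~ bind more tightly than the comparisons.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["zhao", "qian", "sun", "li", "zhou", "wu", "wang"],
    "math_score": [90, 88, 69, 89, 59, 83, 83],
    "English_sccore": [87, 90, 89, 91, 85, 90, 87],
})

# OR: math above 85, or English at least 90
either = df[(df.math_score > 85) | (df.English_sccore >= 90)]

# NOT: everyone whose math score is not above 85
not_high = df[~(df.math_score > 85)]

print(either)
print(not_high)
```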
Sorting
sort_values() sorts in ascending order by default
print(df.sort_values(['math_score']))
id name class age math_score English_sccore
4 4 zhou 3 17 59 85
2 2 sun 3 16 69 89
5 5 wu 2 16 83 90
6 6 wang 2 16 83 87
1 1 qian 2 15 88 90
3 3 li 1 15 89 91
0 0 zhao 1 16 90 87
To reverse the order, set ascending=False (the default is True)
print(df.sort_values(['math_score'],ascending=False))
id name class age math_score English_sccore
0 0 zhao 1 16 90 87
3 3 li 1 15 89 91
1 1 qian 2 15 88 90
5 5 wu 2 16 83 90
6 6 wang 2 16 83 87
2 2 sun 3 16 69 89
4 4 zhou 3 17 59 85
When math scores are equal, break the tie by English score
print(df.sort_values(['math_score', 'English_sccore']))
id name class age math_score English_sccore
4 4 zhou 3 17 59 85
2 2 sun 3 16 69 89
6 6 wang 2 16 83 87
5 5 wu 2 16 83 90
1 1 qian 2 15 88 90
3 3 li 1 15 89 91
0 0 zhao 1 16 90 87
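ascending also accepts a list with one flag per sort column, which allows mixed directions. A sketch under that assumption: math scores from high to low, ties broken by English score from low to high. The name ranked is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["zhao", "qian", "sun", "li", "zhou", "wu", "wang"],
    "math_score": [90, 88, 69, 89, 59, 83, 83],
    "English_sccore": [87, 90, 89, 91, 85, 90, 87],
})

# One ascending flag per column: math descending, English ascending
ranked = df.sort_values(
    ["math_score", "English_sccore"],
    ascending=[False, True],
)
print(ranked)
```

sort_values() returns a new DataFrame and leaves df itself unchanged.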
Index
score = {
    'english': [90, 78, 89],
    'math': [64, 78, 45],
    'name': ['wang', 'li', 'sun']
}
df = pd.DataFrame(score)
print(df)
print(df.index)
english math name
0 90 64 wang
1 78 78 li
2 89 45 sun
RangeIndex(start=0, stop=3, step=1)
The index can also be set explicitly
score = {
    'english': [90, 78, 89],
    'math': [64, 78, 45],
    'name': ['wang', 'li', 'sun']
}
# df = pd.DataFrame(score)
df = pd.DataFrame(score, index=['one','two','three'])
print(df)
print(df.index)
english math name
one 90 64 wang
two 78 78 li
three 89 45 sun
Index(['one', 'two', 'three'], dtype='object')
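Another way to get a custom index is to promote an existing column with set_index(). This is a sketch of that alternative, not something the notes above use; by_name is an illustrative name.

```python
import pandas as pd

score = {
    "english": [90, 78, 89],
    "math": [64, 78, 45],
    "name": ["wang", "li", "sun"],
}
df = pd.DataFrame(score)

# Use the "name" column as the index; returns a new DataFrame
by_name = df.set_index("name")
print(by_name)
print(by_name.loc["li", "math"])  # look up li's math score by label
```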
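There is no numeric index any more, so label-based access with a number (for example df.loc[0]) raises a KeyError. Use the labels instead: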
print(df.loc['one'])
english 90
math 64
name wang
Name: one, dtype: object
To access a row by position when the index is not numeric, use iloc[0]
print(df.iloc[0])
english 90
math 64
name wang
Name: one, dtype: object
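With the default numeric index (0, 1, 2, ...), loc[n] and iloc[n] return the same row. In general loc[] selects by label and iloc[] selects by position, so when the index is not numeric, positional access has to go through iloc[n].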
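The label/position distinction also matters for integer indexes that do not start at 0, and for slices. A small sketch with made-up index labels 10, 20, 30 (df_labels and df_default are illustrative names):

```python
import pandas as pd

# Hypothetical frame whose integer labels do not start at 0
df_labels = pd.DataFrame({"math": [64, 78, 45]}, index=[10, 20, 30])

by_label = df_labels.loc[10, "math"]   # row with label 10
by_position = df_labels.iloc[0, 0]     # first row, first column

# There is no label 0, so loc raises KeyError even though the index is numeric
try:
    df_labels.loc[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

# Slices differ too: loc includes the end label, iloc excludes the end position
df_default = pd.DataFrame({"math": [64, 78, 45]})
loc_rows = len(df_default.loc[0:1])
iloc_rows = len(df_default.iloc[0:1])
print(by_label, by_position, missing_label_raises, loc_rows, iloc_rows)
```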
Accessing multiple rows (from here on, df is again the table loaded from the CSV file)
print(df.iloc[:3])
id name class age math_score English_sccore
0 0 zhao 1 16 90 87
1 1 qian 2 15 88 90
2 2 sun 3 16 69 89
Plain slicing also selects multiple rows
print(df[:2])
id name class age math_score English_sccore
0 0 zhao 1 16 90 87
1 1 qian 2 15 88 90
Get all the values in the table
print(df.values)
[[0 'zhao' 1 16 90 87]
[1 'qian' 2 15 88 90]
[2 'sun' 3 16 69 89]
[3 'li' 1 15 89 91]
[4 'zhou' 3 17 59 85]
[5 'wu' 2 16 83 90]
[6 'wang' 2 16 83 87]]
The math scores as an array of values
print(df.math_score.values)
[90 88 69 89 59 83 83]
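What .values returns is a NumPy array, so NumPy operations apply to it directly. The sketch below uses to_numpy(), which the pandas documentation recommends over .values; scores is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["zhao", "qian", "sun", "li", "zhou", "wu", "wang"],
    "math_score": [90, 88, 69, 89, 59, 83, 83],
})

scores = df["math_score"].to_numpy()  # same data as df.math_score.values
print(type(scores).__name__)          # the result is a NumPy ndarray
print(scores.sum(), scores.max())     # array methods work directly
```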
The math scores as a Series
print(df['math_score'])
0 90
1 88
2 69
3 89
4 59
5 83
6 83
Name: math_score, dtype: int64
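print(df.math_score.value_counts)  # no parentheses, so this prints the bound method (its repr includes the Series) instead of the counts; call value_counts() to count how often each score occurs
<bound method IndexOpsMixin.value_counts of
0 90
1 88
2 69
3 89
4 59
5 83
6 83
Name: math_score, dtype: int64>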
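For comparison, here is a sketch of the call with parentheses, which actually counts the occurrences of each score. The Series is rebuilt inline and counts is an illustrative name; the exact printed layout depends on the pandas version, so no output is shown.

```python
import pandas as pd

# Math scores copied from the table above
math_score = pd.Series([90, 88, 69, 89, 59, 83, 83], name="math_score")

# With parentheses the method runs: index = distinct scores, values = how often each occurs
counts = math_score.value_counts()
print(counts)
print(counts.loc[83])  # 83 appears twice (wu and wang)
```

The result is sorted by count from high to low, so the most frequent score comes first.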
Selecting multiple columns
print(df[['math_score','English_sccore']])
math_score English_sccore
0 90 87
1 88 90
2 69 89
3 89 91
4 59 85
5 83 90
6 83 87
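Row filtering and column selection can be combined in one step with .loc[rows, columns]. A minimal sketch on the same data, with high_math as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["zhao", "qian", "sun", "li", "zhou", "wu", "wang"],
    "math_score": [90, 88, 69, 89, 59, 83, 83],
    "English_sccore": [87, 90, 89, 91, 85, 90, 87],
})

# Rows where math_score > 85, restricted to two columns
high_math = df.loc[df.math_score > 85, ["name", "math_score"]]
print(high_math)
```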