pandas is a Python package for data analysis. Its two main data structures are Series and DataFrame.
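1 Series objects
A Series is a one-dimensional array-like object that holds a sequence of values together with an index of labels. You can think of it as an array whose elements carry labels.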
>>> from pandas import Series,DataFrame
>>> series = Series([2,3,4,-5]) # convert a list to a Series
>>> series
0 2
1 3
2 4
3 -5
dtype: int64
>>> dic = {"name":"lixia","sex":"女","age":22} # convert a dict to a Series
>>> series1 = Series(dic)
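>>> series1 # older pandas sorted the dict keys, as shown here; pandas 0.23+ on Python 3.6+ keeps insertion order (name, sex, age)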
age 22
name lixia
sex 女
dtype: object
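A Series built from a dict takes its index from the dict keys. The sketch below, assuming pandas 0.23 or later on Python 3.6+, shows that the key order is kept and that passing an explicit `index` selects labels from the dict, filling labels that are not keys with NaN. The `scores` dict is hypothetical data made up for the illustration.

```python
import pandas as pd

# hypothetical data, used only for this illustration
scores = {"math": 90, "english": 85, "art": 70}

# the dict keys become the index (insertion order is kept in pandas 0.23+ on Python 3.6+)
s = pd.Series(scores)
print(s.index.tolist())

# an explicit index selects labels from the dict; "music" is not a key, so its value is NaN
t = pd.Series(scores, index=["art", "music"])
print(t)
```

Because NaN is a float, `t` ends up with dtype float64 even though the original values were integers.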
>>> series2 = Series([2,3,4,-5],index=["a","b","c","d"]) # specify the index; the default index is 0, 1, 2, 3, ...
>>> series2
a 2
b 3
c 4
d -5
dtype: int64
>>> series2["a"] # look up a value by its index label
2
>>> series2[["a","c"]] # note the double brackets: a list of labels returns a Series, not a single value
a 2
c 4
dtype: int64
>>> series2[["a","b"]] = 10 # modify values; several can be changed at once
>>> series2
a 10
b 10
c 4
d -5
dtype: int64
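`series2["a"]` and `series2[["a","c"]]` return different kinds of objects: a single label gives a scalar, while a list of labels gives a new Series. The sketch below repeats this with `.loc`, pandas' explicitly label-based indexer, and also assigns a list of different values to several labels instead of one shared scalar.

```python
import pandas as pd

s = pd.Series([2, 3, 4, -5], index=["a", "b", "c", "d"])

one = s.loc["a"]              # a single label returns a scalar
several = s.loc[["a", "c"]]   # a list of labels returns a Series

# assign a different value to each selected label (one value per label)
s.loc[["a", "b"]] = [10, 20]
print(s)
```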
>>> series2.index # view the index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> series2.values # view the values
array([10, 10, 4, -5], dtype=int64)
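`.index` returns a pandas `Index` object and `.values` returns a NumPy array. A small sketch of converting both into plain containers, assuming pandas 0.24 or later, where `to_numpy()` is available as the recommended alternative to `.values`:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 10, 4, -5], index=["a", "b", "c", "d"])

labels = s.index.tolist()   # Index -> plain Python list
arr = s.to_numpy()          # values as a NumPy array (pandas 0.24+)

print(labels, arr)
```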
>>> series2 + 1 # arithmetic returns a new Series; series2 itself is unchanged
a 11
b 11
c 5
d -4
dtype: int64
>>> series2 * 3
a 30
b 30
c 12
d -15
dtype: int64
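Arithmetic between a Series and a scalar applies to every element. When both operands are Series, pandas matches the values by index label rather than by position. The sketch below uses two made-up Series whose indexes only partly overlap: labels found in just one of them produce NaN, unless `fill_value` supplies a default.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

total = s1 + s2          # values are matched by label, not by position
print(total)

# fill_value treats a label missing from one side as 0 instead of producing NaN
total_filled = s1.add(s2, fill_value=0)
print(total_filled)
```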
>>> series2[series2 < 20] # boolean indexing keeps the elements where the condition is True (here, all of them)
a 10
b 10
c 4
d -5
dtype: int64
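Since every value in `series2` is below 20, the filter above keeps all rows. A sketch with conditions that actually remove rows: the comparison produces a boolean Series with the same index, and conditions are combined with `&` (each one in parentheses).

```python
import pandas as pd

s = pd.Series([10, 10, 4, -5], index=["a", "b", "c", "d"])

mask = s < 5                   # a boolean Series with the same index as s
print(mask.tolist())

small = s[mask]                # keeps only the rows where mask is True
both = s[(s > 0) & (s < 10)]   # combine conditions with & and parentheses
print(small, both)
```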
>>> series2 # check series2 again: the operations above did not change it
a 10
b 10
c 4
d -5
dtype: int64
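2 DataFrame objects
A DataFrame is a tabular data structure: an ordered collection of columns, where each column can hold a different type of value. It combines two or more Series that share a row index into a single data structure.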
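To see how a DataFrame combines several Series, the sketch below builds one from a dict of two Series. The `heights` values are hypothetical. The rows are aligned on the index labels, so a name missing from one Series gets NaN in that column.

```python
import pandas as pd

ages = pd.Series([22, 23, 24], index=["lili", "lala", "lele"])
heights = pd.Series([160, 165], index=["lili", "lele"])  # hypothetical values; "lala" is missing

# each Series becomes a column; rows are matched by index label
df = pd.DataFrame({"age": ages, "height": heights})
print(df)
```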
>>> dic
{'name': 'lixia', 'sex': '女', 'age': 22}
>>> df = DataFrame(dic) # fails: every value is a scalar, so pandas has no index to build rows from
Traceback (most recent call last):
  ...
ValueError: If using all scalar values, you must pass an index
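The error message points at two ways to turn a dict of scalars into a one-row DataFrame: wrap the dict in a list, or keep the dict and pass an explicit index. A short sketch of both, reusing the `dic` from the session:

```python
import pandas as pd

dic = {"name": "lixia", "sex": "女", "age": 22}

# option 1: a list containing one dict is read as one row
df1 = pd.DataFrame([dic])

# option 2: keep the scalars but supply an index label for the single row
df2 = pd.DataFrame(dic, index=[0])

print(df1)
print(df2)
```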
>>> data = {"name":["lili","lala","lele"],"age":[22,23,24],"sex":["女","女","女"]}
>>> df = DataFrame(data) # a dict of lists converts to a DataFrame: each key is a column
>>> df
age name sex
0 22 lili 女
1 23 lala 女
2 24 lele 女
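>>> # Older pandas sorted dict-derived columns alphabetically (pandas 0.23+ on Python 3.6+ keeps the dict's key order); pass a list of column names to set the order explicitly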
>>> DataFrame(data,columns=["name","age","sex"])
name age sex
0 lili 22 女
1 lala 23 女
2 lele 24 女
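Column order can also be chosen after the DataFrame exists, by selecting columns with a list. The sketch also shows, as a hedged illustration with a hypothetical "city" column, that a name in `columns=` which is not a key of the dict yields an all-NaN column, while keys left out of `columns=` are dropped.

```python
import pandas as pd

data = {"name": ["lili", "lala", "lele"], "age": [22, 23, 24], "sex": ["女", "女", "女"]}
df = pd.DataFrame(data)

# select existing columns in a chosen order (returns a new DataFrame)
reordered = df[["sex", "name", "age"]]
print(reordered.columns.tolist())

# "city" (hypothetical) is not a key of data, so it becomes an all-NaN column; "sex" is omitted
with_extra = pd.DataFrame(data, columns=["name", "age", "city"])
print(with_extra)
```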
>>> df["name"] # access a column, way 1: by key
0 lili
1 lala
2 lele
Name: name, dtype: object
>>> df.name # access a column, way 2: as an attribute
0 lili
1 lala
2 lele
Name: name, dtype: object
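Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The sketch below uses a hypothetical column named "count": `df["count"]` returns the column, but `df.count` returns the `DataFrame.count` method instead.

```python
import pandas as pd

df = pd.DataFrame({"name": ["lili", "lala"], "count": [3, 5]})

by_key = df["count"]   # bracket access always returns the column
by_attr = df.count     # attribute access finds the DataFrame.count method instead

print(type(by_key).__name__, callable(by_attr))
```

For this reason, bracket access is the safer default in code that handles arbitrary column names.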
>>> df["sex"] = "男" # assign to a column; the scalar is applied to every row
>>> df
age name sex
0 22 lili 男
1 23 lala 男
2 24 lele 男
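Besides a scalar, a column can be assigned a list with one value per row, and assigning to a name that does not exist yet adds a new column. A minimal sketch (the "older" column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"name": ["lili", "lala", "lele"], "age": [22, 23, 24]})

df["sex"] = "男"               # a scalar is broadcast to every row
df["age"] = [30, 31, 32]       # a list must have exactly one value per row
df["older"] = df["age"] > 30   # assigning to a new name adds a column
print(df)
```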
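>>> del df.sex # deleting a column through attribute access does not work
Traceback (most recent call last):
File "<pyshell#53>", line 1, in <module>
del df.sex # deleting a column through attribute access does not work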
AttributeError: sex
>>> del df["sex"] # delete a column by key
>>> df
age name
0 22 lili
1 23 lala
2 24 lele
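`del` changes the DataFrame in place. Two other common ways to remove a column are sketched below: `drop` returns a new DataFrame and leaves the original alone (the `columns=` keyword assumes pandas 0.21 or later), and `pop` removes the column in place and returns it as a Series.

```python
import pandas as pd

df = pd.DataFrame({"age": [22, 23, 24], "name": ["lili", "lala", "lele"], "sex": ["女", "女", "女"]})

# drop returns a new DataFrame; df still has its "sex" column afterwards
without_sex = df.drop(columns=["sex"])

# pop removes "sex" from df in place and returns the removed column
sex = df.pop("sex")
print(df.columns.tolist(), sex.tolist())
```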