Section 1: Getting to know the pandas Series
import pandas as pd
t = pd.Series([21,1,3,4,5,6,7])
print(t)
0 21
1 1
2 3
3 4
4 5
5 6
6 7
dtype: int64
print(type(t))
<class 'pandas.core.series.Series'>
Creating a Series with a custom index, then from a dictionary
t2 = pd.Series([23,1,2,2,1],index=list('abcde'))
t2
a 23
b 1
c 2
d 2
e 1
dtype: int64
t2.astype(float)
a 23.0
b 1.0
c 2.0
d 2.0
e 1.0
dtype: float64
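As a minimal sketch of the two steps above, reusing the variable name t2 from these notes (t2_float is a name introduced here for illustration): the labels are passed through index=, and astype returns a new Series instead of changing the original.

```python
import pandas as pd

# Build a Series with string labels instead of the default 0..n-1 index.
t2 = pd.Series([23, 1, 2, 2, 1], index=list("abcde"))

# astype returns a new Series; t2 itself keeps its integer dtype.
t2_float = t2.astype(float)

print(t2.dtype)
print(t2_float.dtype)
print(t2_float["a"])
```

Assigning the result back (t2 = t2.astype(float)) is the usual way to keep the converted values.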
temp_dict = {"name":"xiaohong","age":30,"tel":10086}
t3 = pd.Series(temp_dict)
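t3  # keys are shown sorted, as older pandas did; recent versions keep insertion order (name, age, tel)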
age 30
name xiaohong
tel 10086
dtype: object
t3["age"]
30
t3[1]  # positional access; newer pandas prefers t3.iloc[1]
'xiaohong'
t3[["age","name"]]
age 30
name xiaohong
dtype: object
t3[:2]
age 30
name xiaohong
dtype: object
t3[[1,2]]
name xiaohong
tel 10086
dtype: object
t[["age","tel"]]  # t has no labels "age" or "tel", so the result is NaN (recent pandas raises KeyError instead)
age NaN
tel NaN
dtype: float64
t3[['age','tel']]
age 30
tel 10086
dtype: object
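The lookups above mix label-based and position-based access through plain square brackets. A hedged sketch of the explicit alternatives, .loc for labels and .iloc for positions, is shown here. The Series is rebuilt with an explicit index so the order is fixed, and the label "email" is a hypothetical missing key used only to show reindex.

```python
import pandas as pd

# Same data as t3 in the notes, with the label order fixed explicitly.
t3 = pd.Series(["xiaohong", 30, 10086], index=["name", "age", "tel"])

# .loc selects by label, regardless of where the label sits.
age = t3.loc["age"]
subset = t3.loc[["age", "tel"]]

# .iloc selects by integer position, starting at 0.
first = t3.iloc[0]
first_two = t3.iloc[:2]

# reindex tolerates labels that do not exist and fills them with NaN.
missing = t3.reindex(["age", "email"])

print(age)
print(list(subset.index))
print(first)
print(list(first_two.index))
print(missing.isna().tolist())
```

Using .loc and .iloc avoids any ambiguity about whether an integer such as 1 means a label or a position.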
t3.index
Index(['age', 'name', 'tel'], dtype='object')
for i in t3.index:
    print(i)
age
name
tel
type(t3.index)
pandas.indexes.base.Index  # pandas.core.indexes.base.Index in recent pandas versions
len(t3.index)
3
list(t3.index)
['age', 'name', 'tel']
list(t3.index)[:2]
['age', 'name']
t3.values
array([30, 'xiaohong', 10086], dtype=object)
type(t3.values)
numpy.ndarray
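A Series is therefore a pair of aligned sequences: an Index of labels and a NumPy array of values. The sketch below pulls both apart and walks them together. It uses to_numpy(), which recent pandas documentation suggests in place of .values; the names labels, values and pairs are introduced here for illustration.

```python
import numpy as np
import pandas as pd

t3 = pd.Series(["xiaohong", 30, 10086], index=["name", "age", "tel"])

labels = t3.index.tolist()  # plain Python list of the labels
values = t3.to_numpy()      # NumPy array of the underlying values

# items() yields (label, value) pairs in index order.
pairs = [(label, value) for label, value in t3.items()]

print(labels)
print(type(values))
print(pairs)
```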
Section 2: Reading external data with pandas
Reading a CSV file with pandas:
Reading with parameters:
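# data_path: directory prefix defined elsewhere. on_bad_lines='skip' (pandas 1.3+) drops malformed rows; older versions use error_bad_lines=False.
train_data = pd.read_csv(data_path + 'train_dataset.csv', header=0, on_bad_lines='skip', encoding='gbk')
train_data = pd.read_csv(data_path + 'train_dataset.csv', header=0, on_bad_lines='skip')  # same call, default UTF-8 encoding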
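Reading without extra parameters:
import sys
input_file = sys.argv[1]   # path of the CSV file to read
output_file = sys.argv[2]  # path of the CSV file to write
data_frame = pd.read_csv(input_file)
print(data_frame)
# Save as a CSV file, without the row index
data_frame.to_csv(output_file, index=False)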
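To try the read and write steps without any files on disk, the following sketch feeds read_csv an in-memory buffer and lets to_csv return a string. The CSV content and the names csv_text and round_trip are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,age\nxiaohong,30\nxiaoming,25\n"

# read_csv accepts a path or any file-like object; header=0 uses the first row as column names.
data_frame = pd.read_csv(io.StringIO(csv_text), header=0)

# With no path given, to_csv returns the CSV text; index=False omits the row labels.
round_trip = data_frame.to_csv(index=False)

print(data_frame)
print(round_trip)
```

Leaving out index=False would add an extra unnamed first column holding the row numbers.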
Reading MongoDB data:
from pymongo import MongoClient
import pandas as pd
client = MongoClient()
collection = client["douban"]["tv1"]
data = list(collection.find())
t1 = data[0]
t1 = pd.Series(t1)
print(t1)
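Since collection.find() yields dictionaries, the same idea extends from one document to the whole result. The sketch below needs no MongoDB server: the list data is a hypothetical stand-in for list(collection.find()), and the field names (_id, title, rating) are invented for illustration.

```python
import pandas as pd

# Hypothetical documents shaped like the result of list(collection.find()).
data = [
    {"_id": 1, "title": "show A", "rating": 8.5},
    {"_id": 2, "title": "show B", "rating": 7.9},
]

# A single document becomes a Series whose index is the document's keys.
t1 = pd.Series(data[0])

# The full list becomes a DataFrame with one row per document.
df = pd.DataFrame(data)

print(t1)
print(df)
```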
Reading MySQL data:
pd.read_sql(sql_sentence, connection)  # sql_sentence: SQL query string; connection: an open database connection
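The call pattern can be tried without a MySQL server. The sketch below swaps in an in-memory SQLite database from the standard library, so it shows read_sql itself rather than a MySQL connection. The table users and its rows are invented for illustration.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used as a stand-in for a MySQL server.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (name TEXT, age INTEGER)")
connection.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("xiaohong", 30), ("xiaoming", 25)],
)
connection.commit()

# read_sql runs the query and returns the rows as a DataFrame.
sql_sentence = "SELECT name, age FROM users ORDER BY age"
result = pd.read_sql(sql_sentence, connection)
connection.close()

print(result)
```

For a real MySQL database only the connection object would change; the query string and the read_sql call stay the same.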