pandas:
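NumPy handles numerical data; pandas builds on it and also handles non-numerical data (e.g. strings, time series)
Common data types:
1. Series: one-dimensional, labelled array
2. DataFrame: two-dimensional, a container of Series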
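A minimal sketch of what "container of Series" means in practice: selecting a single column or a single row of a DataFrame gives back a Series. The sample names and ages are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
df = pd.DataFrame(
    {"name": ["xiaoming", "xiaohong"], "age": [30, 25]},
    index=["a", "b"],
)

col = df["age"]     # selecting one column yields a Series
row = df.loc["a"]   # selecting one row also yields a Series

print(type(col).__name__)   # Series
print(col.index.tolist())   # ['a', 'b']       (the row labels)
print(row.index.tolist())   # ['name', 'age']  (the column labels)
```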
Creating a pandas Series:
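import string
import numpy as np
import pandas as pd

# Series of 0..9 with the uppercase letters A-J as the index
pd.Series(np.arange(10), index=list(string.ascii_uppercase[:10]))

t = pd.Series([1, 2, 31, 12, 3, 4])
t2 = pd.Series([1, 2, 3, 4, 5], index=list("abcde"))
print(t2)

# A dict becomes a Series: keys are the index, values are the data
temp_dict = {"name": "xiaoming", "age": 30, "tel": 10086}
t3 = pd.Series(temp_dict)
print(t3)

# Slicing and indexing
print(t3["age"])
print(t3["tel"])
print(t3.iloc[0])        # by position; plain t3[0] is deprecated on a labelled index
print(t3[:2])
print(t3.iloc[[0, 2]])   # several positions at once
print(t3.index)
for i in t3.index:
    print(i)
print(len(t3.index))
print(list(t3.index)[:2])
print(t3.values)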
A Series object is essentially made up of two arrays:
one array holds the keys (index) and the other holds the values (values), mapping key -> value
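The two-array structure can be inspected directly. A small sketch, using a made-up three-element Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=list("abc"))

keys = s.index    # pandas Index object holding the labels
vals = s.values   # NumPy ndarray holding the data

print(type(vals).__name__)   # ndarray
print(list(keys))            # ['a', 'b', 'c']

# The key -> value mapping, rebuilt as a plain dict
mapping = s.to_dict()
print(mapping)
```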
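Many ndarray methods also work on a Series, for example argmax and clip
Series has a where method, but its result differs from NumPy's: Series.where(cond) keeps the values where cond is True and replaces the rest with NaN (or a supplied value), whereas np.where(cond) returns the positions where cond is True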
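A short sketch of argmax, clip, and the two flavours of where, on a made-up Series. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 5, 3, 8], index=list("abcd"))

print(s.argmax())              # 3, the integer position of the maximum
print(s.clip(2, 6).tolist())   # [2, 5, 3, 6]

# Series.where: same shape as s, values failing the condition become NaN
kept = s.where(s > 2)
print(kept.isna().tolist())    # [True, False, False, False]

# A second argument supplies the replacement value instead of NaN
replaced = s.where(s > 2, 0)
print(replaced.tolist())       # [0, 5, 3, 8]

# np.where with only a condition returns the matching positions
positions = np.where(s.values > 2)[0]
print(positions.tolist())      # [1, 2, 3]
```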
Reading external data with pandas:
import pandas as pd
from pymongo import MongoClient

# Read a CSV file with pandas
# df = pd.read_csv("./dogNames2.csv")
# Read from a SQL database: pd.read_sql(sql_sentence, connection)
# print(df)

# Read all documents from MongoDB (database "douban", collection "tv1")
client = MongoClient()
collection = client["douban"]["tv1"]
data = list(collection.find())
print(data)
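A self-contained sketch of the three sources without needing a real file or server. Everything here is a stand-in: the CSV text, the in-memory SQLite table `dogs`, and the `records` list (which only imitates the shape of list(collection.find())) are hypothetical sample data, not the contents of dogNames2.csv or the douban collection.

```python
import io
import sqlite3
import pandas as pd

# CSV: read_csv accepts any file-like object, so a string buffer works
csv_text = "name,total\nBella,10\nMax,7\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# SQL: pd.read_sql(query, connection) against an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (name TEXT, total INTEGER)")
conn.executemany("INSERT INTO dogs VALUES (?, ?)", [("Bella", 10), ("Max", 7)])
df_sql = pd.read_sql("SELECT name, total FROM dogs", conn)
conn.close()

# MongoDB-style result: a list of dicts converts directly to a DataFrame
records = [
    {"title": "show A", "rating": 8.5},
    {"title": "show B", "rating": 7.9},
]
df_docs = pd.DataFrame(records)

print(df_csv)
print(df_sql)
print(df_docs)
```

With a real MongoDB collection, each document also carries an "_id" field, which would show up as an extra column.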