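Introduction
Pandas is an easy-to-use and powerful data analysis library. Like NumPy, it vectorizes most basic operations, so they run as fast array-level computations even on a CPU instead of as slow Python loops. The operations covered here are very basic, but they are essential if you are just getting started with Pandas. You will need to import pandas as 'pd' and then use the 'pd' object to perform the other basic pandas operations.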
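As a minimal sketch of the import convention and of what a vectorized operation looks like, the snippet below multiplies a whole column of values in one expression. The Series name and its values are made up purely for illustration.

```python
import pandas as pd

# A small Series of made-up values (hypothetical data for illustration).
prices = pd.Series([10.0, 20.0, 30.0])

# Vectorized: the multiplication is applied to every element in one
# expression, with no explicit Python loop.
scaled = prices * 1.5
```

The same expression works unchanged on a Series with millions of rows, which is where avoiding a Python-level loop pays off.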
1. How do you read data from a CSV file or a text file?
CSV files are comma-separated, so to read a CSV file, do the following:
df = pd.read_csv(file_path, sep=',', header=0, index_col=False, names=None)
Explanation: The read_csv function has a plethora of parameters, and I have specified only a few, the ones you are likely to use most often. A few key points:
a) header=0 means the column names are in the first row of the file; if the file has no header row, specify header=None.
b) index_col=False means the first column of the data is not used as the index of the data frame; if the first column really is an index, set index_col=0 instead.
c) names=None means you are not specifying the column names and want them inferred from the CSV file, so the row given by header must contain the column names. Otherwise, you can pass the names here in the same order as the columns appear in the CSV file.
If you are reading a text file separated by spaces or tabs, simply change sep to:
sep=" " or sep="\t"
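The parameters described above can be tried out without a file on disk. The sketch below assumes some made-up CSV and tab-separated content and uses io.StringIO as a stand-in for file_path; the column names and values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a real file path.
csv_text = "name,age\nAnn,31\nBob,45\n"

# header=0: the first row of the file holds the column names.
df = pd.read_csv(io.StringIO(csv_text), sep=",", header=0, index_col=False)

# Tab-separated content with no header row, so the column names
# are supplied explicitly through names.
tsv_text = "Ann\t31\nBob\t45\n"
df_tab = pd.read_csv(
    io.StringIO(tsv_text), sep="\t", header=None, names=["name", "age"]
)
```

Both calls should produce the same two-column table; only the separator and the source of the column names differ.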
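2. How do you create a data frame from a dictionary of pre-existing columns or from a NumPy 2D array?
Using a dictionary
# c1, c2, c3 are pre-existing columns (lists, arrays, or Series of equal length).
# The dictionary keys become the column names.
d_dic = {'first_col_name': c1, 'second_col_names': c2, '3rd_col_name': c3}
df = pd.DataFrame(data=d_dic)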
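To make the dictionary approach concrete, here is a small runnable sketch. The contents of c1, c2, and c3 are invented for illustration; any equal-length lists, arrays, or Series would work the same way.

```python
import pandas as pd

# Hypothetical pre-existing columns of equal length.
c1 = [1, 2, 3]
c2 = ["a", "b", "c"]
c3 = [0.5, 1.5, 2.5]

# Each dictionary key becomes a column name, each value a column.
d_dic = {"first_col_name": c1, "second_col_names": c2, "3rd_col_name": c3}
df = pd.DataFrame(data=d_dic)
```

The columns appear in the order of the dictionary keys, and each row gets a default integer index starting at 0.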
Using a NumPy array