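1. pandas
pandas is a tool built on NumPy and created for data analysis tasks. It incorporates a large number of libraries and several standard data models, and provides the tools needed to work efficiently with large datasets. pandas also offers many functions and methods for processing data quickly and conveniently. You will soon find that it is one of the main reasons Python is a powerful and efficient environment for data analysis.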
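2. Checking whether a DataFrame is empty with an if statement
Combine DataFrame.empty with an if statement to check whether the data read from a file is empty. If the returned DataFrame is empty and this goes unchecked, later steps may run into logic errors.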
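import pandas as pd

# Variant 1: test for the empty case first.
# on_bad_lines='skip' (pandas 1.3+) replaces error_bad_lines=False, which was removed in pandas 2.0.
data = pd.read_csv(filename, skiprows=1, header=None, on_bad_lines='skip')
if data.empty:
    pass  # handle the empty case
else:
    pass  # handle the non-empty case

# Variant 2: test for the non-empty case first.
data = pd.read_csv(filename, skiprows=1, header=None, on_bad_lines='skip')
if not data.empty:
    pass  # handle the non-empty case
else:
    pass  # handle the empty case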
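To see what DataFrame.empty reports without needing a file on disk, here is a minimal sketch using in-memory frames. The frame names and the describe helper are made up for illustration. Note that a frame containing only NaN still counts as non-empty, because empty looks at the axis lengths rather than at the values.

```python
import pandas as pd

# Hypothetical frames for illustration only.
empty_df = pd.DataFrame(columns=['c1', 'c2'])        # columns but no rows
filled_df = pd.DataFrame({'c1': [10], 'c2': [100]})  # one ordinary row
nan_df = pd.DataFrame({'c1': [float('nan')]})        # one row holding only NaN


def describe(df):
    """Return a label based on DataFrame.empty (illustrative helper)."""
    return 'empty' if df.empty else 'not empty'


results = [describe(empty_df), describe(filled_df), describe(nan_df)]
print(results)  # ['empty', 'not empty', 'not empty']
```

If NaN-only data should also be treated as empty, dropping missing values with dropna() before the check is one option.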
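3. Selecting a value from a DataFrame column
# Method 1: select column b first, then keep the rows where column a equals 1
# (a and b are variables holding column names)
dataframe[b][dataframe[a] == 1].values[0]
# Method 2: filter the rows first, then select column b
dataframe[dataframe[a] == 1][b].values[0]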
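Both expressions above take the first matching value, and .values[0] raises an IndexError when no row matches. The sketch below uses a small made-up frame to compare the chained form with a single .loc lookup and to add a guard for the no-match case. The column names and data are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: look up column 'b' where column 'a' equals 1.
df = pd.DataFrame({'a': [0, 1, 1], 'b': ['x', 'y', 'z']})

mask = df['a'] == 1

# Chained indexing, as in the two methods above.
first_chained = df['b'][mask].values[0]

# The same lookup as a single .loc call (row mask, column label).
first_loc = df.loc[mask, 'b'].values[0]

# Guard against an empty selection before taking element 0.
no_match = df.loc[df['a'] == 5, 'b']
fallback = no_match.values[0] if not no_match.empty else None

print(first_chained, first_loc, fallback)  # y y None
```

A single .loc call does the row filter and column selection in one step, which is usually easier to read than two chained brackets.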
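4. Iterating over a DataFrame by row and by column
A DataFrame is a matrix-like structure: all row labels are stored in index and all column names in columns. A DataFrame can be created as follows: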
import pandas as pd
import numpy as np
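# The number of index labels and column names must match the shape of the data;
# passing three column names for a 3x4 array raises the ValueError below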
>>> df = pd.DataFrame(np.arange(12).reshape(3, 4), index = ['row1', 'row2', 'row3'], columns=['col1', 'col2','col3'])
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4857, in create_block_manager_from_blocks
placement=slice(0, len(axes[0])))]
File "/root/miniconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 3205, in make_block
return klass(values, ndim=ndim, placement=placement)
File "/root/miniconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 125, in __init__
'{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
ValueError: Wrong number of items passed 4, placement implies 3
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 379, in __init__
copy=copy)
File "/root/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 536, in _init_ndarray
return create_block_manager_from_blocks([values], [columns, index])
File "/root/miniconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4866, in create_block_manager_from_blocks
construction_error(tot_items, blocks[0].shape[1:], axes, e)
File "/root/miniconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4843, in construction_error
passed, implied))
ValueError: Shape of passed values is (4, 3), indices imply (3, 3)
>>> df = pd.DataFrame(np.arange(12).reshape(3, 4), index = ['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3', 'col4'])
>>>
>>> df
      col1  col2  col3  col4
row1     0     1     2     3
row2     4     5     6     7
row3     8     9    10    11
>>> df.index
Index(['row1', 'row2', 'row3'], dtype='object')
>>>
>>> df.columns
Index(['col1', 'col2', 'col3', 'col4'], dtype='object')
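The shape mismatch shown in the traceback can be checked before the DataFrame is built. This is a minimal sketch, assuming a 2-D NumPy array. labels_match is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd

values = np.arange(12).reshape(3, 4)
index = ['row1', 'row2', 'row3']
good_columns = ['col1', 'col2', 'col3', 'col4']
bad_columns = ['col1', 'col2', 'col3']  # one name short, as in the failing call


def labels_match(values, index, columns):
    """Return True when the label counts agree with the 2-D array's shape."""
    return values.shape == (len(index), len(columns))


ok_good = labels_match(values, index, good_columns)
ok_bad = labels_match(values, index, bad_columns)

# pandas itself raises ValueError on the mismatch, so catching it also works.
try:
    pd.DataFrame(values, index=index, columns=bad_columns)
    raised = False
except ValueError:
    raised = True

print(ok_good, ok_bad, raised)  # True False True
```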
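iteritems(): iterates by column, yielding each column as a (column name, Series) pair; elements are accessed with row[index]. iteritems() was deprecated in pandas 1.5 and removed in 2.0, where items() is the equivalent.
iterrows(): iterates by row, yielding each row as an (index, Series) pair; elements are accessed with row[name].
itertuples(): iterates by row, yielding each row as a namedtuple; elements are accessed with getattr(row, name) or row.name, and it is generally faster than iterrows().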
>>> import pandas as pd
>>>
>>> pdd = [{'c1':10, 'c2':100}, {'c1':11, 'c2':111}, {'c1':22, 'c2':222}]
>>>
>>> print(type(pdd))
<class 'list'>
>>>
>>> df = pd.DataFrame(pdd)
>>>
>>> print(df)
   c1   c2
0  10  100
1  11  111
2  22  222
>>> print(type(df))
<class 'pandas.core.frame.DataFrame'>
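Iterating by column with items() (spelled iteritems() before pandas 2.0, which removed that name):
# index is the column name
>>> for index, row in df.items():
...     print(index)
...
c1
c2
# row is one column as a Series; row[0] is the value in that column's first row
>>> for index, row in df.items():
...     print(row[0], row[1], row[2])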
...
10 11 22
100 111 222
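On pandas 2.0 and later iteritems() is no longer available, and items() yields the same (column name, Series) pairs. The short sketch below reuses the sample data from this section and collects each column into a plain list. The variable name columns is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100},
                   {'c1': 11, 'c2': 111},
                   {'c1': 22, 'c2': 222}])

# items() yields (column name, Series); tolist() converts each column to a list.
columns = {name: column.tolist() for name, column in df.items()}

print(columns)  # {'c1': [10, 11, 22], 'c2': [100, 111, 222]}
```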
Iterating by row with iterrows():
# index is the row label
>>> for index, row in df.iterrows():
...     print(index)
...
0
1
2
# Elements of each row are accessed by column name
>>> for index, row in df.iterrows():
...     print(row['c1'], row['c2'])
...
10 100
11 111
22 222
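One thing to keep in mind with iterrows() is that each row comes back as a single Series, so columns of different dtypes are converted to a common dtype. The sketch below uses a made-up frame with one integer and one float column to show the effect, and compares it with itertuples().

```python
import pandas as pd

# Hypothetical frame mixing an integer column and a float column.
df = pd.DataFrame({'c1': [10, 11], 'c2': [1.5, 2.5]})

# iterrows(): the row Series has one dtype, so the integer becomes a float.
_, first_row = next(df.iterrows())
row_dtype = str(first_row.dtype)

# itertuples(): each field keeps its own column's type.
first_tuple = next(df.itertuples(index=False))
kept_integer = not isinstance(first_tuple.c1, float)

print(row_dtype, kept_integer)  # float64 True
```

When the original types matter, itertuples() is the safer of the two.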
Iterating by row with itertuples():
# getattr(row, 'name') returns the row's element in the column called name
>>> for row in df.itertuples():
...     print(getattr(row, 'c1'), getattr(row, 'c2'))
...
10 100
11 111
22 222
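Because itertuples() yields namedtuples, the fields can also be read with attribute syntax. The row label is available as Index unless index=False is passed. The sketch below reuses the sample data from this section; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100},
                   {'c1': 11, 'c2': 111},
                   {'c1': 22, 'c2': 222}])

# Attribute access is equivalent to getattr(row, 'c1').
attr_pairs = [(row.c1, row.c2) for row in df.itertuples()]

# The row label is exposed as the Index field.
labels = [row.Index for row in df.itertuples()]

# index=False drops the label; name=None yields plain tuples instead of namedtuples.
plain = list(df.itertuples(index=False, name=None))

print(attr_pairs)  # [(10, 100), (11, 111), (22, 222)]
print(labels)      # [0, 1, 2]
print(plain)       # [(10, 100), (11, 111), (22, 222)]
```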