Common pandas methods for parsing Excel files

SimpleLearing

Last modified 2022-10-12 20:30:57

First published 2022-08-19 19:00:06

Article link: https://blog.csdn.net/yiqiedouhao11/article/details/126429147

Talk is cheap, show me the code!

The original Excel file contains a header row followed by three data rows:

a  b  c
1  2  3
4  5  6
7  8  9

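    # Import the pandas data-processing package
    import pandas as pd

    # Path to the Excel file (placeholder name, not from the original post)
    file = 'example.xlsx'

    # Read the Excel file; header=None means no row is treated as a header
    df = pd.read_excel(file, header=None)
    print('df is: \n', df, type(df))
    print('*' * 20)

    # Read the Excel file; by default the first row is used as the header
    df = pd.read_excel(file)
    print('df is: \n', df, type(df))
    print('*' * 20)

    # The table data can be converted to a NumPy array for processing
    print('numpy is \n', df.to_numpy())
    print('*' * 20)

    # For a quick look at the table, use head(); it returns the first five rows by default
    print('head is \n', df.head())
    # tail() returns the last five rows by default
    print('tail is \n', df.tail())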

Output:

df is: 
    0  1  2
0  a  b  c
1  1  2  3
2  4  5  6
3  7  8  9 <class 'pandas.core.frame.DataFrame'>
********************
df is: 
    a  b  c
0  1  2  3
1  4  5  6
2  7  8  9 <class 'pandas.core.frame.DataFrame'>
********************
numpy is 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
********************
head is 
    a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
tail is 
    a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
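
Two things in the output above are worth spelling out: with `header=None` the header row `a b c` is read as ordinary data, and `head()` and `tail()` print the same rows because the table has fewer than five rows. The block below is a minimal sketch of both points. It assumes a small in-memory DataFrame as a stand-in for the Excel file (so no file or Excel engine is needed), and the variable names `raw`, `promoted`, `first_two`, and `last_two` are illustrative only.

```python
import pandas as pd

# Stand-in for pd.read_excel(file, header=None): every row, including the
# header row, is data, and the columns are labelled 0, 1, 2.
raw = pd.DataFrame([['a', 'b', 'c'], [1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Promote the first row to column labels. This roughly mirrors what the
# default header=0 does when the file is read.
promoted = raw.iloc[1:].reset_index(drop=True)
promoted.columns = raw.iloc[0].tolist()
# The columns held mixed strings and numbers, so convert them back to integers.
promoted = promoted.astype(int)

# head(n) and tail(n) accept an explicit row count.
first_two = promoted.head(2)
last_two = promoted.tail(2)

print(promoted)
print(first_two)
print(last_two)
```

If the file really has a header row, reading it with the default settings is simpler than promoting the row afterwards; the promotion step is only useful when the data has already been loaded with `header=None`.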

    # Get the row index labels
    print('Row index:', df.index.values)
    print('*' * 20)

    # Get the column labels
    print('Column index:', df.columns.values)
    print('*' * 20)

    # Iterate over the data row by row with iterrows()
    for idx, line in df.iterrows():
        print('Row index:', idx)
        print(line.values, type(line.values))

Output:

Row index: [0 1 2]
********************
Column index: ['a' 'b' 'c']
********************
Row index: 0
[1 2 3] <class 'numpy.ndarray'>
Row index: 1
[4 5 6] <class 'numpy.ndarray'>
Row index: 2
[7 8 9] <class 'numpy.ndarray'>
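
`iterrows()` builds a Series for every row, which is convenient but tends to be slow on large tables and does not preserve dtypes across a row. As a hedged aside, here is a minimal sketch of two common alternatives: `itertuples()` for row-by-row access and a column-wise operation when no explicit loop is needed. It assumes the same three-row table, rebuilt in memory rather than read from Excel; `rows` and `row_sums` are illustrative names.

```python
import pandas as pd

# Same data as the Excel file in this post, built in memory for the example.
df = pd.DataFrame({'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})

# itertuples() yields one namedtuple per row; fields are accessed by column name,
# and the row label is available as .Index.
rows = []
for row in df.itertuples(index=True):
    rows.append((row.Index, row.a, row.b, row.c))

# Many per-row computations need no loop at all: sum across columns for each row.
row_sums = df.sum(axis=1)

print(rows)
print(row_sums.tolist())
```

Which option fits depends on the task: `itertuples()` keeps the loop structure of the original code, while the column-wise form is usually the shorter and faster choice when the computation can be expressed that way.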

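    print('Original data:\n', df)

    # Insert a column with insert():
    # loc is the insertion position (existing columns from that position shift right),
    # column is the new column's name, and value is the column's value(s)
    player_vals = ''
    df.insert(loc=3, column='player', value=player_vals)
    print('After inserting a column:\n', df)

    # Use loc to write data to a specific position
    for idx in df.index:
        df.loc[idx, 'player'] = 'liming'
    print('After filling in the data:\n', df)

Output:

Original data:
    a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
After inserting a column:
    a  b  c player
0  1  2  3       
1  4  5  6       
2  7  8  9       
After filling in the data:
    a  b  c  player
0  1  2  3  liming
1  4  5  6  liming
2  7  8  9  liming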
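
The loop above writes one cell at a time. When every row gets the same value, or the rows to change can be described by a condition, the same result can be reached without a loop. The following is a minimal sketch, assuming the same table rebuilt in memory; the second player name `'guanyu'` and the condition `a >= 4` are made up for illustration and do not come from the original data.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})

# A scalar passed as value is broadcast to every row.
# len(df.columns) places the new column at the end without hard-coding the position.
df.insert(loc=len(df.columns), column='player', value='liming')

# loc also accepts a boolean mask, so several rows can be updated in one statement.
# (The condition and the name 'guanyu' are hypothetical.)
df.loc[df['a'] >= 4, 'player'] = 'guanyu'

print(df)
```

Using `len(df.columns)` instead of a fixed `loc=3` keeps the code working if the number of columns in the sheet changes.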

    # Append a row; loc works here too
    df.loc[len(df)] = [11, 12, 13, 'liubei']
    print('After adding a row with loc:\n', df)

    # Write to a new Excel file (the row index is written too unless index=False is passed)
    df.to_excel(file.replace('.xlsx', 'new.xlsx'))

Output:

After adding a row with loc:
     a   b   c  player
0   1   2   3  liming
1   4   5   6  liming
2   7   8   9  liming
3  11  12  13  liubei
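
`df.loc[len(df)] = ...` appends a row only while the index is the default 0..n-1; if the label `len(df)` already exists, that row is overwritten instead. The sketch below shows this pitfall and one alternative for adding several rows at once with `pd.concat`. It assumes the table from this post rebuilt in memory; the second new row (`'guanyu'`) and the placeholder values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9],
                   'player': ['liming'] * 3})

# Several new rows at once: build a second frame and concatenate.
new_rows = pd.DataFrame([
    {'a': 11, 'b': 12, 'c': 13, 'player': 'liubei'},
    {'a': 14, 'b': 15, 'c': 16, 'player': 'guanyu'},  # made-up extra row
])
# ignore_index=True renumbers the result 0..n-1.
combined = pd.concat([df, new_rows], ignore_index=True)

# Pitfall: len(frame) is only an unused label while the index is 0..n-1.
subset = combined.iloc[1:3].copy()        # index labels are 1 and 2
subset.loc[len(subset)] = [0, 0, 0, 'x']  # len is 2, so the row labelled 2 is overwritten

print(combined)
print(subset)
```

After slicing or filtering, calling `reset_index(drop=True)` first restores the 0..n-1 index so that the `loc[len(df)]` pattern appends again.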

Feel free to reach out with any questions. Likes, bookmarks, and follows are welcome~