Main usage of pandas with Anaconda3

沐泽__

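1 Installation

Anaconda3 does not require a separate Python installation: it bundles Python 3 and the Spyder IDE (download: https://www.anaconda.com/products/individual). On Windows 10, the Anaconda directories must be added to the PATH environment variable before typing python at the command prompt will start the interpreter. Alternatively, the Python command-line tool and the Spyder IDE can be launched directly from the Start menu.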
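
A quick way to confirm that the installation works is to start the interpreter and import pandas. The snippet below is only a minimal check; it assumes the Anaconda interpreter is the one being run.

```python
import sys
import pandas as pd

# Show which Python interpreter is running and which pandas version it sees.
print(sys.version)
print(pd.__version__)
```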

2 Using pandas

API reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_table.html
Common data structures:
DataFrame: a two-dimensional, table-like data structure.
Series: a one-dimensional labeled array.
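
As a rough illustration of the two structures, the sketch below builds one of each. The values and the column names (name, age) are made up for the example.

```python
import pandas as pd

# A Series is a one-dimensional array with an index (labels) attached.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])           # access a value by its label

# A DataFrame is a table; each column is itself a Series.
df = pd.DataFrame({"name": ["Jack", "Lucy"], "age": [25, 24]})
print(df.shape)         # (number of rows, number of columns)
print(type(df["age"]))  # selecting one column gives a Series
```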

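2.1 Extracting data with regular expressions

Match strings that start with the letter a or b followed by zero or more digits, and split the leading letter and the digits into two columns.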

import pandas as pd

s = pd.Series(['a12', 'b2', 'c3'])
# Extract the two capture groups into two columns
n = s.str.extract(r'([ab])(\d*)')
print(n)
# Name the columns c1 and c2 with named groups
n = s.str.extract(r'(?P<c1>[ab])(?P<c2>\d*)')
print(n)

Output:

     0    1
0    a   12
1    b    2
2  NaN  NaN
    c1   c2
0    a   12
1    b    2
2  NaN  NaN
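
The extracted groups are strings, and rows that do not match are filled with NaN. A possible follow-up step, sketched here as an assumption about what one might want next, is to convert the digit column to numbers with pd.to_numeric and drop the unmatched rows.

```python
import pandas as pd

s = pd.Series(['a12', 'b2', 'c3'])
parts = s.str.extract(r'(?P<c1>[ab])(?P<c2>\d*)')

# The captured digits are still text; convert them to numbers.
# errors='coerce' turns anything unparsable into NaN instead of raising.
parts['c2'] = pd.to_numeric(parts['c2'], errors='coerce')

# Keep only the rows where the pattern matched.
matched = parts.dropna()
print(matched)
```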

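2.2 Reading a file as a table with read_table

Parameter reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_table.html
The return type is DataFrame. The example below reads data1.txt, whose contents are: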

2021-04-21|a|1
2021-04-22|b|2
2021-04-23|c|3
2021-04-24|d|4

import pandas as pd

# The file has no header row and uses '|' as the field separator
df = pd.read_table("./data1.txt", header=None, encoding='utf8', sep='|')
df.columns = ["date", "content1", "content2"]
print(df)

Output:

         date content1  content2
0  2021-04-21        a         1
1  2021-04-22        b         2
2  2021-04-23        c         3
3  2021-04-24        d         4
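
read_table can also name the columns and parse dates while reading. The sketch below uses io.StringIO as a stand-in for data1.txt so that it runs without the file; the column names are illustrative.

```python
import io
import pandas as pd

# In-memory stand-in for data1.txt (same four rows as above).
raw = io.StringIO(
    "2021-04-21|a|1\n"
    "2021-04-22|b|2\n"
    "2021-04-23|c|3\n"
    "2021-04-24|d|4\n"
)

# names= sets the column names directly; parse_dates= converts the
# first column from text to datetime values.
df = pd.read_table(raw, sep='|', header=None,
                   names=['date', 'content1', 'content2'],
                   parse_dates=['date'])
print(df.dtypes)

# With real datetimes, rows can be filtered by date.
recent = df[df['date'] >= '2021-04-23']
print(recent)
```

Parsing the dates up front avoids comparing dates as plain strings later.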

2.3 Common string-handling functions

Case conversion

import pandas as pd

data = {'Name': ['Jack', 'Jucy', 'Join', 'Luch']}
df = pd.DataFrame(data)
print(df)
print("--lowercase--")
# Apply the conversion to the whole column
df["Name"] = df["Name"].str.lower()
print(df)
df["Name"] = df["Name"].str.upper()
print("--uppercase--")
print(df)

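Splitting and replacing data

import pandas as pd

data = {'Address':['山东省 济南市 历城区','山东省 济南市 历城区','山东省 济南市 历城区','山东省 济南市 历城区','山东省 济南市 历城区'],"Age":[25,24,23,26,26]}
df = pd.DataFrame(data)
print(df)
# Drop rows that contain null values; with inplace=True the DataFrame is modified in place and the call returns None
df.dropna(inplace=True)
# Split once at the first space and keep only the first piece (the province)
df["Address"] = df["Address"].str.split(" ", n=1, expand=True)[0]
print(df)
# Replace the value 26 in the Age column with a string
df["Age"] = df["Age"].replace(26, "Twenty six")
print(df)
# Output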
       Address  Age
0  山东省 济南市 历城区   25
1  山东省 济南市 历城区   24
2  山东省 济南市 历城区   23
3  山东省 济南市 历城区   26
4  山东省 济南市 历城区   26
  Address  Age
0     山东省   25
1     山东省   24
2     山东省   23
3     山东省   26
4     山东省   26
  Address         Age
0     山东省          25
1     山东省          24
2     山东省          23
3     山东省  Twenty six
4     山东省  Twenty six
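
With expand=True, str.split returns a DataFrame with one column per piece, so all the pieces can be kept instead of only the first. A minimal sketch follows; the second address and the new column names (province, city, district) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Address': ['山东省 济南市 历城区', '山东省 青岛市 市南区']})

# Split on every space; expand=True gives one column per piece,
# which can be assigned to several new columns at once.
df[['province', 'city', 'district']] = df['Address'].str.split(' ', expand=True)
print(df[['province', 'city', 'district']])
```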

Concatenating data

import pandas as pd

data = {'province': ['山东', '山东'], 'city': ["青岛", "济南"]}
df = pd.DataFrame(data)
print(df)
# Copy the city column
new = df["city"].copy()
# Join province and city element by element, separated by ", "
df["city"] = df["province"].str.cat(new, sep=", ")
print(df)
# Output
  province city
0       山东   青岛
1       山东   济南
  province    city
0       山东  山东, 青岛
1       山东  山东, 济南

Stripping whitespace

import pandas as pd

data = {'city': ["青岛  ", "  济南  "], "name": [1, 1]}
df = pd.DataFrame(data)
print(df)
# Remove leading and trailing whitespace from every value in the column
df["city"] = df['city'].str.strip()
print(df)
# Output
     city  name
0    青岛       1
1    济南       1
  city  name
0   青岛     1
1   济南     1

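Function	Description
str.lower	Converts each string to lowercase
str.upper	Converts each string to uppercase
str.find	Returns the lowest index at which a substring occurs in each string (-1 if absent)
str.rfind	Returns the highest index at which a substring occurs in each string, i.e. searches from the right (-1 if absent)
str.findall	Finds all occurrences of a pattern or regular expression in each string
str.isalpha	Checks whether all characters in each string are alphabetic (e.g. a-z, A-Z)
str.isdecimal	Checks whether all characters in each string are decimal characters
str.title	Capitalizes the first letter of every word in each string
str.len	Returns the number of characters in each string, like len(str)
str.replace	Replaces occurrences of a substring or pattern in each string with another value
str.contains	Tests whether a pattern or regular expression is contained in each string of a Series or Index
str.extract	Extracts the capture groups from the first match of a regular expression
str.startswith	Tests whether the start of each string matches a pattern
str.endswith	Tests whether the end of each string matches a pattern
str.isdigit	Checks whether all characters in each string are digits
str.lstrip	Removes whitespace from the left side (start) of each string
str.rstrip	Removes whitespace from the right side (end) of each string
str.strip	Removes leading and trailing whitespace
str.split	Splits each string around occurrences of the given separator
str.join	Joins the elements of each list in the Series using the given delimiter
str.cat	Concatenates the strings in the Series with another Series or list of strings, using the given separator
str.repeat	Repeats each string in the Series the given number of times
str.get	Gets the element at the given position from each string or list
str.partition	Unlike str.split(), splits each string only at the first occurrence of the separator
str.rpartition	Splits each string only once, at the last occurrence of the separator; it works like str.partition() but from the right
str.pad	Pads each string in the Series with spaces or another character
str.swapcase	Swaps the case of each string in the Series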
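
A few of the functions in the table, tried on a small made-up Series. This is only a sketch of typical calls; the sample strings are not from the examples above.

```python
import pandas as pd

s = pd.Series(['apple pie', 'Banana', '42'])

print(s.str.contains('an'))    # substring (or regex) test per element
print(s.str.len())             # number of characters per element
print(s.str.isdigit())         # True only where every character is a digit
print(s.str.title())           # first letter of each word capitalized

# Pad on the left with '*' up to a total width of 10 characters.
print(s.str.pad(10, side='left', fillchar='*'))

# Split at the first space into three columns: before, separator, after.
print(s.str.partition(' '))
```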