cpca: a library for extracting province, city, and county details from addresses

weixin_42132366

Last modified: 2024-05-13 16:51:35

Tags: java, python, frontend

First published: 2024-04-25 14:17:25

Article link: https://blog.csdn.net/weixin_42132366/article/details/138187877

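This article shows how to use the cpca module to extract addresses from text and obtain the corresponding 6-digit administrative division codes with Python. It covers data preprocessing, file operations, and working with a pandas DataFrame.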

A simple example:

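import cpca

# Each item can be a bare address or a sentence that contains one
location_str = [
    "广东省深圳市福田区",
    "四川省广汉市城西三星堆镇的鸭子河畔，属青铜时代文化遗址"
]
# transform parses every item and returns the results as a pandas DataFrame
df = cpca.transform(location_str)
print(df)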

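Given natural-language text, the module extracts the addresses it contains along with their 6-digit administrative division codes.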
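
To make the 6-digit code more concrete: under the GB/T 2260 numbering scheme, the first two digits identify the province-level division, the first four the prefecture-level division, and all six the county-level division. The sketch below splits a code into those three levels. The helper name `split_adcode` is hypothetical and is not part of cpca; it assumes the code is available as a 6-digit string.

```python
def split_adcode(adcode: str) -> dict:
    """Split a 6-digit administrative code into its three nested levels.

    Hypothetical helper, not part of cpca. The higher levels are padded with
    zeros so that each returned value is itself a 6-digit code.
    """
    code = str(adcode).strip()
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a 6-digit code, got {adcode!r}")
    return {
        "province": code[:2] + "0000",
        "city": code[:4] + "00",
        "county": code,
    }


# 440304 is the code for 广东省深圳市福田区 (Futian District, Shenzhen, Guangdong).
levels = split_adcode("440304")
print(levels)
```

For municipalities and counties administered directly by a province, the middle level does not correspond to an ordinary city, so treat the "city" value as a grouping code rather than a guaranteed city name.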

GitHub address: chinese_province_city_area_mapper (project home page, via GitCode)

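One more note: the library's author maintained it only up to 2020, so changes to administrative divisions after that are not reflected. The project documentation includes detailed instructions from the author on how to update the data, so you can maintain it yourself as needed.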

Getting administrative divisions in batch

import cpca  # make sure the cpca library is installed

location_str = [
    "重庆市市辖区忠县精忠路",
    # ... other addresses ...
    "云南省昆明市晋宁区晋宁宝峰工业区进宝公路南侧"
]

# cpca.transform returns a pandas DataFrame
df = cpca.transform(location_str)

# Convert the DataFrame to a string so it can be written to a file
df_str = df.to_string(index=False)  # drop the index for cleaner output

# Output file name and path
filename = 'D:\\output.txt'

# Open the file and write the string
with open(filename, 'w', encoding='utf-8') as file:
    file.write(df_str)
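
The `to_string` output is laid out for reading, which can make it awkward to load back into other tools. If the result needs further processing, CSV may be a better fit. The following is a minimal sketch using only the standard library. It assumes the rows have already been converted to dictionaries (for example with `df.to_dict("records")`); the function name `write_rows_csv`, the sample row, and its column names are illustrative assumptions, not cpca's guaranteed output.

```python
import csv
import tempfile
from pathlib import Path


def write_rows_csv(rows, path):
    """Write a list of dicts to a CSV file, using the first row's keys as the header."""
    if not rows:
        raise ValueError("no rows to write")
    # utf-8-sig writes a BOM, which helps spreadsheet tools detect UTF-8 text.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical sample row standing in for df.to_dict("records").
sample_rows = [
    {"省": "广东省", "市": "深圳市", "区": "福田区", "地址": "", "adcode": "440304"},
]

# Write to a temporary directory, then read the file back to check the round trip.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "cpca_output.csv"
    write_rows_csv(sample_rows, out_path)
    with open(out_path, encoding="utf-8-sig", newline="") as f:
        read_back = list(csv.DictReader(f))

print(f"wrote and re-read {len(read_back)} row(s)")
```

Keeping the codes as strings avoids losing leading zeros or having them reinterpreted as numbers when the file is opened elsewhere.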

Reading a file, splitting the addresses in batch, and getting the administrative codes

import cpca  # make sure the cpca library is installed

# Input and output file names and paths
filename_input = 'D:\\file.txt'  # assumes the address file is named file.txt
filename_output = 'D:\\output.txt'

# Read file.txt from the root of drive D into a list, one address per line
with open(filename_input, 'r', encoding='utf-8') as file:
    location_str = file.readlines()

# Strip surrounding whitespace (including the trailing newline) and skip blank lines
location_str = [address.strip() for address in location_str if address.strip()]

# cpca.transform accepts a list of strings and returns a pandas DataFrame
df = cpca.transform(location_str)

# Convert the DataFrame to a string so it can be written to a file
df_str = df.to_string(index=False)  # drop the index for cleaner output

# Open the output file and write the string
with open(filename_output, 'w', encoding='utf-8') as file:
    file.write(df_str)
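
Real address files often contain blank lines, a byte-order mark, or repeated entries. Below is a minimal sketch of a loader that tidies these up before the list is handed to `cpca.transform`. The function name `load_addresses` and the one-address-per-line layout are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def load_addresses(path):
    """Read one address per line, dropping blank lines and exact duplicates.

    The order of first appearance is preserved. The utf-8-sig codec skips a
    leading BOM if the file has one and reads plain UTF-8 otherwise.
    """
    seen = set()
    addresses = []
    with open(path, "r", encoding="utf-8-sig") as f:
        for line in f:
            address = line.strip()
            if address and address not in seen:
                seen.add(address)
                addresses.append(address)
    return addresses


# Small demonstration with a temporary file instead of a fixed path on drive D.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "file.txt"
    demo_file.write_text(
        "\ufeff广东省深圳市福田区\n\n  重庆市市辖区忠县精忠路  \n广东省深圳市福田区\n",
        encoding="utf-8",
    )
    demo_addresses = load_addresses(demo_file)

print(len(demo_addresses))  # 2
```

Dropping duplicates changes the row count relative to the input file, so skip that step if the output has to line up with the input line by line.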