[Solved] Python: replace("/n", "") fails to remove newline characters

SophiaSSSSS

Category: 练手纠错帖集结. Tags: python, regular expressions, strings.

Article link: https://blog.csdn.net/weixin_44216391/article/details/107472319

Let's start with the raw data. I have been trying to find a way to remove the two symbols "\n" and "/".

# Judging from the extracted columns, some details still need cleaning: for statistics and readability, the "\n" and "/" symbols should be removed.

lendhouse_content_split3 = lendhouse_content_split2.iloc[:,[0,16,24,42,70,94]]
lendhouse_content_split3.columns=['location_name','area','direction','housetype','stair_type','stairs']
print("Before replace:\n",lendhouse_content_split3.head(2),"\n")

# lendhouse_content_split3 = lendhouse_content_split3.map(lambda x: x.replace("/n",""))
# Raises AttributeError: 'DataFrame' object has no attribute 'map'
# (DataFrame.map only exists from pandas 2.1 onward; older versions use applymap for element-wise calls.)

lendhouse_content_split3 = lendhouse_content_split3.replace("/n","")
print("First replace attempt:\n",lendhouse_content_split3.head(2))
# The replacement did not work: the printed result still contains the "\n" symbol.
# The first column, location_name, still needs to be split further, so split and tidy it first.

Before replace:
     location_name   area direction housetype stair_type stairs
0  黄埔-科学城-万科里享家\n  78㎡\n        /南    3室2厅1卫        中楼层  （34层）
1   黄埔-科学城-沙湾新村\n  18㎡\n        /南    4室2厅2卫        低楼层  （16层）

First replace attempt:
     location_name   area direction housetype stair_type stairs
0  黄埔-科学城-万科里享家\n  78㎡\n        /南    3室2厅1卫        中楼层  （34层）
1   黄埔-科学城-沙湾新村\n  18㎡\n        /南    4室2厅2卫        低楼层  （16层）
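
One likely reason the first call has no visible effect: "/n" is a forward slash followed by the letter n, whereas the newline escape is written "\n" with a backslash. A minimal sketch on a plain Python string (the sample value is copied from the printed output above; the variable names are illustrative only):

```python
# Sample cell value modelled on the printed output; it ends with a real newline.
cell = "78㎡\n"

# "/n" is two ordinary characters (slash + n), so nothing in the cell matches.
unchanged = cell.replace("/n", "")

# "\n" is the newline escape, so the trailing newline is removed.
cleaned = cell.replace("\n", "")

print(repr(unchanged))
print(repr(cleaned))
```

Printing with repr() makes the difference visible, because it shows escape sequences instead of rendering them.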

# lendhouse_content_split4 = pd.DataFrame(x.split("-") for x in lendhouse_content_split3[0])  # Raises KeyError: 0; kept here for comparison.

lendhouse_content_split4 = pd.DataFrame(x.split("-") for x in lendhouse_content_split3['location_name'])
lendhouse_content_split4.columns=['district','板块','name','none1']
lendhouse_content_split4.head()

	district	板块	name	none1
0	黄埔	科学城	万科里享家\n	None
1	黄埔	科学城	沙湾新村\n	None
2	番禺	石碁	雅苑青年公馆\n	None
3	仅剩4间\n	None	None	None
4	天河	华景新城	华景新城绿茵居\n	None
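
The KeyError: 0 above is presumably because the columns had already been renamed, so no column is labelled 0 any more and selecting by name works instead. As an alternative to building a DataFrame from a generator, the sketch below uses Series.str.strip and Series.str.split with expand=True. The two-row sample frame is made up from the rows shown above, and the column names are only illustrative:

```python
import pandas as pd

# Hypothetical two-row sample modelled on the output above.
sample = pd.DataFrame({"location_name": ["黄埔-科学城-万科里享家\n", "仅剩4间\n"]})

# Strip surrounding whitespace (including the trailing newline) first, then split on "-".
# expand=True returns one column per piece; shorter rows are padded with missing values.
parts = sample["location_name"].str.strip().str.split("-", expand=True)
parts.columns = ["district", "板块", "name"]

print(parts)
```

Stripping before splitting also removes the trailing "\n" from the last piece, so the name column needs no separate cleanup in this sketch.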

# Merge lendhouse_content_split3 and lendhouse_content_split4
lendhouse_content_split5 = pd.merge(lendhouse_content_split4.iloc[:,:3],lendhouse_content_split3.iloc[:,1:6],
                         right_index=True, left_index=True)
print("Resulting state of the lendhouse_content data:\n",lendhouse_content_split5.head())

# Next, find a way to remove the "\n" and "/" symbols.

Resulting state of the lendhouse_content data:
   district    板块       name   area direction housetype stair_type stairs
0       黄埔   科学城    万科里享家\n  78㎡\n        /南    3室2厅1卫        中楼层  （34层）
1       黄埔   科学城     沙湾新村\n  18㎡\n        /南    4室2厅2卫        低楼层  （16层）
2       番禺    石碁   雅苑青年公馆\n  61㎡\n        /北    1室1厅1卫        中楼层   （5层）
3   仅剩4间\n  None       None                        /\n       None   None
4       天河  华景新城  华景新城绿茵居\n  62㎡\n        /南    2室1厅1卫        低楼层   （9层）
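
For reference, an index-aligned merge like the one above can also be written with DataFrame.join. The sketch below uses two small made-up frames (not the real data) and assumes both share the same default index:

```python
import pandas as pd

# Hypothetical frames standing in for the split name columns and the remaining columns.
left = pd.DataFrame({"district": ["黄埔", "番禺"], "name": ["万科里享家\n", "雅苑青年公馆\n"]})
right = pd.DataFrame({"area": ["78㎡\n", "61㎡\n"], "direction": ["/南", "/北"]})

# Merge on the index of both frames, as in the post.
merged = pd.merge(left, right, left_index=True, right_index=True)

# join aligns on the index by default; how="inner" matches merge's default behaviour.
joined = left.join(right, how="inner")

print(joined)
```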

# lendhouse_content_split5['area'] = lendhouse_content_split5['area'].replace("\n","")
lendhouse_content_split5 = lendhouse_content_split5.replace("\r\n","")
print("First replace attempt:\n",lendhouse_content_split5.head(2))

# lendhouse_content_split5['direction'] = lendhouse_content_split5['direction'].replace("/","")
lendhouse_content_split5 = lendhouse_content_split5.replace("/","")
print("\nSecond replace attempt:\n",lendhouse_content_split5.head(2))

# The replace function still has no effect. Next, see whether the string before or after a specific symbol can be sliced out directly and used as the new content.
# lendhouse_content_split5.to_excel(total_path+"\\lendhouse_content_split5"+".xlsx", encoding='utf-8', index=False, header=True)

First replace attempt:
   district   板块     name   area direction housetype stair_type stairs
0       黄埔  科学城  万科里享家\n  78㎡\n        /南    3室2厅1卫        中楼层  （34层）
1       黄埔  科学城   沙湾新村\n  18㎡\n        /南    4室2厅2卫        低楼层  （16层）

Second replace attempt:
   district   板块     name   area direction housetype stair_type stairs
0       黄埔  科学城  万科里享家\n  78㎡\n        /南    3室2厅1卫        中楼层  （34层）
1       黄埔  科学城   沙湾新村\n  18㎡\n        /南    4室2厅2卫        低楼层  （16层）
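
A likely explanation for why these calls change nothing: with a plain string argument, DataFrame.replace only replaces cells whose entire value equals that string. The cells here hold values such as "78㎡\n" or "/南", not "\n" or "/" alone, so nothing matches. Passing regex=True turns the call into a pattern substitution inside each string. A minimal sketch on a made-up sample frame (the frame and variable names are assumptions, not the real data):

```python
import pandas as pd

# Hypothetical sample modelled on the area and direction columns shown above.
sample = pd.DataFrame({"area": ["78㎡\n", "18㎡\n"], "direction": ["/南", "/南"]})

# Without regex=True, only cells whose whole value equals "\n" are replaced,
# so this frame comes back unchanged.
whole_cell = sample.replace("\n", "")

# With regex=True the pattern is substituted inside each string cell.
# The character class removes CR, LF and "/" in one pass.
cleaned = sample.replace(r"[\r\n/]", "", regex=True)

print(cleaned)
```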

# import re
# # lendhouse_content_split5['direction'] = re.findall(r'/*', lendhouse_content_split5['direction'])
# # # The line above raises error: nothing to repeat at position 0
# # print("First re.findall attempt:\n",lendhouse_content_split5.head(2))

# lendhouse_content_split5['area'] = re.findall(r'*\n', lendhouse_content_split5['area'])
# # The line above raises error: nothing to repeat at position 0
# print("\nSecond re.findall attempt:\n",lendhouse_content_split5.head(2))
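
The "nothing to repeat" error comes from the pattern itself: in a regular expression, "*" repeats the token before it, so a pattern that starts with "*" has nothing to repeat. Note also that re.findall expects a single string rather than a pandas Series, and it only finds matches; re.sub is the function that removes or replaces them. A small sketch on one sample string (the value is illustrative):

```python
import re

# A leading "*" has no preceding token to repeat, so compiling the pattern fails.
try:
    re.compile(r"*\n")
    message = ""
except re.error as exc:
    message = str(exc)
print(message)

# re.sub works on one string at a time; the character class matches CR, LF or "/".
cleaned = re.sub(r"[\r\n/]", "", "/南\n")
print(repr(cleaned))
```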

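After searching through many posts, I found this one:
《python去除字符串中的换行符》 ("Removing newline characters from a string in Python"), https://www.jb51.net/article/125536.htm.

The post says:
If the line ending is CR, use replace("\r","")
If the line ending is LF, use replace("\n","")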

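To find out whether the line ending is CR or LF, see:
《怎么设置notepad++显示空白制表行尾等所有符号》 ("How to make Notepad++ show whitespace, tabs, line endings and all other symbols")
https://jingyan.baidu.com/article/48206aea814786216ad6b39e.html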
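
The line endings can also be inspected from Python itself, without an editor. The helper below is a hypothetical sketch (count_line_endings is not from the original post) that counts CRLF pairs, lone CR and lone LF characters in a string:

```python
def count_line_endings(text):
    """Count CRLF pairs, lone CR and lone LF characters in a string."""
    crlf = text.count("\r\n")
    return {
        "CRLF": crlf,
        "CR": text.count("\r") - crlf,  # CR characters not followed by LF
        "LF": text.count("\n") - crlf,  # LF characters not preceded by CR
    }

# Made-up sample containing one of each kind of line ending.
sample = "first\r\nsecond\nthird\r"
print(repr(sample))
print(count_line_endings(sample))
```

Calling repr() on a single cell value is often enough to see whether it ends in "\r\n", "\r" or "\n".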

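Following the guide, I checked my own line endings (see the image below): both kinds are present.
[Image]
Next I replaced both kinds, but they still were not removed (see below). Well, another failed attempt.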

# lendhouse_content_split5['area'] = lendhouse_content_split5['area'].replace("\n","")
lendhouse_content_split5 = lendhouse_content_split5.replace("\r\n","")
lendhouse_content_split5 = lendhouse_content_split5.replace("\n","")
print("First replace attempt:\n",lendhouse_content_split5.head(2))

# lendhouse_content_split5['direction'] = lendhouse_content_split5['direction'].replace("/","")
lendhouse_content_split5 = lendhouse_content_split5.replace("/","")
print("\nSecond replace attempt:\n",lendhouse_content_split5.head(2))

[Image]
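
The commented-out per-column calls above would not help either, because Series.replace, like DataFrame.replace, compares whole values. One common way to remove characters inside each string is the Series.str.replace accessor. The sketch below uses a made-up sample frame and is not necessarily the fix the author arrived at in the follow-up post:

```python
import pandas as pd

# Hypothetical sample modelled on the area and direction columns shown above.
sample = pd.DataFrame({"area": ["78㎡\n", "18㎡\n"], "direction": ["/南", "/南"]})

# Series.replace compares whole values, so this leaves the column untouched.
not_cleaned = sample["area"].replace("\n", "")

# Series.str.replace works on substrings; regex=False treats the pattern literally.
sample["area"] = (
    sample["area"].str.replace("\r", "", regex=False).str.replace("\n", "", regex=False)
)
sample["direction"] = sample["direction"].str.replace("/", "", regex=False)

print(sample)
```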
After trying several more methods, the problem was eventually solved. For the full solution process, see:
《houseprice_analysis_广州房子租售比分析（中）》
https://blog.csdn.net/weixin_44216391/article/details/107633831
