How to find Chinese or Japanese characters in a string in Python?


彭河森


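Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_31083901/article/details/112888095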


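As a starting point, you can check whether a character falls inside one of the CJK-related Unicode blocks. These are the blocks encoded in the ranges list below.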
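Here is a hedged sketch of the same block table, written out with each block's official Unicode name and compiled into a single regular-expression character class. One regex match then replaces the Python-level any() loop. The names CJK_BLOCKS and CJK_CHAR are illustrative and do not come from the original answer. The table includes Katakana but not Hiragana (U+3040 to U+309F). Add (0x3040, 0x309F) if you also need hiragana.

```python
import re

# (first, last) code points, inclusive: the same blocks as the answer's
# ranges list, sorted and labelled with their Unicode block names.
CJK_BLOCKS = [
    (0x2E80, 0x2EFF),    # CJK Radicals Supplement
    (0x30A0, 0x30FF),    # Katakana
    (0x3300, 0x33FF),    # CJK Compatibility
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0xFE30, 0xFE4F),    # CJK Compatibility Forms
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0x2A700, 0x2B73F),  # CJK Unified Ideographs Extension C
    (0x2B740, 0x2B81F),  # CJK Unified Ideographs Extension D
    (0x2B820, 0x2CEAF),  # CJK Unified Ideographs Extension E (Unicode 8.0)
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
]

# Build one character class of "first-last" pairs from the table.
# None of these code points is a regex metacharacter, so no escaping is needed.
CJK_CHAR = re.compile(
    "[" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in CJK_BLOCKS) + "]"
)


def is_cjk(char):
    """Return True if the single character char lies in one of CJK_BLOCKS."""
    return CJK_CHAR.fullmatch(char) is not None


print(is_cjk("天"), is_cjk("カ"), is_cjk("a"), is_cjk("あ"))
```

The last line prints True True False False. Hiragana あ is rejected because its block is not in the table.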

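After that, all you need to do is iterate over the string, check whether each character is Chinese, Japanese or Korean (CJK), and mark up the matching runs accordingly: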

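```python
# Code point ranges (inclusive) of the CJK-related Unicode blocks to match
ranges = [
    {"from": ord("\u3300"), "to": ord("\u33ff")},          # CJK Compatibility
    {"from": ord("\ufe30"), "to": ord("\ufe4f")},          # CJK Compatibility Forms
    {"from": ord("\uf900"), "to": ord("\ufaff")},          # CJK Compatibility Ideographs
    {"from": ord("\U0002F800"), "to": ord("\U0002fa1f")},  # CJK Compatibility Ideographs Supplement
    {"from": ord("\u30a0"), "to": ord("\u30ff")},          # Japanese Katakana
    {"from": ord("\u2e80"), "to": ord("\u2eff")},          # CJK Radicals Supplement
    {"from": ord("\u4e00"), "to": ord("\u9fff")},          # CJK Unified Ideographs
    {"from": ord("\u3400"), "to": ord("\u4dbf")},          # CJK Unified Ideographs Extension A
    {"from": ord("\U00020000"), "to": ord("\U0002a6df")},  # CJK Unified Ideographs Extension B
    {"from": ord("\U0002a700"), "to": ord("\U0002b73f")},  # CJK Unified Ideographs Extension C
    {"from": ord("\U0002b740"), "to": ord("\U0002b81f")},  # CJK Unified Ideographs Extension D
    {"from": ord("\U0002b820"), "to": ord("\U0002ceaf")},  # CJK Unified Ideographs Extension E, included as of Unicode 8.0
]


def is_cjk(char):
    """Return True if char falls inside any of the ranges above."""
    return any(r["from"] <= ord(char) <= r["to"] for r in ranges)


def cjk_substrings(string):
    """Yield each maximal run of consecutive CJK characters in string."""
    i = 0
    while i < len(string):
        if is_cjk(string[i]):
            start = i
            # Bounds check so a run at the very end of the string stops cleanly
            while i < len(string) and is_cjk(string[i]):
                i += 1
            yield string[start:i]
        i += 1


# A Python 3 str is already Unicode, so the Python 2 .decode("utf-8") is not needed
string = "sdf344asfasf天地方益3権sdfsdf"

# Caveat: if the same run occurs more than once, replace() wraps it once per
# yielded occurrence, e.g. "天a天b" becomes "((天))a((天))b".
for sub in cjk_substrings(string):
    string = string.replace(sub, "(" + sub + ")")

print(string)
```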

The above prints:

sdf344asfasf(天地方益)3(権)sdfsdf
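If the same run can appear more than once, the replace() loop wraps it repeatedly. A single pass over the string avoids that problem and also removes the manual index bookkeeping. The following is a minimal sketch using itertools.groupby. It is an alternative, not the original answer's method. Its is_cjk() is a deliberately simplified stand-in that checks only CJK Unified Ideographs and Extension A. Swap in the full range test above as needed.

```python
from itertools import groupby


def is_cjk(char):
    # Simplified stand-in for the answer's is_cjk(): only CJK Unified
    # Ideographs (U+4E00..U+9FFF) and Extension A (U+3400..U+4DBF).
    cp = ord(char)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def bracket_cjk(text):
    """Wrap each maximal run of CJK characters in parentheses, in one pass."""
    parts = []
    # groupby yields consecutive stretches of text sharing the same is_cjk() result
    for cjk, chars in groupby(text, key=is_cjk):
        run = "".join(chars)
        parts.append("(" + run + ")" if cjk else run)
    return "".join(parts)


print(bracket_cjk("sdf344asfasf天地方益3権sdfsdf"))  # sdf344asfasf(天地方益)3(権)sdfsdf
print(bracket_cjk("天a天b"))                         # (天)a(天)b
```

Each character is examined exactly once. A run that ends the string needs no special bounds check.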

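To be future-proof, you may want to keep an eye on CJK Unified Ideographs Extension E. It will ship with Unicode 8.0, which is scheduled for release in June 2015. I have added it to the ranges, but you should not include it until Unicode 8.0 has been released. (Unicode 8.0 was released in June 2015, so the range can now be kept.)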
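Instead of hard-coding ranges and editing them for each Unicode release, you could look up each character's Unicode name with the standard unicodedata module. This is an alternative and not the original answer's approach. With this method, new ideograph extensions are recognized whenever Python's bundled Unicode database (reported by unicodedata.unidata_version) is updated. The sketch below assumes the listed name prefixes are the categories you care about, and PREFIXES and is_cjk_by_name are illustrative names. It covers ideographs and both kana scripts. It does not cover halfwidth katakana or several blocks from the ranges list, such as the radicals supplement and the U+3300 to U+33FF squared symbols.

```python
import unicodedata

# Assumed set of name prefixes to treat as Chinese/Japanese characters
PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",
    "CJK COMPATIBILITY IDEOGRAPH",
    "HIRAGANA",
    "KATAKANA",
)


def is_cjk_by_name(char):
    # name() raises ValueError for unnamed code points unless a default is given
    return unicodedata.name(char, "").startswith(PREFIXES)


print(unicodedata.unidata_version)  # Unicode version of this Python build
print([c for c in "sdf天地あカ3権" if is_cjk_by_name(c)])  # ['天', '地', 'あ', 'カ', '権']
```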
