Python - can I detect Unicode string language code?

Latest recommended article published on 2023-03-03 08:49:33

weixin_39824191

I'm faced with a situation where I'm reading a string of text and I need to detect the language code (en, de, fr, es, etc).

Is there a simple way to do this in Python?

Solution

If you need to detect the language in response to a user action, you could use the Google AJAX Language API (the code here is Python 2):

#!/usr/bin/env python
import json
import urllib, urllib2

def detect_language(text,
                    userip=None,
                    referrer="http://stackoverflow.com/q/4545977/4279",
                    api_key=None):
    query = {'q': text.encode('utf-8') if isinstance(text, unicode) else text}
    if userip: query.update(userip=userip)
    if api_key: query.update(key=api_key)
    url = 'https://ajax.googleapis.com/ajax/services/language/detect?v=1.0&%s'%(
        urllib.urlencode(query))

    request = urllib2.Request(url, None, headers=dict(Referer=referrer))
    d = json.load(urllib2.urlopen(request))

    if d['responseStatus'] != 200 or u'error' in d['responseData']:
        raise IOError(d)
    return d['responseData']['language']

print detect_language("Python - can I detect unicode string language code?")

Output

en
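For readers on Python 3, where `urllib2` and the `unicode` type no longer exist, the request-building step above might be sketched as follows. This is only an illustration under assumptions: `build_detect_request` is a hypothetical helper name that is not part of the original answer, and the sketch only constructs the URL and headers without contacting the service, so it says nothing about whether the endpoint still responds.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint copied from the Python 2 code above.
BASE_URL = "https://ajax.googleapis.com/ajax/services/language/detect"


def build_detect_request(text, userip=None,
                         referrer="http://stackoverflow.com/q/4545977/4279",
                         api_key=None):
    """Build (but do not send) a request equivalent to detect_language().

    Hypothetical helper: the name and the split into a separate
    build step are assumptions made for illustration.
    """
    # In Python 3, urlencode() percent-encodes str values as UTF-8,
    # so no explicit .encode('utf-8') is needed.
    query = {"v": "1.0", "q": text}
    if userip:
        query["userip"] = userip
    if api_key:
        query["key"] = api_key
    url = BASE_URL + "?" + urlencode(query)
    return Request(url, headers={"Referer": referrer})


req = build_detect_request("матрёшка")
print(req.full_url)
```

Sending the request would then be a matter of passing `req` to `urllib.request.urlopen` and decoding the JSON body, as the original function does with `urllib2.urlopen`.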

With the Google Translate API v2, the default limit is 100000 characters/day (no more than 5000 per request):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
import urllib, urllib2

from operator import itemgetter

def detect_language_v2(chunks, api_key):
    """
    chunks: either string or sequence of strings

    Return list of corresponding language codes
    """
    if isinstance(chunks, basestring):
        chunks = [chunks]

    url = 'https://www.googleapis.com/language/translate/v2'

    data = urllib.urlencode(dict(
        q=[t.encode('utf-8') if isinstance(t, unicode) else t
           for t in chunks],
        key=api_key,
        target="en"), doseq=1)

    # the request length MUST be < 5000
    if len(data) > 5000:
        raise ValueError("request is too long, see "
            "http://code.google.com/apis/language/translate/terms.html")

    #NOTE: use POST to allow more than 2K characters
    request = urllib2.Request(url, data,
        headers={'X-HTTP-Method-Override': 'GET'})
    d = json.load(urllib2.urlopen(request))
    if u'error' in d:
        raise IOError(d)
    return map(itemgetter('detectedSourceLanguage'), d['data']['translations'])
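The function above raises `ValueError` when the encoded request exceeds 5000 characters, so longer inputs have to be split by the caller. One way to do that, sketched here in Python 3 under assumptions, is to group the strings into batches whose URL-encoded body stays within the limit. The names `encoded_len` and `batch_chunks` are hypothetical helpers, not part of the original answer, and the 5000 figure is simply taken from the check above.

```python
from urllib.parse import urlencode

MAX_REQUEST_LEN = 5000  # limit taken from the length check in the function above


def encoded_len(texts, api_key):
    """Length of the URL-encoded body for one request (hypothetical helper)."""
    return len(urlencode({"q": texts, "key": api_key}, doseq=True))


def batch_chunks(chunks, api_key, max_len=MAX_REQUEST_LEN):
    """Yield lists of strings whose encoded request fits within max_len."""
    batch = []
    for text in chunks:
        # A string that is too long on its own can never be sent whole.
        if encoded_len([text], api_key) > max_len:
            raise ValueError("a single chunk exceeds the request limit")
        # Start a new batch when adding this string would overflow.
        if batch and encoded_len(batch + [text], api_key) > max_len:
            yield batch
            batch = []
        batch.append(text)
    if batch:
        yield batch


for group in batch_chunks(["a" * 3000, "b" * 3000, "c"], api_key="KEY"):
    print(len(group))
```

The length is measured after URL encoding because non-ASCII text grows when percent-encoded: a character such as "ё" occupies two UTF-8 bytes and becomes six characters (`%D1%91`) in the request body.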

Alternatively, you can request language detection explicitly through the /detect endpoint:

def detect_language_v2(chunks, api_key):
    """
    chunks: either string or sequence of strings

    Return list of corresponding language codes
    """
    if isinstance(chunks, basestring):
        chunks = [chunks]

    url = 'https://www.googleapis.com/language/translate/v2/detect'

    data = urllib.urlencode(dict(
        q=[t.encode('utf-8') if isinstance(t, unicode) else t
           for t in chunks],
        key=api_key), doseq=True)

    # the request length MUST be < 5000
    if len(data) > 5000:
        raise ValueError("request is too long, see "
            "http://code.google.com/apis/language/translate/terms.html")

    #NOTE: use POST to allow more than 2K characters
    request = urllib2.Request(url, data,
        headers={'X-HTTP-Method-Override': 'GET'})
    d = json.load(urllib2.urlopen(request))

    # for each input string, keep the candidate with the highest confidence
    return [sorted(L, key=itemgetter('confidence'))[-1]['language']
            for L in d['data']['detections']]

Example:

print detect_language_v2(
    ["Python - can I detect unicode string language code?",
     u"матрёшка",
     u"打水"], api_key=open('api_key.txt').read().strip())

Output

[u'en', u'ru', u'zh-CN']
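The last step of the /detect variant, picking the most confident candidate for each input string, can be tried offline. The following Python 3 sketch assumes a response shaped the way that code indexes into it (`d['data']['detections']` as a list of candidate lists). The `sample` dictionary is invented for illustration and is not real API output, and `best_languages` is a hypothetical helper name.

```python
from operator import itemgetter


def best_languages(response):
    """Return the highest-confidence language code for each detection list."""
    # Mirror the error check used by the translate-based variant.
    if "error" in response:
        raise IOError(response)
    return [max(candidates, key=itemgetter("confidence"))["language"]
            for candidates in response["data"]["detections"]]


# Invented sample data, shaped like the structure the code above expects.
sample = {"data": {"detections": [
    [{"language": "en", "confidence": 0.9},
     {"language": "de", "confidence": 0.1}],
    [{"language": "ru", "confidence": 0.7}],
]}}

print(best_languages(sample))  # ['en', 'ru']
```

Using `max` avoids sorting each candidate list. One small difference from `sorted(...)[-1]`: when two candidates tie on confidence, `max` returns the first of them while the sorted version returns the last.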
