Since it is summer, I decided to learn a new language, and I chose Python. What I really want to learn is how to manipulate Arabic text with Python. I have found many great resources on Python, but when I apply what I have learned to Arabic strings, I get a mix of digits and letters (escape sequences) instead of the Arabic characters.
Take this English example:
>>> ebook = 'The American English Dictionary'
>>> ebook[2]
'e'
Now, for Arabic:
>>> abook = 'القاموس العربي'
>>> abook[2]
'\xde'  # the correct output should be 'ق'
However, using print works fine, as in:
>>> print abook[2]
ق
What do I need to modify to get Python to always recognize Arabic letters?
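Solution
In Python 2, a plain '...' literal is a byte string, so indexing it returns one byte of the encoded text rather than a character. Use a Unicode string explicitly by adding the u prefix to the literal: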
>>> s = u'القاموس العربي'
>>> s
u'\u0627\u0644\u0642\u0627\u0645\u0648\u0633 \u0627\u0644\u0639\u0631\u0628\u064a'
>>> print s
القاموس العربي
>>> print s[2]
ق
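To see why the plain string in the question produced '\xde', it may help to compare a byte string and a Unicode string side by side. The sketch below assumes the asker's terminal used the Windows-1256 (cp1256) code page; that is a guess based on the '\xde' output, not something the question states. The variable names are illustrative only, and the code is written so it should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Assumption: the terminal encoding in the question was cp1256 (Windows Arabic).

text = u'القاموس العربي'       # Unicode string: a sequence of characters
raw = text.encode('cp1256')    # byte string: one byte per Arabic letter in cp1256

# Indexing the Unicode string yields a whole character (the letter qaf).
third_char = text[2]

# Slicing the byte string yields a raw byte, not a letter.
third_byte = raw[2:3]

# Decoding the bytes with the matching codec recovers the original text.
decoded = raw.decode('cp1256')

print(repr(third_byte))
print(decoded == text)
```

If the terminal had used UTF-8 instead, each Arabic letter would occupy two bytes, and indexing the byte string would return half of a character.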
Or even character by character:
>>> for i, c in enumerate(s):
...     print i, c
...
0 ا
1 ل
2 ق
3 ا
4 م
5 و
6 س
7
8 ا
9 ل
10 ع
11 ر
12 ب
13 ي
I recommend the Python Unicode page, which is short, practical, and useful.
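As a complement to the character-by-character loop, the escaped form that Python shows for `s` (for example `\u0642`) is just the code point of each letter. A minimal sketch using the standard `unicodedata` module can list each character's code point and official Unicode name; the helper name `describe` is hypothetical and not part of any library.

```python
# -*- coding: utf-8 -*-
import unicodedata

s = u'القاموس العربي'


def describe(text):
    """Return a list of (index, code point, Unicode name) for each character."""
    rows = []
    for i, ch in enumerate(text):
        # Supply a default so unnamed characters do not raise ValueError.
        name = unicodedata.name(ch, u'UNKNOWN')
        rows.append((i, u'U+%04X' % ord(ch), name))
    return rows


for row in describe(s):
    print(u'%d %s %s' % row)
```

Because the names and code points are plain ASCII, this output stays readable even on a terminal that cannot display Arabic.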