The Differences in String Encoding Between Python2 and Python3


When the input is a string literal like 'this is a test':

In Python2:
# basic
s = 'this is a test'
t = u'this is a test'
print type(s)    # <type 'str'>
print type(t)    # <type 'unicode'>

# transform
print type(s.decode('utf-8'))    # <type 'unicode'>
print type(t.encode('utf-8'))    # <type 'str'>

In Python3:
# basic
s = 'this is a test'
t = b'this is a test'
print(type(s))    # <class 'str'>
print(type(t))    # <class 'bytes'>

# transform
print(type(s.encode('utf-8')))    # <class 'bytes'>
print(type(t.decode('utf-8')))    # <class 'str'>
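
Differences:

    On a plain literal such as 'this is a test' you call the opposite method in each version. In Python2 a plain literal is a byte string (str), so you decode it to get text (unicode); in Python3 a plain literal is already text (str, i.e. Unicode), so you encode it to get bytes. The rule itself is the same in both versions: encode turns text into bytes, and decode turns bytes into text.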
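A minimal Python3 sketch of that rule. The sample word 'caf\u00e9' is an arbitrary choice, not from the original; it contains one non-ASCII character so that the character count and the byte count differ.

```python
text = 'caf\u00e9'                  # str 'café': a sequence of Unicode code points
data = text.encode('utf-8')         # encode: str -> bytes
print(type(data), data)             # <class 'bytes'> b'caf\xc3\xa9'
print(len(text), len(data))         # 4 5 ('é' takes two bytes in UTF-8)

back = data.decode('utf-8')         # decode: bytes -> str
print(back == text)                 # True

# In Python3 each type only has the method that goes the "right" way.
print(hasattr(data, 'encode'), hasattr(text, 'decode'))   # False False
```

Because bytes has no encode() and str has no decode() in Python3, calling the wrong method fails immediately with an AttributeError instead of silently using an implicit default codec, as Python2 does.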


When the input uses Unicode escape sequences like '\u0074\u0068\u0069\u0073':

In Python2:

"""
'\' is special character in python. If you want to display '\t\n' itself, please use '\\r\\n'
"""
s = '\u0074\u0068\u0069\u0073\u0020\u0069\u0073\u0020\u0061\u0020\u0074\u0065\u0073\u0074'
t = u'\u0074\u0068\u0069\u0073\u0020\u0069\u0073\u0020\u0061\u0020\u0074\u0065\u0073\u0074'

newline_str = '\r\n'
newline_uni = u'\r\n'

print type(s)    # <type 'str'>
print type(t)    # <type 'unicode'>
print type(newline_str)    # <type 'str'>
print type(newline_uni)    # <type 'unicode'>

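print newline_str    # prints a line break (carriage return + line feed)
print newline_uni    # prints a line break (carriage return + line feed)
print s    # \u0074\u0068\u0069\u0073... printed as-is: a str literal does not process \u escapes
print t    # this is a test
print s.decode('unicode_escape')    # this is a test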
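For comparison, a Python3 sketch of the same case (not part of the original Python2 listing). In Python3 a str literal always processes \u escapes, a bytes literal never does, and the standard 'unicode_escape' codec turns escape text held in bytes back into a str. The shortened sample 'this' is used only for brevity.

```python
s = '\u0074\u0068\u0069\u0073'        # str literal: \u escapes are processed
b = rb'\u0074\u0068\u0069\u0073'      # raw bytes literal: backslashes kept as-is
print(s)                              # this
print(len(s), len(b))                 # 4 24 (each '\u0074' is 6 bytes)

# Turn the escape text stored in bytes into real characters.
print(b.decode('unicode_escape'))     # this

# Newlines: a normal literal holds 2 control characters,
# a raw literal holds the 4 visible characters \ r \ n.
print(len('\r\n'), len(r'\r\n'))      # 2 4
print(r'\r\n' == '\\r\\n')            # True
```

The rb'' prefix is used so the backslashes survive without the invalid-escape warning that newer Python3 versions emit for b'\u...'.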

An Example: Counting Word Frequencies

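The file contains special characters that have to be replaced before it is split into words, such as the line break '\r\n' and the byte sequence '\xe3\x80\x80' (the UTF-8 encoding of the full-width space U+3000).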
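A small Python3 check of where the byte sequence '\xe3\x80\x80' comes from, using the standard unicodedata module. The sample sentence is made up for illustration.

```python
import unicodedata

fullwidth_space = '\u3000'
print(unicodedata.name(fullwidth_space))        # IDEOGRAPHIC SPACE
print(fullwidth_space.encode('utf-8'))          # b'\xe3\x80\x80'
print(fullwidth_space.isspace())                # True

# Python3 str.split() with no argument treats it as whitespace, like '\r\n'.
print('Buck\u3000\u3000was\r\nthere'.split())   # ['Buck', 'was', 'there']
```

In Python2 the file is read as raw bytes, so these three bytes have to be replaced explicitly; once the text is decoded in Python3 the character counts as whitespace.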

The text of the novel The Call of the Wild comes from http://novel.tingroom.com/jingdian/198/.

# Example (Python2): count word frequencies in the novel.
with open('TheCallofTheWild.txt') as f:
    text = f.read()    # Python2 reads raw bytes into a str

# Replace punctuation, line breaks and the UTF-8 full-width space
# '\xe3\x80\x80' (U+3000) with ordinary spaces.
puncs = [',', '.', ';', "'s", '-', ':', '"', '\r\n', '\xe3\x80\x80']
for punc in puncs:
    text = text.replace(punc, ' ')
print 'Punctuation replacement completed.'

# split() with no argument splits on runs of whitespace and drops empty strings.
words = text.lower().split()

# Count every word in a single pass and save the counts to a dictionary.
wordsfrequency = {}
for word in words:
    wordsfrequency[word] = wordsfrequency.get(word, 0) + 1

# Sort the distinct words by frequency, most frequent first.
wordsindex = sorted(wordsfrequency, key=lambda w: wordsfrequency[w], reverse=True)
print 'The number of distinct words in The Call of the Wild is: {}'.format(len(wordsindex))

# Spot-check the ordering: words ranked 1001 to 1030.
for word in wordsindex[1000:1030]:
    print '{}: {}'.format(word, wordsfrequency[word])

# Done.
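A possible Python3 port of the word-frequency example, sketched under two assumptions: the file is UTF-8 encoded, and collections.Counter is an acceptable replacement for the hand-built dictionary. Decoding on read turns '\xe3\x80\x80' into '\u3000', which str.split() already treats as whitespace, so only the punctuation entries remain. The short sample string stands in for the novel so the snippet runs without the file.

```python
from collections import Counter

# Punctuation to replace with spaces (same entries as the Python2 example).
# '\r\n' and the full-width space '\u3000' are left out on purpose: in Python3,
# str.split() with no argument already treats them as whitespace.
PUNCS = [',', '.', ';', "'s", '-', ':', '"']


def word_frequencies(text):
    """Return a Counter that maps each lower-cased word to its count."""
    for punc in PUNCS:
        text = text.replace(punc, ' ')
    return Counter(text.lower().split())


def load_novel(path='TheCallofTheWild.txt'):
    """Read the novel as text; assumes the file is saved as UTF-8."""
    with open(path, encoding='utf-8') as f:
        return f.read()


# A short stand-in for the novel so the example runs without the file.
sample = "Buck did not read the newspapers,\r\n\u3000\u3000or he would have known. Buck's trouble."
freq = word_frequencies(sample)
print('Distinct words:', len(freq))
for word, count in freq.most_common(3):
    print('{}: {}'.format(word, count))
```

With the real file, word_frequencies(load_novel()).most_common() should give the same ranking as wordsindex (words with equal counts may come out in a different order), and most_common()[1000:1030] corresponds to the spot-check slice.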