Differences in String Encoding Between Python 2 and Python 3

悼良会之永绝兮

Article link: https://blog.csdn.net/Slience_Gu/article/details/80975381

When the input is a string like 'this is a test':

In Python2:

# basic
s = 'this is a test'
t = u'this is a test'
print type(s)    # <type 'str'>
print type(t)    # <type 'unicode'>

# transform
print type(s.decode('utf-8'))    # <type 'unicode'>
print type(t.encode('utf-8'))    # <type 'str'>

In Python3:

# basic
s = 'this is a test'
t = b'this is a test'
print(type(s))    # <class 'str'>
print(type(t))    # <class 'bytes'>

# transform
print(type(s.encode('utf-8')))    # <class 'bytes'>
print(type(t.decode('utf-8')))    # <class 'str'>

Differences:

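The conversion methods on str point in opposite directions. In Python 2, str is a byte string and unicode is text, so str.decode() gives unicode and unicode.encode() gives str. In Python 3, str is text and bytes is a byte string, so str.encode() gives bytes and bytes.decode() gives str.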
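
The Python 3 direction of conversion can be checked with a short round trip. This is only a minimal sketch; the variable names text, data and restored are illustrative and do not come from the examples above.

```python
# Python 3: str holds text, bytes holds raw data.
text = 'this is a test'
data = text.encode('utf-8')        # str -> bytes
restored = data.decode('utf-8')    # bytes -> str

print(type(data))          # <class 'bytes'>
print(type(restored))      # <class 'str'>
print(restored == text)    # True

# In Python 3, str has no decode() method and bytes has no encode() method.
print(hasattr(text, 'decode'))    # False
print(hasattr(data, 'encode'))    # False
```

The two hasattr checks show why Python 2 code that calls s.decode('utf-8') on a plain string literal fails with AttributeError once it is run under Python 3.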

When the input contains Unicode escapes like '\u0074\u0068\u0069\u0073':

In Python2:

# '\' is the escape character in Python. To display '\r\n' literally, write '\\r\\n'.
s = '\u0074\u0068\u0069\u0073\u0020\u0069\u0073\u0020\u0061\u0020\u0074\u0065\u0073\u0074'
t = u'\u0074\u0068\u0069\u0073\u0020\u0069\u0073\u0020\u0061\u0020\u0074\u0065\u0073\u0074'

newline_str = '\r\n'
newline_uni = u'\r\n'

print type(s)    # <type 'str'>
print type(t)    # <type 'unicode'>
print type(newline_str)    # <type 'str'>
print type(newline_uni)    # <type 'unicode'>

print newline_str    # prints a line break
print newline_uni    # prints a line break
print s    # prints the escape sequences literally: a Python 2 str does not interpret \u
print t    # this is a test
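
The listing above covers Python 2 only. As a hedged sketch of the Python 3 counterpart: a str literal interprets \u escapes, while a bytes object can only hold them as literal backslash sequences. The names s, b and decoded are illustrative, and the use of the 'unicode_escape' codec is one possible way to interpret such bytes, not something the listing above relies on.

```python
# Python 3: \u escapes are interpreted in str literals.
s = '\u0074\u0068\u0069\u0073'          # the four characters 'this'

# A bytes literal has no \u escape, so the backslashes are stored literally.
# They are doubled here to avoid an invalid-escape warning.
b = b'\\u0074\\u0068\\u0069\\u0073'

print(s)         # this
print(len(s))    # 4
print(len(b))    # 24

# The 'unicode_escape' codec interprets the escapes stored in the bytes.
decoded = b.decode('unicode_escape')
print(decoded)         # this
print(decoded == s)    # True
```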

An Example: Counting Word Frequencies

This example handles special characters in the file, such as '\r\n' and non-ASCII bytes like '\x80'.

The text of 'The Call of The Wild' comes from http://novel.tingroom.com/jingdian/198/

# Example (Python 2)
with open('TheCallofTheWild.txt') as f:
    text = f.read()    # a byte string (str) in Python 2

# '\xe3\x80\x80' is the UTF-8 encoding of U+3000, the ideographic (full-width) space.
puncs = [',', '.', ';', "'s", '-', ':', '"', '\r\n', '\xe3\x80\x80\xe3\x80\x80']
for punc in puncs:
    text = text.replace(punc, ' ')
print 'Punctuation replacement completed.'

# Count how often each word occurs and save the counts to a dictionary.
words = text.lower().split()    # split() with no argument also drops empty strings
wordsfrequency = {}
for word in words:
    wordsfrequency[word] = wordsfrequency.get(word, 0) + 1

# Sort the distinct words by frequency, most frequent first.
wordsindex = sorted(wordsfrequency, key=wordsfrequency.get, reverse=True)
print 'The number of distinct words in The Call of The Wild is: {}'.format(len(wordsindex))

# Verify that the sort is correct.
for word in wordsindex[1000:1030]:
    print '{}: {}'.format(word, wordsfrequency[word])

# Done.
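
For comparison, the same count might look as follows in Python 3, where the text is a str and the full-width space is written as '\u3000' instead of its UTF-8 bytes. This is a sketch under stated assumptions: the function name word_frequencies and the sample sentence are hypothetical, and collections.Counter is swapped in for the hand-written dictionary.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    # U+3000 is the ideographic space; its UTF-8 encoding is the bytes e3 80 80.
    text = text.replace('\u3000', ' ')
    # Treat the same punctuation as the Python 2 list as separators.
    text = text.replace("'s", ' ')
    text = re.sub(r'[,.;:"\-]', ' ', text)
    # split() with no argument splits on any whitespace, including '\r\n'.
    return Counter(text.lower().split())


# Hypothetical sample text standing in for the contents of the file.
sample = 'Buck did not read the newspapers, or he would have known.\r\n\u3000\u3000Buck lived.'
freq = word_frequencies(sample)
for word, count in freq.most_common(3):
    print('{}: {}'.format(word, count))
```

To run it on the real file, the text would be read with open('TheCallofTheWild.txt', encoding='utf-8'), assuming the file is stored as UTF-8, so that read() returns a str rather than bytes.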
