How do you create a two-dimensional array in Python?

Assuming the text of each blog is a string and you have a list of such strings in blogs, you can build the matrix like this:

import re

# Sample input for the following code.
blogs = ["This is a blog.", "This is another blog.", "Cats? Cats are awesome."]

# This is a list that will contain one dictionary of word counts per blog.
wordcount = []

# This is a list of all unique words in all blogs, in first-seen order.
wordlist = []

# Consider each blog sequentially.
for blog in blogs:
    # Remove all the non-alphanumeric, non-whitespace characters, convert to
    # lowercase, and then split the string at all whitespace.
    # e.g. "That's not mine." -> "Thats not mine" -> ["thats", "not", "mine"]
    words = re.sub(r"[^\w\s]", "", blog).lower().split()

    # Add a new dictionary to the list. As it is at the end,
    # it can be referred to by wordcount[-1].
    wordcount.append({})

    # Consider each word in the list generated above.
    for word in words:
        # If that word has been encountered before, increment the count.
        if word in wordcount[-1]:
            wordcount[-1][word] += 1
        # Else, create a new entry in the dictionary.
        else:
            wordcount[-1][word] = 1
        # If it is not already in the list of unique words, add it.
        if word not in wordlist:
            wordlist.append(word)

# We now have wordlist, which is a unique list of all words in all blogs,
# and wordcount, which contains len(blogs) dictionaries of word counts.

# matrix is the table of word counts that you need. The number of rows equals
# the number of unique words, and the number of columns equals the number of blogs.
matrix = []

# Consider each word in the unique list of words (one row per word).
for word in wordlist:
    # Add as many columns as there are blogs, all initialized to zero.
    matrix.append([0] * len(wordcount))
    # Consider each blog one by one.
    for i in range(len(wordcount)):
        # Check if the currently selected word appears in that blog.
        if word in wordcount[i]:
            # If yes, add its count to that blog's column.
            matrix[-1][i] += wordcount[i][word]

# For printing the matrix, first generate the column headings.
temp = "\t"
for i in range(len(blogs)):
    temp += "Blog " + str(i + 1) + "\t"
print(temp)

# Then generate each row, with the word at the start and tabs between numbers.
for i in range(len(matrix)):
    temp = wordlist[i] + "\t"
    for j in matrix[i]:
        temp += str(j) + "\t"
    print(temp)

Now matrix[i][j] holds the number of times the word wordlist[i] appears in the blog blogs[j].
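
For comparison, the same word-by-blog table can be sketched more compactly with collections.Counter from the standard library. This is only one possible variant, not the method used above: the helper name build_matrix is hypothetical, and it assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
import re
from collections import Counter


def build_matrix(blogs):
    """Return (wordlist, matrix); matrix[i][j] counts wordlist[i] in blogs[j].

    build_matrix is a hypothetical helper name used for illustration only.
    """
    # One Counter per blog: strip punctuation, lowercase, split on whitespace.
    counts = [
        Counter(re.sub(r"[^\w\s]", "", blog).lower().split())
        for blog in blogs
    ]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    wordlist = list(dict.fromkeys(word for c in counts for word in c))
    # A Counter returns 0 for a missing key, so no membership test is needed.
    # Each row is built as a fresh list, so rows never share storage.
    matrix = [[c[word] for c in counts] for word in wordlist]
    return wordlist, matrix


blogs = ["This is a blog.", "This is another blog.", "Cats? Cats are awesome."]
wordlist, matrix = build_matrix(blogs)
row = wordlist.index("cats")
print(wordlist[row], matrix[row])
```

With the sample input, the row for "cats" is [0, 0, 2]: the word is absent from the first two blogs and appears twice in the third.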
