Splitting lines in Python with sentences and labels

I have a sample file in which each line holds a sentence followed by a label. How can I split it into a list of sentences and a list of labels?

A very, very, very slow-moving, aimless movie about a distressed, drifting young man. 0

Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out. 0

Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. 0

Very little music or anything to speak of. 0

Desired output:

list of sentences:

['A very, very, very slow-moving, aimless movie about a distressed, drifting young man','Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out']

corresponding labels:

['0','0']

Original: https://stackoverflow.com/questions/47466917

2020-08-18 19:08

Accepted answer

Assuming that the number after the last "." (dot) is the label:

For the given example, stored in a file named 'yourdata.txt', the following code should produce two lists, sentence_list and label_list. You can then write the data in these lists to separate files, as you requested.

sentence_list = []
label_list = []

with open('yourdata.txt', 'r') as fmov:
    for f in fmov:
        if not f.strip():
            continue  # skip blank lines
        # Split on '.'; the piece after the last '.' holds the label
        lineinfo = f.split('.')
        # Re-join everything before the last '.' to rebuild the sentence
        sentenceline = ".".join(lineinfo[0:-1])
        sentence_list.append(sentenceline)
        # strip() removes the trailing newline and the space before the label
        label_list.append(lineinfo[-1].strip())

print(sentence_list)
print(label_list)

OUT:

['A very, very, very slow-moving, aimless movie about a distressed, drifting young man', 'Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out', 'Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent', 'Very little music or anything to speak of']

['0', '0', '0', '0']
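Splitting on "." only works if every sentence ends with a period; a line such as "Great acting! 1" would come out with an empty sentence. A minimal sketch under a different assumption, namely that the label is always the last whitespace-separated token on the line, uses str.rsplit with maxsplit=1. The function name split_sentences_and_labels and the third sample line are hypothetical, not from the question.

```python
def split_sentences_and_labels(lines):
    """Split lines of the form '<sentence> <label>' into two parallel lists.

    Assumes the label is the last whitespace-separated token on each line.
    """
    sentences, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once from the right: everything before the last run of
        # whitespace is the sentence, the final token is the label.
        sentence, label = line.rsplit(maxsplit=1)
        # Drop trailing periods to match the desired output in the question.
        sentences.append(sentence.rstrip('.'))
        labels.append(label)
    return sentences, labels


sample = [
    "A very, very, very slow-moving, aimless movie about a distressed, drifting young man. 0\n",
    "Very little music or anything to speak of. 0\n",
    "Great acting! 1\n",  # hypothetical line, not from the question
]
sentences, labels = split_sentences_and_labels(sample)
print(sentences)
print(labels)  # ['0', '0', '1']
```

With a real file, pass the open file object instead of the sample list, e.g. inside with open('yourdata.txt') as f: call split_sentences_and_labels(f).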

2017-11-24

Related questions and answers

The Natural Language Toolkit (nltk.org) has what you need. This group posting indicates that this does it:

import nltk.data

tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
with open("test.txt") as fp:
    data = fp.read()
print('\n-----\n'.join(tokenizer.tokenize(data)))

(I haven't tried it!)

your_list has your solution. You do not need to perform any further steps:

import csv

with open('testcsv.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print(your_list)

Result: [['1','11'],['2','12'],['3','13'],['4','14']]
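csv.reader yields each row as a list of strings, so list(reader) is already the nested list shown above. A minimal sketch, with io.StringIO standing in for testcsv.csv; the file contents are assumed from the quoted result.

```python
import csv
import io

# Hypothetical file contents, inferred from the result quoted above.
# io.StringIO stands in for open('testcsv.csv', newline='').
data = io.StringIO("1,11\n2,12\n3,13\n4,14\n")

reader = csv.reader(data)   # yields one list of strings per row
your_list = list(reader)
print(your_list)  # [['1', '11'], ['2', '12'], ['3', '13'], ['4', '14']]
```

Note that every value comes back as a string; convert with int() if numbers are needed.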

Use the DataFrame.itertuples() method:

import pandas as pd

df = pd.DataFrame(
    [['John Lennon', 10], ['George Harrison', 6]],
    columns=['beatle', 'songs']
)

longform = pd.DataFrame(columns=['word', 'num'])

for idx, name, songs in df.itertuples():
    na...

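The following list comprehension creates a list of tuples in which the first two elements are indices and the last is the similarity:

edges = [(i, j, dice_coefficient(x, y))
         for i, x in enumerate(sentences)
         for j, y in enumerate(sentences) if i < j]

You can now remove the edges below some threshold and convert the remaining edges into a graph with networkx:

import networkx as nx

G = nx.Graph()

G.a...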
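The snippet above calls dice_coefficient without defining it. A minimal stdlib sketch, assuming a Dice coefficient over sets of character bigrams (one common choice; the answer's own definition is not shown), builds the same edge list and applies a threshold filter. The helpers bigrams and dice_coefficient, the sample sentences, and the 0.5 threshold are all hypothetical; the networkx step is left out.

```python
def bigrams(text):
    """Set of adjacent character pairs in a lower-cased string."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice_coefficient(a, b):
    """Hypothetical helper: Dice similarity 2|A & B| / (|A| + |B|) over bigrams."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0  # treat two empty inputs as identical
    return 2 * len(x & y) / (len(x) + len(y))


sentences = ["the cat sat", "the cat sat down", "dogs bark loudly"]

# One (i, j, similarity) tuple per unordered pair of sentences (i < j).
edges = [(i, j, dice_coefficient(x, y))
         for i, x in enumerate(sentences)
         for j, y in enumerate(sentences) if i < j]

threshold = 0.5  # hypothetical cut-off
kept = [(i, j, s) for i, j, s in edges if s >= threshold]
print(kept)  # only the pair (0, 1) survives
```

The kept tuples are already in (node, node, weight) form, which is the shape a weighted graph builder would expect.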

With multiple lists, you can do it like this:

import itertools

final_list = [list1, list2, list3, ....]

print(list(itertools.product(*final_list)))  # you will get all possible matches
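itertools.product(*final_list) yields the Cartesian product: one tuple per combination, taking one element from each list in order, with the rightmost list varying fastest. In this small sketch the lists are hypothetical placeholders for list1, list2 and list3.

```python
import itertools

# Hypothetical stand-ins for list1, list2, list3.
list1 = ['a', 'b']
list2 = [1, 2]
list3 = ['x']

final_list = [list1, list2, list3]
combos = list(itertools.product(*final_list))
print(combos)  # [('a', 1, 'x'), ('a', 2, 'x'), ('b', 1, 'x'), ('b', 2, 'x')]
```

The number of tuples is the product of the list lengths (here 2 * 2 * 1 = 4), so it grows quickly as lists are added.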

It is not a regex for a direct split, but a kind of workaround:

(?!Mrs?\.|Jr\.|Dr\.|Sr\.|Prof\.)(\b\S+[.?!]["']?)\s

You can replace the matched fragments with, for example, $1# (or any other character that does not occur in the text instead of #) and then split on #. However, it is not a very elegant solution.
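The replace-then-split workaround can be sketched in Python with re.sub. Python's replacement syntax uses \1 rather than $1; the marker "#" is assumed not to occur in the text, and the function name split_sentences is hypothetical.

```python
import re

# Pattern from the answer: a token ending in . ? or ! (optionally followed
# by a quote) and then one whitespace character, unless the token itself
# starts with one of the listed titles (Mr., Mrs., Jr., Dr., Sr., Prof.).
PATTERN = r"""(?!Mrs?\.|Jr\.|Dr\.|Sr\.|Prof\.)(\b\S+[.?!]["']?)\s"""
MARKER = "#"  # assumed not to occur anywhere in the text


def split_sentences(text):
    # Keep the matched token, replace the following whitespace with MARKER...
    marked = re.sub(PATTERN, r"\1" + MARKER, text)
    # ...then split on MARKER to get one sentence per element.
    return marked.split(MARKER)


print(split_sentences("Mr. Smith went home. He slept! Did he?"))
# ['Mr. Smith went home.', 'He slept!', 'Did he?']
```

The title list is the only protection against false splits, so abbreviations it does not cover (for example "e.g.") will still break a sentence.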

This is most likely better handled with nltk (installed correctly, that is):

from nltk.tokenize import sent_tokenize

string = "This is a sentence. This is another. And here one another, same line, starting with space. this sentence starts with lowercase letter. Here is a site you may know: google....

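If a line does not contain a period, split returns a single element, the line itself:

>>> "asdasd".split('.')
['asdasd']

So you are counting the number of lines plus the number of periods. Why split the file into lines at all?

with open('words.txt', 'r') as file:
    file_contents = file.read()

print('Total words: ', len(file_contents.split()))
print('total stops: '...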
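The off-by-one described above can be checked directly: for any string, len(s.split('.')) equals s.count('.') + 1, so summing split lengths over lines adds one extra per line. The sample lines below are hypothetical.

```python
# Hypothetical lines: two periods, no period, one period.
lines = ["One. Two.", "asdasd", "Three."]

# split('.') always returns one more piece than there are periods.
naive = sum(len(line.split('.')) for line in lines)   # 3 + 1 + 2 = 6
actual = sum(line.count('.') for line in lines)       # 2 + 0 + 1 = 3

print(naive, actual)  # 6 3
# The surplus is exactly one per line.
print(naive - actual == len(lines))  # True
```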

([!?.])(?=\s*[A-Z])\s*

You can use this regex to break the text into sentences before applying your own regex. See the demo at https://regex101.com/r/sH8aR8/5 and use \1\n as the replacement.

x="I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

print re.sub(r"([!?.])(?=\s*[A-Z])",...
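A minimal Python 3 sketch of that approach, assuming the full pattern with the trailing \s* (as given at the top of the answer) and the \1\n replacement, followed by splitlines() to get a list of sentences.

```python
import re

x = ("I love programming with Python-3.3! Do you? It's great... "
     "I give it a 10/10. It's free-to-use, no $$$ involved!")

# Replace each ., ? or ! that is followed by optional whitespace and a
# capital letter with itself plus a newline; the trailing \s* swallows
# the space so the next sentence does not start with one.
one_per_line = re.sub(r"([!?.])(?=\s*[A-Z])\s*", r"\1\n", x)
sentences = one_per_line.splitlines()

for s in sentences:
    print(s)
```

The "3.3" and the first two dots of "great..." are left alone because the lookahead requires a capital letter next; the catch is that a sentence starting with a lowercase letter will not be split off.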
