Splitting lines in Python with sentences and labels


weixin_39675289


Article link: https://blog.csdn.net/weixin_39675289/article/details/112892327



I have a sample of a file with sentences and labels. How can it be split into sentences and labels?

A very, very, very slow-moving, aimless movie about a distressed, drifting young man. 0

Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out. 0

Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. 0

Very little music or anything to speak of. 0

Desired output

List of sentences:

['A very, very, very slow-moving, aimless movie about a distressed, drifting young man', 'Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out']

Corresponding labels:

['0', '0']

Original: https://stackoverflow.com/questions/47466917

2020-08-18 19:08

Accepted answer

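Assuming that the number after the last "." (dot) is the label:

For the given example, stored in a file 'yourdata.txt', the following code should produce two lists, sentence_list and label_list. You can then write the data in these lists to separate files as needed.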

sentence_list = []
label_list = []
# Read 'yourdata.txt' and split each line at its last '.'
with open('yourdata.txt', 'r') as fmov:
    for line in fmov:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # rpartition splits at the last '.', so earlier dots stay in the sentence
        sentence, _, label = line.rpartition('.')
        sentence_list.append(sentence)
        label_list.append(label.strip())  # drop the space before the label
print(sentence_list)
print(label_list)

OUT:

['A very, very, very slow-moving, aimless movie about a distressed, drifting young man', 'Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out', 'Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent', 'Very little music or anything to speak of']

['0', '0', '0', '0']
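The answer relies on the label following the last period. If some lines might end in "!" or "?" instead, one alternative is to split on the last run of whitespace. The sketch below is only an illustration: the helper name split_sentence_label is hypothetical, and it assumes every non-empty line ends with a whitespace-separated label (a line with a single token would raise ValueError). Unlike the answer's code, it keeps the trailing period in the sentence.

```python
def split_sentence_label(line):
    """Split 'sentence text. 0' into ('sentence text.', '0').

    Assumes the label is the last whitespace-separated token on the line.
    """
    sentence, label = line.strip().rsplit(None, 1)
    return sentence, label


# Sample lines taken from the question; a real script would read them from a file
lines = [
    "Very little music or anything to speak of. 0\n",
    "Not sure who was more lost - the flat characters or the audience, "
    "nearly half of whom walked out. 0\n",
]

# Skip blank lines, then separate the pairs into two parallel lists
pairs = [split_sentence_label(l) for l in lines if l.strip()]
sentence_list = [s for s, _ in pairs]
label_list = [lab for _, lab in pairs]
print(sentence_list)
print(label_list)
```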


2017-11-24

Related Q&A

The Natural Language Toolkit (nltk.org) has what you need. This group posting indicates that this does it:

import nltk.data

# Load the pre-trained Punkt sentence tokenizer for English
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
with open("test.txt") as fp:
    data = fp.read()
print('\n-----\n'.join(tokenizer.tokenize(data)))

(I haven't tried it!)

your_list already holds your solution. You do not need to perform any further steps.

import csv

# Read every row of the CSV file into a list of lists
with open('testcsv.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)
print(your_list)

Result: [['1', '11'], ['2', '12'], ['3', '13'], ['4', '14']]

Use the DataFrame.itertuples() method:

import pandas as pd

df = pd.DataFrame(
    [['John Lennon', 10], ['George Harrison', 6]],
    columns=['beatle', 'songs']
)
longform = pd.DataFrame(columns=['word', 'num'])
for idx, name, songs in df.itertuples():
    na...

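The following list comprehension creates a list of tuples in which the first two elements are the indices and the last is the similarity:

edges = [(i, j, dice_coefficient(x, y))
         for i, x in enumerate(sentences)
         for j, y in enumerate(sentences) if i < j]

You can now drop the edges below some threshold and convert the remaining ones into a graph with networkx:

import networkx as nx
G = nx.Graph()
G.a...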

With multiple lists, you can try this:

import itertools
final_list = [list1, list2, list3, ....]
print(list(itertools.product(*final_list)))  # you will get all possible matches
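As a concrete illustration of the call above, here is a minimal sketch with three small sample lists. The list contents are made up for the example and stand in for list1, list2, and list3, which the snippet leaves undefined.

```python
import itertools

# Hypothetical sample lists standing in for list1, list2, list3
list1 = [1, 2]
list2 = ['a', 'b']
list3 = [True]

final_list = [list1, list2, list3]

# product(*final_list) yields one tuple per combination, taking one item from each list
combos = list(itertools.product(*final_list))
print(combos)
```

The number of tuples equals the product of the list lengths, here 2 * 2 * 1 = 4.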

It is not a regex for splitting directly, but a kind of workaround:

(?!Mrs?\.|Jr\.|Dr\.|Sr\.|Prof\.)(\b\S+[.?!]["']?)\s

DEMO

You can replace the matched fragment with, for example, $1# (or any other character that does not occur in the text, instead of #), and then split on #. DEMO. However, it is not a very elegant solution.
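The replace-then-split workaround can be sketched in Python roughly as follows. This is an illustration under assumptions, not the answer's own code: the function name split_sentences is hypothetical, the marker "#" is assumed not to occur in the text, and Python writes the backreference $1 as \1.

```python
import re

# The pattern from the answer: skip common abbreviations, then capture a token
# ending in sentence punctuation (optionally followed by a quote) before whitespace
PATTERN = re.compile(r"""(?!Mrs?\.|Jr\.|Dr\.|Sr\.|Prof\.)(\b\S+[.?!]["']?)\s""")


def split_sentences(text, marker="#"):
    # Assumes `marker` never occurs in `text`
    marked = PATTERN.sub(r"\1" + marker, text)
    return marked.split(marker)


sentences = split_sentences("Dr. Smith went home. He slept.")
print(sentences)
```

In this example "Dr." does not trigger a split because the negative lookahead rejects it at the start of the token.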

This is most likely better handled with nltk (provided it is installed correctly, that is):

from nltk.tokenize import sent_tokenize

string = "This is a sentence. This is another. And here one another, same line, starting with space. this sentence starts with lowercase letter. Here is a site you may know: google....

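If a line contains no period, split returns a single element, the line itself:

>>> "asdasd".split('.')
['asdasd']

So you are counting the number of lines plus the number of periods. Why split the file into lines at all?

with open('words.txt', 'r') as file:
    file_contents = file.read()
print('Total words: ', len(file_contents.split()))
print('total stops: '...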
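The off-by-one described above can be checked with a short sketch. The sample text is made up, and since the truncated snippet does not show how stops are counted, using str.count here is an assumption about one reasonable way to do it.

```python
text = "One. Two. Three\nno period here\n"

# Splitting each line on '.' yields (periods in that line + 1) pieces
pieces_per_line = [len(line.split('.')) for line in text.splitlines()]
print(pieces_per_line)

# Counting the character directly on the whole text avoids the extra one per line
total_stops = text.count('.')
total_words = len(text.split())
print('Total words:', total_words)
print('Total stops:', total_stops)
```

Summing pieces_per_line gives the number of periods plus the number of lines, which is why the per-line approach overcounts.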


([!?.])(?=\s*[A-Z])\s*

You can use this regex to create the sentences before applying your own regex. See the demo. Replace with \1\n.

https://regex101.com/r/sH8aR8/5

x="I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
print re.sub(r"([!?.])(?=\s*[A-Z])",...
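Since the code above is cut off, here is a separate minimal sketch of the described substitution, applied to the same sample string. It assumes the full pattern (including the trailing \s*) and the replacement \1\n given in the text; the final split on newlines is an addition for illustration.

```python
import re

x = ("I love programming with Python-3.3! Do you? It's great... "
     "I give it a 10/10. It's free-to-use, no $$$ involved!")

# Insert a newline after sentence-ending punctuation that is followed
# (after optional whitespace) by a capital letter
marked = re.sub(r"([!?.])(?=\s*[A-Z])\s*", r"\1\n", x)
sentences = marked.split("\n")
print(sentences)
```

The period inside "3.3" and the first two dots of "..." are left alone because no capital letter follows them.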
