Python CSV split by content: using Python to split text blocks out of a CSV file and save them

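This post discusses how to use Python to read each row of a CSV file and split it into several text files based on specific content (such as 'ABC'). In the code example, the author tries to split the data by removing empty lines and searching for a specific marker, but when the code is applied to the CSV file it fails with an error saying that 'ABC' is not in the list.
(Summary generated automatically by CSDN.)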

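I want to split each row of a CSV file into several text blocks and save each one as a separate text file (the file has only one column, and each row holds one text block). My items_split function works perfectly on a text block defined in the script, but when I apply it to the CSV file I get this error: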

```
File "untitled.py", line 25, in items_split
    idx = text_lines.index("ABC") + 1
ValueError: 'ABC' is not in list
```
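`list.index()` only succeeds on an exact, whole-element match, so a line such as `"ABC\r"` (left over from Windows line endings after splitting on `"\n"`) or `" ABC"` is not found and `ValueError` is raised. A minimal sketch of a more tolerant lookup, assuming stray whitespace or carriage returns are the cause; the helper name `find_marker` is hypothetical:

```python
def find_marker(lines, marker="ABC"):
    """Return the index of the first line equal to `marker` once surrounding
    whitespace (including a trailing carriage return) is stripped, else None."""
    for i, line in enumerate(lines):
        if line.strip() == marker:
            return i
    return None


lines = "MEDIA News report\r\nABC\r\nTopic 1".split("\n")
print(lines)               # ['MEDIA News report\r', 'ABC\r', 'Topic 1']
print("ABC" in lines)      # False, so lines.index("ABC") would raise ValueError
print(find_marker(lines))  # 1
```

Returning `None` instead of raising leaves it to the caller to decide what to do with a block that has no marker line.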

The code I'm using is as follows:

```python
import csv
import re
import uuid


def items_split(file):
    data = file

    ## First, we want to remove all empty lines in the text files
    # Note: re.sub's 4th positional argument is `count`, not `flags`, so passing
    # re.MULTILINE (== 8) limits each call to 8 replacements.
    data = re.sub(r'\n\s*\n', '\n', data, re.MULTILINE)
    data = re.sub(r'\n\s*\n', '\n', data, re.MULTILINE)
    data = re.sub(r'\n\s*\n', '\n', data, re.MULTILINE)
    data = re.sub(r'\n\s*\n', '\n', data, re.MULTILINE)
    data = re.sub(r'\n\s*\n', '\n', data, re.MULTILINE)
    data = re.sub(r'\n\s*\n', '\n', data, re.MULTILINE)
    data = re.sub(r'\n\s*\n', '\n', data, re.MULTILINE)
    data = re.sub(r'\n\s*\n', '\n', data, re.MULTILINE)

    ## Then, we remove all lines up to ABC
    text_lines = data.split("\n")
    idx = text_lines.index("ABC") + 1  # ValueError unless some line is exactly "ABC"
    data = "\n".join(text_lines[idx:])

    ## Last, we split the text files into multiple files, each with a news item
    current_file = None

    for line in data.split('\n'):
        # Set initial filename
        if current_file is None and line != '':
            current_file = str(uuid.uuid4()) + '.txt'  # this will assign a random file name
            # current_file = line + '.txt'

        # This is to handle the blank line after Brief
        if current_file is None:
            continue

        with open(current_file, "a") as text_file:
            text_file.write(line + "\n")

        # Reset filename if we have finished this section,
        # which is identified by:
        #   starts with Demographics - ^Demographics
        #   contains some random amount of text - .*
        #   ends with ) - \)$
        if re.match(r'^Demographics:.*\)$', line) is not None:
            current_file = None


# Python 2 style: in Python 3, csv.reader needs a text-mode file, not 'rb'
with open('Book1.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        items_split(row)  # row is a list of fields (one per column), not a string
```
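One detail in the code above: because the fourth positional parameter of `re.sub()` is `count`, `re.sub(pattern, repl, data, re.MULTILINE)` caps each call at 8 replacements (`re.MULTILINE` equals 8), which is presumably why the line had to be repeated. Leaving `count` at its default of 0 replaces every match in a single call. A minimal sketch with a hypothetical helper name, which also normalizes Windows and old Mac line endings first (an assumption about where the data may come from):

```python
import re


def remove_blank_lines(data):
    """Collapse runs of empty or whitespace-only lines into a single newline."""
    # Normalize line endings so no carriage return survives into the lines.
    data = data.replace("\r\n", "\n").replace("\r", "\n")
    # count is left at its default (0), meaning "replace all matches".
    return re.sub(r"\n\s*\n", "\n", data)


print(repr(remove_blank_lines("ABC\r\n\r\n   \r\nTopic 1\n\nTopic 2")))
# 'ABC\nTopic 1\nTopic 2'
```

If flags are actually needed, they should be passed by keyword, for example `re.sub(pattern, repl, data, flags=re.MULTILINE)`; this particular pattern uses no `^` or `$`, so `MULTILINE` has no effect on it.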

For example, each row in the CSV file looks like this:

```
"MEDIA News report
ABC
Topic 1 dzfffa a agasgeaherhryyeshdh
Demographics: 12,000 (male 16+) • 7,000 (female 16+)
Topic 2
fszg seez trbwtewtmytmutryrmujfcj
Demographics: 10,000 (male 16+) • 5,000 (female 16+)
Are you happy with this content? "
```
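Because the whole block is wrapped in double quotes, `csv.reader` treats it as a single field even though it spans several physical lines and contains commas (as in `12,000`); each row comes back as a list holding one string. A small sketch using an in-memory file (`io.StringIO`) and a shortened copy of the sample row:

```python
import csv
import io

# A shortened version of the sample row: one quoted field spanning four lines.
sample = (
    '"MEDIA News report\n'
    'ABC\n'
    'Topic 1 dzfffa a agasgeaherhryyeshdh\n'
    'Demographics: 12,000 (male 16+) • 7,000 (female 16+)"\n'
)

rows = list(csv.reader(io.StringIO(sample)))
print(len(rows))                  # 1: one CSV row
print(type(rows[0]))              # <class 'list'>
print(len(rows[0]))               # 1: one field, despite the comma in "12,000"
print(rows[0][0].split("\n")[1])  # ABC
```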

I want to split it into:

```
ABC
Topic 1 dzfffa a agasgeaherhryyeshdh
Demographics: 12,000 (male 16+) • 7,000 (female 16+)
Topic 2
fszg seez trbwtewtmytmutryrmujfcj
Demographics: 10,000 (male 16+) • 5,000 (female 16+)
Are you happy with this content? "
```
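The splitting rule in `items_split` (skip past the `ABC` line, close an item at each line matching `^Demographics:.*\)$`) can also be written as a pure function that returns the items as strings instead of writing files, which makes it easy to check against a single CSV cell. A sketch under these assumptions: the name `split_items` is hypothetical, lines are stripped of surrounding whitespace, a missing marker means the whole block is used, and, unlike the original loop, any text after the last Demographics line (such as "Are you happy with this content?") is discarded:

```python
import re

# Same end-of-item rule as items_split(): a line that starts with
# "Demographics:" and ends with ")".
END_OF_ITEM = re.compile(r"^Demographics:.*\)$")


def split_items(block, marker="ABC"):
    """Split one text block into news items, returned as strings."""
    # splitlines() handles \n, \r\n and \r; strip() and the filter
    # drop surrounding whitespace and empty lines.
    lines = [ln.strip() for ln in block.splitlines()]
    lines = [ln for ln in lines if ln]

    # Skip everything up to and including the marker line, if present.
    if marker in lines:
        lines = lines[lines.index(marker) + 1:]

    items, current = [], []
    for line in lines:
        current.append(line)
        if END_OF_ITEM.match(line):
            items.append("\n".join(current))
            current = []
    # Lines after the last Demographics line are left in `current` and dropped.
    return items


sample = """MEDIA News report
ABC
Topic 1 dzfffa a agasgeaherhryyeshdh
Demographics: 12,000 (male 16+) • 7,000 (female 16+)
Topic 2
fszg seez trbwtewtmytmutryrmujfcj
Demographics: 10,000 (male 16+) • 5,000 (female 16+)
Are you happy with this content?"""

for item in split_items(sample):
    print(item)
    print("---")
```

Writing each returned item to its own file (for example under a `uuid.uuid4()` name, as the original does) is then a separate, simple step.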

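and save each one as a separate text file. I have already run this function on the text itself and it works perfectly. The problem is that when I run it on the CSV file, it doesn't recognize each row as one text block, and my attempts to convert it to a string and the like have been in vain.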
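The underlying mismatch is that `csv.reader` yields each row as a *list* of fields, while `items_split` expects one string, so the text block has to be taken out of the list (here `row[0]`, since the file has a single column). In Python 3 the file should also be opened in text mode with `newline=''` rather than `'rb'`, as the `csv` module documentation recommends. A minimal sketch assuming a one-column, UTF-8 encoded `Book1.csv`; the helper name `read_blocks` is hypothetical, and a file saved by Excel may need `encoding="utf-8-sig"` instead:

```python
import csv


def read_blocks(path, encoding="utf-8"):
    """Yield the text block stored in the first column of each CSV row."""
    # Text mode with newline='' lets the csv module handle the line breaks
    # inside quoted fields itself.
    with open(path, newline="", encoding=encoding) as csvfile:
        for row in csv.reader(csvfile):
            if row:              # skip completely empty rows
                yield row[0]     # a str, not a list


# Usage, assuming items_split() from the question is defined:
# for block in read_blocks("Book1.csv"):
#     items_split(block)
```

If the cells contain Windows line endings, each line produced by `data.split("\n")` inside `items_split` still ends in `"\r"`, so `text_lines.index("ABC")` can keep failing; stripping each line before the lookup may also be needed.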
