I know this is simple, but I'm a new user to Python so I'm having a bit of trouble here. I'm using Python 3 by the way.
I have multiple files that look something like this:
Name Date Age Sex Color
Ray May 25.1 M Gray
Alex Apr 22.3 F Green
Ann Jun 15.7 F Blue
(Pretend this is tab delimited. I should add that the real file will have about 3,000 rows and 17-18 columns)
What I want to do is select all the rows which have a value in the age column which is less than 23.
In this example, the output would be:
Name Date Age Sex Color
Alex Apr 22.3 F Green
Ann Jun 15.7 F Blue
Here's what I tried to do:
f = open("addressbook1.txt", 'r')
line = f.readlines()
file_data = [line.split("\t")]
f.close()

for name, date, age, sex, color in file_data:
    if age in line_data < 23:
        g = open("college_age.txt", 'a')
        g.write(line)
    else:
        h = open("adult_age.txt", 'a')
        h.write(line)
Now, ideally, I have 20-30 of these "addressbook" input files, and I wanted this script to loop through them all and add all the entries with an age under 23 to the same output file ("college_age.txt"). I don't really need to keep the other lines, but I didn't know what else to do with them.
This script, when I run it, generates an error.
AttributeError: 'list' object has no attribute 'split'
Then I change the third line to:
file_data=[line.split("\t") for line in f.readlines()]
And it no longer gives me an error, but it simply does nothing at all. It just starts and then stops.
Any help? :) Remember I'm dumb with Python.
I should have added that my actual data has decimals and are not integers. I have edited the data above to reflect that.
Solution
The issue here is that you are calling readlines() twice: the first call reads all the data from the file, so nothing is left for the second call to return.
You can iterate directly over the file without using readlines() - in fact, this is the better way, as it doesn't read the whole file in at once.
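To illustrate, here is a minimal sketch of your original approach fixed up: iterating the file directly (no readlines()), splitting each line on tabs, and comparing Age as a float since your ages are decimals. A small sample file is written first just so the snippet is self-contained; with your real files you would skip that step.

```python
# Minimal fix of the original approach: iterate the file directly
# (no readlines()), split on tabs, and compare Age as a float.
# A small sample file is written first so the snippet is self-contained.
sample = ("Name\tDate\tAge\tSex\tColor\n"
          "Ray\tMay\t25.1\tM\tGray\n"
          "Alex\tApr\t22.3\tF\tGreen\n"
          "Ann\tJun\t15.7\tF\tBlue\n")
with open("addressbook1.txt", "w") as f:
    f.write(sample)

with open("addressbook1.txt") as f, \
        open("college_age.txt", "w") as college, \
        open("adult_age.txt", "w") as adult:
    header = next(f)          # first line is the header row
    college.write(header)
    adult.write(header)
    for line in f:
        name, date, age, sex, color = line.rstrip("\n").split("\t")
        if float(age) < 23:
            college.write(line)
        else:
            adult.write(line)
```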
While you could do what you are trying to do by using str.split() as you have, the better option is to use the csv module, which is designed for the task.
import csv

with open("addressbook1.txt", newline="") as infile, open("college_age.txt", "w", newline="") as college, open("adult_age.txt", "w", newline="") as adult:
    reader = csv.DictReader(infile, dialect="excel-tab")
    fieldnames = reader.fieldnames
    writer_college = csv.DictWriter(college, fieldnames, dialect="excel-tab")
    writer_adult = csv.DictWriter(adult, fieldnames, dialect="excel-tab")
    writer_college.writeheader()
    writer_adult.writeheader()
    for row in reader:
        if float(row["Age"]) < 23:
            writer_college.writerow(row)
        else:
            writer_adult.writerow(row)
So what are we doing here? First of all, we use the with statement for opening files. It's not only more Pythonic and readable, but it handles closing for you, even when exceptions occur.
Next we create a DictReader that reads rows from the file as dictionaries, automatically using the first row as the field names. We then make writers to write back to our split files, and write the headers in. Using the DictReader is a matter of preference. It's generally used more when you access the data a lot (and when you don't know the order of the columns), but it makes the code nice and readable here. You could, however, just use a standard csv.reader().
Next we loop through the rows in the file, checking the age (which we convert to a float, since your ages are decimals, so we can do a numerical comparison) to know which file to write to. The with statement closes our files for us.
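For comparison, here is a sketch of the same split using a plain csv.reader()/csv.writer() instead of the Dict variants; rows come back as lists, so the Age column is addressed by position, which we look up from the header row. Sample input is written here so the snippet runs on its own.

```python
import csv

# Same split with plain csv.reader()/csv.writer(): rows are lists,
# so Age is addressed by position, looked up from the header row.
# Sample input is written here so the snippet runs on its own.
with open("addressbook1.txt", "w", newline="") as f:
    f.write("Name\tDate\tAge\tSex\tColor\n"
            "Ray\tMay\t25.1\tM\tGray\n"
            "Alex\tApr\t22.3\tF\tGreen\n"
            "Ann\tJun\t15.7\tF\tBlue\n")

with open("addressbook1.txt", newline="") as infile, \
        open("college_age.txt", "w", newline="") as college, \
        open("adult_age.txt", "w", newline="") as adult:
    reader = csv.reader(infile, dialect="excel-tab")
    writer_college = csv.writer(college, dialect="excel-tab")
    writer_adult = csv.writer(adult, dialect="excel-tab")
    header = next(reader)              # header row
    writer_college.writerow(header)
    writer_adult.writerow(header)
    age_index = header.index("Age")    # find the Age column by name
    for row in reader:
        if float(row[age_index]) < 23:
            writer_college.writerow(row)
        else:
            writer_adult.writerow(row)
```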
For multiple input files:
import csv

fieldnames = ["Name", "Date", "Age", "Sex", "Color"]
filenames = ["addressbook1.txt", "addressbook2.txt", ...]

with open("college_age.txt", "w", newline="") as college, open("adult_age.txt", "w", newline="") as adult:
    writer_college = csv.DictWriter(college, fieldnames, dialect="excel-tab")
    writer_adult = csv.DictWriter(adult, fieldnames, dialect="excel-tab")
    writer_college.writeheader()
    writer_adult.writeheader()
    for filename in filenames:
        with open(filename, newline="") as infile:
            reader = csv.DictReader(infile, dialect="excel-tab")
            for row in reader:
                if float(row["Age"]) < 23:
                    writer_college.writerow(row)
                else:
                    writer_adult.writerow(row)
We just add a loop to work over the multiple files. Note that I also added an explicit list of field names. Before, I just used the field names and order from the file itself, but since we have multiple files, it seemed more sensible to define them once here. An alternative would be to read the field names from the first file.
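If your input files follow a naming pattern, you can also build the filenames list instead of typing it out, e.g. with the glob module. This is just a sketch, assuming your files are named addressbook1.txt, addressbook2.txt, and so on; two empty sample files are created here so the snippet is self-contained.

```python
import glob

# Sketch: build the input list from a filename pattern instead of
# typing it out. (Assumes names like addressbook1.txt, addressbook2.txt.)
# Two empty sample files are created here so the snippet is self-contained.
for name in ("addressbook1.txt", "addressbook2.txt"):
    open(name, "w").close()

filenames = sorted(glob.glob("addressbook*.txt"))
```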