I am a Python noob, and I am trying to compare lines between two files: for each line in the first file, I want to output the "line name" followed by a 1 if the line is in the second file and a 0 if it is missing from the second file. The first iteration returns a 1, because that line is in the second file, but the remaining 1,000+ lines all return a 0, regardless of whether they are in the second file or not. It seems as though the second for loop only executes on the first iteration. Any ideas why? Here is my code:
import sys
file1 = sys.argv[1]
file2 = sys.argv[2]
name = str(file2)
f1 = open(file1, 'r')
f2 = open(file1, 'r')
o1 = open((name + '1.txt'), 'w')
for line in f1:
    name = line.strip('\r\n')
    count = 0
    for line1 in f2:
        if name == line1.strip('\r\n'):
            count += 1
            print (str(name) + '\t' + str(1))
            o1.write(str(name) + '\t' + str(1) + '\r\n')
    if count == 0:
        print (str(name) + '\t' + str(0))
        o1.write(str(name) + '\t' + str(0) + '\r\n')
f1.close()
f2.close()
o1.close()
Any help is very much appreciated!
After some changes, this is what I have, and it only returns 1s:
f1 = open(file1, 'r') #opens files for reading
f2 = open(file2, 'r')
o1 = open((name + '1.txt'), 'w')
f2s = {line.strip('\n') for line in f2}
for line in f1:
    line = line.strip('\n')
    count = 0
    if line in f2s:
        count += 1
        print (str(line) + '\t' + str(1))
        o1.write(str(line) + '\t' + str(1) + '\n')
    if count == 0:
        print (str(line) + '\t' + str(0))
        o1.write(str(line) + '\t' + str(0) + '\n')
Embarrassing, I was opening the same file twice. Rookie.
Solution
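f2 is an iterator over your second file, and once it has been read to the end, it's exhausted: the inner loop consumes the whole file during the first pass of the outer loop, so on every later pass there is nothing left to iterate over.
You can reset the iterator with f2.seek(0, 0), but that's not really the best way to go, since it rereads the whole second file for every line of the first.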
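To see the exhaustion and the reset in isolation, here is a minimal sketch. It uses io.StringIO as a stand-in for an open text file (an assumption made so the snippet runs without any files on disk); iterating and seeking behave the same way on a real file object opened for reading.

```python
import io

# Stand-in for an open text file; the contents are made up for illustration.
f2 = io.StringIO("a\nb\nc\n")

first_pass = [line.strip('\n') for line in f2]   # reads every line
second_pass = [line.strip('\n') for line in f2]  # already at end of file
f2.seek(0)                                       # rewind to the start
third_pass = [line.strip('\n') for line in f2]   # reads every line again

print(first_pass)   # ['a', 'b', 'c']
print(second_pass)  # []
print(third_pass)   # ['a', 'b', 'c']
```

The empty second pass is exactly what happens to the inner loop in the question on every iteration after the first.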
Better put all the values from f2 into a set and then iterate over f1 only once:
f2s = {line.strip('\n') for line in f2}
for line in f1:
    name = line.strip('\n') # No need for \r\n
    if name in f2s:
        # etc.
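Filled out, the set-based approach might look like the sketch below. The function name mark_presence, its parameters, and the returned list are hypothetical, added only to make the example self-contained; the output format (line, tab, 1 or 0) follows the question.

```python
def mark_presence(path1, path2, out_path):
    """Hypothetical helper: write each line of path1 plus a tab and 1/0.

    1 means the line also occurs in path2, 0 means it does not.
    """
    # Read the second file once and keep its lines in a set.
    with open(path2, 'r') as f2:
        f2s = {line.strip('\n') for line in f2}

    results = []
    # Iterate over the first file only once; set membership is a fast lookup.
    with open(path1, 'r') as f1, open(out_path, 'w') as o1:
        for line in f1:
            name = line.strip('\n')
            flag = 1 if name in f2s else 0
            o1.write(name + '\t' + str(flag) + '\n')
            results.append((name, flag))
    return results
```

The with blocks close all three files automatically, so the explicit close() calls are no longer needed, and a single if/else replaces the count variable.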
If you need to count the number of occurrences of each line from f1 in f2, then you can use a Counter:
from collections import Counter
f2c = Counter(line.strip('\n') for line in f2)
for line in f1:
    name = line.strip('\n')
    if name in f2c:
        count = f2c[name]
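As a rough, self-contained illustration of the Counter variant: the helper name count_occurrences and the sample lines are made up for this sketch, and plain lists of strings stand in for the two open files. A Counter returns 0 for keys it has never seen, so the membership test can be dropped when a count of 0 is an acceptable answer.

```python
from collections import Counter

def count_occurrences(lines1, lines2):
    """Hypothetical helper: pair each line of lines1 with its count in lines2."""
    f2c = Counter(line.strip('\n') for line in lines2)
    # Looking up a missing key in a Counter gives 0 instead of raising KeyError.
    return [(line.strip('\n'), f2c[line.strip('\n')]) for line in lines1]

# Any iterable of lines works here, including open file objects.
pairs = count_occurrences(["a\n", "b\n", "z\n"], ["a\n", "a\n", "b\n"])
print(pairs)  # [('a', 2), ('b', 1), ('z', 0)]
```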