I have 243607 IPs in the log file. My function prints the unique IPs all run together, so I can't check whether the output IPs are really unique. I want each IP printed on a separate line. As I'm new to Python, I can't figure out how to do it. Is there any way to do it?
I also want the count of the IPs printed.
def unique_ips():
    ips = set()
    with open('epiclogs.txt', 'r') as f:
        for line in f:
            if not line.isspace():
                ip = line.split()[0]
                ips.add(ip)
    return ips

if __name__ == '__main__':
    print(unique_ips())
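The IPs run together because `print` on a set prints the whole set's representation on one line. A minimal sketch of a direct fix, assuming the same whitespace-separated layout with the IP in the first column: the hypothetical helper `format_ips` sorts the IPs, puts one per line, and appends the number of unique IPs. The sort is by string, not numeric, which is enough to make the list easy to scan.

```python
def unique_ips(filename):
    """Return the set of IP addresses found in the first column of the log.

    Assumes whitespace-separated columns with the IP first, as in the question.
    """
    with open(filename) as f:
        return {line.split()[0] for line in f if not line.isspace()}


def format_ips(ips):
    """Return one IP per line (sorted as strings), then the unique count.

    format_ips is a hypothetical helper name used only for this sketch.
    """
    lines = sorted(ips)  # sorted output is easier to check by eye
    lines.append("Total unique IPs: %d" % len(ips))
    return "\n".join(lines)
```

Usage: `print(format_ips(unique_ips('epiclogs.txt')))`.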
Solution
The requirements are not complete:
The format of the log file is unknown.
The format of the output file is unknown (e.g. should it be sorted?).
My assumptions:
The IP addresses are located in the first column.
The output format should be '[count] [ip address]'.
Test data
10.1.10.190 http://example.com/t1 404
10.1.10.171 http://example.com/t1 404
10.1.10.180 http://example.com/t2 200
10.1.10.190 http://example.com/t1 404
10.1.11.180 http://example.com/t3 302
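With the assumptions above, the expected counts for this test data can be checked by hand: 10.1.10.190 occurs twice and the other three addresses once each, so there are four unique IPs. A minimal in-memory sketch of that check; the names `TEST_DATA` and `count_ips` are hypothetical and not part of the program below.

```python
# The five test lines from above, held in memory instead of in a file.
TEST_DATA = """\
10.1.10.190 http://example.com/t1 404
10.1.10.171 http://example.com/t1 404
10.1.10.180 http://example.com/t2 200
10.1.10.190 http://example.com/t1 404
10.1.11.180 http://example.com/t3 302
"""


def count_ips(lines):
    """Count how often each first-column IP occurs, skipping blank lines."""
    counts = {}
    for line in lines:
        if not line.strip():  # empty or whitespace-only line
            continue
        ip = line.split()[0]
        counts[ip] = counts.get(ip, 0) + 1
    return counts
```

`count_ips(TEST_DATA.splitlines())` gives 2 for 10.1.10.190 and 1 for each of the other three addresses.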
Program
#!/usr/bin/env python
#
# Counts the IP addresses of a log file.
#
# Assumption: the IP address is logged in the first column.
# Example line: 10.1.10.190 http://example.com/t1 404
#
import sys


def extract_ip(line):
    '''Extracts the IP address from the line.
    Currently it is assumed that the IP address is logged in
    the first column and the columns are space separated.'''
    return line.split()[0]


def increase_count(ip_dict, ip_addr):
    '''Increases the count of the IP address.
    If an IP address is not in the given dictionary,
    it is initially created and the count is set to 1.'''
    if ip_addr in ip_dict:
        ip_dict[ip_addr] += 1
    else:
        ip_dict[ip_addr] = 1


def read_ips(infilename):
    '''Read the IP addresses from the file and store (count)
    them in a dictionary - returns the dictionary.'''
    res_dict = {}
    # open() works in Python 2 and 3 (file() is Python 2 only);
    # "with" closes the file when the block ends.
    with open(infilename) as log_file:
        for line in log_file:
            if line.isspace():
                continue
            ip_addr = extract_ip(line)
            increase_count(res_dict, ip_addr)
    return res_dict


def write_ips(outfilename, ip_dict):
    '''Write out the count and the IP addresses.'''
    with open(outfilename, "w") as out_file:
        # items() works in Python 2 and 3 (iteritems() is Python 2 only).
        for ip_addr, count in ip_dict.items():
            out_file.write("%5d\t%s\n" % (count, ip_addr))


def parse_cmd_line_args():
    '''Return the in and out file name.
    If there are more or less than two parameters,
    an error is logged and the program is exited.'''
    if len(sys.argv) != 3:
        print("Usage: %s [infilename] [outfilename]" % sys.argv[0])
        sys.exit(1)
    return sys.argv[1], sys.argv[2]


def main():
    infilename, outfilename = parse_cmd_line_args()
    ip_dict = read_ips(infilename)
    write_ips(outfilename, ip_dict)


if __name__ == "__main__":
    main()
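The output order is one of the open requirements: `write_ips` writes the dictionary in its iteration order, which is arbitrary in Python 2 and insertion order from Python 3.7 on. A sketch assuming the most frequent IP should come first, with ties broken by the IP string so the output is deterministic; `format_ip_counts` and `write_ips_sorted` are hypothetical names, not part of the program above.

```python
def format_ip_counts(ip_dict):
    """Return '[count] [ip address]' lines, most frequent IP first.

    Uses the same "%5d\\t%s" layout as write_ips(); ties are sorted by IP string.
    """
    items = sorted(ip_dict.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join("%5d\t%s\n" % (count, ip) for ip, count in items)


def write_ips_sorted(outfilename, ip_dict):
    """Hypothetical drop-in alternative to write_ips() with sorted output."""
    with open(outfilename, "w") as out_file:
        out_file.write(format_ip_counts(ip_dict))
```

To try it, `main()` would call `write_ips_sorted(outfilename, ip_dict)` instead of `write_ips`. The program itself is run with the input and output file names as arguments, e.g. `python count_ips.py epiclogs.txt ip_counts.txt`, where the script and output file names are placeholders.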
Comment
I like small functions - each of them does exactly one thing. IMHO this makes the program easier to understand.
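For comparison, the standard library's `collections.Counter` already does what `increase_count` does, and its `most_common()` method returns the entries sorted by count. This is a swapped-in technique, not the approach of the answer above; `read_ips_counter` is a hypothetical name.

```python
from collections import Counter


def read_ips_counter(infilename):
    """Same mapping as read_ips(): IP address -> number of occurrences.

    Assumes the IP is the first whitespace-separated column; blank lines
    are skipped.
    """
    with open(infilename) as log_file:
        return Counter(line.split()[0] for line in log_file if line.strip())
```

`len(counts)` gives the number of unique IPs, `sum(counts.values())` the number of counted lines, and `for ip, n in counts.most_common(): print("%5d\t%s" % (n, ip))` prints one IP per line, most frequent first.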