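Today I worked through a small exercise that combines Redis and Python. I found it interesting, so I am recording it here.
The goal: given an IP address, look up the city and country that the address belongs to.
The IP-to-city test data comes from http://dev.maxmind.com/geoip/geolite, which provides two important data files: GeoLiteCity-Blocks.csv and GeoLiteCity-Location.csv.
GeoLiteCity-Blocks.csv records a number of IP address ranges together with the ID of the city each range belongs to.
GeoLiteCity-Location.csv maps each city ID to the city name, region name, country, and other information.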
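To make the relationship between the two files concrete, here is a minimal sketch that joins block rows to location rows by city ID. The sample rows are made up for illustration, and the column positions are an assumption taken from how the listing below indexes each row: start IP and city ID in columns 0 and 2 of the blocks file, and city ID, country, region, and city in columns 0 to 3 of the location file. The names `read_rows` and `join_blocks_to_cities` are hypothetical helpers, not part of the original code.

```python
import csv

# Hypothetical sample rows; the real files are far larger.
BLOCKS_CSV = '"16777216","16777471","17"\n"16777472","16778239","49"\n'
LOCATIONS_CSV = '17,"XX","01","Sampletown"\n49,"YY","02","Examplefield"\n'


def read_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(text.splitlines()))


def join_blocks_to_cities(blocks_text, locations_text):
    """Return (start_ip, city, region, country) for every block row."""
    # Index the location rows by city ID (column 0).
    cities = {}
    for row in read_rows(locations_text):
        cities[row[0]] = (row[3], row[2], row[1])
    # Attach the location to each block via its city ID (column 2).
    joined = []
    for row in read_rows(blocks_text):
        start_ip, city_id = int(row[0]), row[2]
        joined.append((start_ip,) + cities[city_id])
    return joined


for record in join_blocks_to_cities(BLOCKS_CSV, LOCATIONS_CSV):
    print(record)
```

The Redis version below keeps the same two-step structure: a sorted set plays the role of the block rows, and a hash plays the role of the `cities` dictionary.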
import json
import unittest
import redis
import csv
def ip_to_score(ip_address):
    # Treat the four octets as the digits of a base-256 number.
    score = 0
    for v in ip_address.split('.'):
        score = score * 256 + int(v, 10)
    return score
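def import_ips_to_redis(conn, filename):
    csv_file = csv.reader(open(filename, 'rb'))
    for count, row in enumerate(csv_file):
        start_ip = row[0] if row else ''
        if '.' in start_ip:
            start_ip = ip_to_score(start_ip)  # dotted quad -> integer score
        elif start_ip.isdigit():
            start_ip = int(start_ip, 10)
        else:
            continue  # skip blank, copyright, and header lines
        city_id = row[2] + '_' + str(count)  # row number keeps members unique
        conn.zadd('ip2cityid:', city_id, start_ip)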
def import_cities_to_redis(conn, filename):
    for row in csv.reader(open(filename, 'rb')):
        # Skip the copyright/header lines and any malformed rows.
        if len(row) < 4 or not row[0].isdigit():
            continue
        row = [i.decode('latin-1') for i in row]
        city_id = row[0]
        country = row[1]
        region = row[2]
        city = row[3]
        # Store the location as "city_region_country", keyed by city ID.
        infor = city + '_' + region + '_' + country
        conn.hset('cityid2city:', city_id, infor)
def find_city_by_ip(conn, ip_address):
    if isinstance(ip_address, str):
        ip_address = ip_to_score(ip_address)
    # Fetch the member with the highest score that is <= the address.
    city_id = conn.zrevrangebyscore('ip2cityid:', ip_address, 0, start=0, num=1)
    if not city_id:
        return None
    # Strip the "_<row number>" suffix that was added during import.
    city_id = city_id[0].partition('_')[0]
    return conn.hget('cityid2city:', city_id)
class TestCh05(unittest.TestCase):
    def setUp(self):
        global config_connection
        import redis
        self.conn = config_connection = redis.Redis(db=15)
        self.conn.flushdb()

    def tearDown(self):
        self.conn.flushdb()
        del self.conn
        global config_connection, QUIT, SAMPLE_COUNT
        config_connection = None
        QUIT = False
        SAMPLE_COUNT = 100
        print
        print

    def test_ip_lookup(self):
        conn = self.conn
        try:
            open('GeoLiteCity-Blocks.csv', 'rb')
            open('GeoLiteCity-Location.csv', 'rb')
        except:
            print "********"
            print "You do not have the GeoLiteCity database available, aborting test"
            print "Please have the following two files in the current path:"
            print "GeoLiteCity-Blocks.csv"
            print "GeoLiteCity-Location.csv"
            print "********"
            return

        print "Importing IP addresses to Redis... (this may take a while)"
        import_ips_to_redis(conn, 'GeoLiteCity-Blocks.csv')
        ranges = conn.zcard('ip2cityid:')
        print "Loaded ranges into Redis:", ranges

        print "Importing Location lookups to Redis... (this may take a while)"
        import_cities_to_redis(conn, 'GeoLiteCity-Location.csv')
        cities = conn.hlen('cityid2city:')
        print "Loaded city lookups into Redis:", cities

        print "Let's lookup some locations!"
        print find_city_by_ip(conn, '10.11.12.12')

if __name__ == '__main__':
    unittest.main()
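Because GeoLiteCity-Blocks.csv and GeoLiteCity-Location.csv are large, loading them into memory (that is, into Redis) takes a while. Once they are loaded, though, looking up an IP address is very fast; if you searched the files by hand with grep instead, the difference would be obvious.
Output (for this test only, I used cut-down versions of the two files):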
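One way to see why the lookup is fast: a sorted set keeps its members ordered by score, so finding the last range that starts at or below an address does not require scanning every entry the way grep scans every line. The sketch below imitates that lookup in plain Python with a sorted list and binary search. It is an illustration only, not how Redis is implemented; the ranges and city IDs are made up, and `find_city_id` is a hypothetical helper. Like the Redis version above, it ignores the end of each range.

```python
import bisect


def ip_to_score(ip_address):
    """Convert a dotted-quad IPv4 address to an integer."""
    score = 0
    for part in ip_address.split('.'):
        score = score * 256 + int(part, 10)
    return score


# Hypothetical ranges as (start score, city ID), sorted by start score.
RANGES = [
    (ip_to_score('10.0.0.0'), '17'),
    (ip_to_score('10.11.0.0'), '49'),
    (ip_to_score('10.12.0.0'), '86'),
]
STARTS = [start for start, _ in RANGES]


def find_city_id(ip_address):
    """Return the city ID of the last range starting at or below the address."""
    # bisect_right gives the number of start scores <= the address score.
    index = bisect.bisect_right(STARTS, ip_to_score(ip_address)) - 1
    if index < 0:
        return None  # the address lies below every known range
    return RANGES[index][1]


print(ip_to_score('10.11.12.12'))    # 168496140
print(find_city_id('10.11.12.12'))   # 49
print(find_city_id('9.255.255.255')) # None
```

Here `10.11.12.12` becomes 10*256^3 + 11*256^2 + 12*256 + 12 = 168496140, which falls after the start of the second range and before the start of the third, so the second range's city ID is returned.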
Importing IP addresses to Redis... (this may take a while)
Loaded ranges into Redis: 2
Importing Location lookups to Redis... (this may take a while)
Loaded city lookups into Redis: 2
Let's lookup some locations!
Kefar Shemaryahu_05_IL
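
The last line of the output is the raw string stored in the hash, in the form city_region_country. If separate fields are needed, it can be split back apart. The following is a minimal sketch with a hypothetical helper `split_city_info`; it assumes that region and country codes never contain an underscore, so it splits from the right in case a city name does.

```python
def split_city_info(info):
    """Split a 'city_region_country' string into its three parts."""
    # Split at most twice from the right so that underscores inside
    # the city name (if any) stay part of the city.
    city, region, country = info.rsplit('_', 2)
    return {'city': city, 'region': region, 'country': country}


print(split_city_info('Kefar Shemaryahu_05_IL'))
```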