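This is the second post in the series.
It implements most of the features outlined in "Implementing a Tsinghua University network assistant in Python (1): using urllib/hashlib/getopt/time/codecs". The main additions are: querying basic information such as traffic usage, account balance, and the current user group; querying the IPs currently online under this account; and querying the daily traffic details and drawing a histogram from them. The URLs involved are no longer the campus network login URLs used in the first post.
If you are looking for answers to the following questions, please see "Implementing a Tsinghua University network assistant in Python (1): using urllib/hashlib/getopt/time/codecs" instead.
1 How to process a password with hashlib;
2 How to issue a Request and urlopen against a page that requires a username and password;
3 How to implement a custom command that takes arguments;
4 How to get the current date and time;
5 What if __name__ == '__main__' does.
The problems I ran into while implementing the features above were:
1 Opening, reading, writing, and closing files;
2 Logging in to a web page with cookiejar and keeping the cookie state;
3 Redirects after a web login;
4 How to use the re regular expression module, and what to watch out for.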
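1 Opening, reading, writing, and closing files:
a) Getting the absolute path of the current directory
The absolute path of the directory that contains the running script (argv[0]) can be obtained with os.path.abspath(os.path.dirname(sys.argv[0])). os.path.dirname(sys.argv[0]) takes the directory part of the script path, and abspath turns it into an absolute path.
With the absolute path of the script directory in hand, we only need to combine it with a relative path to get the path of the file we want to open. This is done with os.path.join(base_path, relative_path).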
def create_path(relative_path):
    base_path = os.path.abspath(os.path.dirname(sys.argv[0]))
    return os.path.join(base_path, relative_path)
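b) Opening, reading, writing, and closing a file
File open/read/write/close syntax is much the same across languages.
For the details, see the Python documentation on file I/O. Here I mainly use open(), write(), and close(), plus write_inline(), a small helper of my own (defined in the full listing at the end) that writes a list of lines.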
def save_query(contents):
    relative_path = 'USER_DETAIL_INFOMATION.LOG'
    file_handler = open(create_path(relative_path), 'w')
    write_inline(file_handler, info_header)
    file_handler.write('\t\t\t\tDatetime: ' + time.strftime('%Y-%m-%d %H:%M:%S') + '\r\n' + '-' * 89 + '\r\n')
    write_inline(file_handler, contents)
    file_handler.write('\r\n')
    file_handler.close()
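I named the file that stores the query results USER_DETAIL_INFOMATION.LOG. The code opens it at the absolute path obtained above, writes the data, and closes it. Always remember to close the file: an open file keeps holding a file descriptor, and buffered data may not reach the disk until it is closed, so leaving files open reduces the number of files the process can still open. What exactly gets written is up to you.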
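One way to make sure the file is always closed is the with statement. The following is a minimal sketch, not the code used in this project: save_lines and the demo file name are hypothetical names introduced only for illustration, and the file is written to a temporary directory.

```python
import os
import tempfile
import time


def save_lines(path, lines):
    """Write a timestamp line and then each entry of `lines` (hypothetical helper)."""
    # 'with' closes the file when the block exits, even if write() raises
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write('Datetime: ' + time.strftime('%Y-%m-%d %H:%M:%S') + '\n')
        for line in lines:
            handle.write(line + '\n')
    # after the block the handle is already closed
    return handle.closed


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'demo_query.log')
was_closed = save_lines(demo_path, ['first line', 'second line'])

# read the file back to check what was written
with open(demo_path, encoding='utf-8') as handle:
    saved = handle.read().splitlines()
```

With this form there is no close() call to forget, which is why it is usually preferred over a manual open()/close() pair.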
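2 Logging in to a web page with cookiejar and keeping the cookie state:
This requires the http.cookiejar module.
In "Implementing a Tsinghua University network assistant in Python (1): using urllib/hashlib/getopt/time/codecs", logging in to the campus network once was all it took to get online, so nothing else had to be done while logged in. Now we have to keep the login state alive and scrape the information we need from several pages. That is why we need cookiejar and cannot rely on a bare urlopen() alone.
First, how cookiejar works:
A cookie is generated by the server and sent to the User-Agent (a browser, for example). The browser stores it as name/value pairs (typically a session identifier rather than the username and password themselves) and sends it back with every later request to pages under the same site, so that those pages can be accessed normally. A CookieJar instance is a container that plays the browser's role of storing these cookies. We hand the instance to an opener through HTTPCookieProcessor, so one login is enough: the cookies are remembered and attached automatically to the following requests.
Note that Python 3 has no urllib2, so the code may differ from Python 2 versions.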
def create_opener():
    cookie = http.cookiejar.CookieJar()
    cookie_proc = urllib.request.HTTPCookieProcessor(cookie)
    return urllib.request.build_opener(cookie_proc)

def response_login(login_data):
    request_url = urllib.request.Request(query_login_url, login_data.encode())
    response_url = urllib.request.urlopen(request_url)
    return response_url.read().decode()

def query_login(username, password):
    hashcd_md5 = hashlib.md5()
    hashcd_md5.update(password.encode())
    tr_password = hashcd_md5.hexdigest()
    login_data = 'user_login_name=' + username + '&user_password=' + tr_password + '&action=login'
    urllib.request.install_opener(create_opener())
    answer = response_login(login_data)
    if answer == 'ok':
        return True
    else:
        return False

def query_logout():
    logout_data = 'action=logout'
    request_url = urllib.request.Request(query_logout_url, logout_data.encode())
    response_url = urllib.request.urlopen(request_url)
    print ('Your flux details and other infomations are saved in USER_DETAIL_INFOMATION.LOG under the SAME directory')
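Basic steps:
a) cookie = http.cookiejar.CookieJar(), which creates the CookieJar instance;
b) cookie_proc = urllib.request.HTTPCookieProcessor(cookie), which wraps the jar in a handler that stores and sends cookies;
c) bopener = urllib.request.build_opener(cookie_proc), which builds an opener;
d) urllib.request.install_opener(bopener), which installs that opener globally;
e) issue the Request and urlopen exactly as for any URL that needs a login; this is the first login, and the cookie information is recorded and kept;
f) fetch every other page with plain urlopen.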
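The steps above can be tried without touching the real site. The sketch below is only an illustration under stated assumptions: DemoHandler is a hypothetical local server standing in for the campus pages, the /login and /info paths and the session=abc123 cookie are made up, and the opener is used directly through opener.open() instead of being installed globally.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real site: /login sets a cookie, other paths echo it."""

    def do_GET(self):
        if self.path == '/login':
            body = b'ok'
            self.send_response(200)
            self.send_header('Set-Cookie', 'session=abc123; Path=/')
        else:
            # echo back whatever Cookie header the client sent
            body = (self.headers.get('Cookie') or 'no cookie').encode()
            self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# port 0 lets the operating system pick a free port
server = http.server.HTTPServer(('127.0.0.1', 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = 'http://127.0.0.1:%d' % server.server_port

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),          # ignore any system proxy settings
    urllib.request.HTTPCookieProcessor(jar),  # store and resend cookies
)

login_reply = opener.open(base_url + '/login').read().decode()
page_reply = opener.open(base_url + '/info').read().decode()

# an opener without a cookie processor does not carry the login state
plain_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
plain_reply = plain_opener.open(base_url + '/info').read().decode()

server.shutdown()
server.server_close()
```

The second request made through opener carries the cookie set by the first one, while the plain opener arrives without it. Calling urllib.request.install_opener(opener) would give plain urlopen() the same behaviour, at the cost of changing global state.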
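3 Redirects after a web login:
The response body may contain the URL to jump to. urllib follows ordinary header-based (HTTP) redirects automatically, while JavaScript redirects require you to extract the real target URL yourself. The pages we need do not have this problem, so I will not go into it here. If you are interested, search for it online.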
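4 How to use the re regular expression module, and what to watch out for:
Once the pages have been fetched, we need to parse them and extract the useful information. This gave me quite a headache for a while. There are many excellent modules for parsing web pages, such as BeautifulSoup, but I have to consider users who do not have that package installed, so I decided to extract the information with the regular expression module re.
First you need to learn regular expressions. They look complicated, but the syntax you will actually use is probably small. I recommend the tutorial "正则表达式30分钟入门教程" (a 30-minute introduction to regular expressions). Pay particular attention to greedy versus lazy matching.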
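To see why greedy versus lazy matching matters when scraping, here is a small sketch on a made-up HTML fragment (the fragment is an assumption for illustration, not taken from the real pages):

```python
import re

html = '<b>first</b> and <b>second</b>'

# greedy: .* grabs as much as possible, so the match runs to the LAST </b>
greedy = re.search(r'<b>.*</b>', html).group()

# lazy: .*? grabs as little as possible, so the match stops at the FIRST </b>
lazy = re.search(r'<b>.*?</b>', html).group()

print(greedy)  # <b>first</b> and <b>second</b>
print(lazy)    # <b>first</b>
```

The '用户名.*?(元) ' pattern used later in this post relies on the lazy form for the same reason: it stops at the first possible end of the block instead of the last one.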
Then there are a few commonly used functions in the re module: re.sub(), re.search(), re.findall(), and re.compile().
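a) re.sub():
sub is short for substitute. It takes three arguments: a regular expression, a replacement string, and the string to process. For example, done = re.sub('<[^>]+>| |[\n\t]+|-->',' ',raw) replaces every part of raw that matches the regular expression '<[^>]+>| |[\n\t]+|-->' with a single space ' ' (in effect stripping the matched part out while keeping the surrounding text separated) and returns the processed string. If you have worked through the tutorial above, you will see that '<[^>]+>| |[\n\t]+|-->' removes all tags, spaces, newlines, tabs, and '-->' strings from raw.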
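A short sketch of this kind of clean-up, run on a made-up fragment rather than on the real page (the fragment and the variable names are assumptions for illustration):

```python
import re

raw = '<tr><td>user</td>\n\t<td>alice</td></tr>-->'

# replace tags, newline/tab runs and '-->' with a single space each
step1 = re.sub(r'<[^>]+>|[\n\t]+|-->', ' ', raw)

# collapse the resulting runs of spaces and trim both ends
step2 = re.sub(r' +', ' ', step1).strip()

print(step2)  # user alice
```

Replacing with a space instead of an empty string keeps neighbouring fields apart, so the cleaned text can later be split on ' '.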
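b) re.search():
search() takes two arguments, a regular expression and the string to process, and returns a match object for the first match (or None if nothing matches). The matched text is retrieved with group(): group() or group(0) returns the whole match, and group(n) returns the text captured by the n-th parenthesized group. groups() returns a tuple of all captured groups.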
match = re.search('用户名.*?(元) ', done)
done = match.group()
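The difference between group(), group(n), and groups() is easier to see on a small example. This is a sketch on made-up English text; the pattern and the sample string are assumptions for illustration only:

```python
import re

text = 'noise user alice balance 12.50 yuan noise'
match = re.search(r'user (\w+) balance ([\d.]+)', text)

whole = match.group()     # the entire matched text
name = match.group(1)     # text captured by the first pair of parentheses
fields = match.groups()   # tuple of all captured groups

# search() returns None when nothing matches, so check before calling group()
missing = re.search(r'no such text', text)

print(whole)    # user alice balance 12.50
print(name)     # alice
print(fields)   # ('alice', '12.50')
print(missing)  # None
```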
c) re.findall():
re.findall() returns a list of all non-overlapping matches in the string.
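A small sketch of findall(), again on made-up text (the sample string is an assumption that only loosely resembles the traffic records parsed later):

```python
import re

text = '2015-04-01 12.5M 2015-04-02 3.0K'

# without groups, findall returns the matched strings
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)

# with groups, findall returns one tuple of captures per match
pairs = re.findall(r'(\d+\.\d*)([BKMG])', text)

print(dates)  # ['2015-04-01', '2015-04-02']
print(pairs)  # [('12.5', 'M'), ('3.0', 'K')]
```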
d) re.compile():
This compiles a regular expression into a pattern object that can be reused many times, which is more efficient.
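As a sketch of how a compiled pattern might be reused inside a loop (DATE_RE and the token list are hypothetical names and data chosen for illustration):

```python
import re

# compile once, reuse for every token
DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}')

tokens = ['2015-04-03', '10:20:30', '2015-04-04', 'abc']
dates = [token for token in tokens if DATE_RE.fullmatch(token)]

print(dates)  # ['2015-04-03', '2015-04-04']
```

The pattern object offers the same methods as the module-level functions (search, findall, sub, and so on), just without the pattern argument.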
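BONUS: one last tip is time.sleep(gap), which pauses the script for gap seconds before continuing. It helps avoid sending requests too frequently.
The implementation is split across three .py files:
pytunet_query.py: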
import sys, time, os
import urllib.request, hashlib, http.cookiejar
import codecs, re
query_login_url = 'https://usereg.tsinghua.edu.cn/do.php'
user_info_url = 'https://usereg.tsinghua.edu.cn/user_info.php'
online_state_url = 'https://usereg.tsinghua.edu.cn/online_user_ipv4.php'
query_logout_url = 'https://usereg.tsinghua.edu.cn/do.php'
info_header = [ '#' * 89,
                '#\t\t\t\tUser Flux Detail Infomation\t\t\t\t#',
                '#' * 89]
#########################################################
# File I/O Modules #
#########################################################
def create_path():
    relative_path = 'USER_DETAIL_INFOMATION.LOG'
    base_path = os.path.abspath(os.path.dirname(sys.argv[0]))
    return os.path.join(base_path, relative_path)

def write_inline(file_handler, contents):
    for line in contents:
        file_handler.write(line + '\r\n')

def save_query(contents):
    file_handler = open(create_path(), 'w')
    write_inline(file_handler, info_header)
    file_handler.write('\t\t\t\tDatetime: ' + time.strftime('%Y-%m-%d %H:%M:%S') + '\r\n' + '-' * 89 + '\r\n')
    write_inline(file_handler, contents)
    file_handler.write('\r\n')
    file_handler.close()
#########################################################
# Connection Modules #
#########################################################
def create_opener():
    cookie = http.cookiejar.CookieJar()
    cookie_proc = urllib.request.HTTPCookieProcessor(cookie)
    return urllib.request.build_opener(cookie_proc)

def response_login(login_data):
    request_url = urllib.request.Request(query_login_url, login_data.encode())
    response_url = urllib.request.urlopen(request_url)
    return response_url.read().decode()
#########################################################
# Main Login/Logout Modules #
#########################################################
def query_login(username, password):
    hashcd_md5 = hashlib.md5()
    hashcd_md5.update(password.encode())
    tr_password = hashcd_md5.hexdigest()
    login_data = 'user_login_name=' + username + '&user_password=' + tr_password + '&action=login'
    urllib.request.install_opener(create_opener())
    answer = response_login(login_data)
    if answer == 'ok':
        return True
    else:
        return False

def query_logout():
    logout_data = 'action=logout'
    request_url = urllib.request.Request(query_logout_url, logout_data.encode())
    response_url = urllib.request.urlopen(request_url)
    print ('Your flux details and other infomations are saved in USER_DETAIL_INFOMATION.LOG under the SAME directory')
#########################################################
# Data Post-Process Modules #
#########################################################
def post_process(info):
    end_time = time.strftime('%Y-%m-%d')
    start_time = end_time[:8:] + '01'
    flux_detail_url = 'https://usereg.tsinghua.edu.cn/user_detail_list.php?action=balance2&user_login_name=&user_real_name=&desc=&order=&start_time=' + start_time + '&end_time=' + end_time + '&user_ip=&user_mac=&nas_ip=&nas_port=&is_ipv6=0&page=1&offset=200'
    response_usr = urllib.request.urlopen(user_info_url)
    response_state = urllib.request.urlopen(online_state_url)
    response_details = urllib.request.urlopen(flux_detail_url)
    info = flux_account_query(info, response_usr)
    info = online_state_query(info, response_state)
    info = flux_detail_query(info, response_details)
    return info
#########################################################
# Integrated Query Modules #
#########################################################
# Field labels as they appear on the (Chinese) user info page; they must stay
# in Chinese to match the page text: username, user group, name, ID number,
# billing group, usage time/traffic (IPv4 and IPv6), account balance.
flux_account_keys = ('用户名', '用户组', '姓名', '证件号', '当前计费组', '使用时长(IPV4)', '使用流量(IPV4)',
                     '使用时长(IPV6)', '使用流量(IPV6)', '帐户余额')
online_state_keys = ()
flux_detail_keys = ()

#Auxiliary Function
def turn_key(key):
    # values such as '12345(byte)' get a '-->N(MB)' suffix appended
    if key[-5:-1] == 'byte':
        flux, unit = key.split('(')
        flux = float(flux) / 1024 / 1024
        new_key = '-->' + str(int(flux)) + '(MB)'
        key += new_key
    return key

def get_days(year, month):
    month_length = (31,28,31,30,31,30,31,31,30,31,30,31)
    month_length_leap = (31,29,31,30,31,30,31,31,30,31,30,31)
    if year % 400 == 0 or year % 100 != 0 and year % 4 == 0:
        return month_length_leap[month-1]
    else:
        return month_length[month-1]

def solve_flux(flux):
    # convert a value such as '12.5M' or '3.0K' to whole megabytes
    unit = flux[-1]
    val = float(flux[:len(flux)-1:])
    if unit == 'B':
        val /= 1024 * 1024
    elif unit == 'K':
        val /= 1024
    elif unit == 'G':
        val *= 1024
    return int(val)

def trans_content(response):
    # strip tags and whitespace runs, leaving space-separated tokens
    raw = response.read().decode('gb2312')
    raw = re.sub('<[^>]+>| |[\n\t]+|-->',' ',raw)
    raw = re.sub(' +', ' ', raw)
    return raw

def push_front(figure, line):
    tf = []
    tf.append(line)
    return tf + figure
def display_fluxAccount_onlineState(info):
    print()
    for line in info:
        if line[0] != '-':
            print(line)
        else:
            print()

def display_flux_detail(fluxin, year, month, day):
    maxflux = 0
    divide = 10
    figure = []
    for flux in fluxin:
        if flux > maxflux:
            maxflux = flux
    top = str(int(maxflux)) + 'MB|'
    length = len(top)
    mid = str(int(maxflux / 2)) + 'MB|'
    mid = ' ' * (length - len(mid)) + mid
    bottom = '0MB|'
    bottom = ' ' * (length - len(bottom)) + bottom
    unit = maxflux / divide
    for i in range(day):
        # guard against division by zero when there was no traffic this month
        fluxin[i] = int(fluxin[i] / unit) if unit > 0 else 0
    for i in range(divide):
        line = ''
        if i == divide - 1:
            line = top
        elif i == int((divide - 1) / 2):
            line = mid
        elif i == 0:
            line = bottom
        else:
            line = ' ' * (length - 1) + '|'
        for j in range(day):
            if fluxin[j] > 0:
                line += '**'
                fluxin[j] -= 1
            else:
                # two spaces, so empty cells are as wide as the '**' bars
                line += '  '
        figure = push_front(figure, line)
    # title: "daily traffic usage statistics"
    figure = push_front(figure, '**每日流量使用统计列表**')
    figure.append(' ' * length + '--' * day)
    date_front = str(year) + '-' + str(month) +'-' + '1'
    date_rear = str(year) + '-' + str(month) +'-' + str(day)
    date_mid = str(year) + '-' + str(month) +'-' + '15'
    figure.append(' %s\t\t\t%s\t\t\t%s' %(date_front, date_mid, date_rear))
    for line in figure:
        print(line)
    print()
#Integrated Query
def flux_account_query(info, response):
    # section title: "basic user information"
    info.append('**用户基本信息**')
    done = trans_content(response)
    match = re.search('用户名.*?(元) ', done)
    done = match.group()
    tlist = done.split(' ')
    line = ''
    for key in tlist:
        if line != '':
            key = turn_key(key)
            line = line + '\t: ' + key
            info.append(line)
            line = ''
        elif key in flux_account_keys:
            line = key
    info.append('-' * 89)
    return info

def online_state_query(info, response):
    # section title: "user online state"
    info.append('**用户在线状态**')
    done = trans_content(response)
    match = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.*\d{2}:\d{2}', done)
    if match is None:
        # "no IP is currently online"
        info.append('当前没有任何IP在线')
    else:
        # header: online IP address, login date, login time
        info.append('在线IP地址\t登陆日期\t登陆时间')
        done = match.group()
        tlist = done.split(' ')
        line = ''
        count = 0
        for key in tlist:
            if count == 0:
                if re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', key) is not None:
                    line = key
                    count += 1
            elif count == 1:
                if re.search(r'\d{4}-\d{2}-\d{2}', key) is not None:
                    line += '\t' + key
                    count += 1
            elif count == 2:
                if re.search(r'\d{2}:\d{2}:\d{2}', key) is not None:
                    line += '\t' + key
                    info.append(line)
                    line = ''
                    count = 0
    info.append('-' * 89)
    display_fluxAccount_onlineState(info)
    return info
def flux_detail_query(info, response):
    # section title: "daily traffic usage statistics"
    info.append('**每日流量使用统计列表**')
    # header: logout date, inbound traffic, outbound traffic
    info.append('登出日期\t入流量\t出流量')
    done = trans_content(response)
    year, month = time.strftime('%Y %m').split(' ')
    days = get_days(int(year), int(month))
    tlist = done.split(' ')
    fluxin_perday = [0 for i in range(days)]
    fluxout_perday = [0 for i in range(days)]
    offline_date = True
    count = 0
    for key in tlist:
        # dates come in pairs; the second one of each pair is the logout date
        if re.search(r'\d{4}-\d{2}-\d{2}', key) and offline_date:
            offline_date = False
        elif re.search(r'\d{4}-\d{2}-\d{2}', key) and not offline_date:
            offline_date = True
            year, month, day = key.split('-')
            iday = int(day)
        # traffic values come in threes: inbound, outbound, and a third one that is skipped
        elif re.search(r'\d+[.]\d*[BKMG]', key) and count == 0:
            fluxin_perday[iday-1] += solve_flux(key)
            count += 1
        elif re.search(r'\d+[.]\d*[BKMG]', key) and count == 1:
            fluxout_perday[iday-1] += solve_flux(key)
            count += 1
        elif re.search(r'\d+[.]\d*[BKMG]', key) and count == 2:
            count = 0
    for i in range(days):
        if i + 1 < 10:
            d = '0' + str(i + 1)
        else:
            d = str(i + 1)
        info.append('%s\t%s\t%s' %(time.strftime('%Y-%m-') + d, str(fluxin_perday[i]) + 'MB', str(fluxout_perday[i]) + 'MB'))
    display_flux_detail(fluxin_perday, int(year), int(month), days)
    return info
#########################################################
# Main Part #
#########################################################
def tunet_query(username, password):
    print('FETCHING DATA FROM http://usereg.tsinghua.edu.cn, PLEASE WAIT A MOMENT...')
    is_login = query_login(username, password)
    if is_login:
        info = []
        info = post_process(info)
        save_query(info)
        query_logout()
    else:
        print ('CAN\'T CAPTURE YOUR FLUX DATA, PLEASE TRY AGAIN LATER')

def pytunet_query():
    username = 'hhy14'
    password = '123456'
    tunet_query(username, password)

if __name__ == '__main__':
    pytunet_query()
pytunet_connect.py:
import time
import urllib.request, hashlib
import codecs
login_url = 'http://net.tsinghua.edu.cn/cgi-bin/do_login'
logout_url = 'http://net.tsinghua.edu.cn/cgi-bin/do_logout'
check_url = 'http://net.tsinghua.edu.cn/cgi-bin/do_login'
query_url = 'https://usereg.tsinghua.edu.cn/login.php'
times_cnt = {1: 'FIRST', 2: 'SECOND', 3: 'THIRD', 4: 'FOURTH', 5: 'FIFTH'}
ret_type = {'logout_ok'        : 'LOGOUT SUCCESS',
            'not_online_error' : 'NOT ONLINE',
            'ip_exist_error'   : 'IP ALREADY EXISTS',
            'user_tab_error'   : 'THE CERTIFICATION PROGRAM WAS NOT STARTED',
            'username_error'   : 'WRONG USERNAME',
            'user_group_error' : 'ACCOUNT INFORMATION INCORRECT',
            'password_error'   : 'WRONG PASSWORD',
            'status_error'     : 'ACCOUNT OVERDUE, PLEASE RECHARGE',
            'available_error'  : 'ACCOUNT HAS BEEN SUSPENDED',
            'delete_error'     : 'ACCOUNT HAS BEEN DELETED',
            'usernum_error'    : 'USERS NUMBER LIMITED',
            'online_num_error' : 'USERS NUMBER LIMITED',
            'mode_error'       : 'DISABLE WEB REGISTRY',
            'time_policy_error': 'CURRENT TIME IS NOT ALLOWED TO CONNECT',
            'flux_error'       : 'FLUX OVER',
            'ip_error'         : 'IP NOT VALID',
            'mac_error'        : 'MAC NOT VALID',
            'sync_error'       : 'YOUR INFORMATION HAS BEEN MODIFIED, PLEASE TRY AGAIN AFTER 2 MINUTES',
            'ip_alloc'         : 'THE IP HAS BEEN ASSIGNED TO OTHER USER'
            }
version = '1.1'
sleep_time = 8
def trans_content(response):
    # keep only letters and underscores, e.g. 'password_error'
    content = response.read().decode()
    ret = ''
    for ch in content:
        if ch.isalpha() or ch == '_':
            ret += ch
    return ret

def tunet_login(username, password):
    hashcd_md5 = hashlib.md5()
    hashcd_md5.update(password.encode())
    tr_password = hashcd_md5.hexdigest()
    login_data = 'username=' + username + '&password=' + tr_password + '&drop=0&type=1&n=100'
    login_data = login_data.encode()
    request_url = urllib.request.Request(login_url, login_data)
    response_url = urllib.request.urlopen(request_url)
    ret = trans_content(response_url)
    print (ret_type.get(ret, 'CONNECTED'))
    return ret

def tunet_logout():
    response_url = urllib.request.urlopen(logout_url)
    ret = trans_content(response_url)
    print (ret_type.get(ret, 'CONNECTED'))
    return ret

def tunet_check():
    check_data = 'action=check_online'
    check_data = check_data.encode()
    request_url = urllib.request.Request(check_url, check_data)
    response_url = urllib.request.urlopen(request_url)
    ret = trans_content(response_url)
    if ret == '':
        print ('NOT ONLINE')
    else:
        print (ret_type.get(ret, 'CONNECTED'))
    return ret

def tunet_help():
    print ('-h, --help : show all options of Tsinghua University Internet Connector')
    print ('-v, --version: show version of Tsinghua University Internet Connector')
    print ('-u : input your username after \'-u\'')
    print ('-p : input your password after \'-p\'')
    print ('-a : enter username and password later, you can login other campus network account')
    print ('-i, --login : login operation')
    print ('-o, --logout : logout operation')
    print ('-c, --check : check the internet')
    print ('-q, --query : query basic information, online state and flux usage details')

def tunet_version():
    print ('Tsinghua University Internet Connector ', version)

def tunet_others():
    print ('UNKNOWN OPTIONS')
    print ('WHICH OPTION DO YOU WANT?')
    tunet_help()
    print ('IF ANY ERROR, PLEASE CONTACT im@huhaoyu.com.')

def tunet_connect(username, password):
    # retry up to five times while the server keeps answering 'ip_exist_error'
    ret = 'ip_exist_error'
    for count in range(5):
        print ('%s attempts to connect...' % times_cnt.get(count + 1))
        if ret != tunet_login(username, password):
            break
        if count == 4:
            print ('please try to reconnect after 1 minute')
            break
        print ('try to reconnect after %s seconds' %sleep_time)
        time.sleep(sleep_time)
        print ()
pytunet.py:
#########################################################
# Welcome #
# Tsinghua University Internet Connector in Python #
# Version: v1.1 #
# Date: 2015/04/03 #
# By: Haoyu hu Email: im@huhaoyu.com #
# Address: Tsinghua University #
#########################################################
import pytunet_connect
import pytunet_query
import sys, getopt, getpass
def pytunet():
    username = 'hhy14'
    password = '123456'
    try:
        options, args = getopt.getopt(sys.argv[1:], 'achiop:qu:v', ['help', 'login', 'logout', 'check', 'version', 'query'])
    except getopt.GetoptError:
        pytunet_connect.tunet_others()
        sys.exit(1)
    want_login = False
    want_query = False
    flag = False
    name, value = None, None
    for name, value in options:
        if name in ('-h', '--help'):
            pytunet_connect.tunet_help()
            sys.exit(0)
        elif name in ('-v', '--version'):
            pytunet_connect.tunet_version()
            sys.exit(0)
        elif name == '-a':
            flag = True
        elif name == '-u':
            username = value
        elif name == '-p':
            password = value
        elif name in ('-i', '--login'):
            want_login = True
        elif name in ('-o', '--logout'):
            pytunet_connect.tunet_logout()
            sys.exit(0)
        elif name in ('-c', '--check'):
            pytunet_connect.tunet_check()
            sys.exit(0)
        elif name in ('-q', '--query'):
            want_query = True
    if flag:
        username = input('username: ')
        # getpass hides the typed password, so it is not echoed back afterwards either
        password = getpass.getpass('password: ')
    if want_query:
        pytunet_query.tunet_query(username, password)
    if want_login:
        pytunet_connect.tunet_connect(username, password)
    if not want_query and not want_login:
        print ('WARNING: YOU JUST DIDN\'T DO ANYTHING! IF YOU WANT TO CONNECT TO THE CAMPUS NETWORK, THE COMMAND MUST INCLUDE -i OR --login')
        print()
        pytunet_connect.tunet_help()

if __name__ == '__main__':
    pytunet()
If you find any mistakes in the code or the analysis, corrections are welcome. Thank you!