Implementing a Tsinghua University Network Assistant in Python (Part 2): Regular Expressions (re), Page Redirects, and cookiejar

This is the second post in the series.

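This post implements most of the features mentioned in "Implementing a Tsinghua University Network Assistant in Python (Part 1): Using urllib/hashlib/getopt/time/codecs". The main additions are: querying basic information such as traffic usage, account balance, and the current user group; querying the IP addresses currently online under this account; and querying the daily traffic usage details and drawing a histogram of them. In addition, the URLs we need to visit are no longer the campus-network login URL used in Part 1.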

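If you are looking for answers to the following questions, please see "Implementing a Tsinghua University Network Assistant in Python (Part 1): Using urllib/hashlib/getopt/time/codecs":

1 How to process the password with hashlib;

2 How to build a Request and call urlopen for a page that requires a username and password;

3 How to implement a custom command that takes arguments;

4 How to get the current date and time;

5 What if __name__ == '__main__' does.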


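Here is a summary of the problems encountered while implementing the features above:

1 Opening, reading/writing, and closing files;

2 Logging in to a website and keeping the cookie state with cookiejar;

3 Page redirects after login;

4 How to use the regular expression module re, and what to watch out for.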


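1 Opening, reading/writing, and closing files:

a) Getting the absolute path of the current directory

The os module provides what we need: os.path.abspath(os.path.dirname(sys.argv[0])) gives the absolute path of the directory that contains the running script (sys.argv[0]). os.path.dirname(sys.argv[0]) strips the file name from the script path, leaving only its directory part (which may be relative, or even empty), and os.path.abspath turns that into an absolute path.

Now that we have the absolute path of the current directory, we only need to join it with a relative path to get the path of the file we want to open. This is done with os.path.join(base_path, relative_path).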

import os, sys

def create_path(relative_path):
	# Absolute path of the directory that contains the running script
	base_path = os.path.abspath(os.path.dirname(sys.argv[0]))
	return os.path.join(base_path, relative_path)
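For comparison, here is a minimal sketch of the same helper built on pathlib. This is a swapped-in alternative, not what the scripts below use. One difference to keep in mind: Path.resolve() also follows symbolic links, whereas os.path.abspath() does not.

```python
import sys
from pathlib import Path


def create_path(relative_path):
    # Directory of the running script (sys.argv[0]), made absolute;
    # unlike os.path.abspath, resolve() also follows symlinks
    base_path = Path(sys.argv[0]).resolve().parent
    return str(base_path / relative_path)


if __name__ == '__main__':
    print(create_path('USER_DETAIL_INFOMATION.LOG'))
```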

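b) Opening, reading/writing, and closing files

The syntax for opening, reading/writing, and closing files is much the same across languages.

For the details, see the Python documentation on file I/O. Here I mainly use the built-ins open(), write(), and close(), plus write_inline(), a small helper of my own (defined in the full code at the end) that writes each string in a list on its own line.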

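def save_query(contents):
	relative_path = 'USER_DETAIL_INFOMATION.LOG'
	# newline='' keeps the explicit '\r\n' below from turning into '\r\r\n' on Windows;
	# encoding='utf-8' lets the Chinese field names be written on any platform
	file_handler = open(create_path(relative_path), 'w', encoding='utf-8', newline='')
	write_inline(file_handler, info_header)
	file_handler.write('\t\t\t\tDatetime: ' + time.strftime('%Y-%m-%d %H:%M:%S') + '\r\n' + '-' * 89 + '\r\n')
	write_inline(file_handler, contents)
	file_handler.write('\r\n')
	file_handler.close()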

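I named the file that stores the query results USER_DETAIL_INFOMATION.LOG. The code opens it at the absolute path obtained above, writes the data, and then closes it. Always remember to close the file: until it is closed, the process keeps holding an open file descriptor (and the number a process may hold is limited), and buffered data may not yet have been flushed to disk. What exactly you write is up to you.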
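A common way to guarantee the close() step is a with block, which closes the file even if a write raises an exception. The sketch below is a simplified, hypothetical variant of save_query(): it takes the path and header as parameters and writes '\n' line endings instead of '\r\n'. The explicit encoding='utf-8' is an assumption that keeps the Chinese field names writable regardless of the platform's default encoding.

```python
import os
import tempfile
import time


def save_query(path, header, contents):
    """Hypothetical variant of save_query(): path and header are parameters."""
    # The with block closes the file on exit, even if a write() raises
    with open(path, 'w', encoding='utf-8') as file_handler:
        for line in header:
            file_handler.write(line + '\n')
        file_handler.write('Datetime: ' + time.strftime('%Y-%m-%d %H:%M:%S') + '\n')
        for line in contents:
            file_handler.write(line + '\n')
    return file_handler.closed  # True: the file is already closed here


def demo():
    # Write into a throwaway directory, then read the file back
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'USER_DETAIL_INFOMATION.LOG')
        closed = save_query(path, ['#' * 20], ['用户名\t: hhy14'])
        with open(path, encoding='utf-8') as f:
            return closed, f.read().splitlines()


if __name__ == '__main__':
    print(demo())
```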


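2 Logging in to a website and keeping the cookie state with cookiejar:

This requires the http.cookiejar module.

In Part 1, a single login to the campus network was enough to get online, so we never had to do anything else on other pages while logged in. Now, however, we must stay logged in and scrape information from several pages reached after login. That is why we need cookiejar rather than plain urlopen() calls.

First, how cookies and cookiejar work:

A cookie is generated by the server and sent to the User-Agent (e.g. a browser) in a Set-Cookie response header. It is a name/value pair, typically a session identifier rather than the username and password themselves. The browser stores it and attaches it to every later request for pages on the same site, which is how the server recognizes those requests as belonging to a logged-in session. A CookieJar instance plays the browser's role here: it is an in-memory container for these cookies. Once the jar is wired into an opener, we log in once, the jar records the session cookie, and every later request made through that opener carries the cookie automatically.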
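To see that a cookie is just a name/value pair plus attributes, the standard-library http.cookies module can parse a Set-Cookie header. The header value below is made up for illustration (the real site's cookie name and value will differ). http.cookiejar.CookieJar does this bookkeeping for us automatically; the sketch only shows what it stores.

```python
from http.cookies import SimpleCookie

# Made-up header; the real site's cookie name and value will differ
set_cookie_header = 'PHPSESSID=abc123; Path=/'

cookies = SimpleCookie()
cookies.load(set_cookie_header)
morsel = cookies['PHPSESSID']

# The cookie itself is just a name/value pair plus attributes such as Path
print(morsel.key, morsel.value, morsel['path'])  # PHPSESSID abc123 /

# On later requests the client sends back only "name=value" in a Cookie header
cookie_header = '; '.join('%s=%s' % (name, m.value) for name, m in cookies.items())
print(cookie_header)  # PHPSESSID=abc123
```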

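Note that Python 3 has no urllib2 (its functionality moved into urllib.request, and cookielib became http.cookiejar), so this code may look different from Python 2 examples.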

import hashlib, http.cookiejar, urllib.request

query_login_url  = 'https://usereg.tsinghua.edu.cn/do.php'
query_logout_url = 'https://usereg.tsinghua.edu.cn/do.php'

def create_opener():
	# A CookieJar wrapped in a cookie handler, built into an opener
	cookie = http.cookiejar.CookieJar()
	cookie_proc = urllib.request.HTTPCookieProcessor(cookie)
	return urllib.request.build_opener(cookie_proc)

def response_login(login_data):
	# POST the login form; the server answers 'ok' on success
	request_url = urllib.request.Request(query_login_url, login_data.encode())
	response_url = urllib.request.urlopen(request_url)
	return response_url.read().decode()

def query_login(username, password):
	hashcd_md5 = hashlib.md5()
	hashcd_md5.update(password.encode())
	tr_password = hashcd_md5.hexdigest()
	login_data = 'user_login_name=' + username + '&user_password=' + tr_password + '&action=login'
	# From here on, every urllib.request.urlopen() call shares this opener and its cookies
	urllib.request.install_opener(create_opener())
	answer = response_login(login_data)
	return answer == 'ok'

def query_logout():
	logout_data = 'action=logout'
	request_url = urllib.request.Request(query_logout_url, logout_data.encode())
	urllib.request.urlopen(request_url).close()
	print ('Your flux details and other information are saved in USER_DETAIL_INFOMATION.LOG under the SAME directory')
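
Basic steps:

a) cookie = http.cookiejar.CookieJar() creates a CookieJar instance to hold the cookies;

b) cookie_proc = urllib.request.HTTPCookieProcessor(cookie) wraps the jar in a handler that stores cookies from responses and adds them to requests;

c) bopener = urllib.request.build_opener(cookie_proc) creates an opener that uses this handler;

d) urllib.request.install_opener(bopener) installs the opener globally, so that plain urllib.request.urlopen() calls go through it;

e) Request and urlopen the login URL just as you would for any page that requires login. This is the first login, and the session cookie is recorded in the jar;

f) For every other page, a plain urlopen() is enough, because the installed opener sends the cookie along.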
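Steps a) to f) can be tried end to end without the campus network. The sketch below starts a throwaway local HTTP server, a hypothetical stand-in for usereg.tsinghua.edu.cn: /login sets a session cookie, and any other path reports whether that cookie came back. It then follows the steps with GET requests instead of the real login POST. ProxyHandler({}) is added only so that a proxy configured in the environment does not intercept requests to 127.0.0.1.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the login site: /login sets a cookie."""

    def do_GET(self):
        if self.path == '/login':
            body = b'ok'
            self.send_response(200)
            # Server-side session cookie, handed out once at login
            self.send_header('Set-Cookie', 'session=demo123; Path=/')
        else:
            # Every other page only checks whether the cookie came back
            sent = self.headers.get('Cookie', '')
            body = b'logged in' if 'session=demo123' in sent else b'anonymous'
            self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default request log


def demo():
    server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = 'http://127.0.0.1:%d' % server.server_port
    try:
        # Steps a) to d): jar -> cookie processor -> opener -> install globally
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),
            urllib.request.HTTPCookieProcessor(jar))
        urllib.request.install_opener(opener)
        # Step e): "log in" once; the Set-Cookie header lands in the jar
        urllib.request.urlopen(base + '/login').read()
        # Step f): a plain urlopen() now sends the cookie automatically
        page = urllib.request.urlopen(base + '/info').read().decode()
        return page, [c.name for c in jar]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(demo())  # ('logged in', ['session'])
```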


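3 Page redirects after login:

The response body may contain the URL to redirect to. Ordinary HTTP redirects (a 3xx status with a Location header) are followed automatically by urllib, whereas a JavaScript redirect requires you to dig the real target URL out of the page yourself. The pages we need to visit do not have this problem, so I will not go into detail here; search for it if you are interested.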
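For ordinary 3xx redirects nothing needs to be done: the response returned by urlopen() reports the final address through its geturl() method. For a JavaScript redirect, one rough approach is to search the page source for the assignment to location and resolve it against the current URL. The pattern below is an assumption that only covers the common forms window.location.href = '...', location = '...', and location.replace('...'); real pages may use others.

```python
import re
import urllib.parse

# Assumed forms only: window.location.href = '...', location = "...", location.replace('...')
JS_REDIRECT = re.compile(
    r"""(?:window\.)?location(?:\.href)?\s*=\s*['"]([^'"]+)['"]"""
    r"""|location\.replace\(\s*['"]([^'"]+)['"]\s*\)""")


def find_js_redirect(html, page_url):
    """Return the absolute redirect target found in html, or None."""
    match = JS_REDIRECT.search(html)
    if match is None:
        return None
    # Exactly one of the two alternatives captured the target
    target = match.group(1) or match.group(2)
    # Relative targets are resolved against the page that contained the script
    return urllib.parse.urljoin(page_url, target)
```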


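4 How to use the regular expression module re, and what to watch out for:

Once we have fetched the full page, we need to parse it and extract the information we care about. This part gave me a lot of headaches for a while. There are many excellent modules for parsing web pages, such as BeautifulSoup, but I had to think about users who do not have that package installed, so in the end I decided to extract the required information with the standard-library re module.

First you need to learn regular expressions. They look very complicated, but the syntax you actually need is probably small. I recommend the tutorial 正则表达式30分钟入门教程 ("Regular Expressions: A 30-Minute Introduction"). Pay particular attention to the difference between greedy and lazy matching.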
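A quick illustration of greedy versus lazy matching on a made-up HTML fragment: the greedy .* runs to the last </td> it can find, while the lazy .*? stops at the first one.

```python
import re

html = '<td>Name</td><td>alice</td>'

greedy = re.findall(r'<td>(.*)</td>', html)   # .* grabs as much as possible
lazy = re.findall(r'<td>(.*?)</td>', html)    # .*? stops at the first </td>

print(greedy)  # ['Name</td><td>alice']
print(lazy)    # ['Name', 'alice']
```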

Next come a few commonly used functions in the re module: re.sub(), re.search(), and re.findall().

a) re.sub():

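sub is short for substitute. It takes three arguments: a regular expression, a replacement string, and the string to process (plus optional count and flags arguments), and it returns the processed string. For example, done = re.sub('<[^>]+>|&nbsp;|[\n\t]+|-->',' ',raw) replaces every part of raw that matches the regular expression '<[^>]+>|&nbsp;|[\n\t]+|-->' with a single space ' ', which effectively strips those parts out. If you have worked through the tutorial above, you can see that this pattern matches all HTML tags, &nbsp; entities, runs of newlines and tabs, and the string '-->'.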
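Applied to a small made-up table row, the substitution above, followed by the space-collapsing step that trans_content() in the full code also performs, leaves only the text. The final strip() is an extra step that is not in the original code.

```python
import re

# Same pattern as in the text: tags, &nbsp;, runs of newlines/tabs, and '-->'
TAGS_AND_WHITESPACE = r'<[^>]+>|&nbsp;|[\n\t]+|-->'


def strip_html(raw):
    text = re.sub(TAGS_AND_WHITESPACE, ' ', raw)  # each match becomes one space
    text = re.sub(' +', ' ', text)                # collapse runs of spaces
    return text.strip()                           # extra step, not in trans_content()


print(strip_html('<tr><td>Balance</td>&nbsp;<td>12.50</td></tr>\n\t'))  # Balance 12.50
```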

b) re.search():

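search() takes two arguments: a regular expression and the string to process. It scans the string for the first location where the pattern matches and returns a match object, or None if there is no match, so check for None before using the result. The matched text is retrieved with the group() method: group() (equivalently group(0)) returns the whole matched text, while group(n) returns the text captured by the n-th parenthesized group. groups() returns a tuple of all captured groups. Note that the parentheses in '用户名.*?(元) ' are regex syntax: they capture 元 as group 1 rather than matching literal parentheses, which would have to be escaped as \( and \).

match = re.search('用户名.*?(元) ', done)
if match is not None:
	done = match.group()
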
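The difference between group(), group(n), and groups() is easiest to see with a pattern that has several capturing groups. The line below is made up to resemble the online-IP page (IP address, login date, login time), and the pattern is an illustration, not the one used in the full code.

```python
import re

# Three capturing groups: IP address, date, time
ONLINE_RE = r'(\d{1,3}(?:\.\d{1,3}){3}) (\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})'


def parse_online(text):
    match = re.search(ONLINE_RE, text)
    if match is None:  # search() returns None when nothing matches
        return None
    # whole match, first group only, all groups as a tuple
    return match.group(), match.group(1), match.groups()


print(parse_online('IP 192.0.2.10 2015-04-03 10:20:30 end'))
```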
c) re.findall():

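re.findall() returns a list of all non-overlapping matches in the string. If the pattern contains capturing groups, the list holds the captured text instead of the whole matches (one tuple per match when there are several groups).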
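A short example on a made-up line of login records shows both behaviours of re.findall(): without groups it returns the matched strings, and with two groups it returns a tuple per match.

```python
import re

text = 'login 2015-04-01 08:00 logout 2015-04-01 12:00 login 2015-04-02 09:30'

dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)  # no groups: whole matches
times = re.findall(r'(\d{2}):(\d{2})', text)    # two groups: (hour, minute) tuples

print(dates)  # ['2015-04-01', '2015-04-01', '2015-04-02']
print(times)  # [('08', '00'), ('12', '00'), ('09', '30')]
```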

d) re.compile():

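This compiles a regular expression into a pattern object that can be reused many times. The re module also caches recently used patterns internally, so the speed-up over calling re.search() and friends directly is usually small; the main benefit is defining a pattern once and reusing it.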
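As an example of compiling once and reusing the pattern, here is a sketch that converts traffic strings such as 1.5G or 512K to megabytes, roughly what solve_flux() in the full code does. The pattern and the helper name flux_to_mb are hypothetical; unlike solve_flux(), this version keeps the fractional part and rejects strings it does not recognise.

```python
import re

# Compiled once, reused for every traffic field
FLUX_RE = re.compile(r'(\d+(?:\.\d*)?)([BKMG])')
# Multiplier from each unit to MB (binary units, as in solve_flux())
TO_MB = {'B': 1 / (1024 * 1024), 'K': 1 / 1024, 'M': 1, 'G': 1024}


def flux_to_mb(text):
    match = FLUX_RE.fullmatch(text)
    if match is None:
        raise ValueError('unrecognised traffic value: %r' % text)
    number, unit = match.groups()
    return float(number) * TO_MB[unit]


if __name__ == '__main__':
    print(sum(flux_to_mb(v) for v in ['1.5G', '512K', '3M']))
```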


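BONUS: finally, a function that pauses the script for a while before it continues: time.sleep(gap), where gap is the number of seconds. This helps avoid sending requests too frequently.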
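A minimal sketch of the throttling idea: retry an action a few times and sleep between attempts. The name retry and its parameters are hypothetical; the defaults of 5 attempts and 8 seconds mirror tunet_connect() and sleep_time in pytunet_connect.py.

```python
import time


def retry(action, attempts=5, gap=8):
    """Call action() until it returns True, sleeping `gap` seconds between tries.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            time.sleep(gap)  # pause so the server is not hit too frequently
    return None
```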


Here is the implementation code, split into three .py files:

pytunet_query.py

import sys, time, os
import urllib.request, hashlib, http.cookiejar
import codecs, re

query_login_url  = 'https://usereg.tsinghua.edu.cn/do.php'
user_info_url = 'https://usereg.tsinghua.edu.cn/user_info.php'
online_state_url = 'https://usereg.tsinghua.edu.cn/online_user_ipv4.php'
query_logout_url = 'https://usereg.tsinghua.edu.cn/do.php'

info_header = [ '#' * 89,
				'#\t\t\t\tUser Flux Detail Information\t\t\t\t#',
				'#' * 89]

#########################################################
#					File I/O Modules					#
#########################################################

def create_path():
	relative_path = 'USER_DETAIL_INFOMATION.LOG'
	base_path = os.path.abspath(os.path.dirname(sys.argv[0]))
	return os.path.join(base_path, relative_path)

def write_inline(file_handler, contents):
	for line in contents:
		file_handler.write(line + '\r\n')

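def save_query(contents):
	# newline='' keeps the explicit '\r\n' from becoming '\r\r\n' on Windows;
	# encoding='utf-8' lets the Chinese field names be written on any platform
	file_handler = open(create_path(), 'w', encoding='utf-8', newline='')
	write_inline(file_handler, info_header)
	file_handler.write('\t\t\t\tDatetime: ' + time.strftime('%Y-%m-%d %H:%M:%S') + '\r\n' + '-' * 89 + '\r\n')
	write_inline(file_handler, contents)
	file_handler.write('\r\n')
	file_handler.close()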

#########################################################
#					Connection Modules					#
#########################################################

def create_opener():
	cookie = http.cookiejar.CookieJar()
	cookie_proc = urllib.request.HTTPCookieProcessor(cookie)
	return urllib.request.build_opener(cookie_proc)

def response_login(login_data):
	request_url = urllib.request.Request(query_login_url, login_data.encode())
	response_url = urllib.request.urlopen(request_url)
	return response_url.read().decode()

#########################################################
#				Main Login/Logout Modules				#
#########################################################

def query_login(username, password):
	hashcd_md5 = hashlib.md5()
	hashcd_md5.update(password.encode())
	tr_password = hashcd_md5.hexdigest()
	login_data = 'user_login_name=' + username + '&user_password=' + tr_password + '&action=login'
	urllib.request.install_opener(create_opener())
	answer = response_login(login_data)
	if answer == 'ok':
		return True
	else:
		return False

def query_logout():
	logout_data = 'action=logout'
	request_url = urllib.request.Request(query_logout_url, logout_data.encode())
	response_url = urllib.request.urlopen(request_url)
	print ('Your flux details and other information are saved in USER_DETAIL_INFOMATION.LOG under the SAME directory')

#########################################################
#				Data Post-Process Modules				#
#########################################################

def post_process(info):
	end_time = time.strftime('%Y-%m-%d')
	start_time = end_time[:8:] + '01'
	flux_detail_url = 'https://usereg.tsinghua.edu.cn/user_detail_list.php?action=balance2&user_login_name=&user_real_name=&desc=&order=&start_time=' + start_time + '&end_time=' + end_time + '&user_ip=&user_mac=&nas_ip=&nas_port=&is_ipv6=0&page=1&offset=200'	

	response_usr = urllib.request.urlopen(user_info_url)
	response_state = urllib.request.urlopen(online_state_url)
	response_details = urllib.request.urlopen(flux_detail_url)

	info = flux_account_query(info, response_usr)
	info = online_state_query(info, response_state)
	info = flux_detail_query(info, response_details)
	
	return info

#########################################################
#				Integrated Query Modules				#
#########################################################

flux_account_keys = ('用户名', '用户组', '姓名', '证件号', '当前计费组', '使用时长(IPV4)', '使用流量(IPV4)',
					 '使用时长(IPV6)', '使用流量(IPV6)', '帐户余额')
online_state_keys = ()
flux_detail_keys  = ()

#Auxiliary Function
def turn_key(key):
	if key[-5:-1] == 'byte':
		flux, unit = key.split('(')
		flux = float(flux) / 1024 / 1024
		new_key = '-->' + str(int(flux)) + '(MB)'
		key += new_key
	return key


def get_days(year, month):
	month_length = (31,28,31,30,31,30,31,31,30,31,30,31)
	month_length_leap = (31,29,31,30,31,30,31,31,30,31,30,31)
	if year % 400 == 0 or year % 100 != 0 and year % 4 == 0:
		return month_length_leap[month-1]
	else:
		return month_length[month-1]

def solve_flux(flux):
	unit = flux[-1]
	val = float(flux[:len(flux)-1:])

	if unit == 'B':
		val /= 1024 * 1024
	elif unit == 'K':
		val /= 1024
	elif unit == 'G':
		val *= 1024

	return int(val)

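def trans_content(response):
	# gbk is a superset of gb2312, so it also decodes characters gb2312 lacks
	raw = response.read().decode('gbk')
	# Replace tags, &nbsp;, newlines/tabs and '-->' with spaces, then collapse the spaces
	raw = re.sub(r'<[^>]+>|&nbsp;|[\n\t]+|-->', ' ', raw)
	raw = re.sub(' +', ' ', raw)
	return raw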

def push_front(figure, line):
	tf = []
	tf.append(line)
	return tf + figure

def display_fluxAccount_onlineState(info):
	print()
	for line in info:
		if line[0] != '-':
			print(line)
		else:
			print()

def display_flux_detail(fluxin, year, month, day):
	maxflux = 0
	divide = 10
	figure = []
	for flux in fluxin:
		if flux > maxflux:
			maxflux = flux

	top = str(int(maxflux)) + 'MB|'
	length = len(top)
	mid = str(int(maxflux / 2)) + 'MB|'
	mid = ' ' * (length - len(mid)) + mid
	bottom = '0MB|'
	bottom = ' ' * (length - len(bottom)) + bottom
	unit = maxflux / divide

	for i in range(day):
		# With no traffic yet this month maxflux is 0; avoid dividing by zero
		fluxin[i] = int(fluxin[i] / unit) if unit > 0 else 0

	for i in range(divide):
		line = ''
		if i == divide - 1:
			line = top
		elif i == int((divide - 1) / 2):
			line = mid
		elif i == 0:
			line = bottom
		else:
			line = ' ' * (length - 1) + '|'
		for j in range(day):
			if fluxin[j] > 0:
				line += '**'
				fluxin[j] -= 1
			else:
				line += '  '

		figure = push_front(figure, line)

	figure = push_front(figure, '**每日流量使用统计列表**')
	figure.append(' ' * length + '--' * day)
	date_front = str(year) + '-' + str(month) +'-' + '1'
	date_rear  = str(year) + '-' + str(month) +'-' + str(day)
	date_mid   = str(year) + '-' + str(month) +'-' + '15'
	figure.append(' %s\t\t\t%s\t\t\t%s' %(date_front, date_mid, date_rear))

	for line in figure:
		print(line)

	print()

#Integrated Query
def flux_account_query(info, response):
	info.append('**用户基本信息**')
	done = trans_content(response)
	match = re.search('用户名.*?(元) ', done)
	if match is None:  # expected text not found on the page
		info.append('-' * 89)
		return info
	done = match.group()
	tlist = done.split(' ')
	line = ''

	for key in tlist:
		if line != '':
			key = turn_key(key)
			line = line + '\t: ' + key
			info.append(line)
			line = ''
		elif key in flux_account_keys:
			line = key

	info.append('-' * 89)
	return info

def online_state_query(info, response):
	info.append('**用户在线状态**')
	
	done = trans_content(response)
	match = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.*\d{2}:\d{2}', done)

	if match is None:
		info.append('当前没有任何IP在线')
	else:
		info.append('在线IP地址\t登陆日期\t登陆时间')
		done = match.group()
		tlist = done.split(' ')
		line = ''
		count = 0

		for key in tlist:
			if count == 0:
				if re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', key) is not None:
					line = key
					count += 1
			elif count == 1:
				if re.search(r'\d{4}-\d{2}-\d{2}', key) is not None:
					line += '\t' + key
					count += 1
			elif count == 2:
				if re.search(r'\d{2}:\d{2}:\d{2}', key) is not None:
					line += '\t' + key
					info.append(line)
					line = ''
					count = 0

	info.append('-' * 89)
	display_fluxAccount_onlineState(info)
	return info

def flux_detail_query(info, response):
	info.append('**每日流量使用统计列表**')
	info.append('登出日期\t入流量\t出流量')
	done = trans_content(response)
	year, month = time.strftime('%Y %m').split(' ')
	days = get_days(int(year), int(month))
	tlist = done.split(' ')

	fluxin_perday = [0 for i in range(days)]
	fluxout_perday = [0 for i in range(days)]
	offline_date = True
	count = 0

	for key in tlist:
		if re.search(r'\d{4}-\d{2}-\d{2}', key) and offline_date:
			offline_date = False
		elif re.search(r'\d{4}-\d{2}-\d{2}', key) and not offline_date:
			offline_date = True
			year, month, day = key.split('-')
			iday = int(day)
		elif re.search(r'\d+[.]\d*[BKMG]', key) and count == 0:
			fluxin_perday[iday-1] += solve_flux(key)
			count += 1
		elif re.search(r'\d+[.]\d*[BKMG]', key) and count == 1:
			fluxout_perday[iday-1] += solve_flux(key)
			count += 1
		elif re.search(r'\d+[.]\d*[BKMG]', key) and count == 2:
			count = 0
	
	for i in range(days):
		if i + 1 < 10:
			d = '0' + str(i + 1)
		else:
			d = str(i + 1)
		info.append('%s\t%s\t%s' %(time.strftime('%Y-%m-') + d, str(fluxin_perday[i]) + 'MB', str(fluxout_perday[i]) + 'MB'))

	display_flux_detail(fluxin_perday, int(year), int(month), days)
	return info

#########################################################
#						Main Part						#
#########################################################

def tunet_query(username, password):
	print('FETCHING DATA FROM http://usereg.tsinghua.edu.cn, PLEASE WAIT A MOMENT...')
	is_login = query_login(username, password)
	if is_login:
		info = []
		info = post_process(info)
		save_query(info)
		query_logout()
	else:
		print ('CAN\'T CAPTURE YOUR FLUX DATA, PLEASE TRY AGAIN LATER')

def pytunet_query():
	username = 'hhy14'
	password = '123456'
	tunet_query(username, password)

if __name__ == '__main__':
	pytunet_query()


pytunet_connect.py:

import time
import urllib.request, hashlib
import codecs

login_url  = 'http://net.tsinghua.edu.cn/cgi-bin/do_login'
logout_url = 'http://net.tsinghua.edu.cn/cgi-bin/do_logout'
check_url  = 'http://net.tsinghua.edu.cn/cgi-bin/do_login'
query_url  = 'https://usereg.tsinghua.edu.cn/login.php'

times_cnt = {1: 'FIRST', 2: 'SECOND', 3: 'THIRD', 4: 'FOURTH', 5: 'FIFTH'}
ret_type  = {'logout_ok'       : 'LOGOUT SUCCESS',
			'not_online_error' : 'NOT ONLINE',
			'ip_exist_error'   : 'IP ALREADY EXISTS',
			'user_tab_error'   : 'THE CERTIFICATION PROGRAM WAS NOT STARTED',
			'username_error'   : 'WRONG USERNAME',
			'user_group_error' : 'ACCOUNT INFORMATION INCORRECT',
			'password_error'   : 'WRONG PASSWORD',
			'status_error'     : 'ACCOUNT OVERDUE, PLEASE RECHARGE',
			'available_error'  : 'ACCOUNT HAS BEEN SUSPENDED',
			'delete_error'     : 'ACCOUNT HAS BEEN DELETED',
			'usernum_error'    : 'USERS NUMBER LIMITED',
			'online_num_error' : 'USERS NUMBER LIMITED',
			'mode_error'       : 'DISABLE WEB REGISTRY',
			'time_policy_error': 'CURRENT TIME IS NOT ALLOWED TO CONNECT',
			'flux_error'       : 'FLUX OVER',
			'ip_error'         : 'IP NOT VALID',
			'mac_error'        : 'MAC NOT VALID',
			'sync_error'       : 'YOUR INFORMATION HAS BEEN MODIFIED, PLEASE TRY AGAIN AFTER 2 MINUTES',
			'ip_alloc'         : 'THE IP HAS BEEN ASSIGNED TO OTHER USER'
			}

version  = '1.1'
sleep_time = 8

def trans_content(response):
	content = response.read().decode()
	ret = ''
	for ch in content:
		if ch.isalpha() or ch == '_':
			ret += ch
	return ret

def tunet_login(username, password):
	hashcd_md5 = hashlib.md5()
	hashcd_md5.update(password.encode())
	tr_password = hashcd_md5.hexdigest()
	login_data = 'username=' + username + '&password=' + tr_password + '&drop=0&type=1&n=100'
	login_data = login_data.encode()
	request_url = urllib.request.Request(login_url, login_data)
	response_url = urllib.request.urlopen(request_url)
	ret = trans_content(response_url)
	print (ret_type.get(ret, 'CONNECTED'))
	return ret

def tunet_logout():
	response_url = urllib.request.urlopen(logout_url)
	ret = trans_content(response_url)
	print (ret_type.get(ret, 'CONNECTED'))
	return ret

def tunet_check():
	check_data = 'action=check_online'
	check_data = check_data.encode()
	request_url = urllib.request.Request(check_url, check_data)
	response_url = urllib.request.urlopen(request_url)
	ret = trans_content(response_url)
	if ret == '':
		print ('NOT ONLINE')
	else:
		print (ret_type.get(ret, 'CONNECTED'))
	return ret

def tunet_help():
	print ('-h, --help   : show all options of Tsinghua University Internet Connector')
	print ('-v, --version: show version of Tsinghua University Internet Connector')
	print ('-u           : input your username after \'-u\'')
	print ('-p           : input your password after \'-p\'')
	print ('-a           : enter username and password later, you can login other campus network account')
	print ('-i, --login  : login operation')
	print ('-o, --logout : logout operation')
	print ('-c, --check  : check the internet')
	print ('-q, --query  : query basic information, online state and flux usage details')

def tunet_version():
	print ('Tsinghua University Internet Connector ', version)

def tunet_others():
	print ('UNKNOWN OPTIONS')
	print ('WHICH OPTION DO YOU WANT?')
	tunet_help()
	print ('IF ANY ERROR, PLEASE CONTACT im@huhaoyu.com.')

def tunet_connect(username, password):

	ret = 'ip_exist_error'
	for count in range(5):
		print ('%s attempts to connect...' % times_cnt.get(count + 1))
		if ret != tunet_login(username, password):
			break
		if count == 4:
			print ('please try to reconnect after 1 minute')
			break
		print ('try to reconnect after %s seconds' %sleep_time)
		time.sleep(sleep_time)
		print ()

pytunet.py:

#########################################################
#						Welcome							#
#	Tsinghua University Internet Connector in Python	#
#					 Version: v1.1 						#
#					Date: 2015/04/03					#
#			By: Haoyu hu	Email: im@huhaoyu.com		#
#			Address: Tsinghua University				#
#########################################################

import pytunet_connect
import pytunet_query
import sys, getopt, getpass

def pytunet():

	username = 'hhy14'
	password = '123456'
	try:
		options, args = getopt.getopt(sys.argv[1:], 'achiop:qu:v', ['help', 'login', 'logout', 'check', 'version', 'query'])
	except getopt.GetoptError:
		pytunet_connect.tunet_others()
		sys.exit(1)

	want_login = False
	want_query = False
	flag = False

	name, value = None, None

	for name, value in options:
		if name in ('-h', '--help'):
			pytunet_connect.tunet_help()
			sys.exit(0)
		elif name in ('-v', '--version'):
			pytunet_connect.tunet_version()
			sys.exit(0)
		elif name == '-a':
			flag = True
		elif name == '-u':
			username = value
		elif name == '-p':
			password = value
		elif name in ('-i', '--login'):
			want_login = True
		elif name in ('-o', '--logout'):
			pytunet_connect.tunet_logout()
			sys.exit(0)
		elif name in ('-c', '--check'):
			pytunet_connect.tunet_check()
			sys.exit(0)
		elif name in ('-q', '--query'):
			want_query = True

	if flag:
		username = input('username: ')
		password = getpass.getpass('password: ')

	if want_query:
		pytunet_query.tunet_query(username, password)

	if want_login:
		pytunet_connect.tunet_connect(username, password)

	if not want_query and not want_login:
		print ('WARNING: YOU JUST DIDN\'T DO ANYTHING! IF YOU WANT TO CONNECT TO THE CAMPUS NETWORK, THE COMMAND MUST INCLUDE -i OR --login')
		print()
		pytunet_connect.tunet_help()

if __name__ == '__main__':
	pytunet()

If you find any errors in the code or the analysis, please let me know. Thanks!

