A Quick Python Crash Course: Writing a Crawler in One Line of Code

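When I was first told to learn Python, I refused: I like Java, and I was not going to learn something just because someone told me to. But then a teammate bailed and I inherited the mess, so I crammed it.
Java can be used to write crawlers too, though I have never tried. This crawler had to be deployed on a Django backend, so Python it was; the approach to the code is the same either way.
If you want to go deeper, I recommend reading "Python for Informatics" and getting comfortable with the requests, urllib, urllib2, and re modules.
Enough preamble; let's get straight to the point and talk about the crash course. (By the way, I have only been learning Python for a few days, so veterans can skip this.)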

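When we browse the web normally, we send requests and data to the server by clicking and typing in the browser.
A crawler written in code works on the same principle, except that it sends requests and data to the site's server directly. A site identifies a user by IP address and cookie; the cookie is what keeps you logged in.
The most basic crawler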
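import urllib2
print urllib2.urlopen('https://msdn.microsoft.com/magazine/default.aspx').read()
# That's right: not counting the import, it is a single line of code.
# If Chinese text comes out garbled (for example in a Windows console), transcode it with decode().encode().
# decode() turns the bytes into unicode; pass the encoding the site uses (the charset is visible in the page source).
# encode() turns the unicode back into bytes in the target encoding, e.g. read().decode('gbk').encode('utf-8')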
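
The decode/encode round trip described in the comments can be tried without touching the network. This is a minimal sketch written for Python 3 (an assumption; the code in this post is Python 2), where read() returns bytes and decode() returns str. The helper name transcode and the sample page are made up for illustration.

```python
def transcode(raw, source_encoding="gbk", target_encoding="utf-8"):
    """Decode raw bytes from source_encoding and re-encode them as target_encoding."""
    # errors="replace" keeps going when a byte sequence is invalid in the source encoding
    text = raw.decode(source_encoding, errors="replace")
    return text.encode(target_encoding)


# Stand-in for the bytes that read() would return from a GBK-encoded page (hypothetical content)
raw_page = "<title>课程表</title>".encode("gbk")

utf8_page = transcode(raw_page)
print(utf8_page.decode("utf-8"))
```

If the wrong source encoding is passed, the call still succeeds but the text comes out as replacement characters, which is usually the first hint that the charset guess was wrong.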
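What you usually get back from a direct fetch is the page source. Most pages are HTML, and a tag-based markup language lends itself to extracting the information you need with regular expressions via re. If you don't know regular expressions, go read up on them.
For the most part, though, re.findall(pattern, string) is enough: pattern describes the rough shape of the target information, and string is the text being searched.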
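
As a small illustration of re.findall, here is a sketch in Python 3 (an assumption; the rest of the post is Python 2). The pattern is the table-cell pattern used later in this post; the HTML snippet and its contents are invented for the example.

```python
import re

# Invented snippet shaped like the timetable cells that the later code extracts
html = ('<table><tr>'
        '<td align="Center" width="7%">Math</td>'
        '<td align="Center">Physics</td>'
        '</tr></table>')

# .*? is non-greedy, so each match stops at the nearest closing tag;
# the parentheses mark the part that findall returns
pattern = r'<td align="Center".*?>(.*?)</td>'

cells = re.findall(pattern, html)
print(cells)  # ['Math', 'Physics']
```

When a cell spans several lines, passing re.S as a third argument lets the dot match newlines as well.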

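Next up is simulated login.
Crawlers for simulated login, online order grabbing, ticket grabbing and the like are easy to implement. It comes down to two things: capturing packets to analyze the data being sent, and using machine learning to crack CAPTCHAs (I haven't finished that part yet; I will update once I have).
A simulated login is nothing more than a POST. Use a packet capture tool to look at the URL being POSTed to and the contents of the form. On Windows, Fiddler works very well. (Link: http://www.telerik.com/fiddler)
Fiddler tutorial (http://kb.cnblogs.com/page/130367/), a ten-minute read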
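
What the capture tool shows as the form body is just URL-encoded key/value pairs. The sketch below, in Python 3 (an assumption; the post's code is Python 2), builds such a body and also shows one way to read the __VIEWSTATE hidden field out of a login page instead of pasting it from a capture. That second step is an alternative, not what the code in this post does. The helper names, the sample HTML, and the account and password values are all made up; the field names TextBox1, TextBox2 and Button1 are the ones used in this post.

```python
import re
from urllib.parse import urlencode, parse_qs


def extract_viewstate(login_html):
    """Return the value of the __VIEWSTATE hidden input, or None if it is absent."""
    match = re.search(r'name="__VIEWSTATE"[^>]*?value="([^"]*)"', login_html)
    return match.group(1) if match else None


def build_login_body(viewstate, account, password):
    """URL-encode the login form fields into the bytes sent as the POST body."""
    fields = {
        "__VIEWSTATE": viewstate,
        "TextBox1": account,   # account field name as used in this post
        "TextBox2": password,  # password field name as used in this post
        "Button1": "",
    }
    return urlencode(fields).encode("ascii")


# Invented login page fragment and credentials, for illustration only
sample = '<input type="hidden" name="__VIEWSTATE" value="dDwtNTgx+abc=" />'
body = build_login_body(extract_viewstate(sample), "20160001", "secret")

# Decoding the body again shows the same fields a capture tool would display
print(parse_qs(body.decode("ascii"), keep_blank_values=True))
```

Comparing the decoded body with what Fiddler shows for a real browser login is a quick way to spot a missing or misnamed field.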
Crawler walkthrough
# -*- coding:utf-8 -*-
#author : Max

import urllib2
import urllib
import cookielib
import re

# URL used for logging in
url = 'http://202.195.144.163/jndx/default5.aspx'
filename = 'cookie.txt'
# Create a cookie jar to hold the login state
cookie = cookielib.MozillaCookieJar(filename)
# Build an opener, which acts like a browser
# urllib2.HTTPCookieProcessor() handles the cookies
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
account = '1070414532'
password = '511623191111111111'

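# After reading the Fiddler tutorial the roles of headers and data will be clear: roughly, headers identify the client to the server and data is the form sent by POST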
postdata = {'__VIEWSTATE':'dDwtNTgxODgzNDk1O3Q8O2w8aTwxPjs+O2w8dDw7bDxpPDQ+Oz47bDx0PHA8O3A8bDxvbmNsaWNrOz47bDx3aW5kb3cuY2xvc2UoKVw7Oz4+Pjs7Pjs+Pjs+Pjs+0L9OGiPtTSMlqZUfLGSIwTyi9hc=',
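            # RadioButtonList1 selects the login role; the page is GB2312, so the value is sent as GB2312 bytes
            'TextBox1':account,'TextBox2':password,'RadioButtonList1':u'学生'.encode('gb2312'),'Button1':''}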
headers = {'Connection': 'keep-alive','User-Agent':'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;Miser Report)'}

data = urllib.urlencode(postdata)
request = urllib2.Request(url,data,headers)
response = opener.open(request).read().decode('gb2312').encode('utf-8')
print response
pattern = '<span id="xhxm">.*?  (.*?)同学</span></em>'
name = re.findall(pattern,response)
print name[0].decode('utf-8').encode('gb2312')
cookie.save(ignore_discard=True,ignore_expires=True)
for item in cookie:
    print 'Cookie.name='+item.name
    print 'Cookie.value='+item.value

postdata2 = {'__EVENTTARGET':'xqd','__EVENTARGUMENT':'','__VIEWSTATE':'dDwtMTY3ODA2Njg2OTt0PDtsPGk8MT47PjtsPHQ8O2w8aTwxPjtpPDI+O2k8ND47aTw3PjtpPDk+O2k8MTE+O2k8MTM+O2k8MTU+O2k8MjE+O2k8MjM+O2k8MjU+O2k8Mjc+O2k8Mjk+O2k8MzE+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDEyMDEyLTIwMTMwOz4+Oz47Oz47dDx0PHA8cDxsPERhdGFUZXh0RmllbGQ7RGF0YVZhbHVlRmllbGQ7PjtsPHhuO3huOz4+Oz47dDxpPDM+O0A8MjAxNi0yMDE3OzIwMTUtMjAxNjsyMDE0LTIwMTU7PjtAPDIwMTYtMjAxNzsyMDE1LTIwMTY7MjAxNC0yMDE1Oz4+O2w8aTwwPjs+Pjs7Pjt0PHQ8OztsPGk8MT47Pj47Oz47dDxwPHA8bDxUZXh0Oz47bDzlrablj7fvvJoxMDcwNDE0NTMyOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzlp5PlkI3vvJrlkajnp5Hnvr07Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOWtpumZou+8mueJqeiBlOe9keW3peeoi+WtpumZojs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85LiT5Lia77ya6Ieq5Yqo5YyWOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzooYzmlL/nj63vvJroh6rliqjljJYxNDA1Oz4+Oz47Oz47dDw7bDxpPDE+Oz47bDx0PEAwPDs7Ozs7Ozs7Ozs+Ozs+Oz4+O3Q8cDxsPFZpc2libGU7PjtsPG88Zj47Pj47bDxpPDE+Oz47bDx0PEAwPDs7Ozs7Ozs7Ozs+Ozs+Oz4+O3Q8QDA8cDxwPGw8UGFnZUNvdW50O18hSXRlbUNvdW50O18hRGF0YVNvdXJjZUl0ZW1Db3VudDtEYXRhS2V5czs+O2w8aTwxPjtpPDA+O2k8MD47bDw+Oz4+Oz47Ozs7Ozs7Ozs7Pjs7Pjt0PEAwPHA8cDxsPFBhZ2VDb3VudDtfIUl0ZW1Db3VudDtfIURhdGFTb3VyY2VJdGVtQ291bnQ7RGF0YUtleXM7PjtsPGk8MT47aTw0PjtpPDQ+O2w8Pjs+Pjs+Ozs7Ozs7Ozs7Oz47bDxpPDA+Oz47bDx0PDtsPGk8MT47aTwyPjtpPDM+O2k8ND47PjtsPHQ8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+O2k8NT47aTw2Pjs+O2w8dDxwPHA8bDxUZXh0Oz47bDzljZXniYfmnLrljp/nkIblj4rlupTnlKjor77nqIvorr7orqE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOaWueebiuawkTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MC41Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwyMS0yMTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85peg5pa55ZCROz4+Oz47Oz47Pj47dDw7bDxpPDA+O2k8MT47aTwyPjtpPDM+O2k8ND47aTw1PjtpPDY+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPOi/kOWKqOaOp+WItuezu+e7n+e7vOWQiOiuvuiuoTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85r2Y5bqt6b6ZOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwwLjU7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDIwLTIwOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwmbmJzcFw7Oz4+Oz47Oz47dDxwPHA8bDxUZXh0O
z47bDwmbmJzcFw7Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzml6DmlrnlkJE7Pj47Pjs7Pjs+Pjt0PDtsPGk8MD47aTwxPjtpPDI+O2k8Mz47aTw0PjtpPDU+O2k8Nj47PjtsPHQ8cDxwPGw8VGV4dDs+O2w855S15rCU5o6n5Yi25Y+KUExD57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzotbXlv6Dnm5Y7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MDEtMTY7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPCZuYnNwXDs7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPCZuYnNwXDs7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOaXoOaWueWQkTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+O2k8NT47aTw2Pjs+O2w8dDxwPHA8bDxUZXh0Oz47bDzov4fnqIvmjqfliLbns7vnu5/nu7zlkIjorr7orqE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOmprOS5heelpTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MC41Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwxOS0xOTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+Oz4+Oz4+Oz4+O3Q8QDA8cDxwPGw8UGFnZUNvdW50O18hSXRlbUNvdW50O18hRGF0YVNvdXJjZUl0ZW1Db3VudDtEYXRhS2V5czs+O2w8aTwxPjtpPDA+O2k8MD47bDw+Oz4+Oz47Ozs7Ozs7Ozs7Pjs7Pjt0PEAwPHA8cDxsPFBhZ2VDb3VudDtfIUl0ZW1Db3VudDtfIURhdGFTb3VyY2VJdGVtQ291bnQ7RGF0YUtleXM7PjtsPGk8MT47aTw0PjtpPDQ+O2w8Pjs+Pjs+Ozs7Ozs7Ozs7Oz47bDxpPDA+Oz47bDx0PDtsPGk8MT47aTwyPjtpPDM+O2k8ND47PjtsPHQ8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w86L+Q5Yqo5o6n5Yi257O757uf57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzmvZjluq3pvpk7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w86L+H56iL5o6n5Yi257O757uf57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzpqazkuYXnpaU7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85Y2V54mH5py65Y6f55CG5Y+K5bqU55So6K++56iL6K6+6K6hOz4+O
z47Oz47dDxwPHA8bDxUZXh0Oz47bDzmlrnnm4rmsJE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w855S15rCU5o6n5Yi25Y+KUExD57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzotbXlv6Dnm5Y7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+Oz4+Oz4+Oz4+Oz4+Oz5ZZFyEFVR8MH9GkWWTFr2SyUKuGg==',
             'xnd':'2016-2017','xqd':'1'}
data2 = urllib.urlencode(postdata2)

url2 = 'http://202.195.144.163/jndx/xskbcx.aspx?xh='+account+'&xm=%D6%DC%BF%C6%D3%F0&gnmkdm=N121603'

headers2 = {'Connection': 'keep-alive','User-Agent':'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;Miser Report)','Referer':'http://202.195.144.163/jndx/xs_main.aspx?xh=1070414532'}
request2 = urllib2.Request(url2,data2,headers2)
response2 = opener.open(request2)
page = response2.read().decode('gb2312','ignore').encode('utf-8')

pattern = '<td align="Center".*?>(.*?)</td>'
lessons = re.findall(pattern,page)
for item in lessons:
    print item
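
The cookie handling above (create a MozillaCookieJar, let the opener fill it, then save it) can be explored offline. This is a sketch in Python 3 (an assumption; there cookielib is named http.cookiejar) that stores one hand-made session cookie and loads it back. The cookie name, value and domain are illustrative placeholders, not values captured from the site in this post.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar


def make_cookie(name, value, domain):
    """Build a session cookie by hand; a real crawler gets these from the server."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


path = os.path.join(tempfile.mkdtemp(), "cookie.txt")

jar = MozillaCookieJar(path)
jar.set_cookie(make_cookie("ASP.NET_SessionId", "abc123", "example.com"))
# Session cookies are marked discard=True, so ignore_discard is needed to write them out
jar.save(ignore_discard=True, ignore_expires=True)

# A later run can load the file and reuse the login state
restored = MozillaCookieJar(path)
restored.load(ignore_discard=True, ignore_expires=True)
names = [c.name for c in restored]
print(names)
```

Without ignore_discard=True, both save() and load() silently skip session cookies, which is why the post's code passes it explicitly.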

The cleaned-up code

# -*- coding:utf-8 -*-
# author :Max

import re
import urllib
import urllib2
import cookielib

class Spider:
    filename = 'cookie.txt'
    cookie = cookielib.MozillaCookieJar(filename)
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
    def __init__(self,url,account,password,xnd,xqd):
        self.url = url
        self.account = account
        self.password = password
        self.xnd = xnd
        self.xqd = xqd
    def loginWeb(self):
        headers = {'Connection': 'keep-alive','User-Agent':'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;Miser Report)'}
        data = urllib.urlencode({'__VIEWSTATE':'dDwtNTgxODgzNDk1O3Q8O2w8aTwxPjs+O2w8dDw7bDxpPDQ+Oz47bDx0PHA8O3A8bDxvbmNsaWNrOz47bDx3aW5kb3cuY2xvc2UoKVw7Oz4+Pjs7Pjs+Pjs+Pjs+0L9OGiPtTSMlqZUfLGSIwTyi9hc=',
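            # RadioButtonList1 selects the login role; the page is GB2312, so the value is sent as GB2312 bytes
            'TextBox1':self.account,'TextBox2':self.password,'RadioButtonList1':u'学生'.encode('gb2312'),'Button1':''})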
        request = urllib2.Request(self.url,data,headers)
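        try:
            response = Spider.opener.open(request).read().decode('gb2312').encode('utf-8')
            pattern = '<span id="xhxm">.*?  (.*?)同学</span></em>'
            name = re.findall(pattern, response)
            # No match means the greeting span is missing, so treat it as a failed login
            return name[0] if name else 'Error'
        except urllib2.HTTPError as e:
            # e.code is an int, so convert it before concatenating
            print 'HTTPError = ' + str(e.code)
            return 'Error'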


    def timeTable(self):
        headers = {'Connection': 'keep-alive','User-Agent':'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;Miser Report)','Referer':'http://202.195.144.163/jndx/xs_main.aspx?xh=1070414532'}
        data = urllib.urlencode({'__EVENTTARGET':'xqd','__EVENTARGUMENT':'','__VIEWSTATE':'dDwtMTY3ODA2Njg2OTt0PDtsPGk8MT47PjtsPHQ8O2w8aTwxPjtpPDI+O2k8ND47aTw3PjtpPDk+O2k8MTE+O2k8MTM+O2k8MTU+O2k8MjE+O2k8MjM+O2k8MjU+O2k8Mjc+O2k8Mjk+O2k8MzE+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDEyMDEyLTIwMTMwOz4+Oz47Oz47dDx0PHA8cDxsPERhdGFUZXh0RmllbGQ7RGF0YVZhbHVlRmllbGQ7PjtsPHhuO3huOz4+Oz47dDxpPDM+O0A8MjAxNi0yMDE3OzIwMTUtMjAxNjsyMDE0LTIwMTU7PjtAPDIwMTYtMjAxNzsyMDE1LTIwMTY7MjAxNC0yMDE1Oz4+O2w8aTwwPjs+Pjs7Pjt0PHQ8OztsPGk8MT47Pj47Oz47dDxwPHA8bDxUZXh0Oz47bDzlrablj7fvvJoxMDcwNDE0NTMyOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzlp5PlkI3vvJrlkajnp5Hnvr07Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOWtpumZou+8mueJqeiBlOe9keW3peeoi+WtpumZojs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85LiT5Lia77ya6Ieq5Yqo5YyWOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzooYzmlL/nj63vvJroh6rliqjljJYxNDA1Oz4+Oz47Oz47dDw7bDxpPDE+Oz47bDx0PEAwPDs7Ozs7Ozs7Ozs+Ozs+Oz4+O3Q8cDxsPFZpc2libGU7PjtsPG88Zj47Pj47bDxpPDE+Oz47bDx0PEAwPDs7Ozs7Ozs7Ozs+Ozs+Oz4+O3Q8QDA8cDxwPGw8UGFnZUNvdW50O18hSXRlbUNvdW50O18hRGF0YVNvdXJjZUl0ZW1Db3VudDtEYXRhS2V5czs+O2w8aTwxPjtpPDA+O2k8MD47bDw+Oz4+Oz47Ozs7Ozs7Ozs7Pjs7Pjt0PEAwPHA8cDxsPFBhZ2VDb3VudDtfIUl0ZW1Db3VudDtfIURhdGFTb3VyY2VJdGVtQ291bnQ7RGF0YUtleXM7PjtsPGk8MT47aTw0PjtpPDQ+O2w8Pjs+Pjs+Ozs7Ozs7Ozs7Oz47bDxpPDA+Oz47bDx0PDtsPGk8MT47aTwyPjtpPDM+O2k8ND47PjtsPHQ8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+O2k8NT47aTw2Pjs+O2w8dDxwPHA8bDxUZXh0Oz47bDzljZXniYfmnLrljp/nkIblj4rlupTnlKjor77nqIvorr7orqE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOaWueebiuawkTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MC41Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwyMS0yMTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85peg5pa55ZCROz4+Oz47Oz47Pj47dDw7bDxpPDA+O2k8MT47aTwyPjtpPDM+O2k8ND47aTw1PjtpPDY+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPOi/kOWKqOaOp+WItuezu+e7n+e7vOWQiOiuvuiuoTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85r2Y5bqt6b6ZOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwwLjU7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDIwLTIwOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwmbmJzcFw7Oz4+Oz47O
z47dDxwPHA8bDxUZXh0Oz47bDwmbmJzcFw7Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzml6DmlrnlkJE7Pj47Pjs7Pjs+Pjt0PDtsPGk8MD47aTwxPjtpPDI+O2k8Mz47aTw0PjtpPDU+O2k8Nj47PjtsPHQ8cDxwPGw8VGV4dDs+O2w855S15rCU5o6n5Yi25Y+KUExD57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzotbXlv6Dnm5Y7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MDEtMTY7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPCZuYnNwXDs7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPCZuYnNwXDs7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOaXoOaWueWQkTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+O2k8NT47aTw2Pjs+O2w8dDxwPHA8bDxUZXh0Oz47bDzov4fnqIvmjqfliLbns7vnu5/nu7zlkIjorr7orqE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOmprOS5heelpTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MC41Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwxOS0xOTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+Oz4+Oz4+Oz4+O3Q8QDA8cDxwPGw8UGFnZUNvdW50O18hSXRlbUNvdW50O18hRGF0YVNvdXJjZUl0ZW1Db3VudDtEYXRhS2V5czs+O2w8aTwxPjtpPDA+O2k8MD47bDw+Oz4+Oz47Ozs7Ozs7Ozs7Pjs7Pjt0PEAwPHA8cDxsPFBhZ2VDb3VudDtfIUl0ZW1Db3VudDtfIURhdGFTb3VyY2VJdGVtQ291bnQ7RGF0YUtleXM7PjtsPGk8MT47aTw0PjtpPDQ+O2w8Pjs+Pjs+Ozs7Ozs7Ozs7Oz47bDxpPDA+Oz47bDx0PDtsPGk8MT47aTwyPjtpPDM+O2k8ND47PjtsPHQ8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w86L+Q5Yqo5o6n5Yi257O757uf57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzmvZjluq3pvpk7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w86L+H56iL5o6n5Yi257O757uf57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzpqazkuYXnpaU7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85Y2V54mH5py65Y6f55CG5Y+K5bqU55So6
K++56iL6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzmlrnnm4rmsJE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w855S15rCU5o6n5Yi25Y+KUExD57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzotbXlv6Dnm5Y7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+Oz4+Oz4+Oz4+Oz4+Oz5ZZFyEFVR8MH9GkWWTFr2SyUKuGg==',
             'xnd':self.xnd,'xqd':self.xqd})
        url = 'http://202.195.144.163/jndx/xskbcx.aspx?xh='+self.account+'&xm=%D6%DC%BF%C6%D3%F0&gnmkdm=N121603'
        request = urllib2.Request(url,data,headers)
        response = Spider.opener.open(request).read().decode('gb2312','ignore').encode('utf-8')
        pattern = '<td align="Center".*?>(.*?)</td>'
        lessons = re.findall(pattern, response)
        for item in lessons:
            print item
if __name__ == "__main__":
    print 'try spider'
    url = 'http://202.195.144.163/jndx/default5.aspx'
    account = ''  # student ID
    password = ''  # password
    xnd = '2016-2017'  # academic year
    xqd = '1'  # semester
    spider = Spider(url,account,password,xnd,xqd)
    name = spider.loginWeb()
    if name == 'Error':
        print 'Web Login Failed'
    else:
        spider.timeTable()
print 'run over'
