Open http://www.pythonchallenge.com/pc/def/ocr.html:
recognize the characters. maybe they are in the book,
but MAYBE they are in the page source.
The hint tells us outright that the answer is in the page source.
So we can use requests to fetch the source:
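```python
import requests


def get_html(url):
    """Fetch a page and return its decoded text."""
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # fail loudly on HTTP errors
    r.encoding = r.apparent_encoding  # use the encoding guessed from the body
    return r.text


if __name__ == '__main__':
    txt = get_html('http://www.pythonchallenge.com/pc/def/ocr.html')
    txt = txt.lower()
    # Strip the listed punctuation marks and newlines from the whole page.
    for i in '~!@#$%^&*()_+{}|:<>?`[]\'";,./\n':
        txt = txt.replace(i, '')
    print(txt)
```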
Output:
htmlhead titleocrtitle link rel=stylesheet type=textcss href=stylecssheadbodycenterimg src=ocrjpgbrfont color=c03000recognize the characters maybe they are in the book brbut maybe they are in the page sourcecenterbrbrbrfont size=-1 color=goldgeneral tipsliuse the hints they are helpful most of the timesliliinvestigate the data given to youliliavoid looking for spoilerslibrforums a href=httpwwwpythonchallengecomforumspython challenge forumsa read before you postbrirc ircfreenodenet pythonchallengebrbrto see the solutions to the previous level replace pc with pcc ie go to httpwwwpythonchallengecompccdefocrhtmlbodyhtml--find rare characters in the mess below----equality--
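The last word, equality, is the entry to the next level. Strictly speaking my code here is not right: I do not yet know how to extract only the comments, so everything before equality is left in the output. Pointers from more experienced readers are welcome (2019/6/31)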
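One way to pull out only the comments, without any third-party parser, is a regular expression. The sketch below is a minimal illustration rather than a robust HTML parser: the helper name `extract_comments` and the sample string are made up for the example, and it assumes the comments are well formed and not nested.

```python
import re

# Matches an HTML comment and captures its body; DOTALL lets "." span newlines.
COMMENT_RE = re.compile(r'<!--(.*?)-->', re.DOTALL)


def extract_comments(html):
    """Return the body of every HTML comment in `html`, in document order."""
    return COMMENT_RE.findall(html)


if __name__ == '__main__':
    # Hypothetical stand-in for the real page source.
    sample = '<html><body>hi</body></html>\n<!--\nfirst note\n-->\n\n<!--\n%%a$@b\n-->\n'
    comments = extract_comments(sample)
    print(comments[1])  # body of the second comment only
```

With the real page, the string returned by `get_html` would take the place of `sample`, so that only the second comment is cleaned and nothing before it leaks into the output.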
-----------------Update 2017/07/21---------------------
Extracting the answer in one step:
```python
import requests
from lxml import etree


def get_html(url):
    """Fetch a page and return its decoded text."""
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    return r.text


if __name__ == '__main__':
    txt1 = get_html('http://www.pythonchallenge.com/pc/def/ocr.html')
    response = etree.HTML(txt1)
    # The second comment on the page holds the mess of characters.
    txt = str(response.xpath('//comment()')[1])
    # Strip the listed punctuation (now including '-') and newlines.
    for i in '~-!@#$%^&*()_+{}|:<>?`[]\'";,./\n':
        txt = txt.replace(i, '')
    print(txt)
```
This prints the result directly: equality
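The hint in the first comment says to find rare characters, so instead of listing every symbol to strip by hand, one could count how often each character occurs and keep only the uncommon ones. Here is a rough sketch, assuming the mess has already been extracted as a string. `rare_characters` and its `max_count` threshold are hypothetical names chosen for illustration, and the sample data is invented.

```python
from collections import Counter


def rare_characters(text, max_count=1):
    """Return, in original order, the characters occurring at most `max_count` times."""
    counts = Counter(text)
    return ''.join(ch for ch in text if counts[ch] <= max_count)


if __name__ == '__main__':
    # Invented sample: every symbol appears twice, every letter once.
    mess = '%%$$@@o__^^c##))r&&'
    print(rare_characters(mess))  # prints: ocr
```

This approach does not depend on guessing which punctuation marks appear in the comment, though the right threshold for the real data is something to check by inspecting the counts first.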