Python challenge game - mission 2 (level 2): recognize the characters. maybe they are in the bo

Article link: https://blog.csdn.net/weixin_44521703/article/details/94156085

Open http://www.pythonchallenge.com/pc/def/ocr.html:

recognize the characters. maybe they are in the book,
but MAYBE they are in the page source.

The hint tells us outright that the answer is in the page source.

So we can use requests to fetch the source:

```python
import requests


def get_html(url):
    # Fetch the page and return its decoded text
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    return r.text


if __name__ == '__main__':
    txt = get_html('http://www.pythonchallenge.com/pc/def/ocr.html')
    txt = txt.lower()
    # Strip punctuation and newlines from the whole page
    for i in '~!@#$%^&*()_+{}|:<>?`[]\'";,./\n':
        txt = txt.replace(i, '')
    print(txt)
```

htmlhead titleocrtitle link rel=stylesheet type=textcss href=stylecssheadbodycenterimg src=ocrjpgbrfont color=c03000recognize the characters maybe they are in the book brbut maybe they are in the page sourcecenterbrbrbrfont size=-1 color=goldgeneral tipsliuse the hints they are helpful most of the timesliliinvestigate the data given to youliliavoid looking for spoilerslibrforums a href=httpwwwpythonchallengecomforumspython challenge forumsa read before you postbrirc ircfreenodenet pythonchallengebrbrto see the solutions to the previous level replace pc with pcc ie go to httpwwwpythonchallengecompccdefocrhtmlbodyhtml--find rare characters in the mess below----equality--

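The output ends with equality, which is the entry to the next level. Strictly speaking, my code here is not right: I have not yet figured out how to extract only the comments, so everything before equality is left unfiltered. Pointers from more experienced readers are welcome. (2019/6/31)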
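One possible way to isolate only the comments, using nothing beyond the standard library, is a regular expression. The sketch below is a minimal example under stated assumptions: `extract_comments` and the `sample` string are hypothetical stand-ins (not the real page source), and the pattern assumes well-formed `<!-- ... -->` comments.

```python
import re


def extract_comments(html):
    """Return the body of every <!-- ... --> comment, in document order.

    Hypothetical helper for illustration; not part of the original post.
    """
    # re.DOTALL lets '.' match newlines, since the comments span several lines.
    return re.findall(r'<!--(.*?)-->', html, flags=re.DOTALL)


# Tiny stand-in for the real page (assumption: a hint comment, then the mess).
sample = (
    '<html><body>hint</body></html>\n'
    '<!--\nfind rare characters\n-->\n'
    '<!--\n%%e@@q##\n-->'
)

comments = extract_comments(sample)
mess = comments[1]  # the second comment holds the character mess
letters = ''.join(ch for ch in mess if ch.isalpha())
print(letters)
```

On the real page, the string returned by `get_html` could be passed in place of `sample`; whether a regex copes with every page is not guaranteed, since HTML is not a regular language.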

-----------------2017/07/21 update---------------------

Extracting it in one step

```python
import requests
from lxml import etree


def get_html(url):
    # Fetch the page and return its decoded text
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    return r.text


if __name__ == '__main__':
    txt1 = get_html('http://www.pythonchallenge.com/pc/def/ocr.html')
    response = etree.HTML(txt1)
    # The second comment in the page holds the character mess
    txt = str(response.xpath('//comment()')[1])
    # Strip punctuation (including the comment delimiters) and newlines
    for i in '~-!@#$%^&*()_+{}|:<>?`[]\'";,./\n':
        txt = txt.replace(i, '')
    print(txt)
```

This prints the result directly: equality
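The page's own comment says to "find rare characters", so another option is to count character frequencies instead of hard-coding a list of symbols to strip. The following is a rough sketch, assuming the wanted letters occur far less often than the filler symbols; `rare_characters`, its `max_count` parameter, and `sample_mess` are hypothetical names used only for illustration.

```python
from collections import Counter


def rare_characters(mess, max_count=1):
    """Return, in original order, the characters occurring at most max_count times.

    Hypothetical helper for illustration; not part of the original post.
    """
    counts = Counter(mess)
    return ''.join(ch for ch in mess if counts[ch] <= max_count)


# Small made-up mess: each symbol appears six times, each letter once.
sample_mess = '%%$$@@e%%$$q@@%%u$$@@'
print(rare_characters(sample_mess))  # prints "equ"
```

This avoids having to guess which punctuation appears in the mess, at the cost of needing a sensible threshold for what counts as rare.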