Scraping a Problem Description to Generate a Solution-Report Template in Python

olahiuj

Article link: https://blog.csdn.net/jpwang8/article/details/54851489

Background

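Laziness is the ladder of human progress!!
Messing around is the inevitable outcome of natural selection!!
Given a Luogu problem ID, the script below downloads the problem page, extracts the statement, converts it to rough Markdown, and writes a report template containing the statement, an empty Analysis section, and the solution source read from the local file `luogu<id>.cpp`. This is easily the ugliest Python code I have ever written, but it does work, so please bear with it.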

Code

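```python
# -*- coding: utf-8 -*-
# Python 3 port of the original Python 2 script: html.unescape replaces
# HTMLParser().unescape, input() replaces raw_input(), print is a function.
import html
import re

import requests

# Three backticks, built at runtime so this listing can sit inside a Markdown code fence.
FENCE = '`' * 3


def get_page(session, url, headers, params=None, timeout=5, verify=True):
    """Download url and return the body decoded as UTF-8."""
    response = session.get(url, headers=headers, params=params,
                           timeout=timeout, verify=verify)
    return response.content.decode('utf-8')


def search(pattern, page, flags=0):
    """Return every match of the regex pattern in page."""
    return re.findall(pattern, page, flags=flags)


def to_markdown(fragment):
    """Convert the problem-statement HTML into rough Markdown."""
    text = re.sub(r'<strong>|<h2>', '\n## ', fragment)  # titles become '## ' headings (space required)
    text = re.sub(r'</strong>|</h2>', '\n---\n', text)  # each title is followed by a rule
    text = re.sub(r'<[^>]+>', '', text, flags=re.S)     # drop all remaining tags
    return html.unescape(text)                          # &lt; -> <, &amp; -> &, ...


def main():
    session = requests.Session()
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/49.0.2623.22 Safari/537.36 SE 2.X MetaSr 1.0'}
    probid = input('id: ')
    url = 'https://www.luogu.org/problem/show?pid=%s#sub' % probid
    # The statement sits between these two <div>s in the page markup this regex was written for.
    statement = search(r"<div class=\"lg-article am-g\">(.+?)<div class='lg-article-sub am-g' id=\"sub\"></div>",
                       get_page(session, url, headers), flags=re.S)[0]
    # Remove 4-space runs so Markdown does not render the text as an indented code block.
    statement = re.sub('    ', '', to_markdown(statement))
    # Read the solution as UTF-8 text (the Python 2 version broke on non-ASCII source files).
    with open('/home/olahiuj/文档/progs/oi/luogu%s.cpp' % probid, encoding='utf-8') as src:
        code = src.read()
    content = '''%s

## Analysis
---


## Code
---
%s
%s
%s
''' % (statement, FENCE, code, FENCE)
    with open('/home/olahiuj/文档/LPD/luogu%s.md' % probid, 'w', encoding='utf-8') as out:
        out.write(content)
    print('Problem %s is Done!' % probid)


if __name__ == '__main__':
    main()
```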
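To check the regex pipeline without hitting the network, here is a small offline sketch. It runs the same extraction pattern that `main()` uses, then the same three substitutions the script applies to turn the statement into Markdown (with a space after `##`, which CommonMark needs to recognise an ATX heading), on a made-up page. The sample page is an assumption: only the two `<div>` markers come from the script's regex, the statement inside them is invented, and the helper name `to_markdown` is illustrative.

```python
import html
import re

# Hypothetical page: only the two <div> markers come from the script's regex;
# the statement inside them is made up for this demo.
SAMPLE_PAGE = (
    '<html><body>'
    '<div class="lg-article am-g">'
    '<h2>Description</h2><p>Compute a+b where a &lt; 10^9.</p>'
    '<strong>Input</strong><p>Two integers.</p>'
    "<div class='lg-article-sub am-g' id=\"sub\"></div>"
    '</body></html>'
)

# Same extraction pattern as main(): capture everything between the two markers.
PATTERN = r"<div class=\"lg-article am-g\">(.+?)<div class='lg-article-sub am-g' id=\"sub\"></div>"


def to_markdown(fragment):
    """Hypothetical helper mirroring the script's HTML-to-Markdown substitutions."""
    text = re.sub(r'<strong>|<h2>', '\n## ', fragment)  # titles -> ATX headings
    text = re.sub(r'</strong>|</h2>', '\n---\n', text)  # rule after each title
    text = re.sub(r'<[^>]+>', '', text)                 # strip the remaining tags
    return html.unescape(text)                          # decode entities such as &lt;


statement = re.findall(PATTERN, SAMPLE_PAGE, flags=re.S)[0]
markdown = to_markdown(statement)
print(markdown)
```

Running it prints the statement with `## Description` and `## Input` headings, each followed by a `---` rule, and with `&lt;` decoded to `<`. Tags are stripped before unescaping, so a literal `<` in the statement text is never mistaken for markup.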