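1. Fetching the LeetCode problem information
LeetCode is awkward to use on a phone, and there is no quick way to look up the daily question there, so I decided to write a crawler that fetches the daily question and sends it to my personal mailbox by email. This post records the process.
First, look at what needs to be scraped. As shown in the figure below, we need the content inside the boxed area: the problem title, the statement, the difficulty, and so on.
The daily question is certainly not hard-coded in the page but fetched dynamically, so open Chrome DevTools, select Network --> XHR, and refresh the page. A large number of graphql requests appear. GraphQL is both a query language for APIs and a runtime for fulfilling those queries with your data. It provides a complete and understandable description of the data in your API, lets clients ask for exactly the data they need with nothing redundant, makes it easier to evolve the API over time, and enables powerful developer tools. So this is the API that fetches the data:
Seeing this request, it is clear that it is the API that fetches the daily question. What remains is to write the request headers and pass in the corresponding request data.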
import json
import requests

# Request headers
headers = {"accept": "*/*", "accept-encoding": "gzip, deflate, br", "accept-language":
"zh-CN,zh-TW;q=0.9,zh;q=0.8,en-US;q=0.7,en;q=0.6", "content-type": "application/json",
"origin": "https://leetcode-cn.com", "referer": "https://leetcode-cn.com/problemset/all/",
"sec-fetch-dest": "empty", "sec-fetch-mode": "cors", "sec-fetch-site": "same-origin",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
"x-csrftoken": "LsbG2DbnzPiovVWQU87oGrGpIRBo1NfcCIJV90lOOoSs76R6cT3ODnqjaNqwMbGj"}
# GraphQL payload to send
postdata = {"operationName": "questionOfToday", "variables": {},
"query": "query questionOfToday {\n todayRecord {\n question {\n questionFrontendId\n"
" questionTitleSlug\n __typename\n }\n lastSubmission {\n id\n"
" __typename\n }\n date\n userStatus\n __typename\n }\n}\n"}
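# Serialize the payload to a JSON string
postdata = json.dumps(postdata)
# Send the request and decode the JSON response
res = requests.post("https://leetcode-cn.com/graphql", headers=headers, data=postdata)
res = json.loads(res.text)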
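Note that the data must be JSON-encoded before it is sent, and the response must be JSON-decoded after it is received. The returned result gives the slug of the daily question. The next step is to use that slug to fetch the full problem information, in a similar way.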
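The encode and decode steps can be sketched in isolation, without touching the network. This is a minimal sketch: the helper names `build_payload` and `extract_title_slug` are hypothetical, and the sample response is hand-written to match the shape indexed in this post (`data` -> `todayRecord` -> first element -> `question`), not captured from the real API.

```python
import json


def build_payload(operation_name, query, variables=None):
    """Serialize a GraphQL request body to a JSON string (hypothetical helper)."""
    body = {
        "operationName": operation_name,
        "variables": variables or {},
        "query": query,
    }
    return json.dumps(body)


def extract_title_slug(response_text):
    """Decode a JSON response and return the daily question's slug, or None."""
    data = json.loads(response_text)
    records = (data.get("data") or {}).get("todayRecord") or []
    if not records:
        return None
    return records[0].get("question", {}).get("questionTitleSlug")


# Hand-written sample response; the real one carries more fields.
sample = '{"data": {"todayRecord": [{"question": {"questionTitleSlug": "two-sum"}}]}}'
payload = build_payload("questionOfToday", "query questionOfToday { todayRecord { date } }")

print(json.loads(payload)["operationName"])
print(extract_title_slug(sample))
```

Returning None when the record list is missing or empty avoids an IndexError or KeyError if the API response changes shape or the request is rejected.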
problem_name = res['data']['todayRecord'][0]['question']['questionTitleSlug']
postdata = {"operationName": "questionData", "variables": {"titleSlug": problem_name},
"query": "query questionData($titleSlug: String!) {\n question(titleSlug: $titleSlug) "
"{\n questionId\n questionFrontendId\n boundTopicId\n title\n titleSlug\n"
" content\n translatedTitle\n translatedContent\n isPaidOnly\n difficulty\n"
" likes\n dislikes\n isLiked\n similarQuestions\n contributors {\n"
" username\n profileUrl\n avatarUrl\n __typename\n }\n"
" langToValidPlayground\n topicTags {\n name\n slug\n translatedName\n"
" __typename\n }\n companyTagStats\n codeSnippets {\n lang\n langSlug\n"
" code\n __typename\n }\n stats\n hints\n solution {\n id\n"
" canSeeDetail\n __typename\n }\n status\n sampleTestCase\n metaData\n"
" judgerAvailable\n judgeType\n mysqlSchemas\n enableRunCode\n envInfo\n book {\n"
" id\n bookName\n pressName\n source\n shortDescription\n"
" fullDescription\n bookImgUrl\n pressImgUrl\n productUrl\n __typename\n"
" }\n isSubscribed\n isDailyQuestion\n dailyRecordStatus\n editorType\n"
" ugcQuestionId\n style\n __typename\n }\n}\n"}
postdata = json.dumps(postdata)
res = requests.post("https://leetcode-cn.com/graphql", headers=headers, data=postdata)
# print(res.content)
res = json.loads(res.content.decode('utf-8'))
To make the email display nicely, build a piece of HTML:
html_content = '<html><meta charset="UTF-8"><title>{}</title><body>{}</body>' \
'</html>'.format(res['data']['question']['translatedTitle'],
res['data']['question']['translatedContent'])
name = res['data']['question']['translatedTitle']
level = res['data']['question']['difficulty']
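The same HTML construction can be wrapped in a small function, which makes it easy to check without calling the API. This is only a sketch: `build_email_html` is a hypothetical name, and escaping the title with `html.escape` is an added precaution that the original code does not take. The body is left unescaped because the problem statement is already HTML.

```python
import html


def build_email_html(title, body_html):
    """Wrap an HTML problem statement in a minimal HTML document."""
    return (
        '<html><head><meta charset="UTF-8"><title>{}</title></head>'
        '<body>{}</body></html>'
    ).format(html.escape(title), body_html)


# Placeholder values standing in for translatedTitle and translatedContent.
page = build_email_html("A < B", "<p>Given two integers...</p>")
print(page)
```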
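That completes fetching the problem. The next step is to send it out by email.
2. Sending email with Python
Sending email from Python here relies mainly on two libraries: smtplib and email.
Building an email means building a Message object. MIMEText represents a text part, MIMEImage represents an image, MIMEMultipart combines several parts into one message, and MIMEBase is the base class that can represent any part.
Text parts usually use 'text/plain' or 'text/html'. The first argument of MIMEText is the message body, the second is the MIME subtype, and the third is the character set.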
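from email import encoders
from email.mime.application import MIMEApplication
from email.mime.base import MIMEBase
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from os.path import basename

# Example: 1. Add plain text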
text = 'Hi! hello world'
text_plain = MIMEText(text, 'plain', 'utf-8')
# 2. Add HTML
html = """
<html>
<body>
<p> hello world!</p>
</body>
</html>
"""
text_html = MIMEText(html, 'html', 'utf-8')
# 3. Add an attachment
# a. Using MIMEText with base64 encoding
with open(r"./test.txt", 'rb') as f:
    sendfile = f.read()
text_att = MIMEText(sendfile, 'base64', 'utf-8')  # use base64 encoding
text_att['Content-Type'] = 'application/octet-stream'
text_att['Content-Disposition'] = 'attachment; filename="display_name.txt"'
# b. Using MIMEApplication (msg is assumed to be an existing MIMEMultipart object)
filename = 'words.txt'
with open(filename, 'rb') as f:  # read bytes, which MIMEApplication expects
    part = MIMEApplication(f.read(), name=basename(filename))
part['Content-Disposition'] = 'attachment; filename={}'.format(basename(filename))
msg.attach(part)
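# 4. Images
# Image shown inline in the body
with open(r"./test.jpg", 'rb') as f:
    sendfile = f.read()
image = MIMEImage(sendfile)
# Define the image ID, referenced from the HTML body as <img src="cid:image1">
image.add_header('Content-ID', '<image1>')
# Image shown as an attachment (message is assumed to be an existing MIMEMultipart object)
mime = MIMEBase('image', 'jpeg', filename='test.jpg')
mime.add_header('Content-Disposition', 'attachment', filename='test.jpg')
mime.set_payload(sendfile)
encoders.encode_base64(mime)
message.attach(mime)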
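About MIMEMultipart: there are three common multipart types. multipart/alternative holds a plain-text body (text/plain) and an HTML body (text/html); multipart/related holds embedded resources such as images and audio; multipart/mixed holds attachments. The three types nest upward: mixed can contain related, which can contain alternative.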
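The nesting of the three multipart types can be illustrated with a small message tree. This is a sketch with placeholder content: the attachment bytes and the file name `notes.txt` are invented for the example, and nothing is sent.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Innermost: the same body in plain-text and HTML form.
alternative = MIMEMultipart('alternative')
alternative.attach(MIMEText('hello world', 'plain', 'utf-8'))
alternative.attach(MIMEText('<p>hello world</p>', 'html', 'utf-8'))

# Middle: the body plus any embedded resources (inline images would go here).
related = MIMEMultipart('related')
related.attach(alternative)

# Outermost: everything above plus attachments.
mixed = MIMEMultipart('mixed')
mixed.attach(related)
attachment = MIMEApplication(b'example bytes', Name='notes.txt')
attachment.add_header('Content-Disposition', 'attachment', filename='notes.txt')
mixed.attach(attachment)

# Walk the tree depth-first and print each part's content type.
for part in mixed.walk():
    print(part.get_content_type())
```

A message with no attachments and no inline resources does not need the outer layers; a single MIMEText or a multipart/alternative container is enough.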
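Headers such as Subject, From, and To must be added to the MIMEText or MIMEMultipart object for the subject, sender, recipient, and time to be displayed. Below is a simple test program that sends an email.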
import smtplib
import time
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


class EmailDemo:
    def __init__(self):  # set up the connection
        self.smtp = smtplib.SMTP()
        self.smtp.connect('smtp.qq.com', 25)  # connect to the mail server
        self.smtp.login("xxxx@qq.com", "xxxx")  # log in; replace with your own address and authorization code

    def sendMsg(self, from_addr, to_addr, name, level, content, type='html', encode='utf-8', attach=None):
        if attach is None:
            msg = MIMEText(content, type, encode)
        else:
            msg = MIMEMultipart('mixed')
            text = MIMEText(content, type, encode)
            msg.attach(text)
            # add the image
            with open(r'test.jpg', 'rb') as f:
                image = MIMEImage(f.read())
            image.add_header('Content-ID', '<image1>')
            # image["Content-Disposition"] = 'attachment; filename="testimage.jpg"'  # show as an attachment instead
            msg.attach(image)
        msg['Subject'] = name + time.strftime("(%Y-%m-%d)", time.localtime()) + ',' + level  # email subject
        msg['From'] = from_addr
        msg['To'] = to_addr
        self.smtp.sendmail(from_addr, to_addr, msg.as_string())  # send the email


if __name__ == "__main__":
    email = EmailDemo()
    email.sendMsg("xxxx@qq.com", 'xxxx@163.com', 'null name', 'medium', '<p><img src="cid:image1"></p>',
                  "html", attach="yes")
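As a variation on the test program, the message can be built separately from the sending step, which makes the headers easy to inspect. The sketch below is not the author's code: `build_message` and `send_message` are hypothetical names, the addresses are placeholders, and `send_message` assumes the mail provider accepts SMTP over SSL (commonly port 465), which should be checked against the provider's documentation. Only `build_message` is actually run here.

```python
import smtplib
from email.header import Header
from email.mime.text import MIMEText
from email.utils import formatdate


def build_message(from_addr, to_addr, subject, html_body):
    """Build an HTML email with Subject, From, To and Date headers."""
    msg = MIMEText(html_body, 'html', 'utf-8')
    msg['Subject'] = Header(subject, 'utf-8')  # encodes non-ASCII titles safely
    msg['From'] = from_addr
    msg['To'] = to_addr
    msg['Date'] = formatdate(localtime=True)
    return msg


def send_message(msg, host, port, user, password):
    """Send over an SSL connection; host and port are assumptions to verify."""
    with smtplib.SMTP_SSL(host, port) as smtp:
        smtp.login(user, password)
        smtp.sendmail(msg['From'], [msg['To']], msg.as_string())


msg = build_message('sender@example.com', 'receiver@example.com',
                    '两数之和 (2020-09-01), Easy', '<p>hello</p>')
print(msg['From'], msg['To'])
```

Using the connection as a context manager closes it after sending, which the class above leaves to the caller.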
3. Source code:
import requests
import json
import emailDemo
"""
Automatically fetch LeetCode's daily question and send it by email to a NetEase (163) mailbox
"""
# Request headers
headers = {"accept": "*/*", "accept-encoding": "gzip, deflate, br", "accept-language":
"zh-CN,zh-TW;q=0.9,zh;q=0.8,en-US;q=0.7,en;q=0.6", "content-type": "application/json",
"origin": "https://leetcode-cn.com", "referer": "https://leetcode-cn.com/problemset/all/",
"sec-fetch-dest": "empty", "sec-fetch-mode": "cors", "sec-fetch-site": "same-origin",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36"}
# GraphQL payload to send
postdata = {"operationName": "questionOfToday", "variables": {},
"query": "query questionOfToday {\n todayRecord {\n question {\n questionFrontendId\n"
" questionTitleSlug\n __typename\n }\n lastSubmission {\n id\n"
" __typename\n }\n date\n userStatus\n __typename\n }\n}\n"}
# Serialize the payload to a JSON string
postdata = json.dumps(postdata)
res = requests.post("https://leetcode-cn.com/graphql", headers=headers, data=postdata)
# print(res.text)
# Decode the JSON response
res = json.loads(res.text)
# Slug of the daily question
problem_name = res['data']['todayRecord'][0]['question']['questionTitleSlug']
postdata = {"operationName": "questionData", "variables": {"titleSlug": problem_name},
"query": "query questionData($titleSlug: String!) {\n question(titleSlug: $titleSlug) "
"{\n questionId\n questionFrontendId\n boundTopicId\n title\n titleSlug\n"
" content\n translatedTitle\n translatedContent\n isPaidOnly\n difficulty\n"
" likes\n dislikes\n isLiked\n similarQuestions\n contributors {\n"
" username\n profileUrl\n avatarUrl\n __typename\n }\n"
" langToValidPlayground\n topicTags {\n name\n slug\n translatedName\n"
" __typename\n }\n companyTagStats\n codeSnippets {\n lang\n langSlug\n"
" code\n __typename\n }\n stats\n hints\n solution {\n id\n"
" canSeeDetail\n __typename\n }\n status\n sampleTestCase\n metaData\n"
" judgerAvailable\n judgeType\n mysqlSchemas\n enableRunCode\n envInfo\n book {\n"
" id\n bookName\n pressName\n source\n shortDescription\n"
" fullDescription\n bookImgUrl\n pressImgUrl\n productUrl\n __typename\n"
" }\n isSubscribed\n isDailyQuestion\n dailyRecordStatus\n editorType\n"
" ugcQuestionId\n style\n __typename\n }\n}\n"}
postdata = json.dumps(postdata)
res = requests.post("https://leetcode-cn.com/graphql", headers=headers, data=postdata)
# print(res.content)
res = json.loads(res.content.decode('utf-8'))
# Build the HTML content
html_content = '<html><meta charset="UTF-8"><title>{}</title><body>{}</body>' \
'</html>'.format(res['data']['question']['translatedTitle'],
res['data']['question']['translatedContent'])
name = res['data']['question']['translatedTitle']
level = res['data']['question']['difficulty']
# print(html_content)
# with open("test.html", "wb") as f:
# f.write(html_content.encode('utf-8'))
# Send the email
email = emailDemo.EmailDemo()
email.sendMsg("xxxx@qq.com", "xxxx@163.com", name, level, html_content)