This article walks through the steps using the following project layout.
mysite2
- manage.py
- mysite2
- app01
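1. Open Toutiao (今日头条), analyze the page, and scrape it
Get the request URL
Once you have worked out where the page's data comes from, build the request headers, fetch the Toutiao endpoint, and parse the response as JSON.
In each item of the response, Url is the address of the news story and Title is its headline.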
{"data":[
{
"ClusterId":7072942452532842023,
"Title":"沙特和阿联酋领导人拒接拜登电话",
"LabelUrl":"https://p26.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png",
"Label":"hot",
"Url":"https://www.toutiao.com/amos_land_page/?category_name=topic_innerflow\u0026event_type=hot_board\u0026log_pb=%7B%22category_name%22%3A%22topic_innerflow%22%2C%22cluster_type%22%3A%2210%22%2C%22enter_from%22%3A%22click_category%22%2C%22entrance_hotspot%22%3A%22outside%22%2C%22event_type%22%3A%22hot_board%22%2C%22hot_board_cluster_id%22%3A%227072942452532842023%22%2C%22hot_board_impr_id%22%3A%222022030918321201021216216025C743EE%22%2C%22jump_page%22%3A%22hot_board_page%22%2C%22location%22%3A%22news_hot_card%22%2C%22page_location%22%3A%22hot_board_page%22%2C%22rank%22%3A%221%22%2C%22source%22%3A%22trending_tab%22%2C%22style_id%22%3A%2240132%22%2C%22title%22%3A%22%E6%B2%99%E7%89%B9%E5%92%8C%E9%98%BF%E8%81%94%E9%85%8B%E9%A2%86%E5%AF%BC%E4%BA%BA%E6%8B%92%E6%8E%A5%E6%8B%9C%E7%99%BB%E7%94%B5%E8%AF%9D%22%7D\u0026rank=1\u0026style_id=40132\u0026topic_id=7072942452532842023",
"HotValue":"6753999",
"Schema":"",
"LabelUri":{
"uri":"mosaic-legacy/2b29200041b9c651e8148",
"url":"https://p26.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png",
"width":200,
"height":200,
"url_list":[
{"url":"https://p26.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png"},
{"url":"https://p3.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png"},
{"url":"https://p9.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png"}
],
"image_type":1
},
"ClusterIdStr":"7072942452532842023",
"ClusterType":10,
"QueryWord":"沙特和阿联酋领导人拒接拜登电话",
"InterestCategory":["international"],
"Image":{
"uri":"tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999",
"url":"https://p6.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png",
"width":0,
"height":0,
"url_list":[
{"url":"https://p6.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png"},
{"url":"https://p9.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png"},
{"url":"https://p3.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png"}
],
"image_type":1
},
"LabelDesc":"热门事件"
},
{
},
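To make the role of the two fields concrete, here is a minimal sketch that pulls (Title, Url) pairs out of a response shaped like the sample above. The function name extract_headlines and the trimmed payload are made up for illustration; only the "data", "Title", and "Url" keys come from the sample.

```python
import json


def extract_headlines(payload):
    """Return (title, url) pairs from a hot-board style response dict."""
    headlines = []
    for item in payload.get("data", []):
        title = item.get("Title")
        url = item.get("Url")
        # Skip empty placeholders or items missing either field.
        if title and url:
            headlines.append((title, url))
    return headlines


# A trimmed, made-up payload in the same shape as the sample above.
# The raw string keeps \u0026 intact so that the JSON parser decodes it.
sample = json.loads(
    r'{"data": [{"Title": "Example headline", '
    r'"Url": "https://example.com/a?x=1\u0026y=2"}, {}]}'
)
print(extract_headlines(sample))
```

Note that the \u0026 sequences in the raw response are just JSON escapes for &, so after parsing the Url value is an ordinary query string.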
2. Add a function to app01/views.py that scrapes the news and displays it
import requests
from django.shortcuts import render


# Scrape the Toutiao hot board and display each headline as a link
def news(req):
    url = 'https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc&_signature=_02B4Z6wo00f01yG9tdQAAIDCQrd1vxaJp9chmbFAAKpR4Dqk0c56dkhdlvNsoD3I03ygIjgUcxkM0VcFYKfO0a9iJRjnl1M9yxZvlq-pgzUXDOrpi1wKoYlCVC9.llzChJ7GmTYXIDMvE.c1a6'
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    }
    # A timeout keeps the view from hanging if Toutiao does not respond
    res = requests.get(url=url, headers=headers, timeout=10)
    data_all_dict = res.json()
    # The list of news items lives under the top-level "data" key
    data_lists = data_all_dict['data']
    return render(
        req,
        'news.html',
        {
            "news_dicts": data_lists
        }
    )
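The view above calls the endpoint directly, so any network or decoding error surfaces as a server error. One possible way to soften that is sketched below, using only the standard library so it runs without extra packages. The helper names fetch_json and load_hot_board are hypothetical, and falling back to an empty list is an assumption about the desired behavior, not something the original view does.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, headers, timeout=10):
    """Fetch a URL and decode the response body as JSON."""
    request = Request(url, headers=headers)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def load_hot_board(url, headers, fetch=fetch_json):
    """Return the 'data' list, or [] if the request or decoding fails."""
    try:
        payload = fetch(url, headers)
    except (OSError, ValueError):
        # OSError covers network and HTTP errors raised by urllib;
        # ValueError covers malformed JSON and bad byte sequences.
        return []
    if not isinstance(payload, dict):
        return []
    data = payload.get("data", [])
    return data if isinstance(data, list) else []
```

Inside the view, the result of load_hot_board(url, headers) could be passed to render as news_dicts; with an empty list the template simply renders no items. The fetch parameter exists so the helper can be exercised with a fake function instead of a live request.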
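3. Create a news.html file in the app01/templates folder
The inline style='text-decoration:none;color:black' removes the underline from each link and turns its text black. Django's template language is then used to loop over the list of news dictionaries and output each one.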
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<h1>今日头条</h1>
<ul>
{% for news in news_dicts %}
<li>
<a style='text-decoration:none;color:black' href="{{ news.Url }}" target="_blank">{{ news.Title }}</a><br>
</li>
{% endfor %}
</ul>
</body>
</html>
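For readers new to Django templates, the loop in the template is roughly equivalent to the plain-Python sketch below. This is not how Django renders templates internally; render_news_list is a made-up name, and html.escape stands in for the escaping that Django's template engine applies to variable output by default.

```python
from html import escape


def render_news_list(news_dicts):
    """Build <ul> markup like the template loop, escaping each value."""
    items = []
    for news in news_dicts:
        items.append(
            '<li><a style="text-decoration:none;color:black" '
            f'href="{escape(news["Url"])}" target="_blank">'
            f'{escape(news["Title"])}</a></li>'
        )
    return "<ul>" + "".join(items) + "</ul>"


print(render_news_list([
    {"Title": "A < B", "Url": "https://example.com/?a=1&b=2"},
]))
```

The escaping matters here because the Url values contain & characters, which should appear as &amp; inside an HTML attribute.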
4. In mysite2/urls.py, map the URL to the view function
path('news/', views.news)
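For context, the line above belongs inside the urlpatterns list. A minimal sketch of the whole file might look like the following, assuming app01 is importable from the project root and is listed in INSTALLED_APPS so that its templates folder is found; any other routes already in the file are omitted here.

```python
# mysite2/urls.py (sketch; other existing routes are omitted)
from django.urls import path

from app01 import views

urlpatterns = [
    # Requests for /news/ are handled by the news view in app01/views.py
    path('news/', views.news),
]
```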
5. Start the server with python manage.py runserver 0.0.0.0:8000, then open http://127.0.0.1:8000/news/ in a browser to check that the page loads.