Django Project Practice (Scraping the Toutiao Hot Board)

夕阳之后的黑夜

Category: Django. Tags: python, programming language

Article link: https://blog.csdn.net/qq_41048761/article/details/123441543

This article uses the following project layout as its running example.

mysite2

- manage.py

- mysite2

- app01

1. Open Toutiao, analyze the page, and scrape it

Get the request URL

After working out where the site's data comes from, build the request headers, fetch the Toutiao hot board, and parse the response as JSON.

In the response, Url is the address of the news item and Title is its headline.

{"data":[
                {
                    "ClusterId":7072942452532842023,
                    "Title":"沙特和阿联酋领导人拒接拜登电话",
                    "LabelUrl":"https://p26.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png",
                    "Label":"hot",
                    "Url":"https://www.toutiao.com/amos_land_page/?category_name=topic_innerflow\u0026event_type=hot_board\u0026log_pb=%7B%22category_name%22%3A%22topic_innerflow%22%2C%22cluster_type%22%3A%2210%22%2C%22enter_from%22%3A%22click_category%22%2C%22entrance_hotspot%22%3A%22outside%22%2C%22event_type%22%3A%22hot_board%22%2C%22hot_board_cluster_id%22%3A%227072942452532842023%22%2C%22hot_board_impr_id%22%3A%222022030918321201021216216025C743EE%22%2C%22jump_page%22%3A%22hot_board_page%22%2C%22location%22%3A%22news_hot_card%22%2C%22page_location%22%3A%22hot_board_page%22%2C%22rank%22%3A%221%22%2C%22source%22%3A%22trending_tab%22%2C%22style_id%22%3A%2240132%22%2C%22title%22%3A%22%E6%B2%99%E7%89%B9%E5%92%8C%E9%98%BF%E8%81%94%E9%85%8B%E9%A2%86%E5%AF%BC%E4%BA%BA%E6%8B%92%E6%8E%A5%E6%8B%9C%E7%99%BB%E7%94%B5%E8%AF%9D%22%7D\u0026rank=1\u0026style_id=40132\u0026topic_id=7072942452532842023",
                    "HotValue":"6753999",
                    "Schema":"",
                    "LabelUri":{
                        "uri":"mosaic-legacy/2b29200041b9c651e8148",
                        "url":"https://p26.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png",
                        "width":200,
                        "height":200,
                        "url_list":[
                            {"url":"https://p26.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png"},
                            {"url":"https://p3.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png"},
                            {"url":"https://p9.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png"}
                        ],
                        "image_type":1
                        },
                    "ClusterIdStr":"7072942452532842023",
                    "ClusterType":10,
                    "QueryWord":"沙特和阿联酋领导人拒接拜登电话",
                    "InterestCategory":["international"],
                    "Image":{
                        "uri":"tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999",
                        "url":"https://p6.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png",
                        "width":0,
                        "height":0,
                        "url_list":[
                            {"url":"https://p6.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png"},
                            {"url":"https://p9.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png"},
                            {"url":"https://p3.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png"}
                            ],
                        "image_type":1
                        },
                    "LabelDesc":"热门事件"
                },
                {
                    ...
                },
                ...
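As a quick check of the structure above, the following sketch pulls the Title and Url of each entry out of a parsed response. The function name `extract_headlines` and the trimmed sample payload (including its shortened URL) are illustrative assumptions, not part of the original project. Entries that lack either field are skipped.

```python
import json


def extract_headlines(payload):
    """Return (Title, Url) pairs from a parsed hot-board response."""
    headlines = []
    for item in payload.get("data", []):
        title = item.get("Title")
        url = item.get("Url")
        # Skip placeholder or incomplete entries
        if title and url:
            headlines.append((title, url))
    return headlines


# Trimmed sample in the same shape as the response above (shortened, illustrative URL)
sample = json.loads(
    '{"data": [{"Title": "沙特和阿联酋领导人拒接拜登电话", '
    '"Url": "https://www.toutiao.com/amos_land_page/?rank=1\\u0026style_id=40132"}, {}]}'
)

for title, url in extract_headlines(sample):
    print(title, url)
```

Note that the JSON parser turns the escaped `\u0026` in the raw response back into a plain `&`, so the extracted URL can be used as a link directly.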

2. Add a function to app01/views.py that scrapes the news and displays it

import requests
from django.shortcuts import render


# Scrape the Toutiao hot board and render each headline as a link
def news(req):
    url = 'https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc&_signature=_02B4Z6wo00f01yG9tdQAAIDCQrd1vxaJp9chmbFAAKpR4Dqk0c56dkhdlvNsoD3I03ygIjgUcxkM0VcFYKfO0a9iJRjnl1M9yxZvlq-pgzUXDOrpi1wKoYlCVC9.llzChJ7GmTYXIDMvE.c1a6'
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", }
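    # A timeout keeps the view from hanging if Toutiao does not respond
    res = requests.get(url=url, headers=headers, timeout=10)
    # res.json() already returns a dict; the hot-board entries are under "data"
    data_lists = res.json()['data']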
    return render(
        req, 
        'news.html', 
        {
            "news_dicts":data_lists
        }
    )
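If the request fails or the response has no data key, the view above ends in an unhandled exception. One possible way to make that step more forgiving is to move it into a helper that falls back to an empty list. This is only a sketch: `hot_board_items` and its `fetch_json` parameter are hypothetical names, and the fetch step is passed in as a callable so the helper can be tried without network access.

```python
def hot_board_items(fetch_json):
    """Return the list under "data", or [] if fetching or parsing fails.

    fetch_json is any zero-argument callable that returns the parsed
    JSON response (a hypothetical interface chosen for this sketch).
    """
    try:
        payload = fetch_json()
    except Exception:
        # Network errors, timeouts, and invalid JSON all end up here
        return []
    if not isinstance(payload, dict):
        return []
    items = payload.get("data")
    return items if isinstance(items, list) else []


def failing_fetch():
    # Stand-in for a request that times out
    raise TimeoutError("simulated timeout")


print(hot_board_items(lambda: {"data": [{"Title": "t", "Url": "u"}]}))
print(hot_board_items(failing_fetch))
```

Inside the view, the callable could be something like `lambda: requests.get(url, headers=headers, timeout=10).json()`, so the template would render an empty list instead of the request ending in an error page.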

3. Create a news.html file in the app01/templates folder

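The inline style style='text-decoration:none;color:black' removes the underline from each link and makes the link text black. Django's template language then loops over the list of news dictionaries and outputs one link per item.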

<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Title</title>
    </head>
    <body>
        <h1>今日头条</h1>
        <ul>
            {% for news in news_dicts %}
                <li>
                    <a style='text-decoration:none;color:black' href="{{ news.Url }}" target="_blank">{{ news.Title }}</a><br>
                </li>
            {% endfor %}
        </ul>
    </body>
</html>
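For intuition about what the {% for %} loop produces, here is a rough plain-Python equivalent. It is an illustration only, not how Django renders templates internally: `render_items` is a hypothetical name, and `html.escape` stands in for the automatic HTML escaping Django applies to {{ ... }} values by default.

```python
from html import escape


def render_items(news_dicts):
    """Build one <li> per news item, escaping values as the template would."""
    lines = []
    for news in news_dicts:
        href = escape(news["Url"])
        title = escape(news["Title"])
        lines.append(f'<li><a href="{href}" target="_blank">{title}</a></li>')
    return "\n".join(lines)


# Made-up item; the "&" characters become "&amp;" in the generated HTML
print(render_items([{"Title": "A & B", "Url": "https://example.com/?a=1&b=2"}]))
```

The browser decodes `&amp;` back to `&` when following the link, so the escaped URL still points at the right page.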

4. Map a URL to the view function in mysite2/urls.py
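A minimal sketch of such a mapping, assuming Django 2.0 or later (for `path`) and the default admin route generated by `startproject`. The route `news/` is an illustrative choice; any path can be used as long as it points at `views.news`.

```python
from django.contrib import admin
from django.urls import path

from app01 import views

urlpatterns = [
    path('admin/', admin.site.urls),
    # 'news/' is an assumed route name; it dispatches to the news view in app01/views.py
    path('news/', views.news),
]
```

With this mapping and the development server running, the hot board page would be served under /news/.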