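Scraping the Douban Top 250 with bs4 in PyCharm and storing it in a database through Django

I have been learning Python recently and found it quite fun, so I wrote this post to consolidate what I learned.

Create a new Django project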

django-admin startproject HelloWord

Add an app to the new Django project

python manage.py startapp Hello

Register the app in the Django project
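
Registering an app usually means adding it to INSTALLED_APPS in the project's settings.py. The excerpt below is a minimal sketch, assuming the app is named Hello as in the startapp command above. The complete code later in this post imports from an app named polls, so use whichever name your app really has.

```python
# settings.py (excerpt)
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "Hello",  # assumption: the app created with `python manage.py startapp Hello`
]
```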

Connect the Django project to the database
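
This post does not say which database is used. As a sketch only, assuming Django's built-in SQLite backend, the DATABASES setting in settings.py might look like the excerpt below. For MySQL or PostgreSQL the ENGINE value would differ and fields such as USER, PASSWORD, HOST and PORT would be needed.

```python
# settings.py (excerpt)
DATABASES = {
    "default": {
        # Assumption: SQLite, which needs no separate database server.
        "ENGINE": "django.db.backends.sqlite3",
        # A generated settings.py normally writes this as BASE_DIR / "db.sqlite3".
        "NAME": "db.sqlite3",
    }
}
```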

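Create the database model in models

python manage.py makemigrations

# generate the migration files for the model

python manage.py migrate

# apply the migrations, creating the table in the database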

With the preparation done, on to the code. First, create a file under the app to hold the code

Import the required packages

Fetch the page source
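
Fetching the source comes down to one requests.get call per list page. The sketch below is not the exact code of this post: the helper names page_url and fetch_page are made up for illustration, the timeout and the raise_for_status check are additions, and the Cookie header from the complete code is left out.

```python
import requests

BASE_URL = "https://movie.douban.com/top250"

# A browser-like User-Agent; the site may reject requests that do not send one.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"
    )
}


def page_url(start):
    """Build the URL of the list page that begins at offset `start`."""
    return f"{BASE_URL}?start={start}&filter="


def fetch_page(start, timeout=10):
    """Download one list page and return its HTML as text."""
    response = requests.get(page_url(start), headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text
```

Calling fetch_page(0) would return the HTML of the first page, which can then be handed to BeautifulSoup.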

Parse the data
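
To show how the CSS selectors pick out each field, here is a small sketch that runs on a made-up HTML fragment. The fragment only mimics the class names used by the selectors in this post and is not a copy of the real page; the function name parse_items and the dictionary keys are also hypothetical. It uses the built-in html.parser instead of lxml so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like one entry of the list page.
SAMPLE_HTML = """
<ol class="grid_view">
  <li>
    <div class="item">
      <div class="pic"><em>1</em><img src="https://example.com/poster.jpg"></div>
      <div class="info">
        <div class="hd"><a href="#"><span>Sample Title</span><span>/ Alias</span></a></div>
        <div class="bd">
          <p>Director: Someone</p>
          <div class="star"><span></span><span>9.7</span><span></span><span>100 ratings</span></div>
          <p class="quote"><span>A short comment.</span></p>
        </div>
      </div>
    </div>
  </li>
</ol>
"""


def parse_items(html):
    """Return one dict per `.item` element found in the page."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for item in soup.select(".grid_view .item"):
        titles = [span.get_text(strip=True) for span in item.select(".hd span")]
        stars = item.select(".star span")
        quote = item.select_one(".quote span")  # may be missing
        movies.append({
            "rank": item.select_one(".pic em").get_text(strip=True),
            "name": " ".join(titles),
            "intro": item.select_one(".bd p").get_text(strip=True),
            "score": stars[1].get_text(strip=True),
            "votes": stars[3].get_text(strip=True),
            "quote": quote.get_text(strip=True) if quote else "",
            "image": item.select_one(".pic img")["src"],
        })
    return movies
```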

Save the scraped data to the database
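
The complete code in this post saves each movie by building a DouBan object and calling save(). As an alternative sketch, the Django manager method update_or_create can be used so that running the scraper twice updates existing rows instead of failing or duplicating them. The function name save_movie and the keys of the movie dictionary are assumptions for illustration; the model class is passed in as a parameter so the snippet does not depend on a configured project.

```python
def save_movie(model, movie):
    """Insert or update one row, keyed on the movie's rank.

    `model` is expected to be a Django model class such as DouBan.
    Returns True if a new row was created, False if one was updated.
    """
    _, created = model.objects.update_or_create(
        id=int(movie["rank"]),
        defaults={
            "MovieName": movie["name"],
            "MovieIntroduction": movie["intro"],
            "MovieScore": movie["score"],
            "NumberOfPeople": movie["votes"],
            "ShortComment": movie["quote"],
            "Link": movie["image"],
        },
    )
    return created
```

Inside the project it would be called as save_movie(DouBan, movie) for each parsed movie.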

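Add a view in views

Add a URL mapping for the view

Start the development server with python manage.py runserver and visit the love route in a browser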

The complete code:

Code in models

from django.db import models


class DouBan(models.Model):
    MovieName = models.CharField(max_length=1000, verbose_name='Movie name')
    MovieIntroduction = models.CharField(max_length=1000, verbose_name='Movie introduction')
    MovieScore = models.CharField(max_length=1000, verbose_name='Movie score')
    NumberOfPeople = models.CharField(max_length=1000, verbose_name='Number of ratings')
    ShortComment = models.CharField(max_length=1000, verbose_name='Short comment')
    Link = models.CharField(max_length=1000, verbose_name='Image link')

Code in the Python file I created

from bs4 import BeautifulSoup
import requests
from polls.models import DouBan


def chen(start):
    # Fetch one list page (25 movies starting at offset `start`), parse it and save each movie.
    h = requests.get(f'https://movie.douban.com/top250?start={start}&filter=',
                     headers={
                         'Cookie': 'll="118267"; bid=jw15TB8sVUw; _vwo_uuid_v2=D94CBBA27E5366B325CE07C9F52A0221E|cf24c085ef36d3b51b61aa1947842da0; __gads=ID=c4286c505b3ad3c6-22c6e40469ce008c:T=1635591422:RT=1635591422:S=ALNI_MZGBLD2re_6ml_7VGQq8XVtZZvLfQ; __utma=30149280.1023561960.1635591410.1635736050.1637721114.8; __utmc=30149280; __utmz=30149280.1637721114.8.3.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); __utma=223695111.1501362362.1635591410.1635738021.1637721162.9; __utmc=223695111; __utmz=223695111.1637721162.9.4.utmcsr=douban.com|utmccn=(referral)|utmcmd=referral|utmcct=/; _pk_ref.100001.4cf6=%5B%22%22%2C%22%22%2C1637721163%2C%22https%3A%2F%2Fwww.douban.com%2F%22%5D; ap_v=0,6.0; ct=y; _pk_id.100001.4cf6=b3c785ff79730c1c.1635591410.9.1637721269.1635738021.'
                         ,
                         'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
                     })
    soup = BeautifulSoup(h.text, features='lxml')
    # Each movie on the page is one `.item` element inside `.grid_view`.
    scs = soup.select('.grid_view .item')
    for i in scs:
        pm = None  # rank; stays None if parsing fails before it is read
        try:
            pm = i.select_one('.pic em').text  # rank in the list
            img = i.select_one('.pic img')['src']  # poster image URL
            # The title block holds two or three spans; the third is optional.
            spans = i.select('.hd span')
            x = spans[0].text
            xi = spans[1].text
            xin = spans[2].text if len(spans) > 2 else ''
            xinxi = f'{x} {xi} {xin}'
            jianjie = i.select_one('.bd p').get_text().replace("\n", "").strip()
            stars = i.select('.star span')
            ping = stars[1].text  # score
            pingfen = stars[3].text  # number of ratings
            # Not every entry has a short comment, so fall back to an empty string.
            quote = i.select_one('.quote span')
            duanping = quote.text if quote else ''
            douban = [pm, xinxi, jianjie, ping, pingfen, duanping, img]
            print(douban)

            DouBan(id=pm, MovieName=xinxi, MovieIntroduction=jianjie, MovieScore=ping, NumberOfPeople=pingfen,
                   ShortComment=duanping, Link=img).save()
        except Exception as e:
            # Report the failure and continue with the next movie.
            print(f"Error on rank {pm}: {e}")

def Fei():
    # Walk through all 10 pages of the Top 250, 25 movies per page (start = 0, 25, ..., 225).
    for start in range(0, 250, 25):
        chen(start)
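
The list URL takes a start offset, and the 25-step increment in the loop reflects 25 movies per page, so covering all 250 entries means ten requests. A small sketch of that arithmetic, with the hypothetical helper name page_offsets:

```python
PAGE_SIZE = 25   # movies per list page
TOTAL = 250      # size of the Top 250 list


def page_offsets(total=TOTAL, page_size=PAGE_SIZE):
    """Return the `start` value of every page needed to cover `total` movies."""
    return list(range(0, total, page_size))


print(page_offsets())
```

The last offset is 225, which covers ranks 226 to 250.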

Code in urls

from django.urls import path
from polls import views

urlpatterns = [
    path('love', views.love),
]

Code in views

from django.http import HttpResponse
# `spider` is a placeholder: replace it with the name of the scraper file you created in the app.
from polls.spider import Fei


def love(request):
    # Visiting this view runs the scraper, then returns an empty response.
    Fei()
    return HttpResponse()

I'm a beginner, so if you spot any mistakes, please point them out in the comments below!
