Python Exercise: Scraping the Maoyan Movie TOP100

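Requirements:

Scrape the TOP100 movies listed at http://maoyan.com/board/4?offset=0, collecting for each one (1) the movie title, (2) the cast, (3) the release date, and (4) the image, and write the results to a database.

The code is as follows: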

#!/usr/bin/env python
# coding:utf-8

import re
from urllib import request

import pymysql

url = r'http://maoyan.com/board/4?offset='


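def get_content(url):
    """Download one page and return its HTML with all spaces removed."""
    with request.urlopen(url) as f:
        # Spaces are stripped so the patterns below can match tags such as
        # '<pclass="name">' without having to allow for whitespace.
        content = f.read().decode('utf-8').replace(' ', '')
        return content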


def create_url(url):
    """Return the ten page URLs (offset=0, 10, ..., 90) covering the TOP100."""
    return [url + '%d' % i for i in range(0, 100, 10)]


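def get_film(content):
    """Return the movie titles found in the space-stripped HTML."""
    # Non-greedy groups keep each match inside a single tag.
    pattern = r'<pclass="name"><ahref=".*?"title="(.*?)"data-act="boarditem-click"data-val="{movieId:.*?}">'
    return re.findall(pattern, content)


def get_date(content):
    """Return the release dates; a trailing region such as '(...)' is kept."""
    # The greedy group captures everything up to the closing tags, so any
    # region in parentheses is still part of the result and is removed later.
    pattern = r'<pclass="releasetime">上映时间:(.*)</p></div>'
    return re.findall(pattern, content)


def get_act(content):
    """Return the cast strings (the rest of each line after '主演:')."""
    pattern = r'主演:(.+)'
    return re.findall(pattern, content)


def get_purl(content):
    """Return the image URLs taken from the data-src attribute."""
    pattern = r'<imgdata-src="(.+?)"alt=".*?"class="board-img"/>'
    return re.findall(pattern, content)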


if __name__ == '__main__':
    # First run: download the ten pages and cache them in a local file.
    # urls = create_url(url)
    # with open('E:\\wenjian.txt', 'a+', encoding='utf-8') as f:
    #     for url in urls:
    #         f.write(get_content(url))

    # Later runs: parse the cached HTML instead of requesting the site again.
    with open('E:\\wenjian.txt', 'r', encoding='utf-8') as f:
        neirong = f.read()
        films = get_film(neirong)
        acts = get_act(neirong)
        dates = get_date(neirong)
        print(dates)
        # Remove the region in parentheses from every date, including the last one.
        for i in range(len(dates)):
            if '(' in dates[i]:
                dates[i] = re.sub(r'\(.+\)', '', dates[i])
        print(dates[12]+'-01-01')
        for i in range(0,len(dates)-1):
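
The listing breaks off before the dates are normalised and the rows are written to the database. The block below is a minimal sketch of how those two remaining steps might look, not the author's code. It assumes that year-only dates are padded with '-01-01' (as the print statement above suggests) and, as a further guess, that year-month dates are padded with '-01'. The helpers `normalize_date` and `save_films`, the `films` table, and its column names are all hypothetical. To keep the example self-contained it uses the standard-library `sqlite3` module as a stand-in for `pymysql`; with `pymysql` the same statements would go through a cursor and use `%s` placeholders instead of `?`.

```python
import re
import sqlite3


def normalize_date(raw):
    """Strip a region in parentheses and pad partial dates to YYYY-MM-DD."""
    date = re.sub(r'\(.*?\)', '', raw).strip()
    if re.fullmatch(r'\d{4}', date):
        # Year only: assume 1 January.
        return date + '-01-01'
    if re.fullmatch(r'\d{4}-\d{2}', date):
        # Year and month only: assume the first day of the month.
        return date + '-01'
    return date


def save_films(conn, films, acts, dates, purls):
    """Insert one row per film and return the number of rows written."""
    # Hypothetical table layout; adjust names and types to the real schema.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS films ('
        'name TEXT, actors TEXT, release_date TEXT, poster_url TEXT)'
    )
    rows = [
        (film, act, normalize_date(date), purl)
        for film, act, date, purl in zip(films, acts, dates, purls)
    ]
    # Parameterised statements avoid building SQL by string concatenation.
    conn.executemany('INSERT INTO films VALUES (?, ?, ?, ?)', rows)
    conn.commit()
    return len(rows)


if __name__ == '__main__':
    # Placeholder values only; real input would come from get_film() etc.
    conn = sqlite3.connect(':memory:')
    count = save_films(
        conn,
        ['Film A'],
        ['Actor X,Actor Y'],
        ['1993(Region)'],
        ['http://example.com/poster.jpg'],
    )
    print(count, conn.execute('SELECT * FROM films').fetchall())
```

Because `zip` stops at the shortest list, a page on which one of the patterns fails to match would silently drop rows, so it may be worth checking that the four lists have the same length before saving.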