Code for scraping Douban party event data

a185934139

Category: Homework

Article link: https://blog.csdn.net/a185934139/article/details/81974786

Douban:

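from bs4 import BeautifulSoup
import requests
from day05.mysqlhelper import MysqlHelper

url = 'https://beijing.douban.com/events/week-party'

# Assumption: the site may reject requests without a browser-like User-Agent.
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
helper = MysqlHelper()
soup = BeautifulSoup(response.text, 'lxml')
# Uncomment to save the raw page for offline inspection:
# with open('douban.html', 'wb') as f:
#     f.write(response.content)

# The event list is the <ul> whose class attribute is exactly this string.
ul_tag = soup.find('ul', class_='events-list events-list-pic100 events-list-psmall')
if ul_tag is None:
    raise SystemExit('Event list not found; the page layout may have changed.')
li_tags = ul_tag.find_all('li', class_='list-entry')


def clean(text):
    """Remove newlines and spaces from scraped text."""
    return text.replace('\n', '').replace(' ', '')


for li_tag in li_tags:
    title = li_tag.select('div.title > a > span')[0].text
    # Named event_time so it does not shadow the standard `time` module.
    event_time = clean(li_tag.select('li.event-time')[0].text)
    # The 2nd and 4th <li> under ul.event-meta hold the address and the organizer.
    address = clean(li_tag.select('ul.event-meta > li:nth-of-type(2)')[0].text)
    fee = clean(li_tag.select('li.fee')[0].text)
    owner = clean(li_tag.select('ul.event-meta > li:nth-of-type(4)')[0].text)
    data = (title, event_time, address, fee, owner)
    # Parameterized insert; `time` is backquoted because it is also a MySQL type name.
    sql = 'insert into douban(title, `time`, address, fee, owner) values(%s, %s, %s, %s, %s)'
    helper.execute_modify_sql(sql, data)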
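
The post does not show `day05.mysqlhelper`, so the following is only a guess at what `MysqlHelper.execute_modify_sql` might look like, not the author's actual class. It assumes the `pymysql` driver, and the connection settings (`localhost`, `root`, an empty password, a database named `test`) are placeholders. The `connect` parameter is a hypothetical addition that lets a different connection factory be passed in, for example when trying the class without a running MySQL server.

```python
class MysqlHelper:
    """Hypothetical stand-in for day05.mysqlhelper.MysqlHelper (not shown in the post)."""

    def __init__(self, connect=None, **config):
        # `connect` is any callable returning a DB-API style connection.
        # By default pymysql.connect is used (imported lazily).
        if connect is None:
            import pymysql
            connect = pymysql.connect
        self._connect = connect
        # Placeholder settings (assumptions); override them via keyword arguments.
        self._config = {
            'host': 'localhost',
            'user': 'root',
            'password': '',
            'database': 'test',
            'charset': 'utf8mb4',
        }
        self._config.update(config)

    def execute_modify_sql(self, sql, params=None):
        """Run one INSERT/UPDATE/DELETE and return the affected row count."""
        conn = self._connect(**self._config)
        try:
            with conn.cursor() as cursor:
                affected = cursor.execute(sql, params)
            conn.commit()
            return affected
        except Exception:
            # Undo the partial change, then let the caller see the error.
            conn.rollback()
            raise
        finally:
            conn.close()
```

This sketch opens a new connection for every statement, which keeps it simple but is slow for large batches; reusing one connection for the whole loop would be a reasonable alternative.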

Chrome Options:

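from selenium import webdriver
import time

# Run Chrome without opening a visible browser window.
options_chrome = webdriver.ChromeOptions()
options_chrome.add_argument('--headless')

# Current Selenium releases take `options=`; the older `chrome_options=` keyword was deprecated and later removed.
driver = webdriver.Chrome(options=options_chrome)
try:
    url = 'http://www.baidu.com'
    driver.get(url)
    time.sleep(1)  # give the page a moment to finish rendering

    # Save the rendered HTML as UTF-8 bytes.
    with open('baidu.html', 'wb') as f:
        f.write(driver.page_source.encode('utf-8'))
finally:
    driver.quit()  # always shut down the browser process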
