A Zhihu Crawler: Lessons Learned (Tutorial)

Latest recommended article published 2024-09-30 17:08:00

瑶瑶摇到外婆桥

Categories: Machine Learning, python

Article link: https://blog.csdn.net/weixin_41229479/article/details/88879917

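This article shares my experience writing a Zhihu crawler. It walks through importing the Python packages, reading the user URLs, creating a CSV file to store the data, iterating over user profile pages and parsing them, and writing the extracted information to the CSV file. User information is scraped with for loops and by locating elements on each page, which lays the groundwork for later data processing and analysis.

(Summary generated automatically by CSDN.)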
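The CSV-storage step described in the summary can be illustrated with Python's standard `csv` module. This is a minimal sketch, not the author's code. The helper `save_users_to_csv`, the column names `user_id`, `name` and `answer_count`, and the sample records are all hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical column layout; the real fields depend on what is scraped.
FIELDNAMES = ["user_id", "name", "answer_count"]


def save_users_to_csv(path, users):
    """Create (or overwrite) a CSV file at `path`, write a header row, then one row per user dict."""
    # newline="" stops the csv module from producing blank lines between rows on Windows;
    # utf-8-sig prepends a BOM so Excel recognizes the file as UTF-8 (Chinese names display correctly).
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for user in users:
            writer.writerow(user)


if __name__ == "__main__":
    # Hypothetical records standing in for parsed profile data.
    users = [
        {"user_id": "u1", "name": "张三", "answer_count": 12},
        {"user_id": "u2", "name": "李四", "answer_count": 3},
    ]
    path = os.path.join(tempfile.gettempdir(), "zhihu_users.csv")
    save_users_to_csv(path, users)
    with open(path, newline="", encoding="utf-8-sig") as f:
        print(list(csv.DictReader(f)))
```

Opening the file with `newline=""` follows the `csv` module's documented recommendation. Choosing `utf-8-sig` is a judgment call that mainly matters when the CSV is later opened in Excel.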

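For a course research paper, I spent the summer break writing a crawler for Zhihu and scraped every piece of data I could reach on users' profile pages and answers. Below I share the pitfalls I ran into, in the hope that you can avoid some of them.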

1. Import the necessary Python packages, which provide the functions used in the following steps:

import re
import urllib3
from bs4 import BeautifulSoup
from selenium import webdriver
# FirefoxBinary is an older Selenium API that recent Selenium 4 releases may no longer include
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
import time
import sys
import pymysql  # connect to the MySQL database
import random   # pick request headers at random
import string   # string handling when converting data types
import sqlite3  # SQLite database access
import csv      # write results to CSV files
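The `random` import is there to pick request headers at random. Here is a minimal sketch of that idea, assuming a hand-maintained pool of User-Agent strings. The list below and the helper `random_headers` are hypothetical examples, not values from the original crawler.

```python
import random

# Hypothetical pool of User-Agent strings; replace with real, current browser strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
]


def random_headers(rng=random):
    """Return a request-header dict whose User-Agent is chosen at random from USER_AGENTS."""
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Language": "zh-CN,zh;q=0.9",
    }


if __name__ == "__main__":
    print(random_headers())
```

With urllib3 the resulting dict can be passed as `urllib3.PoolManager().request("GET", url, headers=random_headers())`. A common companion is a randomized pause between requests, such as `time.sleep(random.uniform(1, 3))`, to keep the request rate modest. The 1 to 3 second range is only an example.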

2. Read each user's URL from the database, so that we can later open the user's profile page directly and parse it:
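This step reads user ids from a MySQL `user` table. Because the connection details are elided, the sketch below reproduces the same idea against an in-memory `sqlite3` database so that it runs without a server. It illustrates one detail that carries over to PyMySQL: `fetchall()` returns one tuple per row, so each id must be unpacked from its 1-tuple before use. The profile URL pattern `https://www.zhihu.com/people/<id>` and the helper `load_profile_urls` are assumptions for illustration. Whether the stored `id` is the value that belongs in the URL is not stated in the original.

```python
import sqlite3

# Assumed profile URL pattern; adjust to whatever the stored id actually represents.
PROFILE_URL = "https://www.zhihu.com/people/{}"


def load_profile_urls(conn):
    """Read every id from the `user` table and turn it into a profile URL (hypothetical helper)."""
    cursor = conn.cursor()
    try:
        cursor.execute("SELECT id FROM user")
        rows = cursor.fetchall()  # one tuple per row, e.g. ('alice',)
    finally:
        cursor.close()
    # Unpack the single column from each row tuple before formatting the URL.
    return [PROFILE_URL.format(row[0]) for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
    conn.execute("CREATE TABLE user (id TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO user (id) VALUES (?)", [("alice",), ("bob",)])
    conn.commit()
    print(load_profile_urls(conn))
    conn.close()
```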

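# Credentials are elided in the original; newer PyMySQL releases expect keyword arguments
db1 = pymysql.connect(host="...", user="...", password="...", database="...")
cursor = db1.cursor()
sql_0 = "SELECT id FROM user"
cursor.execute(sql_0)
result_id = cursor.fetchall()  # ids of users already crawled, one tuple per row
result = list(result_id)       # still a list of 1-tuples, e.g. [(id1,), (id2,)]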
db1.clo