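[Hands-On Crawler Notes]
Preface
I started from zero too when I set out to build a crawler that scrapes movie details, posters, and other information from Douban Movies. At first I was impatient for quick results and looked for a video I could simply follow along with and end up with a working crawler, but I could not find one. So I pulled together what I learned from a number of articles and am sharing it below. If you follow along, you should be able to build one too.
Environment Setup
Install PyCharm and Python 3.6 ***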
https://blog.csdn.net/weixin_38285131/article/details/79427168
Download:
http://www.jetbrains.com/pycharm/download/#section=windows
Installation process:
Activation: PyCharm Community Edition is free and is sufficient for this project, so no activation code is needed.
Install Python ***
Install wheel, lxml, Twisted, and Scrapy ***
References:
https://www.cnblogs.com/MC-Curry/p/8503813.html
https://baijiahao.baidu.com/s?id=1597465401467369572&wfr=spider&for=pc
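Fixing the pip upgrade error:
python -m pip install --upgrade pip
Install pywin32 ***
Reference: https://www.cnblogs.com/yrqiang/p/5295252.html
Fix for "ImportError: DLL load failed: %1 is not a valid Win32 application" (this usually points to a 32-bit/64-bit mismatch between Python and the installed package): https://blog.csdn.net/sinat_34615726/article/details/67636949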
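Install request
Steps: on the command line, in the Python installation directory, run pip install request (note that this is a different package from requests, which is installed in the next step)
Install requests ***
Steps: on the command line, in the Scripts folder of the Python installation, run pip install requests
Install json ***
Steps (json ships with the Python standard library, so a separate install is normally not needed): https://blog.csdn.net/zhouzhiwengang/article/details/72357608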
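To confirm that json is available, a quick round trip is enough. The snippet below is only a sketch; the movie record is a made-up example and not data from Douban.

```python
import json

# Hypothetical movie record, for illustration only.
movie = {"title": "Example Movie", "rating": 8.5, "genres": ["drama", "comedy"]}

# ensure_ascii=False keeps non-ASCII characters (such as Chinese titles) readable.
text = json.dumps(movie, ensure_ascii=False)

# Parse the string back into a Python dict.
restored = json.loads(text)
print(restored["title"])
```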
Install PyMySQL ***
Steps: pip install pymysql
Install Pillow
Steps: https://blog.csdn.net/qq_39720249/article/details/86159852
Install matplotlib
Did not succeed.
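With this many packages it is easy to lose track of which installs actually worked. The sketch below checks whether each package can be found by Python. The list uses import names rather than pip names, and it is my own assumption about which names to check (for example, Pillow is imported as PIL and pywin32 provides win32api).

```python
import importlib.util

# Import names (not pip names) for the packages installed above.
PACKAGES = ["lxml", "twisted", "scrapy", "requests", "pymysql",
            "PIL", "matplotlib", "win32api"]


def check_packages(names):
    """Return {import_name: True/False} depending on whether Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

Any package reported as MISSING can then be reinstalled with pip before moving on.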
Install WebStorm ***
Installation: https://jingyan.baidu.com/article/7f41ecec323801593d095c28.html
Activation: https://blog.csdn.net/voke_/article/details/76418116
Chinese localization: https://www.jb51.net/softs/606355.html#downintro2
Uninstall MySQL ***
https://blog.csdn.net/qq_41140741/article/details/81489531
Install MySQL and connect a client ***
Reference: https://jingyan.baidu.com/album/fc07f989bf2cc712ffe51902.html?picindex=8
System error 2 during installation:
https://blog.csdn.net/zhouyufengqingyang/article/details/46291057
Error code 2058 when creating a new connection:
https://blog.csdn.net/weixin_38635565/article/details/80704082
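Error code 1045 when creating a new connection:
After digging through a lot of material, I finally found that the password I was entering did not match the one from my earlier MySQL installation (I did not know whether to laugh or cry; I hope you will not waste as much time on this as I did).
Running the Code (Trying It Out)
Source code link:
https://download.csdn.net/download/ljm_9615/9921754 (after downloading, open it directly in PyCharm)
Fixing errors: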
https://blog.csdn.net/sinat_27693393/article/details/56852547
https://blog.csdn.net/x_kh_2001/article/details/81146916
Calling classes from a different directory (import sys…):
https://blog.csdn.net/winycg/article/details/78512300
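The idea behind the sys approach can be sketched as follows: add the other directory to sys.path, then import as usual. The directory, the module name movie_tools, and the class MovieParser are all made up here so that the example runs on its own; in a real project you would append the path of your existing folder instead.

```python
import os
import sys
import tempfile

# Build a throwaway directory containing a module, standing in for
# "some other directory" in a real project (all names are hypothetical).
other_dir = tempfile.mkdtemp()
with open(os.path.join(other_dir, "movie_tools.py"), "w", encoding="utf-8") as f:
    f.write("class MovieParser:\n    def name(self):\n        return 'parser'\n")

# Make the directory importable, then import the class as usual.
if other_dir not in sys.path:
    sys.path.append(other_dir)

from movie_tools import MovieParser

parser = MovieParser()
print(parser.name())
```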
Updating tables in the database:
https://blog.csdn.net/qq_36523839/article/details/70510822
Building the Crawler
Create a new database and table in SQLyog, and set the columns and their data types:
https://zhidao.baidu.com/question/69046104.html
Create a new project in PyCharm (Ctrl+/ comments out the selected lines):
https://blog.csdn.net/cskywit/article/details/80960575
https://www.cnblogs.com/airnew/p/9981599.html
https://blog.csdn.net/gx864102252/article/details/73017989
https://blog.csdn.net/dreambyday/article/details/83210494
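You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'desc VARCHAR(1000))' at line 2
This error means there is a MySQL syntax error near 'desc VARCHAR(1000))'. The cause turned out to be the column name desc, which is a reserved word in MySQL; renaming the column fixed it.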
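Renaming the column is what I did, but MySQL also accepts reserved words as identifiers when they are wrapped in backticks. The sketch below builds a CREATE TABLE statement with every identifier quoted. The table name and the title column are assumptions for illustration; only desc VARCHAR(1000) comes from the error message.

```python
def quote_identifier(name):
    """Wrap a MySQL identifier in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement from (column_name, column_type) pairs."""
    cols = ",\n  ".join(f"{quote_identifier(n)} {t}" for n, t in columns)
    return f"CREATE TABLE {quote_identifier(table)} (\n  {cols}\n)"


# Hypothetical table layout for illustration.
sql = create_table_sql("movie", [("title", "VARCHAR(200)"),
                                 ("desc", "VARCHAR(1000)")])
print(sql)
```

Quoting every identifier keeps the statement valid even if another column name later turns out to be reserved.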
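At first the crawler could fetch data, but after one or two runs the responses came back as HTTP 403, which means the server refused access. So a crawler has to guard against being blocked: *** (the simplest way is to crawl slowly, pausing with time.sleep between requests so the site does not block you for crawling too fast)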
https://blog.csdn.net/ZhangQiye1993/article/details/83583913
https://www.imooc.com/article/38989
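The "crawl slowly" advice can be sketched like this. The User-Agent value, the delay range, and the function names are assumptions of mine, not taken from the linked articles; requests is imported inside fetch_page so that the throttling loop can be tried without network access.

```python
import random
import time

# Assumed header value; any realistic browser User-Agent string can be used.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_page(url):
    """Fetch one page with requests and return its HTML text."""
    import requests  # imported lazily so crawl_slowly can be tested on its own
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 403 and other errors
    return response.text


def crawl_slowly(urls, fetch=fetch_page, min_delay=2.0, max_delay=5.0,
                 sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            # Sleep before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        pages.append(fetch(url))
    return pages
```

A random delay is used instead of a fixed one so the request pattern looks less mechanical; whether that is enough depends on the site.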
Scraping image links and saving the images to local files:
https://blog.csdn.net/weixin_39777626/article/details/79300856
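Saving a poster comes down to writing the downloaded bytes to a file in binary mode. The helper names and the default file name below are hypothetical; this is a sketch of the idea rather than the code from the linked article.

```python
import os
from urllib.parse import urlparse


def filename_from_url(url, default="poster.jpg"):
    """Use the last path segment of the URL as the file name."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def save_image(content, url, folder):
    """Write image bytes into folder and return the path of the saved file."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename_from_url(url))
    with open(path, "wb") as f:  # "wb" because image data is binary
        f.write(content)
    return path
```

With requests, the bytes would come from something like requests.get(url, timeout=10).content, which is then passed in as content.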
A list of common User-Agent strings
https://blog.csdn.net/rookie_is_me/article/details/81634048
Python requests library study notes, part 5 (timeouts and exceptions)
https://blog.csdn.net/ddq_dq/article/details/78643128
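Timeouts and exceptions fit together naturally in a small retry helper. The sketch below is one possible arrangement, not the approach from the linked article: with_retries and get_html are hypothetical names, and the attempt count and waits are arbitrary. The timeout tuple passed to requests.get is (connect timeout, read timeout) in seconds.

```python
import time


def with_retries(fetch, url, retry_on, attempts=3, wait=1.0, sleep=time.sleep):
    """Call fetch(url); on one of the retry_on exceptions, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            sleep(wait * attempt)  # wait a little longer after each failure


def get_html(url):
    """Fetch a page with requests, retrying on timeouts and connection errors."""
    import requests  # imported lazily so with_retries can be tested on its own

    def fetch(u):
        response = requests.get(u, timeout=(5, 15))
        response.raise_for_status()
        return response.text

    return with_retries(
        fetch, url,
        retry_on=(requests.exceptions.Timeout,
                  requests.exceptions.ConnectionError),
    )
```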
Inserting data into the database from PyCharm:
https://blog.csdn.net/zyz_home/article/details/79779936
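A minimal sketch of an insert with PyMySQL looks like this. The table name, column names, database name, and credentials are all placeholders I made up; adjust them to match the table created earlier. Passing values as parameters (the %s placeholders) lets the driver handle quoting.

```python
# Hypothetical table and columns; change them to match your own schema.
INSERT_SQL = "INSERT INTO movie (title, rating, poster_url) VALUES (%s, %s, %s)"


def insert_movies(conn, rows):
    """Insert a list of (title, rating, poster_url) tuples and commit."""
    with conn.cursor() as cursor:
        cursor.executemany(INSERT_SQL, rows)
    conn.commit()
    return len(rows)


def main():
    import pymysql  # imported lazily; requires "pip install pymysql"

    # Placeholder connection settings, assumed for illustration.
    conn = pymysql.connect(host="localhost", user="root",
                           password="your_password", database="douban",
                           charset="utf8mb4")
    try:
        insert_movies(conn, [("Example Movie", 8.5, "https://example.com/p1.jpg")])
    finally:
        conn.close()
```

Calling main() needs a running MySQL server with a matching table; insert_movies itself only relies on the cursor, executemany, and commit calls of the connection it is given.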
XPath parsing tips:
https://blog.csdn.net/qw_xingzhe/article/details/53056548?locationNum=4&fps=1
https://zhuanlan.zhihu.com/p/29436838
Right-click > Copy > Copy XPath to get the XPath of an element
https://blog.csdn.net/bf02jgtrs00xktcx/article/details/83663400
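To get a feel for how an XPath expression selects elements, here is a small self-contained sketch. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, in place of lxml so that it runs without extra installs. The snippet of markup is invented and much simpler than a real Douban page.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed markup standing in for a real page.
SNIPPET = """
<div id="content">
  <div class="item"><span class="title">Movie A</span><span class="rating">9.1</span></div>
  <div class="item"><span class="title">Movie B</span><span class="rating">8.7</span></div>
</div>
"""

root = ET.fromstring(SNIPPET)

# Select every title span inside a div whose class is "item".
titles = [s.text for s in root.findall(".//div[@class='item']/span[@class='title']")]
ratings = [float(s.text) for s in root.findall(".//span[@class='rating']")]
print(titles, ratings)
```

With lxml the same selection would be written as lxml.etree.HTML(html).xpath("//div[@class='item']/span[@class='title']/text()"), and lxml also copes with real-world HTML that is not well-formed.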
Examples of taking one or several elements from a Python list, such as word[1:]
https://blog.csdn.net/qq_27361945/article/details/83012977
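A few slicing forms side by side, as a quick sketch (the list contents are arbitrary):

```python
word = ["a", "b", "c", "d"]

first = word[0]      # a single element by index
rest = word[1:]      # everything from index 1 to the end
middle = word[1:3]   # indexes 1 and 2 (the end index is excluded)
last = word[-1]      # negative indexes count from the end

print(first, rest, middle, last)
```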
re.findall() returns a list that you can iterate over (handy when you want to join the matches into a single string)
https://blog.csdn.net/eastmount/article/details/51082253
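For example, pulling every href out of a piece of HTML and joining the matches into one string might look like this (the HTML string is made up):

```python
import re

# Made-up HTML fragment for illustration.
html = '<a href="/subject/1/">A</a> <a href="/subject/2/">B</a>'

# With one capture group, findall returns a list of the captured strings.
links = re.findall(r'href="(.*?)"', html)

# The list can be iterated over, or joined into a single string.
joined = ",".join(links)
print(links, joined)
```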
Using for a in range(10,15)
https://blog.csdn.net/heartyhu/article/details/50988007
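range(10, 15) yields 10 through 14; the end value is not included. A typical use in a crawler is generating page URLs, sketched below with a placeholder URL pattern (not Douban's real one):

```python
numbers = []
for a in range(10, 15):
    numbers.append(a)  # a takes the values 10, 11, 12, 13, 14

# Placeholder URL pattern, assumed for illustration.
urls = [f"https://example.com/list?page={a}" for a in range(10, 15)]
print(numbers, len(urls))
```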
Several ways in MySQL to ignore primary-key conflicts and avoid duplicate inserts
https://www.zhihu.com/question/41053844#answer-31186496
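Two of the usual options are INSERT IGNORE, which skips rows whose key already exists, and INSERT ... ON DUPLICATE KEY UPDATE, which overwrites chosen columns instead. The sketch below only builds the SQL strings; the table and column names are assumptions. Note that the VALUES() function used in the second form is deprecated in recent MySQL 8.0 releases, although it still works.

```python
def insert_ignore_sql(table, columns):
    """Build an INSERT IGNORE statement: rows with a duplicate key are skipped."""
    cols = ", ".join(columns)
    marks = ", ".join(["%s"] * len(columns))
    return f"INSERT IGNORE INTO {table} ({cols}) VALUES ({marks})"


def upsert_sql(table, columns, update_columns):
    """Build an INSERT that updates update_columns when the key already exists."""
    cols = ", ".join(columns)
    marks = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"{c} = VALUES({c})" for c in update_columns)
    return (f"INSERT INTO {table} ({cols}) VALUES ({marks}) "
            f"ON DUPLICATE KEY UPDATE {updates}")


# Hypothetical table and columns for illustration.
print(insert_ignore_sql("movie", ["id", "title"]))
print(upsert_sql("movie", ["id", "title"], ["title"]))
```

Either string can be passed to cursor.execute together with a tuple of values, so re-running the crawler does not fail on movies that are already stored.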