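Project Practice Notes

[Hands-On Crawler Notes]

Preface

I started from zero when I built a crawler to scrape movie details, posters, and other information from Douban Movies. At first I was impatient and looked for a video I could simply follow step by step to a working result, but I could not find one. So I pulled together what I learned from a number of articles into the notes below. Follow along and you should be able to build one too.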

Environment setup

Install PyCharm and Python 3.6***

https://blog.csdn.net/weixin_38285131/article/details/79427168

Download:
http://www.jetbrains.com/pycharm/download/#section=windows

Installation process: [three screenshots in the original post]

Activation:

PyCharm Community Edition is free and is sufficient for this project, so no activation code is needed.


Install Python***

Install wheel, lxml, Twisted, Scrapy***

References:
https://www.cnblogs.com/MC-Curry/p/8503813.html
https://baijiahao.baidu.com/s?id=1597465401467369572&wfr=spider&for=pc

Fixing the pip upgrade error:

python -m pip install --upgrade pip

Install pywin32***

Reference: https://www.cnblogs.com/yrqiang/p/5295252.html
Fix for "ImportError: DLL load failed: %1 is not a valid Win32 application": https://blog.csdn.net/sinat_34615726/article/details/67636949

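Install request

Steps: in a command prompt, from the Python installation directory, run pip install request

Install requests***

Steps: in a command prompt, from the Scripts directory of the Python installation, run pip install requests

Install json***

Steps: https://blog.csdn.net/zhouzhiwengang/article/details/72357608
(The json module is part of the Python standard library, so import json normally works without a separate install.)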

Install PyMySQL***

Steps: pip install pymysql

Install Pillow

Steps: https://blog.csdn.net/qq_39720249/article/details/86159852

Install matplotlib

This installation did not succeed.

Install WebStorm***

Installation: https://jingyan.baidu.com/article/7f41ecec323801593d095c28.html
Activation: https://blog.csdn.net/voke_/article/details/76418116
Chinese localization: https://www.jb51.net/softs/606355.html#downintro2

Uninstall MySQL***

https://blog.csdn.net/qq_41140741/article/details/81489531

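Install MySQL and connect a client***

Reference: https://jingyan.baidu.com/album/fc07f989bf2cc712ffe51902.html?picindex=8

System error 2 during installation:

https://blog.csdn.net/zhouyufengqingyang/article/details/46291057

Error code 2058 when creating a new connection:

https://blog.csdn.net/weixin_38635565/article/details/80704082

Error code 1045 when creating a new connection:

After digging through a lot of material, I finally found that the password I was entering did not match the one from my earlier MySQL installation. (I did not know whether to laugh or cry; I hope you do not waste as much time as I did.)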

Running some code (a warm-up)

Source code link:

https://download.csdn.net/download/ljm_9615/9921754 (after downloading, open it directly in PyCharm)

Fixes for errors:

https://blog.csdn.net/sinat_27693393/article/details/56852547
https://blog.csdn.net/x_kh_2001/article/details/81146916

Calling a class from a different directory (import sys…):

https://blog.csdn.net/winycg/article/details/78512300
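A minimal sketch of the sys.path approach, assuming the class lives in a module outside the current directory. The module name movie_store and the class MovieStore are hypothetical, and the temporary directory only stands in for a real project folder.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: a helper module that lives in some other directory.
other_dir = tempfile.mkdtemp()
with open(os.path.join(other_dir, "movie_store.py"), "w", encoding="utf-8") as f:
    f.write("class MovieStore:\n    def name(self):\n        return 'store'\n")

# Add that directory to the module search path, then import as usual.
if other_dir not in sys.path:
    sys.path.append(other_dir)
importlib.invalidate_caches()  # the module file was created after startup

from movie_store import MovieStore

store = MovieStore()
print(store.name())
```

In a real project, other_dir would be the path of the folder that holds the module you want to import.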

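Updating a table in the database:

https://blog.csdn.net/qq_36523839/article/details/70510822

Getting started on the build

Create a new database and table in SQLyog, and set the field names and data types:

https://zhidao.baidu.com/question/69046104.html

Create a new project in PyCharm (Ctrl+/ comments out the selected lines):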

https://blog.csdn.net/cskywit/article/details/80960575
https://www.cnblogs.com/airnew/p/9981599.html
https://blog.csdn.net/gx864102252/article/details/73017989
https://blog.csdn.net/dreambyday/article/details/83210494

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'desc VARCHAR(1000))' at line 2

This error points to a MySQL syntax error near 'desc VARCHAR(1000))'. The cause turned out to be the column name desc, which is a reserved word in MySQL; renaming the column fixed it.
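Renaming the column is what these notes did. An alternative that MySQL also accepts is quoting the identifier in backticks. The sketch below only builds the SQL text; the table name movie and the title column are hypothetical, and only desc VARCHAR(1000) comes from the error message above.

```python
def quote_identifier(name):
    """Wrap a MySQL identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement from (column_name, column_type) pairs."""
    cols = ", ".join(f"{quote_identifier(n)} {t}" for n, t in columns)
    return f"CREATE TABLE {quote_identifier(table)} ({cols})"


# Hypothetical table; a quoted `desc` no longer collides with the DESC keyword.
sql = create_table_sql("movie", [("title", "VARCHAR(255)"), ("desc", "VARCHAR(1000)")])
print(sql)
```

Renaming to something like description is still the simpler choice, because every later query would otherwise need the backticks too.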

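At first the crawler could fetch data, but after one or two runs the response came back as HTTP 403, meaning the server refused access. So a crawler needs to avoid getting blocked:*** (the simplest approach is to crawl slowly, pausing with time.sleep so the site does not block you for requesting too fast)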

https://blog.csdn.net/ZhangQiye1993/article/details/83583913
https://www.imooc.com/article/38989
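A minimal sketch of the "crawl slowly" idea, assuming a random pause between requests and a browser-like User-Agent header. It uses only the standard library's urllib.request; the header value, the delay range, and the function names are illustrative assumptions, not the settings used in these notes.

```python
import random
import time
import urllib.request

# Hypothetical header value; any realistic browser User-Agent string would do.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch(url, timeout=10):
    """Download one page and return its text."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_slowly(urls, fetch_page=fetch, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Fetch each URL in turn, sleeping a random interval between requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(random.uniform(min_delay, max_delay))
        pages.append(fetch_page(url))
    return pages
```

The fetch_page and sleep parameters exist only so the loop can be tried without touching the network; in normal use the defaults apply.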

Scraping image links and saving the images to local files:

https://blog.csdn.net/weixin_39777626/article/details/79300856
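One way this step can look, as a sketch: derive a file name from the image URL, download the bytes, and write them in binary mode. The function names and the fallback file name are assumptions made for illustration.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url, default="image.jpg"):
    """Use the last path segment of the URL as the local file name."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def download_bytes(url, timeout=10):
    """Fetch the raw bytes behind a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_image(url, folder, download=download_bytes):
    """Download one image into `folder` and return the saved path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename_from_url(url))
    with open(path, "wb") as f:  # images are binary, so write bytes
        f.write(download(url))
    return path
```

Calling save_image(link, "posters") for each scraped link would collect the files in a posters folder; the download parameter can be swapped for a function that adds headers or delays.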

A list of common User-Agent strings

https://blog.csdn.net/rookie_is_me/article/details/81634048

Learning the Python requests library, part 5 (timeouts and exceptions)

https://blog.csdn.net/ddq_dq/article/details/78643128
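A small retry wrapper shows how timeouts and exceptions fit together. This is a sketch under assumptions: the helper name, the retry count, and the growing wait are all illustrative. With requests, the fetch function would typically call requests.get(url, timeout=10) and the errors tuple would be (requests.exceptions.RequestException,).

```python
import time


def fetch_with_retry(fetch, url, retries=3, wait=1.0, errors=(OSError,), sleep=time.sleep):
    """Call fetch(url); on a listed error, wait and retry up to `retries` attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except errors as exc:
            last_error = exc
            if attempt < retries:
                sleep(wait * attempt)  # wait a little longer after each failure
    raise last_error
```

The default errors=(OSError,) covers the built-in TimeoutError and urllib's URLError, since both are subclasses of OSError.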

Inserting data into the database from PyCharm:

https://blog.csdn.net/zyz_home/article/details/79779936

XPath parsing tips:

https://blog.csdn.net/qw_xingzhe/article/details/53056548?locationNum=4&fps=1
https://zhuanlan.zhihu.com/p/29436838

Right-click, then Copy > Copy XPath, to get the XPath of an element

https://blog.csdn.net/bf02jgtrs00xktcx/article/details/83663400
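To make the XPath idea concrete, here is a sketch that runs path expressions against a tiny, well-formed snippet. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath; real pages are usually parsed with lxml (installed earlier), for example via etree.HTML(...).xpath(...). The markup below is invented for illustration and is not Douban's actual page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for a list of movies.
snippet = """
<div id="content">
  <div class="item"><span class="title">Movie A</span><span class="rating">9.1</span></div>
  <div class="item"><span class="title">Movie B</span><span class="rating">8.7</span></div>
</div>
""".strip()

root = ET.fromstring(snippet)

# Select every title span that sits directly inside an item div.
titles = [s.text for s in root.findall(".//div[@class='item']/span[@class='title']")]

# Select rating spans anywhere in the tree and convert their text to numbers.
ratings = [float(s.text) for s in root.findall(".//span[@class='rating']")]

print(titles, ratings)
```

A path copied from the browser is usually absolute and tied to the exact page layout, so a shorter attribute-based path like the ones above tends to survive small page changes better.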

Examples of taking one or several elements from a Python list, e.g. word[1:]

https://blog.csdn.net/qq_27361945/article/details/83012977
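A few slicing forms side by side, using a made-up list:

```python
word = ["a", "b", "c", "d"]

first = word[0]       # a single element
rest = word[1:]       # everything from index 1 to the end
middle = word[1:3]    # indexes 1 and 2 (the end index is excluded)
last_two = word[-2:]  # the final two elements

print(first, rest, middle, last_two)
```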

re.findall() returns a list, which you can iterate over (handy for joining the pieces into a single string)

https://blog.csdn.net/eastmount/article/details/51082253
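A short sketch of that pattern: collect every match with re.findall, then join the pieces. The HTML fragment and the pattern are hypothetical examples.

```python
import re

# Hypothetical fragment with two genre tags.
html = '<span property="v:genre">Drama</span><span property="v:genre">Crime</span>'

# With one capture group, findall returns a list of the captured strings.
genres = re.findall(r'<span property="v:genre">(.*?)</span>', html)

# Join the list into one string, for example to store it in a single column.
joined = "/".join(genres)

print(genres, joined)
```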

Using for a in range(10, 15)

https://blog.csdn.net/heartyhu/article/details/50988007
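range(10, 15) yields 10 through 14; the stop value is excluded. A quick illustration, with a hypothetical URL pattern to show a typical crawler use:

```python
numbers = [a for a in range(10, 15)]

# Hypothetical page URLs built from the same range.
urls = [f"https://example.com/list?page={a}" for a in range(10, 15)]

print(numbers)
print(urls[0], urls[-1])
```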

Several ways to ignore primary key conflicts and avoid duplicate inserts in MySQL

https://www.zhihu.com/question/41053844#answer-31186496
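The sketch below shows the "ignore duplicates" behaviour with a parameterized insert. To stay self-contained it uses the standard library's sqlite3, where the statement is spelled INSERT OR IGNORE; in MySQL the equivalent is INSERT IGNORE INTO, with ON DUPLICATE KEY UPDATE and REPLACE INTO as the other common options, and PyMySQL uses %s placeholders instead of ?. The table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT)")

# The third row reuses primary key 1 and would normally raise an error.
rows = [(1, "Movie A"), (2, "Movie B"), (1, "Movie A again")]

# Placeholders keep the values out of the SQL text; conflicting rows are skipped.
conn.executemany("INSERT OR IGNORE INTO movie (id, title) VALUES (?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT id, title FROM movie ORDER BY id").fetchall()
print(stored)
```

This makes it safe to rerun the crawler: a movie that was already saved is simply skipped on the next pass.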