import requests
# Scrape 阳光电影 (Sunshine Movies)
html = requests.get("https://www.ygdy8.com/index.html")
print(html.text)
Running this prints garbled text:
<a href='/html/gndy/jddy/20160320/50541.html'>IMDBÆÀ·Ö8·Ö×óÓÒӰƬ400Óಿ</a><br/>
<a href='/html/gndy/jddy/20200627/60172.html'>2020Äêϲ¾ç¡¶Ôã¸â×Éѯ/²»Á¼ÂÉ</a><br/>
<a href='/html/gndy/dyzz/20200627/60171.html'>2019Äê»ñ½±¾çÇéÒôÀÖ¡¶ÃÛ·äÓë</a><br/>
<a href='/html/gndy/dyzz/20200627/60170.html'>2019Ä궯×÷ÔÖÄÑ¡¶¼«ÏÞÌÓÉú¡·B</a><br/>
<a href='/html/gndy/jddy/20200627/60169.html'>2008Äê¸ß·ÖÐüÒÉ¡¶ÏÓÒÉÈËXµÄÏ×</a><br/>
<a href='/html/gndy/jddy/20200627/60168.html'>2020Äê¿Æ»ÃÐüÒÉ¡¶»úе»Æ¤¡·H</a><br/>
<a href='/html/gndy/jddy/20200627/60167.html'>2020Äê¿Æ»ÃÐüÒÉ¡¶»úе»Æ¤¡·H</a><br/>
<a href='/html/gndy/jddy/20200627/60166.html'>2020ÄêÆæ»Ã¡¶ÏÉÊéÆæÌ·/×½ÏɼÇ</a><br/>
<a href='/html/gndy/jddy/20200626/60164.html'>2020Äê¾çÇé·¸×¶ñÃû/ÄÏ·½Ö®</a><br/>
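The garbling above can be reproduced offline. The following is a minimal sketch, assuming the server sends a text Content-Type header without a charset, in which case requests falls back to ISO-8859-1 when building .text. The sample title is a hypothetical string chosen for illustration, not data fetched from the site.

```python
# Hypothetical sample title; any GB2312-encodable Chinese text behaves the same way.
title = "2020年喜剧"

# The bytes the server would send for a GB2312-encoded page.
raw = title.encode("gb2312")

# Decoding those bytes with the wrong codec produces mojibake.
garbled = raw.decode("iso-8859-1")

# Decoding with the codec the page actually uses restores the text.
fixed = raw.decode("gb2312")

print(garbled)  # begins with "2020Äê", matching the pattern in the output above
print(fixed)
```

The bytes themselves are intact; only the codec used to interpret them is wrong, which is why changing the encoding is enough to fix the output.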
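First, check which encoding the site uses. The page source carries a charset declaration, and here it reads charset=gb2312.
With that known, the fix is a single extra line, html.encoding = "gb2312", set before reading html.text: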
import requests
# Scrape 阳光电影 (Sunshine Movies)
html = requests.get("https://www.ygdy8.com/index.html")
html.encoding = "gb2312"
print(html.text)
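Instead of reading the charset by eye and hard-coding it, the declaration can be pulled out of the raw bytes. This is only a sketch: detect_charset and decode_page are hypothetical helper names, the regular expression covers only the common meta tag forms, and the sample bytes stand in for a real response body.

```python
import re
from typing import Optional

# Matches charset=... inside a <meta> tag, with or without quotes.
META_CHARSET = re.compile(rb"""<meta[^>]*?charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def detect_charset(raw: bytes) -> Optional[str]:
    """Return the charset declared in a <meta> tag, or None if there is none."""
    # The declaration normally sits near the top of the document.
    match = META_CHARSET.search(raw[:4096])
    return match.group(1).decode("ascii").lower() if match else None


def decode_page(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw HTML bytes using the declared charset, else the fallback."""
    charset = detect_charset(raw) or fallback
    # errors="replace" keeps going if a few bytes are invalid for the codec.
    return raw.decode(charset, errors="replace")


# Stand-in for response.content: a GB2312-encoded fragment with a meta declaration.
sample = (
    '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
    "<title>阳光电影</title>"
).encode("gb2312")

print(detect_charset(sample))
print(decode_page(sample))
```

With requests, the same helper would be applied to the undecoded body, for example decode_page(html.content). requests also exposes html.apparent_encoding, which guesses the encoding from the bytes themselves and can be assigned to html.encoding when a page declares no charset.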