Reference video: 零基础入门学Python (Learn Python from Scratch), by 小甲鱼
Importing modules
1. import module_name
2. from module_name import function_name / *
3. import module_name as new_name
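Here are the three import forms above, applied to standard-library modules. math and urllib.parse are convenient choices for illustration, not modules taken from the video. The `from module_name import *` form is left out on purpose. It copies every public name into the current namespace, which makes it harder to see where a name came from.

```python
# 1. import module_name: members are reached through the module prefix
import math
root = math.sqrt(16)

# 2. from module_name import name: one specific name is brought into scope
from math import floor
down = floor(2.7)

# 3. import module_name as new_name: the module is bound under a shorter alias
import urllib.parse as up
query = up.urlencode({'q': 'fish c'})

print(root, down, query)  # 4.0 2 q=fish+c
```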
Web crawlers
Example 1: Accessing the Internet
urllib
>>> import urllib.request
>>> response = urllib.request.urlopen("http://www.fishc.com")
>>> html = response.read()
>>> print(html)
b'\xef\xbb\xbf<!DOCTYPE html>\r\n<html lang="en">\r\n<head>\r\n\t<meta charset="UTF-8">\r\n\t<title>\xe9\xb1\xbcC\xe5\xb7\xa5\xe4\xbd\x9c\xe5\xae\xa4-7\xe5\x91\xa8\xe5\xb9\xb4\xe4\xb8\xbb\xe9\xa1\xb5</title>\r\n \r\n <meta name="viewport" content="width=device-width,initial-scale=1.0,maximum-scale=1.0,user-scalable=no">\r\n <meta name="apple-mobile-app-capable" content="yes">\r\n <meta name="apple-mobile-app-status-bar-style" content="black">\r\n \r\n <meta name="description" content="\xe9\xb1\xbcC\xe5\xb7\xa5\xe4\xbd\x9c\xe5\xae\xa4|\xe5\x85\x8d\xe8\xb4\xb9\xe7\xbc\x96\xe7\xa8\x8b\xe8\xa7\x86\xe9\xa2\x91\xe6\x95\x99\xe5\xad\xa6|\xe7\xbc\x96\xe7\xa8\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xba\xa4\xe6\xb5\x81|C\xe8\xaf\xad\xe8\xa8\x80|\xe6\xb1\x87\xe7\xbc\x96|Python|win32|Delphi|\xe5\x8a\xa0\xe5\xaf\x86\xe4\xb8\x8e\xe8\xa7\xa3\xe5\xaf\x86|Linux">\r\n <meta name="keywords" content="C,C++,Python,Scratch,HTML5,CSS3,JavaScript,Qt,\xe6\xb1\x87\xe7\xbc\x96,WinSDK">\r\n <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />\r\n <meta http-equiv="X-UA-Compatible" content="IE=edge">\r\n <meta name="author" content="cononico">\r\n <meta name="application-name" content="Cononico" >\r\n\r\n <link rel="stylesheet" type="text/css" href="css/main.css">\r\n <link rel="stylesheet" type="text/css" href="css/process.css">\r\n <link rel="shortcut icon" type="image/x-icon" href="favicon.ico" />\r\n \r\n <script type="text/javascript">\r\n //\xe8\xae\xbe\xe5\xae\x9arem\xe5\x80\xbc\r\n document.getElementsByTagName("html")[0].style.fontSize = document.documentElement.clientWi
… …
The output above is raw bytes (a bytes object), not text.
Decode it into a string:
>>> html = html.decode("utf-8")
>>> print(html)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>鱼C工作室-7周年主页</title>
<meta name="viewport" content="width=device-width,initial-scale=1.0,maximum-scale=1.0,user-scalable=no">
<meta name="apple-mobile-app-capable" content="yes">
<meta name="apple-mobile-app-status-bar-style" content="black">
<meta name="description" content="鱼C工作室|免费编程视频教学|编程技术交流|C语言|汇编|Python|win32|Delphi|加密与解密|Linux">
<meta name="keywords" content="C,C++,Python,Scratch,HTML5,CSS3,JavaScript,Qt,汇编,WinSDK">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="author" content="cononico">
<meta name="application-name" content="Cononico" >
<link rel="stylesheet" type="text/css" href="css/main.css">
<link rel="stylesheet" type="text/css" href="css/process.css">
<link rel="shortcut icon" type="image/x-icon" href="favicon.ico" />
<script type="text/javascript">
//设定rem值
document.getElementsByTagName("html")[0].style.fontSize = document.documentElement.clientWidth/20 + 'px';
</script>
</head>
<body>
<header class="head">
<div class="head_logo_div">
<img class="logo_img" src="images/upload/FishC.png"></a>
</div>
<div class="head_nav_div">
… …
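The following is a small helper that wraps the urlopen / read / decode steps of Example 1. It is only a sketch and makes a few assumptions. It trusts the charset declared in the Content-Type header and falls back to UTF-8 when none is given. It also decodes UTF-8 as "utf-8-sig", so a leading byte-order mark (the b'\xef\xbb\xbf' at the start of the raw output above) is dropped instead of showing up as '\ufeff'. The name fetch_text is hypothetical. The data: URL is there only so the example runs offline.

```python
import codecs
import urllib.request

def fetch_text(url, default_encoding="utf-8"):
    """Hypothetical helper: download url and return the body as str."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, like html in the session above
        # Charset from the Content-Type header, if the server declared one
        charset = response.headers.get_content_charset() or default_encoding
    # "utf-8-sig" decodes UTF-8 and also strips a leading byte-order mark
    if codecs.lookup(charset).name == "utf-8":
        charset = "utf-8-sig"
    return raw.decode(charset)

# Offline stand-in for "http://www.fishc.com": a data: URL whose body starts with a BOM
page = fetch_text("data:text/html;charset=utf-8,%EF%BB%BF%3Ctitle%3Ehi%3C/title%3E")
print(repr(page))  # '<title>hi</title>'
```

To fetch the real page, call fetch_text("http://www.fishc.com") instead. That call needs network access.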
Example 2: Downloading an image from a website
import urllib.request
response = urllib.request.urlopen('http://placekitten.com/g/500/500')
cat_img = response.read()
# Images are binary data, so the file is opened in 'wb' (write binary) mode
with open('cat_500_500.jpg', 'wb') as f:
    f.write(cat_img)
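Pass the URL to urlopen()
Read the content with the read() method (it returns bytes, which is why the file is opened in binary mode)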
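The same download can also be written so that the response is streamed into the file in chunks with shutil.copyfileobj, instead of being read into memory in one go. This is a sketch with a few assumptions. The helper name download is hypothetical. The data: URL carries only the first six bytes of a JPEG file, so the example runs without the network.

```python
import shutil
import urllib.request

def download(url, filename):
    """Hypothetical helper: save the response body of url to filename."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as f:
        content_type = response.headers.get_content_type()
        # Copy in chunks instead of one read(), so big files need not fit in memory
        shutil.copyfileobj(response, f)
    return content_type

# Offline stand-in for 'http://placekitten.com/g/500/500'
kind = download("data:image/jpeg;base64,/9j/4AAQ", "demo.jpg")
with open("demo.jpg", "rb") as f:
    saved = f.read()
print(kind, saved)  # image/jpeg b'\xff\xd8\xff\xe0\x00\x10'
```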
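Example 3: Using Youdao Translate
import urllib.request
import urllib.parse
url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
# Form fields copied from the request the browser sends
data = {}
data['doctype'] = 'json'
data['from'] = 'AUTO'
data['i'] = 'I LOVE FishC.com!'  # text to translate; urlencode turns the spaces into '+'
data['keyfrom'] = 'fanyi.web'
data['to'] = 'AUTO'
data['typoResult'] = 'false'
data['version'] = '2.1'
# The POST body must be bytes: urlencode the form, then encode it as UTF-8
data = urllib.parse.urlencode(data).encode('utf-8')
# Passing data makes urlopen send a POST request instead of a GET
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
print(html)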
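This raised an error, and the cause was not clear. A possible cause (unverified): the translate_o endpoint also expects anti-crawler fields that the page's JavaScript generates (such as salt and sign), and this request does not send them.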
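The next snippet shows two things about the form data in Example 3 without sending anything. First, urlencode already turns spaces into '+'. If the text is typed with literal '+' signs (as it appears in the browser's form view), each '+' is percent-encoded as %2B, so the server receives a '+' character rather than a space. Second, a Request built with data reports POST as its method. The variable names good, bad and req are illustrative only.

```python
import urllib.parse
import urllib.request

# Plain spaces: urlencode produces the '+' form the browser shows
good = urllib.parse.urlencode({'i': 'I LOVE FishC.com!'})
# Literal '+' signs: each '+' itself gets percent-encoded as %2B
bad = urllib.parse.urlencode({'i': 'I+LOVE+FishC.com!'})
print(good)  # i=I+LOVE+FishC.com%21
print(bad)   # i=I%2BLOVE%2BFishC.com%21

# Supplying data (bytes) turns the request into a POST; nothing is sent until urlopen()
req = urllib.request.Request(
    'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule',
    data=good.encode('utf-8'),
)
print(req.get_method())  # POST
```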
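Hiding: keeping the crawler from being blocked
Modifying headers
1. Through the headers parameter of Request: add a User-Agent
2. Through the Request object's add_header() method: append a header after the Request has been created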
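Both ways of setting a User-Agent are sketched below. Each one only builds the Request object, so nothing is sent. The User-Agent string is a placeholder assumption; in practice you would copy the value your own browser sends. Request stores header names through str.capitalize(), so 'User-Agent' is looked up as 'User-agent'.

```python
import urllib.request

url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
# Placeholder browser-like User-Agent (an assumption; use your browser's real value)
ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'

# Method 1: pass a headers dict when constructing the Request
req1 = urllib.request.Request(url, headers={'User-Agent': ua})

# Method 2: create the Request first, then append a header with add_header()
req2 = urllib.request.Request(url)
req2.add_header('User-Agent', ua)

# Header names are normalized with str.capitalize(), hence 'User-agent'
print(req1.get_header('User-agent'))
print(req2.get_header('User-agent'))
# Sending would then be: urllib.request.urlopen(req1, data)
```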
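Proxies
1. Build a dict {'type': 'proxy_ip:port'} and pass it to ProxyHandler
proxy_support = urllib.request.ProxyHandler({'http': 'proxy_ip:port'})
2. Customize: create an opener that uses this handler
opener = urllib.request.build_opener(proxy_support)
3. Install the opener, so that urlopen() uses it by default
urllib.request.install_opener(opener)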
Alternatively, skip installing it and call the opener directly:
opener.open(url)
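The three proxy steps are put together below, stopping before any request is sent. The address 127.0.0.1:8080 is a hypothetical placeholder and does not point to a real proxy. Setting opener.addheaders is an optional extra that combines the proxy with the header trick from the previous section. It is not part of the original steps.

```python
import urllib.request

# Hypothetical proxy address: replace with a real 'ip:port' before sending anything
proxies = {'http': '127.0.0.1:8080'}

# Step 1: a handler that routes http:// requests through the proxy
proxy_support = urllib.request.ProxyHandler(proxies)
# Step 2: an opener that uses this handler (plus urllib's default handlers)
opener = urllib.request.build_opener(proxy_support)
# Optional: headers the opener adds to every request it sends
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

print(proxy_support in opener.handlers)  # True
print(proxy_support.proxies)             # {'http': '127.0.0.1:8080'}

# Step 3a (global): urllib.request.install_opener(opener), after which urlopen(url) uses the proxy
# Step 3b (local):  opener.open(url) uses the proxy without changing the global default
```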