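Web Crawler Fundamentals (Part 1)

What is a crawler?

An automated program that sends requests to websites and extracts data from the responses.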

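The basic crawler workflow

  1. Send the request
    Use an HTTP library to send a request (a Request) to the target site. The request can carry extra information such as headers. Then wait for the server to respond.
  2. Get the response content
    If the server responds normally, you get a Response whose content is the page content you want. It may be HTML, JSON, or binary data (such as images or videos).
  3. Parse the content
    If the content is HTML, parse it with regular expressions or an HTML parsing library. If it is JSON, convert it directly into a JSON object and parse that. If it is binary data, save it or process it further.
  4. Save the content
    The content can be saved in many forms: as plain text, in a database, or as a file in a specific format.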
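
The four steps above can be sketched roughly as follows. This is only a minimal illustration built on the Python standard library rather than a definitive implementation. The function names (`fetch`, `parse_title`, `save_text`), the sample HTML, and the output file name are assumptions made for this sketch, and `fetch` is defined but not called so that the example runs without network access.

```python
import re
import tempfile
import urllib.request
from pathlib import Path


def fetch(url: str, timeout: float = 10.0) -> str:
    """Steps 1 and 2: send a Request and return the Response body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset from the response headers, defaulting to utf-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_title(html: str) -> str:
    """Step 3: extract the <title> text with a regular expression."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def save_text(path: Path, text: str) -> None:
    """Step 4: save the extracted content as a text file."""
    path.write_text(text, encoding="utf-8")


# Hypothetical page content standing in for a real response body.
SAMPLE_HTML = "<html><head><title> Example Page </title></head><body></body></html>"

title = parse_title(SAMPLE_HTML)
output_path = Path(tempfile.gettempdir()) / "crawler_title.txt"
save_text(output_path, title)
print(title)
```

To run it against a real site, replace `SAMPLE_HTML` with the result of `fetch(...)` for a URL you are allowed to crawl.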

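Request and Response

[Figure: request and response between the browser and the server]
  1. The browser sends a message to the server at the given URL. This step is called the HTTP Request.
  2. After receiving the message, the server processes it according to its content and sends a message back to the browser. This step is called the HTTP Response.
  3. After receiving the Response, the browser processes the information and displays it.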

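What does a Request contain?

  1. Request method
    Mainly GET and POST; others include HEAD, PUT, DELETE, and OPTIONS. (A GET request can be made directly against a URL; a POST request usually carries data such as a submitted form, for example a login form.)
  2. Request URL
    URL stands for Uniform Resource Locator. Any resource, such as a web page, an image, or a video, can be uniquely identified by a URL.
  3. Request headers
    Header information sent with the request, such as User-Agent, Host, and Cookie.
  4. Request body
    Extra data carried with the request, such as the form data in a form submission.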
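
To make the four parts concrete, here is a small sketch that builds a request object with the standard library's `urllib.request` and inspects it without sending anything. The login URL and the form fields are hypothetical and exist only for illustration.

```python
import urllib.parse
import urllib.request

# Hypothetical login form fields, used only for illustration.
form_data = urllib.parse.urlencode({"username": "alice", "password": "secret"})

request = urllib.request.Request(
    url="http://example.com/login",         # request URL (hypothetical)
    data=form_data.encode("utf-8"),         # request body: the form data
    headers={"User-Agent": "Mozilla/5.0"},  # request headers
    method="POST",                          # request method
)

# Nothing has been sent yet; we are only looking at the parts of the request.
print(request.get_method())    # request method
print(request.full_url)        # request URL
print(request.header_items())  # request headers
print(request.data)            # request body
```

Note that `urllib.request` normalizes header names internally, so the header is read back as `User-agent`.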

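What does a Response contain?

  1. Response status
    There are many status codes: 200 means success, 301 means a permanent redirect, 404 means the page was not found, and 502 (Bad Gateway) is a server-side error.
  2. Response headers
    Such as the content type, content length, server information, and Set-Cookie.
  3. Response body
    The most important part. It contains the content of the requested resource, such as the page HTML or the binary data of an image.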
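
As a quick reference for the status codes mentioned above, the sketch below looks them up in the standard library's `http.HTTPStatus` table and groups them by their first digit. The `category` helper is a name invented for this example.

```python
from http import HTTPStatus


def category(code: int) -> str:
    """Classify a status code by its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


# Standard reason phrases for the codes discussed in the list above.
phrases = {code: HTTPStatus(code).phrase for code in (200, 301, 404, 502)}

for code, phrase in phrases.items():
    print(code, phrase, "-", category(code))
```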

  • input
import requests

response = requests.get("http://www.baidu.com")  # send a request to Baidu's server
# The Content-Type header has no charset, so requests would fall back to ISO-8859-1
# and garble the Chinese text; the page itself declares utf-8.
response.encoding = "utf-8"
print(response.text)         # response body (the page source)
print(response.headers)      # response headers (dict-like)
print(response.status_code)  # status code
  • output
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下，你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>使用百度前必读</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a>&nbsp;京ICP证030173号&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body>
</html>

{'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Content-Encoding': 'gzip', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Last-Modified': 'Mon, 23 Jan 2017 13:27:36 GMT', 'Date': 'Wed, 31 Oct 2018 15:52:04 GMT', 'Content-Type': 'text/html', 'Connection': 'Keep-Alive', 'Transfer-Encoding': 'chunked'}

200

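What kind of data can be fetched?

  1. Web page text
    Such as HTML documents and JSON text.
  2. Image files
    The response is binary data, which is saved in an image format.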
  • input
import requests

response = requests.get('https://www.baidu.com/img/baidu_jgylogo3.gif')
print(response.content)  # response body as raw bytes
# The with block closes the file automatically, so no explicit close() is needed.
with open('D:/pycharm/projects/爬虫/1-基础知识/baidu.gif', 'wb') as f:
    f.write(response.content)  # write the bytes to disk
  • output (the Baidu logo image appears at D:/pycharm/projects/爬虫/1-基础知识/baidu.gif)
b'GIF89au\x00&\x00\xa2\x00\x00\xe62/\xea\xd4\xe2Y`\xe8\x99\x9d\xf1\xefvt)2\xe1\xe1\x06\x02\xff\xff\xff!\xf9\x04\x00\x00\x00\x00\x00,\x00\x00\x00\x00u\x00&\x00\x00\x03\xffx\xba\xdc\xfe0\xb6 J\x190\x04\xc9\xbb\xff`\xc8UV!L\xa4\xb0\x89l\xeb\xbe\xcbP\x96\xeb\x11\xccV\r\xef|\x7f\xe0\x96\x93\x824\xc3\xf8\x8eH\xd0\r(\x94\x01\x0b\xc9\xa8\xf4\xb1\x04\x0e\x9f\x05\xddt{\xac\xe2\x14\xd8,w\x8c|\n\xb1B\xb2Z\xa4a\x10K\xc67|M\xf7Ph\n\xaf%\xf6\xd4\xd6\xffn8FN&:@F\x80\x89|V1~w\x85y\x03\x92\x8a\x7fr\x16\x88 \x849\x94khlO\x9cj\x9e!\x9as\xa1\\\xa3\x0c\x1am\x0e\x96&\xa7\\\xae\x98\x8fAZ\xaePQ\x04\xba~\x7f\xa5\x9byX\x98\xb7R\x06\xc5\xbc\xc5\xc8\xc9\x06\x00\x00\x04\xbc\x1f\x00\xc8WE\x0bz85\xae\'\xca\xdb\xdc\x06\n\xd1\xdd\xe1\xe0\xe1\xc8\x04.\xe3\x0bD\xc2a\xbf\xd9\x07\xe4\xe1\xdf\xf0\xdc\xe3\xf3\xe6,\xe8\x12\xbeL\n\xfb\x18\xcc\x00\x01&\x0b\x08P\x9e\xb7<\xacT\xb1\x1aH\xb0\x9e\x81g\x12t\xe9\xd2gj\x94\x9c4\x0e\x02 \x83\x08.\xffO1\x00\x13>*HF%\xd9\xbd$\x9a\x8c\x84i2@\x80\x00L\r\xc7\xc5\\\x00N\xe3\xbc\x8f$\x1f\x10(\x97\'&3g\x0c\xf29(\xa5r\xa5\x84\x9b9\x0f\xd4D\xba,i\x83q+l\x86;)4\x90 0\xec06`Z\x8cfW\x1b"U\x85M\xb6\xaa\xecNi\x1e\xe1\xad\xa8jC\x9d\x83\x0bX\xe1\xfex\xf5\x00m\x04\xbb\x1d\xc1.\x0b\xb9\xf7\x1d\xd2\x93\x11\xce\xf6eK\x84\x97\xdc\x8a\xb5\x1c\xd8\x85\x80\xf7\xab\xd4n\xf7\xfeBC[\x95\xd0\xa0 `\x8a\x1e.\xa1\xd5\xef\xc1\xbb_\x95:&\x07`\x05S\xc0\x11\xf2\t\xf5\xb2\xa1\xca\x19\xcd\x9a/m\xfd\xe8\x93Y\xe3\xcf\x1f\xc8\x96\xd5 \xf8a\xb5\xde\xdc\x0c~\xd6\xb4\x81\xd0\xeb\xb8\xb1\xf70\xe0\xfa\xb9\xa3\xc3p\x8f!\x08\x06i\xe3\xf96\xe1\xe9f\xb4\xe6\x9c\x19y\\<\x0b\x98{\xf5f\x9d[t\x9d\xc8\xa8O\'\x98S\xe8\x9bA\x15\x8e\x1f 
\x91\\L\xf8\xd0\x10\xf2\x8a\xf6\x16`\x1c\xd0\x00\x86\xd3m\xe0T52\x19D]P\x94\xd9U\x8aa\x9d}\xf7\xcbH\xf8=\xa0\x9f~\xbdUx\x1fm\xec\x99d\xa0\x16\xaa\xd9e\xcd\x00Z$\xd6\x00#\x17r0!~\x00*\x83\x1aW+\x00\xd7\rv\xd9\xb1#\xe3U\xcba\xe8\xd3mf=\xe7\xcc*L\xd5\xd0\xdf<0\x023\xe3\x90\xf6]\x88\xd4x\x8f\x95\xf3\\y\xe9\xed2\x81\x8b\xca\x04)\xe4J\xb7h\xa7\xd8iH\x0et\x12\x01\xf5\x1c\x98HK&\xbc\x04\xa2\x16\x01\xb4D\xc4K\x10\xad\x91\x00\x00;'

  3. Video
    Also binary data; just save it in a video format.
  4. Other
    Anything that can be requested can be fetched.

How to parse the data?

  1. Direct processing
  2. JSON parsing
  3. Regular expressions
  4. BeautifulSoup
  5. PyQuery
  6. XPath
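
A few of these approaches can be tried out with the standard library alone, as in the sketch below. BeautifulSoup, PyQuery, and XPath need third-party packages, so `html.parser` stands in here as one example of an HTML parsing library. The sample JSON, the HTML fragment, and the `LinkParser` class are made up for illustration.

```python
import json
import re
from html.parser import HTMLParser

# JSON parsing: a JSON string becomes Python dicts and lists.
json_text = '{"name": "crawler", "pages": [1, 2, 3]}'
data = json.loads(json_text)

# Regular expressions: pull every href value out of an HTML fragment.
html = '<p><a href="/news">News</a> <a href="/map">Map</a></p>'
links_by_regex = re.findall(r'href="([^"]+)"', html)


class LinkParser(HTMLParser):
    """Collect the href attributes of <a> tags with an HTML parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


parser = LinkParser()
parser.feed(html)

print(data["pages"])
print(links_by_regex)
print(parser.links)
```

Regular expressions are fine for small, predictable fragments like this one; a real parser tends to hold up better on messy markup.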

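Why is the data I fetched different from what I see in the browser?

The browser shows the page after JavaScript has dynamically loaded the data (usually through API endpoints) and rendered it, while a direct request from our program only returns the original source, so the two differ.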
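
The difference can be illustrated with a small offline sketch. Both the raw page source and the API response below are invented examples: the raw source contains only an empty list, and the actual items arrive in a separate JSON response that the page's JavaScript would normally fetch.

```python
import json
import re

# Hypothetical raw source, as returned by a direct request: the list is empty
# because the browser would fill it in later with JavaScript.
raw_source = """
<html><body>
  <ul id="articles"></ul>
  <script src="/static/app.js"></script>
</body></html>
"""

# Hypothetical JSON that the page's JavaScript would fetch from an API endpoint.
api_response = '{"articles": [{"title": "First post"}, {"title": "Second post"}]}'

items_in_source = re.findall(r"<li>(.*?)</li>", raw_source)
items_from_api = [article["title"] for article in json.loads(api_response)["articles"]]

print(items_in_source)  # the raw source has no list items to extract
print(items_from_api)   # the data is in the API response instead
```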

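How to solve the JavaScript rendering problem?

  1. Analyze the Ajax requests (they usually return JSON strings)
  2. selenium/WebDriver
  • input
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('http://www.zhihu.com')
print(driver.page_source)  # matches the source shown in the browser's Elements panel, so rendering is not a concern
  • output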
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" lang="zh" data-hairline="true" data-theme="light"><head><meta charset="utf-8" /><title>知乎 - 有问题上知乎</title><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" /><meta name="renderer" content="webkit" /><meta name="force-rendering" content="webkit" /><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" /><meta name="google-site-verification" content="FTeR0c8arOPKh8c5DYh_9uu98_zJbaWw53J-Sch9MTg" /><title>知乎 - 有问题上知乎</title><meta name="description" content="有问题,上知乎。知乎是中文互联网知名知识分享平台,以「知识连接一切」为愿景,致力于构建一个人人都可以便捷接入的知识分享网络,让人们便捷地与世界分享知识、经验和见解,发现更大的世界。" /><link rel="shortcut icon" type="image/x-icon" href="https://static.zhihu.com/static/favicon.ico" /><link rel="search" type="application/opensearchdescription+xml" href="https://static.zhihu.com/static/search.xml" title="知乎" /><link rel="dns-prefetch" href="//static.zhimg.com" /><link rel="dns-prefetch" href="//pic1.zhimg.com" /><link rel="dns-prefetch" href="//pic2.zhimg.com" /><link rel="dns-prefetch" href="//pic3.zhimg.com" /><link rel="dns-prefetch" href="//pic4.zhimg.com" /><link href="https://static.zhihu.com/heifetz/main.app.da5a5359d85bcfbceffd.css" rel="stylesheet" /><script defer="" crossorigin="anonymous" src="https://unpkg.zhimg.com/@cfe/sentry-script@latest/dist/init.js" data-sentry-config="{&quot;dsn&quot;:&quot;https://65e244586890460588f00f2987137aa8@crash2.zhihu.com/193&quot;,&quot;sampleRate&quot;:0.1,&quot;release&quot;:&quot;243-a985d8a2&quot;,&quot;ignoreErrorNames&quot;:[&quot;NetworkError&quot;,&quot;SecurityError&quot;],&quot;ignoreErrors&quot;:[&quot;origin message&quot;,&quot;Network request failed&quot;,&quot;Loading chunk&quot;,&quot;这个系统不支持该功能。&quot;,&quot;Can't find variable: webkit&quot;,&quot;Can't find variable: $&quot;,&quot;内存不足&quot;,&quot;out of memory&quot;,&quot;DOM Exception 18&quot;,&quot;zfeedback sdk 初始化失败!&quot;,&quot;zfeedback sdk 加载失败!&quot;,&quot;The operation is 
insecure&quot;,&quot;[object Event]&quot;,&quot;[object FileError]&quot;,&quot;[object DOMError]&quot;,&quot;[object Object]&quot;,&quot;拒绝访问。&quot;,&quot;Maximum call stack size exceeded&quot;,&quot;UploadError&quot;,&quot;无法 fetch&quot;,&quot;draft-js&quot;,&quot;缺少 JavaScript 对象&quot;,&quot;componentWillEnter&quot;,&quot;componentWillLeave&quot;,&quot;componentWillAppear&quot;,&quot;getInlineStyleAt&quot;,&quot;getCharacterList&quot;],&quot;whitelistUrls&quot;:[&quot;static.zhihu.com&quot;]}"></script><script type="text/javascript" charset="utf-8" async="" src="https://static.zhihu.com/heifetz/main.signflow.37b0e299b3a85808a9f9.js"></script></head><body class="EntrySign-body"><div id="root"><div data-zop-usertoken="{}" data-reactroot=""><div class="LoadingBar"></div><div><header role="banner" class="Sticky AppHeader" data-za-module="TopNavBar"><div class="AppHeader-inner"><a href="//www.zhihu.com" aria-label="知乎"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 91" class="Icon ZhihuLogo ZhihuLogo--blue Icon--logo" style="height:30px;width:64px" width="64" height="30" aria-hidden="true"><title/><g><path d="M53.29 80.035l7.32.002 2.41 8.24 13.128-8.24h15.477v-67.98H53.29v67.978zm7.79-60.598h22.756v53.22h-8.73l-8.718 5.473-1.587-5.46-3.72-.012v-53.22zM46.818 43.162h-16.35c.545-8.467.687-16.12.687-22.955h15.987s.615-7.05-2.68-6.97H16.807c1.09-4.1 2.46-8.332 4.1-12.708 0 0-7.523 0-10.085 6.74-1.06 2.78-4.128 13.48-9.592 24.41 1.84-.2 7.927-.37 11.512-6.94.66-1.84.785-2.08 1.605-4.54h9.02c0 3.28-.374 20.9-.526 22.95H6.51c-3.67 0-4.863 7.38-4.863 7.38H22.14C20.765 66.11 13.385 79.24 0 89.62c6.403 1.828 12.784-.29 15.937-3.094 0 0 7.182-6.53 11.12-21.64L43.92 85.18s2.473-8.402-.388-12.496c-2.37-2.788-8.768-10.33-11.496-13.064l-4.57 3.627c1.363-4.368 2.183-8.61 2.46-12.71H49.19s-.027-7.38-2.372-7.38zm128.752-.502c6.51-8.013 14.054-18.302 14.054-18.302s-5.827-4.625-8.556-1.27c-1.874 2.548-11.51 15.063-11.51 15.063l6.012 
4.51zm-46.903-18.462c-2.814-2.577-8.096.667-8.096.667s12.35 17.2 12.85 17.953l6.08-4.29s-8.02-11.752-10.83-14.33zM199.99 46.5c-6.18 0-40.908.292-40.953.292v-31.56c1.503 0 3.882-.124 7.14-.376 12.773-.753 21.914-1.25 27.427-1.504 0 0 3.817-8.496-.185-10.45-.96-.37-7.24 1.43-7.24 1.43s-51.63 5.153-72.61 5.64c.5 2.756 2.38 5.336 4.93 6.11 4.16 1.087 7.09.53 15.36.277 7.76-.5 13.65-.76 17.66-.76v31.19h-41.71s.88 6.97 7.97 7.14h33.73v22.16c0 4.364-3.498 6.87-7.65 6.6-4.4.034-8.15-.36-13.027-.566.623 1.24 1.977 4.496 6.035 6.824 3.087 1.502 5.054 2.053 8.13 2.053 9.237 0 14.27-5.4 14.027-14.16V53.93h38.235c3.026 0 2.72-7.432 2.72-7.432z" fill-rule="evenodd"/></g></svg></a><nav role="navigation" class="AppHeader-nav"><a class="AppHeader-navItem" href="//www.zhihu.com/" data-za-not-track-link="true">首页</a><a class="AppHeader-navItem" href="//www.zhihu.com/explore" data-za-not-track-link="true">发现</a><a href="//www.zhihu.com/topic" class="AppHeader-navItem" data-za-not-track-link="true">话题</a></nav><div class="SearchBar" role="search" data-za-module="PresetWordItem"><div class="SearchBar-toolWrapper"><form class="SearchBar-tool"><div><div class="Popover"><div class="SearchBar-input Input-wrapper Input-wrapper--grey"><input type="text" maxlength="100" value="" autocomplete="off" role="combobox" aria-expanded="false" aria-autocomplete="list" aria-activedescendant="AutoComplete2--1" id="Popover1-toggle" aria-haspopup="true" aria-owns="Popover1-content" class="Input" placeholder="" /><div class="Input-after"><button aria-label="搜索" type="button" class="Button SearchBar-searchIcon Button--primary"><span style="display:inline-flex;align-items:center">​<svg xmlns="http://www.w3.org/2000/svg" class="Zi Zi--Search" fill="currentColor" viewBox="0 0 24 24" width="18" height="18"><path d="M17.068 15.58a8.377 8.377 0 0 0 1.774-5.159 8.421 8.421 0 1 0-8.42 8.421 8.38 8.38 0 0 0 5.158-1.774l3.879 3.88c.957.573 2.131-.464 1.488-1.49l-3.879-3.878zm-6.647 1.157a6.323 6.323 0 0 1-6.316-6.316 
6.323 6.323 0 0 1 6.316-6.316 6.323 6.323 0 0 1 6.316 6.316 6.323 6.323 0 0 1-6.316 6.316z" fill-rule="evenodd"/></svg></span></button></div></div></div></div></form></div></div><div class="AppHeader-userInfo"><div class="AppHeader-profile"><div><button type="button" class="Button AppHeader-login Button--blue">登录</button><button type="button" class="Button Button--primary Button--blue">加入知乎</button></div></div></div></div></header></div><main role="main" class="App-main"><div class="SignFlowHomepage"><div class="SignFlowHomepage-content"><div class="Card SignContainer-content"><div class="SignFlowHeader" style="padding-bottom:5px"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 91" class="Icon ZhihuLogo ZhihuLogo--blue Icon--logo" style="height:65.625px;width:140px" width="140" height="65.625" aria-hidden="true"><title/><g><path d="M53.29 80.035l7.32.002 2.41 8.24 13.128-8.24h15.477v-67.98H53.29v67.978zm7.79-60.598h22.756v53.22h-8.73l-8.718 5.473-1.587-5.46-3.72-.012v-53.22zM46.818 43.162h-16.35c.545-8.467.687-16.12.687-22.955h15.987s.615-7.05-2.68-6.97H16.807c1.09-4.1 2.46-8.332 4.1-12.708 0 0-7.523 0-10.085 6.74-1.06 2.78-4.128 13.48-9.592 24.41 1.84-.2 7.927-.37 11.512-6.94.66-1.84.785-2.08 1.605-4.54h9.02c0 3.28-.374 20.9-.526 22.95H6.51c-3.67 0-4.863 7.38-4.863 7.38H22.14C20.765 66.11 13.385 79.24 0 89.62c6.403 1.828 12.784-.29 15.937-3.094 0 0 7.182-6.53 11.12-21.64L43.92 85.18s2.473-8.402-.388-12.496c-2.37-2.788-8.768-10.33-11.496-13.064l-4.57 3.627c1.363-4.368 2.183-8.61 2.46-12.71H49.19s-.027-7.38-2.372-7.38zm128.752-.502c6.51-8.013 14.054-18.302 14.054-18.302s-5.827-4.625-8.556-1.27c-1.874 2.548-11.51 15.063-11.51 15.063l6.012 4.51zm-46.903-18.462c-2.814-2.577-8.096.667-8.096.667s12.35 17.2 12.85 17.953l6.08-4.29s-8.02-11.752-10.83-14.33zM199.99 46.5c-6.18 0-40.908.292-40.953.292v-31.56c1.503 0 3.882-.124 7.14-.376 12.773-.753 21.914-1.25 27.427-1.504 0 0 3.817-8.496-.185-10.45-.96-.37-7.24 1.43-7.24 1.43s-51.63 5.153-72.61 5.64c.5 2.756 2.38 
5.336 4.93 6.11 4.16 1.087 7.09.53 15.36.277 7.76-.5 13.65-.76 17.66-.76v31.19h-41.71s.88 6.97 7.97 7.14h33.73v22.16c0 4.364-3.498 6.87-7.65 6.6-4.4.034-8.15-.36-13.027-.566.623 1.24 1.977 4.496 6.035 6.824 3.087 1.502 5.054 2.053 8.13 2.053 9.237 0 14.27-5.4 14.027-14.16V53.93h38.235c3.026 0 2.72-7.432 2.72-7.432z" fill-rule="evenodd"/></g></svg><div class="SignFlowHeader-slogen">注册<!-- -->知乎,发现更大的世界</div></div><div class="SignContainer-inner"><div class="Register"><div><div class="Register-content"><form novalidate=""><div class="SignFlow-account"><div class="SignFlow-supportedCountriesSelectContainer"></div><div class="SignFlowInput SignFlow-accountInputContainer"><div class="SignFlow-accountInput Input-wrapper"><input type="tel" value="" name="phoneNo" class="Input" placeholder="手机号" /></div><div class="SignFlowInput-errorMask SignFlowInput-requiredErrorMask SignFlowInput-errorMask--hidden"></div></div></div><div class="Captcha SignFlow-captchaContainer Register-captcha" style="width:0;height:0;opacity:0;overflow:hidden;margin:0;padding:0;border:0"><div><div class="SignFlowInput"><span class="Captcha-info">请点击图中倒立的文字</span><button type="button" class="Button Captcha-chineseRefreshButton Button--plain"><svg xmlns="http://www.w3.org/2000/svg" class="Zi Zi--Refresh" fill="currentColor" viewBox="0 0 24 24" width="20" height="20"><path d="M20 12.878C20 17.358 16.411 21 12 21s-8-3.643-8-8.122c0-4.044 3.032-7.51 6.954-8.038.034-1.185.012-1.049.012-1.049-.013-.728.461-1.003 1.057-.615l3.311 2.158c.598.39.596 1.026 0 1.418l-3.31 2.181c-.598.393-1.08.12-1.079-.606 0 0 .006-.606-.003-1.157-2.689.51-4.675 2.9-4.675 5.708 0 3.21 2.572 5.822 5.733 5.822 3.163 0 5.733-2.612 5.733-5.822 0-.633.51-1.148 1.134-1.148.625 0 1.133.515 1.133 1.148" fill-rule="evenodd"/></svg></button></div><div class="Captcha-chineseContainer"><img data-tooltip="看不清楚?换一张" class="Captcha-chineseImg" src="https://img-blog.csdnimg.cn/2022010703270825259.jpg" alt="图形验证码" /></div></div></div><div 
class="Register-SMSInput"><div class="SignFlow SignFlow-smsInputContainer"><div class="SignFlowInput SignFlow-smsInput"><div class="Input-wrapper"><input type="number" value="" name="digits" class="Input" placeholder="输入 6 位短信验证码" /></div><div class="SignFlowInput-errorMask SignFlowInput-requiredErrorMask SignFlowInput-errorMask--hidden"></div></div><button type="button" class="Button CountingDownButton SignFlow-smsInputButton Button--plain">获取短信验证码</button></div><div class="Register-smsBackUp"><span>接收<!-- -->语音<!-- -->验证码</span></div></div><button type="submit" class="Button Register-submitButton Button--primary Button--blue">注册</button></form><div class="Register-footer"><span class="Register-declaration">注册即代表同意<a href="https://www.zhihu.com/terms">《知乎协议》</a><a href="https://www.zhihu.com/terms/privacy">《隐私政策》</a></span><a class="Register-org" href="https://www.zhihu.com/org/signup">注册机构号</a></div></div></div></div><div class="SignContainer-switch">已有帐号?<span>登录</span></div><div class="SignFlowHomepage-qrImage SignFlowHomepage-qrImageHidden"><div></div></div></div></div><button type="button" class="Button SignFlowHomepage-downloadBtn">下载知乎 App</button></div><footer class="SignFlowHomepage-footer"><div class="ZhihuLinks"><a target="_blank" rel="noopener noreferrer" href="https://zhuanlan.zhihu.com">知乎专栏</a><a target="_blank" rel="noopener noreferrer" href="/roundtable">圆桌</a><a target="_blank" rel="noopener noreferrer" href="/explore">发现</a><a target="_blank" rel="noopener noreferrer" href="/app">移动应用</a><a target="_blank" rel="noopener noreferrer" href="/contact">联系我们</a><a target="_blank" rel="noopener noreferrer" href="https://app.mokahr.com/apply/zhihu">来知乎工作</a><a target="_blank" rel="noopener noreferrer" href="/org/signup">注册机构号</a></div><div class="ZhihuRights"><span>© 2018 知乎</span><a target="_blank" rel="noopener noreferrer" href="http://www.miibeian.gov.cn/">京 ICP 证 110745 号</a><span>京公网安备 11010802010035 号<a 
href="https://zhstatic.zhihu.com/assets/zhihu/publish-license.jpg" target="_blank" rel="noopener noreferrer">出版物经营许可证</a></span></div><div class="ZhihuReports"><a target="_blank" rel="noopener noreferrer" href="https://zhuanlan.zhihu.com/p/28852607">侵权举报</a><a target="_blank" rel="noopener noreferrer" href="http://www.12377.cn">网上有害信息举报专区</a><a target="_blank" rel="noopener noreferrer" href="/jubao">儿童色情信息举报专区</a><span>违法和不良信息举报:010-82716601</span></div><div class="ZhihuIntegrity"><div><img src="https://static.zhihu.com/static/revved/img/index/chengxing_logo@2x.65dc76e8.png" alt="诚信网站示范企业" /><a href="https://credit.szfw.org/CX20170607038331320388.html">诚信网站示范企业</a></div></div></footer></div></main><div class="CornerButtons"><div class="CornerAnimayedFlex CornerAnimayedFlex--hidden"><button data-tooltip="回到顶部" data-tooltip-position="left" data-tooltip-will-hide-on-click="true" aria-label="回到顶部" type="button" class="Button CornerButton Button--plain"><svg xmlns="http://www.w3.org/2000/svg" class="Zi Zi--BackToTop" title="回到顶部" fill="currentColor" viewBox="0 0 24 24" width="24" height="24"><path d="M16.036 19.59a1 1 0 0 1-.997.995H9.032a.996.996 0 0 1-.997-.996v-7.005H5.03c-1.1 0-1.36-.633-.578-1.416L11.33 4.29a1.003 1.003 0 0 1 1.412 0l6.878 6.88c.782.78.523 1.415-.58 1.415h-3.004v7.005z"/></svg></button></div></div></div></div><div id="data" style="display:none" data-useragent="{&quot;os&quot;:{&quot;name&quot;:&quot;Windows&quot;,&quot;version&quot;:&quot;8.1&quot;},&quot;browser&quot;:{&quot;name&quot;:&quot;Chrome&quot;,&quot;version&quot;:&quot;70.0.3538.77&quot;,&quot;major&quot;:&quot;70&quot;}}"></div><script src="https://static.zhihu.com/heifetz/vendor.7c9abc3e398528f8abf1.js"></script><script src="https://static.zhihu.com/heifetz/main.app.dd9f8ea93b40f742ea31.js"></script><script></script><div><div style="display: none;">想来知乎工作?请发送邮件到 jobs@zhihu.com</div></div><div><div><span></span></div></div><script 
src="//unpkg.zhimg.com/za-js-sdk@latest/dist/zap.js"></script><script src="//zhstatic.zhihu.com/assets/zfeedback/3.0.16/zfeedback.js"></script></body></html>

  3. Splash
  4. PyV8, Ghost.py
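
For the first option, analyzing Ajax requests, the usual pattern is to find the endpoint in the browser's developer tools, rebuild its URL for each page, and decode the JSON it returns. The sketch below shows that pattern only; the endpoint, the `offset`/`limit` parameters, and the response shape are all assumptions, and no request is actually sent.

```python
import json
import urllib.parse

# Hypothetical Ajax endpoint, of the kind found in the browser's Network panel.
API_URL = "http://example.com/api/articles"


def build_page_url(page: int, page_size: int = 20) -> str:
    """Build the URL for one page of results (parameter names are assumed)."""
    query = urllib.parse.urlencode({"offset": (page - 1) * page_size, "limit": page_size})
    return f"{API_URL}?{query}"


def extract_titles(body: str) -> list:
    """Decode the JSON string returned by the endpoint and pull out the titles."""
    return [item["title"] for item in json.loads(body)["data"]]


urls = [build_page_url(page) for page in (1, 2, 3)]
sample_body = '{"data": [{"title": "A"}, {"title": "B"}]}'  # made-up response body

print(urls)
print(extract_titles(sample_body))
```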

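How to save the data?

  1. Text: plain text, JSON, XML, ...
  2. Relational databases: MySQL, Oracle, SQL Server, etc., which store data in structured tables
  3. Non-relational databases: MongoDB, Redis, etc., which store data in key-value or document form
  4. Binary files: images, video, audio, etc., saved directly or in a specific format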
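
Two of these options can be sketched with the standard library. In the example below, SQLite (through the built-in `sqlite3` module) stands in for a relational database such as MySQL. The records, the table name, and the file name are made up for illustration. MongoDB and Redis need their own third-party drivers and are not shown.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Made-up records standing in for scraped data.
records = [
    {"title": "First post", "url": "http://example.com/1"},
    {"title": "Second post", "url": "http://example.com/2"},
]

# Text: write the records as a JSON file.
json_path = Path(tempfile.gettempdir()) / "crawler_records.json"
json_path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")

# Relational database: store the same records in a table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE pages (title TEXT, url TEXT)")
connection.executemany("INSERT INTO pages (title, url) VALUES (:title, :url)", records)
connection.commit()

row_count = connection.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
print(row_count)
```

Passing `ensure_ascii=False` keeps non-ASCII text such as Chinese readable in the saved file.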