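In some regions, Baidu Netdisk (百度网盘) share links are blocked, so the shared resource cannot be opened, downloaded, or saved. I built a third-party channel for saving them.
The link is opened in a browser on my server and the login QR code is returned to the page. Once the user scans the code and logs in, the resource is saved automatically.
There is also a Baidu Wenku (百度文库) feature that extracts the text of a document. Both features use selenium.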
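Because the server opens whatever link the user submits, it may be worth checking the URL before handing it to selenium. The sketch below is not part of the original code: `is_allowed_share_url` and the `ALLOWED_HOSTS` allow-list are hypothetical, and it assumes that share links live on pan.baidu.com and documents on wenku.baidu.com.

```python
from urllib.parse import urlparse

# Assumption: only these hosts should ever be opened by the server-side browser.
ALLOWED_HOSTS = {"pan.baidu.com", "wenku.baidu.com"}


def is_allowed_share_url(url):
    """Return True if url is an http(s) link whose host is on the allow-list."""
    if not url:
        return False
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    # hostname is None when the URL has no network location.
    return parsed.hostname in ALLOWED_HOSTS
```

A view could call this first and return the fallback response right away when the check fails, so the browser never loads an unexpected address.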
HTML:
<div id="wangpan" class="white_content">
  <a href="javascript:void(0)" onclick="document.getElementById('wangpan').style.display='none';document.getElementById('fade').style.display='none'">Click here to close this window</a>
  <h3>Enter the Baidu Netdisk URL and extraction code. A login QR code will appear; scan it with the Baidu app to log in (valid for 1 minute). (This site does not record any user information.)</h3>
  <input type="text" name="url" id="wpurl" style="width:500px" placeholder="Enter the URL"/><br/>
  <input type="text" name="password" id="wppassword" placeholder="Enter the extraction code"/><br/>
  <input type="button" value="Submit" id="wpsub"/>
  <div></div>
  <img id="baidu" value="custom">
  <div></div>
  <button type="button" id="login" style="float: left">Click after scanning to log in</button>
  <div id="code"></div>
</div>
<div id="wenku" class="white_content">
  <a href="javascript:void(0)" onclick="document.getElementById('wenku').style.display='none';document.getElementById('fade').style.display='none'">Click here to close this window</a>
  <h3>Only plain-text documents are supported for now; tables and PDFs are not yet supported.</h3>
  <input type="text" name="url" id="wkurl" style="width:500px" placeholder="Enter the URL"/><br/>
  <input type="button" value="Submit" id="wksub"/>
  <div>
    <textarea id="wktittle"></textarea>
  </div>
  <textarea id="down" style="margin: 0px; width: 100%; height: 100%;"></textarea>
</div>
jQuery:
// Submit the share URL and extraction code; the view answers with the QR code image URL.
$('#wpsub').click(function () {
    var wpurl = $("#wpurl").val();
    var wppass = $("#wppassword").val();
    $.ajax({
        url: 'baiduwangpan',
        type: 'POST',
        data: {
            wpurl: wpurl,
            wppass: wppass
        },
        //headers:{"X-CSRFToken":$.cookie("csrftoken")},
        success: function (data) {
            $('#baidu').attr('src', data)
        }
    })
});

// After scanning the QR code, ask the server to finish saving the resource.
$('#login').click(function () {
    $.ajax({
        url: 'login',
        type: 'POST',
        data: {
            wpurl: '1'
        },
        //headers:{"X-CSRFToken":$.cookie("csrftoken")},
        success: function (data) {
            // #code is a <div>, so set its text rather than a form value.
            $('#code').text(data)
        }
    })
});

// Submit a Baidu Wenku URL; the view answers with [title, text].
$('#wksub').click(function () {
    var wkurl = $("#wkurl").val();
    $.ajax({
        url: 'baiduwenku',
        type: 'POST',
        data: {
            wkurl: wkurl
        },
        //headers:{"X-CSRFToken":$.cookie("csrftoken")},
        success: function (data) {
            $('#wktittle').val(data[0]);
            $('#down').val(data[1]);
        }
    })
});
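The `success` callbacks above receive whatever the Django view serialises. As a rough illustration of those payloads, the sketch below uses the standard `json` module, which `JsonResponse` calls internally with the given `json_dumps_params`. The helper name `to_payload` and the sample title and text are made up for the example.

```python
import json


def to_payload(data):
    """Serialise data roughly as JsonResponse does with ensure_ascii=False and safe=False."""
    return json.dumps(data, ensure_ascii=False)


# A bare string (the QR code image URL) becomes a JSON string.
qr_payload = to_payload("/static/wp.JPG")

# A two-item list becomes a JSON array, read in jQuery as data[0] and data[1].
doc_payload = to_payload(["示例标题", "正文内容"])

# With the default ensure_ascii=True, Chinese text is escaped as \uXXXX sequences.
escaped_payload = json.dumps(["示例标题"])

print(qr_payload)
print(doc_payload)
print(escaped_payload)
```

Passing `safe=False` is what allows a bare string or a list at the top level; by default `JsonResponse` only accepts a dict.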
views:
import re
import time

from django.http import JsonResponse
from selenium import webdriver
from selenium.webdriver.common.by import By

# Assumption: the post does not show how driverOptions and the shared driver
# are created, so these two lines are placeholders.
driverOptions = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=driverOptions)


def baiduwangpan(request):
    """Open the share link, submit the extraction code, and return the login QR code URL."""
    url = request.POST.get("wpurl")
    print(url)
    try:
        password = request.POST.get("wppass")
        print(password)
        driver.get(url)
        driver.find_element(By.XPATH, "//input[@class='QKKaIE LxgeIt']").send_keys(password)
        driver.find_element(By.XPATH, "//span[@class='g-button-right']").click()
        time.sleep(10)
        # Clicking "save to netdisk" while logged out brings up the login dialog.
        driver.find_element(By.XPATH, "//em[@class='icon icon-save-disk']").click()
        time.sleep(5)
        img = driver.find_element(By.XPATH, "//img[@class='tang-pass-qrcode-img']").get_attribute('src')
    except Exception:
        # Fall back to a static placeholder image on any failure.
        img = '/static/wp.JPG'
    return JsonResponse(img, json_dumps_params={'ensure_ascii': False}, safe=False)


def login(request):
    """After the user has scanned the QR code, confirm the save dialog."""
    confirm_xpath = "//a[@class ='g-button g-button-blue-large']"
    try:
        driver.find_element(By.XPATH, confirm_xpath).click()
        try:
            driver.close()
        except Exception:
            pass
        print('first')
        code = 'success'
    except Exception:
        # The confirm button was not there yet: select the file, click save, then confirm.
        try:
            driver.find_element(By.XPATH, "//span[@class='zbyDdwb']").click()
            time.sleep(3)
            driver.find_element(By.XPATH, "//span[@class='g-button-right']").click()
            time.sleep(3)
            driver.find_element(By.XPATH, confirm_xpath).click()
            try:
                driver.close()
            except Exception:
                pass
            print('second')
            code = 'success'
        except Exception:
            print('fail')
            code = 'fail'  # report the failure to the page
    print('over')
    return JsonResponse(code, json_dumps_params={'ensure_ascii': False}, safe=False)


def baiduwenku(request):
    """Scroll through a Baidu Wenku document and return [title, text]."""
    driver = webdriver.Chrome(options=driverOptions)  # a separate browser for this request
    url = request.POST.get("wkurl")
    print(url)
    driver.get(url)
    # The fifth <span> of the summary holds the page count, a number followed by 页.
    page = driver.find_element(By.XPATH, "//div[@class='doc-summary-wrap']/span[5]")
    page = page.text.replace('页', '')
    PAGE = driver.page_source
    driver.execute_script("var q=document.documentElement.scrollTop=4000")
    time.sleep(2)
    # Expand the document so that every page can be scrolled into view.
    driver.find_element(By.XPATH, "//span[@class='read-all']").click()
    lists = ''
    for i in range(0, int(page)):
        # Scroll 1375 px per page so that page i + 1 gets rendered.
        driver.execute_script("var q=document.documentElement.scrollTop=" + str(1375 * i))
        time.sleep(1)
        pageSource = driver.page_source
        pattern1 = re.compile(r'<p class="reader-word-layer reader-word-s' + str(i + 1) + '-.*?>(.*?)</p>')
        for a in pattern1.findall(pageSource):
            lists += a
    pattern = re.compile(r'<h3 class="doc-title">(.+?)</h3>')
    tittle = pattern.findall(PAGE)
    print(tittle)
    # Strip characters that are awkward in file names or URLs.
    Tittle = re.sub(r'[+ /?%#&=]', '', tittle[0])
    Lists = [Tittle, lists.replace(' ', '')]
    print(lists)
    try:
        driver.close()
    except Exception:
        pass
    return JsonResponse(Lists, json_dumps_params={'ensure_ascii': False}, safe=False)
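The regex-based extraction in `baiduwenku` can be tried without a browser. The sketch below applies the same two patterns to a hand-written HTML fragment. `extract_page_text`, `clean_title`, and the `SAMPLE` markup are hypothetical and only mimic the class names used in the view; real Wenku pages may differ.

```python
import re


def extract_page_text(page_source, page_number):
    """Join the text of every reader-word-layer paragraph that belongs to one page."""
    pattern = re.compile(
        r'<p class="reader-word-layer reader-word-s' + str(page_number) + r'-.*?>(.*?)</p>'
    )
    return ''.join(pattern.findall(page_source))


def clean_title(page_source):
    """Return the doc-title text with awkward characters removed, or '' if absent."""
    match = re.search(r'<h3 class="doc-title">(.+?)</h3>', page_source)
    if match is None:
        return ''
    return re.sub(r'[+ /?%#&=]', '', match.group(1))


# Made-up fragment: two paragraphs on page 1 and one on page 2.
SAMPLE = (
    '<h3 class="doc-title">My Doc #1?</h3>'
    '<p class="reader-word-layer reader-word-s1-2" style="top:10px">Hello, </p>'
    '<p class="reader-word-layer reader-word-s1-3" style="top:20px">world</p>'
    '<p class="reader-word-layer reader-word-s2-1" style="top:10px">Page two</p>'
)

title = clean_title(SAMPLE)
page_one = extract_page_text(SAMPLE, 1)
page_two = extract_page_text(SAMPLE, 2)
```

Matching HTML with regular expressions is fragile: any change in the page's class names or attribute order breaks the patterns, so the empty-result cases are worth handling rather than indexing the first match directly.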
urls:
path('baiduwangpan', views.baiduwangpan, name='baiduwangpan'),
path('baiduwenku', views.baiduwenku, name='baiduwenku'),
path('login', views.login, name='login'),
That covers most of the page's functionality. Front-end page design is proving really hard, and I am still figuring it out as I go.