Using AJAX with Selenium in Python: scraping a table loaded via AJAX with Python (Selenium)


I have a page with a table (id="ctl00_ContentPlaceHolder_ctl00_ctl00_GV", class="GridListings") that I need to scrape.

I usually use BeautifulSoup and urllib for this, but in this case the table takes some time to load, so it isn't captured when I try to fetch it using BS.

I cannot use PyQt4, dryscrape, or windmill because of some installation issues, so the only possible way is to use Selenium/PhantomJS.

I tried the following, still with no success:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.get(url)

# Wait up to 10 seconds for the table element to appear in the DOM.
# The locator must be passed as a single (by, value) tuple.
wait = WebDriverWait(driver, 10)
table = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'table#ctl00_ContentPlaceHolder_ctl00_ctl00_GV')))

The above code doesn't give me the desired contents of the table.

How do I go about achieving this?

Solution

You can get the data using requests and bs4. With almost all, if not all, ASP.NET sites there are a few POST parameters that always need to be provided, such as __EVENTTARGET and __EVENTVALIDATION:

from bs4 import BeautifulSoup
import requests

data = {"__EVENTTARGET": "ctl00$ContentPlaceHolder$ctl00$ctl00$RadAjaxPanel_GV",
        "__EVENTARGUMENT": "LISTINGS;0",
        "ctl00$ContentPlaceHolder$ctl00$ctl00$ctl00$hdnProductID": "139",
        "ctl00$ContentPlaceHolder$ctl00$ctl00$hdnProductID": "139",
        "ctl00$ContentPlaceHolder$ctl00$ctl00$drpSortField": "Listing Number",
        "ctl00$ContentPlaceHolder$ctl00$ctl00$drpSortDirection": "A-Z, Low-High",
        "__ASYNCPOST": "true"}

For the actual POST, we need to add a few more values to our post data, taken from the hidden fields of the page itself:

post = "https://seahawks.strmarketplace.com/Charter-Seat-Licenses/Charter-Seat-Licenses.aspx"

with requests.Session() as s:
    s.headers.update({"User-Agent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"})
    # Fetch the page once to read the hidden state fields it issues
    soup = BeautifulSoup(s.get(post).content, "html.parser")
    data["__VIEWSTATEGENERATOR"] = soup.select_one("#__VIEWSTATEGENERATOR")["value"]
    data["__EVENTVALIDATION"] = soup.select_one("#__EVENTVALIDATION")["value"]
    data["__VIEWSTATE"] = soup.select_one("#__VIEWSTATE")["value"]
    # Post the form data back in the same session to trigger the AJAX update
    r = s.post(post, data=data)
    soup2 = BeautifulSoup(r.content, "html.parser")
    table = soup2.select_one("div.GridListings")
    print(table)

You will see the table printed when you run the code.
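The hidden state fields that the code above reads one at a time could also be collected in a single pass. The following is only a minimal sketch using the standard library's html.parser rather than bs4; the names HiddenFieldParser and hidden_fields, and the sample markup, are hypothetical and not taken from the site in question.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of every hidden input found in the given HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up sample markup standing in for the page returned by s.get(post)
sample = """
<form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="q" value="ignored" />
</form>
"""
print(hidden_fields(sample))
```

With a real response you would pass the decoded page text, for example data.update(hidden_fields(s.get(post).text)), assuming the page exposes its state as hidden inputs in the way the answer describes.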
