Target URL:
https://www.jsyks.com/kmy-mnks
For example:
url='https://www.jsyks.com/kmy-mnks' # kmy-mnks: Subject 1 (科目一) mock exam
url='https://www.jsyks.com/kms-mnks' # kms-mnks: Subject 4 (科目四) mock exam
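1. Get the resources
Each question has a "查看本题解析" (view explanation) link, and the answer is read from that explanation page. The link's href is built from the c attribute of each li under div.Exam ul.
Example explanation link:
https://tiba.jsyks.com/Post/c6f5b.htm
The page builds it as:
"http://tiba.jsyks.com/Post/" + $(a).attr("c") + ".htm"
So once the c value is known, the URL of the explanation page is known.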
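The URL rule above can be written as a small helper. This is only a minimal sketch: the names POST_BASE and build_answer_url are illustrative and do not come from the site, and the http scheme simply follows the JavaScript expression quoted above.

```python
# Base of the explanation pages, copied from the JavaScript expression above.
POST_BASE = "http://tiba.jsyks.com/Post/"


def build_answer_url(c_value: str) -> str:
    """Return the explanation-page URL for one question's c value."""
    c_value = c_value.strip()
    if not c_value:
        # An <li> without a c attribute cannot be mapped to a page.
        raise ValueError("empty c value")
    return f"{POST_BASE}{c_value}.htm"


print(build_answer_url("c6f5b"))
```

Rejecting an empty value early keeps a malformed URL from being requested later.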
2. Send the request
from selenium import webdriver
url='https://www.jsyks.com/kms-mnks'
driver = webdriver.Firefox() # opens a local Firefox window
driver.get(url)
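If a later step raises, the browser opened above stays running. One possible way to guarantee cleanup is a small context manager. This is a sketch, not part of the original script: browser_session is a hypothetical helper name, and it relies only on the driver having get() and quit(), which Selenium WebDriver objects provide.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(driver, url):
    """Open url in driver, hand the driver to the caller, and always quit it."""
    try:
        driver.get(url)
        yield driver
    finally:
        # Runs on normal exit and on exceptions alike.
        driver.quit()


# Usage sketch (needs Selenium and a local Firefox):
# from selenium import webdriver
# with browser_session(webdriver.Firefox(), 'https://www.jsyks.com/kms-mnks') as driver:
#     print(driver.title)
```

Passing the driver in, rather than creating it inside the helper, keeps the choice of browser with the caller.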
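3. Parse the data
pip install selenium
Updated locator method:
find_element_by_css_selector() is a Selenium WebDriver method that locates a page element by CSS selector. It was deprecated in Selenium 4 and removed from the Python bindings in 4.3.0; use find_element() together with the By class instead.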
from selenium.webdriver.common.by import By
element = driver.find_element(By.CSS_SELECTOR, "your_css_selector")
For example, use find_element() with the By.CSS_SELECTOR constant to locate the element that matches the CSS selector "button.submit", then click it.
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("http://www.example.com")
# Find an element on the page, such as a button
element = driver.find_element(By.CSS_SELECTOR, "button.submit")
# Act on the element, for example click it
element.click()
# Close the browser
driver.quit()
4. Implementation
Subject 4 mock exam:
import requests
import parsel
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.jsyks.com/kms-mnks'
driver = webdriver.Firefox()
# Runs fine; it only prints the message: The version of firefox cannot be detected. Trying with latest driver version
driver.get(url)
driver.maximize_window()  # maximize the browser window
# One <li> per question; its c attribute identifies the explanation page
lis = driver.find_elements(By.CSS_SELECTOR, '.Content li')
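
def get_all_answer(answer_url_list):
    """Fetch each explanation page and return a list of {'问题': ..., '答案': ...} dicts."""
    answer_list = []
    for answer_url in answer_url_list:
        html_data = requests.get(url=answer_url, timeout=10).text
        selector = parsel.Selector(html_data)
        question = selector.css('#question h1 strong a::text').get()
        # Default to '' so the later membership test never receives None
        answer = selector.css('#question h1 u::text').get(default='')
        # True/false answers appear as 对/错; map them to 正确/错误 so they can be
        # matched against the option text on the exam page
        if answer == '对':
            answer = '正确'
        elif answer == '错':
            answer = '错误'
        # Anything else is already a letter string such as 'A' or 'ABD' (multiple choice)
        item = {'问题': question, '答案': answer}
        answer_list.append(item)
    return answer_list
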
answer_url = [f"http://tiba.jsyks.com/Post/{li.get_attribute('c')}.htm" for li in lis]
answer_list = get_all_answer(answer_url)

# page is the 1-based question number used in the #LI{page} selector
for page, (li, answer) in enumerate(zip(lis, answer_list), start=1):
    elements = li.find_elements(By.CSS_SELECTOR, 'b')  # find the <b> option elements
    correct = answer['答案'] or ''
    for num, option in enumerate(elements, start=1):
        choose = option.text
        if len(choose) > 2:
            choose = choose[:1]  # keep only the leading letter: A, B, C or D
        if choose and choose in correct:
            # The selector assumes the options start at the third child, hence num + 2
            driver.find_element(By.CSS_SELECTOR, f'#LI{page} b:nth-child({num + 2})').click()

driver.find_element(By.CLASS_NAME, 'btn_JJ').click()  # presumably the submit button
print('----------- finished -----------')
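
The option-matching rule in the loop above can be checked without a browser. The following is a minimal sketch under stated assumptions: normalize_answer and options_to_click are hypothetical helper names, and the sample option texts are made up for illustration rather than taken from the site.

```python
def normalize_answer(raw):
    """Map 对/错 to 正确/错误; leave letter answers such as 'AC' unchanged."""
    mapping = {"对": "正确", "错": "错误"}
    raw = (raw or "").strip()
    return mapping.get(raw, raw)


def options_to_click(option_texts, answer):
    """Return the 1-based positions of the options whose label occurs in answer."""
    positions = []
    for num, text in enumerate(option_texts, start=1):
        # Long texts start with the option letter; short ones are the label itself.
        label = text[:1] if len(text) > 2 else text
        if label and label in answer:
            positions.append(num)
    return positions


# Made-up multiple-choice options with the answer 'AC'
print(options_to_click(["A、first", "B、second", "C、third", "D、fourth"], "AC"))
# Made-up true/false options with the answer 对
print(options_to_click(["正确", "错误"], normalize_answer("对")))
```

Separating the decision from the clicking makes it easier to see why a question might be answered wrongly, for example when an option's text is empty.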
Results
The Subject 4 exam scored full marks.
The Subject 1 exam scored 99.