Here is a frustrating piece of code:
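```python
import feedparser
import requests
import xml.etree.ElementTree as ET  # used in the later snippets
from requests.auth import HTTPBasicAuth

import config  # assumed: local settings module with flxusername / flxpassword

# searchURL and ticketsBaseUrl are defined elsewhere in the script.
data = requests.get(searchURL, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
feed_data = data.content
d = feedparser.parse(feed_data)
tickets = []
for ticketNum in d['entries']:
    tickets.append(ticketNum['title'])

s = requests.Session()
s.get(ticketsBaseUrl, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
for ticket in tickets:
    ticket_page = s.get(ticketsBaseUrl + ticket, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
    if ticket_page.status_code == 404:
        print('ticket %s data 404, skipping' % ticket)
        continue
```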
On its own, this code gives the expected 3 skips for the 404 responses.

But when I add an `else`:
```python
data = requests.get(searchURL, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
feed_data = data.content
d = feedparser.parse(feed_data)
tickets = []
for ticketNum in d['entries']:
    tickets.append(ticketNum['title'])

s = requests.Session()
s.get(ticketsBaseUrl, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
for ticket in tickets:
    ticket_page = s.get(ticketsBaseUrl + ticket, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
    if ticket_page.status_code == 404:
        print('ticket %s data 404, skipping' % ticket)
        continue
    else:
        etree = ET.fromstring(ticket_page.content)
        print(etree)
```
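The content of the last 404 page gets passed to etree, and the script errors out.

If the `else` does nothing but `print(ticket_page.status_code)`, it prints the 3 404s and 200 for all the others. As soon as I put the etree part in, it tries to parse the last of the 404s. Maddening.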
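One way to narrow this down is to stop the script from crashing and instead log the status code next to the body of whichever response actually fails to parse. A minimal sketch: `parse_or_report` is a hypothetical helper, and the sample tuples are made-up stand-ins for `(ticket, ticket_page.status_code, ticket_page.content)`.

```python
import xml.etree.ElementTree as ET


def parse_or_report(ticket, status_code, content):
    """Parse a ticket body, or report what was received if it is not valid XML.

    Returns the root Element on success, or None when parsing fails.
    """
    try:
        return ET.fromstring(content)
    except ET.ParseError as exc:
        # Print the status code next to the start of the body, so the failing
        # response can be matched against the earlier 404 log lines.
        print('ticket %s: HTTP %s, parse error: %s' % (ticket, status_code, exc))
        print('  body starts with: %r' % content[:80])
        return None


# Hypothetical stand-ins for (ticket, ticket_page.status_code, ticket_page.content)
sample = [
    ('T-1', 200, b'<ticket><id>T-1</id></ticket>'),
    ('T-2', 200, b'not xml at all'),
]
for ticket, status, body in sample:
    root = parse_or_report(ticket, status, body)
```

Inside the real loop the call would be `parse_or_report(ticket, ticket_page.status_code, ticket_page.content)`. If the report shows `HTTP 200` for the failing ticket, the status-code guard let it through legitimately and the body is what needs checking.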
What am I missing here?

Tried an alternative:
```python
data = requests.get(searchURL, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
feed_data = data.content
d = feedparser.parse(feed_data)
tickets = []
for ticketNum in d['entries']:
    tickets.append(ticketNum['title'])

s = requests.Session()
s.get(ticketsBaseUrl, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
for ticket in tickets:
    ticket_page = s.get(ticketsBaseUrl + ticket, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
    if ticket_page.status_code == 404:
        print('ticket %s data 404, skipping' % ticket)
        continue
    etree = ET.fromstring(ticket_page.content)
```
This doesn't skip the last 404 either.

Tested smaller pieces of the code:
```python
if ticket_page.status_code == 404:
    print(str(ticket_page.status_code) + ' ' + ticket)
    continue
else:
    print(ET.fromstring(ticket_page.content))
```
Fails; it tries to parse the last 404 in the list.
```python
if ticket_page.status_code == 404:
    print(str(ticket_page.status_code) + ' ' + ticket)
    continue
else:
    print('continued')
```
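Works: prints the 3 404s and "continued" for all the others. (Technically that description isn't accurate; it actually processed everything else.)

Tried the opposite approach: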
```python
if ticket_page.status_code == 200:
    print(ET.fromstring(ticket_page.content))
else:
    print(str(ticket_page.status_code) + ' ' + ticket)
    continue
```

```python
if ticket_page.status_code != 200:
    print(str(ticket_page.status_code) + ' ' + ticket)
    continue
else:
    print(ET.fromstring(ticket_page.content))
```

```python
if ticket_page.status_code != 200:
    print(str(ticket_page.status_code) + ' ' + ticket)
    continue
print(ET.fromstring(ticket_page.content))
```
Same result. The final 404 still fails.
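For reference, these variants behave identically for 200 and 404 responses, because `continue` already ends the iteration and the `else` is redundant. They differ only for other status codes: the `== 404` check lets, say, a 401 or 500 body through to the parser, while the `!= 200` check does not. requests also has `Response.ok`, which is false for 4xx and 5xx statuses. A small sketch with hypothetical helper names that encode each guard as "does this status reach `ET.fromstring`?":

```python
def reaches_parser_eq_404(status_code):
    """`if status_code == 404: continue` lets every other status through."""
    return status_code != 404


def reaches_parser_ne_200(status_code):
    """`if status_code != 200: continue` lets only 200 through."""
    return status_code == 200


def reaches_parser_ok(status_code):
    """Mirrors requests' Response.ok: false for 4xx and 5xx statuses."""
    return not 400 <= status_code < 600


for code in (200, 302, 401, 404, 500):
    print(code, reaches_parser_eq_404(code), reaches_parser_ne_200(code), reaches_parser_ok(code))
```

Since the runs above only ever reported 200 and 404, all three guards agree on those responses, which is consistent with none of the rewrites changing the outcome.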
Even this:
```python
for ticket in tickets:
    ticket_page = s.get(ticketsBaseUrl + ticket, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
    if ticket_page.status_code != 200:
        tickets.pop()
```
leaves the 404 in the list.
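That snippet has a separate problem of its own: `list.pop()` with no argument removes the *last* element, not the current `ticket`, and shrinking a list while a `for` loop walks it also cuts the loop short. A small offline sketch with made-up ticket IDs and status codes (the helper names are hypothetical):

```python
def drop_with_pop(tickets, statuses):
    """Mimic the loop above: pop() on the list being iterated."""
    tickets = list(tickets)  # copy so the caller's list is untouched
    for ticket in tickets:
        if statuses[ticket] != 200:
            # pop() with no index removes the LAST element, not `ticket`,
            # and the shorter list also ends the loop early.
            tickets.pop()
    return tickets


def keep_ok(tickets, statuses):
    """Build a new list instead of mutating the one being iterated."""
    return [t for t in tickets if statuses[t] == 200]


# Hypothetical ticket IDs and status codes
ids = ['A', 'B', 'C', 'D']
statuses = {'A': 200, 'B': 404, 'C': 200, 'D': 200}
print(drop_with_pop(ids, statuses))  # ['A', 'B', 'C']: the 404 stays, a 200 is lost
print(keep_ok(ids, statuses))        # ['A', 'C', 'D']
```

Building a new list (or collecting failures in a separate list) avoids mutating the sequence the loop is iterating over.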
This is the XML that raises the parse error:
```
b'<?xml version="1.0" standalone="yes"?>\n\n404Not FoundThe server has not found anything matching the request URI: Ticket not found\n\n'
```
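Because the failing body is already known, the error can be reproduced without any network calls by handing the same bytes to `ET.fromstring`. As quoted, the body has an XML declaration followed by plain text and no element tags, so there is no root element and it is not well-formed XML. A minimal sketch using the bytes exactly as quoted (`parse_error_position` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

# The response body exactly as quoted above
body = (b'<?xml version="1.0" standalone="yes"?>\n\n404Not FoundThe server has not '
        b'found anything matching the request URI: Ticket not found\n\n')


def parse_error_position(content):
    """Return the (line, column) where parsing fails, or None if it parses."""
    try:
        ET.fromstring(content)
    except ET.ParseError as exc:
        return exc.position  # (line, column) tuple set by ElementTree
    return None


print(parse_error_position(body))
```

A reproduction like this makes it quick to try candidate fixes against the real payload before rerunning the full script.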
Latest test:
```python
if 'statusCode' in tree_root.decode():
    print(ticket)
    continue
```
This gives me the 3 expected tickets.
```python
if 'statusCode' in tree_root.decode():
    print(ticket)
    continue
etree = ET.fromstring(ticket_page.content.decode())
print(etree)
```
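This fails on the 3rd 404 ticket. Adding a delay, on the theory that the long run of 200s before the final 404 was the cause, did not change the result.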
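Combining the checks from this post, a defensive loop body could skip a ticket when the HTTP status is not 200, when the body contains the `statusCode` marker used in the last test, or when it still fails to parse. A minimal sketch, assuming a normal ticket document never contains the string `statusCode`; `parse_ticket` and `ERROR_MARKER` are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Assumption: only the server's error documents contain this tag name.
ERROR_MARKER = b'statusCode'


def parse_ticket(status_code, content):
    """Return the parsed root Element, or None if the ticket should be skipped."""
    if status_code != 200:
        return None  # HTTP-level error such as 404
    if ERROR_MARKER in content:
        return None  # body looks like an error document despite the 200
    try:
        return ET.fromstring(content)
    except ET.ParseError:
        return None  # any other body that is not well-formed XML


print(parse_ticket(404, b'<ticket/>'))
print(parse_ticket(200, b'<ticket><id>1</id></ticket>').tag)
```

In the loop, `root = parse_ticket(ticket_page.status_code, ticket_page.content)` followed by `if root is None: continue` would replace the separate checks. `ticket_page.content` is `bytes`, which is why the marker is a bytes literal and no `.decode()` is needed.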