User Agent Library

Wrapper function:

import random


def get_random_agent():
    """Return a User-Agent string chosen at random from a fixed list."""
    agent_list = [
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
        "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
        "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
        "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
        "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
        "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
        "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
        "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
        "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
    ]
    agent = random.choice(agent_list)
    return agent

if __name__ == '__main__':
    agent = get_random_agent()
    print(agent)
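
A randomly chosen string like this is typically sent as the `User-Agent` header of an HTTP request. The sketch below is one possible way to do that with the standard library. It is an illustration, not part of the original post: `build_request` and `SAMPLE_AGENTS` are hypothetical names, and the two sample strings are copied from the list above so the block runs on its own. The request is only constructed, never sent.

```python
import random
import urllib.request

# Stand-in pool (assumption): two strings copied from the list above.
# In practice you would call get_random_agent() from the module above.
SAMPLE_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
]


def build_request(url, agents=SAMPLE_AGENTS):
    """Hypothetical helper: build a Request carrying a random User-Agent."""
    agent = random.choice(agents)
    return urllib.request.Request(url, headers={"User-Agent": agent})


if __name__ == "__main__":
    req = build_request("https://example.com/")
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

Passing the pool as a parameter keeps the helper easy to test and lets the full list from `get_random_agent` be swapped in without changing the function.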