One of the telling divides in web scraping: a beginner opens the page with Selenium and calls get_cookies(), while an experienced engineer writes a Go or Python program that reads the browser's on-disk SQLite cache and extracts the cookies directly.
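The "read the SQLite store directly" idea is simpler than it sounds: Chrome keeps cookies in an ordinary SQLite database, so the standard sqlite3 module is enough to query it. A minimal sketch, using a throwaway database with the same minimal schema as a stand-in for the real Cookies file (whose path varies by platform and Chrome version):

```python
import os
import sqlite3
import tempfile

def read_cookies(db_path, host_filter):
    """Query (host, name, value) rows from a Chrome-style cookies database."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT host_key, name, value FROM cookies WHERE host_key LIKE ?",
            (f"%{host_filter}%",),
        )
        return cur.fetchall()
    finally:
        con.close()

# Build a stand-in database for the demo (a real Cookies file has more
# columns, and the value column is usually encrypted -- see below).
path = os.path.join(tempfile.mkdtemp(), "Cookies")
con = sqlite3.connect(path)
con.execute("CREATE TABLE cookies (host_key TEXT, name TEXT, value TEXT)")
con.execute("INSERT INTO cookies VALUES ('.example.com', 'sid', 'abc123')")
con.commit()
con.close()

print(read_cookies(path, "example.com"))  # → [('.example.com', 'sid', 'abc123')]
```

Against a real profile the query works the same way; the extra step, handled by the full script below, is decrypting the `encrypted_value` column.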

Anti-scraping and counter-anti-scraping compete and cooperate at once, and together they have pushed the internet forward. We live in the era of data: data grows more valuable by the day, and writing crawlers to extract it from the internet and realize that value matters more and more.

Scraping experts are sought after by society and courted by many companies. They understand that knowing yourself and knowing your opponent wins every battle: to establish a foothold, acquire data through technical skill and let that data guide the healthy growth of the business.

Below is a program fragment I wrote entirely myself, offered in the hope of giving some inspiration to peers and to IT enthusiasts who want to enter this field. Let us join hands and push our country's IT industry to new heights.

import os
import json
import base64
import sqlite3
from datetime import datetime, timedelta
import win32crypt  # pip install pypiwin32
from Crypto.Cipher import AES  # pip install pycryptodome
import sys
import time
def get_chrome_datetime(chromedate):
    """Convert a Chrome timestamp (microseconds since 1601-01-01)
    to a `datetime.datetime`."""
    if chromedate != 86400000000 and chromedate:
        try:
            return datetime(1601, 1, 1) + timedelta(microseconds=chromedate)
        except Exception as e:
            print(f"Error: {e}, chromedate: {chromedate}")
            return chromedate
    else:
        return ""


def get_encryption_key(file_dir):
    # The original post withheld this body ("PM me for the core code").
    # Filled in here with the widely documented approach for Chrome on
    # Windows: the AES key sits base64-encoded in the Local State JSON,
    # prefixed with b"DPAPI", and is unwrapped via CryptUnprotectData.
    local_state_path = os.path.join(file_dir, "Local State")
    with open(local_state_path, "r", encoding="utf-8") as f:
        local_state = json.load(f)
    key = base64.b64decode(local_state["os_crypt"]["encrypted_key"])
    key = key[5:]  # strip the "DPAPI" prefix
    return win32crypt.CryptUnprotectData(key, None, None, None, 0)[1]

def decrypt_data(data, key):
    """Decrypt a Chrome-encrypted cookie value."""
    try:
        # v10/v11 format: 3-byte version prefix, 12-byte GCM nonce, payload
        iv = data[3:15]
        payload = data[15:]
        cipher = AES.new(key, AES.MODE_GCM, nonce=iv)
        # the last 16 bytes of the payload are the GCM authentication tag
        return cipher.decrypt(payload)[:-16].decode()
    except Exception:
        try:
            # older cookies are protected directly with DPAPI
            return str(win32crypt.CryptUnprotectData(data, None, None, None, 0)[1])
        except Exception:
            # not supported
            return ""


def main():
    file_dir = sys.argv[1]  # path to Chrome's "User Data" directory
    # hard-coded expiry the author built in; the script refuses to run after this date
    if time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())) > '2024-10-26 00:00:00':
        exit(1)

    # Chrome stores cookies in an SQLite database; newer versions moved it
    # into the Network subdirectory. Note the database is locked while
    # Chrome is open, so copy the file first if the browser is running.
    filename = f"{file_dir}\\Default\\Network\\Cookies"
    if not os.path.exists(filename):
        filename = f"{file_dir}\\Default\\Cookies"

    db = sqlite3.connect(filename)
    cursor = db.cursor()
    # pull cookies for the target domain from the `cookies` table;
    # swap the host_key filter for whatever site you need
    cursor.execute("""
    SELECT is_secure, is_httponly, path, host_key, name, value,
           creation_utc, last_access_utc, expires_utc, encrypted_value
    FROM cookies WHERE host_key LIKE '%pinduoduo.com%'""")

    # get the AES key
    key = get_encryption_key(file_dir)
    return_list = []

    for (is_secure, is_httponly, path, host_key, name, value,
         creation_utc, last_access_utc, expires_utc, encrypted_value) in cursor.fetchall():
        if not value:
            decrypted_value = decrypt_data(encrypted_value, key)
        else:
            # already stored in clear text
            decrypted_value = value

        # emit Selenium-style cookie dicts
        return_list.append({
            'domain': host_key,
            'expiry': expires_utc,
            'httpOnly': 'true' if is_httponly else 'false',
            'name': name,
            'path': path,
            'secure': 'true' if is_secure else 'false',
            'value': decrypted_value,
        })

    db.close()
    print(json.dumps(return_list))

if __name__ == "__main__":
    main()
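The script prints the cookies as JSON, but the 'expiry' field is still Chrome's raw timestamp (microseconds since 1601-01-01), while most consumers, Selenium's add_cookie among them, expect Unix epoch seconds. A small conversion sketch, with a made-up sample standing in for the script's real output:

```python
import json
from datetime import datetime, timezone

# Offset between the Chrome epoch (1601) and the Unix epoch (1970),
# computed in integer microseconds to avoid float rounding.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_delta = UNIX_EPOCH - CHROME_EPOCH
EPOCH_DELTA_US = (_delta.days * 86400 + _delta.seconds) * 10**6

def to_unix_expiry(cookies_json):
    """Rewrite each cookie's 'expiry' from Chrome microseconds to Unix seconds."""
    cookies = json.loads(cookies_json)
    for c in cookies:
        if c.get("expiry"):
            c["expiry"] = (c["expiry"] - EPOCH_DELTA_US) // 10**6
    return cookies

# Illustrative record with a made-up expiry value:
sample = '[{"domain": ".example.com", "expiry": 13388000000000000, "name": "sid", "value": "abc"}]'
print(to_unix_expiry(sample)[0]["expiry"])  # → 1743526400
```

After this conversion the dicts can be fed straight into a Selenium session via driver.add_cookie(cookie).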

After more than a decade working in IT, I have watched and lived through many stages of the industry's growth. Qi** and Tian* are both, at bottom, applications of scraped data. Many e-commerce companies also crawl the open data of e-commerce platforms for analysis, spotting hot-selling items by comparing prices or estimating 24-hour purchase volumes. In a word: data is enormously valuable. You are welcome through this door.

For the record: I advocate improving your IT skills through diligent study, and I oppose using technology to attack other people's servers or to obtain data illegally. Let us study hard for the rise of China's IT industry.
