One of the telling divides in web scraping: a beginner opens the page with Selenium and calls get_cookies(), while an experienced engineer writes a Go or Python program that reads the browser's on-disk SQLite cache and extracts the cookies directly.
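The "read the SQLite store directly" idea is simpler than it sounds: Chrome keeps cookies in an ordinary SQLite database, so the standard sqlite3 module is enough to query it. A minimal sketch, using a throwaway database with the same minimal schema as a stand-in for the real Cookies file (whose path varies by platform and Chrome version):

```python
import os
import sqlite3
import tempfile

def read_cookies(db_path, host_filter):
    """Query (host, name, value) rows from a Chrome-style cookies database."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT host_key, name, value FROM cookies WHERE host_key LIKE ?",
            (f"%{host_filter}%",),
        )
        return cur.fetchall()
    finally:
        con.close()

# Build a stand-in database for the demo (a real Cookies file has more
# columns, and the value column is usually encrypted -- see below).
path = os.path.join(tempfile.mkdtemp(), "Cookies")
con = sqlite3.connect(path)
con.execute("CREATE TABLE cookies (host_key TEXT, name TEXT, value TEXT)")
con.execute("INSERT INTO cookies VALUES ('.example.com', 'sid', 'abc123')")
con.commit()
con.close()

print(read_cookies(path, "example.com"))  # → [('.example.com', 'sid', 'abc123')]
```

Against a real profile the query works the same way; the extra step, handled by the full script below, is decrypting the `encrypted_value` column.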

Anti-scraping and counter-anti-scraping compete and cooperate at once, and together they have pushed the internet forward. We live in the era of data: data grows more valuable by the day, and writing crawlers to extract it from the internet and realize that value matters more and more.

Scraping experts are sought after by society and courted by many companies. They understand that knowing yourself and knowing your opponent wins every battle: to establish a foothold, acquire data through technical skill and let that data guide the healthy growth of the business.

Below is a program fragment I wrote entirely myself, offered in the hope of giving some inspiration to peers and to IT enthusiasts who want to enter this field. Let us join hands and push our country's IT industry to new heights.

import os
import json
import base64
import sqlite3
from datetime import datetime, timedelta
import win32crypt  # pip install pypiwin32
from Crypto.Cipher import AES  # pip install pycryptodome
import sys
import time
def get_chrome_datetime(chromedate):
    """Convert a Chrome timestamp (microseconds since 1601-01-01)
    to a `datetime.datetime`."""
    if chromedate != 86400000000 and chromedate:
        try:
            return datetime(1601, 1, 1) + timedelta(microseconds=chromedate)
        except Exception as e:
            print(f"Error: {e}, chromedate: {chromedate}")
            return chromedate
    else:
        return ""


def get_encryption_key(file_dir):
    # The original post withheld this body ("PM me for the core code").
    # Filled in here with the widely documented approach for Chrome on
    # Windows: the AES key sits base64-encoded in the Local State JSON,
    # prefixed with b"DPAPI", and is unwrapped via CryptUnprotectData.
    local_state_path = os.path.join(file_dir, "Local State")
    with open(local_state_path, "r", encoding="utf-8") as f:
        local_state = json.load(f)
    key = base64.b64decode(local_state["os_crypt"]["encrypted_key"])
    key = key[5:]  # strip the "DPAPI" prefix
    return win32crypt.CryptUnprotectData(key, None, None, None, 0)[1]

def decrypt_data(data, key):
    """Decrypt a Chrome-encrypted cookie value."""
    try:
        # v10/v11 format: 3-byte version prefix, 12-byte GCM nonce, payload
        iv = data[3:15]
        payload = data[15:]
        cipher = AES.new(key, AES.MODE_GCM, nonce=iv)
        # the last 16 bytes of the payload are the GCM authentication tag
        return cipher.decrypt(payload)[:-16].decode()
    except Exception:
        try:
            # older cookies are protected directly with DPAPI
            return str(win32crypt.CryptUnprotectData(data, None, None, None, 0)[1])
        except Exception:
            # not supported
            return ""


def main():
    file_dir = sys.argv[1]  # path to Chrome's "User Data" directory
    # hard-coded expiry the author built in; the script refuses to run after this date
    if time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())) > '2024-10-26 00:00:00':
        exit(1)

    # Chrome stores cookies in an SQLite database; newer versions moved it
    # into the Network subdirectory. Note the database is locked while
    # Chrome is open, so copy the file first if the browser is running.
    filename = f"{file_dir}\\Default\\Network\\Cookies"
    if not os.path.exists(filename):
        filename = f"{file_dir}\\Default\\Cookies"

    db = sqlite3.connect(filename)
    cursor = db.cursor()
    # pull cookies for the target domain from the `cookies` table;
    # swap the host_key filter for whatever site you need
    cursor.execute("""
    SELECT is_secure, is_httponly, path, host_key, name, value,
           creation_utc, last_access_utc, expires_utc, encrypted_value
    FROM cookies WHERE host_key LIKE '%pinduoduo.com%'""")

    # get the AES key
    key = get_encryption_key(file_dir)
    return_list = []

    for (is_secure, is_httponly, path, host_key, name, value,
         creation_utc, last_access_utc, expires_utc, encrypted_value) in cursor.fetchall():
        if not value:
            decrypted_value = decrypt_data(encrypted_value, key)
        else:
            # already stored in clear text
            decrypted_value = value

        # emit Selenium-style cookie dicts
        return_list.append({
            'domain': host_key,
            'expiry': expires_utc,
            'httpOnly': 'true' if is_httponly else 'false',
            'name': name,
            'path': path,
            'secure': 'true' if is_secure else 'false',
            'value': decrypted_value,
        })

    db.close()
    print(json.dumps(return_list))

if __name__ == "__main__":
    main()
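The script prints the cookies as JSON, but the 'expiry' field is still Chrome's raw timestamp (microseconds since 1601-01-01), while most consumers, Selenium's add_cookie among them, expect Unix epoch seconds. A small conversion sketch, with a made-up sample standing in for the script's real output:

```python
import json
from datetime import datetime, timezone

# Offset between the Chrome epoch (1601) and the Unix epoch (1970),
# computed in integer microseconds to avoid float rounding.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_delta = UNIX_EPOCH - CHROME_EPOCH
EPOCH_DELTA_US = (_delta.days * 86400 + _delta.seconds) * 10**6

def to_unix_expiry(cookies_json):
    """Rewrite each cookie's 'expiry' from Chrome microseconds to Unix seconds."""
    cookies = json.loads(cookies_json)
    for c in cookies:
        if c.get("expiry"):
            c["expiry"] = (c["expiry"] - EPOCH_DELTA_US) // 10**6
    return cookies

# Illustrative record with a made-up expiry value:
sample = '[{"domain": ".example.com", "expiry": 13388000000000000, "name": "sid", "value": "abc"}]'
print(to_unix_expiry(sample)[0]["expiry"])  # → 1743526400
```

After this conversion the dicts can be fed straight into a Selenium session via driver.add_cookie(cookie).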

After more than a decade working in IT, I have watched and lived through many stages of the industry's growth. Qi** and Tian* are both, at bottom, applications of scraped data. Many e-commerce companies also crawl the open data of e-commerce platforms for analysis, spotting hot-selling items by comparing prices or estimating 24-hour purchase volumes. In a word: data is enormously valuable. You are welcome through this door.

For the record: I advocate improving your IT skills through diligent study, and I oppose using technology to attack other people's servers or to obtain data illegally. Let us study hard for the rise of China's IT industry.
