How to call the Python interpreter from C++ to execute Python code (Part 6)?

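This time the topic is security. What if the Python code contains something harmful? The usual technique is sandboxing. This article starts with some of the underlying infrastructure.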


Infrastructure: seccomp-bpf

seccomp is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a “secure” state where it cannot make any system calls except exit, sigreturn, read and write to already-open file descriptors.
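The quote describes seccomp's original strict mode (SECCOMP_MODE_STRICT). A minimal sketch to observe it, assuming Linux with glibc: strict_mode_demo is a hypothetical helper that forks a child, switches the child to strict mode, lets it write to a pipe that was opened beforehand, and then calls getppid(2), which strict mode does not permit, so the kernel kills the child with SIGKILL. Note that glibc's _exit() uses exit_group(2), which strict mode also rejects; only the raw exit syscall is allowed.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs a child in SECCOMP_MODE_STRICT.
 * The child writes msg into a pipe it opened before the switch (allowed),
 * then calls getppid(2) (not allowed), so the kernel kills it with SIGKILL.
 * Stores what the parent read in out and returns the child's terminating
 * signal number, or -1 on error. */
int strict_mode_demo(const char *msg, char *out, size_t outlen)
{
    int fds[2];
    if (outlen == 0 || pipe(fds) != 0)
        return -1;

    size_t len = strlen(msg);
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);                 /* still unrestricted at this point */
        write(fds[1], msg, len);      /* write to an already-open fd: permitted */
        syscall(SYS_getppid);         /* any other syscall: SIGKILL */
        syscall(SYS_exit, 0);         /* not reached; exit_group would be killed too */
    }

    close(fds[1]);
    ssize_t n = read(fds[0], out, outlen - 1);
    out[n > 0 ? n : 0] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Strict mode is too restrictive for an interpreter like CPython, which needs mmap, openat and many more calls; that limitation is what the filter mode below addresses.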

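Besides the original strict mode, Linux also offers Seccomp BPF (SECure COMPuting with filters). The motivation: the operating system exposes hundreds of system calls to user space, but most applications only need a small subset of them. Seccomp BPF provides a filter interface for describing which system calls an application is allowed to make. A filter is installed with: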

prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog);

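Here prog points to a struct sock_fprog that holds the filter program. Because the Berkeley Packet Filter (BPF) has been used for socket filtering for many years and is very expressive, the kernel designers chose the BPF format for syscall filters as well. Unless the caller has CAP_SYS_ADMIN, the kernel only accepts a filter after the process has set prctl(PR_SET_NO_NEW_PRIVS, 1).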
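To make the prog argument concrete, here is a hedged sketch. A seccomp filter reads fields of struct seccomp_data (defined in linux/seccomp.h): nr at offset 0 and arch at offset 4. Instead of killing the process, this filter returns SECCOMP_RET_ERRNO so that getppid(2) fails with EPERM while every other call is allowed. The helper names and the NATIVE_AUDIT_ARCH macro are my own, and only x86_64 and aarch64 are handled. Because a filter can never be removed, the demo installs it in a forked child.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical macro: the AUDIT_ARCH_* value of the build architecture */
#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "define NATIVE_AUDIT_ARCH for this architecture"
#endif

/* struct seccomp_data, the "packet" a seccomp filter reads:
 *   int   nr;                   offset 0:  syscall number
 *   __u32 arch;                 offset 4:  AUDIT_ARCH_* value
 *   __u64 instruction_pointer;  offset 8
 *   __u64 args[6];              offset 16: syscall arguments */

/* Makes getppid(2) fail with EPERM and allows every other syscall. Irreversible. */
static int install_deny_getppid(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),          /* foreign ABI */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)   /* needed without CAP_SYS_ADMIN */
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Installs the filter in a forked child and returns 0 if getppid()
 * then failed with EPERM there. */
int demo_deny_getppid(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_deny_getppid() != 0)
            _exit(2);
        errno = 0;
        long r = syscall(SYS_getppid);
        _exit(r == -1 && errno == EPERM ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Returning an errno instead of killing is often friendlier for interpreters: Python code sees an ordinary OSError rather than a dead process.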

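BPF is interesting in that it has its own instruction set for writing filter programs. Here is an example (taken from here) that forbids the execve system call. Syscall numbers are architecture-specific; this example assumes x86_64, where execve is 59: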

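#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

int main(void)
{
    struct sock_filter filter[] = {
        // Load seccomp_data.arch and kill the process unless it is x86_64, so a 32-bit (i386)
        // syscall, whose numbering differs, cannot slip past the check below
        BPF_STMT(BPF_LD + BPF_W + BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_KILL),
        // Load the 4 bytes at offset 0 of seccomp_data, i.e. the syscall number, into the accumulator
        BPF_STMT(BPF_LD + BPF_W + BPF_ABS, offsetof(struct seccomp_data, nr)),
        // If it is __NR_execve (59 on x86_64), fall through; otherwise skip the next instruction
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_execve, 0, 1),
        BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_KILL),   // return KILL
        BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),  // return ALLOW
    };

    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])), // number of instructions
        .filter = filter,                                             // pointer to the instruction array
    };

    // Set NO_NEW_PRIVS so an unprivileged process is allowed to install the filter
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_NO_NEW_PRIVS)");
        return 1;
    }
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) {
        perror("prctl(PR_SET_SECCOMP)");
        return 1;
    }
    write(STDOUT_FILENO, "test\n", 5);  // still allowed: prints "test"
    system("/bin/sh");                  // the shell never starts: the child's execve is killed
    return 0;
}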

Summary: with seccomp-bpf we can put customized restrictions on system calls and strike a balance between security and functionality.
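The execve example is a denylist: it blocks one call and allows everything else, so any dangerous syscall that is not on the list still gets through. For a sandbox that runs untrusted Python code, a default-deny allowlist is usually the safer end of that trade-off. A minimal sketch with hypothetical helper names: only read, write, exit, exit_group and rt_sigreturn pass, and any other call is answered with SECCOMP_RET_KILL. A real Python interpreter needs a much longer list (mmap, brk, openat, futex, and so on), which you would have to determine yourself, for example with strace.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical macro: the AUDIT_ARCH_* value of the build architecture */
#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "define NATIVE_AUDIT_ARCH for this architecture"
#endif

/* "if (nr == NR) return ALLOW;" as two BPF instructions */
#define ALLOW_SYSCALL(NR)                              \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (NR), 0, 1),   \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Hypothetical helper: default-deny policy. Only the listed syscalls pass;
 * anything else (or a foreign ABI) terminates the calling thread with SIGSYS. */
int install_allowlist(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        ALLOW_SYSCALL(__NR_read),
        ALLOW_SYSCALL(__NR_write),
        ALLOW_SYSCALL(__NR_exit),
        ALLOW_SYSCALL(__NR_exit_group),
        ALLOW_SYSCALL(__NR_rt_sigreturn),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),   /* default: deny */
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

The jump offsets work because a RET never changes the accumulator: each failed comparison skips its own RET and falls into the next comparison against the same syscall number.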

For a deeper discussion of seccomp-bpf, this article is excellent: https://xz.aliyun.com/t/11480

Infrastructure: Linux Namespaces

Namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources. The feature works by having the same namespace for a set of resources and processes, but those namespaces refer to distinct resources. Resources may exist in multiple spaces. Examples of such resources are process IDs, host-names, user IDs, file names, some names associated with network access, and Inter-process communication.
Namespaces are a fundamental aspect of containers in Linux.
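To see "one set of processes sees one set of resources" in action, here is a hedged sketch using the UTS namespace (the hostname). A forked child calls unshare(2) with CLONE_NEWUSER | CLONE_NEWUTS; the new user namespace is created first and owns the new UTS namespace, so the child may change its hostname there while the parent's hostname stays the same. Many environments forbid unprivileged user namespaces (container seccomp profiles, sysctls), so the hypothetical helper reports that case instead of failing.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if a child in new user+UTS namespaces could
 * set its hostname to newname while the parent's hostname stayed unchanged,
 * 0 if namespaces are not usable here, -1 on unexpected errors. */
int uts_namespace_demo(const char *newname)
{
    struct utsname before, after;
    if (uname(&before) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct utsname inside;
        /* The child gains CAP_SYS_ADMIN inside the new user namespace,
           which is enough to call sethostname() in the UTS namespace it owns. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWUTS) != 0)
            _exit(2);
        if (sethostname(newname, strlen(newname)) != 0)
            _exit(2);
        if (uname(&inside) != 0)
            _exit(3);
        _exit(strcmp(inside.nodename, newname) == 0 ? 0 : 3);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return 0;                            /* not permitted in this environment */
    if (WEXITSTATUS(status) != 0 || uname(&after) != 0)
        return -1;
    return strcmp(before.nodename, after.nodename) == 0 ? 1 : -1;
}
```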

This article has a summary that covers the namespace resource types fairly comprehensively:

For more on namespace concepts, see the Wikipedia article.
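For reference, the namespace types documented in the namespaces(7) man page, together with the clone(2)/unshare(2) flag that creates each one, can be written down as a small C table. Each process's memberships are visible as symbolic links under /proc/<pid>/ns/; two processes are in the same namespace when their links read the same. The struct and function names are illustrative, not from any library.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000   /* Linux 4.6+ */
#endif
#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080     /* Linux 5.6+ */
#endif

struct ns_kind {
    const char *proc_name;   /* link name under /proc/<pid>/ns/ */
    int clone_flag;          /* flag for clone(2)/unshare(2) */
    const char *isolates;    /* what the namespace partitions */
};

static const struct ns_kind NS_KINDS[] = {
    { "cgroup", CLONE_NEWCGROUP, "cgroup root directory" },
    { "ipc",    CLONE_NEWIPC,    "System V IPC, POSIX message queues" },
    { "mnt",    CLONE_NEWNS,     "mount points" },
    { "net",    CLONE_NEWNET,    "network devices, stacks, ports, etc." },
    { "pid",    CLONE_NEWPID,    "process IDs" },
    { "time",   CLONE_NEWTIME,   "boot and monotonic clocks" },
    { "user",   CLONE_NEWUSER,   "user and group IDs" },
    { "uts",    CLONE_NEWUTS,    "hostname and NIS domain name" },
};

/* Reads /proc/self/ns/<name> (of the form "uts:[<inode>]") into buf. */
int ns_id(const char *name, char *buf, size_t len)
{
    char path[64];
    if (len == 0)
        return -1;
    snprintf(path, sizeof path, "/proc/self/ns/%s", name);
    ssize_t n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Prints one line per namespace type this kernel exposes for the current process */
void print_namespaces(void)
{
    char id[64];
    for (size_t i = 0; i < sizeof NS_KINDS / sizeof NS_KINDS[0]; i++)
        if (ns_id(NS_KINDS[i].proc_name, id, sizeof id) == 0)
            printf("%-7s %-22s %s\n", NS_KINDS[i].proc_name, id, NS_KINDS[i].isolates);
}
```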

NSJail

Build dependencies

bison 3.0+
libnl3-devel.x86_64
protobuf-devel.x86_64

Important: the protobuf library version must match the protoc version, otherwise you will run into all sorts of link errors.
Build command: PATH=/.vos/.dep_cache/7d6d26725ac1e91bc824e1be337cf31e/bin/:/share/nsjail/bison/bin/:$PATH make -j

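On my system, clone(flags=CLONE_NEWUSER) is not supported yet, so I pass --disable_clone_newuser to drop that flag. According to the help text, this option requires euid==0, which is why the command below runs under sudo.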
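Before falling back to --disable_clone_newuser, it can help to check whether the current kernel and user can create a user namespace at all. A hedged sketch: probe_clone_newuser (a hypothetical helper) tries unshare(CLONE_NEWUSER) in a forked child, so the caller is never moved, and reports the errno it got; max_user_namespaces reads the user.max_user_namespaces sysctl (Linux 4.9 and later), where 0 means user namespaces are disabled.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Returns 0 if unshare(CLONE_NEWUSER) worked in a child,
 * otherwise the errno the child saw (e.g. EPERM, EINVAL, ENOSPC),
 * or -1 if the probe itself failed. */
int probe_clone_newuser(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(unshare(CLONE_NEWUSER) == 0 ? 0 : (errno & 0xff));
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Reads user.max_user_namespaces; 0 means user namespaces are disabled.
 * Returns -1 if the file is absent or unreadable. */
long max_user_namespaces(void)
{
    FILE *f = fopen("/proc/sys/user/max_user_namespaces", "r");
    long v = -1;
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}
```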

[xiaochu.yh ~/tools/nsjail] (master) $sudo LD_LIBRARY_PATH=/.vos/.dep_cache/7d6d26725ac1e91bc824e1be337cf31e/var/usr/local/gcc-5.2.0/lib64/ nsjail -Mr --chroot / -R /tmp/ --user 99999 --group 99999 --disable_clone_newuser  -- /bin/sh -i
[I][2023-03-07T21:56:30+0800] Mode: STANDALONE_RERUN
[I][2023-03-07T21:56:30+0800] Jail parameters: hostname:'NSJAIL', chroot:'/', process:'/bin/sh', bind:[::]:0, max_conns:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:false, clone_newns:true, clone_newpid:true, clone_newipc:true, clone_newuts:true, clone_newcgroup:true, clone_newtime:false, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[I][2023-03-07T21:56:30+0800] Mount: '/' -> '/' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2023-03-07T21:56:30+0800] Mount: '/proc' flags:MS_RDONLY type:'proc' options:'' dir:true
[I][2023-03-07T21:56:30+0800] Uid map: inside_uid:99999 outside_uid:0 count:1 newuidmap:false
[I][2023-03-07T21:56:30+0800] Gid map: inside_gid:99999 outside_gid:0 count:1 newgidmap:false
[W][2023-03-07T21:56:30+0800][1] initNs():223 prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL): Invalid argument
[I][2023-03-07T21:56:30+0800] Executing '/bin/sh' for '[STANDALONE MODE]'
sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
sh-4.2$ ls
bin  boot  data  dev  etc  home  lib  lib64  lost+found  media  mnt  ob  opt  proc  root  run  sbin  share  srv  sys  tmp  u01  usr  var

sh-4.2$ ps wuax
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
99999         1  0.1  0.0  13760  1744 ?        SNs  21:59   0:00 /bin/sh -i
99999         2  0.0  0.0  49556  1716 ?        RN   21:59   0:00 ps wuax

sh-4.2$ id
uid=99999 gid=99999 groups=99999

sh-4.2$ echo "abc" > /tmp/abc.txt
sh: /tmp/abc.txt: Read-only file system
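The session shows the PID namespace at work: ps sees only two processes, and /bin/sh is PID 1. A minimal sketch of the same effect, assuming unprivileged user namespaces are available: after unshare(CLONE_NEWUSER | CLONE_NEWPID), the caller itself stays in its old PID namespace, and the first child it forks becomes PID 1 of the new one. pid_namespace_demo is a hypothetical helper that returns 0 when namespaces cannot be created in the current environment.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if the first child forked after
 * unshare(CLONE_NEWUSER | CLONE_NEWPID) saw itself as PID 1,
 * 0 if namespaces are not usable here, -1 on unexpected errors. */
int pid_namespace_demo(void)
{
    pid_t outer = fork();          /* keep the caller's own namespaces untouched */
    if (outer < 0)
        return -1;
    if (outer == 0) {
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) != 0)
            _exit(2);
        /* unshare() does not move the caller; its next child is PID 1 in the new namespace */
        pid_t inner = fork();
        if (inner < 0)
            _exit(3);
        if (inner == 0)
            _exit(syscall(SYS_getpid) == 1 ? 0 : 3);   /* raw syscall, no libc pid cache */
        int st;
        if (waitpid(inner, &st, 0) < 0 || !WIFEXITED(st))
            _exit(3);
        _exit(WEXITSTATUS(st));
    }
    int status;
    if (waitpid(outer, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 2:  return 0;             /* not permitted in this environment */
    default: return -1;
    }
}
```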

nsjail's options are as follows:

Usage: nsjail [options] -- path_to_command [args]
 Options:
  --help|-h
        Help plz..
  --mode|-M VALUE
        Execution mode (default: 'o' [MODE_STANDALONE_ONCE]):
        l: Wait for connections on a TCP port (specified with --port) [MODE_LISTEN_TCP]
        o: Launch a single process on the console using clone/execve [MODE_STANDALONE_ONCE]
        e: Launch a single process on the console using execve [MODE_STANDALONE_EXECVE]
        r: Launch a single process on the console with clone/execve, keep doing it forever [MODE_STANDALONE_RERUN]
  --config|-C VALUE
        Configuration file in the config.proto ProtoBuf format (see configs/ directory for examples)
  --exec_file|-x VALUE
        File to exec (default: argv[0])
  --execute_fd
        Use execveat() to execute a file-descriptor instead of executing the binary path. In such case argv[0]/exec_file denotes a file path before mount namespacing
  --chroot|-c VALUE
        Directory containing / of the jail (default: none)
  --no_pivotroot
        When creating a mount namespace, use mount(MS_MOVE) and chroot rather than pivot_root. Usefull when pivot_root is disallowed (e.g. initramfs). Note: escapable is some configuration
  --rw
        Mount chroot dir (/) R/W (default: R/O)
  --user|-u VALUE
        Username/uid of processes inside the jail (default: your current uid). You can also use inside_ns_uid:outside_ns_uid:count convention here. Can be specified multiple times
  --group|-g VALUE
        Groupname/gid of processes inside the jail (default: your current gid). You can also use inside_ns_gid:global_ns_gid:count convention here. Can be specified multiple times
  --hostname|-H VALUE
        UTS name (hostname) of the jail (default: 'NSJAIL')
  --cwd|-D VALUE
        Directory in the namespace the process will run (default: '/')
  --port|-p VALUE
        TCP port to bind to (enables MODE_LISTEN_TCP) (default: 0)
  --bindhost VALUE
        IP address to bind the port to (only in [MODE_LISTEN_TCP]), (default: '::')
  --max_conns VALUE
        Maximum number of connections across all IPs (only in [MODE_LISTEN_TCP]), (default: 0 (unlimited))
  --max_conns_per_ip|-i VALUE
        Maximum number of connections per one IP (only in [MODE_LISTEN_TCP]), (default: 0 (unlimited))
  --log|-l VALUE
        Log file (default: use log_fd)
  --log_fd|-L VALUE
        Log FD (default: 2)
  --time_limit|-t VALUE
        Maximum time that a jail can exist, in seconds (default: 600)
  --max_cpus VALUE
        Maximum number of CPUs a single jailed process can use (default: 0 'no limit')
  --daemon|-d
        Daemonize after start
  --verbose|-v
        Verbose output
  --quiet|-q
        Log warning and more important messages only
  --really_quiet|-Q
        Log fatal messages only
  --keep_env|-e
        Pass all environment variables to the child process (default: all envars are cleared)
  --env|-E VALUE
        Additional environment variable (can be used multiple times). If the envar doesn't contain '=' (e.g. just the 'DISPLAY' string), the current envar value will be used
  --keep_caps
        Don't drop any capabilities
  --cap VALUE
        Retain this capability, e.g. CAP_PTRACE (can be specified multiple times)
  --silent
        Redirect child process' fd:0/1/2 to /dev/null
  --stderr_to_null
        Redirect child process' fd:2 (STDERR_FILENO) to /dev/null
  --skip_setsid
        Don't call setsid(), allows for terminal signal handling in the sandboxed process. Dangerous
  --pass_fd VALUE
        Don't close this FD before executing the child process (can be specified multiple times), by default: 0/1/2 are kept open
  --disable_no_new_privs
        Don't set the prctl(NO_NEW_PRIVS, 1) (DANGEROUS)
  --rlimit_as VALUE
        RLIMIT_AS in MB, 'max' or 'hard' for the current hard limit, 'def' or 'soft' for the current soft limit, 'inf' for RLIM64_INFINITY (default: 4096)
  --rlimit_core VALUE
        RLIMIT_CORE in MB, 'max' or 'hard' for the current hard limit, 'def' or 'soft' for the current soft limit, 'inf' for RLIM64_INFINITY (default: 0)
  --rlimit_cpu VALUE
        RLIMIT_CPU, 'max' or 'hard' for the current hard limit, 'def' or 'soft' for the current soft limit, 'inf' for RLIM64_INFINITY (default: 600)
  --rlimit_fsize VALUE
        RLIMIT_FSIZE in MB, 'max' or 'hard' for the current hard limit, 'def' or 'soft' for the current soft limit, 'inf' for RLIM64_INFINITY (default: 1)
  --rlimit_nofile VALUE
        RLIMIT_NOFILE, 'max' or 'hard' for the current hard limit, 'def' or 'soft' for the current soft limit, 'inf' for RLIM64_INFINITY (default: 32)
  --rlimit_nproc VALUE
        RLIMIT_NPROC, 'max' or 'hard' for the current hard limit, 'def' or 'soft' for the current soft limit, 'inf' for RLIM64_INFINITY (default: 'soft')
  --rlimit_stack VALUE
        RLIMIT_STACK in MB, 'max' or 'hard' for the current hard limit, 'def' or 'soft' for the current soft limit, 'inf' for RLIM64_INFINITY (default: 'soft')
  --rlimit_memlock VALUE
        RLIMIT_MEMLOCK in KB, 'max' or 'hard' for the current hard limit, 'def' or 'soft' for the current soft limit, 'inf' for RLIM64_INFINITY (default: 'soft')
  --rlimit_rtprio VALUE
        RLIMIT_RTPRIO, 'max' or 'hard' for the current hard limit, 'def' or 'soft' for the current soft limit, 'inf' for RLIM64_INFINITY (default: 'soft')
  --rlimit_msgqueue VALUE
        RLIMIT_MSGQUEUE in bytes, 'max' or 'hard' for the current hard limit, 'def' or 'soft' for the current soft limit, 'inf' for RLIM64_INFINITY (default: 'soft')
  --disable_rlimits
        Disable all rlimits, default to limits set by parent
  --persona_addr_compat_layout
        personality(ADDR_COMPAT_LAYOUT)
  --persona_mmap_page_zero
        personality(MMAP_PAGE_ZERO)
  --persona_read_implies_exec
        personality(READ_IMPLIES_EXEC)
  --persona_addr_limit_3gb
        personality(ADDR_LIMIT_3GB)
  --persona_addr_no_randomize
        personality(ADDR_NO_RANDOMIZE)
  --disable_clone_newnet|-N
        Don't use CLONE_NEWNET. Enable global networking inside the jail
  --disable_clone_newuser
        Don't use CLONE_NEWUSER. Requires euid==0
  --disable_clone_newns
        Don't use CLONE_NEWNS
  --disable_clone_newpid
        Don't use CLONE_NEWPID
  --disable_clone_newipc
        Don't use CLONE_NEWIPC
  --disable_clone_newuts
        Don't use CLONE_NEWUTS
  --disable_clone_newcgroup
        Don't use CLONE_NEWCGROUP. Might be required for kernel versions < 4.6
  --enable_clone_newtime
        Use CLONE_NEWTIME. Supported with kernel versions >= 5.3
  --uid_mapping|-U VALUE
        Add a custom uid mapping of the form inside_uid:outside_uid:count. Setting this requires newuidmap (set-uid) to be present
  --gid_mapping|-G VALUE
        Add a custom gid mapping of the form inside_gid:outside_gid:count. Setting this requires newgidmap (set-uid) to be present
  --bindmount_ro|-R VALUE
        List of mountpoints to be mounted --bind (ro) inside the container. Can be specified multiple times. Supports 'source' syntax, or 'source:dest'
  --bindmount|-B VALUE
        List of mountpoints to be mounted --bind (rw) inside the container. Can be specified multiple times. Supports 'source' syntax, or 'source:dest'
  --tmpfsmount|-T VALUE
        List of mountpoints to be mounted as tmpfs (R/W) inside the container. Can be specified multiple times. Supports 'dest' syntax. Alternatively, use '-m none:dest:tmpfs:size=8388608'
  --mount|-m VALUE
        Arbitrary mount, format src:dst:fs_type:options
  --symlink|-s VALUE
        Symlink, format src:dst
  --disable_proc
        Disable mounting procfs in the jail
  --proc_path VALUE
        Path used to mount procfs (default: '/proc')
  --proc_rw
        Is procfs mounted as R/W (default: R/O)
  --seccomp_policy|-P VALUE
        Path to file containing seccomp-bpf policy (see kafel/)
  --seccomp_string VALUE
        String with kafel seccomp-bpf policy (see kafel/)
  --seccomp_log
        Use SECCOMP_FILTER_FLAG_LOG. Log all actions except SECCOMP_RET_ALLOW). Supported since kernel version 4.14
  --nice_level VALUE
        Set jailed process niceness (-20 is highest -priority, 19 is lowest). By default, set to 19
  --cgroup_mem_max VALUE
        Maximum number of bytes to use in the group (default: '0' - disabled)
  --cgroup_mem_memsw_max VALUE
        Maximum number of memory+swap bytes to use (default: '0' - disabled)
  --cgroup_mem_swap_max VALUE
        Maximum number of swap bytes to use (default: '-1' - disabled)
  --cgroup_mem_mount VALUE
        Location of memory cgroup FS (default: '/sys/fs/cgroup/memory')
  --cgroup_mem_parent VALUE
        Which pre-existing memory cgroup to use as a parent (default: 'NSJAIL')
  --cgroup_pids_max VALUE
        Maximum number of pids in a cgroup (default: '0' - disabled)
  --cgroup_pids_mount VALUE
        Location of pids cgroup FS (default: '/sys/fs/cgroup/pids')
  --cgroup_pids_parent VALUE
        Which pre-existing pids cgroup to use as a parent (default: 'NSJAIL')
  --cgroup_net_cls_classid VALUE
        Class identifier of network packets in the group (default: '0' - disabled)
  --cgroup_net_cls_mount VALUE
        Location of net_cls cgroup FS (default: '/sys/fs/cgroup/net_cls')
  --cgroup_net_cls_parent VALUE
        Which pre-existing net_cls cgroup to use as a parent (default: 'NSJAIL')
  --cgroup_cpu_ms_per_sec VALUE
        Number of milliseconds of CPU time per second that the process group can use (default: '0' - no limit)
  --cgroup_cpu_mount VALUE
        Location of cpu cgroup FS (default: '/sys/fs/cgroup/cpu')
  --cgroup_cpu_parent VALUE
        Which pre-existing cpu cgroup to use as a parent (default: 'NSJAIL')
  --cgroupv2_mount VALUE
        Location of cgroupv2 directory (default: '/sys/fs/cgroup')
  --use_cgroupv2
        Use cgroup v2
  --detect_cgroupv2
        Use cgroupv2, if it is available. (Specify instead of use_cgroupv2)
  --iface_no_lo
        Don't bring the 'lo' interface up
  --iface_own VALUE
        Move this existing network interface into the new NET namespace. Can be specified multiple times
  --macvlan_iface|-I VALUE
        Interface which will be cloned (MACVLAN) and put inside the subprocess' namespace as 'vs'
  --macvlan_vs_ip VALUE
        IP of the 'vs' interface (e.g. "192.168.0.1")
  --macvlan_vs_nm VALUE
        Netmask of the 'vs' interface (e.g. "255.255.255.0")
  --macvlan_vs_gw VALUE
        Default GW for the 'vs' interface (e.g. "192.168.0.1")
  --macvlan_vs_ma VALUE
        MAC-address of the 'vs' interface (e.g. "ba:ad:ba:be:45:00")
  --macvlan_vs_mo VALUE
        Mode of the 'vs' interface. Can be either 'private', 'vepa', 'bridge' or 'passthru' (default: 'private')
  --disable_tsc
        Disable rdtsc and rdtscp instructions. WARNING: To make it effective, you also need to forbid `prctl(PR_SET_TSC, PR_TSC_ENABLE, ...)` in seccomp rules! (x86 and x86_64 only). Dynamic binaries produced by GCC seem to rely on RDTSC, but static ones should work.
  --forward_signals
        Forward fatal signals to the child process instead of always using SIKGILL.

 Examples:
  Wait on a port 31337 for connections, and run /bin/sh
   nsjail -Ml --port 31337 --chroot / -- /bin/sh -i
  Re-run echo command as a sub-process
   nsjail -Mr --chroot / -- /bin/echo "ABC"
  Run echo command once only, as a sub-process
   nsjail -Mo --chroot / -- /bin/echo "ABC"
  Execute echo command directly, without a supervising process
   nsjail -Me --chroot / --disable_proc -- /bin/echo "ABC"
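Since this series is about running Python code from C/C++, one straightforward way to use nsjail is to launch it as a child process. This is a hedged sketch, not the author's method: build_nsjail_argv and run_in_nsjail are hypothetical helpers; the flags come from the help text above and mirror the earlier session, while the python path, the 10-second --time_limit and the 512 MB --rlimit_as are arbitrary example values.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fills argv (NULL-terminated) with an nsjail command line
 * that runs `python -c code` once. Returns argc, or -1 if cap is too small. */
int build_nsjail_argv(const char **argv, size_t cap, const char *python, const char *code)
{
    const char *args[] = {
        "nsjail", "-Mo",                 /* MODE_STANDALONE_ONCE */
        "--chroot", "/",
        "--user", "99999", "--group", "99999",
        "--disable_clone_newuser",       /* as in the session above; requires euid 0 */
        "--time_limit", "10",            /* example value, seconds */
        "--rlimit_as", "512",            /* example value, MB */
        "--really_quiet",
        "--", python, "-c", code,
    };
    size_t n = sizeof args / sizeof args[0];
    if (cap < n + 1)
        return -1;
    for (size_t i = 0; i < n; i++)
        argv[i] = args[i];
    argv[n] = NULL;
    return (int)n;
}

/* Forks, execs nsjail from PATH, and returns the raw wait status (or -1). */
int run_in_nsjail(const char *python, const char *code)
{
    const char *argv[32];
    if (build_nsjail_argv(argv, sizeof argv / sizeof argv[0], python, code) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], (char *const *)argv);
        _exit(127);                      /* exec failed, e.g. nsjail not on PATH */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

For example, run_in_nsjail("/usr/bin/python3", "print(1 + 1)") would run the snippet as uid 99999 with a read-only view of /, inside new mount, PID, IPC, UTS, network and cgroup namespaces (the defaults shown in the session log). Because it passes --disable_clone_newuser, the caller must be root, as in the sudo session above.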