BGPStream: Quickly Process the BGP Data Streams You Need

Author: 代码不行的搬运工

Last modified: 2024-01-28 14:47:15


First published: 2024-01-28 10:00:00

Article link: https://blog.csdn.net/qq_41914036/article/details/135887726


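This article introduces BGPStream, an open-source software framework for live and historical BGP data analysis. It covers the three conceptual layers of the framework (data access, record extraction and packaging, and record processing), installation, BGPReader usage, data encoding, and a basic walkthrough of the Python API. BGPStream helps you investigate events efficiently and build monitoring applications.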


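Researchers who have worked on BGP for a long time are probably no strangers to BGPStream. BGPStream is an open-source software framework for live and historical BGP data analysis that supports scientific research, operational monitoring, and post-event analysis. It comes with powerful tools and APIs: you can quickly inspect raw BGP data from the command line, develop Python applications, or build complex systems with the C/C++ API. Give BGPStream a time range and it automatically fetches the right data and streams it to you; changing a single parameter switches it to live monitoring. The framework diagram shows three conceptual layers, from top to bottom: the record processing layer, the record extraction and packaging layer, and the data access layer.

Record processing layer: components that use libBGPStream to process BGP data, for example BGPReader and PyBGPStream.
Record extraction and packaging layer: implemented by libBGPStream; this is the core of the BGPStream framework.
Data access layer: components that provide access to the various BGP data sources.

Next, we briefly introduce each layer from the bottom up, together with its basic usage.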

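1. Data access layer

The data and metadata access layer provides information about the BGP data that is available. It also provides annotations for that data, such as the collection project and the router the data was obtained from. BGPStream offers four data interfaces for identifying the BGP data available for processing:

BGPStream Broker: the default data interface; it provides out-of-the-box, seamless access to public data providers.
Single File: a data interface that gives access to a single MRT dump file (local or over HTTP).
CSV File: a pseudo-database suited to using BGPStream with locally available (e.g., private) data files.
SQLite DB: similar to CSV File, but the metadata is stored in an SQLite database, which scales better.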

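1.1 BGPStream Broker

The BGPStream Broker is a web service that provides a unified query interface for retrieving data from different public data providers (such as Route Views and RIPE RIS) as well as from live stream resources such as RIS Live and the RouteViews BMP stream. The Broker interface supports several key BGPStream features:

Out-of-the-box access to Route Views and RIPE RIS data.
Load balancing across mirrors of the data archives.
Support for live data processing.

CAIDA runs a publicly accessible Broker instance, and libBGPStream is configured to query it by default. For information about using the Broker from third-party applications, see the query API documentation.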

1.2 Single-File

The Single-File interface lets the user access a single local or remote (over HTTP) RIB dump and/or updates dump, similar to how the bgpdump tool operates.

1.3 CSV File

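For users with a small number of local or private MRT dump files, the CSV File interface may be a good fit. Simply create a CSV file with one line per dump file, in the following format: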

<dump-path>,<project>,<bgp-type>,<collector>,<dump-ts>,<duration>,<insertion-ts>

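1.4 Example

The following line provides the metadata for a Route Views updates dump stored in a local directory. The dump was generated by the jinx collector and contains updates in the interval from 1427846400 to 1427846400 + 900 (i.e., 15 minutes of updates).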

routeviews.route-views.jinx.updates.1427846400.bz2,routeviews,updates,route-views.jinx,1427846400,900,1430438400
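
For a directory with many dumps, writing these lines by hand gets tedious. The following is a minimal sketch of a helper that formats one metadata line in the column order given above; the function name and the comma check are illustrative assumptions, not part of BGPStream.

```python
def csv_metadata_line(dump_path, project, bgp_type, collector,
                      dump_ts, duration, insertion_ts):
    """Format one metadata line for the CSV File data interface.

    Column order follows the format line above:
    <dump-path>,<project>,<bgp-type>,<collector>,<dump-ts>,<duration>,<insertion-ts>
    """
    fields = [dump_path, project, bgp_type, collector,
              str(int(dump_ts)), str(int(duration)), str(int(insertion_ts))]
    # Assumption: values must not contain commas, since the format is comma-separated.
    for value in fields:
        if "," in value:
            raise ValueError(f"field contains a comma: {value!r}")
    return ",".join(fields)


line = csv_metadata_line(
    "routeviews.route-views.jinx.updates.1427846400.bz2",
    "routeviews", "updates", "route-views.jinx",
    1427846400, 900, 1430438400,
)
print(line)
```

Calling the helper once per dump file and writing the results to a file gives a CSV that can then be passed to the csvfile data interface.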

1.5 SQLite

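The SQLite interface is similar to the CSV File interface, but it can be used with larger data sets and has better support for live mode (i.e., updating the database as new files become available). To make it easy to insert data into an SQLite database with the appropriate schema, BGPStream provides a script (tools/bgpstream_sqlite_mgmt.py) for adding new dump metadata to the database. Each invocation of bgpstream_sqlite_mgmt.py inserts the metadata for a new dump, and the script creates the database file if it does not exist.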

usage: bgpstream_sqlite_mgmt.py [-h] [-l] [-M ADD_MRT_FILE] [-p PROJ]
                                [-c COLL] [-t BGP_TYPE] [-T FILE_TIME]
                                [-u UPDATES_TIME_SPAN]
                                sqlite_db
positional arguments:
  sqlite_db             file containing the sqlite database

optional arguments:
  -h, --help            show this help message and exit
  -l, --list_files      list the mrt files in the database
  -M ADD_MRT_FILE, --add_mrt_file ADD_MRT_FILE
                        path to the mrt file to add to the database
  -p PROJ, --proj PROJ  bgp project
  -c COLL, --coll COLL  bgp collector
  -t BGP_TYPE, --bgp_type BGP_TYPE
                        bgp type
  -T FILE_TIME, --file_time FILE_TIME
                        time associated with the mrt file
  -u UPDATES_TIME_SPAN, --updates_time_span UPDATES_TIME_SPAN
                        updates time span

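2. Record extraction and packaging layer

The record extraction and packaging layer is implemented by libBGPStream, the core library of the framework. It provides the following features:

Transparent access to concurrent dumps from:
- multiple collectors
- different collector projects
- both RIB and Updates dump types
Live data processing
Data extraction, annotation, and error checking
Generation of a time-sorted stream of BGP measurement data
An API through which the user can specify and receive a stream

The libBGPStream user API provides the basic functions for configuring and consuming a sorted stream of BGP measurement data, and for systematically organizing BGP information into data structures.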
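
To make the idea of a time-sorted stream built from concurrent dumps concrete, here is a toy sketch that merges two already-sorted lists of records by timestamp. It only illustrates the concept; the record tuples are made up for the example, and this is not how libBGPStream is implemented internally.

```python
import heapq


def merge_time_sorted(*dumps):
    """Lazily merge iterables that are each sorted by timestamp.

    Every record is a (timestamp, collector, payload) tuple; the result
    is a single iterator ordered by timestamp.
    """
    return heapq.merge(*dumps, key=lambda record: record[0])


# Hypothetical records from two collectors, each list sorted by time.
rrc00 = [(100, "rrc00", "update-a"), (160, "rrc00", "update-b")]
sfmix = [(90, "route-views.sfmix", "update-c"),
         (130, "route-views.sfmix", "update-d")]

merged = list(merge_time_sorted(rrc00, sfmix))
for timestamp, collector, payload in merged:
    print(timestamp, collector, payload)
```

Because heapq.merge is lazy, the same pattern works when the inputs are generators that read large dump files incrementally.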

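3. Record processing layer

The record processing layer consists of all components that use the libBGPStream API. BGPStream is distributed with the following record processing components:

BGPReader: a command-line tool that outputs BGP data in ASCII format.
PyBGPStream: Python bindings to the libBGPStream API.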

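4. Installing BGPStream

The BGPStream framework consists of two packages: libBGPStream and PyBGPStream. libBGPStream contains the libBGPStream C library and the BGPReader tool. PyBGPStream is a Python package that binds to the libBGPStream library, so BGPStream can be used directly from Python scripts.

libBGPStream currently provides installation instructions for six operating systems.

Here we only walk through installation on Ubuntu/Debian. There are two ways to install libBGPStream on these systems: with apt, or from source. Feel free to try both. If your system supports installing directly with apt, congratulations: that is the most convenient and most reliable route. Installing from source may simply fail; at least, I ran into all sorts of problems when I built it by hand.

4.1 Installing with apt

First, installation with apt.

The steps are: add the wandio and CAIDA repositories (dependencies), install the BGPStream package, then install PyBGPStream. Just run the following commands one by one, in order.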

sudo apt-get update

sudo apt-get install -y curl apt-transport-https ssl-cert ca-certificates gnupg lsb-release 

curl -1sLf 'https://dl.cloudsmith.io/public/wand/libwandio/cfg/setup/bash.deb.sh' | sudo -E bash 

echo "deb https://pkg.caida.org/os/$(lsb_release -si|awk '{print tolower($0)}') $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/caida.list

sudo wget -O /etc/apt/trusted.gpg.d/caida.gpg https://pkg.caida.org/os/ubuntu/keyring.gpg 

sudo apt-get install bgpstream

sudo apt-get install python3-pybgpstream

PyBGPStream can also be installed directly with the following command:

pip install pybgpstream
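
After either installation route, a quick way to confirm that the Python binding is visible to your interpreter is to look the module up without importing it. This is a small sketch using only the standard library; the function name is made up for the example.

```python
import importlib.util


def pybgpstream_available():
    """Return True if the pybgpstream module can be found by this interpreter."""
    return importlib.util.find_spec("pybgpstream") is not None


print("pybgpstream importable:", pybgpstream_available())
```

If this prints False, the package was probably installed for a different Python interpreter than the one running the script.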

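4.2 Installing from source

If, unfortunately, your system cannot install the packages from the command line, your package sources do not include BGPStream and you have to build from the source tarballs. Let's first go through the steps as the website describes them. In my experience, though, the libBGPStream tarball from the official site is problematic (the Makefile is missing), so you will usually have to adjust things by hand. First, the original instructions:

Install the dependencies, download and install wandio, download and install BGPStream, install PyBGPStream, then test the installation. As with the previous method, run the commands in order.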

# Install the dependencies
$ sudo apt-get install build-essential curl zlib1g-dev libbz2-dev libcurl4-openssl-dev librdkafka-dev

# Download and install wandio
$ mkdir -p ~/src
$ cd ~/src/
$ curl -LO https://github.com/LibtraceTeam/wandio/archive/refs/tags/4.2.4-1.tar.gz
$ tar zxf 4.2.4-1.tar.gz
$ cd wandio-4.2.4-1/
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig

# Download and install BGPStream
$ cd ~/src/
$ curl -LO https://github.com/CAIDA/libbgpstream/releases/download/v2.2.0/libbgpstream-2.2.0.tar.gz
$ tar zxf libbgpstream-2.2.0.tar.gz
$ cd libbgpstream-2.2.0/
$ ./configure
$ make
$ make check
$ sudo make install
$ sudo ldconfig

# Install PyBGPStream
$ mkdir -p ~/src
$ cd ~/src/
$ curl -LO https://github.com/CAIDA/pybgpstream/releases/download/v2.0.2/pybgpstream-2.0.2.tar.gz
$ tar zxf pybgpstream-2.0.2.tar.gz
$ cd pybgpstream-2.0.2
$ python3 setup.py build_ext
$ sudo python3 setup.py install

# Test the installation
$ curl https://raw.githubusercontent.com/CAIDA/pybgpstream/master/examples/tutorial_print.py | python3

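4.3 Troubleshooting

1. The wandio source tree has no configure file.

This happens because the official guide omits one command that is needed for source archives fetched from GitHub: ./bootstrap.sh. Run it first and it generates the configure file. I strongly recommend installing all of the optional dependencies as well, namely: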

         * libpthread
         * zlib-dev
         * libbz2-dev
         * liblzma-dev
         * liblzo2-dev
         * liblz4-dev
         * libzstd-dev

2. configure.ac:65: error: AM_INIT_AUTOMAKE expanded multiple times.

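This error is raised while processing configure.ac: the AM_INIT_AUTOMAKE macro is expanded more than once, usually because the macro is invoked twice or pulled in through multiple includes. Check configure.ac and make sure AM_INIT_AUTOMAKE appears exactly once; delete or comment out any extra occurrence.

3. The libBGPStream directory has no configure file.

If you hit this, you most likely downloaded libBGPStream-master, which ships only the primary source files and no generated configure script. In that case, generate configure by hand as follows: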

aclocal
autoconf
autoreconf -vfi

4. error: Libtool library used but 'LIBTOOL' is undefined.

This error follows directly from the previous step: the libtool tool is not installed, so install it manually.

sudo apt-get install libtool

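5. error: could not find pthread_yield function.

If you see this error, you are probably building libBGPStream-2.2.0. That release's configure script still checks for pthread_yield, which newer versions have deprecated, so every occurrence of pthread_yield should be replaced with sched_yield. Check both configure and configure.ac carefully. If you generated configure by hand, regenerate it after editing configure.ac.

Finally, I provide a patched source package that you can use out of the box.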

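5. Using BGPReader

BGPReader is a command-line tool that prints to standard output information about the BGP records and BGP elems that are part of a BGP stream. Users familiar with the bgpdump tool should find BGPReader easy to use; BGPReader even supports the bgpdump output format (via the -m option), so in some situations it can serve as a drop-in replacement for bgpdump. In addition, because BGPReader provides seamless access to the Route Views and RIPE RIS data archives, users no longer have to fetch data manually: simply give BGPReader a time interval and BGPStream does the rest.

5.1 Usage

BGPReader prints helpful usage hints when you run bgpreader.

Usage: bgpreader [<options>]

The available options include: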

 -d, --data-interface  <interface>      use the given data interface to find available data. Available values are:
                       broker           Retrieve metadata information from the BGPStream Broker service (default)
                       singlefile       Read a single mrt data file (RIB and/or updates)
                       kafka            Read updates in real-time from an Apache Kafka topic
                       csvfile          Retrieve metadata information from a csv file
                       sqlite           Retrieve metadata information from an SQLite database
 -o, --data-interface-option
                       <option-name>=<option-value>*
                                        set an option for the current data interface. Use '-o?' to get a list of available options for the current data interface (as selected with -d)

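The default data interface is broker, which gives BGPReader out-of-the-box access to Route Views and RIPE RIS data. Data interface parameters can be set with the -o option.

5.2 Stream filtering options

For information about the available collectors and the time intervals they cover, see the data providers page.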

 -w, --time-window     <start>[,<end>]  process records within the given time window.  <start> and <end> may be in 'Y-m-d [H:M[:S]]' format (in UTC) or in unix epoch time.  Omitting <end> enables live mode.
 -f, --filter          <filterstring>   filter records and elements using the rules described in the given filter string
 -l, --live                             enable live mode (make blocking requests for BGP records); allows bgpstream to be used to process data in real-time

 -I, --interval        <num> <unit>     process records that were received the last <num> <unit>s of time, where <unit> is one of 's', 'm', 'h', 'd' (seconds, minutes, hours, days).
 -n, --count           <rec-cnt>        process at most <rec-cnt> records
 -p, --project         <project>        process records from only the given project (routeviews, ris)*
 -c, --collector       <collector>      process records from only the given collector*
 -R, --router          <router>         process records from only the given router*
 -t, --record-type     <type>           process records with only the given type (ribs, updates)*
 -T, --resource-type   <resource-type>  process records from only the given resource type (stream, batch)*
 -P, --rib-period      <period>         process a rib files every <period> seconds (bgp time)

 -j, --peer-asn        <peer ASN>       return elems received by a given peer ASN*
 -a, --origin-asn      <origin ASN>     return elems originated by a given origin ASN*
 -k, --prefix          <prefix>         return elems associated with a given prefix*
 -y, --community       <community>      return elems with the specified community* (format: asn:value. the '*' metacharacter is recognized)
 -A, --aspath          <regex>          return elems that match the aspath regex*
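
When bgpreader is driven from a script, it can help to assemble the argument list programmatically. The sketch below builds a command line from a few of the options listed above (-w, -p, -c, -t); the helper name and its parameters are assumptions made for illustration, and the sketch only prints the command rather than running it.

```python
import shlex


def build_bgpreader_cmd(start, end=None, project=None,
                        collectors=(), record_type=None):
    """Build an argv list for bgpreader from a few filtering options."""
    # Omitting <end> in the time window enables live mode.
    window = str(start) if end is None else f"{start},{end}"
    cmd = ["bgpreader", "-w", window]
    if project:
        cmd += ["-p", project]
    for collector in collectors:  # -c may be given multiple times
        cmd += ["-c", collector]
    if record_type:
        cmd += ["-t", record_type]
    return cmd


cmd = build_bgpreader_cmd(1445306400, 1445306402,
                          collectors=["route-views.sfmix"])
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd) on a machine where bgpreader is installed.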

5.3 Output format options

In the option lists above, * denotes an option that can be given multiple times.

 -e, --output-elems                     print info for each element of a BGP record (default)
 -m, --output-bgpdump                   print info for each BGP record in bgpdump -m format
 -r, --output-records                   print info for each BGP record (used mostly for debugging BGPStream)
 -i, --output-headers                   print format information before output

5.4 ASCII output formats

This section gives details about the following formats:

BGP Elem (default format) -e
BGPdump -m
BGP Record -r

BGP Elem format:

<dump-type>|<elem-type>|<record-ts>|<project>|<collector>|<router-name>|<router-ip>|<peer-ASn>|<peer-IP>|<prefix>|<next-hop-IP>|<AS-path>|<origin-AS>|<communities>|<old-state>|<new-state>

The dump-type field is either R (RIB) or U (Update). The elem-type field is one of R (RIB), A (announcement), W (withdrawal), or S (state message).

When the stream contains RIB data, BGPReader also emits RIB control messages that signal the beginning and end of a RIB. Control messages have the following format:

<dump-type>|<dump-pos>|<record-ts>|<project>|<collector>|<router-name>|<router-ip>|<record-status>|<dump-time>

Here the dump-type field is always set to R, and dump-pos is either B (begin) or E (end).

$ bgpreader -w 1445306400,1445306402 -c route-views.sfmix
R|B|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|R|1445306400.000000|routeviews|route-views.sfmix|||32354|206.197.187.5|1.0.0.0/24|206.197.187.5|32354 15169|15169|||
R|R|1445306400.000000|routeviews|route-views.sfmix|||14061|206.197.187.10|1.0.0.0/24|206.197.187.10|14061 15169|15169|||
...
R|R|1445306401.000000|routeviews|route-views.sfmix|||14061|2001:504:30::ba01:4061:1|3803:b600::/32|2001:504:30::ba01:4061:1|14061 2914 3549 27751|27751|2914:420 2914:1008 2914:2000 2914:3000||
R|E|1445306401.000000|routeviews|route-views.sfmix|||V|1445306400
U|A|1445306401.000000|routeviews|route-views.sfmix|||32354|2001:504:30::ba03:2354:1|2402:ef35::/32|2001:504:30::ba03:2354:1|32354 6939 6453 4755 7633|7633|||
U|A|1445306401.000000|routeviews|route-views.sfmix|||14061|2001:504:30::ba01:4061:1|2a02:158:200::/39|2001:504:30::ba01:4061:1|14061 2914 44946|44946|2914:410 2914:1201 2914:2202 2914:3200||
...
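
Since the elem format is a fixed list of pipe-separated fields, the output above is easy to post-process. The following is a minimal sketch of a parser for elem lines; the dictionary keys are names chosen here to mirror the format line, and lines with a different field count (such as the 9-field RIB control messages) are simply skipped.

```python
# Key names mirror the BGP Elem format line; they are chosen for this sketch.
ELEM_FIELDS = (
    "dump_type", "elem_type", "record_ts", "project", "collector",
    "router_name", "router_ip", "peer_asn", "peer_ip", "prefix",
    "next_hop_ip", "as_path", "origin_as", "communities",
    "old_state", "new_state",
)


def parse_elem_line(line):
    """Parse one bgpreader elem line into a dict; return None for other lines."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != len(ELEM_FIELDS):
        return None  # e.g. a RIB control message
    elem = dict(zip(ELEM_FIELDS, parts))
    elem["record_ts"] = float(elem["record_ts"])
    elem["peer_asn"] = int(elem["peer_asn"])
    # Communities are space-separated; an empty field becomes an empty list.
    elem["communities"] = elem["communities"].split()
    return elem


sample = ("R|R|1445306401.000000|routeviews|route-views.sfmix|||14061|"
          "2001:504:30::ba01:4061:1|3803:b600::/32|2001:504:30::ba01:4061:1|"
          "14061 2914 3549 27751|27751|"
          "2914:420 2914:1008 2914:2000 2914:3000||")
control = "R|B|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400"

elem = parse_elem_line(sample)
print(elem["prefix"], elem["origin_as"], elem["communities"])
print(parse_elem_line(control))
```

The AS path is left as a raw string on purpose, because a plain split on spaces would mishandle bracketed path segments.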

5.5 BGPdump output format (-m)

BGPReader supports bgpdump's one-line-per-entry output format with Unix timestamps.

$ bgpreader -w 1445306400,1445306402 -p ris -m
BGP4MP|1445306400|A|146.228.1.3|1836|212.22.66.0/24|1836 6939 12389 41938 8359 50618 35189 201432|IGP|146.228.1.3|0|0|1836:120 1836:3100 1836:3110|NAG||
BGP4MP|1445306400|A|146.228.1.3|1836|177.154.84.0/22|1836 12989 28640 262401 262401 262401 262401 262401 262401 262401 262401 262949|IGP|146.228.1.3|0|0|1836:120 1836:3100 1836:3110|NAG||
BGP4MP|1445306400|A|146.228.1.3|1836|209.212.8.0/24|1836 174 7922 33659 21669|IGP|146.228.1.3|0|0|1836:110 1836:6000 1836:6031|AG|21669 209.212.31.21|
BGP4MP|1445306400|W|12.0.1.63|7018|193.221.122.0/24
BGP4MP|1445306400|A|146.228.1.3|1836|177.154.84.0/22|1836 12989 52840 262949 262949 262949 262949 262949 262949 262949 262949 262949 262949 262949 262949|IGP|146.228.1.3|0|0|1836:120 1836:3100 1836:3110|NAG||
BGP4MP|1445306400|A|146.228.1.3|1836|212.22.66.0/24|1836 6939 12389 41938 8359 50618 35189 201432|IGP|146.228.1.3|0|0|1836:110 1836:3000 1836:3020|NAG||
BGP4MP|1445306400|A|213.200.87.254|3257|194.36.165.0/24|3257 3356 3356 24724 41930|IGP|213.200.87.254|0|10|3257:8093 3257:30049 3257:50002 3257:51100 3257:51102|NAG||
BGP4MP|1445306400|W|2001:504:1::a502:4482:1|24482|2a03:5080::/32
BGP4MP|1445306400|A|198.32.160.242|24482|208.74.216.0/21|24482 7029 40377|IGP|198.32.160.242|0|0|7029:260 7029:1001 7029:1002 24482:2 24482:13020 24482:13021 24482:65302|NAG||
BGP4MP|1445306400|W|198.32.160.242|24482|193.221.122.0/24
...

5.6 BGP Record output format (-r)

Note: the BGP Record format is mainly used for debugging BGPStream, because it contains low-level information about the validity of the records read from the dump files.

<dump-type>|<dump-pos>|<record-ts>|<project>|<collector>|<router-name>|<router-ip>|<record-status>|<dump-time>

The dump-type field is either R (RIB) or U (Update).

The dump-pos field is one of B (begin), M (middle), or E (end).

The status field is one of V (valid record), E (empty; signals an empty dump file), R (corrupted record), or S (corrupted source; the entire dump is corrupted).

$ bgpreader -w 1445306400,1445306402 -c route-views.sfmix -r
R|B|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|B|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
...

5.7 Alternative data interfaces

Single-File -d singlefile

Data interface options for 'singlefile':
   rib-file       rib mrt file to read (default: "not-set")
   rib-type       rib file type (mrt/bmp) (default: mrt)
   upd-file       updates mrt file to read (default: "not-set")
   upd-type       update file type (mrt/bmp/ris-live) (default: mrt)

CSV File -d csvfile

Data interface options for 'csvfile':
   csv-file       csv file listing the mrt data to read  (default: "not-set")

SQLite DB -d sqlite

Data interface options for 'sqlite':
   db-file        sqlite database (default: "not-set")

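6. Data encoding

The following are some of the data encoding practices adopted in the design and implementation of libBGPStream and PyBGPStream.

6.1 AS path

Each AS path consists of a list of AS path segments separated by spaces.
Each AS path segment is represented in one of the following string formats:

If the segment is a simple ASN (BGPSTREAM_AS_PATH_SEG_ASN), the string is the decimal representation of the ASN (not dotted-decimal).
If the segment is an AS Set (BGPSTREAM_AS_PATH_SEG_SET), the string is a comma-separated list of ASNs enclosed in braces, e.g., "{12345,6789}".
If the segment is an AS Confederation Set (BGPSTREAM_AS_PATH_SEG_CONFED_SET), the string is a comma-separated list of ASNs enclosed in square brackets, e.g., "[12345,6789]".
If the segment is an AS Confederation Sequence (BGPSTREAM_AS_PATH_SEG_CONFED_SEQ), the string is a space-separated list of ASNs enclosed in parentheses, e.g., "(12345 6789)".
If the segment is of an unknown type (this should not happen), the string is a space-separated list of ASNs enclosed in angle brackets, e.g., "<12345 6789>".

Note: a set or sequence may contain only a single element.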
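
Because confederation sequences and unknown segments contain spaces inside their brackets, splitting an AS path string on spaces alone can cut a segment in half. The following is a small sketch of a tokenizer that follows the string formats listed above; the segment kind labels ("asn", "set", and so on) are names chosen for this example, not BGPStream identifiers.

```python
import re

# One alternative per bracketed segment form, plus a bare ASN.
_SEGMENT_RE = re.compile(r"\{[^}]*\}|\[[^\]]*\]|\([^)]*\)|<[^>]*>|\d+")

# Opening delimiter -> segment kind (labels are illustrative).
_KINDS = {"{": "set", "[": "confed_set", "(": "confed_seq", "<": "unknown"}


def parse_as_path(path):
    """Split an AS path string into (kind, [asn, ...]) segments."""
    segments = []
    for token in _SEGMENT_RE.findall(path):
        kind = _KINDS.get(token[0], "asn")
        body = token if kind == "asn" else token[1:-1]
        asns = [int(a) for a in re.split(r"[,\s]+", body.strip()) if a]
        segments.append((kind, asns))
    return segments


segments = parse_as_path("701 {12345,6789} (65001 65002)")
for kind, asns in segments:
    print(kind, asns)
```

With the segments separated this way, code that needs the origin can inspect the last segment and decide explicitly how to treat a set.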

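6.2 Prefixes

For both IPv4 and IPv6, an IP prefix is written as NETWORK_ADDR/MASK. For example, you might see announcements from Google for the prefixes 8.8.8.0/24 or 2001:4860::/32.

6.3 Communities

The communities in a BGP announcement (see RFC 1997 and RFC 8642) are represented as a number of community segments separated by spaces. Each segment is written as ASN:VALUE, where ASN is the AS number of the AS that originally set the community and VALUE is the actual community value. Both ASN and VALUE are 16-bit numbers.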
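For example, the well-known NO_EXPORT community defined in RFC 1997 is written as 65535:65281; a route that carries it must not be advertised outside the AS that receives it, so the update only propagates inside that AS over iBGP.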
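
The ASN:VALUE encoding, together with the * wildcard that bgpreader's -y option recognizes, can be sketched in a few lines. The function names below are made up for this example, and the matching logic is only an illustration of the idea, not BGPStream's own implementation.

```python
def parse_community(text):
    """Parse 'ASN:VALUE' into a pair of 16-bit integers."""
    asn_text, value_text = text.split(":")
    asn, value = int(asn_text), int(value_text)
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError(f"community out of 16-bit range: {text!r}")
    return asn, value


def community_matches(pattern, community):
    """Match a community against a pattern such as '2914:*' or '*:3400'."""
    pattern_asn, pattern_value = pattern.split(":")
    asn, value = parse_community(community)
    asn_ok = pattern_asn == "*" or int(pattern_asn) == asn
    value_ok = pattern_value == "*" or int(pattern_value) == value
    return asn_ok and value_ok


communities = "2914:420 2914:1008 2914:2000 2914:3000".split()
print([c for c in communities if community_matches("*:3000", c)])
print(parse_community("65535:65281"))
```

An elem satisfies a community filter when at least one of its communities matches the pattern.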

6.4 Record type, position, and status

BGPStream has two record types, RIB (BGP RIB table dump) and UPDATE (BGP update), represented as R: BGP RIB dump and U: BGP update.

For BGP RIB dumps, the position within the dump is also indicated: B: start of dump, M: middle of dump, E: end of dump.

Each record has a status, one of V: valid record, F: filtered source, E: empty source, O: outside time interval, S: corrupted source, R: corrupted record, U: unsupported record.

In most cases you will see valid records (V) in the stream. For example, the bgpreader tutorial shows the following output:

$ bgpreader -w 1445306400,1445306402 -c route-views.sfmix -r
R|B|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|B|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400

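This command outputs all records collected by route-views.sfmix between 1445306400 and 1445306402. The first column shows the record type, here a RIB dump (R). The second column shows the position of the record within the resource; it goes from start (B) to middle (M). The second-to-last column shows the record status, which is valid (V) for every record in this example.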
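
The single-letter codes are compact but easy to misread, so a script that consumes -r output may want to expand them. Here is a minimal sketch that decodes one record line using the code tables from this section; the function and dictionary names are illustrative.

```python
RECORD_TYPES = {"R": "BGP RIB dump", "U": "BGP update"}
DUMP_POSITIONS = {"B": "start of dump", "M": "middle of dump", "E": "end of dump"}
RECORD_STATUS = {
    "V": "valid record", "F": "filtered source", "E": "empty source",
    "O": "outside time interval", "S": "corrupted source",
    "R": "corrupted record", "U": "unsupported record",
}


def describe_record(line):
    """Decode one 9-field record line as printed by bgpreader -r."""
    fields = line.strip().split("|")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    return {
        "type": RECORD_TYPES.get(fields[0], "unknown"),
        "position": DUMP_POSITIONS.get(fields[1], "unknown"),
        "timestamp": float(fields[2]),
        "project": fields[3],
        "collector": fields[4],
        "status": RECORD_STATUS.get(fields[7], "unknown"),
        "dump_time": int(fields[8]),
    }


record = describe_record(
    "R|B|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400"
)
print(record)
```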

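6.5 Elem types

Each record may contain multiple elems. For example, a single BGP update message may carry both announcements and withdrawals.
Each elem has one of the following types: R: RIB table entry, A: prefix announcement, W: prefix withdrawal, S: peer state change.

6.6 Peer states

A peer of a BGP route collector can be in one of the following states: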

IDLE
CONNECT
ACTIVE
OPENSENT
OPENCONFIRM
ESTABLISHED
CLEARING
DELETED

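For peer state updates, BGPStream reports both the new state and the old state.

6.7 Resource

Each resource used in BGPStream can be identified by the following unique string representation: PROJECT.COLLECTOR.RECORD_TYPE.INITIAL_TIME.DURATION.

PROJECT: the project of the resource (e.g., routeviews or ris)
COLLECTOR: the collector name (e.g., rrc02)
RECORD_TYPE: the type of records the resource contains, RIB or UPDATES
INITIAL_TIME: the start time of the resource, as an integer Unix time
DURATION: the duration of the data the resource contains, in seconds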
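
One subtlety of this representation is that collector names such as route-views.jinx contain dots themselves, so the string cannot be split naively into five parts. The sketch below parses it from both ends instead. It assumes that the project and record type never contain dots, and the example identifier is assembled here from the CSV example values rather than taken from real BGPStream output.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    project: str
    collector: str
    record_type: str
    initial_time: int
    duration: int


def parse_resource_id(text):
    """Parse PROJECT.COLLECTOR.RECORD_TYPE.INITIAL_TIME.DURATION."""
    parts = text.split(".")
    if len(parts) < 5:
        raise ValueError(f"not a resource identifier: {text!r}")
    project = parts[0]
    record_type, initial_time, duration = parts[-3:]
    # Whatever remains in the middle is the collector (it may contain dots).
    collector = ".".join(parts[1:-3])
    return Resource(project, collector, record_type,
                    int(initial_time), int(duration))


resource = parse_resource_id("routeviews.route-views.jinx.updates.1427846400.900")
print(resource)
```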

7. Python API (PyBGPStream)

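This article focuses only on the Python API. There is also a C/C++ API; interested readers can consult the tutorials on the official site.

PyBGPStream is the Python binding to the libBGPStream C API. It provides the same functionality as the C API (and most of its efficiency), with the flexibility of a Python module for rapid prototyping and integration. To simplify porting code from C to Python, PyBGPStream includes the _pybgpstream module, whose interface is almost identical to the libBGPStream API.

7.1 Basic demos

Getting familiar with the API

As a first example, we use pybgpstream to print information extracted from BGP records and BGP elems. We provide step-by-step explanations, plus a download link for the script at the end of this section. The example is fully functional and can be run with the following command: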

$ python pybgpstream-print.py
update|A|1499385779.000000|routeviews|route-views.eqix|None|None|11666|206.126.236.24|210.180.224.0/19|206.126.236.24|11666 3356 3786|11666:1000 3356:3 3356:2003 3356:575 3786:0 3356:22 11666:1002 3356:666 3356:86|None|None
update|A|1499385779.000000|routeviews|route-views.eqix|None|None|11666|206.126.236.24|210.180.0.0/19|206.126.236.24|11666 3356 3786|11666:1000 3356:3 3356:2003 3356:575 3786:0 3356:22 11666:1002 3356:666 3356:86|None|None
update|A|1499385788.000000|routeviews|route-views.eqix|None|None|11666|206.126.236.24|210.180.64.0/19|206.126.236.24|11666 6939 4766 4766|11666:2000 11666:2001|None|None
...

The code of pybgpstream-print.py is as follows:

#!/usr/bin/env python

import pybgpstream
stream = pybgpstream.BGPStream(
    from_time="2017-07-07 00:00:00", until_time="2017-07-07 00:10:00 UTC",
    collectors=["route-views.sg", "route-views.eqix"],
    record_type="updates",
    filter="peer 11666 and prefix more 210.180.0.0/16"
)

for elem in stream:
    # record fields can be accessed directly from elem
    # e.g. elem.time
    # or via elem.record
    # e.g. elem.record.time
    print(elem)

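The first step in every pybgpstream script is to import the Python module and create a new BGPStream instance. While creating the instance, we also add a few filters to narrow the stream:

from_time specifies the start of the stream.
until_time specifies the end of the stream.
collectors narrows the stream to records from the specified collectors.
record_type="updates" indicates that we only want updates (i.e., no RIB dumps).
Finally, the filter string specifies more flexible filtering conditions.

At this point the stream can be started and new BGP elems requested repeatedly. Each time a valid record is read, the elems it contains are extracted and the record and elem fields are printed. If a record is invalid, no attempt is made to extract elems from it.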
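
Printing every elem is rarely the end goal; a common next step is to aggregate. The sketch below counts elems per collector and elem type. It accepts any iterable of objects with collector and type attributes, so it runs here on stand-in objects without network access; the function name is made up, and with pybgpstream installed the stream object from the script above could be passed in instead.

```python
from collections import Counter
from types import SimpleNamespace


def count_elem_types(elems):
    """Count elems per (collector, elem type)."""
    counts = Counter()
    for elem in elems:
        counts[(elem.collector, elem.type)] += 1
    return counts


# Stand-in elems for illustration; real ones come from iterating a BGPStream.
fake_elems = [
    SimpleNamespace(collector="route-views.eqix", type="A"),
    SimpleNamespace(collector="route-views.eqix", type="A"),
    SimpleNamespace(collector="route-views.eqix", type="W"),
    SimpleNamespace(collector="route-views.sg", type="A"),
]

counts = count_elem_types(fake_elems)
for (collector, elem_type), n in sorted(counts.items()):
    print(collector, elem_type, n)
```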

Printing MOAS prefixes

#!/usr/bin/env python

from collections import defaultdict
import pybgpstream

stream = pybgpstream.BGPStream(
    # Consider this time interval:
    # Sat, 01 Aug 2015 7:50:00 GMT -  08:10:00 GMT
    from_time="2015-08-01 07:50:00", until_time="2015-08-01 08:10:00",
    collectors=["rrc00"],
    record_type="ribs",
)

# <prefix, origin-ASns-set > dictionary
prefix_origin = defaultdict(set)

for rec in stream.records():
    for elem in rec:
        # Get the prefix
        pfx = elem.fields["prefix"]
        # Get the list of ASes in the AS path
        ases = elem.fields["as-path"].split(" ")
        if len(ases) > 0:
            # Get the origin ASn (rightmost)
            origin = ases[-1]
            # Insert the origin ASn in the set of
            # origins for the prefix
            prefix_origin[pfx].add(origin)

# Print the list of MOAS prefix and their origin ASns
for pfx in prefix_origin:
    if len(prefix_origin[pfx]) > 1:
        print((pfx, ",".join(prefix_origin[pfx])))

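In this second tutorial, we show how to use pybgpstream to output MOAS (multiple origin AS) prefixes and their origin ASes.

The program reads the BGP records that match the filters (collector, record type, and time), keeps the set of unique origin ASNs for each prefix in a hash map, and outputs the prefixes that have more than one origin ASN. Here the stream is configured to return the BGP records read from the RIB dumps generated by the RIPE RIS rrc00 collector, with timestamps in the interval 07:50:00 - 08:10:00 GMT on Sat, 01 Aug 2015. A dictionary associates each prefix observed in the RIB dumps with its set of origin ASNs. Each time a new BGP elem is extracted, the program reads its prefix and origin ASN and updates the prefix_origin dictionary. The prefix and the AS path are string fields present in every elem of type RIB. The split function converts the AS path string into an array of strings, each representing one AS hop; the last hop is the origin AS.

Measuring the extent of AS path inflation

In this example, we show how to use pybgpstream to measure the extent of AS path inflation, i.e., how many AS paths are longer than the shortest path between two ASes because of routing policies.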

import pybgpstream
import networkx as nx
from collections import defaultdict
from itertools import groupby

# Create an instance of a simple undirected graph
as_graph = nx.Graph()

bgp_lens = defaultdict(lambda: defaultdict(lambda: None))

stream = pybgpstream.BGPStream(
    # Consider this time interval:
    # Sat, 01 Aug 2015 7:50:00 GMT -  08:10:00 GMT
    from_time="2015-08-01 07:50:00", until_time="2015-08-01 08:10:00",
    collectors=["rrc00"],
    record_type="ribs",
)

for rec in stream.records():
    for elem in rec:
        # Get the peer ASn
        peer = str(elem.peer_asn)
        # Get the array of ASns in the AS path and remove repeatedly prepended ASns
        hops = [k for k, g in groupby(elem.fields['as-path'].split(" "))]
        if len(hops) > 1 and hops[0] == peer:
            # Get the origin ASn
            origin = hops[-1]
            # Add new edges to the NetworkX graph
            for i in range(0,len(hops)-1):
                as_graph.add_edge(hops[i],hops[i+1])
            # Update the AS path length between 'peer' and 'origin'
            bgp_lens[peer][origin] = \
                min(list(filter(bool,[bgp_lens[peer][origin],len(hops)])))

# For each 'peer' and 'origin' pair
for peer in bgp_lens:
    for origin in bgp_lens[peer]:
        # compute the shortest path in the NetworkX graph
        nxlen = len(nx.shortest_path(as_graph, peer, origin))
        # and compare it to the BGP hop length
        print((peer, origin, bgp_lens[peer][origin], nxlen))

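This program reads the RIB dumps generated by the RIS RRC00 collector, computes the number of AS hops between the peer ASN and the origin AS, and compares it with the shortest path between the same AS pair in a simple undirected graph built from the AS path adjacencies.

In this example, the stream is configured to return the BGP records read from the RIB dumps generated by the RIS RRC00 collector, with timestamps in the interval 07:50:00 - 08:10:00 GMT on Sat, 01 Aug 2015. The script uses the NetworkX package to build a simple undirected graph (i.e., a graph with no self-loops or parallel edges). A dictionary of dictionaries maintains the shortest path observed in BGP between each peer ASN and origin ASN.

Each time a new BGP elem is extracted, the program removes repeatedly prepended ASNs from the AS path (using the groupby function), computes the number of AS hops between the peer AS and the destination AS (i.e., the origin AS), and saves it in the bgp_lens dictionary. Each adjacency in the reduced AS path adds a new link to the NetworkX graph.

Finally, for each peer and origin pair, the script uses NetworkX to compute the length of the shortest path between the two nodes in the simple undirected graph. The output lists the shortest path length observed in BGP side by side with the shortest path length computed on the graph.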
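
To see the comparison on a case small enough to check by hand, here is a self-contained toy version of the same idea. It replaces NetworkX with a plain breadth-first search and works on three made-up AS paths instead of a RIB dump, treating the first AS in each path as the peer; all names and data are illustrative.

```python
from collections import defaultdict, deque
from itertools import groupby


def dedup_prepending(as_path):
    """Collapse repeatedly prepended ASNs, e.g. '1 2 2 3' -> ['1', '2', '3']."""
    return [asn for asn, _ in groupby(as_path.split())]


def shortest_path_len(graph, src, dst):
    """Number of ASes on a shortest path from src to dst (BFS), or None."""
    seen = {src}
    queue = deque([(src, 1)])
    while queue:
        node, length = queue.popleft()
        if node == dst:
            return length
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, length + 1))
    return None


def path_inflation(as_paths):
    """Map (peer, origin) -> (shortest BGP path length, graph path length)."""
    graph = defaultdict(set)
    bgp_len = {}
    for path in as_paths:
        hops = dedup_prepending(path)
        if len(hops) < 2:
            continue
        for a, b in zip(hops, hops[1:]):
            graph[a].add(b)
            graph[b].add(a)
        key = (hops[0], hops[-1])
        bgp_len[key] = min(bgp_len.get(key, len(hops)), len(hops))
    return {key: (length, shortest_path_len(graph, *key))
            for key, length in bgp_len.items()}


# Made-up AS paths: AS1 reaches AS4 via 2 and 3, although 1-5-4 exists.
result = path_inflation(["1 2 2 2 3 4", "5 4", "5 1"])
for (peer, origin), (bgp, graph_len) in sorted(result.items()):
    print(peer, origin, bgp, graph_len)
```

In this toy input the pair (1, 4) is inflated: BGP shows a path of four ASes, while the graph contains a path of three.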

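Studying communities

In this example, we show how to use pybgpstream to extract information about prefixes associated with a particular kind of community. Specifically, we use the BGPStream filtering options to select a specific set of prefixes, a specific peer ASN, and elems that carry at least one community whose value is 3400.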

#!/usr/bin/env python

import pybgpstream
from collections import defaultdict

stream = pybgpstream.BGPStream(
    # Consider this time interval:
    # Sat, 01 Aug 2015 7:50:00 GMT -  08:10:00 GMT
    from_time="2015-08-01 07:50:00", until_time="2015-08-01 08:10:00",
    collectors=["rrc06"],
    record_type="ribs",
    filter="peer 25152 and prefix more 185.84.166.0/23 and community *:3400"
)

# <community, prefix > dictionary
community_prefix = defaultdict(set)

# Get next record
for rec in stream.records():
    for elem in rec:
        # Get the prefix
        pfx = elem.fields['prefix']
        # Get the associated communities
        communities = elem.fields['communities']
        # for each community save the set of prefixes
        # that are affected
        for c in communities:
            community_prefix[c].add(pfx)

# Print each community and the prefixes tagged with it
for ct in community_prefix:
    print("Community:", ct, "==>", ",".join(community_prefix[ct]))

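This program reads the RIB dumps generated by the RIS RRC06 collector and selects the elems received from peer AS 25152 that concern 185.84.166.0/23 (or any more specific prefix) and carry at least one community whose value is 3400.

In this example, the stream is configured to return the BGP records read from the RIB dumps generated by the RIS RRC06 collector, with timestamps in the interval 07:50:00 - 08:10:00 GMT on Sat, 01 Aug 2015. Three conditions are taken into account when filtering elems: the peer AS number, the announced prefix, and the presence of at least one community whose value is 3400.

A dictionary of sets maintains the prefixes affected by each community. Each time a new BGP elem is extracted, the program takes each community string (ASN and value) and adds the prefix to the set for that community. Finally, the dictionary is written to standard output.

Accessing live stream sources

In this example, we show how to use pybgpstream to access live data streams from Route Views and RIPE RIS. The example programs print the live BGP update streams received from Route Views BMP (routeviews-stream) and RIPE RIS Live (ris-live). Accessing these live stream sources is simple: just set the project or projects field to ris-live or routeviews-stream when creating the BGPStream object in your script.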

RIPE RIS Live

#!/usr/bin/env python

import pybgpstream
stream = pybgpstream.BGPStream(
    # accessing ris-live
    project="ris-live",
    # filter to show only stream from rrc00
    filter="collector rrc00",
)

for elem in stream:
    print(elem)

Route Views Stream

#!/usr/bin/env python

import pybgpstream
stream = pybgpstream.BGPStream(
    # accessing routeview-stream
    project="routeviews-stream",
    # filter to show only stream from amsix bmp stream
    filter="router amsix",
)

for elem in stream:
    print(elem)
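
A live stream has no end, so the loops above run until they are interrupted. When only a sample is needed, one simple option is to cap the number of elems read. The sketch below shows the idea with itertools.islice on a stand-in generator; the helper name and the generator are made up for illustration, and the same call should work on any iterable stream object.

```python
from itertools import islice


def take(stream, limit):
    """Return at most `limit` items from a possibly endless iterable."""
    return list(islice(stream, limit))


def endless_updates():
    """Stand-in for a live stream: yields made-up items forever."""
    n = 0
    while True:
        yield f"update-{n}"
        n += 1


first_three = take(endless_updates(), 3)
print(first_three)
```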

7.2 pybgpstream

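This section describes the API of the pybgpstream module, a high-level interface to the C libbgpstream library that offers a friendlier way of interacting with the low-level _pybgpstream API.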

import pybgpstream

# create and configure the stream
stream = pybgpstream.BGPStream(
   from_time="2017-07-07 00:00:00", until_time="2017-07-07 00:10:00 UTC",
   collectors=["route-views.sg", "route-views.eqix"],
   record_type="updates",
   filter="peer 11666 and prefix more 210.180.0.0/16"
)

# add any additional (or dynamic) filters
# e.g. from peer AS 11666 regarding the more-specifics of 210.180.0.0/16:
# stream.parse_filter_string("peer 11666 and prefix more 210.180.0.0/16")
# or using the old filter interface:
# stream.add_filter("peer-asn", "11666")
# stream.add_filter("prefix-more", "210.180.0.0/16")

# read elems
for elem in stream:
   # record fields can be accessed directly from elem
   # e.g. elem.time
   # or via elem.record
   # e.g. elem.record.time
   print(elem)

# alternatively, records and elems can be read in nested loops:
for rec in stream.records():
   # do something with rec (e.g., choose to continue based on timestamp)
   print("Received %s record at time %d from collector %s" % (rec.type, rec.time, rec.collector))
   for elem in rec:
      # do something with rec and/or elem
      print("  Elem Type: %s" % elem.type)

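The pybgpstream module contains three main classes: pybgpstream.BGPStream, pybgpstream.BGPRecord, and pybgpstream.BGPElem. Together they expose a single stream of BGP records.

class pybgpstream.BGPRecord wraps the low-level _pybgpstream.BGPRecord; all of its attributes are read-only.
class pybgpstream.BGPElem wraps the low-level _pybgpstream.BGPElem; all of its attributes are read-only.

class `pybgpstream.BGPStream`

The BGPStream class provides a single stream of BGP records. It mainly acts as the entry point: pass in the relevant parameters to obtain a stream, then iterate over the stream object to read its entries one at a time.

Input parameters of class pybgpstream.BGPStream
Parameter	Description
from_time	Start time of the stream. The time string is parsed with dateutil.parser.parse.
until_time	End time of the stream. The time string is parsed with dateutil.parser.parse.
data_interface	Data interface that BGPStream should use to retrieve and process data.
project	Name of the project to retrieve data from.
projects	Names of multiple projects to retrieve data from.
collector	Name of the collector to retrieve data from.
collectors	Names of multiple collectors to retrieve data from.
record_type	Record type to process: updates or ribs.
record_types	Record types to process: updates and/or ribs.
filter	Filter string.
`records`()	Method that returns a stream of record objects.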

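Setting the filter field

Adds filters to an unstarted BGP stream instance according to the given filter string. Only records/elems that match the filters are included in the stream.
If multiple filters of the same type are added, a record/elem matches if it matches any one of them. For example, with the filter strings project routeviews and project ris, records from either the Route Views or the RIS project are included.
If filters of different types are added, a record/elem matches only if it matches all of them. For example, with the filter strings project routeviews and prefix exact 1.2.3.0/24, only records that come from the routeviews project and concern the prefix 1.2.3.0/24 are included.
The project, collector, and record-type filters apply to BGP records, whereas peer-asn, prefix-exact, prefix-more, prefix-less, prefix-any, aspath, ipversion, elemtype, and community apply to BGP elems.
The prefix-* filters select BGP elems related to a prefix.
prefix-exact matches only the exact prefix as it appears in the elem.
prefix-more matches the exact prefix or any more specific prefix observed.
prefix-less matches the exact prefix or any less specific prefix observed.
The aspath filter is given as a regular expression and matches if the AS path matches the expression.
^ denotes the start of the AS path, $ denotes its end, and _ separates adjacent ASNs in the path. For example, with the filter value ^681_1444_, only elems whose AS path starts with AS681 followed by AS1444 are included.
The ipversion filter restricts the stream to IPv4 or IPv6 prefixes: use 4 for IPv4 only and 6 for IPv6 only.
The elemtype filter restricts the stream to certain elem types. The possible types are ribs, withdrawals, announcements, and peerstates.
The community filter is given as a string in asn:value format. The user may specify the ASN, the value, or both, using * for a field left unspecified. For example, with ('community','*:300'), all BGP elems that carry at least one community whose value is 300 are included.
A time interval filter can also be added to an unstarted BGP stream instance. Only records that fall within the given interval are included in the stream.
Setting the stop parameter to 0 enables live mode and effectively sets an endless interval.
If multiple interval filters are added, a record is included if it falls within any of them.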
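
The difference between the prefix-exact, prefix-more, and prefix-less filters described above is easiest to see on concrete prefixes. The sketch below reproduces those semantics with the standard ipaddress module. It is an illustration, not BGPStream's matching code, and the "any" mode (matching either a more or a less specific prefix) is an assumption about prefix-any, which the list above names but does not define.

```python
import ipaddress


def prefix_matches(mode, filter_prefix, observed_prefix):
    """Check an observed prefix against a prefix filter.

    mode is one of 'exact', 'more', 'less', 'any'.
    """
    flt = ipaddress.ip_network(filter_prefix)
    obs = ipaddress.ip_network(observed_prefix)
    if flt.version != obs.version:
        return False  # an IPv4 filter never matches an IPv6 prefix
    if mode == "exact":
        return obs == flt
    if mode == "more":   # exact or more specific
        return obs.subnet_of(flt)
    if mode == "less":   # exact or less specific
        return obs.supernet_of(flt)
    if mode == "any":    # assumption: either direction
        return obs.subnet_of(flt) or obs.supernet_of(flt)
    raise ValueError(f"unknown mode: {mode!r}")


for observed in ("210.180.224.0/19", "210.180.0.0/16", "210.0.0.0/8"):
    print(observed,
          prefix_matches("exact", "210.180.0.0/16", observed),
          prefix_matches("more", "210.180.0.0/16", observed),
          prefix_matches("less", "210.180.0.0/16", observed))
```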

The stream object produced by BGPStream can be read field by field directly; the readable fields are the same as those of BGPRecord and BGPElem.

class `pybgpstream.BGPRecord`

The BGPRecord class represents a single record obtained from a BGP stream.

return "%s|%s|%f|%s|%s|%s|%s|%s|%d" % (self.type, self.dump_position, self.time,
                                      self.project, self.collector, self.router, self.router_ip,
                                      self.status, self.dump_time)

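Readable attributes of class pybgpstream.BGPRecord
Attribute	Description
project	Name of the project that created the record, or None if not set. (basestring, read-only)
collector	Name of the collector that created the record, or None if not set. (basestring, read-only)
router	Name of the router that created the record, or None if not set. Only used when accessing data from an OpenBMP Kafka stream. (basestring, read-only)
type	One of "update", "rib", or "unknown". (basestring, read-only)
time	Time that the record represents (i.e., the time at which the collector generated the record). (int, read-only)
status	Status of the record: one of 'valid', 'filtered-source', 'empty-source', 'corrupted-source', 'unknown'. (basestring, read-only)
dump_time	Time associated with the dump that contained the record (e.g., the start time of the MRT file in which the record was found). (int, read-only)
dump_position	Position of the record within the dump: one of 'start', 'middle', 'end', 'unknown'. (basestring, read-only)
get_next_elem()	Gets the next BGPElem from this record. Returns None once all elems have been read.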

class `pybgpstream.BGPElem`

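The BGPElem class represents a single element obtained from a BGPRecord instance with the BGPRecord.get_next_elem() method. In version 2, BGPElem objects no longer carry their own time field; that information used to be copied from the record and is now accessed through the record. All attributes are read-only.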

return "%s|%s|%f|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s" % (
    self.record_type,
    self.type,
    self.time,
    self.project,
    self.collector,
    self.router,
    self.router_ip,
    self.peer_asn,
    self.peer_address,
    self._maybe_field("prefix"),
    self._maybe_field("next-hop"),
    self._maybe_field("as-path"),
    " ".join(self.fields["communities"]) if "communities" in self.fields else None,
    self._maybe_field("old-state"),
    self._maybe_field("new-state")
)

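Readable attributes of class pybgpstream.BGPElem
Attribute	Description
type	Type of the elem: "R" (RIB), "A" (announcement), "W" (withdrawal), or "S" (peer state). (basestring, read-only)
peer_address	IP address of the peer that this elem was received from. (basestring, read-only)
peer_asn	ASN of the peer that this elem was received from. (int, read-only)
fields	Dictionary of fields that differ depending on the elem type. (dict, read-only)
	next-hop	Next-hop IP address (basestring)
	as-path	AS path (basestring)
	prefix	Prefix (basestring)
	communities	Communities (a set of strings in canonical "asn:value" format)
	old-state	Old state of the peer: one of 'idle', 'connect', 'active', 'open-sent', 'open-confirm', 'established'. (basestring)
	new-state	New state of the peer; same possible values as old-state. (basestring)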
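8. Summary

This article introduced BGPStream, an open-source software framework for the historical and live analysis of Border Gateway Protocol (BGP) measurement data. Although BGP is core Internet infrastructure and the subject of research on Internet performance, security, topology, protocols, economics, and more, there has been no efficient way to process large amounts of distributed and/or live BGP measurement data. BGPStream fills this gap, enabling efficient investigation of events, rapid prototyping, and the construction of complex tools and large-scale monitoring applications (e.g., detecting connectivity outages or BGP hijacking attacks).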