All Your SPF Belong to Us: Exploring Trust Relationships Through Global-Scale SPF Mining

In this post we explore a large collection of Sender Policy Framework (SPF) records to see what they might tell us about global email sending trust relationships and how they relate to email security providers. This is a fast follow-up to my previous post on Mining DNS MX Records for Fun and Profit.

Here is the methodology I devised for this (very similar to the previous post, but with new custom-built tools):

Collect a large sample of SPF records via DNS TXT lookups of popular domain names (recursively resolving SPF “include” domains).
Enrich the SPF records with IP intelligence and useful metadata (including email security provider mappings).
Analyze the enriched results.

Intro to Sender Policy Framework (SPF)

The Sender Policy Framework (SPF) enables domain name administrators to authorize hosts to use their domain names when sending email (i.e. in the “MAIL FROM” or “HELO” identities in SMTP). One of the goals of SPF is to limit spammers' ability to spoof email messages. SPF on its own is limited, so it is usually deployed alongside DKIM and DMARC. SPF records are published as DNS TXT records. SPF-compliant mail receivers use the published records to test whether a sending Mail Transfer Agent (MTA) is authorized. SPF can be used to build complex policies around who can send email on whose behalf. Below is an example SPF record for Florida State University.

According to this SPF record 146.201.58.212, 146.201.58.213, 146.201.107.145, 146.201.107.249, 192.12.121.23, and 199.188.157.80 are allowed to send email purporting to be from fsu.edu. Also, the SPF records from spf.protection.outlook.com, _spf.qualtrics.com, spf.blackboardconnect.com, servers.mcsv.net, and _spf.mlsend.com should be retrieved and their policies applied as well. Below are the SPF records for each of these domains. As you can see they include more and more IPs/CIDRs as well as additional SPF includes.
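To make the structure of such a record concrete, here is a minimal Python sketch that splits an SPF TXT string into its address mechanisms and its includes. The record used is a made-up example built from documentation addresses and example domains, not the actual fsu.edu record, and `parse_spf` is a hypothetical helper name, not part of the crawler described later.

```python
def parse_spf(txt: str) -> dict:
    """Split one SPF TXT string into networks, include targets, and the rest."""
    terms = txt.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    parsed = {"ip4": [], "ip6": [], "include": [], "other": []}
    for term in terms[1:]:
        # Drop an optional qualifier (+, -, ~, ?) in front of the mechanism.
        mechanism = term.lstrip("+-~?")
        name, sep, value = mechanism.partition(":")
        name = name.lower()
        if sep and name in ("ip4", "ip6", "include"):
            parsed[name].append(value)
        else:
            # Everything else (a, mx, all, redirect=, ...) is kept untouched.
            parsed["other"].append(term)
    return parsed


# Hypothetical record, for illustration only.
record = (
    "v=spf1 ip4:192.0.2.10 ip4:198.51.100.0/24 "
    "include:_spf.example.net a:mail.example.org ~all"
)
parsed = parse_spf(record)
print(parsed)
```

The `include` entries are the ones that point at other administrators' policies, which is where the chain of trust discussed next comes from.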

As you can see, SPF forms a chain of trust between the domain owner and all the SPF policies included recursively (potentially crossing several different administrative boundaries). In this post I was hoping to explore this chain of trust at a large scale by collecting a large sample of SPF records and mining them.

Below are some useful resources for understanding SPF:

RFC 7208: Sender Policy Framework (SPF) for Authorizing Use of Domains in Email
SPF Syntax Table: a really useful guide for understanding SPF “mechanisms”.

Step One: Collection

For step one, I built a very crude but useful SPF crawler that uses dig (optionally adnshost) to perform DNS TXT requests, parses out any SPF records found, and then recursively follows the trail of SPF include records, performing TXT lookups against the included domains.
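The recursion can be sketched roughly as follows. This is not the author's crawler: `dig_txt` and `crawl_spf` are hypothetical names, only `ip4`, `ip6`, and `include` are handled, and the lookup function is pluggable so the logic can be exercised without touching DNS. The only dig invocation assumed is `dig +short TXT <domain>`.

```python
import subprocess
from typing import Callable, Dict, List, Optional


def dig_txt(domain: str) -> List[str]:
    """Return the TXT strings for a domain via `dig +short TXT`."""
    out = subprocess.run(
        ["dig", "+short", "TXT", domain],
        capture_output=True, text=True, timeout=30, check=False,
    ).stdout
    # dig prints long records as several quoted chunks; join them back up.
    return [line.replace('" "', "").strip('"') for line in out.splitlines()]


def crawl_spf(
    domain: str,
    lookup: Callable[[str], List[str]] = dig_txt,
    seen: Optional[Dict[str, dict]] = None,
) -> Dict[str, dict]:
    """Collect networks and includes for a domain and everything it includes."""
    seen = {} if seen is None else seen
    domain = domain.lower().rstrip(".")
    if domain in seen:  # already visited: protects against include loops
        return seen
    seen[domain] = {"networks": [], "includes": []}
    for txt in lookup(domain):
        terms = txt.split()
        if not terms or terms[0].lower() != "v=spf1":
            continue  # some other TXT record
        for term in terms[1:]:
            name, _, value = term.lstrip("+-~?").partition(":")
            if name.lower() in ("ip4", "ip6") and value:
                seen[domain]["networks"].append(value)
            elif name.lower() == "include" and value:
                seen[domain]["includes"].append(value)
    for target in seen[domain]["includes"]:
        crawl_spf(target, lookup, seen)
    return seen


# Offline demo with a made-up zone that contains an include loop.
fake_zone = {
    "example.org": ["v=spf1 ip4:192.0.2.1 include:_spf.example.net -all", "unrelated"],
    "_spf.example.net": ["v=spf1 ip4:198.51.100.0/24 include:example.org ~all"],
}
result = crawl_spf("example.org", lookup=lambda d: fake_zone.get(d, []))
print(result)
```

Keeping the `seen` dictionary shared across the recursion means each domain is resolved at most once, which matters when millions of domains include the same handful of provider records.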

To seed the SPF crawler, I used the same domains as in my previous blog post on mining MX records. I downloaded the Alexa top 1M domains, Quantcast top 1M domains (from the Wayback Machine), Domcop top 10M domains, Majestic Million domains, and Cisco Umbrella top 1M domains. For each entry I identified the registered domain using tldextract, then combined everything into a single de-duplicated list. This resulted in ~8.3M unique domain names.

These domains were fed into my SPF crawler, and the results were collected, parsed, and assembled. This time I ended up backing the SPF crawler with “dig” instead of “adnshost” because I found dig more reliable: it completed 23% more DNS requests in an experiment against the Fortune 1000 domains. Dig is single threaded, but I easily parallelized it using split files and xargs, and its performance ended up being good enough. See parallel_dig.sh for more details.
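The author's parallelization lives in parallel_dig.sh and uses split files with xargs. For readers who prefer to stay in Python, the same fan-out idea can be approximated with a thread pool; this is a swapped-in technique, not a port of that script, and `lookup_all` and `fake_lookup` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def lookup_all(
    domains: Iterable[str],
    lookup: Callable[[str], List[str]],
    workers: int = 32,
) -> Dict[str, List[str]]:
    """Run one lookup per domain on a pool of worker threads."""
    names = list(domains)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each name with its result.
        return dict(zip(names, pool.map(lookup, names)))


def fake_lookup(domain: str) -> List[str]:
    """Stand-in for a real DNS TXT query so the demo runs offline."""
    return [f"v=spf1 include:_spf.{domain} ~all"]


results = lookup_all(["example.org", "example.net"], fake_lookup, workers=4)
print(results)
```

Threads are a reasonable fit here because each worker spends nearly all of its time waiting on the network rather than computing.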

Below are a few simple commands as well as example output data collected with my SPF crawler applied to just one domain. As you can see, the assembled output for fsu.edu includes all the IPs and Netblocks from all the SPF includes that it links to, recursively.
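The assembly step, collapsing everything reachable through includes into one list per domain, might look like the sketch below. The input shape (a dictionary of per-domain `networks` and `includes`) and the `assemble` function are assumptions made for illustration; the actual crawler output format may differ.

```python
from typing import Dict, List, Optional, Set


def assemble(
    domain: str,
    records: Dict[str, dict],
    _seen: Optional[Set[str]] = None,
) -> List[str]:
    """Return every network a domain trusts, following includes recursively."""
    _seen = set() if _seen is None else _seen
    if domain in _seen or domain not in records:
        return []  # already expanded, or the include never resolved
    _seen.add(domain)
    networks = list(records[domain]["networks"])
    for target in records[domain]["includes"]:
        networks.extend(assemble(target, records, _seen))
    return networks


# Made-up per-domain records, including a loop and an unresolved include.
records = {
    "example.org": {
        "networks": ["192.0.2.1"],
        "includes": ["_spf.example.net", "missing.example"],
    },
    "_spf.example.net": {
        "networks": ["198.51.100.0/24"],
        "includes": ["example.org"],
    },
}
trusted = assemble("example.org", records)
print(trusted)
```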

Below is the same information, visualized as a network (and enriched with ASN info from MaxMind).

Step Two: Enrichment

For this step, I reused a lot of the code from my previous blog post on mining MX records and performed the following enrichments:

MaxMind ASN
MaxMind Country
Cloud Provider IP Lookups for AWS, Azure, and GCP
Alexa Ranking
Email Security Provider mapping

netaddr, tldextract, and cidr-trie were useful during this stage.
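As an illustration of the cloud provider lookup, here is a small sketch that uses the standard-library `ipaddress` module in place of netaddr and cidr-trie. A linear scan like this is fine for a demo but would be slow at the scale of this study, which is presumably why a trie was used. The provider names and ranges are placeholders taken from documentation address space, not real AWS, Azure, or GCP ranges.

```python
import ipaddress
from typing import Dict, List, Tuple

# Hypothetical provider ranges (RFC 5737 documentation networks).
CLOUD_RANGES: Dict[str, List[str]] = {
    "cloud-a": ["203.0.113.0/24"],
    "cloud-b": ["198.51.100.0/25"],
}


def cloud_matches(
    spf_network: str,
    cloud_ranges: Dict[str, List[str]] = CLOUD_RANGES,
) -> List[Tuple[str, str]]:
    """List the (provider, range) pairs that overlap an SPF ip4/ip6 value."""
    # strict=False accepts values with host bits set, e.g. 203.0.113.7/24.
    net = ipaddress.ip_network(spf_network, strict=False)
    hits = []
    for provider, cidrs in cloud_ranges.items():
        for cidr in cidrs:
            other = ipaddress.ip_network(cidr)
            if net.version == other.version and net.overlaps(other):
                hits.append((provider, cidr))
    return hits


for value in ["198.51.100.0/24", "203.0.113.7", "192.0.2.0/24"]:
    print(value, cloud_matches(value))
```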

Step Three: Analysis

Through this analysis, I hoped to answer the following questions:

What is the largest trusted network size (both single CIDR and aggregate network space)? … HUGE
Could I find any blatantly misconfigured SPF records? … YES
What does SPF data show about email security providers? … A lot that MX doesn’t
What are the most “included” SPF includes? … Not many surprises here
Does SPF augment the MX record mining (give more coverage? reveal things previously hidden? or 100% redundant?) … YES!
Are domains trusting IP space from cloud providers that may be re-usable (e.g. AWS EC2)? … YES!

Below are some outputs and commentary from this project’s Jupyter notebook that answer the questions above.

Network Graphs

These networkx visualizations of the Fortune 100 and Alexa 100 are a bit of a mess, but they should get across how interconnected the SPF trust relationships are.

Fortune 100 SPF Trusted Networks Graph

Alexa 100 SPF Trusted Networks Graph

Heatmaps

As you can see from the next several heatmaps, once we go beyond the Alexa top 1,000 domains the number of trusted networks increases drastically, and by the time we hit the Alexa top 1M, the entire Internet is trusted (likely due to SPF misconfigurations).

These heatmaps were generated with the awesome ipv4-heatmap tool provided by the Measurement Factory. The code to automate this can be found in my Jupyter Notebook here.

Fortune 1,000 SPF Trusted Networks Heatmap

Alexa 1,000 SPF Trusted Networks Heatmap

Alexa 10,000 SPF Trusted Networks Heatmap

Alexa 100,000 SPF Trusted Networks Heatmap

Alexa 1,000,000 SPF Trusted Networks Heatmap

Alexa Top 1M Domains Trusting /7 or Larger Networks

As you can see from this list, quite a few domains trust very large networks. Several of these look like misconfigurations. For example, these four domains trust the entire Internet:

hitadouble[.]com: 208.67.207.0/0
payukraine[.]com: 0.0.0.0/0
angliss[.]edu[.]au: 0.0.0.0/0
hutkigrosh[.]by: 0.0.0.0/0

This domain trusts half of the Internet, salaam[.]af: 175.106.32.0/1

And these five domains each trust a quarter of the Internet. cfe[.]fr appears to have since fixed this apparent misconfiguration, as its TXT record has changed.

creativecircle[.]com: 64.4.22.64/2
gevestor[.]de: 91.241.72.0/2
debeersgroup[.]com: 10.47.149.168/2
cfe[.]fr: 82.97.62.0/2
adecco[.]com: 148.105.8.0/2
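The reason a short prefix length is so dangerous is plain arithmetic: a /n network covers 2^(32-n) IPv4 addresses, so /0 is everything, /1 is half, and /2 is a quarter, regardless of what the address part says. The sketch below checks this with the standard-library `ipaddress` module on some of the values listed above; `ipv4_share` is a hypothetical helper name.

```python
import ipaddress


def ipv4_share(cidr: str):
    """Return the effective network and the share of IPv4 space it covers."""
    # strict=False masks off host bits, mirroring how a prefix match
    # only compares the leading bits of the address.
    net = ipaddress.ip_network(cidr, strict=False)
    return net, net.num_addresses / 2**32


for cidr in ["208.67.207.0/0", "175.106.32.0/1", "64.4.22.64/2", "10.47.149.168/2"]:
    net, share = ipv4_share(cidr)
    print(f"{cidr:>18} -> {str(net):<12} {share:.0%}")
```

Note how the address part is effectively ignored once the prefix is this short: 208.67.207.0/0 normalizes to 0.0.0.0/0. That makes a dropped digit in the prefix length (for example /2 where /24 or /29 was meant) a plausible explanation for several of these entries, though that is a guess rather than something the data confirms.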

Top SPF Includes from all top domain lists (via SPF)

Using all the popular domain names, here is a summary of the top 10 SPF includes.

Major Cloud Email Providers:

Microsoft: spf.protection.outlook.com
Google: _spf.google.com

Hosting Providers:

HostGator: websitewelcome.com
OVH: mx.ovh.com
Bluehost: bluehost.com

Commercial Email Marketing companies:

MailChimp: servers.mcsv.net
Mandrill: spf.mandrillapp.com (MailChimp add-on)
Sendgrid: sendgrid.net

Email Security company:

MailChannels: mailchannels.net (more on this later)
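A ranking like this can be produced with a simple counter over the crawler output. The sketch below assumes the same hypothetical per-domain record shape used for illustration elsewhere (`includes` as a list of include targets) and counts each target once per domain; whether the original ranking counted direct or recursive includes is not stated, so treat that choice as an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_includes(records: Dict[str, dict], n: int = 10) -> List[Tuple[str, int]]:
    """Rank include targets by the number of domains that reference them."""
    counts: Counter = Counter()
    for rec in records.values():
        # A set ensures a domain listing the same include twice counts once.
        counts.update(set(rec["includes"]))
    return counts.most_common(n)


# Made-up records for illustration.
records = {
    "a.example": {"includes": ["_spf.mail.example", "_spf.mail.example", "spf.bulk.example"]},
    "b.example": {"includes": ["_spf.mail.example"]},
    "c.example": {"includes": []},
}
print(top_includes(records))
```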

Top SPF Includes from Fortune 1000 (via SPF)

Top SPF Includes from Alexa top1m

Email Security Providers

If you read my previous blog post on Mining DNS MX Records for Fun and Profit, you might notice that these top lists look significantly different from the top email providers identified from MX records. The top 5 providers identified in the SPF data are MailChannels, Mimecast, Proofpoint, Solarwinds, and Barracuda. In the MX post, the top 5 were Proofpoint, Mimecast, Deteque, Barracuda, and Solarwinds, and MailChannels was #48 on that list. These top lists use the combined popular-domains data, which is likely not an accurate reflection of the actual email security market. For the Fortune 1000 the story is less surprising: the top 4 email security providers were nearly identical across SPF and MX records, with only the order differing. I suspect that MailChannels shows up as popular in SPF because it is either the default setting on newly registered domains or the default setting for domains parked with certain hosting providers, but I haven’t spent the time to prove or disprove this.

(Update 7/7/2020) I received this message from Ken Simpson, CEO of MailChannels, which helps explain the mismatch between the MX and SPF counts.

“You were wondering why MailChannels shows up in a lot of SPF records (actually, we’re number one), but relatively few MX records. MailChannels delivers email for the web hosting industry, with over 700 service provider customers worldwide. To deliver email reliably, they have to add us to their customers’ SPF records. Those same customers often host their inbound email with someone else — GSuite, Microsoft 365, or another provider. Hence the mismatch in SPF and MX records.”

One other interesting aspect of SPF is that it (potentially) reveals relationships with multiple email security providers. See the “Fortune 100 Email Security Providers Listing (via SPF)” and “Domains with 4 or more Email Security Providers (via SPF)” gists below. In the Fortune 100 list, there are 3 domains with SPF relationships with more than one provider. Across all the top domains data there are many more. For anyone who has worked in the cyber security department of a large company, this is not surprising, but it was cool to be able to see it in the data.

Domains with 2 SPF relationships with Email Security Providers: 11,393
Domains with 3 SPF relationships with Email Security Providers: 468
Domains with 4 SPF relationships with Email Security Providers: 35
Domains with 5 SPF relationships with Email Security Providers: 1
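Counts like these can be derived by mapping each include target to a provider and counting distinct providers per domain. The sketch below is a guess at the shape of that computation, not the notebook's code; the mapping, the provider names, and the include hostnames are all placeholders.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical include-to-provider mapping.
PROVIDER_BY_INCLUDE: Dict[str, str] = {
    "spf.provider-a.example": "Provider A",
    "eu.spf.provider-a.example": "Provider A",
    "spf.provider-b.example": "Provider B",
}


def providers_per_domain(
    includes_by_domain: Dict[str, List[str]],
    mapping: Dict[str, str] = PROVIDER_BY_INCLUDE,
) -> Dict[str, Set[str]]:
    """Map each domain to the set of distinct providers found in its includes."""
    return {
        domain: {mapping[inc] for inc in includes if inc in mapping}
        for domain, includes in includes_by_domain.items()
    }


def relationship_histogram(
    includes_by_domain: Dict[str, List[str]],
    mapping: Dict[str, str] = PROVIDER_BY_INCLUDE,
) -> Counter:
    """Count how many domains have 0, 1, 2, ... provider relationships."""
    per_domain = providers_per_domain(includes_by_domain, mapping)
    return Counter(len(providers) for providers in per_domain.values())


data = {
    "one.example": ["spf.provider-a.example", "eu.spf.provider-a.example"],
    "two.example": ["spf.provider-a.example", "spf.provider-b.example", "other.example"],
    "none.example": [],
}
hist = relationship_histogram(data)
print(hist)
```

Using a set per domain keeps two include hostnames that belong to the same provider from being counted as two relationships.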

Top Email Security Provider from all top domain lists (via SPF)

Top Email Security Provider from Alexa 1m (via SPF)

Top Email Security Provider from Fortune 1000 (via SPF)

Top Email Security Provider from Fortune 100 (via SPF)

Fortune 100 Email Security Providers Listing (via SPF)

Domains with 4 or more Email Security Providers (via SPF)

Trusting Cloud Provider Networks

As you can see from the next few tables, many domains transitively trust a lot of cloud provider IP space for SPF. Trusting some of the larger networks seems to carry risk, since cloud IP space may get reused; see Fishing the AWS IP Pool for Dangling Domains for a practical example of this. As I mentioned earlier, SPF is usually used with DKIM and DMARC, so this data doesn’t paint the whole picture. I am hoping to dive into DMARC/DKIM next.

Alexa 1000 Trusting AWS Networks

Alexa 1000 Trusting Azure Networks

Alexa 1000 Trusting GCP Networks

Fortune 1000 Trusting AWS Networks

Fortune 1000 Trusting Azure Networks

Fortune 1000 Trusting GCP Networks

Some other potentially interesting results, not worth dumping here:

Alexa top1m domains trusting AWS Networks
Alexa top1m domains trusting Azure Networks
Alexa top1m domains trusting GCP Networks
Top MaxMind ASNs of SPF Trusted Networks from Fortune 1000 (via SPF)
Top MaxMind ASNs of SPF Trusted Networks from all top domain lists (via SPF)
Top MaxMind ASNs of SPF Trusted Networks from Alexa top1m (via SPF)
Graph analytics applied to Fortune 1000 and Alexa 1000: degree centrality, edge betweenness centrality, pagerank, closeness centrality, triangle counts, and connected components stats. See the notebook and search for “print_graph_metrics”.

Future Work

SPF Crawler enhancements: As you can see from the entries for “a” and “mx” in the SPF guide I shared above, SPF supports some fairly complex policies for allowing certain IPs to send email (especially the prefix operators on these mechanisms). I did not support these mechanisms in the first version of my SPF crawler, mainly because of the complexity involved. As a result, my results underrepresent the trust relationships wherever they are used. I hope to add support for these operators to expand what can be found in this data.
Try some more graph analytics on the entire dataset. In the Jupyter notebook I ran several graph algorithms on subsets of the entire graph (Fortune 100 and Alexa 100). These showed some mildly interesting results, but testing against larger graphs caused graphviz to fail due to some data format issues that I have not had a chance to research.
Perform another study measuring DMARC and DKIM usage across popular domains.

Resources

As usual, all notebooks, code, and summary results can be found on GitHub: https://github.com/covert-labs/mx-intel.

And all data can be found at the links below:

all-registered-domains.txt.gz: base domains extracted by combining several popular domain lists and de-duplicating them.
all-registered-domains-outputs-combined.txt.gz: raw dig output for all the TXT requests.
spf-results-all-registered-domains.json.gz: the parsed results from running the SPF Crawler against all-registered-domains.txt.gz.
spf-linked-all-registered-domains.json.gz: the assembled results from processing spf-results-all-registered-domains.json.gz. This is the collapsed/combined data that shows all the SPF domains and networks included recursively.