Python Typosquatting for Fun not Profit

by William Bengtson | @__muscles

Two years ago, a few colleagues (shoutout to helloarbit, travismcpeak, and coffeetocode) and I were talking about supply chain attacks, and that conversation led to this work. A supply chain attack targets a company's dependencies in the hope of exploiting a weakness in the supply chain to damage the target company. For a company that produces software, the attack typically targets the software used to develop the product.

With this in mind, the software dependencies are packages/libraries in popular languages like Python, Java, Ruby, and Golang. Each language has its own way of packaging and retrieving dependencies, and over the years some ecosystems have adopted more proactive measures against certain types of supply chain attacks than others.

Typosquatting

I decided to look at what typosquatting in Python looks like. I started with Levenshtein distance, which measures the distance between two package names, to determine whether one name is a typosquat of another. This can be useful for detecting a package that has been squatted, and as a tool for preventing packages from being squatted. With this in mind, I wanted to see how many of the top installed Python packages can be squatted by removing underscores (_) or dashes (-).
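To make the two ideas concrete, here is a minimal sketch in plain Python. It is not the code used for this project: the function names `levenshtein`, `strip_separators`, and `squat_candidate` are made up for illustration, and the distance is the textbook dynamic-programming version.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def strip_separators(name: str) -> str:
    """Remove every underscore and dash from a package name."""
    return name.replace("_", "").replace("-", "")


def squat_candidate(name: str):
    """Return the separator-free variant, or None if the name has no separators."""
    stripped = strip_separators(name)
    return stripped if stripped != name else None


print(squat_candidate("python-json-logger"))
print(levenshtein("python-json-logger", "pythonjsonlogger"))
```

Dropping the two dashes from python-json-logger gives pythonjsonlogger, which is two edits away from the real name.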

Data

One of the most fascinating pieces for me is that the Python Package Index (PyPI) makes data available about each Python package. This data can be very powerful for determining which packages are the most popular in the Python ecosystem. For more on analyzing PyPI downloads, check out their guide here.

Implementation

The goal of this exercise was to understand how many packages could be squatted and, if possible, to prevent future squats by registering the names myself. Using the data mentioned above, I ran a query for the top 10,000 Python packages installed with the package installer pip. That list let me work out which packages could potentially be squatted by removing any _ or - from the package name, as mentioned above.
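A rough sketch of those two steps might look like the following. The exact query used for this project is not shown in the post, so everything here is an assumption: the table `bigquery-public-data.pypi.file_downloads` is the public BigQuery dataset of PyPI downloads as it exists today, the 30-day window is arbitrary, and the helper names `top_packages_query` and `squat_candidates` are hypothetical. The code only builds the SQL string; running it needs a BigQuery client and credentials.

```python
def top_packages_query(limit: int = 10_000, days: int = 30) -> str:
    """Build a BigQuery SQL string for the most pip-installed projects."""
    limit, days = int(limit), int(days)  # keep the interpolated values numeric
    return (
        "SELECT file.project AS package_name, COUNT(*) AS downloads\n"
        "FROM `bigquery-public-data.pypi.file_downloads`\n"
        "WHERE details.installer.name = 'pip'\n"
        f"  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)\n"
        "GROUP BY package_name\n"
        "ORDER BY downloads DESC\n"
        f"LIMIT {limit}"
    )


def squat_candidates(names):
    """Map each separator-free variant to the real name it could be mistaken for."""
    known = set(names)
    drop_separators = str.maketrans("", "", "_-")
    candidates = {}
    for name in names:
        stripped = name.translate(drop_separators)
        # Skip names without separators and variants that are already real packages.
        if stripped != name and stripped not in known:
            candidates[stripped] = name
    return candidates


print(top_packages_query(limit=10, days=7))
print(squat_candidates(["python-dateutil", "requests", "typing_extensions"]))
```

The second helper skips any variant that already appears in the input list, since that name is taken by a real package.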

I wanted to squat the potential packages, so I needed a strategy for squatting them. There were a few options to choose from:

  1. Clone the existing packages, change the name to the squatted name, and register them.
  2. Register the packages with a package that does nothing.
  3. Register the packages with a package that could potentially educate the person installing the squatted package.

Looking at the different options, I chose number three. The goal of the project has always been to be a “Guardian”, and the other options seemed less than ideal for achieving that goal. Number two would have left developers wasting a lot of time trying to understand why their software doesn’t work after installing the squatted package instead of the real one. Number one seemed shady and could lead people to believe the project had malicious intent.

Initially I decided to squat around 1,100 packages. To do this, I first needed to create the package to push to PyPI, and then build some automation around it so that squatting this many packages was achievable in a short amount of time.

I decided to create a simple package that when installed would fail and print out an error message letting you know the actual package name you probably meant to install. Below is an example of what happens when you try to install pythonjsonlogger instead of the real package python-json-logger.

pip install pythonjsonlogger
Collecting pythonjsonlogger
Downloading pythonjsonlogger-0.1.1.tar.gz (1.3 kB)
Building wheels for collected packages: pythonjsonlogger
Building wheel for pythonjsonlogger (setup.py) ... error
ERROR: Command errored out with exit status 1:
.
.
.
Complete output (29 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib
.
.
.
.
File "/private/var/folders/cy/kc766fxx37b5rf87qxkt8hj00000gp/T/pip-install-kp4jab6f/pythonjsonlogger/setup.py", line 20, in run
raise Exception("You probably meant to install and run python-json-logger")
Exception: You probably meant to install and run python-json-logger
----------------------------------------
ERROR: Failed building wheel for pythonjsonlogger
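The actual setup.py and automation are not shown in this post, but the traceback above suggests a setup.py whose custom command raises an exception in its run method. One way this could be automated is to render such a setup.py from a template for each squatted name. The sketch below is a guess under that assumption: the `FailInstall` class, the choice to override the `install` command, the description text, and the helper `render_setup_py` are all hypothetical, and only the error message and version number are taken from the output above.

```python
from string import Template

# Template for a setup.py whose install step always fails with a hint.
SETUP_TEMPLATE = Template('''\
from setuptools import setup
from setuptools.command.install import install


class FailInstall(install):
    """Abort the install and point the user at the intended package."""

    def run(self):
        raise Exception("You probably meant to install and run $real_name")


setup(
    name="$squat_name",
    version="0.1.1",
    description="Placeholder registered to prevent typosquatting of $real_name",
    cmdclass={"install": FailInstall},
)
''')


def render_setup_py(squat_name: str, real_name: str) -> str:
    """Return the text of a setup.py for one squatted package name."""
    return SETUP_TEMPLATE.substitute(squat_name=squat_name, real_name=real_name)


text = render_setup_py("pythonjsonlogger", "python-json-logger")
compile(text, "setup.py", "exec")  # syntax check only; nothing is executed
print(text)
```

Writing each rendered file into its own directory, building a source distribution, and uploading it would complete the loop, but those steps depend on tooling and credentials that are outside this sketch.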

Now that I had the package setup and automation written, I kicked off the squatting and sat back. With over 1,100 packages registered, I now needed to wait to see if anyone would actually install these accidentally. What I found over the next two years is the most interesting piece of this project in my opinion.

Results

I originally meant to do a post on this after 6 months or a year, but here we are two years later and I have some data and stories to share. I’ll start with the data on 1,131 packages, and end with the stories.

Top 10 squatted package downloads from July 16, 2018 until August 4, 2020:

[Image: top 10 squatted package downloads, July 16, 2018 to August 4, 2020]

In a little over two years there have been 530,950 total pip install commands run on 1,131 packages! This does not include any mirrors or internal package registries that have cloned these packages privately. Malicious packages in PyPI have been known to steal credentials stored on the local file system, such as SSH credentials in ~/.ssh/, GPG keys, or perhaps AWS credentials stored in ~/.aws/credentials. If these typosquat packages were written with malicious intent and we assume one attempt per install, that would mean 530,950 machines could have been compromised over the two-year period.
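For a sense of scale, the totals above can be turned into rough averages. This is only back-of-the-envelope arithmetic on the figures quoted in this post. The real distribution is uneven across packages, so the per-package mean says little about any single package.

```python
from datetime import date

TOTAL_INSTALLS = 530_950   # pip install commands reported above
PACKAGES = 1_131           # squatted packages reported above
START, END = date(2018, 7, 16), date(2020, 8, 4)

days = (END - START).days
installs_per_day = TOTAL_INSTALLS / days
installs_per_package = TOTAL_INSTALLS / PACKAGES

print(f"{days} days in the measurement window")
print(f"about {installs_per_day:.0f} installs per day across all packages")
print(f"about {installs_per_package:.0f} installs per package on average")
```

That works out to 750 days, roughly 708 accidental installs per day, and roughly 469 installs per package on average.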

While the data is incredibly interesting, and you can draw your own conclusions about what could have happened if these were malicious packages, I find the encounters and stories the most interesting.

Over the two years I had the following encounters:

  • Help installing my package
  • Thanks for protecting the Python community
  • Text from a friend asking if I owned a certain package
  • Researchers finding my work
  • Company asking to confirm the license on one of my squatted packages

Help installing my package

Over the course of the last two years I have received numerous emails asking for help installing my package or reporting that my package is broken. Each time I’d simply reply with the correct package they should install.

Thanks for protecting the Python community

A few times I received emails from people who had installed my squatted package accidentally and either read the error printout or saw my package registration clearly stating that it is a package to prevent exploits.

In a few cases, a researcher both found my work and thanked me:

I am working on a project for my security class related to attacks on the Python ecosystem. We kept stumbling upon your packages while trying to identify typo-squatting attempts.

I just thought I would say thanks for helping out the community! :)

Text from a friend asking if I owned a certain package

I told a few friends about this work and I don’t quite remember how the interaction went down, but it was something along the lines of:

Friend: Hey! Do you own pythonjsonlogger?

Me: Yeah why?

Friend: Dammit!

You know who you are :)

Researchers finding my work

I have really enjoyed each occurrence of a researcher finding my work because it typically involved a conversation around typosquatting. The example above ended up with my work being included in a symposium paper for the University of Maryland CMSC 8180 class called PYed PIPer by Josiah Wedgwood and Aadesh Bagmar.

In the most recent case, I met with the creator of pypi-scan, John Speed Myers, on Zoom for about an hour to discuss our work, and we talked about potential future collaborations in this area. Most importantly, this conversation got me excited about the topic again and prompted me to finally write this post, as well as to do another round of squatting, covering 3,000+ more packages, to continue to protect the Python ecosystem.

Company asking to confirm the license on one of my squatted packages

The one I find most scary is an email from a large company asking to confirm the license on one of my squatted packages for use.

Final thoughts

All in all this was a fun project that I never meant to take two years to actually write about. Over the last year or so, I have discussed this work with the folks at PyPI and they will actually be taking ownership of my packages once I can confirm a final list of which packages are squatted versus the real packages I actually contribute to or own.

All language ecosystems are vulnerable to this type of attack, although in some it is harder to carry out because of things like package namespaces. This work was very targeted and did not expand into squatting names of future libraries as projects evolve. Supply chain security is very difficult and has challenged companies with deep pockets for many years.

Most importantly, THANKS to the folks at PyPI for what they do in making Python packages available to enable folks to develop each day!

Source: https://medium.com/@williambengtson/python-typosquatting-for-fun-not-profit-99869579c35d
