How to Download a Repository's README Text via the GitHub API

Article link: https://blog.csdn.net/qysh123/article/details/80480246

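This post describes one way to fetch the README file of a GitHub repository. By looking at the URLs the GitHub API returns, the download URL can be constructed by hand and used to retrieve the complete README content.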


These are some notes from my experience collecting data, briefly summarized:

According to the documentation here: http://pygithub.readthedocs.io/en/latest/github_objects/Repository.html

to get a Repository's README file, you only need to call: get_readme

This method returns a github.ContentFile.ContentFile object:

http://pygithub.readthedocs.io/en/latest/github_objects/ContentFile.html
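For context, the contents endpoint delivers the file body Base64-encoded, usually wrapped across several lines. Below is a minimal sketch of decoding such a payload with only the standard library. The payload dict here is a hypothetical stand-in built locally, not a real API response, and decode_readme is an illustrative helper name, not part of PyGithub.

```python
import base64


def decode_readme(payload):
    """Decode the Base64 'content' field of a contents-style payload to text."""
    if payload.get("encoding") != "base64":
        raise ValueError("unexpected encoding: %r" % payload.get("encoding"))
    # b64decode ignores the embedded newlines by default (validate=False).
    raw = base64.b64decode(payload["content"])
    return raw.decode("utf-8")


# Hypothetical sample: encodebytes wraps its output in lines, which is
# similar in shape to the line-wrapped Base64 that the API returns.
sample_text = "# Demo\n\nA long line of README text. " * 5
sample_payload = {
    "encoding": "base64",
    "content": base64.encodebytes(sample_text.encode("utf-8")).decode("ascii"),
}

print(decode_readme(sample_payload) == sample_text)
```

PyGithub's ContentFile also exposes a decoded_content property that performs this decoding for you, so the manual version is mainly useful when working with the raw JSON.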

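Because the content is Base64-encoded, and because I was worried that a very long README might not be fully retrievable through content, I prefer to use download_url to obtain the download address of the README text and then fetch it with Python requests. However, if you use: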

read_me = repo.get_readme()
read_me.download_url  # this attribute access is what raises the exception

to get download_url, it raises an exception: 'ContentFile' object has no attribute 'download_url'

This exception is really baffling. Did the GitHub API change? The documentation does not say so, though.
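Since the message is a Python attribute error, it is raised by the object on the client side rather than by the HTTP response. One defensive option is to probe for the attribute instead of assuming it exists. The sketch below uses SimpleNamespace objects as hypothetical stand-ins for a ContentFile, and get_download_url is an illustrative helper name.

```python
from types import SimpleNamespace


def get_download_url(content_file):
    """Return content_file.download_url, or None if the attribute is missing."""
    return getattr(content_file, "download_url", None)


# Hypothetical stand-ins: one object with the attribute, one without.
with_attr = SimpleNamespace(download_url="https://example.invalid/README.md")
without_attr = SimpleNamespace(html_url="https://example.invalid/blob/README.md")

print(get_download_url(with_attr))
print(get_download_url(without_attr))
```

A None result signals that the URL has to be obtained some other way.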

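However, a closer look at the URLs involved shows that they follow a fixed pattern, so the download URL can be built by string concatenation: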

https://developer.github.com/v3/repos/contents/#get-contents

html_url = read_me.html_url
prefix = "https://github.com/" + name + "/blob/"  # prefix; name is the repo's full name ("owner/repo")
suffix = html_url[html_url.index(prefix) + len(prefix):]  # suffix: "<branch>/<path to the README>"
download_url = "https://raw.githubusercontent.com/" + name + "/" + suffix
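The same concatenation can be wrapped in a small function and paired with a download step. This is a sketch under stated assumptions: build_raw_url and download_text are illustrative names, the repository name and URL in the usage line are hypothetical, and the download uses the standard library's urllib so the block has no third-party dependency (requests would work just as well).

```python
from urllib.request import urlopen

RAW_HOST = "https://raw.githubusercontent.com/"


def build_raw_url(full_name, html_url):
    """Turn a github.com '/blob/' page URL into a raw.githubusercontent.com URL."""
    prefix = "https://github.com/" + full_name + "/blob/"
    if not html_url.startswith(prefix):
        raise ValueError("html_url does not start with " + prefix)
    suffix = html_url[len(prefix):]  # "<branch>/<path to the README>"
    return RAW_HOST + full_name + "/" + suffix


def download_text(url, timeout=10.0):
    """Fetch a URL and return its body decoded as UTF-8 text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Hypothetical repository, used only to show the URL transformation.
print(build_raw_url(
    "someuser/somerepo",
    "https://github.com/someuser/somerepo/blob/master/README.md",
))
```

Checking the prefix with startswith makes the function fail loudly if html_url ever has an unexpected shape, instead of silently producing a wrong address.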

That's all; just a quick note for the record.