Tutorial: Removing Large Files from Git

weixin_26712121

Original article: https://medium.com/analytics-vidhya/tutorial-removing-large-files-from-git-78dbf4cf83a

How to overcome the “error: GH001: Large files detected” error message when you’re pushing changes to GitHub

This tutorial uses the commit hashes from this GitHub repository, although all necessary information is contained in this blog post.

The Error Message

So, you just tried to run git push, and after taking longer than usual, you get an error trace like this one:

remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com
remote: error: Trace: 08740bd2fb02f980041be67b73e715a9
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File csv_building_damage_assessment.csv is 218.83 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/hoffm386/git-large-file-example.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/hoffm386/git-large-file-example.git'

What Happened?

When you tried to run git push, it failed. None of your changes have been pushed to GitHub, and nothing has changed in your local repository either. This line of the error message shows most clearly why the push failed:

remote: error: File csv_building_damage_assessment.csv is 218.83 MB; this exceeds GitHub's file size limit of 100.00 MB

In my case the file was called csv_building_damage_assessment.csv, but any file larger than 100MB can cause this error (.zip, .pdf, .xlsx, .pkl, etc.). To quote from the GitHub documentation:

“GitHub limits the size of files allowed in repositories, and will block a push to a repository if the files are larger than the maximum file limit…GitHub blocks pushes that exceed 100 MB.”

GitHub provides a lot of services for free, but they generally charge money for storing and versioning large files through their Large File Storage product, and do not allow files larger than 100MB to be pushed to their standard repositories.
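
It can help to check whether any file in the working tree exceeds the limit before you commit. Below is a minimal sketch. It assumes a POSIX find and treats the limit as 100 MiB (104857600 bytes). find_large_files is a hypothetical helper name and is not part of Git.

```shell
# Hypothetical helper: print files under DIR (default ".") larger than
# LIMIT bytes (default 104857600, i.e. an assumed 100 MiB), skipping .git.
find_large_files() {
  dir="${1:-.}"
  limit="${2:-104857600}"
  # -prune stops find from descending into .git; the "c" suffix makes -size count bytes.
  find "$dir" -path "$dir/.git" -prune -o -type f -size +"${limit}c" -print
}
```

Under these assumptions, running find_large_files from the repository root before git add would have flagged csv_building_damage_assessment.csv.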

What Not to Do

Often a beginner’s first intuition is just to make a new commit that deletes the large file, something like:

git rm csv_building_damage_assessment.csv
git commit -m "removing large file"

Unfortunately, this won’t work. When you push something to GitHub, GitHub is effectively promising to keep track of each and every commit and to let you roll back to any point in your history. So if you push a sequence of commits that adds and then deletes a large file, you are still asking GitHub to store the large file, and GitHub will still block the push. You need a solution that “rewrites history” so that, as far as GitHub can tell, you never added the large file in the first place!
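
You can confirm this yourself. After git rm and a new commit, the large file's blob is still reachable from history. The sketch below lists such blobs using git rev-list --objects and git cat-file --batch-check. The helper name large_blobs and the 100 MiB default are assumptions.

```shell
# Hypothetical helper: list blobs larger than LIMIT bytes that are reachable
# from REVS (default: every ref), largest first.
# Each output line looks like: "<size-in-bytes> <blob-hash> <path>"
large_blobs() {
  revs="${1:---all}"
  limit="${2:-104857600}"   # assumed GitHub limit: 100 MiB
  git rev-list --objects "$revs" |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk -v limit="$limit" '$1 == "blob" && $2 + 0 > limit + 0 { sub(/^blob /, ""); print }' |
    sort -rn
}
```

After the “removing large file” commit above, large_blobs would still be expected to report the CSV. You can also pass a range instead of the default, such as origin/master..HEAD (assuming that remote-tracking branch exists). That limits the check to commits you have not pushed yet.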

What to Do

We need to amend (i.e. edit) the commit where the large file was added in order to “rewrite history”.

The number of commands required to do this depends on which commit added the large file. The two scenarios are:

1. The large file was just added in the most recent commit
2. The large file was committed prior to the most recent commit

CAUTION: when you start rewriting history, accidentally running the wrong command means potentially deleting the large file. If the contents of the file are important to you, make a copy of the file outside of the repository (e.g. on your desktop) so that you can recover it later.

Let’s start by addressing scenario 1, which is easier to handle.

Scenario 1: The Large File Was Just Added in the Most Recent Commit

In this scenario, you can amend the most recent commit to remove the large file. This is the same as any other action that amends the most recent commit: first you make the change, then you run git commit --amend. In the case of the example above, you would run the following in the terminal (at the root of your repository):

git rm --cached csv_building_damage_assessment.csv
git commit --amend -C HEAD

(Replacing csv_building_damage_assessment.csv with the name of your large file)

That’s it! The large file has been removed from the commit history, and you should now be able to push to GitHub.
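
The two commands above can be combined with the backup copy recommended in the caution note. Below is a minimal sketch. remove_from_last_commit and its backup-directory argument are hypothetical. If the large file was the only change in the commit, the amend fails with the “would make it empty” message discussed in Scenario 2.

```shell
# Hypothetical helper: back up FILE outside the repository, then untrack it
# in the most recent commit while leaving the working copy on disk.
remove_from_last_commit() {
  file="$1"
  backup_dir="${2:-$HOME}"   # assumption: a directory outside the repository
  cp -- "$file" "$backup_dir/$(basename "$file").backup" || return 1
  git rm --cached --quiet -- "$file" || return 1   # stop tracking, keep the file on disk
  git commit --amend --quiet -C HEAD               # rewrite HEAD, reusing its message
}
```

Usage from the repository root: remove_from_last_commit csv_building_damage_assessment.csv ~/Desktop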

Scenario 2: The Large File Was Committed Prior to the Most Recent Commit

This situation is more complicated than the first one, but still fixable! There are multiple possible approaches, but I recommend an interactive rebase as the simplest approach that still maintains fine-grained control.

The explanation for this scenario is long enough that I’m going to add a whole new heading for it:

Interactive Rebase for Removing Large Files

Conceptually what we’re doing here is looking back through the Git history, finding the commit where the large file was added, and editing that commit while leaving the others alone.

Locating the Last “Good” Commit

Run this command in the terminal to print out the commit history of the repository:

git log

Your output might look slightly different depending on your settings. The listing below uses the compact one-line-per-commit format that git log --oneline prints. In general, you should see something like this:

099e6e4 update gitignore to ignore large data file
de69e51 preliminary exploratory data analysis
d1bfae6 download data CSV
8464da4 update README
48f7303 Initial commit

When I look at this output, I can see that “download data CSV” was when I downloaded this large file. This is just one example of why it’s important to write meaningful commit messages! If you’re not sure which commit was the last “good” one, you’ll need to try this process repeatedly with different commits, until you find the right one.

So, now that I’ve identified the message of the last “good” commit as “update README”, I need to find its commit hash. The hash is the commit's unique identifier, shown here to the left of the commit message. In this example it is 8464da4, on the “update README” line:

099e6e4 update gitignore to ignore large data file
de69e51 preliminary exploratory data analysis
d1bfae6 download data CSV
8464da4 update README
48f7303 Initial commit
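
If the commit messages don't tell you which commit added the file, git log can search for it directly. The --diff-filter=A option keeps only commits in which the path was added. Below is a sketch. commits_adding and last_good_commit are hypothetical helper names.

```shell
# Hypothetical helper: one-line summaries of commits that added PATH, oldest first.
commits_adding() {
  git log --diff-filter=A --reverse --format='%h %s' -- "$1"
}

# Hypothetical helper: short hash of the parent of the first commit that added
# PATH, i.e. the base to pass to "git rebase -i". It fails if PATH was added
# in the root commit (which has no parent) or never added at all.
last_good_commit() {
  first=$(git log --diff-filter=A --reverse --format=%H -- "$1" | head -n 1)
  [ -n "$first" ] || return 1
  git rev-parse --quiet --verify --short "${first}^"
}
```

In the example repository, commits_adding csv_building_damage_assessment.csv would be expected to report d1bfae6 download data CSV. Running git rebase -i "$(last_good_commit csv_building_damage_assessment.csv)" would then match git rebase -i 8464da4, which is the same as git rebase -i d1bfae6^.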

Initiate a Rebase Between the Last “Good” Commit and the Current Commit

Now that I’ve identified the commit hash, I run this command using that hash:

git rebase -i 8464da4

This will open up a file in your Git editor (in my case, Vim), that looks something like this:

pick d1bfae6 download data CSV
pick de69e51 preliminary exploratory data analysis
pick 099e6e4 update gitignore to ignore large data file

# Rebase 8464da4..099e6e4 onto 8464da4 (3 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup <commit> = like "squash", but discard this commit's log message
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# .       create a merge commit using the original merge commit's
# .       message (or the oneline, if no original merge commit was
# .       specified). Use -c <commit> to reword the commit message.
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

There’s a lot of information in there, because an interactive rebase can do many things besides removing large files from history. The only lines in this file that actually matter are the first three; everything else is instructions:

pick d1bfae6 download data CSV
pick de69e51 preliminary exploratory data analysis
pick 099e6e4 update gitignore to ignore large data file

Notice that the last “good” commit is not listed; only the commits that came after it are. I happen to know that the commit causing the problem was the “download data CSV” one. So I’m going to change that commit's line to edit, and just pick (i.e. keep without changes) the other two:

edit d1bfae6 download data CSV
pick de69e51 preliminary exploratory data analysis
pick 099e6e4 update gitignore to ignore large data file

Then I save and close the file (:wq in Vim).
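
If you would rather not edit the todo list by hand, you can script the same change. Git reads the todo-list editor from the GIT_SEQUENCE_EDITOR environment variable, so a sed command can turn pick into edit for you. Below is a sketch. It assumes a sed that accepts -i.bak, which GNU and BSD sed both do. rebase_edit_commit is a hypothetical name.

```shell
# Hypothetical helper: start "git rebase -i BASE" and mark COMMIT as "edit"
# automatically. COMMIT may be any hash prefix; the regex also matches a
# longer abbreviation in the todo list. sed -i.bak leaves a backup of the todo
# file in Git's rebase state directory.
rebase_edit_commit() {
  base="$1"
  commit="$2"
  GIT_SEQUENCE_EDITOR="sed -i.bak -e 's/^pick \($commit[0-9a-f]*\) /edit \1 /'" \
    git rebase -i "$base"
}
```

For the example, rebase_edit_commit 8464da4 d1bfae6 would be expected to stop at “download data CSV”, just as the hand-edited todo list does.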

Amending the Commit

Now that I’ve closed the file, I see this message in the terminal:

Stopped at d1bfae6...  download data CSV
You can amend the commit now, with

  git commit --amend

Once you are satisfied with your changes, run

  git rebase --continue

So now I’ll run essentially the same commands as in Scenario 1, where the large file was added in the most recent commit:

git rm --cached csv_building_damage_assessment.csv
git commit --amend -C HEAD

If you made additional changes to the repository other than just adding the file, that’s all you need to do.

But if adding that CSV was literally the only thing you did in that commit, you might get this message:

interactive rebase in progress; onto 8464da4
Last command done (1 command done):
  edit d1bfae6 download data CSV
Next commands to do (2 remaining commands):
  pick de69e51 preliminary exploratory data analysis
  pick 099e6e4 update gitignore to ignore large data file
You are currently splitting a commit while rebasing branch 'master' on '8464da4'.

Untracked files:
  csv_building_damage_assessment.csv

No changes
You asked to amend the most recent commit, but doing so would make
it empty. You can repeat your command with --allow-empty, or you can
remove the commit entirely with "git reset HEAD^".

If that’s the case (you didn’t do anything in this commit except add one or more files too large for GitHub), you have a couple of options. You could start over (run git rebase --abort, then git rebase -i again) and replace pick with drop instead of edit. Alternatively, you could remove the commit entirely with git reset HEAD^, as the message suggests. But because I like being able to see the original commit history, I recommend that you use the --allow-empty flag. In this case, that would mean:

git commit --amend --allow-empty -C HEAD

Now you should be done amending the commit (whether or not you had to re-run the command with --allow-empty). If you run git status, it will look something like this:

interactive rebase in progress; onto 8464da4
Last command done (1 command done):
   edit d1bfae6 download data CSV
Next commands to do (2 remaining commands):
   pick de69e51 preliminary exploratory data analysis
   pick 099e6e4 update gitignore to ignore large data file
  (use "git rebase --edit-todo" to view and edit)
You are currently editing a commit while rebasing branch 'master' on '8464da4'.
  (use "git commit --amend" to amend the current commit)
  (use "git rebase --continue" once you are satisfied with your changes)

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        csv_building_damage_assessment.csv

nothing added to commit but untracked files present (use "git add" to track)

Finishing the Rebase

Once the commit adding the large file is fixed, the last thing you need to do is finish the rebase with:

git rebase --continue

You should see an output like:

Successfully rebased and updated refs/heads/master.
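
Once you are comfortable with the steps, you can combine the amend at the stopped commit with the final git rebase --continue. Below is a sketch. drop_file_at_edit_stop is a hypothetical name. It always passes --allow-empty. As recommended above, that keeps the commit even when adding the file was its only change, and it has no effect otherwise.

```shell
# Hypothetical helper, run while the rebase is stopped at the "edit" commit:
# untrack FILE (keeping it on disk), amend the stopped commit, then resume.
drop_file_at_edit_stop() {
  file="$1"
  git rm --cached --quiet -- "$file" || return 1
  # --allow-empty: keep the commit (now empty) if FILE was its only change.
  git commit --amend --allow-empty --quiet -C HEAD || return 1
  git rebase --continue
}
```

For the example, the call would be drop_file_at_edit_stop csv_building_damage_assessment.csv.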

Now you should be able to git push without any error messages about large files ✨

Recap

This error message happens when you try to push a file larger than 100MB to GitHub. To fix this issue, you can’t just remove the file from future commits, you need to “rewrite history” and edit whichever commit introduced the large file.

If the large file was added in the most recent commit, you can just run:

git rm --cached <filename> to remove the large file, then
git commit --amend -C HEAD to edit the commit

If the large file was added in an earlier commit, I recommend running an interactive rebase. That means you need to:

Run git log to find the commit hash of the last commit before you added the large file
Then run git rebase -i <commit hash>. This will open up an editor where you want to replace pick with edit on the commit where the large file was added.
Once you save and close the editor, you’ll be in essentially the same position as if you had added the file in the most recent commit—all you need to do is git rm --cached <filename> and git commit --amend -C HEAD (same as the “most recent commit” steps)
Then to finish up, run git rebase --continue

FAQs and Follow-ups

Where is the example dataset from? It contains earthquake data from the Nepal Earthquake Open Data Portal. Check out this cool GitHub repo with a machine learning analysis one of my students did, using this data:

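When I run git rebase -i it opens in VS Code, Atom, or Sublime Text, and Git acts like I’m closing the file immediately. What can I do? Git waits for the editor command to exit before it reads the file back. The launcher commands for richer editors like VS Code, Atom, or Sublime Text typically return right away, so Git sees the todo list unchanged. The simplest fix is to configure a command-line text editor as your Git editor. Alternatively, many GUI editors have a flag that makes the command wait until the file is closed, such as code --wait for VS Code. I use Vim, which you can set as your editor by running this line in the terminal: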

git config --global core.editor vim

You could also use Emacs if you prefer that interface:

git config --global core.editor emacs

Then you should be able to proceed with git rebase -i and edit the file and indicate which commit(s) you want to amend.

Is an interactive rebase the only solution for removing large files like this? No, the solution described here is not the only solution for this issue. Check out this StackOverflow post for several other approaches:

What if I was trying to push the large file on purpose, since the other people working on the project need it? First, see whether you can make the file small enough. Maybe split it into a couple of files, remove parts of the data you don’t need, or compress it. If that doesn’t work, you’ll need to find some other way to distribute the file. If the file is fairly static, consider adding it to a cloud storage location such as AWS S3, where the free tier usually offers a decent amount of storage. If you need version control on the file (i.e. to track changes), check out Git LFS. It is an open-source tool, but storing LFS files on a hosted server like GitHub typically costs money once you exceed a limited free allowance.
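
The standard split and cat utilities can handle the splitting. Below is a minimal sketch. It assumes POSIX split, whose -b option accepts a k or m suffix. split_for_github, join_parts, and the 50 MB default chunk size are assumptions; the chunk size was chosen only to stay well under the limit.

```shell
# Hypothetical helper: split FILE into chunks of at most CHUNK (default 50m)
# named FILE.part_aa, FILE.part_ab, and so on.
split_for_github() {
  file="$1"
  chunk="${2:-50m}"
  split -b "$chunk" "$file" "$file.part_"
}

# Hypothetical helper: rebuild FILE from its chunks. The glob sorts
# alphabetically, which matches the order split created them in.
join_parts() {
  out="$1"
  cat "$out".part_* > "$out"
}
```

One possible workflow: commit the .part_ files, list the original file name in .gitignore, and have collaborators run join_parts after cloning.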

Can I use this same technique to “rewrite history” in other ways, e.g. removing a password from my Git history or adding the right author information? Yes, but you want to be very careful. With large files, GitHub prevents you from pushing your commits, so rewriting history in this way only affects code that is stored locally on your computer. For something like a password (or anything else that’s smaller than 100MB), GitHub doesn’t prevent you from pushing your commits, so it’s possible that other developers on your team have already pulled down your changes. You’ll need to use the --force flag to push, and your collaborators will need to follow these instructions. This approach should be the last resort, and it’s always better to avoid getting in this kind of situation in the first place.
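
If you do end up force-pushing rewritten history, git push --force-with-lease is a more cautious variant of --force. With no explicit value, it refuses to overwrite the remote branch unless the branch still points where your remote-tracking branch (e.g. origin/master) says it does. The push is rejected, for example, if a collaborator pushed in the meantime. Below is a sketch. The helper name and the origin default are assumptions.

```shell
# Hypothetical helper: force-push BRANCH (default: the current branch) to
# REMOTE (default: origin), but only if the remote branch hasn't moved since
# your last fetch or push.
force_push_rewritten() {
  remote="${1:-origin}"
  branch="${2:-$(git symbolic-ref --short HEAD)}"
  git push --force-with-lease "$remote" "$branch"
}
```

If the push is rejected, fetch and look at what changed on the remote before deciding whether to rewrite again.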

Thanks for reading! Happy pushing!
