Managing Huge Repositories with Git

culh2177

Original article: https://www.sitepoint.com/managing-huge-repositories-with-git/

Linus Torvalds created Git in the mid-2000s to do what other open source version control systems of the time could not: be distributed, reliable, and fast.

As he mentions in this Google tech talk on Git, Git was created out of necessity for the Linux project. At the time the talk was given, Git was very young and people were getting used to it. It seemed to solve all the problems faced by version control software, and this contributed to its meteoric rise.

Git's Shortcomings

Fast forward a few years, and people started noticing the first real flaw in Git: it was difficult to handle very large repositories. How large are we talking here? It's Facebook large. Facebook's team projected that in a few years, a simple git status would take up to half a minute to show the result, as Facebook adds a large number of commits from thousands of developers every day. Facebook shifted its whole codebase to Mercurial, and its team actively started contributing to Mercurial to meet Facebook's needs.

Where did Git fall short? Prasoon Shukla, a Mercurial contributor, attributes the difference in scaling to the way the two systems store commits. By his account, Mercurial maintains a couple of objects (or files) for each file in your repository, whereas Git creates new objects for every commit. As commits accumulate, the number of objects in Mercurial stays constant while the number in Git grows linearly. As a result, he argues, a simple git status command has more and more objects to sift through, which takes considerable time despite Git's efficiency.

Another area where Git can fall short is managing large binary files. Git stores each version of a file as a complete snapshot and relies on compression between similar versions to keep the repository small. That works well for text, but changes to binary files (particularly compressed formats) rarely reduce to small differences, so nearly every commit that modifies a binary adds roughly the full size of the file to the repository.
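
The growth is easy to observe in a scratch repository. The sketch below is an illustration rather than a benchmark: it commits three versions of a 64 KiB file filled with random bytes (a stand-in for a binary asset that does not compress), repacks the repository, and reports the pack size. The file name asset.bin, the repository name, and the sizes are assumptions made up for this demo.

```shell
#!/bin/sh
set -eu

# Throwaway working area for the demo.
work=$(mktemp -d)
cd "$work"
git init -q binary-demo
cd binary-demo
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit three unrelated versions of a 64 KiB "binary" file.
for i in 1 2 3; do
    dd if=/dev/urandom of=asset.bin bs=1024 count=64 2>/dev/null
    git add asset.bin
    git commit -q -m "asset version $i"
done

# Repack so that Git gets its best chance to compress the history,
# then read the pack size (reported in KiB) from count-objects.
git gc --quiet
pack_kib=$(git count-objects -v | awk '/^size-pack:/ {print $2}')
echo "pack size after 3 versions: ${pack_kib} KiB"
```

With incompressible content, the reported size should land near three times the file size, because no version can be stored as a small difference from another.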

Over the years, Git's developers have tried to solve these problems. Third-party services have also come up with solutions that let Git manage larger repositories, such as GitHub's Large File Storage extension.

This post looks at techniques for handling large repositories in Git, whether the problem is a long history, large binary files, or both.

Projects with a Large Number of Commits

I'll first look at a few ways to manage repositories with large histories more efficiently.

Shallow clone of repositories

As mentioned earlier, the primary reason why projects with large histories slow down is the huge number of commits. In a distributed system like Git, when you clone a repository, its full project history gets downloaded. However, Git provides a way to specify the number of commits you want to have in your clone of a project. This is known as a shallow clone. When you get the number of commits down, your Git operations run faster.

To perform a shallow clone, add the --depth option, with the number of commits you want, to the clone command:

git clone --depth [number_of_commits] [url_of_remote]

Earlier versions of Git had limited support for shallow clones. If your truncated history did not reach back far enough, you weren't allowed to push or pull. Support improved significantly with the release of Git 1.9.0.
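
To make the effect of --depth concrete, here is a minimal sketch that builds a small local repository to act as the remote, clones it with a depth of 1, and counts the commits in the clone. The repository names and the five-commit history are assumptions invented for the demo. It also shows git fetch --unshallow, which retrieves the rest of the history if you later find you need it.

```shell
#!/bin/sh
set -eu

# Throwaway working area for the demo.
work=$(mktemp -d)
cd "$work"

# Build a small "remote" repository with five commits.
git init -q origin-repo
cd origin-repo
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 1 2 3 4 5; do
    echo "change $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done
cd ..

# Shallow clone: keep only the most recent commit.
# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q --depth 1 "file://$work/origin-repo" shallow-clone
shallow_count=$(git -C shallow-clone rev-list --count HEAD)
echo "commits in shallow clone: $shallow_count"

# Fetch the remaining history later, if it turns out to be needed.
git -C shallow-clone fetch -q --unshallow
full_count=$(git -C shallow-clone rev-list --count HEAD)
echo "commits after --unshallow: $full_count"
```

On a real project you would replace the file:// URL with the URL of your remote and choose a depth that covers the history you actually consult.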

Clone a single branch

When you clone a repository, all the branches in the remote get downloaded. (If you run git branch in a newly cloned repository, it shows only the master branch. You should run git branch -a to list all the branches that were a part of the remote.) It's probable that many of the commits present in other branches are irrelevant to one developer's work. Therefore, you can clone just the master or the branch relevant to your development. Doing so significantly reduces the number of commits that make up the history of the cloned version, especially if branches in the repository have divergent histories.

To clone only a single branch of a remote, you can run the following command:

git clone [url_of_remote] --branch [branch_name] --single-branch

This command instructs Git to clone only the branch_name branch from the remote.
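
The difference is easiest to see side by side. The following sketch, which assumes a made-up remote with three branches (the default branch plus hypothetical feature and experiment branches), performs a full clone and a single-branch clone, then compares how many remote branches and commits each one contains.

```shell
#!/bin/sh
set -eu

# Throwaway working area for the demo.
work=$(mktemp -d)
cd "$work"

# Build a "remote" with a base commit and two branches that diverge from it.
git init -q origin-repo
cd origin-repo
git config user.email "demo@example.com"
git config user.name "Demo"
echo base > file.txt
git add file.txt
git commit -q -m "base commit"
base_branch=$(git symbolic-ref --short HEAD)   # master or main, depending on configuration

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature commit"

git checkout -q -b experiment "$base_branch"
echo experiment > experiment.txt
git add experiment.txt
git commit -q -m "experiment commit"
cd ..

# Full clone versus a clone of the feature branch only.
git clone -q "file://$work/origin-repo" full-clone
git clone -q --branch feature --single-branch "file://$work/origin-repo" single-clone

# Count remote-tracking branches (ignoring the symbolic origin/HEAD) and commits.
count_branches() {
    git -C "$1" for-each-ref --format='%(refname)' refs/remotes/origin | grep -c -v '/HEAD$'
}
full_branches=$(count_branches full-clone)
single_branches=$(count_branches single-clone)
full_commits=$(git -C full-clone rev-list --count --all)
single_commits=$(git -C single-clone rev-list --count --all)

echo "full clone:   $full_branches branches, $full_commits commits"
echo "single clone: $single_branches branches, $single_commits commits"
```

The single-branch clone skips the commit that exists only on the experiment branch. On a repository with many long-lived branches, the savings grow accordingly.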

Projects With Large Files

The next problem is the presence of large binary files (as opposed to traditional text files). Git cannot reduce a change in a binary file to a small difference the way it can for text, so a changed binary is effectively stored again in full. If the binary files are large (like 3D models or graphic designs), the size of the repository increases considerably with every commit that changes them.

Using submodules

One way out of the problem of large files is to use submodules, which enable you to manage one Git repository within another. You can create a submodule, which contains all your binary files, keeping the rest of the code separately in the parent repository, and update the submodule only when necessary. This logically separates the core part of your project from the large files, and helps in managing them separately.
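
As a rough sketch of that layout, the script below creates a hypothetical assets-repo for binary files and a parent-repo for code, attaches the former to the latter as a submodule, and then shows that a fresh clone of the parent does not download the assets until the submodule is explicitly initialised. All repository and file names are assumptions for the demo. The protocol.file.allow=always setting is needed only because the demo uses local paths, which recent Git versions restrict for submodules by default.

```shell
#!/bin/sh
set -eu

# Throwaway working area for the demo.
work=$(mktemp -d)
cd "$work"

# Hypothetical repository that holds only the large binary assets.
git init -q assets-repo
git -C assets-repo config user.email "demo@example.com"
git -C assets-repo config user.name "Demo"
printf 'pretend this is a large 3D model' > assets-repo/model.bin
git -C assets-repo add model.bin
git -C assets-repo commit -q -m "add model"

# Parent repository that holds the code.
git init -q parent-repo
git -C parent-repo config user.email "demo@example.com"
git -C parent-repo config user.name "Demo"
echo 'print("hello")' > parent-repo/main.py
git -C parent-repo add main.py
git -C parent-repo commit -q -m "add code"

# Attach the assets repository as a submodule at ./assets and record it.
git -C parent-repo -c protocol.file.allow=always \
    submodule --quiet add "$work/assets-repo" assets
git -C parent-repo commit -q -m "add assets submodule"

# A plain clone of the parent leaves the submodule directory empty.
git clone -q "$work/parent-repo" checkout
if [ -e checkout/assets/model.bin ]; then before=present; else before=absent; fi
echo "assets right after clone: $before"

# Download the assets only when they are actually needed.
git -C checkout -c protocol.file.allow=always submodule --quiet update --init
if [ -e checkout/assets/model.bin ]; then after=present; else after=absent; fi
echo "assets after submodule update: $after"
```

Developers who never touch the binary files can skip the final step entirely and work with a much smaller checkout.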

Using a third-party extension

There are many extensions for Git, built by other developers, to handle large files. One option is to use git-annex, which allows you to manage files in Git without checking the contents of the file into Git. Another extension (the development of which has now been stopped) is git-bigfiles.

GitHub recently launched Git Large File Storage (LFS), an open source extension for Git, to manage large binary files. LFS stores these large files on a remote server such as GitHub, and only small text pointers are stored in your Git repository. SitePoint recently published a tutorial on how to get started with Git LFS.
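
For orientation, here is a minimal sketch of how a repository is typically set up to use LFS. It assumes the git-lfs command is installed and skips the demo otherwise. The repository name and the choice of Photoshop files as the tracked pattern are made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway working area for the demo.
work=$(mktemp -d)
cd "$work"
git init -q design-repo
cd design-repo

if command -v git-lfs >/dev/null 2>&1; then
    # Enable the LFS filters and hooks for this repository only.
    git lfs install --local >/dev/null
    # Ask LFS to manage every Photoshop file; this writes a rule to .gitattributes.
    git lfs track "*.psd" >/dev/null
    lfs_status=tracked
    cat .gitattributes
else
    lfs_status=skipped
    echo "git-lfs is not installed; skipping the demo"
fi
```

Once a pattern is tracked, you commit .gitattributes along with your files as usual. Matching files added afterwards are replaced in the repository by pointers, while their contents are uploaded to the LFS server on push.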

In a short period of time, Git LFS has gained popularity, signaling that it provides a good way of handling such large binaries in Git.

Final Thoughts

It has been a long time since Facebook announced its move to Mercurial (although it still continues to use Git for side projects like ReactJS). It's good to see Git developers and third-party developers have both reacted positively to it and come up with innovative solutions to the problems at hand. If you're thinking about learning version control, I'd recommend you go for Git—as the future is definitely bright!

If you'd like to learn more about Git and its amazing powers, check out Shaumik's new book Jump Start Git, published right here at SitePoint!

Understand Git’s core philosophy.
Get started with Git: install it, learn the basic commands, and set up your first project.
Work with Git as part of a collaborative team.
Use Git’s debugging tools for maximum debug efficiency.
Take control with Git’s advanced features: reflog, rebase, stash, and more.
Use Git with cloud-based Git repository host services like GitHub and Bitbucket.
See how Git is used effectively on large open-source projects.
