Estimating the time spent on a project with git-pandas

cumei1658

Original link: https://www.pybloggers.com/2016/04/estimating-the-time-spent-on-a-project-with-git-pandas/

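This article describes a way to estimate project development time with the git-pandas library. By analyzing the commit history, it estimates the time spent writing code, which is useful for tax filing, time tracking, and reporting. It shows how to apply the feature to a specific repository and branch, and how adjusting the parameters changes the estimate.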

I recently stumbled across a conversation on the Tech404 Slack channel (a pretty good public Slack group for Atlanta-area software folks). It was mostly about taxes, but nestled in the middle was this project: git_time_extractor. In the past I’ve noticed a kind of weird concentration of git-related open source projects among Atlanta developers; I’m not sure if that says more about Atlanta or about git’s abstruseness.

Anyway, git_time_extractor is one of a few projects that will rip through the commit history of a given repository and piece together an estimate of how much time was spent writing the code behind those commits. This can be useful for taxes, general time tracking, reporting, or just plain old vanity. The three projects I know of that do this are:

I thought it would make a nice feature for git-pandas, which I’ve written about here a few times before. I just released version 1.0.3 of that library, so you can get the new functionality by installing it with:

pip install -U git-pandas

For a deeper dive into how the algorithm itself works, kimmobrunfeldt really did a great job in his README of git-hours, so check it out here: https://github.com/kimmobrunfeldt/git-hours#how-it-works
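As a rough illustration of the idea described in that README, here is a minimal sketch of a session-based estimate. It is not the code from git-hours or git-pandas: the function name `estimate_hours` is hypothetical, and the parameter names are simply borrowed from the `hours_estimate` call used later in this post. Commits that are close together count as one coding session and contribute the time between them; a commit that starts a new session gets a fixed allowance instead.

```python
from datetime import datetime, timedelta


def estimate_hours(commit_times, grouping_window=0.5, single_commit_hours=0.5):
    """Estimate hours worked from a list of commit datetimes (hypothetical helper)."""
    times = sorted(commit_times)
    if not times:
        return 0.0

    window = timedelta(hours=grouping_window)
    allowance = timedelta(hours=single_commit_hours)

    # The first commit has no predecessor, so it gets the fixed allowance.
    total = allowance
    for previous, current in zip(times, times[1:]):
        gap = current - previous
        if gap <= window:
            # Same session: count the time between the two commits.
            total += gap
        else:
            # New session: the work before this commit is unknown, use the allowance.
            total += allowance
    return total.total_seconds() / 3600.0


commits = [
    datetime(2016, 4, 1, 9, 0),
    datetime(2016, 4, 1, 9, 10),
    datetime(2016, 4, 1, 9, 25),
    datetime(2016, 4, 1, 14, 0),  # long gap, so this starts a new session
]
# 0.5 h allowance + 10 min + 15 min + 0.5 h allowance, about 1.42 hours
print(round(estimate_hours(commits), 2))
```

Raising the allowance or the window increases the estimate, which is why the result depends so much on the parameters you pick.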

For an example of that in git-pandas, we will make a Repository object for the git-pandas repo itself and calculate the hours spent on its Python files in the master branch, excluding tests and assuming 30 minutes or so for a lone commit:

import os
from gitpandas.repository import Repository

# get the path of this repo
path = os.path.abspath('../../git-pandas')

# build an example repository object and try some things out
ignore_dirs = ['tests']
r = Repository(path, verbose=True)

# get the hours estimate for this repository (using 30 mins per commit)
he = r.hours_estimate(
    branch='master',
    grouping_window=0.5,
    single_commit_hours=0.5,
    limit=None,
    extensions=['py'],
    ignore_dir=ignore_dirs
)
print(he)

Which yields two rows (because apparently I can’t spell my name right on one computer):

       committer      hours
0  Will McGinnis  19.454444
1  Will Mcginnis   9.275556
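
If the split rows bother you, one possible cleanup is to normalize the names and sum the hours with plain pandas. This is a sketch that rebuilds the table above by hand rather than calling git-pandas; the variable names are only illustrative.

```python
import pandas as pd

# The output shown above, rebuilt by hand for illustration.
he = pd.DataFrame({
    "committer": ["Will McGinnis", "Will Mcginnis"],
    "hours": [19.454444, 9.275556],
})

# Lower-case the names so both spellings fall into one group, then sum the hours.
merged = (
    he.assign(committer=he["committer"].str.lower())
      .groupby("committer", as_index=False)["hours"]
      .sum()
)
print(merged)
```

Lower-casing is a blunt tool: it merges spellings that differ only in capitalization, and nothing else.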

If we were to change our single_commit_hours from 30 minutes to 45 minutes we get:

       committer      hours
0  Will McGinnis  28.768056
1  Will Mcginnis  13.275556

So, not quite an exact science, but pretty cool. As always, you can use this function on a Repository object (which corresponds to a single git repo) or a ProjectDirectory (which corresponds to a collection of repos); the interface is exactly the same for both. Another potentially neat thing to do would be to run it on your GitHub.com profile and all the public repositories in it, which is easy to set up:

from gitpandas import GitHubProfile

g = GitHubProfile(username='wdm0006', ignore_forks=True, verbose=True)
g.hours_estimate(branch='master', by='repository')

For a project directory or GitHub profile, you can choose to aggregate the output dataframe by committer or by repository, or use the default value of None to return the table unmodified.
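
To make the three options concrete, here is a small pandas sketch of what that kind of aggregation amounts to. It does not call git-pandas: the helper `aggregate_hours`, the column names, and the sample rows are all assumptions made up for illustration, and the real output columns may differ.

```python
import pandas as pd

# Hypothetical per-repository, per-committer estimates (made-up numbers).
raw = pd.DataFrame({
    "repository": ["repo-a", "repo-a", "repo-b"],
    "committer": ["alice", "bob", "alice"],
    "hours": [2.0, 1.5, 3.0],
})


def aggregate_hours(df, by=None):
    """Sum hours by 'committer' or 'repository'; return df unchanged if by is None."""
    if by is None:
        return df
    if by not in ("committer", "repository"):
        raise ValueError("by must be 'committer', 'repository', or None")
    return df.groupby(by, as_index=False)["hours"].sum()


by_repo = aggregate_hours(raw, by="repository")
by_committer = aggregate_hours(raw, by="committer")
print(by_repo)
print(by_committer)
```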

Check out the source on GitHub:
