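Problem description
For production reasons, our code hosting is moving entirely from Hg (Mercurial) to Git. Doing this by hand would be a lot of work, so I wanted an automated tool to handle it.
Solution
Finding a tool
A quick web search turned up fast-export, which does what we need and is also mentioned in the official Git Book documentation, so I decided to use it for the migration.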
Environment setup
According to the fast-export README, we need:
Python 2.7 or 3.5+, and the Mercurial >= 4.6 package (>= 5.2, if Python 3.5+)
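I used Python 3.9 and Mercurial 5.4.2.
- Download and install Python and set up the environment variables (if you use miniconda or anaconda, point them at the corresponding Python path);
- Install Mercurial;
- Install the Python package: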
pip install Mercurial
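The version rule quoted from the README can be checked before starting. Below is a minimal sketch, not part of fast-export: the function name `meets_requirements` is hypothetical, and the versions are passed in as tuples so that nothing depends on Mercurial being importable.

```python
def meets_requirements(python_version, mercurial_version):
    """Check the rule quoted from the README:
    Python 2.7 or 3.5+, Mercurial >= 4.6 (>= 5.2 when on Python 3.5+)."""
    py = tuple(python_version[:2])
    hg = tuple(mercurial_version[:2])
    if py == (2, 7):
        return hg >= (4, 6)
    if py >= (3, 5):
        return hg >= (5, 2)
    # Any other interpreter version is outside the documented range.
    return False


if __name__ == "__main__":
    import sys

    # The Mercurial version used in this post, written out by hand.
    print(meets_requirements(sys.version_info, (5, 4, 2)))
```

With the versions used here (Python 3.9, Mercurial 5.4.2) the check passes; Python 3.9 with Mercurial 4.6 would not.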
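Preparing the scripts (note: all commands are run in Git Bash)
The official Mercurial user guide provides a minimal hg repository. It makes a good demo for trying out the various cases first, such as Chinese user names, Chinese commit messages, or files that are themselves in Chinese.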
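1. Generate the authors mapping file
The fast-export README mentions hg-export-tool, which according to its documentation can be used to generate the authors mapping file.
- Download its source code. Following the example, we can generate the authors.map for the hg repository;
- First edit repo_mapping.json, modeled on the one provided in the example. In that file, .hg marks the path of the hg repository and .git the path of the Git one; the Git repository does not need to exist yet;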
- Then run the following command in Git Bash:
python list-authors.py repo_mapping.json
If the path cannot be found, use an absolute path; note that the path separator is / rather than \;
2. If an author name contains Chinese characters and the script raises an error, change the decode encoding in list-authors.py to gbk;
3. (If an error is raised) likewise change the decode encoding in the get_git_sha1 function of hg2git.py to gbk;
4. Create the Git repository:
git init
git config core.ignoreCase false
5. Run the command:
E:/TestDir/fast-export-201029/hg-fast-export.sh -r E:/TestDir/my-hello --force -A E:/TestDir/authors.map -fe gbk
Here the --force option works around the multiple heads problem in hg, and -fe gbk fixes the garbled Chinese text;
6. Run:
git checkout HEAD
7. Migration complete.
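Since the export command is sensitive to path separators, it can help to assemble it programmatically. The following is a rough sketch only: `to_bash_path` and `build_export_command` are hypothetical helper names, and the flags are simply the ones used in the command above. It builds the argument list and prints it without running anything.

```python
from pathlib import PureWindowsPath


def to_bash_path(windows_path):
    """Convert backslashes to forward slashes, as Git Bash expects."""
    return PureWindowsPath(windows_path).as_posix()


def build_export_command(script, hg_repo, authors_map, encoding="gbk"):
    """Assemble the hg-fast-export.sh invocation as an argument list."""
    return [
        to_bash_path(script),
        "-r", to_bash_path(hg_repo),
        "--force",
        "-A", to_bash_path(authors_map),
        "-fe", encoding,
    ]


command = build_export_command(
    r"E:\TestDir\fast-export-201029\hg-fast-export.sh",
    r"E:\TestDir\my-hello",
    r"E:\TestDir\authors.map",
)
print(" ".join(command))
```

The printed line matches the command used in this post. To actually run it, the list would be passed to something like `subprocess.check_call` with `cwd` set to the freshly initialized Git repository.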
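Summary
For better or worse, every Chinese-related problem here traces back to hg's default user setting, which contains a placeholder like <未指定>:
Its English equivalent is <unspecified>. Someone raised this in an issue, but the author pushed back: it is not a bug, and you have to maintain the map file yourself 😄:
Missing < in ident string…
In fact, the mapping did not fail because the author was poorly specified. It failed because the Chinese text was not recognized. After going full circle, the only thing that needed fixing was the encoding.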
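The encoding problem can be reproduced in a few lines. This is a minimal sketch, assuming the author bytes arrive GBK-encoded, as they did on my Chinese-locale Windows machine. `decode_with_fallback` is a hypothetical helper and is not what list-authors.py or hg2git.py actually do (there I simply changed the decode encoding to gbk).

```python
def decode_with_fallback(raw, encodings=("utf-8", "gbk")):
    """Try each encoding in order and return the first successful decoding."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes.
    return raw.decode(encodings[0], errors="replace")


# Author bytes as a GBK console might emit them (assumption).
raw_author = "<未指定>".encode("gbk")

try:
    raw_author.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # This is the kind of failure that breaks the author mapping.
    utf8_ok = False

author = decode_with_fallback(raw_author)
```

Trying UTF-8 first and GBK second is only a heuristic, because some GBK byte sequences happen to be valid UTF-8. For a repository known to be GBK throughout, decoding with gbk directly is the safer choice.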
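If the export fails, look for the logs under the .git folder.
Chinese tags may still come out garbled, but tags can be deleted and recreated, so it is not a big problem.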
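Addendum - 20210707
This version of the code uses the system temporary directory as a staging area, so use it with caution if your C: drive is short on space!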
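Before running the script, the free space in the temporary directory can be compared against the size of the hg repository. This is a rough sketch only: `directory_size` and `temp_dir_has_room` are hypothetical helpers, and the `safety_factor` of 3 is an arbitrary guess to leave room for both the hg copy and the new Git repository.

```python
import os
import shutil
from tempfile import gettempdir


def directory_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def temp_dir_has_room(hg_repo, safety_factor=3):
    """Return True if the temp directory's drive has at least
    safety_factor times the repository size available."""
    needed = directory_size(hg_repo) * safety_factor
    free = shutil.disk_usage(gettempdir()).free
    return free >= needed
```

If the check fails, one option is to point the `TMP`/`TEMP` environment variables at a drive with more space before starting Python, since `gettempdir()` consults them.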
#exporter.py
import subprocess
import json
import sys
import os
import errno
from binascii import hexlify
from tempfile import gettempdir
import shutil
from collections import defaultdict
import itertools
import stat
here = os.path.dirname(os.path.abspath(__file__))
FAST_EXPORT_DIR = os.path.join(here, 'fast-export')
DEFAULT_BRANCH = 'master'
def mkdir_p(path):
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            pass
        else:
            raise
def remove_readonly(func, path, _):
    """Clear the readonly bit and reattempt the removal. Necessary to delete read-only
    files in Windows, and the .git directory appears to contain such files."""
    os.chmod(path, stat.S_IWRITE)
    func(path)
def init_git_repo(git_repo):
    """Make a new git repo in a temporary directory, and return its path"""
    random_hex = hexlify(os.urandom(16)).decode()
    temp_repo = os.path.join(
        gettempdir(), os.path.basename(git_repo) + '-' + random_hex
    )
    mkdir_p(temp_repo)
    subprocess.check_call(['git', 'init', temp_repo])
    subprocess.check_call(['git', 'config', 'core.ignoreCase', 'false'], cwd=temp_repo)
    return temp_repo
def copy_hg_repo(hg_repo):
    random_hex = hexlify(os.urandom(16)).decode()
    hg_repo_copy = os.path.join(
        gettempdir(), os.path.basename(hg_repo) + '-' + random_hex
    )
    shutil.copytree(hg_repo, hg_repo_copy)
    return hg_repo_copy
def get_heads(hg_repo):
    """Return a list of heads, including of closed branches, each in the
    format:
    {
        'hash': '<hash>',
        'branch': '<branchname>',
        'bookmark': '<bookmark name or None>',
        'timestamp': <utc_unix_timestamp>,
        'topological': <whether the head is a topological head>,
    }
    """
    cmd = ['hg', 'heads', '--closed', '--topo', '--template', 'json']
    output = subprocess.check_output(cmd, cwd=hg_repo)
    topo_heads = json.loads(output.decode('utf8'))
    cmd = ['hg', 'heads', '--closed', '--template', 'json']
    output = subprocess.check_output(cmd, cwd=hg_repo)
    all_heads = json.loads(output.decode('utf8'))
    results = []
    for head in all_heads:
        results.append(
            {
                'hash': head['node'],
                'branch': head['branch'],
                'timestamp': head['date'][0] + head['date'][1],  # add UTC offset
                # If multiple bookmarks, ignore all but one:
                'bookmark': head['bookmarks'][0] if head['bookmarks'] else None,
                'topological': head in topo_heads
            }
        )
    return results
def fix_branches(hg_repo):
    """Amend anonymous/bookmarked additional heads on a branch to be on a new branch,
    either <branchname>-<n>, or the first bookmark name. Return a dict of commits
    amended mapping the original commit hash to the amended one"""
    all_heads = get_heads(hg_repo)
    heads_by_branch = defaultdict(list)
    # Group by branch:
    for head in all_heads:
        heads_by_branch[head['branch']].append(head)
    # Sort by timestamp, newest first:
    for heads in heads_by_branch.values():
        heads.sort(reverse=True, key=lambda head: head['timestamp'])
    amended_commits = {}
    for branch, heads in heads_by_branch.items():
        if len(heads) == 1 or all(not head['topological'] for head in heads):
            # No topological heads in this branch, no renaming:
            heads_to_rename = []
        elif all(head['topological'] for head in heads):
            # Only topological heads in this branch. Rename all but the most recently
            # committed to:
            heads_to_rename = heads[1:]
        else:
            # Topological and non-topological heads in this branch. Rename all
            # topological heads:
            heads_to_rename = [head for head in heads if head['topological']]
        counter = itertools.count(1)
        for head in heads_to_rename:
            if head['bookmark'] is