Bilingual File Types
A bilingual file type is a file that contains text in two languages, laid out according to a regular, predictable pattern.
Example 1
en-US = "Hello World";
zh-CN = "你好世界";
en-US = "What's the weather like today?";
zh-CN = "今天天气怎么样?";
Example 2
<tu>
<source>Hello World</source>
<target>你好世界</target>
</tu>
<tu>
<source>What's the weather like today?</source>
<target>今天天气怎么样?</target>
</tu>
Example 3
'国内' => 'Domestic',
'海外' => 'Overseas',
'当前操作环境' => 'Current Environment',
'开发环境' => 'Development ',
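What they have in common:
- They follow a regular pattern
- Some units already have a complete translation and therefore need to be handled as review; other units have no translation and need to be handled as translation
The approach officially recommended for SDL Trados is to write a custom plug-in with the Bilingual File Type API to support such a file type.
See SDL File Type Support 3.0: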
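The second point can be sketched in a few lines. This is only an illustration: the `UnitAction` enum and the `UnitClassifier` class are hypothetical names, and the rule used here (a unit counts as translated when its target is non-empty and differs from its source) is an assumption that a real file format may need to adjust.

```csharp
using System;

// Hypothetical names, used only for this illustration.
public enum UnitAction { Translate, Review }

public static class UnitClassifier
{
    // Assumption: a target that is empty, or identical to the source,
    // means the unit has not been translated yet.
    public static UnitAction Classify(string source, string target)
    {
        bool hasTranslation = !string.IsNullOrWhiteSpace(target) && target != source;
        return hasTranslation ? UnitAction.Review : UnitAction.Translate;
    }
}
```

For instance, `UnitClassifier.Classify("Hello World", "你好世界")` yields `Review`, while an empty target yields `Translate`.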
http://producthelp.sdl.com/SDK/FileTypeSupport/3.0/html/671d5ad6-8de2-40a5-adc9-4406c146cf73.htm
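Create a Sniffer class that implements the INativeFileSniffer interface.
Create a Parser class that derives from AbstractBilingualFileTypeComponent and implements the IBilingualParser, INativeContentCycleAware and ISettingsAware interfaces.
Create a Writer class that derives from AbstractBilingualFileTypeComponent and implements the IBilingualWriter and INativeOutputSettingsAware interfaces.
You also have to handle details such as tags, comments and ConfirmationLevel.
That looks rather complicated. If you had to go through this development every time you met a new bilingual file type, it would cost a considerable amount of time.
Is there a simpler way? Let's look for one:
Among the file type parsers in SDL Trados 2021 there is an Excel parser called Bilingual Excel.
This parser defines a complete set of rules: processing rules, exclusion rules, comment rules, rules for handling existing content, and embedded content rules.
That is essentially what we need, so let's try to use the Bilingual Excel processor to parse an arbitrary bilingual file. To do that we need to:
- Convert the bilingual file to Bilingual Excel, then translate it
- Write the translated content from the Bilingual Excel back into the bilingual file
Take Example 3 above. First, convert the bilingual file to Bilingual Excel: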
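// Requires System.IO, System.Text, System.Text.RegularExpressions,
// DocumentFormat.OpenXml.Packaging and the SimpleOOXML extension library.

// First, define a regular expression that matches the structure of the document.
// Group 2 is the quoted key (source), group 5 is the quoted value (existing translation).
public Regex RXRule = new Regex(@"^(\s+)?'(.*?)'(\s+)?=>(\s+)?'(.*?)',.*?$", RegexOptions.Compiled);

// Conversion method. SimpleOOXML, a simple OpenXML extension, is used here to handle Excel.
public void ReadFile(string infile, string outfile)
{
    using (MemoryStream stream = SpreadsheetReader.Create())
    {
        using (SpreadsheetDocument doc = SpreadsheetDocument.Open(stream, true))
        {
            var sheet = SpreadsheetReader.GetWorksheetPartByName(doc, "Sheet1");
            var writer = new WorksheetWriter(doc, sheet);
            int counter = 0;
            using (StreamReader sr = new StreamReader(infile, Encoding.UTF8, true))
            {
                string line;
                // Read line by line until the end of the file
                while ((line = sr.ReadLine()) != null)
                {
                    Match m = RXRule.Match(line);
                    if (!m.Success)
                        continue; // not a key/value line

                    counter++;
                    string key = m.Groups[2].Value;
                    string value = m.Groups[5].Value;
                    writer.PasteSharedText("A" + counter, counter.ToString()); // sequence number
                    writer.PasteSharedText("B" + counter, key);                // source text
                    // Fill column C only when the value differs from the key,
                    // i.e. when a translation already exists
                    if (key != value)
                        writer.PasteSharedText("C" + counter, value);
                }
            }
            writer.Save();
            if (File.Exists(outfile))
                File.Delete(outfile);
            SpreadsheetWriter.StreamToFile(outfile, stream);
        }
    }
}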
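This short piece of code converts the file into a three-column table: a sequence number in column A, the source text in column B, and any existing translation in column C.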
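To make the resulting rows easier to picture, here is a minimal sketch that applies the same regular expression to a few lines in memory, without touching Excel. The `RowExtractor` class and its `Extract` method are hypothetical names introduced for illustration; only the pattern itself comes from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RowExtractor
{
    // Same pattern as the conversion code: group 2 = key, group 5 = value.
    static readonly Regex RXRule =
        new Regex(@"^(\s+)?'(.*?)'(\s+)?=>(\s+)?'(.*?)',.*?$");

    // Returns one string[] { number, source, existingTranslation } per matching line.
    // Lines that do not match the pattern are skipped.
    public static List<string[]> Extract(IEnumerable<string> lines)
    {
        var rows = new List<string[]>();
        foreach (string line in lines)
        {
            Match m = RXRule.Match(line);
            if (!m.Success)
                continue;
            string key = m.Groups[2].Value;
            string value = m.Groups[5].Value;
            // The third column stays empty when the value merely repeats the key.
            rows.Add(new[] { (rows.Count + 1).ToString(), key, key != value ? value : "" });
        }
        return rows;
    }
}
```

Calling `RowExtractor.Extract` on the four lines of Example 3 gives four rows, the first being number `1`, source `国内`, existing translation `Domestic`. Note that the trailing space inside `'Development '` is preserved, because the pattern captures everything between the quotes.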
After that we also need a way to write the translated content back into the bilingual file:
// First, define a regular expression that matches the structure of the document.
// The translation is written back by regular-expression replacement:
// r1 rebuilds everything up to the opening quote of the value, r2 everything after the closing quote.
public Regex RXRule = new Regex(@"^(\s+)?'(.*?)'(\s+)?=>(\s+)?'(.*?)',(.*?)$", RegexOptions.Compiled);
string r1 = @"$1'$2'$3=>$4'";
string r2 = @"',$6";

// Write-back method. infile is the translated Excel file, read through the OpenXML SDK classes.
// GetValue is a helper (not shown) that returns the text of a cell, using the shared string table.
public void ReadFile(string infile, string outfile)
{
    // First read the Excel file into keyValuePairs (source -> translation)
    Dictionary<string, string> keyValuePairs = new Dictionary<string, string>();
    using (SpreadsheetDocument doc = SpreadsheetDocument.Open(infile, false))
    {
        WorkbookPart wbPart = doc.WorkbookPart;
        Sheet sheet = wbPart.Workbook.Descendants<Sheet>().FirstOrDefault(c => c.Name == "Sheet1");
        if (sheet != null)
        {
            WorksheetPart wsPart = (WorksheetPart)wbPart.GetPartById(sheet.Id);
            SharedStringTable stringTable = wbPart.SharedStringTablePart.SharedStringTable;
            foreach (Row row in wsPart.Worksheet.Descendants<Row>())
            {
                List<Cell> cells = row.Elements<Cell>().ToList();
                // A usable row has a number, a source and a translation cell; rows marked "N/A" are skipped
                if (cells.Count < 3 || GetValue(cells[0], stringTable).Equals("N/A"))
                    continue;
                string s = GetValue(cells[1], stringTable);
                string t = GetValue(cells[2], stringTable);
                // Keep the first translation found for each source string
                if (!keyValuePairs.ContainsKey(s))
                    keyValuePairs.Add(s, t);
            }
        }
    }

    // Then read the source file line by line, replacing as we go, and write out the translated file.
    // The source file is expected next to the Excel file, with the same name and a .php extension.
    using (StreamReader sr = new StreamReader(Path.ChangeExtension(infile, ".php"), Encoding.UTF8, true))
    using (StreamWriter sw = new StreamWriter(outfile, false, new UTF8Encoding(false)))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            Match m = RXRule.Match(line);
            string trans;
            if (m.Success && keyValuePairs.TryGetValue(m.Groups[2].Value, out trans))
            {
                // Double every '$' so the translation is inserted literally
                // instead of being read as a group reference
                line = RXRule.Replace(line, r1 + trans.Replace("$", "$$") + r2);
            }
            // Lines without a match, or without a translation, are copied unchanged
            sw.WriteLine(line);
        }
    }
}
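One detail of replacement-based write-back deserves a closer look: in a .NET replacement pattern a `$` introduces a substitution, so a translation such as `Costs $5` has to be escaped before it is spliced into the pattern. The sketch below isolates that step for a single line. `LineRewriter` and `ReplaceValue` are hypothetical names, and the escaping of backslashes and single quotes is an additional assumption (that the target is a PHP single-quoted string, as the `.php` extension suggests); the extraction pattern does not undo that escaping, so treat it as a starting point rather than a complete round trip.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LineRewriter
{
    // Same pattern as the write-back code: group 5 is the value to replace,
    // group 6 is whatever follows the trailing comma.
    static readonly Regex RXRule =
        new Regex(@"^(\s+)?'(.*?)'(\s+)?=>(\s+)?'(.*?)',(.*?)$");

    public static string ReplaceValue(string line, string translation)
    {
        // Assumption: the target is a PHP single-quoted string,
        // where only backslash and single quote need escaping.
        string phpSafe = translation.Replace(@"\", @"\\").Replace("'", @"\'");

        // "$$" in a replacement pattern produces one literal '$'.
        string patternSafe = phpSafe.Replace("$", "$$");

        return RXRule.Replace(line, "$1'$2'$3=>$4'" + patternSafe + "',$6");
    }
}
```

With this helper, `LineRewriter.ReplaceValue("    '国内' => 'Domestic',", "Costs $5")` returns `    '国内' => 'Costs $5',`, keeping the original indentation and spacing around `=>`. A line that does not match the pattern is returned unchanged.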
And that's it. Pretty easy, isn't it?