Splitting text by a line separator


weixin_33796205


Original link: http://www.cnblogs.com/wenlin-gk/p/10694977.html



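def split_docs(lines, separator):
    """
    Split lines into an English part and a Chinese part.

    :note: The English sentences come first, the Chinese sentences follow,
        and the two parts are divided by a line that starts with `separator`.
    :return: (eng_lines, chn_lines); the separator line is in neither list.
    """
    if not lines:
        return [], []

    eng_lines = []
    chn_lines = []

    # Becomes True once the separator line has been seen.
    chn_begin_condition = False
    for line in lines:
        if chn_begin_condition:
            # Everything after the separator belongs to the Chinese part.
            chn_lines.append(line)
        else:
            # Leading whitespace is ignored when looking for the separator.
            chn_begin_condition = line.lstrip().startswith(separator)
            if not chn_begin_condition:
                eng_lines.append(line)

    return eng_lines, chn_lines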
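
A small usage sketch may make the behavior easier to see. The sample lines and the "-----" separator below are hypothetical and do not come from the original post. The function is restated in compact form only so that the snippet runs on its own.

```python
def split_docs(lines, separator):
    """Compact restatement of the function above, with the same behavior."""
    eng_lines, chn_lines = [], []
    found = False
    for line in lines:
        if found:
            chn_lines.append(line)
        elif line.lstrip().startswith(separator):
            found = True  # the separator line itself is dropped
        else:
            eng_lines.append(line)
    return eng_lines, chn_lines


# Hypothetical sample data; "-----" is an assumed separator.
sample = [
    "Hello, world.",
    "How are you?",
    "  ----- translation -----",  # leading whitespace is ignored
    "你好，世界。",
    "你好吗？",
]

eng, chn = split_docs(sample, "-----")
print(eng)
print(chn)

# With no separator line at all, every line ends up in the English part.
only_eng, no_chn = split_docs(["just one line"], "-----")
```

Only the first separator line counts: any later line that also starts with the separator is kept as part of the Chinese block.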

Posted on 2019-04-12 11:29 by wenlin_gk

Reposted from: https://www.cnblogs.com/wenlin-gk/p/10694977.html
