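Whether a character is Chinese is determined from its Unicode code point.
# Check whether a character is Chinese: the basic CJK Unified Ideographs block starts at 0x4e00 and ends at 0x9fff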
def _is_chinese_char(self, cp):
    """Checks whether CP is the codepoint of a CJK character."""
    # This defines a "chinese character" as anything in the CJK Unicode block:
    #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
    #
    # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
    # despite its name. The modern Korean Hangul alphabet is a different block,
    # as is Japanese Hiragana and Katakana. Those alphabets are used to write
    # space-separated words, so they are not treated specially and handled
    # like all of the other languages.
    '''
    0x4e00-0x9fff    CJK Unified Ideographs: common characters, 20992 code points (only assigned up to 0x9fc3 in older Unicode versions)
    0x3400-0x4dbf    CJK Unified Ideographs Extension A: rarely used characters, 6592 code points
    0x20000-0x2a6df  CJK Unified Ideographs Extension B: rarely used and historical characters, 42720 code points
    0xf900-0xfaff    CJK Compatibility Ideographs: duplicates, unifiable variants, shared characters, 512 code points
    0x2f800-0x2fa1f  CJK Compatibility Ideographs Supplement: unifiable variants, 544 code points
    '''
    if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #
        (cp >= 0x3400 and cp <= 0x4DBF) or  #
        (cp >= 0x20000 and cp <= 0x2A6DF) or  #
        (cp >= 0x2A700 and cp <= 0x2B73F) or  #
        (cp >= 0x2B740 and cp <= 0x2B81F) or  #
        (cp >= 0x2B820 and cp <= 0x2CEAF) or
        (cp >= 0xF900 and cp <= 0xFAFF) or  #
        (cp >= 0x2F800 and cp <= 0x2FA1F)):  #
        return True

    return False
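As a usage sketch: the code point of a character can be obtained with `ord()`, and a tokenizer can then pad each CJK character with spaces so that whitespace splitting turns it into its own token, which matches the single-character tokens in the output below. The standalone names `is_chinese_char` and `space_out_cjk` are illustrative assumptions, not functions from the original code.

```python
def is_chinese_char(cp):
    """Return True if code point `cp` falls in one of the CJK ranges checked above."""
    ranges = (
        (0x4E00, 0x9FFF),    # CJK Unified Ideographs
        (0x3400, 0x4DBF),    # Extension A
        (0x20000, 0x2A6DF),  # Extension B
        (0x2A700, 0x2B73F),  # Extension C
        (0x2B740, 0x2B81F),  # Extension D
        (0x2B820, 0x2CEAF),  # Extension E
        (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
        (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
    )
    return any(lo <= cp <= hi for lo, hi in ranges)


def space_out_cjk(text):
    """Surround every CJK character with spaces (hypothetical helper)."""
    out = []
    for ch in text:
        if is_chinese_char(ord(ch)):
            out.append(" " + ch + " ")
        else:
            out.append(ch)
    return "".join(out)


print(space_out_cjk("properly: 力加勝").split())
# → ['properly:', '力', '加', '勝']
```

Hiragana, Katakana, and Hangul fall outside these ranges, so they are left untouched and go through the normal whitespace and WordPiece handling.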
['this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '勝', '北', '区', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত']
Text should be one-sentence-per-line, with empty lines between documents.
all_documents =[[['this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '勝', '北', '区', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত'], ['text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.'], ['this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.']], [['the', 'rain', 'had', 'only', 'ceased', 'with', 'the', 'gray', 'streaks', 'of', 'morning', 'at', 'blazing', 'star', ',', 'and', 'the', 'settlement', 'awoke', 'to', 'a', 'moral', 'sense', 'of', 'clean', '##liness', ',', 'and', 'the', 'finding', 'of', 'forgotten', 'knives', ',', 'tin', 'cups', ',', 'and', 'smaller', 'camp', 'ut', '##ens', '##ils', ',', 'where', 'the', 'heavy', 'showers',
instance =[tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
all_documents = [[]]  # converted to a nested list: number of documents × number of sentences (each sentence is a list of tokens)
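A minimal sketch of how such a nested list can be built from input that has one sentence per line and an empty line between documents. The function name `read_documents` is hypothetical, and plain whitespace splitting stands in for the real WordPiece tokenizer.

```python
def read_documents(lines, tokenize=str.split):
    """Group tokenized sentences into documents; a blank line ends a document."""
    all_documents = [[]]
    for line in lines:
        line = line.strip()
        if not line:
            # Empty line: start a new document unless the current one is still empty.
            if all_documents[-1]:
                all_documents.append([])
            continue
        tokens = tokenize(line)  # assumption: whitespace split instead of WordPiece
        if tokens:
            all_documents[-1].append(tokens)
    # Drop any empty document left over at the end.
    return [doc for doc in all_documents if doc]


lines = ["first sentence .", "second sentence .", "", "another document ."]
docs = read_documents(lines)
print(len(docs), len(docs[0]), len(docs[1]))
# → 2 2 1
```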
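Explanation of create_instances_from_document():
1. Select a document by its index. The maximum length of a sentence pair is set to 128; because three special tokens ([CLS], [SEP], [SEP]) must be inserted, at most 125 tokens remain for the sentences themselves.
2. To reduce the mismatch between pre-training and fine-tuning, with a small probability the target length is set to a random value shorter than this maximum, so that some shorter sequences are also generated.
3. Sentences from the selected document are accumulated into a candidate set until the target length is reached. Each training example is a sentence pair [a, b]: a comes first and b follows it. There are two cases: b is the real continuation of a, or b is not.
Construction of a: a may consist of several sentences. a_end marks the last sentence of a and is chosen at random within the candidate set. Once a is fixed, b is determined.
Construction of b: there are two cases, each with 50% probability. In the first, b is random: a different document is picked at random, a random starting sentence is picked within it, and sentences from that starting point onward are appended until b reaches the target length minus the length of a. Such a b is not the real continuation of a. In the second, the sentences of the candidate set that follow a_end are concatenated to form b, the real continuation of a. This is how the next-sentence training pairs are built.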
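The construction of a and b described above can be sketched as follows. This is a simplified illustration, not the original function: `make_pair` and `assemble` are hypothetical names, the sketch assumes at least two documents, and it omits the final truncation of the pair to the maximum length.

```python
import random


def make_pair(all_documents, doc_index, chunk, target_seq_length, rng):
    """Build (tokens_a, tokens_b, is_random_next) from one candidate set of sentences."""
    # a: the first a_end sentences of the candidate set (at least one).
    a_end = 1
    if len(chunk) >= 2:
        a_end = rng.randint(1, len(chunk) - 1)
    tokens_a = [tok for sent in chunk[:a_end] for tok in sent]

    # With a single sentence there is no real continuation, so b must be random.
    is_random_next = len(chunk) == 1 or rng.random() < 0.5
    tokens_b = []
    if is_random_next:
        # b: sentences taken from a different, randomly chosen document.
        target_b_length = target_seq_length - len(tokens_a)
        others = [i for i in range(len(all_documents)) if i != doc_index]
        random_doc = all_documents[rng.choice(others)]
        start = rng.randint(0, len(random_doc) - 1)
        for sent in random_doc[start:]:
            tokens_b.extend(sent)
            if len(tokens_b) >= target_b_length:
                break
    else:
        # b: the sentences that really follow a in the same candidate set.
        tokens_b = [tok for sent in chunk[a_end:] for tok in sent]
    return tokens_a, tokens_b, is_random_next


def assemble(tokens_a, tokens_b):
    """Insert the three special tokens and build the matching segment ids."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids
```

The segment ids are 0 for [CLS], a, and the first [SEP], and 1 for b and the final [SEP], which is the pattern visible in the dumps below.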
tokens ['[CLS]', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
------------ [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
============== [47, 3, 8, 11, 4, 46, 40, 7, 28, 30, 33, 26, 18, 12, 22, 39, 35, 21, 31, 42, 15, 1, 38, 34, 44, 29, 32, 19, 17, 43, 6, 37, 45, 27, 41, 36, 13, 20, 14, 23, 25, 9, 24, 48, 2, 10, 5]
[MaskedLmInstance(index=47, label='##tenberg'), MaskedLmInstance(index=3, label='of'), MaskedLmInstance(index=8, label='seekers'), MaskedLmInstance(index=11, label='was'), MaskedLmInstance(index=4, label='his'), MaskedLmInstance(index=46, label='gut'), MaskedLmInstance(index=40, label='and'), MaskedLmInstance(index=7, label='-')]
The list above shows the masked positions and their labels (the original tokens).
After sorting by index, the result is:
[MaskedLmInstance(index=3, label='of'), MaskedLmInstance(index=4, label='his'), MaskedLmInstance(index=7, label='-'), MaskedLmInstance(index=8, label='seekers'), MaskedLmInstance(index=11, label='was'), MaskedLmInstance(index=40, label='and'), MaskedLmInstance(index=46, label='gut'), MaskedLmInstance(index=47, label='##tenberg')]
Original tokens:
['[CLS]', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']
Tokens after masking:
['[CLS]', 'like', 'most', '[MASK]', '[MASK]', 'fellow', 'gold', '[MASK]', '[MASK]', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', '[MASK]', 'was', 'randomly', 'selected', 'from', 'project', '[MASK]', '[MASK]', '.', '[SEP]']
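The masking steps shown above (collect candidate positions, shuffle them, pick a fraction, record the original token as the label, then sort by index) can be sketched like this. The function name `mask_tokens` and its parameters are assumptions for illustration; the 80% / 10% / 10% split between [MASK], keeping the token, and a random vocabulary word follows the scheme described in the BERT paper.

```python
import collections
import random

MaskedLmInstance = collections.namedtuple("MaskedLmInstance", ["index", "label"])


def mask_tokens(tokens, masked_lm_prob, max_predictions, vocab_words, rng):
    """Return (tokens after masking, MaskedLmInstance list sorted by index)."""
    # Every position except the special tokens is a candidate.
    cand_indexes = [i for i, tok in enumerate(tokens)
                    if tok not in ("[CLS]", "[SEP]")]
    rng.shuffle(cand_indexes)

    num_to_predict = min(max_predictions,
                         max(1, int(round(len(tokens) * masked_lm_prob))))
    output_tokens = list(tokens)
    masked = []
    for index in cand_indexes[:num_to_predict]:
        roll = rng.random()
        if roll < 0.8:
            new_token = "[MASK]"                 # 80%: replace with [MASK]
        elif roll < 0.9:
            new_token = tokens[index]            # 10%: keep the original token
        else:
            new_token = rng.choice(vocab_words)  # 10%: random vocabulary word
        output_tokens[index] = new_token
        masked.append(MaskedLmInstance(index=index, label=tokens[index]))

    # Sort so that positions and labels are stored in sequence order.
    masked.sort(key=lambda m: m.index)
    return output_tokens, masked
```

With 50 tokens and a masking probability of 0.15, this selects 8 positions, the same number as in the example above.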
The resulting list of instances:
[tokens: [CLS] ceased [MASK] the gray streaks of morning at blazing star , and the [MASK] awoke to a [MASK] sense of clean ##liness , and the finding of forgotten knives , tin cups , and smaller [MASK] ut ##ens ##ils , where the heavy showers had washed away the [MASK] [MASK] dust heap ##s before the cabin doors . indeed , it [MASK] recorded in blazing star that a fortunate [MASK] rise ##r had once picked up on the highway a solid chunk [MASK] gold quartz which the rain had freed from its inc ##umber ##ing soil , and [SEP] this text is [MASK] to [MASK] sure unicode is handled [MASK] : [MASK] 加 勝 北 区 ᴵ bobbie ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড [MASK] ##ত [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 6 8 14 18 36 44 49 50 62 70 83 93 103 105 110 112 118 125
masked_lm_labels: with of at settlement moral camp showers debris and was early of inc included make properly 力 ##ᴺ ##ণ
, tokens: [CLS] possibly this may have been the reason why early rise [MASK] [MASK] that locality , during the [MASK] season , adopted [MASK] thoughtful habit of body , and seldom lifted their eyes to the rift ##ed [MASK] [MASK] - ink washed skies above them . [SEP] [MASK] , [MASK] not with a view [MASK] discovery . a leak in his cabin roof , - - quite consistent with his careless , imp ##rov ##ide ##nt habits , - - had rouse ##d him at 4 a . m . , with a flooded " bunk " and wet blankets . the [MASK] [MASK] his wood pile independently to kind ##le a fire to ##ᵘ [MASK] bed - [MASK] , and he had rec honesty ##e to [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 11 12 14 18 22 37 38 47 49 54 73 102 103 107 114 115 118 124 125
masked_lm_labels: ##rs in locality rainy a or india morning but to ##rov chips from refused dry his clothes ##ours ##e
, tokens: [CLS] this was nearly opposite . mr . cass ##ius crossed the highway , and stopped suddenly . something glitter ##ed in the [MASK] red pool [MASK] him [MASK] gold , surely ! but [MASK] wonderful [MASK] [MASK] , not an irregular , shape ##less fragment of [MASK] ore , fresh from [MASK] ' s cr ##ucible , but a bit of jewel ##er ' s ⁻ ##ic [MASK] ##t in [MASK] form [MASK] a plain gold ring . [MASK] at it [MASK] at ##ten ##tively , he saw that it [MASK] the inscription , " may to cass . [MASK] [SEP] this sample text is public domain and [MASK] randomly selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 23 26 28 34 36 37 47 52 66 68 71 73 79 82 91 96 100 109
masked_lm_labels: nearest before . , to relate crude nature hand ##raf the of looking more bore may " was
, tokens: [CLS] like most [MASK] [MASK] fellow gold [MASK] [MASK] , cass was super ##sti ##tious . [SEP] text should be one - sentence - per - line , with empty lines between documents . this sample text is public domain [MASK] was randomly selected from project [MASK] [MASK] . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 3 4 7 8 11 40 46 47
masked_lm_labels: of his - seekers was and gut ##tenberg
]
write_instance_to_example_files: writes the instances to the output file
Tokens are converted to IDs:
input_ids [101, 2023, 3793, 2003, 2443, 2000, 2191, 2469, 27260, 2003, 8971, 7919, 1024, 1778, 1779, 1780, 1781, 1782, 1493, 30030, 30031, 30032, 29893, 29894, 29895, 29896, 29897, 29898, 3793, 2323, 103, 2028, 1011, 6251, 1011, 103, 1011, 2240, 1010, 2007, 4064, 3210, 103, 5491, 1012, 102, 103, 7099, 3793, 2003, 2270, 5884, 1998, 2001, 103, 3479, 2013, 2622, 9535, 21806, 1012, 102]
Input mask:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
features["input_ids"]
int64_list {
value: 101
value: 2023
value: 3793
value: 2003
value: 2443
value: 2000
value: 2191
value: 2469
value: 27260
value: 2003
value: 8971
value: 7919
value: 1024
value: 1778
value: 1779
value: 1780
value: 1781
value: 1782
value: 1493
value: 30030
value: 30031
value: 30032
value: 29893
value: 29894
value: 29895
value: 29896
value: 29897
value: 29898
value: 3793
value: 2323
value: 103
value: 2028
value: 1011
value: 6251
value: 1011
value: 103
value: 1011
value: 2240
value: 1010
value: 2007
value: 4064
value: 3210
value: 103
value: 5491
value: 1012
value: 102
value: 103
value: 7099
value: 3793
value: 2003
value: 2270
value: 5884
value: 1998
value: 2001
value: 103
value: 3479
value: 2013
value: 2622
value: 9535
value: 21806
value: 1012
value: 102
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
}
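The trailing zeros in the dump above come from padding every sequence to the maximum length (128 here). A minimal sketch of that padding, with the hypothetical helper name `pad_inputs`:

```python
def pad_inputs(input_ids, segment_ids, max_seq_length):
    """Pad ids, mask, and segment ids with zeros up to max_seq_length."""
    assert len(input_ids) <= max_seq_length
    input_ids = list(input_ids)
    segment_ids = list(segment_ids)
    input_mask = [1] * len(input_ids)  # 1 marks a real token
    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)           # 0 marks padding
        segment_ids.append(0)
    return input_ids, input_mask, segment_ids


ids, mask, seg = pad_inputs([101, 2023, 102], [0, 0, 0], 5)
print(ids, mask, seg)
# → [101, 2023, 102, 0, 0] [1, 1, 1, 0, 0] [0, 0, 0, 0, 0]
```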
----------features OrderedDict([('input_ids', int64_list {
value: 101
value: 4298
value: 2023
value: 2089
value: 2031
value: 2042
value: 1996
value: 3114
value: 2339
value: 2220
value: 4125
value: 103
value: 103
value: 2008
value: 10246
value: 1010
value: 2076
value: 1996
value: 103
value: 2161
value: 1010
value: 4233
value: 103
value: 16465
value: 10427
value: 1997
value: 2303
value: 1010
value: 1998
value: 15839
value: 4196
value: 2037
value: 2159
value: 2000
value: 1996
value: 16931
value: 2098
value: 103
value: 103
value: 1011
value: 10710
value: 8871
value: 15717
value: 2682
value: 2068
value: 1012
value: 102
value: 103
value: 1010
value: 103
value: 2025
value: 2007
value: 1037
value: 3193
value: 103
value: 5456
value: 1012
value: 1037
value: 17271
value: 1999
value: 2010
value: 6644
value: 4412
value: 1010
value: 1011
value: 1011
value: 3243
value: 8335
value: 2007
value: 2010
value: 23358
value: 1010
value: 17727
value: 12298
value: 5178
value: 3372
value: 14243
value: 1010
value: 1011
value: 1011
value: 2018
value: 27384
value: 2094
value: 2032
value: 2012
value: 1018
value: 1037
value: 1012
value: 1049
value: 1012
value: 1010
value: 2007
value: 1037
value: 10361
value: 1000
value: 25277
value: 1000
value: 1998
value: 4954
value: 15019
value: 1012
value: 1996
value: 103
value: 103
value: 2010
value: 3536
value: 8632
value: 9174
value: 2000
value: 2785
value: 2571
value: 1037
value: 2543
value: 2000
value: 30042
value: 103
value: 2793
value: 1011
value: 103
value: 1010
value: 1998
value: 2002
value: 2018
value: 28667
value: 16718
value: 2063
value: 2000
value: 102
}
), ('input_mask', int64_list {
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
}
), ('segment_ids', int64_list {
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 0
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
value: 1
}
), ('masked_lm_positions', int64_list {
value: 11
value: 12
value: 14
value: 18
value: 22
value: 37
value: 38
value: 47
value: 49
value: 54
value: 73
value: 102
value: 103
value: 107
value: 114
value: 115
value: 118
value: 124
value: 125
value: 0
}
), ('masked_lm_ids', int64_list {
value: 2869
value: 1999
value: 10246
value: 16373
value: 1037
value: 2030
value: 2634
value: 2851
value: 2021
value: 2000
value: 12298
value: 11772
value: 2013
value: 4188
value: 4318
value: 2010
value: 4253
value: 22957
value: 2063
value: 0
}
), ('masked_lm_weights', float_list {
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 1.0
value: 0.0
}
), ('next_sentence_labels', int64_list {
value: 0
}
)])
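The masked-LM features in the dump above are padded in the same way: 19 real predictions are followed by one padding entry with position 0, id 0, and weight 0.0, and the next-sentence label is 0 because this instance has is_random_next: False. A small sketch under those observations, with hypothetical helper names:

```python
def pad_masked_lm(positions, label_ids, max_predictions_per_seq):
    """Pad masked-LM positions, ids, and weights to a fixed number of predictions."""
    positions = list(positions)
    label_ids = list(label_ids)
    weights = [1.0] * len(label_ids)  # real predictions count fully in the loss
    while len(positions) < max_predictions_per_seq:
        positions.append(0)
        label_ids.append(0)
        weights.append(0.0)           # padding entries are ignored by the loss
    return positions, label_ids, weights


def next_sentence_label(is_random_next):
    """1 if b was drawn from a random document, 0 if it really follows a."""
    return 1 if is_random_next else 0


print(pad_masked_lm([11, 12], [2869, 1999], 4))
# → ([11, 12, 0, 0], [2869, 1999, 0, 0], [1.0, 1.0, 0.0, 0.0])
```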
This concludes the part on constructing the data format.
tokens ['[CLS]', 'ancient', 'sage', '-', '-', 'the', 'name', 'is', 'un', '##im', '##port', '##ant', 'to', 'a', 'monk', '-', '-', 'pumped', 'water', 'nightly', 'that', 'he', 'might', 'study', 'by', 'day', ',', 'so', 'i', ',', 'the', 'guardian', 'of', 'cloak', '##s', 'and', 'para', '##sol', '##s', ',', 'at', 'the', 'sacred', 'doors', 'of', 'her', 'lecture', '-', 'room', ',', 'im', '##bib', '##e', 'celestial', 'knowledge', '.', 'from', 'my', 'youth', 'i', 'felt', 'in', 'me', 'a', '[SEP]', 'fallen', 'star', ',', 'i', 'am', ',', 'sir', '!', "'", 'continued', 'he', ',', 'pens', '##ively', ',', 'stroking', 'his', 'lean', 'stomach', '-', '-', "'", 'a', 'fallen', 'star', '!', '-', '-', 'fallen', ',', 'if', 'the', 'dignity', 'of', 'philosophy', 'will', 'allow', 'of', 'the', 'simi', '##le', ',', 'among', 'the', 'hog', '##s', 'of', 'the', 'lower', 'world', '-', '-', 'indeed', ',', 'even', 'into', 'the', 'hog', '-', 'bucket', 'itself', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket
]
tokens ['[CLS]', 'there', 'is', 'a', 'phil', '##oso', '##phic', 'pleasure', 'in', 'opening', 'one', "'", 's', 'treasures', 'to', 'the', 'modest', 'young', '.', '[SEP]', 'rain', 'had', 'only', 'ceased', 'with', 'the', 'gray', 'streaks', 'of', 'morning', 'at', 'blazing', 'star', ',', 'and', 'the', 'settlement', 'awoke', 'to', 'a', 'moral', 'sense', 'of', 'clean', '##liness', ',', 'and', 'the', 'finding', 'of', 'forgotten', 'knives', ',', 'tin', 'cups', ',', 'and', 'smaller', 'camp', 'ut', '##ens', '##ils', ',', 'where', 'the', 'heavy', 'showers', 'had', 'washed', 'away', 'the', 'debris', 'and', 'dust', 'heap', '##s', 'before', 'the', 'cabin', 'doors', '.', 'indeed', ',', 'it', 'was', 'recorded', 'in', 'blazing', 'star', 'that', 'a', 'fortunate', 'early', 'rise', '##r', 'had', 'once', 'picked', 'up', 'on', 'the', 'highway', 'a', 'solid', 'chunk', 'of', 'gold', 'quartz', 'which', 'the', 'rain', 'had', 'freed', 'from', 'its', 'inc', '##umber', '##ing', 'soil', ',', 'and', 'washed', 'into', 'immediate', 'and', 'glittering', 'popularity', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket
, tokens: [CLS] there is a phil ##oso ##phic pleasure in opening [MASK] ' s treasures to the modest young . [SEP] rain had only ceased with [MASK] gray streaks of morning at blazing star , [MASK] the settlement awoke to a moral sense of clean akron 16th [MASK] the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the [MASK] showers had washed away the debris and dust heap [MASK] before the cabin doors . indeed [MASK] [MASK] was recorded in blazing [MASK] that a fortunate [MASK] [MASK] [MASK] had once picked up on [MASK] highway a solid chunk [MASK] [MASK] quartz which the [MASK] had freed from its inc ##umber ##ing soil , and washed into immediate and glittering popularity [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 25 34 44 45 46 61 65 75 82 83 88 92 93 94 100 105 106 110
masked_lm_labels: one the and ##liness , and ##ils heavy ##s , it star early rise ##r the of gold rain
]
tokens ['[CLS]', 'perhaps', 'you', 'will', 'assist', 'me', 'by', 'carrying', 'this', 'basket', 'of', 'fruit', '?', "'", 'and', 'the', 'little', 'man', 'jumped', 'up', ',', 'put', 'his', 'basket', 'on', 'phil', '##am', '##mon', "'", 's', 'head', ',', 'and', 'tr', '##otted', 'off', 'up', 'a', 'neighbouring', 'street', '.', 'phil', '##am', '##mon', 'followed', ',', 'half', 'contempt', '##uous', ',', 'half', 'wondering', 'at', 'what', 'this', 'philosophy', 'might', 'be', ',', 'which', 'could', 'feed', 'the', 'self', '-', 'con', '##ce', '##it', 'of', 'anything', 'so', 'ab', '##ject', 'as', 'his', 'ragged', 'little', 'api', '##sh', 'guide', ';', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket
, tokens: [CLS] there is a phil ##oso ##phic pleasure in opening [MASK] ' s treasures to the modest young . [SEP] rain had only ceased with [MASK] gray streaks of morning at blazing star , [MASK] the settlement awoke to a moral sense of clean akron 16th [MASK] the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the [MASK] showers had washed away the debris and dust heap [MASK] before the cabin doors . indeed [MASK] [MASK] was recorded in blazing [MASK] that a fortunate [MASK] [MASK] [MASK] had once picked up on [MASK] highway a solid chunk [MASK] [MASK] quartz which the [MASK] had freed from its inc ##umber ##ing soil , and washed into immediate and glittering popularity [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 25 34 44 45 46 61 65 75 82 83 88 92 93 94 100 105 106 110
masked_lm_labels: one the and ##liness , and ##ils heavy ##s , it star early rise ##r the of gold rain
, tokens: [CLS] perhaps murder will assist me by carrying this [MASK] of fruit ? ' [MASK] the little man jumped up , put his basket [MASK] phil ##am ##mon ' [MASK] head , and tr ##otted off up a neighbouring street . phil ##am ##mon followed , half contempt ##uous , half wondering at what this philosophy [MASK] be , which [MASK] [MASK] the self [MASK] con ##ce ##it of anything so ab ##ject as his ragged [MASK] api ##val guide ; [SEP] text should be [MASK] [MASK] sentence - per [MASK] line [MASK] with empty lines [MASK] documents . this sample text is public domain and was randomly selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 9 14 24 29 56 60 61 63 64 76 78 85 86 90 92 96
masked_lm_labels: you basket and on s might could feed self - little ##sh one - - , between
]
tokens ['[CLS]', 'of', 'the', 'street', ',', 'the', 'perpetual', 'stream', 'of', 'busy', 'faces', ',', 'the', 'line', 'of', 'cu', '##rri', '##cles', ',', 'pal', '##an', '##quin', '##s', ',', 'laden', 'ass', '##es', ',', 'camel', '##s', ',', 'elephants', ',', 'which', 'met', 'and', 'passed', 'him', ',', 'and', 'squeezed', 'him', 'up', 'steps', 'and', 'into', 'doorway', '##s', ',', 'as', 'they', 'threaded', 'their', 'way', 'through', 'the', 'great', 'moon', '-', 'gate', 'into', 'the', 'ample', 'street', 'beyond', ',', 'drove', 'everything', 'from', 'his', 'mind', 'but', 'wondering', 'curiosity', ',', 'and', 'a', 'vague', ',', 'helpless', 'dread', 'of', 'that', 'great', 'living', 'wilderness', ',', 'more', 'terrible', 'than', 'any', 'dead', 'wilderness', 'of', 'sand', 'which', 'he', 'had', 'left', '[SEP]', 'this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '勝', '北', '区', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket
, tokens: [CLS] there is a phil ##oso ##phic pleasure in opening [MASK] ' s treasures to the modest young . [SEP] rain had only ceased with [MASK] gray streaks of morning at blazing star , [MASK] the settlement awoke to a moral sense of clean akron 16th [MASK] the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the [MASK] showers had washed away the debris and dust heap [MASK] before the cabin doors . indeed [MASK] [MASK] was recorded in blazing [MASK] that a fortunate [MASK] [MASK] [MASK] had once picked up on [MASK] highway a solid chunk [MASK] [MASK] quartz which the [MASK] had freed from its inc ##umber ##ing soil , and washed into immediate and glittering popularity [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 25 34 44 45 46 61 65 75 82 83 88 92 93 94 100 105 106 110
masked_lm_labels: one the and ##liness , and ##ils heavy ##s , it star early rise ##r the of gold rain
, tokens: [CLS] perhaps murder will assist me by carrying this [MASK] of fruit ? ' [MASK] the little man jumped up , put his basket [MASK] phil ##am ##mon ' [MASK] head , and tr ##otted off up a neighbouring street . phil ##am ##mon followed , half contempt ##uous , half wondering at what this philosophy [MASK] be , which [MASK] [MASK] the self [MASK] con ##ce ##it of anything so ab ##ject as his ragged [MASK] api ##val guide ; [SEP] text should be [MASK] [MASK] sentence - per [MASK] line [MASK] with empty lines [MASK] documents . this sample text is public domain and was randomly selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 9 14 24 29 56 60 61 63 64 76 78 85 86 90 92 96
masked_lm_labels: you basket and on s might could feed self - little ##sh one - - , between
, tokens: [CLS] of [MASK] street , the perpetual stream of busy faces , the line of [MASK] 示 ##cles , pal ##an ##quin ##s , laden ass ##es , camel ##s , elephants [MASK] which met and passed him , [MASK] 1760 him [MASK] steps and into doorway ##s , as they threaded their 1887 through the great moon - association into the ample street beyond , drove everything from his mind but wondering curiosity , and a [MASK] , helpless dread of that great living wilderness , more terrible than any dead wilderness of sand which he had left [SEP] this [MASK] [MASK] included to make sure unicode [MASK] [MASK] properly : 力 加 勝 北 [MASK] ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড [MASK] ##ত [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 15 16 32 39 40 42 48 53 59 62 77 101 102 108 109 115 116 125
masked_lm_labels: the cu ##rri , and squeezed up , way gate ample vague text is is handled 北 区 ##ণ
]
tokens ['[CLS]', 'the', 'rep', '##ose', ',', 'the', 'silence', 'of', 'the', 'laura', '-', '-', 'for', 'faces', 'which', 'knew', 'him', 'and', 'smiled', 'upon', 'him', ';', 'but', 'it', 'was', 'too', 'late', 'to', 'turn', 'back', 'now', '.', 'his', 'guide', 'held', 'on', 'for', 'more', 'than', 'a', 'mile', 'up', 'the', 'great', 'main', 'street', ',', 'crossed', 'in', 'the', 'centre', 'of', 'the', 'city', ',', 'at', 'right', 'angles', ',', 'by', 'one', 'equally', 'magnificent', ',', 'at', 'each', 'end', 'of', 'which', ',', 'miles', 'away', ',', 'appeared', ',', 'dim', 'and', 'distant', 'over', 'the', 'heads', 'of', 'the', 'living', 'stream', 'of', 'passengers', ',', 'the', 'yellow', 'sand', '-', 'hills', 'of', 'the', 'desert', ';', 'while', 'at', 'the', 'end', 'of', 'the', 'vista', 'in', 'front', 'of', 'them', 'gleamed', 'the', 'blue', 'harbour', ',', 'through', 'a', 'network', 'of', 'countless', 'mast', '##s', '.', '[SEP]', 'this', 'was', 'nearly', 'opposite', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
tokens ['[CLS]', 'at', 'last', 'they', 'reached', 'the', 'quay', 'at', 'the', 'opposite', 'end', 'of', 'the', 'street', ';', '[SEP]', 'but', ',', 'wonderful', 'to', 'relate', ',', 'not', 'an', 'irregular', ',', 'shape', '##less', 'fragment', 'of', 'crude', 'ore', ',', 'fresh', 'from', 'nature', "'", 's', 'cr', '##ucible', ',', 'but', 'a', 'bit', 'of', 'jewel', '##er', "'", 's', 'hand', '##ic', '##raf', '##t', 'in', 'the', 'form', 'of', 'a', 'plain', 'gold', 'ring', '.', 'looking', 'at', 'it', 'more', 'at', '##ten', '##tively', ',', 'he', 'saw', 'that', 'it', 'bore', 'the', 'inscription', ',', '"', 'may', 'to', 'cass', '.', '"', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
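In the pair of lines above, `segment_ids` is 0 for every token up to and including the first `[SEP]` and 1 for everything after it. A minimal sketch of that rule follows. The helper name `build_segment_ids` is hypothetical (it is not taken from the BERT code), and the sample is the printed token list truncated after `relate` and closed with `[SEP]` for brevity.

```python
def build_segment_ids(tokens):
    """Return 0 for sentence A (through the first [SEP]) and 1 for sentence B."""
    segment_ids = []
    current = 0
    for tok in tokens:
        segment_ids.append(current)
        # The first [SEP] still belongs to segment 0; later tokens get 1.
        if tok == "[SEP]":
            current = 1
    return segment_ids


# Shortened excerpt of the tokens printed above (hypothetical truncation).
sample_tokens = (
    "[CLS] at last they reached the quay at the opposite end of the street ; "
    "[SEP] but , wonderful to relate [SEP]"
).split()
sample_segment_ids = build_segment_ids(sample_tokens)
print(sample_segment_ids)
```

The original script may well build the ids while it concatenates the two sentences rather than from the flat list; this version only reproduces the pattern visible in the log.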
tokens ['[CLS]', 'and', 'there', 'burst', 'on', 'phil', '##am', '##mon', "'", 's', 'astonished', 'eyes', 'a', 'vast', 'semi', '##ci', '##rcle', 'of', 'blue', 'sea', ',', 'ring', '##ed', 'with', 'palaces', 'and', 'towers', '.', '[SEP]', 'he', 'stopped', 'in', '##vo', '##lun', '##tar', '##ily', ';', 'and', 'his', 'little', 'guide', 'stopped', 'also', ',', 'and', 'looked', 'ask', '##ance', 'at', 'the', 'young', 'monk', ',', 'to', 'watch', 'the', 'effect', 'which', 'that', 'grand', 'panorama', 'should', 'produce', 'on', 'him', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket
, tokens: [CLS] there is a phil ##oso ##phic pleasure in opening [MASK] ' s treasures to the modest young . [SEP] rain had only ceased with [MASK] gray streaks of morning at blazing star , [MASK] the settlement awoke to a moral sense of clean akron 16th [MASK] the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the [MASK] showers had washed away the debris and dust heap [MASK] before the cabin doors . indeed [MASK] [MASK] was recorded in blazing [MASK] that a fortunate [MASK] [MASK] [MASK] had once picked up on [MASK] highway a solid chunk [MASK] [MASK] quartz which the [MASK] had freed from its inc ##umber ##ing soil , and washed into immediate and glittering popularity [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 25 34 44 45 46 61 65 75 82 83 88 92 93 94 100 105 106 110
masked_lm_labels: one the and ##liness , and ##ils heavy ##s , it star early rise ##r the of gold rain
, tokens: [CLS] perhaps murder will assist me by carrying this [MASK] of fruit ? ' [MASK] the little man jumped up , put his basket [MASK] phil ##am ##mon ' [MASK] head , and tr ##otted off up a neighbouring street . phil ##am ##mon followed , half contempt ##uous , half wondering at what this philosophy [MASK] be , which [MASK] [MASK] the self [MASK] con ##ce ##it of anything so ab ##ject as his ragged [MASK] api ##val guide ; [SEP] text should be [MASK] [MASK] sentence - per [MASK] line [MASK] with empty lines [MASK] documents . this sample text is public domain and was randomly selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 9 14 24 29 56 60 61 63 64 76 78 85 86 90 92 96
masked_lm_labels: you basket and on s might could feed self - little ##sh one - - , between
, tokens: [CLS] of [MASK] street , the perpetual stream of busy faces , the line of [MASK] 示 ##cles , pal ##an ##quin ##s , laden ass ##es , camel ##s , elephants [MASK] which met and passed him , [MASK] 1760 him [MASK] steps and into doorway ##s , as they threaded their 1887 through the great moon - association into the ample street beyond , drove everything from his mind but wondering curiosity , and a [MASK] , helpless dread of that great living wilderness , more terrible than any dead wilderness of sand which he had left [SEP] this [MASK] [MASK] included to make sure unicode [MASK] [MASK] properly : 力 加 勝 北 [MASK] ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড [MASK] ##ত [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 15 16 32 39 40 42 48 53 59 62 77 101 102 108 109 115 116 125
masked_lm_labels: the cu ##rri , and squeezed up , way gate ample vague text is is handled 北 区 ##ণ
, tokens: [CLS] the rep ##ose , [MASK] silence of [MASK] laura - - for [MASK] which [MASK] him [MASK] smiled upon him ; but [MASK] was too late to turn back now . his [MASK] held on [MASK] more [MASK] [unused731] mile up the great [MASK] street , crossed in the centre [MASK] the city , at right angles , by one equally magnificent , at each end of which , miles away , appeared , [MASK] and distant over the heads of the living stream of passengers [MASK] the yellow sand - [MASK] of the desert ; while at the end of the vista in front swaying them gleamed the blue harbour , through a network of countless mast ##s . [SEP] archaeologist was nearly opposite . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 5 8 13 15 17 23 33 36 38 39 44 51 71 75 76 87 92 106 122
masked_lm_labels: the the faces knew and it guide for than a main of away dim and , hills of this
, tokens: [CLS] at [MASK] [unused513] reached the quay at the opposite [MASK] of the street ; [SEP] [MASK] , wonderful to relate [MASK] [MASK] an irregular , shape ##less [MASK] of crude ore [MASK] fresh from nature ' s disagreements ##ucible , muse a bit of jewel ##er ' s hand [MASK] ##raf ##t in the form of a plain gold ring . looking at [MASK] [MASK] at ##ten ##tively , he saw that it bore the inscription , " may to cass . " like most [MASK] his fellow gold - seekers , cass [MASK] super ##sti ##tious . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 3 10 16 21 22 28 32 38 41 50 64 65 86 94
masked_lm_labels: last they end but , not fragment , cr but ##ic it more of was
, tokens: [CLS] and there burst on phil ##am ##mon ' s astonished eyes a [MASK] semi ##ci ##rcle [MASK] blue [MASK] , ring ##ed with palaces and [MASK] . [SEP] he stopped in υ ##lun ##tar [MASK] ; and his little guide stopped also , and looked vuelta ##ance at the young monk , to watch [MASK] [MASK] which that grand panorama should produce on him . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 13 17 19 26 32 35 46 55 56
masked_lm_labels: burst vast of sea towers ##vo ##ily ask the effect
]
tokens ['[CLS]', 'this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '勝', '北', '区', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', '[SEP]', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] this text is included to make sure unicode is handled [MASK] : 力 加 勝 ##folk 区 ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ [MASK] ##ট ##ড ##ণ greasy text should be one [MASK] sentence - per - line , with empty lines [MASK] documents . [SEP] this sample text [MASK] public domain and was [MASK] selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 11 16 23 27 32 38 42 49 54
masked_lm_labels: properly 北 ##জ ##ত - , between is randomly
]
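Comparing this instance with the unmasked `tokens` line printed just before it shows how the two `masked_lm_*` fields fit together: each entry in `masked_lm_positions` is an index into `tokens`, and the entry at the same offset in `masked_lm_labels` is the original token at that index. Most selected positions show `[MASK]`, but position 16 shows a random word (`##folk` instead of `北`) and position 38 keeps its original `,`, which is consistent with the standard BERT masking scheme (mostly `[MASK]`, sometimes a random token, sometimes unchanged). The sketch below restores the original tokens from such a record. The function name `restore_masked_tokens` is hypothetical, and the sample uses only the first 18 tokens and the first two position/label pairs of the instance above.

```python
def restore_masked_tokens(tokens, masked_lm_positions, masked_lm_labels):
    """Put each original label back at its recorded position."""
    restored = list(tokens)  # copy so the masked input is left untouched
    for pos, label in zip(masked_lm_positions, masked_lm_labels):
        restored[pos] = label
    return restored


# First 18 tokens of the masked instance above (truncated for brevity).
masked = (
    "[CLS] this text is included to make sure unicode is handled "
    "[MASK] : 力 加 勝 ##folk 区"
).split()
positions = [11, 16]
labels = ["properly", "北"]

restored = restore_masked_tokens(masked, positions, labels)
print(restored[11], restored[16])
```

Running the same function over a full instance with its complete position and label lists should reproduce the corresponding unmasked `tokens` line.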
tokens ['[CLS]', 'the', 'rain', 'had', 'only', 'ceased', 'with', 'the', 'gray', 'streaks', 'of', 'morning', 'at', 'blazing', 'star', ',', 'and', 'the', 'settlement', 'awoke', 'to', 'a', 'moral', 'sense', 'of', 'clean', '##liness', ',', 'and', 'the', 'finding', 'of', 'forgotten', 'knives', ',', 'tin', 'cups', ',', 'and', 'smaller', 'camp', 'ut', '##ens', '##ils', ',', 'where', 'the', 'heavy', 'showers', 'had', 'washed', 'away', 'the', 'debris', 'and', 'dust', 'heap', '##s', 'before', 'the', 'cabin', 'doors', '.', '[SEP]', '##r', 'had', 'once', 'picked', 'up', 'on', 'the', 'highway', 'a', 'solid', 'chunk', 'of', 'gold', 'quartz', 'which', 'the', 'rain', 'had', 'freed', 'from', 'its', 'inc', '##umber', '##ing', 'soil', ',', 'and', 'washed', 'into', 'immediate', 'and', 'glittering', 'popularity', '.', 'possibly', 'this', 'may', 'have', 'been', 'the', 'reason', 'why', 'early', 'rise', '##rs', 'in', 'that', 'locality', ',', 'during', 'the', 'rainy', 'season', ',', 'adopted', 'a', 'thoughtful', 'habit', 'of', 'body', ',', 'and', 'seldom', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
tokens ['[CLS]', 'but', 'not', 'with', 'a', 'view', 'to', 'discovery', '.', 'a', 'leak', 'in', 'his', 'cabin', 'roof', ',', '-', '-', 'quite', 'consistent', 'with', 'his', 'careless', ',', 'imp', '##rov', '##ide', '##nt', 'habits', ',', '-', '-', 'had', 'rouse', '##d', 'him', 'at', '4', 'a', '.', 'm', '.', ',', 'with', 'a', 'flooded', '"', 'bunk', '"', 'and', 'wet', 'blankets', '.', 'the', 'chips', 'from', 'his', 'wood', 'pile', 'refused', 'to', 'kind', '##le', 'a', 'fire', 'to', 'dry', 'his', 'bed', '-', 'clothes', ',', 'and', 'he', 'had', 'rec', '##ours', '##e', 'to', 'a', 'more', 'provide', '##nt', 'neighbor', "'", 's', 'to', 'supply', 'the', 'deficiency', '.', 'this', 'was', 'nearly', 'opposite', '.', 'mr', '.', 'cass', '[SEP]', 'this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '勝', '北', '区', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] the rain had only ceased with [MASK] [MASK] streaks of morning at blazing [MASK] , and the settlement awoke to a moral sense of clean ##liness , and the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the heavy showers [MASK] washed away the debris and dust [MASK] ##s before the cabin [MASK] . [SEP] ##r had once picked up on the highway a solid chunk [MASK] gold quartz which the rain [MASK] freed from its inc ##umber ##ing soil , and [MASK] into immediate and [MASK] popularity . possibly this may have been the reason why early rise ##rs [MASK] that locality [MASK] during [MASK] [MASK] [MASK] , adopted a thoughtful habit of body , and seldom [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 7 8 14 24 45 49 56 61 62 75 77 81 91 95 109 112 114 115 116
masked_lm_labels: the gray star of where had heap doors . of quartz had washed glittering in , the rainy season
, tokens: [CLS] but not with a view to discovery . a [MASK] in his cabin roof , - - [MASK] consistent with his careless , [MASK] ##rov ##ide ##nt habits , - - had rouse ##d him at 4 a . m . , with a flooded [MASK] bunk " and wet blankets . [MASK] chips from [MASK] wood pile ##arion to kind ##le a fire to dry his [MASK] - clothes , and he had [MASK] ##ours ##e to 167 [MASK] provide ##nt neighbor ' s to supply the deficiency . this was nearly opposite . apartments . cass [SEP] this text is included [MASK] make sure unicode is handled properly : 力 加 [MASK] 北 [MASK] ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড ##ণ [MASK] [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 15 16 18 24 46 53 56 59 60 68 75 79 80 96 104 114 116 126
masked_lm_labels: leak , - quite imp " the his refused to bed rec a more mr to 勝 区 ##ত
]
tokens ['[CLS]', 'something', 'glitter', '##ed', 'in', 'the', 'nearest', 'red', 'pool', 'before', 'him', '.', 'gold', ',', 'surely', '!', 'but', ',', 'wonderful', 'to', 'relate', ',', 'not', 'an', 'irregular', ',', 'shape', '##less', 'fragment', 'of', 'crude', 'ore', ',', 'fresh', 'from', 'nature', "'", 's', 'cr', '##ucible', ',', 'but', 'a', 'bit', 'of', 'jewel', '##er', "'", 's', 'hand', '##ic', '##raf', '##t', 'in', 'the', 'form', 'of', 'a', 'plain', 'gold', 'ring', '.', 'looking', 'at', 'it', 'more', 'at', '##ten', '##tively', ',', 'he', 'saw', 'that', 'it', 'bore', 'the', 'inscription', ',', '"', 'may', 'to', 'cass', '.', '"', '[SEP]', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
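The tokens / segment_ids pair printed above follows a simple layout: `[CLS]`, sentence A and its `[SEP]` get segment id 0, while sentence B and the final `[SEP]` get segment id 1. The following is a minimal sketch of that layout only; the helper name `build_pair` and the toy sentences are hypothetical and are not taken from the original script.

```python
def build_pair(tokens_a, tokens_b):
    """Join two token lists into one BERT-style input with segment ids."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # [CLS], sentence A and the first [SEP] belong to segment 0;
    # sentence B and the closing [SEP] belong to segment 1.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# Toy input (hypothetical); real inputs are WordPiece tokens as in the dump above.
tokens, segment_ids = build_pair(["cass", "was", "here", "."], ["he", "left", "."])
print(tokens)
print(segment_ids)
```

The two lists always have the same length, which is why the dump shows one segment id per token.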
---- [tokens: [CLS] the rain had only ceased with [MASK] [MASK] streaks of morning at blazing [MASK] , and the settlement awoke to a moral sense of clean ##liness , and the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the heavy showers [MASK] washed away the debris and dust [MASK] ##s before the cabin [MASK] . [SEP] ##r had once picked up on the highway a solid chunk [MASK] gold quartz which the rain [MASK] freed from its inc ##umber ##ing soil , and [MASK] into immediate and [MASK] popularity . possibly this may have been the reason why early rise ##rs [MASK] that locality [MASK] during [MASK] [MASK] [MASK] , adopted a thoughtful habit of body , and seldom [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 7 8 14 24 45 49 56 61 62 75 77 81 91 95 109 112 114 115 116
masked_lm_labels: the gray star of where had heap doors . of quartz had washed glittering in , the rainy season
, tokens: [CLS] but not with a view to discovery . a [MASK] in his cabin roof , - - [MASK] consistent with his careless , [MASK] ##rov ##ide ##nt habits , - - had rouse ##d him at 4 a . m . , with a flooded [MASK] bunk " and wet blankets . [MASK] chips from [MASK] wood pile ##arion to kind ##le a fire to dry his [MASK] - clothes , and he had [MASK] ##ours ##e to 167 [MASK] provide ##nt neighbor ' s to supply the deficiency . this was nearly opposite . apartments . cass [SEP] this text is included [MASK] make sure unicode is handled properly : 力 加 [MASK] 北 [MASK] ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড ##ণ [MASK] [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 15 16 18 24 46 53 56 59 60 68 75 79 80 96 104 114 116 126
masked_lm_labels: leak , - quite imp " the his refused to bed rec a more mr to 勝 区 ##ত
, tokens: [CLS] something glitter ##ed in the nearest red [MASK] before him . gold , surely ! but , wonderful to relate , [MASK] [MASK] irregular , shape ##less fragment of crude ore [MASK] [MASK] from nature ' s cr ##ucible , but a bit of jewel ##er ' s hand ##ic ##raf ##t in the form of a plain gold [MASK] . [MASK] at it more at ##ten ##tively , he saw that it bore the inscription , " [MASK] to cass . " [SEP] like most of his fellow gold [MASK] seekers , cass [MASK] super [MASK] ##tious . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 8 20 22 23 32 33 55 60 62 64 79 91 94 95 97
masked_lm_labels: pool relate not an , fresh form ring looking it may - cass was ##sti
]
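In the instances printed above, `masked_lm_positions` are indices into `tokens`, and `masked_lm_labels` are the original tokens at those indices (for example position 8 with label `pool` in the last instance). Not every selected position has to show `[MASK]`: position 94 of the last instance still reads `cass`, which appears consistent with the masking step sometimes keeping the original token. The sketch below only illustrates the relationship between the three fields; `restore_masked` is a hypothetical helper and the sample tokens are shortened.

```python
def restore_masked(tokens, positions, labels):
    """Put the original tokens back at the masked positions."""
    restored = list(tokens)  # copy so the masked input is left untouched
    for pos, label in zip(positions, labels):
        restored[pos] = label
    return restored


# Shortened toy example (hypothetical), modelled on the end of the last instance.
masked = ["[CLS]", "cass", "[MASK]", "super", "[MASK]", "##tious", ".", "[SEP]"]
positions = [2, 4]
labels = ["was", "##sti"]
restored = restore_masked(masked, positions, labels)
print(restored)
```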
INFO:tensorflow:*** Example ***
INFO:tensorflow:tokens: [CLS] at [MASK] [unused513] reached the quay at the opposite [MASK] of the street ; [SEP] [MASK] , wonderful to relate [MASK] [MASK] an irregular , shape ##less [MASK] of crude ore [MASK] fresh from nature ' s disagreements ##ucible , muse a bit of jewel ##er ' s hand [MASK] ##raf ##t in the form of a plain gold ring . looking at [MASK] [MASK] at ##ten ##tively , he saw that it bore the inscription , " may to cass . " like most [MASK] his fellow gold - seekers , cass [MASK] super ##sti ##tious . [SEP]
INFO:tensorflow:input_ids: 101 2012 103 518 2584 1996 21048 2012 1996 4500 103 1997 1996 2395 1025 102 103 1010 6919 2000 14396 103 103 2019 12052 1010 4338 3238 103 1997 13587 10848 103 4840 2013 3267 1005 1055 23145 21104 1010 18437 1037 2978 1997 13713 2121 1005 1055 2192 103 27528 2102 1999 1996 2433 1997 1037 5810 2751 3614 1012 2559 2012 103 103 2012 6528 25499 1010 2002 2387 2008 2009 8501 1996 9315 1010 1000 2089 2000 16220 1012 1000 2066 2087 103 2010 3507 2751 1011 24071 1010 16220 103 3565 16643 20771 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:masked_lm_positions: 2 3 10 16 21 22 28 32 38 41 50 64 65 86 94 0 0 0 0 0
INFO:tensorflow:masked_lm_ids: 2197 2027 2203 2021 1010 2025 15778 1010 13675 2021 2594 2009 2062 1997 2001 0 0 0 0 0
INFO:tensorflow:masked_lm_weights: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0
INFO:tensorflow:next_sentence_labels: 1
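The logged example shows the padding scheme: `input_ids`, `input_mask` and `segment_ids` are padded with 0 up to a fixed sequence length (128 here, with 100 real tokens), and the masked-LM fields are padded up to a fixed number of predictions (20 here, with 15 real ones), using weight 1.0 for real predictions and 0.0 for padding. A minimal pure-Python sketch of that padding follows. The function name `pad_features` is hypothetical, the sample ids are shortened, and the mapping of `is_random_next` to label 1 is an assumption based on how the upstream BERT script is understood to behave.

```python
def pad_features(input_ids, segment_ids, masked_positions, masked_ids,
                 max_seq_length, max_predictions_per_seq, is_random_next):
    """Pad one instance to fixed lengths, mirroring the logged fields."""
    if len(input_ids) > max_seq_length:
        raise ValueError("sequence longer than max_seq_length")
    if len(masked_positions) > max_predictions_per_seq:
        raise ValueError("too many masked positions")

    seq_pad = max_seq_length - len(input_ids)
    pred_pad = max_predictions_per_seq - len(masked_positions)

    return {
        "input_ids": list(input_ids) + [0] * seq_pad,
        # 1 marks a real token, 0 marks padding.
        "input_mask": [1] * len(input_ids) + [0] * seq_pad,
        "segment_ids": list(segment_ids) + [0] * seq_pad,
        "masked_lm_positions": list(masked_positions) + [0] * pred_pad,
        "masked_lm_ids": list(masked_ids) + [0] * pred_pad,
        # Padded predictions get weight 0.0 so they can be ignored in the loss.
        "masked_lm_weights": [1.0] * len(masked_positions) + [0.0] * pred_pad,
        # Assumption: 1 means sentence B was a random sentence.
        "next_sentence_labels": 1 if is_random_next else 0,
    }


# Shortened toy instance (hypothetical lengths: 8 tokens, 3 predictions).
features = pad_features([101, 2012, 103, 102], [0, 0, 0, 0], [2], [2197],
                        max_seq_length=8, max_predictions_per_seq=3,
                        is_random_next=True)
for name, value in features.items():
    print(name, value)
```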
One complete record as written to the TFRecord file