BERT Outputs and Examples

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 624/624 [00:00<00:00, 171kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 110k/110k [00:01<00:00, 109kB/s]
model = AutoModel.from_pretrained("bert-base-chinese")
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 412M/412M [08:38<00:00, 794kB/s]
model.eval()
BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(21128, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (1): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (2): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (3): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (4): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (5): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (6): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (7): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (8): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (9): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (10): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (11): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
  )
  (pooler): BertPooler(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (activation): Tanh()
  )
)
inputs = tokenizer("我想快点发论文", return_tensors="pt")
outputs = model(**inputs)
print(inputs)
{'input_ids': tensor([[ 101, 2769, 2682, 2571, 4157, 1355, 6389, 3152,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}
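To see where these 9 IDs come from, here is a minimal sketch (reusing the tokenizer loaded above) that maps the IDs back to tokens; bert-base-chinese splits Chinese text into single characters and adds the special [CLS] and [SEP] tokens:

# Map the input IDs back to tokens to show the added special tokens.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
print(tokens)
# ['[CLS]', '我', '想', '快', '点', '发', '论', '文', '[SEP]']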

> token_type_ids: This tensor maps every token to its corresponding segment (see below).
> attention_mask: This tensor is used to "mask" padded values in a batch of sequences with different lengths (see below).
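A minimal sketch (reusing the tokenizer and model above; the second sentence is made up for illustration) of how padding=True produces an attention_mask that marks padded positions with 0 in a batch of sequences with different lengths:

# Two sentences of different lengths; the shorter one is padded and
# its attention_mask is 0 at the padded positions.
batch = tokenizer(["我想快点发论文", "我想毕业"], padding=True, return_tensors="pt")
print(batch["attention_mask"])
# tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1],
#         [1, 1, 1, 1, 1, 1, 0, 0, 0]])
batch_outputs = model(**batch)
print(batch_outputs[0].shape)  # torch.Size([2, 9, 768])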

print(outputs)
(tensor([[[-0.2791,  0.3020,  0.4071,  ...,  0.2707,  0.5302, -0.7799],
         [ 0.5064, -0.5631,  0.6345,  ..., -1.0366,  0.2625, -0.1994],
         [ 0.2194, -1.4004, -0.6083,  ..., -0.0954,  1.3527,  0.1086],
         ...,
         [-1.3459,  0.2393, -0.1635,  ...,  0.2543,  0.3820, -0.6676],
         [-0.1877,  0.2440, -0.7461,  ...,  0.7493,  1.2351, -0.6387],
         [-0.6366,  0.0020,  0.1719,  ...,  0.5995,  1.0313, -0.5601]]],
       grad_fn=<NativeLayerNormBackward>), tensor([[ 0.9998,  1.0000,  0.9645,  0.9815,  0.8940,  0.9501, -0.8997, -0.0834,
          0.9970, -0.9983,  1.0000,  0.9376, -0.7621, -0.9907,  0.9997, -0.9996,
         -0.3555,  0.9997,  0.9910,  0.3025,  1.0000, -1.0000, -0.9852,  0.4327,
          0.5883,  0.9987,  0.9467, -0.9238, -1.0000,  0.9963,  0.9257,  0.9996,
          0.9616, -1.0000, -0.9998,  0.9128, -0.3828,  0.9679, -0.9496, -0.9962,
         -0.9628, -0.9890,  0.9602, -0.9962, -0.9925,  0.3300, -1.0000, -1.0000,
          0.2199,  1.0000, -0.6426, -0.9999, -0.7568,  0.6757, -0.3493,  0.9893,
         -0.9990,  0.9890,  1.0000,  0.9505,  0.9990, -0.9834,  0.3967, -0.9999,
          1.0000, -0.9995, -0.9630,  0.9326,  1.0000,  1.0000, -0.0279,  0.9966,
          1.0000,  0.9974, -0.2731,  0.8287, -0.9996,  0.8198, -1.0000,  0.6264,
          1.0000,  0.9987, -0.8188,  0.9369, -0.9729, -0.9999, -1.0000,  0.9999,
         -0.4467,  0.9947,  0.9993, -0.9996, -1.0000,  0.9974, -0.9990, -0.9990,
         -0.9679,  0.9996, -0.8413,  0.2180, -0.6675,  0.9137, -0.7086, -0.8763,
          0.9966,  0.9984,  0.4311, -0.9998,  0.9999,  0.6164, -1.0000, -0.8914,
         -0.9612, -0.8292, -0.9808,  0.9998,  0.6123, -0.7170,  0.9969, -0.8448,
          0.7740, -0.9997, -0.9848, -0.9205,  0.9992,  1.0000,  0.9951, -0.9998,
          0.9070,  1.0000,  0.9754,  0.9973, -0.9578,  0.9927,  0.9775, -0.9949,
          0.5645, -0.7802,  1.0000,  0.9860,  0.9956, -0.9736,  0.9999, -0.9947,
          1.0000, -1.0000,  0.9985, -1.0000, -0.9999,  0.9999,  0.9669,  1.0000,
         -0.9731,  1.0000, -0.9966, -0.9997,  0.9994,  0.9355,  0.9992, -1.0000,
          0.9987, -0.7195,  0.1207,  0.3249, -1.0000,  0.9999, -0.8803,  1.0000,
          0.9976, -0.9951, -0.9669, -0.9974,  0.8122, -0.9981, -0.9993,  0.9339,
          0.1426,  0.9979, -0.9054, -0.9364,  0.9816, -0.9799, -1.0000,  0.9898,
         -0.1943,  0.9433,  0.9775, -0.2970,  0.9901,  0.9886, -0.0538,  0.9997,
          0.0103,  0.9945,  0.9997,  0.0133, -0.9757, -0.9922, -1.0000, -0.0833,
          1.0000,  0.0641, -0.9990,  0.9401, -1.0000,  0.8613, -0.9716, -0.8053,
         -0.9990, -1.0000,  0.9998, -0.9673, -0.9978,  0.7857, -0.8995,  0.0669,
         -0.9998,  0.8634,  0.9447,  0.8611,  0.6718, -0.9648, -0.9996,  0.9958,
         -0.9966, -0.1681,  0.9996,  1.0000,  0.9989,  0.1419,  0.7520,  0.9611,
          0.9263, -1.0000,  0.9811, -0.9986, -0.9358,  0.9998, -0.9983,  0.9885,
          1.0000,  0.9542,  1.0000, -0.8647, -0.9969, -0.9969,  1.0000,  0.9947,
          0.9998, -0.9988, -1.0000,  0.4735, -0.3244, -1.0000, -0.9999, -0.8571,
          0.9965,  1.0000,  0.8488, -0.9947, -0.9789, -0.9988,  1.0000, -0.9985,
          1.0000,  0.9533, -0.9769, -0.9691,  0.7433, -0.9720, -0.9994, -0.7511,
         -1.0000, -0.9813, -0.9999,  0.9946, -0.9994, -1.0000,  0.9784,  0.9999,
          0.9394, -1.0000,  0.9962,  0.9996, -0.9706, -0.9989,  0.9318, -1.0000,
          1.0000, -0.9965,  0.6798, -0.7347, -0.9824, -0.9889,  0.9999,  0.9979,
         -0.7660, -0.8894, -0.9910, -0.9998, -0.0392,  0.9967, -0.9843,  0.9958,
         -0.7512, -0.9937,  0.9647, -0.9993, -0.9992, -0.3546,  1.0000, -0.7632,
          1.0000,  0.9875,  1.0000,  0.9377, -0.9981,  0.9975,  0.3702, -0.7220,
         -0.9867, -0.9982,  0.9020,  0.3270, -0.7151, -0.9999,  1.0000,  0.9963,
          0.9862,  0.9793, -0.8209,  0.2737,  0.9307, -0.9918,  0.9983, -0.9997,
         -0.9612,  0.9971,  1.0000,  0.9990,  0.6963, -0.5588,  0.9991, -0.9821,
          0.9992, -0.9999,  0.9996, -0.9828,  0.9922, -0.8171, -0.9853,  1.0000,
          0.9108, -0.6582,  0.9999, -0.9978,  0.9967,  0.9996,  0.9976,  0.9977,
          0.8987,  1.0000, -0.7873, -0.9080, -0.9510, -0.9975, -0.9966, -1.0000,
          0.7787, -0.9999, -0.9692, -0.9584,  0.1324,  0.7053, -0.7222, -0.3167,
         -0.8888,  0.5796, -0.9228,  0.2450,  0.9680, -0.9947, -0.9684, -1.0000,
         -0.9987,  0.9993,  1.0000, -1.0000,  0.9365, -1.0000, -0.9996,  0.9907,
         -0.8451, -0.7223,  0.9997, -1.0000, -0.4415,  0.9998,  1.0000,  0.9986,
          1.0000,  0.1226, -1.0000, -0.9999, -1.0000, -1.0000, -0.9998,  0.9929,
          0.9578, -1.0000, -0.9742,  0.7187,  1.0000,  0.9738, -0.9971, -0.8223,
         -0.9919, -0.9984,  0.9314, -0.8885, -0.9992,  0.9994, -0.4979,  1.0000,
         -0.7022,  0.9989,  0.9748,  0.9106,  0.9810, -1.0000,  0.9882,  1.0000,
          0.9607, -1.0000, -0.7742, -0.9313, -1.0000, -0.2988,  0.9512,  0.9999,
         -1.0000, -0.8857, -0.9675,  0.3590,  0.9727,  0.9997,  0.9981,  0.9886,
          0.9842,  0.9912,  0.6137,  1.0000,  0.6532, -0.9994,  0.9954,  0.5485,
          0.5065, -0.9515,  0.9988,  0.3796,  1.0000,  0.9921,  0.1803, -0.9859,
         -0.9986,  0.9672,  1.0000, -0.9498, -0.4640, -0.9991, -1.0000, -0.9871,
         -0.4957,  0.5904, -0.9589, -0.9993,  0.1991,  0.9651,  1.0000,  1.0000,
          0.9999, -0.9969, -0.6101,  0.9843,  0.1021,  0.9973, -0.8886, -1.0000,
         -0.9251, -0.9997,  1.0000, -0.9677, -0.5649, -0.6651, -0.5434,  0.8480,
         -1.0000, -0.9628, -0.9986,  0.2291,  1.0000, -0.9999,  0.9837, -0.9990,
          0.2159,  0.6134,  0.5105,  0.9952, -0.9759, -0.7753, -0.7218, -0.6030,
          0.9849,  0.9954, -0.9997,  0.7897,  0.9976, -0.9826,  0.9989,  0.5673,
          0.9489,  0.9599,  1.0000,  0.9166,  0.9998,  0.9644,  1.0000,  0.9999,
         -0.9951,  0.8626,  0.8856, -0.8395, -0.4970,  0.9975,  0.9999, -0.8229,
         -0.9770, -0.9996,  0.9991,  1.0000,  1.0000, -0.8960,  0.9961, -0.9934,
          0.9984,  0.6706,  0.7453,  0.1472,  0.6518,  0.9986,  0.9955, -0.9999,
         -1.0000, -1.0000,  1.0000,  0.9999, -0.8356, -1.0000,  0.9984, -0.9732,
          0.9919,  0.9817,  0.9507, -0.9920,  0.9976, -0.9972,  0.3327,  0.3722,
          0.3626,  0.6790,  0.9978, -0.9987,  0.9520,  1.0000, -0.8148,  1.0000,
          0.5202, -0.9999,  1.0000, -0.9998, -0.9996, -0.2209,  1.0000,  0.9991,
          0.8423, -0.1879,  0.9997, -1.0000,  0.9999, -0.9999, -0.9204, -0.9967,
          0.9998, -0.9961, -0.9971, -0.8639,  0.9536,  0.9987, -0.9585,  0.9999,
         -0.7864, -0.9968, -0.4349, -0.9943, -0.9930, -0.9829, -0.8985, -1.0000,
          0.7084, -0.6908, -0.8356, -0.9948, -1.0000,  1.0000, -0.9500, -0.9885,
          1.0000, -0.9992, -1.0000,  0.9897, -0.9980,  0.6656,  0.9634,  0.9705,
          0.4371, -1.0000,  0.9464,  1.0000, -0.8603, -0.9867, -0.9457, -0.9981,
          0.7120,  0.9582,  0.9953, -0.6211,  0.8999,  0.8723,  0.7517, -0.5116,
          0.8407, -0.9990, -0.9974, -0.9970, -0.7413, -1.0000, -1.0000,  1.0000,
          1.0000,  1.0000, -0.6371, -0.6810,  0.7162,  0.9956, -0.9997, -0.8777,
          0.9699,  0.8117,  0.9271, -0.9991, -0.8475, -1.0000, -0.9698,  0.2998,
         -0.9945, -0.0448,  1.0000,  0.9998, -0.9997, -0.9898, -0.9219, -0.9991,
          0.9996,  0.9984,  0.9968, -0.9933, -0.5550,  0.9754, -0.7183, -0.9151,
         -0.9991, -0.9382, -1.0000,  0.9057, -0.9885, -1.0000,  0.9973,  1.0000,
          0.9242, -1.0000,  0.0660,  1.0000,  0.9342,  1.0000,  0.8927,  0.9997,
         -0.9955,  0.9968, -0.9999,  1.0000, -1.0000,  1.0000,  1.0000,  0.9991,
          0.9965, -0.9897,  0.9681, -0.9609,  0.5623,  0.9879, -0.6333, -0.9794,
          0.7290,  0.6294, -0.9982,  1.0000,  0.9107, -0.7149,  0.9776, -0.1279,
          0.9970,  0.0862, -0.9417,  0.9984,  1.0000,  0.9805,  1.0000,  0.9411,
          1.0000, -0.8665, -0.9994,  0.9989, -0.5152, -0.9079, -1.0000,  1.0000,
          0.9624, -1.0000, -0.9439, -0.5568,  0.6311,  1.0000,  0.9984,  0.9976,
          0.9940,  0.8607,  0.9985, -0.9319,  0.8799, -0.9996, -0.9878,  1.0000,
         -0.9966,  0.9999, -0.9905,  0.9212, -0.9967,  0.9057,  0.8096,  0.9610,
         -0.9925,  1.0000,  0.9798, -0.9822, -0.9991, -0.9775, -0.9966,  0.9923]],
       grad_fn=<TanhBackward>))
print(outputs[0].shape)
torch.Size([1, 9, 768])
print(outputs[1].shape)
torch.Size([1, 768])

BERT returns two tensors by default (two more, the per-layer hidden states and the attentions, can optionally be returned):
1. outputs[0] is last_hidden_state
outputs[0] holds one representation per token, with shape (1, NB_TOKENS, REPRESENTATION_SIZE)
It operates at the token level

The first, token-based, representation can be leveraged if your task requires keeping the sequence representation and you want to operate at the token level. This is particularly useful for Named Entity Recognition and Question-Answering.
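A minimal sketch (reusing the outputs computed above) of working at the token level, e.g. pulling out the vector of a single character, the kind of per-token access NER or QA heads build on:

# last_hidden_state holds one 768-dim vector per token, including [CLS] and [SEP].
last_hidden_state = outputs[0]               # shape (1, 9, 768)
cls_vector = last_hidden_state[0, 0]         # vector of the [CLS] token
char_vector = last_hidden_state[0, 3]        # vector of the 4th token ("快", given the tokenization above)
print(cls_vector.shape, char_vector.shape)   # torch.Size([768]) torch.Size([768])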

When all layers are returned, the full set of hidden states has four dimensions, in order [# layers, # batches, # tokens, # features]:
The layer number (12 encoder layers)
The batch number (1 sentence)
The word / token number (9 tokens in our sentence)
The hidden unit / feature number (768 features)

In the older API the first of these dimensions is actually a Python list rather than a tensor axis (see the sketch after the snippet below)


# `encoded_layers` is a Python list.
print('     Type of encoded_layers: ', type(encoded_layers))

     Type of encoded_layers:  <class 'list'>
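The encoded_layers snippet above comes from the older pytorch-pretrained-bert API. A minimal sketch of the equivalent with the transformers API used in this post (assuming a version where output_hidden_states is accepted as a forward argument; otherwise set it on the config), stacking the per-layer states into the [# layers, # batches, # tokens, # features] layout:

import torch

# Ask the model to also return the hidden states of every layer.
outputs_all = model(**inputs, output_hidden_states=True)
hidden_states = outputs_all[2]               # tuple: embedding output + 12 encoder layers
print(len(hidden_states))                    # 13
stacked = torch.stack(hidden_states[1:])     # keep only the 12 encoder layers
print(stacked.shape)                         # torch.Size([12, 1, 9, 768])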

2. outputs[1] is pooler_output
outputs[1] is an aggregated representation of the whole input, with shape (1, REPRESENTATION_SIZE)
It captures the whole sequence rather than operating at the token level

The second, aggregated, representation is especially useful if you need to extract the overall context of the sequence and don't require a fine-grained, token-level representation. This is the case for Sentiment Analysis of the sequence or Information Retrieval.

>>> options = ['意气风发', '街谈巷议']
>>> inputs = tokenizer(options, return_tensors="pt")
>>> outputs = model(**inputs)
>>> print(outputs[0].shape)
torch.Size([2, 6, 768])
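Building on the two-option batch above, a minimal sketch comparing the two idioms through their pooler_output vectors with cosine similarity (purely illustrative; without fine-tuning, pooler_output is not necessarily a good sentence embedding):

import torch.nn.functional as F

pooled = outputs[1]                          # shape (2, 768), one vector per sentence
similarity = F.cosine_similarity(pooled[0], pooled[1], dim=0)
print(similarity.item())                     # a single score in [-1, 1]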

Example of a pipeline

>>> from transformers import pipeline
>>> question_answerer = pipeline('question-answering')
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 473/473 [00:00<00:00, 124kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 261M/261M [07:06<00:00, 612kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 213k/213k [00:02<00:00, 106kB/s]
>>> question_answerer({
...      'question': 'What is the name of the repository ?',
...      'context': 'Pipeline have been included in the huggingface/transformers repository'
... })
{'score': 0.5135956406593323, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}
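The question-answering pipeline above downloads an English QA model by default. A similar sketch using the fill-mask pipeline with the same bert-base-chinese checkpoint used in this post (the exact predictions are not shown here and will depend on the model):

from transformers import pipeline

fill_mask = pipeline('fill-mask', model='bert-base-chinese')
print(fill_mask('我想快点发[MASK]文')[0])    # top prediction: a dict with 'sequence', 'score', 'token', 'token_str'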
