Walkthrough of the Keras example file imdb_fasttext.py

Functionally, this example is also a text sentiment classifier.

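With the default setting ngram_range = 1, this is close to a routine NLP pipeline: each word is mapped to an integer index and the indices are fed to an Embedding layer. That case is fairly simple.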

So let's analyze the case ngram_range > 1 instead. We start with ngram_range = 2. The first sentence in x_train then goes through the following transformation. The original sentence:

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]

After the transformation:

{(16, 6), (18, 4), (4, 173), (39, 4), (469, 4), (4613, 469), (4536, 1111), (130, 12), (28, 224), (26, 400), (77, 52), (112, 167), (76, 15), (215, 28), (19, 14), (226, 65), (65, 458), (38, 76), (12, 215), (13, 1247), (317, 46), (381, 15), (16, 4472), (36, 256), (407, 16), (25, 100), (32, 15), (194, 7486), (5, 723), (88, 4), (141, 6), (385, 39), (15, 297), (26, 141), (2025, 19), (16, 38), (103, 32), (5952, 15), (447, 4), (22, 12), (670, 2), (33, 4), (5, 14), (82, 10311), (16, 283), (18, 51), (13, 104), (28, 77), (2071, 56), (15, 16), (12, 8), (973, 1622), (15, 13), (1247, 4), (400, 317), (46, 7), (3766, 5), (17, 546), (515, 17), (124, 51), (14, 22), (1385, 65), (1334, 88), (297, 98), (22, 16), (56, 26), (1622, 1385), (480, 5), (25, 104), (10311, 8), (112, 50), (71, 87), (224, 92), (178, 32), (106, 5), (147, 2025), (5, 4), (104, 88), (19, 178), (38, 619), (144, 30), (25, 1415), (150, 4), (4, 12118), (626, 18), (87, 12), (107, 117), (43, 530), (476, 26), (104, 4), (2, 7), (3785, 33), (117, 5952), (5, 144), (65, 16), (192, 50), (7, 4), (98, 32), (226, 22), (17, 12), (4, 226), (66, 3941), (16, 82), (25, 124), (32, 2071), (5, 62), (546, 38), (35, 480), (173, 36), (1029, 13), (16, 5345), (4, 22), (386, 12), (530, 476), (284, 5), (3941, 4), (36, 135), (5535, 18), (5244, 16), (1, 14), (12118, 1029), (172, 4536), (316, 8), (530, 38), (33, 6), (8, 316), (5, 150), (256, 4), (7, 3766), (7486, 18), (88, 12), (167, 2), (4472, 113), (22, 17), (4, 2223), (22, 4), (6, 194), (1920, 4613), (172, 112), (48, 25), (4, 2), (838, 112), (43, 838), (4, 192), (6, 147), (16, 626), (5, 16), (1111, 17), (19193, 5), (723, 36), (4, 172), (22, 71), (16, 480), (135, 48), (38, 1334), (8, 4), (4, 381), (2, 336), (530, 973), (18, 19193), (619, 5), (4468, 66), (62, 386), (51, 36), (8, 106), (17, 515), (30, 5535), (458, 4468), (38, 13), (12, 16), (21, 134), (13, 447), (480, 66), (4, 1920), (16, 43), (2, 9), (66, 3785), (1415, 33), (15, 256), (336, 385), (5, 25), (5345, 19), (6, 22), (283, 5), (71, 43), 
(50, 670), (14, 407), (92, 25), (22, 21), (113, 103), (134, 476), (50, 16), (256, 5), (4, 130), (100, 43), (36, 71), (36, 28), (26, 480), (9, 35), (2223, 5244), (52, 5), (4, 107), (480, 284)}

 

This is hard to read directly. The docstring in the code gives a simple example: with ngram_value = 2, if the input is

[1, 4, 9, 4, 1, 4]

then the output is

{(4, 9), (4, 1), (1, 4), (9, 4)}

In other words, adjacent elements are paired up in order, and duplicates are removed.

With ngram_value = 3 and the same input

[1, 4, 9, 4, 1, 4]

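the output is (written as a list here, but the four trigrams are all distinct, so the elements are the same as in a set)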

[(1, 4, 9), (4, 9, 4), (9, 4, 1), (4, 1, 4)]
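The n-gram extraction described above can be sketched in a few lines. This is a minimal version written to match the docstring examples, assuming the function is called create_ngram_set and returns a set of tuples; it is not necessarily character-for-character what the example file contains.

```python
def create_ngram_set(input_list, ngram_value=2):
    """Return the set of distinct n-grams (as tuples) found in input_list."""
    # Build ngram_value shifted copies of the list. zip stops at the shortest
    # copy, so every emitted tuple is a window of consecutive elements.
    shifted = [input_list[i:] for i in range(ngram_value)]
    return set(zip(*shifted))


sentence = [1, 4, 9, 4, 1, 4]
bigrams = create_ngram_set(sentence, ngram_value=2)
trigrams = create_ngram_set(sentence, ngram_value=3)

# Sets are unordered, so sort before printing to get a stable display.
print(sorted(bigrams))
print(sorted(trigrams))
```

The bigram (1, 4) occurs twice in the input but appears only once in the result, which is the de-duplication mentioned above.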

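There is also an outer for loop over the n-gram sizes from 2 up to ngram_range, so with ngram_range = 3 the sentence above yields both bigrams and trigrams: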

{(1, 14, 22), (16, 6), (4472, 113, 103), (66, 3941, 4), (16, 82, 10311), (18, 4), (4, 173), (4613, 469), (130, 12), (26, 141, 6), (26, 400), (316, 8, 106), (12, 16, 43), (38, 1334, 88), (226, 65), (147, 2025, 19), (16, 5345, 19), (135, 48, 25), (65, 458), (104, 4, 226), (381, 15), (16, 4472), (4, 1920, 4613), (36, 256), (18, 4, 226), (4468, 66, 3941), (14, 22, 4), (71, 87, 12), (194, 7486), (5, 723), (88, 4), (141, 6), (385, 39), (144, 30, 5535), (2025, 19), (5244, 16, 480), (16, 38), (2071, 56, 26), (5952, 15), (22, 12), (33, 4), (18, 51), (43, 530, 38), (12118, 1029, 13), (28, 77), (15, 16), (12, 8), (973, 1622), (15, 13), (16, 6, 147), (476, 26, 400), (1247, 4), (3766, 5), (172, 4536, 1111), (17, 546), (4, 107, 117), (56, 26, 141), (48, 25, 1415), (1385, 65, 458), (22, 71, 87), (13, 1247, 4), (28, 224, 92), (1334, 88), (22, 16), (56, 26), (71, 43, 530), (317, 46, 7), (5, 16, 4472), (25, 104), (10311, 8), (224, 92), (16, 626, 18), (106, 5), (38, 619), (144, 30), (17, 515, 17), (25, 1415), (50, 16, 6), (150, 4), (6, 22, 12), (626, 18), (87, 12), (43, 530), (2, 7), (4, 2, 7), (117, 5952), (5, 144), (150, 4, 172), (65, 16), (6, 147, 2025), (385, 39, 4), (5, 14, 407), (5, 144, 30), (1334, 88, 12), (7, 4), (16, 38, 1334), (297, 98, 32), (226, 22), (16, 283, 5), (2, 7, 3766), (17, 12), (66, 3941), (32, 15, 16), (25, 124), (32, 2071), (5, 62), (35, 480), (400, 317, 46), (4, 192, 50), (19, 14, 22), (8, 4, 107), (39, 4, 172), (107, 117, 5952), (141, 6, 194), (284, 5), (36, 135), (52, 5, 14), (21, 134, 476), (5244, 16), (9, 35, 480), (1, 14), (50, 670, 2), (12118, 1029), (172, 4536), (10311, 8, 4), (530, 38), (226, 65, 16), (8, 316), (5, 150), (88, 12), (7, 3766), (18, 19193, 5), (8, 316, 8), (167, 2), (4, 12118, 1029), (113, 103, 32), (6, 194, 7486), (112, 50, 670), (22, 17), (4, 2223), (51, 36, 28), (22, 4), (6, 194), (256, 5, 25), (172, 112), (480, 284, 5), (838, 112), (16, 626), (4, 192), (530, 973, 1622), (256, 4, 2), (2, 336, 385), (1111, 17), (19193, 5), (82, 10311, 
8), (16, 480), (22, 71), (135, 48), (5535, 18, 51), (38, 1334), (336, 385, 39), (16, 38, 619), (12, 16, 38), (4, 381), (2, 336), (480, 66, 3785), (117, 5952, 15), (619, 5), (62, 386), (51, 36), (12, 8, 316), (36, 135, 48), (4536, 1111, 17), (30, 5535), (458, 4468), (723, 36, 71), (21, 134), (98, 32, 2071), (480, 66), (838, 112, 50), (626, 18, 19193), (530, 476, 26), (16, 43), (13, 104, 88), (66, 3785), (1415, 33), (92, 25, 104), (65, 16, 38), (15, 256), (65, 458, 4468), (28, 77, 52), (619, 5, 25), (283, 5), (71, 43), (2223, 5244, 16), (50, 670), (14, 407), (38, 13, 447), (167, 2, 336), (22, 21, 134), (7, 3766, 5), (4, 22, 71), (50, 16), (1111, 17, 546), (476, 26, 480), (4, 130), (3941, 4, 173), (22, 4, 1920), (25, 124, 51), (88, 12, 16), (36, 71), (26, 480), (9, 35), (5952, 15, 256), (4, 107), (4613, 469, 4), (973, 1622, 1385), (4, 172, 4536), (4, 226, 65), (39, 4), (12, 16, 626), (469, 4), (16, 480, 66), (4536, 1111), (28, 224), (173, 36, 256), (77, 52), (36, 256, 5), (112, 167), (76, 15), (16, 4472, 113), (215, 28), (19, 14), (43, 530, 973), (172, 112, 167), (458, 4468, 66), (317, 46), (38, 76), (12, 215), (13, 1247), (407, 16), (407, 16, 82), (25, 100), (32, 15), (194, 7486, 18), (4, 22, 17), (36, 28, 224), (88, 4, 381), (15, 297), (4, 226, 22), (130, 12, 16), (26, 141), (103, 32), (447, 4), (670, 2), (77, 52, 5), (5, 4, 2223), (5, 14), (82, 10311), (16, 283), (13, 104), (2071, 56), (2, 9, 35), (5345, 19, 178), (400, 317), (46, 7), (4, 172, 112), (66, 3785, 33), (33, 6, 22), (515, 17), (124, 51), (14, 22), (5, 150, 4), (19193, 5, 62), (19, 178, 32), (1385, 65), (297, 98), (546, 38, 13), (4, 2223, 5244), (32, 2071, 56), (1622, 1385), (12, 16, 283), (480, 5), (104, 88, 4), (2025, 19, 14), (112, 50), (71, 87), (13, 447, 4), (22, 12, 215), (178, 32), (5, 4), (147, 2025), (104, 88), (19, 178), (15, 13, 1247), (226, 22, 21), (22, 17, 515), (447, 4, 192), (4, 381, 15), (4, 12118), (107, 117), (476, 26), (25, 100, 43), (104, 4), (22, 16, 43), (3785, 33), (8, 106, 5), 
(192, 50), (98, 32), (4, 226), (26, 400, 317), (134, 476, 26), (18, 51, 36), (16, 82), (530, 38, 76), (15, 256, 4), (546, 38), (173, 36), (670, 2, 9), (1029, 13), (16, 5345), (4, 22), (386, 12), (530, 476), (386, 12, 8), (3941, 4), (5535, 18), (1247, 4, 22), (17, 12, 16), (469, 4, 22), (316, 8), (33, 6), (256, 4), (51, 36, 135), (7486, 18), (76, 15, 13), (25, 104, 4), (4472, 113), (224, 92, 25), (1920, 4613), (17, 546, 38), (38, 619, 5), (1622, 1385, 65), (48, 25), (4, 2), (62, 386, 12), (43, 838), (283, 5, 16), (6, 147), (15, 297, 98), (5, 16), (100, 43, 838), (723, 36), (215, 28, 77), (36, 71, 43), (4, 172), (5, 723, 36), (3785, 33, 4), (1029, 13, 104), (8, 4), (192, 50, 16), (530, 973), (26, 480, 5), (43, 838, 112), (18, 19193), (112, 167, 2), (515, 17, 12), (4468, 66), (103, 32, 15), (8, 106), (3766, 5, 723), (4, 130, 12), (17, 515), (38, 13), (12, 16), (13, 447), (43, 530, 476), (381, 15, 297), (4, 1920), (7486, 18, 4), (46, 7, 4), (7, 4, 12118), (2, 9), (480, 5, 144), (336, 385), (5, 25), (1920, 4613, 469), (5345, 19), (15, 16, 5345), (6, 22), (14, 22, 16), (38, 76, 15), (92, 25), (22, 21), (5, 25, 100), (4, 173, 36), (113, 103), (134, 476), (1415, 33, 6), (25, 1415, 33), (87, 12, 16), (256, 5), (284, 5, 150), (35, 480, 284), (30, 5535, 18), (124, 51, 36), (106, 5, 4), (100, 43), (14, 407, 16), (16, 43, 530), (36, 28), (5, 62, 386), (2223, 5244), (33, 4, 130), (52, 5), (5, 25, 124), (12, 215, 28), (480, 284)}

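Every n-gram collected this way is then assigned a new integer id, numbered from 20001 upward so that the ids do not collide with the original vocabulary of max_features = 20000 words. For example: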

(15833, 395):20001
(217, 17, 10655):20002
(999, 55, 76):20003
(9805, 1031, 17419):20004
(424, 383, 139):20005
(6213, 139):20006
(190, 4, 1631):20007
(4, 1300, 20):20008
(181, 8, 1271):20009
(1818, 11, 2642):20010
(26, 11, 25):20011
(2159, 80, 376):20012
(171, 5392, 306):20013
(6, 1703, 56):20014
(25, 701):20015
(18, 85, 2851):20016
(3048, 23, 111):20017
(93, 35, 4843):20018
(569, 56):20019
(2876, 60):20020
(42, 110, 17):20021
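The id assignment can be sketched as follows. The helper name build_ngram_index and the toy corpus are made up for illustration; the only things taken from the listing above are that ids start at max_features + 1 and that each distinct n-gram gets exactly one id.

```python
def build_ngram_index(sequences, max_features, ngram_range):
    """Map every distinct n-gram (sizes 2..ngram_range) to a fresh integer id."""
    ngram_set = set()
    for seq in sequences:
        for n in range(2, ngram_range + 1):
            ngram_set.update(zip(*[seq[i:] for i in range(n)]))

    # Ids start just past the word vocabulary. Set iteration order is
    # arbitrary, so which n-gram receives which id is arbitrary as well;
    # only uniqueness matters.
    start_index = max_features + 1
    token_index = {ngram: start_index + k for k, ngram in enumerate(ngram_set)}

    # The Embedding layer needs one row per id, so the vocabulary size grows.
    new_max_features = max(token_index.values(), default=max_features) + 1
    return token_index, new_max_features


toy_corpus = [[1, 3, 4, 5], [1, 3, 7, 9, 2]]  # hypothetical data
token_index, new_max_features = build_ngram_index(
    toy_corpus, max_features=20000, ngram_range=2
)
print(len(token_index), new_max_features)
```

Because the ids depend on set iteration order, a rerun may pair n-grams with different numbers than the ones listed above; that does not affect training as long as the same mapping is used for x_train and x_test.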

x_train is then re-encoded: the ids of a sentence's n-grams are appended after its original word ids. The sentence above becomes:

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32, 2072507, 1266161, 2923546, 1154584, 1857063, 1616068, 1714474, 4329328, 3275896, 444409, 4244999, 357021, 2204386, 4017303, 2436358, 2414342, 4564906, 3770144, 3500479, 872174, 1863798, 943035, 4384822, 3298316, 195462, 1705545, 3445315, 2679380, 3897218, 1369975, 1444853, 2587497, 1223247, 1834019, 1337290, 4361931, 3051937, 1919727, 4240690, 3591932, 3247084, 1834019, 2667860, 1494110, 1668106, 4598984, 3336776, 1875202, 805524, 414096, 3487212, 376682, 1203953, 3715275, 2906786, 663022, 405105, 462105, 1266161, 366345, 535987, 1350662, 1664509, 385648, 774138, 3219183, 4457427, 1608894, 175325, 1154584, 1857063, 1069587, 2481469, 3957733, 3541066, 1912467, 3279661, 774138, 3526297, 4601076, 633982, 4334780, 175325, 1945899, 805117, 2640072, 3052531, 4551342, 4317108, 3008452, 325845, 4599613, 554077, 2657378, 1074210, 1845337, 2295163, 585777, 1856528, 2955556, 3303299, 2983986, 3078909, 881581, 1234294, 1293338, 175325, 3110261, 4404168, 3227723, 3500479, 1324117, 1270770, 621300, 1546451, 3489999, 2370600, 2453465, 2706265, 
4340821, 2465168, 215741, 781388, 2768420, 787354, 4580535, 3649558, 793926, 4284058, 2808043, 2214076, 37409, 1631135, 4313922, 2423293, 4546416, 1925874, 3957466, 1566896, 3167924, 3482887, 4649257, 1533431, 2710579, 1273401, 1371645, 339091, 2955020, 1857063, 1048333, 371041, 4632662, 4462444, 4184160, 3683844, 1130189, 4103711, 634353, 3859054, 4111666, 4652943, 280577, 3342187, 2548984, 512978, 2450927, 623789, 4333054, 422568, 2075762, 2843163, 4361077, 1399187, 2888161, 1212796, 1645243, 4086532, 2231317, 1269240, 3947951, 1087730, 371041, 270624, 1653006, 1535085, 4120352, 3588582, 4601017, 2544481, 621300, 1837573, 4555554, 2941959, 1858587, 3278478, 1338615, 4086532, 270907, 893670, 3110261, 4682638, 3198549, 431569, 175325, 1288546, 3081025, 3950486, 2776183, 288850, 3764569, 1166022, 562043, 2037223, 1079073, 394421, 1224674, 2967649, 3535840, 447226, 4241137, 4175770, 1241778, 497478, 4309649, 4621166, 955680, 2298045, 3548735, 4076356, 3233816, 951955, 2732147, 3275823, 1028990, 2957268, 1138575, 3921875, 3503889, 1136819, 4662389, 3643878, 2644868, 1738871, 2149326, 3878050, 3198969, 4573590, 844683, 3040121, 3424259, 4502983, 3380639, 3118615, 2634687, 4512611, 4413319, 2735167, 3312975, 789314, 3200807, 4325231, 3584612, 4136033, 845060, 1595397, 788821, 4229838, 4232459, 3930004, 1332077, 4354177, 4538725, 781097, 1121875, 3778939, 1483516, 758071, 2903423, 220146, 3444506, 1326694, 3389795, 2446126, 3737982, 4223093, 2303557, 4175770, 584750, 276981, 3810085, 3854999, 2580922, 3449499, 395999, 1056156, 3775697, 4578057, 1800179, 1537039, 3737664, 3690970, 316277, 1358314, 974070, 2642928, 1471662, 275848, 1221052, 1177149, 1380969, 3529334, 3540380, 3340025, 714322, 4284620, 678093, 822874, 517410, 3403412, 1294402, 4462099, 3216640, 3192564, 3209961, 1912923, 3187483, 519710, 621657, 3411325, 2154118, 247815, 4357275, 2942429, 3613380, 1139125, 3667706, 751415, 1782248, 4411849, 3876765, 4368462, 1276321, 3554934, 767011, 3559939, 2170613, 
334046, 4072955, 2837364, 937534, 1290614, 3842601, 2188555, 2878704, 3218101, 3641236, 3562312, 3220767, 2439327, 1987147, 2672246, 3085759, 124840, 1185458, 3827304, 1361435, 2081509, 430292, 2887994, 4631142, 2464758, 3138060, 828823, 352647, 4431844, 2448218, 2374328, 2055547, 3297674, 4432172, 4354034, 1001325, 1245515, 3294590, 1131018, 1588305, 2727175, 972595, 3823373, 844324, 109698, 105837, 3951184, 2410181, 1097190, 1999140, 4499486, 1801256, 1985466, 4617078, 1036023, 452670, 3985467, 3014724, 3740459, 3028703, 4260887, 3758711, 1522965, 4456916, 3459929, 1894454, 3679016, 208032, 127063, 818206, 784645, 4472892, 2195399, 2486473, 754806, 1113581, 767330, 4647531, 2237173, 3330033, 2104257, 2367187, 1055957, 4402226, 2179797, 294426, 288755]

The appended ids come from the following mapping:

(1, 14):2072507
(14, 22):1266161
(22, 16):2923546
(16, 43):1154584
(43, 530):1857063
(530, 973):1616068
(973, 1622):1714474
(1622, 1385):4329328
(1385, 65):3275896
(65, 458):444409
(458, 4468):4244999
(4468, 66):357021
(66, 3941):2204386
(3941, 4):4017303
(4, 173):2436358
(173, 36):2414342
(36, 256):4564906
(256, 5):3770144
(5, 25):3500479
(25, 100):872174
(100, 43):1863798
(43, 838):943035
(838, 112):4384822
(112, 50):3298316
(50, 670):195462
(670, 2):1705545
(2, 9):3445315
(9, 35):2679380
(35, 480):3897218
(480, 284):1369975
(284, 5):1444853
(5, 150):2587497
(150, 4):1223247
(4, 172):1834019
(172, 112):1337290
(112, 167):4361931
(167, 2):3051937
(2, 336):1919727
(336, 385):4240690
(385, 39):3591932
(39, 4):3247084
(4, 172):1834019
(172, 4536):2667860
(4536, 1111):1494110
(1111, 17):1668106
(17, 546):4598984
(546, 38):3336776
(38, 13):1875202
(13, 447):805524
(447, 4):414096
(4, 192):3487212
(192, 50):376682
(50, 16):1203953
(16, 6):3715275
(6, 147):2906786
(147, 2025):663022
(2025, 19):405105
(19, 14):462105
(14, 22):1266161
(22, 4):366345
(4, 1920):535987
(1920, 4613):1350662
(4613, 469):1664509
(469, 4):385648
(4, 22):774138
(22, 71):3219183
(71, 87):4457427
(87, 12):1608894
(12, 16):175325
(16, 43):1154584
(43, 530):1857063
(530, 38):1069587
(38, 76):2481469
(76, 15):3957733
(15, 13):3541066
(13, 1247):1912467
(1247, 4):3279661
(4, 22):774138
(22, 17):3526297
(17, 515):4601076
(515, 17):633982
(17, 12):4334780
(12, 16):175325
(16, 626):1945899
(626, 18):805117
(18, 19193):2640072
(19193, 5):3052531
(5, 62):4551342
(62, 386):4317108
(386, 12):3008452
(12, 8):325845
(8, 316):4599613
(316, 8):554077
(8, 106):2657378
(106, 5):1074210
(5, 4):1845337
(4, 2223):2295163
(2223, 5244):585777
(5244, 16):1856528
(16, 480):2955556
(480, 66):3303299
(66, 3785):2983986
(3785, 33):3078909
(33, 4):881581
(4, 130):1234294
(130, 12):1293338
(12, 16):175325
(16, 38):3110261
(38, 619):4404168
(619, 5):3227723
(5, 25):3500479
(25, 124):1324117
(124, 51):1270770
(51, 36):621300
(36, 135):1546451
(135, 48):3489999
(48, 25):2370600
(25, 1415):2453465
(1415, 33):2706265
(33, 6):4340821
(6, 22):2465168
(22, 12):215741
(12, 215):781388
(215, 28):2768420
(28, 77):787354
(77, 52):4580535
(52, 5):3649558
(5, 14):793926
(14, 407):4284058
(407, 16):2808043
(16, 82):2214076
(82, 10311):37409
(10311, 8):1631135
(8, 4):4313922
(4, 107):2423293
(107, 117):4546416
(117, 5952):1925874
(5952, 15):3957466
(15, 256):1566896
(256, 4):3167924
(4, 2):3482887
(2, 7):4649257
(7, 3766):1533431
(3766, 5):2710579
(5, 723):1273401
(723, 36):1371645
(36, 71):339091
(71, 43):2955020
(43, 530):1857063
(530, 476):1048333
(476, 26):371041
(26, 400):4632662
(400, 317):4462444
(317, 46):4184160
(46, 7):3683844
(7, 4):1130189
(4, 12118):4103711
(12118, 1029):634353
(1029, 13):3859054
(13, 104):4111666
(104, 88):4652943
(88, 4):280577
(4, 381):3342187
(381, 15):2548984
(15, 297):512978
(297, 98):2450927
(98, 32):623789
(32, 2071):4333054
(2071, 56):422568
(56, 26):2075762
(26, 141):2843163
(141, 6):4361077
(6, 194):1399187
(194, 7486):2888161
(7486, 18):1212796
(18, 4):1645243
(4, 226):4086532
(226, 22):2231317
(22, 21):1269240
(21, 134):3947951
(134, 476):1087730
(476, 26):371041
(26, 480):270624
(480, 5):1653006
(5, 144):1535085
(144, 30):4120352
(30, 5535):3588582
(5535, 18):4601017
(18, 51):2544481
(51, 36):621300
(36, 28):1837573
(28, 224):4555554
(224, 92):2941959
(92, 25):1858587
(25, 104):3278478
(104, 4):1338615
(4, 226):4086532
(226, 65):270907
(65, 16):893670
(16, 38):3110261
(38, 1334):4682638
(1334, 88):3198549
(88, 12):431569
(12, 16):175325
(16, 283):1288546
(283, 5):3081025
(5, 16):3950486
(16, 4472):2776183
(4472, 113):288850
(113, 103):3764569
(103, 32):1166022
(32, 15):562043
(15, 16):2037223
(16, 5345):1079073
(5345, 19):394421
(19, 178):1224674
(178, 32):2967649
(1, 14, 22):3535840
(14, 22, 16):447226
(22, 16, 43):4241137
(16, 43, 530):4175770
(43, 530, 973):1241778
(530, 973, 1622):497478
(973, 1622, 1385):4309649
(1622, 1385, 65):4621166
(1385, 65, 458):955680
(65, 458, 4468):2298045
(458, 4468, 66):3548735
(4468, 66, 3941):4076356
(66, 3941, 4):3233816
(3941, 4, 173):951955
(4, 173, 36):2732147
(173, 36, 256):3275823
(36, 256, 5):1028990
(256, 5, 25):2957268
(5, 25, 100):1138575
(25, 100, 43):3921875
(100, 43, 838):3503889
(43, 838, 112):1136819
(838, 112, 50):4662389
(112, 50, 670):3643878
(50, 670, 2):2644868
(670, 2, 9):1738871
(2, 9, 35):2149326
(9, 35, 480):3878050
(35, 480, 284):3198969
(480, 284, 5):4573590
(284, 5, 150):844683
(5, 150, 4):3040121
(150, 4, 172):3424259
(4, 172, 112):4502983
(172, 112, 167):3380639
(112, 167, 2):3118615
(167, 2, 336):2634687
(2, 336, 385):4512611
(336, 385, 39):4413319
(385, 39, 4):2735167
(39, 4, 172):3312975
(4, 172, 4536):789314
(172, 4536, 1111):3200807
(4536, 1111, 17):4325231
(1111, 17, 546):3584612
(17, 546, 38):4136033
(546, 38, 13):845060
(38, 13, 447):1595397
(13, 447, 4):788821
(447, 4, 192):4229838
(4, 192, 50):4232459
(192, 50, 16):3930004
(50, 16, 6):1332077
(16, 6, 147):4354177
(6, 147, 2025):4538725
(147, 2025, 19):781097
(2025, 19, 14):1121875
(19, 14, 22):3778939
(14, 22, 4):1483516
(22, 4, 1920):758071
(4, 1920, 4613):2903423
(1920, 4613, 469):220146
(4613, 469, 4):3444506
(469, 4, 22):1326694
(4, 22, 71):3389795
(22, 71, 87):2446126
(71, 87, 12):3737982
(87, 12, 16):4223093
(12, 16, 43):2303557
(16, 43, 530):4175770
(43, 530, 38):584750
(530, 38, 76):276981
(38, 76, 15):3810085
(76, 15, 13):3854999
(15, 13, 1247):2580922
(13, 1247, 4):3449499
(1247, 4, 22):395999
(4, 22, 17):1056156
(22, 17, 515):3775697
(17, 515, 17):4578057
(515, 17, 12):1800179
(17, 12, 16):1537039
(12, 16, 626):3737664
(16, 626, 18):3690970
(626, 18, 19193):316277
(18, 19193, 5):1358314
(19193, 5, 62):974070
(5, 62, 386):2642928
(62, 386, 12):1471662
(386, 12, 8):275848
(12, 8, 316):1221052
(8, 316, 8):1177149
(316, 8, 106):1380969
(8, 106, 5):3529334
(106, 5, 4):3540380
(5, 4, 2223):3340025
(4, 2223, 5244):714322
(2223, 5244, 16):4284620
(5244, 16, 480):678093
(16, 480, 66):822874
(480, 66, 3785):517410
(66, 3785, 33):3403412
(3785, 33, 4):1294402
(33, 4, 130):4462099
(4, 130, 12):3216640
(130, 12, 16):3192564
(12, 16, 38):3209961
(16, 38, 619):1912923
(38, 619, 5):3187483
(619, 5, 25):519710
(5, 25, 124):621657
(25, 124, 51):3411325
(124, 51, 36):2154118
(51, 36, 135):247815
(36, 135, 48):4357275
(135, 48, 25):2942429
(48, 25, 1415):3613380
(25, 1415, 33):1139125
(1415, 33, 6):3667706
(33, 6, 22):751415
(6, 22, 12):1782248
(22, 12, 215):4411849
(12, 215, 28):3876765
(215, 28, 77):4368462
(28, 77, 52):1276321
(77, 52, 5):3554934
(52, 5, 14):767011
(5, 14, 407):3559939
(14, 407, 16):2170613
(407, 16, 82):334046
(16, 82, 10311):4072955
(82, 10311, 8):2837364
(10311, 8, 4):937534
(8, 4, 107):1290614
(4, 107, 117):3842601
(107, 117, 5952):2188555
(117, 5952, 15):2878704
(5952, 15, 256):3218101
(15, 256, 4):3641236
(256, 4, 2):3562312
(4, 2, 7):3220767
(2, 7, 3766):2439327
(7, 3766, 5):1987147
(3766, 5, 723):2672246
(5, 723, 36):3085759
(723, 36, 71):124840
(36, 71, 43):1185458
(71, 43, 530):3827304
(43, 530, 476):1361435
(530, 476, 26):2081509
(476, 26, 400):430292
(26, 400, 317):2887994
(400, 317, 46):4631142
(317, 46, 7):2464758
(46, 7, 4):3138060
(7, 4, 12118):828823
(4, 12118, 1029):352647
(12118, 1029, 13):4431844
(1029, 13, 104):2448218
(13, 104, 88):2374328
(104, 88, 4):2055547
(88, 4, 381):3297674
(4, 381, 15):4432172
(381, 15, 297):4354034
(15, 297, 98):1001325
(297, 98, 32):1245515
(98, 32, 2071):3294590
(32, 2071, 56):1131018
(2071, 56, 26):1588305
(56, 26, 141):2727175
(26, 141, 6):972595
(141, 6, 194):3823373
(6, 194, 7486):844324
(194, 7486, 18):109698
(7486, 18, 4):105837
(18, 4, 226):3951184
(4, 226, 22):2410181
(226, 22, 21):1097190
(22, 21, 134):1999140
(21, 134, 476):4499486
(134, 476, 26):1801256
(476, 26, 480):1985466
(26, 480, 5):4617078
(480, 5, 144):1036023
(5, 144, 30):452670
(144, 30, 5535):3985467
(30, 5535, 18):3014724
(5535, 18, 51):3740459
(18, 51, 36):3028703
(51, 36, 28):4260887
(36, 28, 224):3758711
(28, 224, 92):1522965
(224, 92, 25):4456916
(92, 25, 104):3459929
(25, 104, 4):1894454
(104, 4, 226):3679016
(4, 226, 65):208032
(226, 65, 16):127063
(65, 16, 38):818206
(16, 38, 1334):784645
(38, 1334, 88):4472892
(1334, 88, 12):2195399
(88, 12, 16):2486473
(12, 16, 283):754806
(16, 283, 5):1113581
(283, 5, 16):767330
(5, 16, 4472):4647531
(16, 4472, 113):2237173
(4472, 113, 103):3330033
(113, 103, 32):2104257
(103, 32, 15):2367187
(32, 15, 16):1055957
(15, 16, 5345):4402226
(16, 5345, 19):2179797
(5345, 19, 178):294426
(19, 178, 32):288755
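The re-encoding step can be sketched as below. It is a simplified version that slides only over the original words; the function name add_ngram and its exact loop structure are assumptions, while the ids in the small dictionary are copied from the mapping listed above for the first five words of the sentence.

```python
def add_ngram(sequences, token_index, ngram_range=2):
    """Append the id of every known n-gram to the end of each sequence."""
    new_sequences = []
    for seq in sequences:
        extended = list(seq)
        for n in range(2, ngram_range + 1):
            # Slide a window of size n over the original words only.
            for i in range(len(seq) - n + 1):
                ngram = tuple(seq[i:i + n])
                # N-grams missing from the index are silently skipped.
                if ngram in token_index:
                    extended.append(token_index[ngram])
        new_sequences.append(extended)
    return new_sequences


token_index = {
    (1, 14): 2072507, (14, 22): 1266161, (22, 16): 2923546, (16, 43): 1154584,
    (1, 14, 22): 3535840, (14, 22, 16): 447226, (22, 16, 43): 4241137,
}
result = add_ngram([[1, 14, 22, 16, 43]], token_index, ngram_range=3)
print(result[0])
```

The output keeps the five word ids first, then the four bigram ids, then the three trigram ids, which is the same layout as the long re-encoded sentence above.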

pad_sequences is then applied as before, and the shapes after padding are unchanged:

x_train shape: (25000, 400)
x_test shape: (25000, 400)
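One thing worth checking here is what padding to length 400 does to a sentence that has grown past 400 ids. The sketch below uses a pure-Python stand-in, pad_pre (a made-up name), and assumes pad_sequences is called with its defaults, which pad and truncate at the front. The counts are taken from the listing above: 218 word ids, 217 bigram ids, and 216 trigram ids.

```python
def pad_pre(seq, maxlen, value=0):
    """Pure-Python stand-in for pad_sequences with its default settings."""
    if len(seq) >= maxlen:
        return list(seq[-maxlen:])                    # keep the last maxlen items
    return [value] * (maxlen - len(seq)) + list(seq)  # pad on the left


n_words = 218
n_bigrams = n_words - 1
n_trigrams = n_words - 2
total = n_words + n_bigrams + n_trigrams

# Label each position by kind instead of using real ids.
tagged = ["word"] * n_words + ["bigram"] * n_bigrams + ["trigram"] * n_trigrams
kept = pad_pre(tagged, maxlen=400)
counts = {kind: kept.count(kind) for kind in ("word", "bigram", "trigram")}
print(total, counts)
```

Under these assumptions, the 651 ids of this sentence are cut down to the last 400, so what reaches the network is mostly n-gram ids. Shorter sentences keep all of their word ids and are padded with zeros at the front.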

The data is then fed into the neural network for training. With ngram_range = 3, the network structure is:

________________________________________________________________________________________________________________________
Layer (type)                                          Output Shape                                    Param #
========================================================================================================================
embedding_1 (Embedding)                               (None, 400, 50)                                 234148350
________________________________________________________________________________________________________________________
global_average_pooling1d_1 (GlobalAveragePooling1D)   (None, 50)                                      0
________________________________________________________________________________________________________________________
dense_1 (Dense)                                       (None, 1)                                       51
========================================================================================================================
Total params: 234,148,401
Trainable params: 234,148,401
Non-trainable params: 0
________________________________________________________________________________________________________________________
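To make the three layers in the summary concrete, here is a toy forward pass in plain Python: embedding lookup, averaging over positions, then a single output unit. The sigmoid activation and the tiny sizes are assumptions for illustration; the summary only shows that the last layer outputs one value.

```python
import math
import random


def forward(token_ids, embedding, dense_w, dense_b):
    """Embedding lookup -> average over positions -> one sigmoid unit."""
    vectors = [embedding[t] for t in token_ids]  # shape (seq_len, dim)
    dim = len(dense_w)
    pooled = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    logit = sum(w * p for w, p in zip(dense_w, pooled)) + dense_b
    return 1.0 / (1.0 + math.exp(-logit))


random.seed(0)
vocab_size, dim = 10, 4  # toy sizes; the summary above uses 50 dimensions
embedding = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
dense_w = [random.uniform(-1, 1) for _ in range(dim)]
dense_b = 0.0

p1 = forward([1, 2, 3], embedding, dense_w, dense_b)
p2 = forward([3, 2, 1], embedding, dense_w, dense_b)
print(p1, p2)
```

Averaging discards word order, so the two inputs give the same prediction up to floating-point rounding. Appending n-gram ids is one way to give such a model information about which words were adjacent.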

With ngram_range = 1, the network structure is:

________________________________________________________________________________________________________________________
Layer (type)                                          Output Shape                                    Param #
========================================================================================================================
embedding_1 (Embedding)                               (None, 400, 50)                                 1000000
________________________________________________________________________________________________________________________
global_average_pooling1d_1 (GlobalAveragePooling1D)   (None, 50)                                      0
________________________________________________________________________________________________________________________
dense_1 (Dense)                                       (None, 1)                                       51
========================================================================================================================
Total params: 1,000,051
Trainable params: 1,000,051
Non-trainable params: 0
________________________________________________________________________________________________________________________

The parameter count has grown by a factor of about 234, because max_features was originally 20000 and becomes 4682967 once the n-grams are added.

So even if GPU memory is sufficient with ngram_range = 1, it may no longer be enough once n-grams are added.
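The parameter counts in the two summaries can be reproduced with simple arithmetic. The sketch below assumes 50 embedding dimensions (from the output shape in the summaries) and 4-byte float32 weights; the memory figure covers the weights only, not optimizer state or activations.

```python
EMBEDDING_DIMS = 50  # from the (None, 400, 50) output shape


def param_count(vocab_size, dims=EMBEDDING_DIMS):
    """Total parameters: embedding table plus a one-unit dense layer."""
    embedding = vocab_size * dims  # one row of `dims` weights per id
    dense = dims + 1               # `dims` weights and one bias
    return embedding + dense


base = param_count(20000)
with_ngrams = param_count(4682967)
ratio = with_ngrams / base
bytes_fp32 = with_ngrams * 4  # assuming float32 weights

print(base, with_ngrams)
print(round(ratio, 1), round(bytes_fp32 / 1e9, 2), "GB")
```

Nearly all of the growth is in the embedding table, so capping the number of n-grams kept, or lowering the embedding dimension, would be the obvious levers if memory becomes a problem.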

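As far as I can tell, the n-gram ids take up positions that would otherwise be filled with plain padding, and they presumably give the model some information about adjacent words, which should improve classification accuracy.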

The above is my own understanding; corrections are welcome.

