Text Classification Exercise 2: Classifying News Articles by Category on a THUCNews Subset

1. Characteristics: Chinese-language dataset, ten categories

2. Tool: TensorFlow

3. Dataset description and example code: https://github.com/gaussic/text-classification-cnn-rnn

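4. Modify run_cnn.py from the example code as follows (run_rnn.py can be changed in the same way), and place the cnews data subset in the data folder. The original script expects a train or test command-line argument; with the argument check commented out and train() and test() called one after the other, the script can be run directly in PyCharm (macOS + PyCharm + TensorFlow 1.12.0 + Python 3.6).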

if __name__ == '__main__':
    # if len(sys.argv) != 2 or sys.argv[1] not in ['train', 'test']:
    #     raise ValueError("""usage: python run_cnn.py [train / test]""")

    print('Configuring CNN model...')
    config = TCNNConfig()
    if not os.path.exists(vocab_dir):  # rebuild the vocabulary if it does not exist
        build_vocab(train_dir, vocab_dir, config.vocab_size)
    categories, cat_to_id = read_category()
    words, word_to_id = read_vocab(vocab_dir)
    config.vocab_size = len(words)
    model = TextCNN(config)

    # if sys.argv[1] == 'train':
    #     train()
    # else:
    #     test()
    train()
    test()
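
As an alternative to commenting out the argument check, the script could accept an optional argument, so that it still works from the command line and also runs from PyCharm with no arguments. The following is only a minimal sketch: `select_modes` and `VALID_MODES` are hypothetical names that do not exist in the repository, and the returned strings would still have to be mapped to the repository's `train()` and `test()` functions.

```python
VALID_MODES = ("train", "test")


def select_modes(argv):
    """Return the phases to run for a given argument vector.

    Hypothetical helper, not part of the original repository.
    With no extra argument, run training followed by testing.
    """
    if len(argv) == 1:
        return list(VALID_MODES)
    if len(argv) == 2 and argv[1] in VALID_MODES:
        return [argv[1]]
    raise ValueError("usage: python run_cnn.py [train / test]")


# No argument (e.g. started from the PyCharm Run button): both phases.
print(select_modes(["run_cnn.py"]))          # ['train', 'test']
# Explicit argument: only the requested phase.
print(select_modes(["run_cnn.py", "test"]))  # ['test']
```

In the `__main__` block one would then loop over `select_modes(sys.argv)` and call `train()` or `test()` for each entry.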

5. Code output

/Users/gaoxuanxuan/anaconda3/envs/tensorflow/bin/python /Users/gaoxuanxuan/PycharmProjects/NLP/TextClassification/text-classification-cnn-rnn/run_cnn.py
Configuring CNN model...
WARNING:tensorflow:From /Users/gaoxuanxuan/PycharmProjects/NLP/TextClassification/text-classification-cnn-rnn/cnn_model.py:66: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

Configuring TensorBoard and Saver...
Loading training and validation data...
Time usage: 0:00:24
2019-03-03 21:34:01.237824: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Training and evaluating...
Epoch: 1
Iter:      0, Train Loss:    2.3, Train Acc:   3.12%, Val Loss:    2.3, Val Acc:   9.92%, Time: 0:00:07 *
Iter:    100, Train Loss:   0.86, Train Acc:  78.12%, Val Loss:    1.2, Val Acc:  68.96%, Time: 0:01:17 *
Iter:    200, Train Loss:   0.36, Train Acc:  89.06%, Val Loss:   0.72, Val Acc:  80.48%, Time: 0:02:24 *
Iter:    300, Train Loss:   0.18, Train Acc:  96.88%, Val Loss:   0.43, Val Acc:  89.72%, Time: 0:03:24 *
Iter:    400, Train Loss:   0.14, Train Acc:  96.88%, Val Loss:   0.36, Val Acc:  91.00%, Time: 0:04:22 *
Iter:    500, Train Loss:   0.22, Train Acc:  93.75%, Val Loss:   0.39, Val Acc:  91.16%, Time: 0:05:26 *
Iter:    600, Train Loss:    0.3, Train Acc:  90.62%, Val Loss:   0.33, Val Acc:  92.28%, Time: 0:06:42 *
Iter:    700, Train Loss:   0.11, Train Acc:  95.31%, Val Loss:   0.29, Val Acc:  92.92%, Time: 0:08:45 *
Epoch: 2
Iter:    800, Train Loss:  0.051, Train Acc:  98.44%, Val Loss:   0.29, Val Acc:  92.84%, Time: 0:10:57 
Iter:    900, Train Loss:   0.21, Train Acc:  93.75%, Val Loss:    0.3, Val Acc:  90.86%, Time: 0:12:56 
Iter:   1000, Train Loss:  0.044, Train Acc: 100.00%, Val Loss:   0.29, Val Acc:  91.52%, Time: 0:14:52 
Iter:   1100, Train Loss:   0.13, Train Acc:  98.44%, Val Loss:   0.28, Val Acc:  92.72%, Time: 0:17:00 
Iter:   1200, Train Loss:   0.06, Train Acc:  98.44%, Val Loss:   0.29, Val Acc:  93.14%, Time: 0:19:00 *
Iter:   1300, Train Loss:  0.084, Train Acc:  98.44%, Val Loss:   0.29, Val Acc:  90.76%, Time: 0:21:04 
Iter:   1400, Train Loss:   0.13, Train Acc:  93.75%, Val Loss:   0.19, Val Acc:  94.68%, Time: 0:23:24 *
Iter:   1500, Train Loss:  0.066, Train Acc:  98.44%, Val Loss:   0.21, Val Acc:  94.20%, Time: 0:25:49 
Epoch: 3
Iter:   1600, Train Loss: 0.0086, Train Acc: 100.00%, Val Loss:   0.18, Val Acc:  94.92%, Time: 0:27:49 *
Iter:   1700, Train Loss: 0.0068, Train Acc: 100.00%, Val Loss:   0.21, Val Acc:  94.60%, Time: 0:46:25 
Iter:   1800, Train Loss:  0.039, Train Acc:  98.44%, Val Loss:   0.18, Val Acc:  94.94%, Time: 3:12:36 *
Iter:   1900, Train Loss:  0.043, Train Acc: 100.00%, Val Loss:   0.18, Val Acc:  94.70%, Time: 4:55:34 
Iter:   2000, Train Loss: 0.0047, Train Acc: 100.00%, Val Loss:   0.21, Val Acc:  94.46%, Time: 7:22:15 
Iter:   2100, Train Loss:  0.015, Train Acc: 100.00%, Val Loss:   0.17, Val Acc:  95.26%, Time: 9:09:48 *
Iter:   2200, Train Loss:   0.13, Train Acc:  96.88%, Val Loss:   0.22, Val Acc:  93.24%, Time: 10:55:41 
Iter:   2300, Train Loss:  0.091, Train Acc:  95.31%, Val Loss:   0.22, Val Acc:  93.14%, Time: 12:44:59 
Epoch: 4
Iter:   2400, Train Loss:    0.1, Train Acc:  96.88%, Val Loss:   0.21, Val Acc:  94.24%, Time: 13:53:55 
Iter:   2500, Train Loss:  0.021, Train Acc: 100.00%, Val Loss:   0.19, Val Acc:  94.96%, Time: 15:43:02 
Iter:   2600, Train Loss:  0.012, Train Acc: 100.00%, Val Loss:    0.2, Val Acc:  95.00%, Time: 18:03:03 
Iter:   2700, Train Loss:  0.037, Train Acc:  98.44%, Val Loss:   0.18, Val Acc:  95.18%, Time: 19:59:22 
Iter:   2800, Train Loss:  0.041, Train Acc:  98.44%, Val Loss:    0.2, Val Acc:  94.46%, Time: 22:22:35 
Iter:   2900, Train Loss:  0.022, Train Acc: 100.00%, Val Loss:   0.22, Val Acc:  94.10%, Time: 1 day, 0:11:13 
Iter:   3000, Train Loss:  0.051, Train Acc:  98.44%, Val Loss:   0.18, Val Acc:  95.42%, Time: 1 day, 2:37:24 *
Iter:   3100, Train Loss:   0.14, Train Acc:  96.88%, Val Loss:   0.22, Val Acc:  94.08%, Time: 1 day, 4:24:21 
Epoch: 5
Iter:   3200, Train Loss: 0.0027, Train Acc: 100.00%, Val Loss:   0.18, Val Acc:  95.60%, Time: 1 day, 6:07:38 *
Iter:   3300, Train Loss:  0.001, Train Acc: 100.00%, Val Loss:   0.19, Val Acc:  95.16%, Time: 1 day, 8:19:46 
Iter:   3400, Train Loss: 0.0047, Train Acc: 100.00%, Val Loss:    0.2, Val Acc:  95.36%, Time: 1 day, 10:04:01 
Iter:   3500, Train Loss: 0.0057, Train Acc: 100.00%, Val Loss:   0.19, Val Acc:  95.18%, Time: 1 day, 12:14:28 
Iter:   3600, Train Loss:  0.011, Train Acc: 100.00%, Val Loss:   0.18, Val Acc:  95.26%, Time: 1 day, 13:10:23 
Iter:   3700, Train Loss:  0.076, Train Acc:  98.44%, Val Loss:    0.2, Val Acc:  94.50%, Time: 1 day, 13:12:36 
Iter:   3800, Train Loss: 0.0061, Train Acc: 100.00%, Val Loss:   0.19, Val Acc:  95.64%, Time: 1 day, 13:14:34 *
Iter:   3900, Train Loss:  0.014, Train Acc: 100.00%, Val Loss:    0.2, Val Acc:  94.86%, Time: 1 day, 13:16:45 
Epoch: 6
Iter:   4000, Train Loss:  0.016, Train Acc:  98.44%, Val Loss:   0.22, Val Acc:  94.34%, Time: 1 day, 13:18:47 
Iter:   4100, Train Loss:  0.034, Train Acc:  96.88%, Val Loss:   0.22, Val Acc:  94.82%, Time: 1 day, 13:20:49 
Iter:   4200, Train Loss: 0.0029, Train Acc: 100.00%, Val Loss:   0.23, Val Acc:  94.72%, Time: 1 day, 13:23:00 
Iter:   4300, Train Loss: 0.0052, Train Acc: 100.00%, Val Loss:   0.16, Val Acc:  96.10%, Time: 1 day, 13:24:04 *
Iter:   4400, Train Loss:  0.025, Train Acc:  98.44%, Val Loss:   0.18, Val Acc:  95.36%, Time: 1 day, 13:25:03 
Iter:   4500, Train Loss: 0.0013, Train Acc: 100.00%, Val Loss:   0.21, Val Acc:  95.06%, Time: 1 day, 13:26:02 
Iter:   4600, Train Loss:  0.028, Train Acc:  98.44%, Val Loss:   0.25, Val Acc:  93.72%, Time: 1 day, 13:26:59 
Epoch: 7
Iter:   4700, Train Loss:  0.014, Train Acc:  98.44%, Val Loss:   0.24, Val Acc:  94.42%, Time: 1 day, 13:27:55 
Iter:   4800, Train Loss: 0.0071, Train Acc: 100.00%, Val Loss:   0.18, Val Acc:  95.98%, Time: 1 day, 13:28:51 
Iter:   4900, Train Loss: 0.00074, Train Acc: 100.00%, Val Loss:    0.2, Val Acc:  95.42%, Time: 1 day, 13:29:53 
Iter:   5000, Train Loss: 0.00081, Train Acc: 100.00%, Val Loss:   0.18, Val Acc:  95.60%, Time: 1 day, 13:31:02 
Iter:   5100, Train Loss: 0.00093, Train Acc: 100.00%, Val Loss:   0.19, Val Acc:  95.78%, Time: 1 day, 13:32:01 
Iter:   5200, Train Loss:  0.001, Train Acc: 100.00%, Val Loss:   0.22, Val Acc:  94.86%, Time: 1 day, 13:33:02 
Iter:   5300, Train Loss: 0.0074, Train Acc: 100.00%, Val Loss:   0.19, Val Acc:  95.26%, Time: 1 day, 13:34:01 
No optimization for a long time, auto-stopping...
Loading test data...
Testing...
Test Loss:   0.12, Test Acc:  97.08%
Precision, Recall and F1-Score...
              precision    recall  f1-score   support

          体育       1.00      0.99      0.99      1000
          财经       0.96      0.98      0.97      1000
          房产       1.00      1.00      1.00      1000
          家居       0.98      0.91      0.94      1000
          教育       0.95      0.94      0.95      1000
          科技       0.96      0.99      0.98      1000
          时尚       0.96      0.98      0.97      1000
          时政       0.93      0.97      0.95      1000
          游戏       0.99      0.97      0.98      1000
          娱乐       0.98      0.97      0.98      1000

   micro avg       0.97      0.97      0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000

Confusion Matrix...
[[990   0   0   0   2   2   1   4   1   0]
 [  0 985   0   1   1   3   0  10   0   0]
 [  0   0 997   1   1   0   0   1   0   0]
 [  0  20   2 906  17   5  10  36   1   3]
 [  0   6   0   5 945  13   8  18   2   3]
 [  0   3   0   2   0 988   3   1   2   1]
 [  2   0   0   2   4   1 982   0   2   7]
 [  0   9   0   2  13   5   0 970   1   0]
 [  1   1   1   3   7   2   9   3 970   3]
 [  0   2   0   5   3   5   7   0   3 975]]
Time usage: 0:00:30

Process finished with exit code 0
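
The per-class figures in the report can be re-derived from the confusion matrix printed above. The sketch below copies the matrix from the log and assumes the usual scikit-learn convention (rows are true labels, columns are predicted labels), which is consistent with every row summing to the support of 1000. The English glosses in the comment are my own translations of the category names.

```python
# Category order as printed in the report:
# sports, finance, real estate, home, education,
# technology, fashion, politics, games, entertainment
labels = ["体育", "财经", "房产", "家居", "教育",
          "科技", "时尚", "时政", "游戏", "娱乐"]

# Confusion matrix copied from the log (rows: true, columns: predicted).
cm = [
    [990,   0,   0,   0,   2,   2,   1,   4,   1,   0],
    [  0, 985,   0,   1,   1,   3,   0,  10,   0,   0],
    [  0,   0, 997,   1,   1,   0,   0,   1,   0,   0],
    [  0,  20,   2, 906,  17,   5,  10,  36,   1,   3],
    [  0,   6,   0,   5, 945,  13,   8,  18,   2,   3],
    [  0,   3,   0,   2,   0, 988,   3,   1,   2,   1],
    [  2,   0,   0,   2,   4,   1, 982,   0,   2,   7],
    [  0,   9,   0,   2,  13,   5,   0, 970,   1,   0],
    [  1,   1,   1,   3,   7,   2,   9,   3, 970,   3],
    [  0,   2,   0,   5,   3,   5,   7,   0,   3, 975],
]

n = len(cm)
total = sum(sum(row) for row in cm)

# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = sum(cm[i][i] for i in range(n)) / total

# Recall: diagonal over row sum; precision: diagonal over column sum.
recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
precision = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]

print(f"accuracy = {accuracy:.4f}")
for name, p, r in zip(labels, precision, recall):
    print(f"{name}  precision = {p:.2f}  recall = {r:.2f}")

# Largest off-diagonal cell: the most frequent single confusion.
worst_count, worst_true, worst_pred = max(
    (cm[i][j], i, j) for i in range(n) for j in range(n) if i != j
)
print(f"most frequent confusion: {labels[worst_true]} -> "
      f"{labels[worst_pred]} ({worst_count} samples)")
```

The diagonal sums to 9708 out of 10000, matching the reported test accuracy of 97.08%. The weakest recall belongs to 家居 (906 of 1000), and its largest single confusion is with 时政 (36 samples), which also explains why 时政 has the lowest precision in the report.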

 
