
[Original] Could not install packages due to an EnvironmentError: fixing the environment error hit during pip install

The exact error: Found existing installation: setuptools 18.5; Uninstalling setuptools-18.5: Could not install packages due to an EnvironmentError: [('/System/Library/Frameworks/Python.framework/Versions/2....

2019-06-19 19:05:53 · 2964 views
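
The truncated excerpt stops before the fix. A common general workaround for this error, where pip tries to write into the protected /System/Library path of the macOS system Python, is to install into the user site-packages instead (a standard pip option, not quoted from the post):

    pip install --user <package>

Working inside a virtualenv likewise avoids touching the protected path.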

[Original] Cannot uninstall 'numpy'. It is a distutils installed project and thus we cannot accurately determine…

Running pip2 install tensorflow to install the Python 2.7 version of TensorFlow fails with the following error: Installing collected packages: wrapt, tensorflow-estimator, numpy, six, keras-preprocessing, absl-py, astor, funcsigs, mock, backports...

2019-06-19 18:52:39 · 5201 views
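
Here too the excerpt ends before the fix. A widely used workaround, since pip cannot safely uninstall a distutils-installed numpy, is to skip the uninstall step and install over it with pip's --ignore-installed flag (a general pip option, offered as an assumption rather than quoted from the post):

    pip2 install tensorflow --ignore-installed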

[Original] error: could not create 'xxxxxx': Permission denied

The fix first: grant permissions on the target folder with sudo chmod -R 777 xxxxxx, then enter your password. I ran into this problem while reinstalling Python 2.7; the pip install error was: creating build/temp.macosx-10.14-intel-2.7 creating build/temp.macosx-10.1...

2019-06-19 18:29:12 · 8462 views

中科院高级人工智能符号主义知识点总结.pdf (UCAS Advanced Artificial Intelligence: symbolism knowledge-point summary)

Notes for the symbolism part of the Advanced Artificial Intelligence course at the University of Chinese Academy of Sciences (Prof. Luo Ping's sections): a summary, with proofs, of every exam point, each covered and worked through in full, e.g. the completeness of the resolution principle.

2020-01-04

ag_news_csv.tgz

496,835 news articles from more than 2,000 news sources, covering the 4 largest classes of the AG news corpus; only the title and description fields are used. Each class has 30,000 training samples and 1,900 test samples.

README: AG's News Topic Classification Dataset, Version 3, Updated 09/09/2015.

ORIGIN: AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml, data compression, data streaming, and any other non-commercial activity. For more information, please refer to http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html . The AG's news topic classification dataset is constructed by Xiang Zhang ([email protected]) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

DESCRIPTION: The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and testing 7,600. The file classes.txt contains a list of classes corresponding to each label. The files train.csv and test.csv contain all the training samples as comma-separated values. There are 3 columns in them, corresponding to class index (1 to 4), title and description. The title and description are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed with an "n" character, that is "\n".

2019-05-28
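
A minimal sketch, assuming the archive is extracted to an ag_news_csv/ folder, of loading train.csv in the 3-column format the README describes:

    import csv

    # Each row is: class index ("1"-"4"), title, description. The csv module
    # undoes the doubled-quote escaping; literal "\n" marks escaped newlines.
    examples = []
    with open("ag_news_csv/train.csv", newline="", encoding="utf-8") as f:
        for class_idx, title, description in csv.reader(f):
            text = title + " " + description.replace("\\n", " ")
            examples.append((int(class_idx) - 1, text))  # 0-based label
    print(len(examples))  # per the README: 120,000 training samples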

ag_news数据集

(Same dataset; the description is identical to the ag_news_csv.tgz entry above.)

2019-05-28

fine_tuning_data.zip: Chinese emotion data ready for direct fine-tuning with BERT

Detailed usage is covered in my blog post: https://blog.csdn.net/weixin_40015791/article/details/90410083 . A brief walkthrough: in run_classifier.py of the open-source BERT code, find the processors dict and register a new processor:

    processors = {
        "cola": ColaProcessor,
        "mnli": MnliProcessor,
        "mrpc": MrpcProcessor,
        "xnli": XnliProcessor,
        "intentdetection": IntentDetectionProcessor,
        "emotion": EmotionProcessor,  # add this line
    }

Then add the following class to the same file. The .tsv file names must match the files in the extracted data folder, and get_labels returns "0" through "6" for 7-way classification:

    class EmotionProcessor(DataProcessor):
      """Processor for the emotion data set."""

      def get_train_examples(self, data_dir):
        """See base class."""
        # The file name must match the training set in the data folder.
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "fine_tuning_train_data.tsv")),
            "train")

      def get_dev_examples(self, data_dir):
        """See base class."""
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "fine_tuning_val_data.tsv")),
            "dev")

      def get_test_examples(self, data_dir):
        """See base class."""
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "fine_tuning_test_data.tsv")),
            "test")

      def get_labels(self):
        """See base class."""
        return ["0", "1", "2", "3", "4", "5", "6"]  # labels 0-6: 7-way classification

      def _create_examples(self, lines, set_type):
        """Creates examples for the training and dev sets."""
        examples = []
        for (i, line) in enumerate(lines):
          if i == 0:
            continue
          guid = "%s-%s" % (set_type, i)
          if set_type == "test":
            label = "0"
            text_a = tokenization.convert_to_unicode(line[0])
          else:
            label = tokenization.convert_to_unicode(line[0])
            text_a = tokenization.convert_to_unicode(line[1])
          examples.append(
              InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
        return examples

Finally run fine-tuning. data_dir is the folder the zip was extracted into (here named data), the chinese_L-12_H-768_A-12 files are the original Chinese BERT model being fine-tuned, and output_dir is where the results are written:

    python run_classifier.py \
      --task_name=emotion \
      --do_train=true \
      --do_eval=true \
      --data_dir=data \
      --vocab_file=chinese_L-12_H-768_A-12/vocab.txt \
      --bert_config_file=chinese_L-12_H-768_A-12/bert_config.json \
      --init_checkpoint=chinese_L-12_H-768_A-12/bert_model.ckpt \
      --max_seq_length=128 \
      --train_batch_size=32 \
      --learning_rate=2e-5 \
      --num_train_epochs=3.0 \
      --output_dir=output

Training takes about 9 hours. The output folder will then contain three checkpoint files whose suffixes are index, meta, and 00000-of-00001; rename them to bert_model.ckpt.index, bert_model.ckpt.meta, and bert_model.ckpt.data-00000-of-00001, then copy vocab.txt and bert_config.json from chinese_L-12_H-768_A-12 into the same folder, so it holds 5 files in total. Use that folder exactly as you would chinese_L-12_H-768_A-12, for example:

    bert-serving-start -model_dir output -num_worker=3

serves the fine-tuned model as a general-purpose language model.

2019-05-21

中文情感分析7类情感.zip: Chinese sentiment analysis with 7 emotion classes, from NLPCC2013, parsed into txt files (imbalanced classes)

From NLPCC2013, parsed so that each line is emotion\tsentence. There are seven emotion classes with an imbalanced distribution. After splitting into train/test/val sets, the line counts are:

    1488 anger_data.txt      186 anger_test.txt      186 anger_val.txt      (8:1:1)
    2459 disgust_data.txt    307 disgust_test.txt    307 disgust_val.txt    (8:1:1)
     201 fear_data.txt        50 fear_test.txt        50 fear_val.txt       (4:1:1)
    2298 happiness_data.txt  287 happiness_test.txt  287 happiness_val.txt  (8:1:1)
    3286 like_data.txt       410 like_test.txt       410 like_val.txt       (8:1:1)
    1917 sadness_data.txt    239 sadness_test.txt    239 sadness_val.txt    (8:1:1)
     626 surprise_data.txt    78 surprise_test.txt    78 surprise_val.txt   (8:1:1)

2019-05-17
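
A minimal sketch, with file names taken from the listing above, of loading one of these tab-separated splits:

    # Each line is "emotion<TAB>sentence".
    def load_split(path):
        pairs = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                label, sentence = line.rstrip("\n").split("\t", 1)
                pairs.append((label, sentence))
        return pairs

    train = load_split("anger_data.txt")
    print(len(train))  # 1488 anger training lines, per the counts above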

semantic_data.zip: Multi-Domain Sentiment Dataset, English sentiment data with positive/negative labels

The Multi-Domain Sentiment Dataset parsed into txt files, keeping only the text and its label. Binary positive/negative classification across four domains (dvd, kitchen, books, electronics), each with 1,000 positive and 1,000 negative examples. Each line is label\tsentence. See https://www.cs.jhu.edu/~mdredze/publications/sentiment_acl07.pdf for details.

2019-05-17
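
A quick per-domain balance check in the same spirit (the file names here are hypothetical, since the archive layout is not spelled out above):

    from collections import Counter

    # Hypothetical layout: one "label<TAB>sentence" file per domain.
    for domain in ["dvd", "kitchen", "books", "electronics"]:
        with open(domain + ".txt", encoding="utf-8") as f:
            labels = Counter(line.split("\t", 1)[0] for line in f)
        print(domain, labels)  # expect 1000 positive and 1000 negative each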

CapsuleNetwork: understanding capsule networks through TensorFlow reproduction code (Dynamic Routing Between Capsules)

Understanding capsule networks (Dynamic Routing Between Capsules) by working through TensorFlow reproduction code. Paper: https://arxiv.org/abs/1710.09829 TensorFlow reimplementation: https://github.com/naturomics/CapsNet-Tensorflow

2018-10-17
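
For reference, a minimal NumPy sketch of the paper's routing-by-agreement loop (shapes and the 3-iteration count follow the paper; this is an illustration, not the linked repository's code):

    import numpy as np

    def squash(s, eps=1e-8):
        # Squash non-linearity: preserves direction, maps the norm into [0, 1).
        sq = np.sum(s ** 2, axis=-1, keepdims=True)
        return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

    def dynamic_routing(u_hat, num_iters=3):
        # u_hat: prediction vectors, shape [num_in, num_out, dim_out].
        num_in, num_out, _ = u_hat.shape
        b = np.zeros((num_in, num_out))  # routing logits
        for _ in range(num_iters):
            c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients
            s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum of predictions
            v = squash(s)                                         # output capsules [num_out, dim_out]
            b = b + (u_hat * v[None]).sum(axis=-1)                # agreement update
        return v

    v = dynamic_routing(np.random.randn(1152, 10, 16))
    print(v.shape)  # (10, 16)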
