[Original] Share some pictures about my training


2018-06-06 16:33:26 333

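[Original] Intelligent customer service baseline

Module design
1. Question understanding: question classification, keyword identification, similar-question expansion.
2. Knowledge retrieval and ranking: similarity computation (LCS, BM25, TF-IDF + cosine, word2vec + cosine).
3. Answer generation: candidate answer selection and ranking.
Question classification: define the category taxonomy, annotate classification data, train the classification model, select the model and tune its parameters.
Keyword identification: TF-IDF, information entropy, mutual information, chi-square, semantic role labeling.
Similar...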
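The retrieval-and-ranking step lists TF-IDF + cosine as one similarity measure. A minimal sketch of that one option, using only the standard library, might look like the following. The function names, the smoothed IDF formula, and the pre-tokenized input are assumptions made for illustration; the post does not show its own implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse {term: weight} vector per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption) so no term gets a zero or negative weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, candidates):
    """Return (candidate_index, score) pairs, best match first."""
    vectors = tfidf_vectors([query_tokens] + candidates)
    query_vec, cand_vecs = vectors[0], vectors[1:]
    scores = [cosine(query_vec, v) for v in cand_vecs]
    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    faq = [["reset", "password"], ["open", "account"]]
    print(rank(["how", "reset", "password"], faq))
```

For Chinese questions the token lists would come from a segmenter such as jieba; BM25 or word2vec could be swapped in behind the same `rank` interface.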

2018-05-14 15:55:46 1200

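[Original] Short-text similarity computation

Short-text similarity methods:
Longest common subsequence
Edit distance
Number of shared words / sequence length
word2vec + cosine similarity
Sentence2Vector https://blog.csdn.net/qjzcy/article/details/51882959?spm= structured semantic models)(BOW/CNN/RNN) https:...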
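The first three methods in this list are purely string-based and can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the post's code: the function names are hypothetical, and dividing the shared-word count by the longer sequence's length is an assumption, since the post does not say which length is used.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def overlap_ratio(a, b):
    """Shared token count divided by the longer length (assumed normalizer)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / longest


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
    print(edit_distance("kitten", "sitting"))
    print(overlap_ratio(["a", "b", "c"], ["b", "c", "d", "e"]))
```

All three functions accept either strings (character level) or token lists (word level), so the same code works on segmented Chinese text.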

2018-05-02 20:32:02 15255

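[Original] Converting RMB amounts from uppercase Chinese numerals to digits

Convert an amount written in uppercase Chinese numerals to digits. Take "壹佰零壹亿叁仟零叁拾万零陆佰零陆元玖角玖分": split it on "亿", "万", and "元" to get ['壹佰零壹', '叁仟零叁拾', '零陆佰零陆', '玖角玖分'], convert each string in the list, multiply it by its unit, and sum the results.
    import re
    money_dict={ "壹":1, "贰":2,"叁":3, "肆":4, "伍":5,"陆":6,...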
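The split, convert, multiply, and sum procedure described here can be sketched as follows. This is a minimal sketch under stated assumptions rather than the post's actual code, which is truncated: all names are hypothetical, the result is kept as an integer number of fen to avoid floating-point error, and amounts that combine units such as "万亿" are not handled.

```python
import re

DIGITS = {"零": 0, "壹": 1, "贰": 2, "叁": 3, "肆": 4,
          "伍": 5, "陆": 6, "柒": 7, "捌": 8, "玖": 9}
SMALL_UNITS = {"拾": 10, "佰": 100, "仟": 1000}
BIG_UNITS = {"亿": 10 ** 8, "万": 10 ** 4, "元": 1}


def section_to_int(section):
    """Convert a section below 10000, e.g. '叁仟零叁拾' -> 3030."""
    total, pending = 0, None
    for ch in section:
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as '拾' is read as one times that unit.
            total += (1 if pending is None else pending) * SMALL_UNITS[ch]
            pending = None
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return total + (pending or 0)


def rmb_to_fen(text):
    """Return the amount as an integer number of fen (1 yuan = 100 fen)."""
    # Splitting with a capture group keeps the 亿/万/元 delimiters.
    pieces = re.split(r"([亿万元])", text)
    yuan = 0
    for section, unit in zip(pieces[0::2], pieces[1::2]):
        yuan += section_to_int(section) * BIG_UNITS[unit]
    fen, pending = 0, 0
    for ch in pieces[-1]:  # whatever follows 元: 角, 分, or 整
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch == "角":
            fen, pending = fen + pending * 10, 0
        elif ch == "分":
            fen, pending = fen + pending, 0
        elif ch != "整":
            raise ValueError(f"unexpected character: {ch!r}")
    return yuan * 100 + fen


def rmb_to_string(text):
    """Format the amount as yuan with two decimal places."""
    fen = rmb_to_fen(text)
    return f"{fen // 100}.{fen % 100:02d}"


if __name__ == "__main__":
    print(rmb_to_string("壹佰零壹亿叁仟零叁拾万零陆佰零陆元玖角玖分"))  # 10130300606.99
```

Working in integer fen and formatting only at the end keeps the 角 and 分 digits exact, which a float sum of 0.9 and 0.09 would not guarantee.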

2018-04-24 16:38:22 3880 1

[Repost] Beam Search To Optimal Beam Search

Beam Search -> Optimal Beam Search (source) Beam Search Diverse Beam Search (source) example diverse-beam-search Optimal Beam Search…

2018-03-22 13:41:52 301

[Original] Extracting faces from an image

Extract a single face:
    from PIL import Image
    import face_recognition
    # Load the jpg file into a numpy array
    image = face_recognition.load_image_file("one.jpg")
    # Find all the faces in the image using the default H...

2018-03-06 16:53:55 4181 1

[Original] Linux commands

Run Python code in the background:
    nohup python work.py &
Transfer a small amount of text from Windows to Linux:
    rz -nbe

2017-11-05 21:13:17 215

[Repost] Effect of using the synchronized keyword (synchronization lock)

package com.sf.LianCheng.demo;
public class Runnable_demo implements Runnable{
    private int ticket=10;
    public Runnable_demo(){ }
    @Override
    public void run() {

2017-08-18 09:23:09 526

[Original] Python: using trigrams to find words with the same preceding and following words

import jieba
import nltk
f=open("corpus.txt", 'r', encoding='utf-8',)
sents=[]
for line in f:
    sents.extend(jieba.cut(line.strip()))
finder=nltk.collocations.TrigramCollocationFinder.from_words(sen
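The preview above is cut off before the trigram finder is used, so the rest of the approach is not visible. As a rough illustration of the idea in the title, and assuming it means grouping words that occur between the same left and right neighbors, a standard-library sketch could look like this. The function name and the grouping rule are assumptions, and this sketch does not use nltk.

```python
from collections import defaultdict


def words_sharing_context(tokens):
    """Group middle words of trigrams by their (left, right) neighbors.

    Only contexts that contain more than one distinct middle word are kept.
    """
    contexts = defaultdict(set)
    for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
        contexts[(left, right)].add(middle)
    return {ctx: words for ctx, words in contexts.items() if len(words) > 1}


if __name__ == "__main__":
    tokens = "I like red apples . I like green apples .".split()
    print(words_sharing_context(tokens))
```

With a Chinese corpus, `tokens` would be the flat list produced by extending `sents` with `jieba.cut(...)` as in the preview.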

2017-08-16 18:20:48 1304

[Repost] Tomcat 7 class loading mechanism

1. Overview
Like many server applications, Tomcat installs a variety of class loaders (that is, classes that implement java.lang.ClassLoader) to allow different portions of the container, and the web app

2017-08-16 15:28:47 292

[Original] Python jieba

# encoding=utf-8
import jieba
seg_list = jieba.cut("我来到北京清华大学", cut_all=True)
print("Full Mode: " + "/ ".join(seg_list)) # full mode
seg_list = jieba.cut("我来到北京清华大学", cut_all=False)
print("Default Mode: " + "

2017-08-16 13:52:44 882

[Original] Python word2vec

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence
def gen_embeddings(in_file, out_file, size=100):
    corpus = LineSentence(in_file)
    model = Word2Vec( sente

2017-08-16 13:46:54 805

[Original] Python nltk tuples

import nltk
# Use strip() to remove the trailing newline from each input line.
f=open("LianCheng.txt", 'r', encoding='utf-8',)
sents=[]
for line in f:
    sents.append(line.strip().split("\t"))
sents[0]
['供热', '双方', '室内', '温度', '存在', '争议', '时'

2017-08-16 13:32:42 575

[Original] textsum

TypeError: a bytes-like object is required, not 'str'
TypeError: sequence item 0: expected str instance, bytes found
RuntimeError: Coordinator stopped with threads still running: Thread-32
[[ 21 21

2017-07-22 11:07:17 945

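[Original] Installing elasticsearch 5.4.1 on Windows 7

Download elasticsearch 5.4.1 from https://www.elastic.co/downloads/elasticsearch and unzip it, then run elasticsearch.bat in the C:\elasticsearch-5.4.1\bin directory:
    C:\elasticsearch-5.4.1\bin>elasticsearch.bat
[sd3be0q] is the current default node; "started" in the output indicates...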

2017-06-16 07:00:19 1536 2

[Original] Extending solr 5.5.4 with ansj_lucene5

solr5.5.4 http://mirror.bit.edu.cn/apache/lucene/solr/
ansj https://github.com/NLPchina/ansj_seg
Download the ansj source and add org.ansj.solr.AnsjTokenizerFactory to ansj_lucene5_plug:
    package org.ansj.solr;
    import java.i

2017-06-13 15:15:34 1090

[Original] jupyterhub test

$ jupyter notebook --generate-config
In [1]: from notebook.auth import passwd
In [2]: passwd()
Enter password:
Verify password:
Out[2]: 'sha1:a...............b'
$ vim ~/.jupyter/jupyter_notebook_config.p

2017-05-10 10:01:31 773

[Repost] spring boot

application.properties 1.5.2
# LOGGING
logging.config= # Location of the logging configuration file. For instance `classpath:logback.xml` for Logback
logging.exception-conversion-word=%wEx # Conversion

2017-05-09 16:31:43 632

[Original] nested exception is java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonGenerator.writeStart

In a web project built with spring boot:
@RestController
@RequestMapping("/")
public class solrController {
    @RequestMapping(value = "/demo.json", method = RequestMethod.GET)
    @ResponseBody
    public HashMap<String,

2017-04-20 17:59:05 23977 1

[Original] es

[elsearch@solr1 bin]$ ./elasticsearch
Error: Could not find or load main class org.elasticsearch.tools.JavaVersionChecker
Elasticsearch requires at least Java 8 but your Java version from /usr/java/jdk1.8.0_91/bin/java does n

2017-04-06 14:43:19 3563

[Original] 20170329

My TensorBoard isn't showing any data! What's wrong?
The first thing to do is ensure that TensorBoard is properly loading data from the correct directory. Launch tensorboard --logdir=DIRECTORY_PATH --de

2017-03-29 16:47:35 355

[Original] 20170323

Linux environment variables
    [root@master1 ~]# export
    [root@master1 ~]# echo $PATH
    vim ~/.bashrc
    vim /etc/profile
Entered at the command line: export PATH=$PATH:/mypath
Added to the file: export PATH="/home/mypath/anaconda3/bin:$PATH"

2017-03-23 17:17:02 292

2017-03-01 14:37:31 399

[Original] operation_hbase

package solr_search.tsf.hbase.domain;
import java.io.IOException;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.L

2017-03-01 14:33:09 264

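[Original] keras on Windows 7

Anaconda2 is the version installed.
1. Download and install the Anaconda Python Distribution from https://www.continuum.io/downloads#_windows
2. In the Anaconda Prompt command line, enter "pip install keras"
3. Then enter the command "conda install mingw libpython" to download theano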

2017-01-13 11:38:34 481

[Original] Error

def tran(document: Iterable[_]): mutable.Map[Int,String] = {
  val termFrequencies = mutable.HashMap.empty[Int, String]
  val hashFunc: Any => Int = getHashFunction
  document.foreach { term =>

2016-12-09 20:23:52 318

[Original] spark file streams

For reading data from files on any file system compatible with the HDFS API (that is, HDFS, S3, NFS, etc.), a DStream can be created as:
    streamingContext.fileStream[KeyClass, ValueClass, InputFormatClas

2016-12-06 11:38:07 376

[Repost] Grouping by Query

In this example, we will use the group.query parameter to find the top three results for "memory" in two different price ranges: 0.00 to 99.99, and over 100.
http://localhost:8983/solr/techproducts/sel

2016-12-04 16:41:44 349

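[Repost] spark hbase

HBase CRUD operations
The new API introduces Connection; HBaseAdmin became Admin and HTable became Table, and Admin and Table can only be obtained through a Connection. Creating a Connection is a heavyweight operation, and because Connection is thread-safe, using a singleton is recommended. Its factory method requires an HBaseConfiguration.
    val conf = HBaseConfiguratio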

2016-12-02 10:24:08 450

[Repost] spark2.0 tfidf

ML
package org.apache.spark.ml.feature
import org.apache.spark.annotation.Since
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.attribute.AttributeGroup
import org.apache.spark.ml.param

2016-11-29 18:28:28 1130

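[Original] Spring 4 packages

Spring AOP: Spring's aspect-oriented programming module; provides the AOP (Aspect-Oriented Programming) implementation.
Spring Aspects: Spring's integration with the AspectJ framework.
Spring Beans: the base implementation of Spring IoC (Inversion of Control), including reading configuration files and creating and managing beans.
Spring Contex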

2016-11-28 09:22:12 398

[Repost] Java reflection demo

import java.lang.reflect.*;
public class ReflectDemo {
    /**
     * To keep the Java reflection code easy to read, all exceptions are thrown up to the JVM to handle
     */
    public static void main(String[] args) throws ClassNotFoundException, IllegalAccessExce

2016-11-22 11:21:45 537

[Original] spark svm

Data input
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils

2016-11-18 09:07:46 851

[Original] theano GPU

http://deeplearning.net/software/theano/tutorial/using_gpu.html
Install CUDA. Download: https://developer.nvidia.com/cuda-downloads
Add the environment variables:
    $ export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
    $ export LD_LIBRARY_PATH

2016-11-04 15:28:31 306

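[Original] Adding a startup script on Ubuntu

1. Copy your jupyter.sh startup script to the /etc/init.d directory.
2. Set execute permission on the script:
    sudo chmod 755 /etc/init.d/jupyter.sh
3. Register the script with the startup scripts:
    $ cd /etc/init.d
    $ sudo update-rc.d jupyter.sh defaults 100
Add the LSB information to the jupyter.sh script:
    #!/bin/sh
    ### BEGIN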

2016-11-04 15:05:06 652

springboot jsp demo

spring boot jsp demo



JupyterHub Documentation
1 Contents
1.1 Quickstart - Installation
1.2 Getting started with JupyterHub
1.3 How JupyterHub works
1.4 Web Security in JupyterHub
1.5 Using JupyterHub's REST API
1.6 Authenticators
1.7 Spawners
1.8 Services
1.9 Configuration examples
1.10 Troubleshooting
1.11 The JupyterHub API
1.12 Change log summary
1.13 Contributors
2 Indices and tables
3 Questions? Suggestions?
Python Module Inde



Annotated training corpus for CRF word segmentation. Data from NLPCC 2015 Task 1.



News classification corpus with 9 categories (finance, education, military, technology, politics, etc.), crawled by news tag.


