[Original] Nginx log to Elasticsearch
nginx-es.conf input { file { path => "/opt/logtest/nginx_access.log.1" start_position => "beginning" sincedb_path => "/opt/logstash-2.3.4/sincedb/" } ...
2016-08-03 17:39:22
343
[Original] Install Logstash and Sample Conf
1. Download #wget https://download.elastic.co/logstash/logstash/logstash-2.3.4.tar.gz #tar -xzf logstash-2.3.4.tar.gz #cd logstash-2.3.4 #./bin/logstash-plugin install logstash-output-webhdfs...
2016-08-01 11:05:52
324
[Original] High-quality blogs on big data mining
https://pkghosh.wordpress.com/2012/09/03/from-item-correlation-to-rating-prediction/ https://pkghosh.wordpress.com/?s=recommendation sifarish https://github.com/pranab/sifarish
2016-07-29 14:20:42
446
[Original] Storm: monitor Storm with supervisor
#yum install supervisor #vi /etc/supervisord.conf [program:storm-supervisor] command=/opt/apache-storm-0.9.3/bin/storm supervisor user=root autostart=true autorestart=true startsecs=10 st...
2015-09-02 15:58:29
252
[Original] Solr: 5.2.1 install and config
1. upload solr-5.2.1.tgz and install_solr_service.sh to the same dir 2. #install_solr_service.sh solr-5.2.1.tgz 3. #cd /var/solr/ #vi solr.in.sh modify Solr's JVM configuration #SOLR_HEAP="10...
2015-09-01 18:50:15
180
[Original] Solr: indexing products and prices for sellers, and performing queries and sorting
In my current project, the seller model has multiple products with prices. I want to index the products, query them, and then sort them by price, seller's credit, the distance between the seller and ...
2015-08-25 16:58:53
164
[Original] Top ML software
http://www.predictiveanalyticstoday.com/top-free-software-for-text-analysis-text-mining-text-analytics/
2015-08-05 15:02:02
204
[Original] Curator: delay queue
curator http://curator.apache.org/curator-client/index.html
2015-08-03 16:15:07
165
[Original] MATLAB install on Ubuntu
http://blog.csdn.net/lanbing510/article/details/41698285
2015-07-10 13:59:05
163
[Original] Solr: Using FunctionQuery in Solr Sort Syntax
In my project, I ran into a problem similar to http://stackoverflow.com/questions/27701533/using-functionquery-in-solr-sort-syntax I want to sort my documents by a custom score using function ...
2015-07-07 17:36:48
219
[Original] Ubuntu: common errors
when run #sudo update-manager error: solution: sudo apt-get update && sudo apt-get dist-upgrade --------- update firefox flash plugin #tar -xzf install_flash_player_11_linux.x86_6...
2015-07-07 09:53:15
163
[Original] Solr: integrate carrot2 with solr-5.1.0
I had already integrated carrot2 with solr-4.x successfully, using my customized Chinese tokenizer. But I ran into some errors when following my series of blog posts http://ylzhj02.iteye.com/blog/2152348 to adopt ca...
2015-07-01 10:42:22
203
[Original] Solr: Spatial Search
1. schema <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers"/>
2015-06-26 14:59:54
364
[Original] Solr: Synonym Query
1. config schema.xml <fieldtype name="text_ch" class="solr.TextField"> <analyzer type="index"> <tokenizer class="org.lionsoul.jcseg.analyzer.JcsegTokenizerFactory" mode="...
2015-06-18 17:59:03
220
[Original] Solr: Install Solr to production
1. download solr-5.2.1.tgz 2. install #tar xzf solr-5.2.1.tgz solr-5.2.1/bin/install_solr_service.sh --strip-components=2 #./install_solr_service.sh solr-5.2.1.tgz 3. check solr status #servi...
2015-06-17 16:31:04
146
[Original] Solr: Tika with OCR engine
I want to parse the content, not just the metadata, of a JPG picture. The following code is the test class: import java.io.File; import java.io.FileInputStream; import java.io.IOException; impo...
2015-06-12 15:03:35
543
[Original] Solr: Install tesseract-ocr
Install dependency #tar -xjf leptonica-1.69.tar.bz2 #cd leptonica-1.69 #./configure #make -j4 #sudo make install -------------------------- download tesseract-ocr-3.02.02.tar.gz #tar -xzf t...
2015-06-11 16:35:45
177
[Original] Understanding information content with Apache Tika
www.ibm.com/developerworks/cn/opensource/tutorials/os-apache-tika/ http://www.tutorialspoint.com/tika/tika_quick_guide.htm
2015-06-09 16:53:20
148
[Original] Android: push notifications
Preferences http://www.cnblogs.com/hanyonglu/archive/2012/03/04/2378971.html
2015-06-08 16:58:08
129
[Original] Neo4j: Create multiple relationships between the same two nodes
In my case, I want to build an address book in Neo4j, in which a person has multiple cellphones, and some cellphones may have the same contact with the same phone number but different nicknames, such as us...
2015-06-03 14:40:54
257
[Original] Jubatus: Setup in Distributed Mode
References http://jubat.us/en/tutorial_distributed.html http://jubat.us/en/admin.html
2015-05-28 14:17:00
131
[Original] Jubatus: Classify Example
1. Create a mvn project with pom.xml <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0...
2015-05-28 10:26:30
199
[Original] Jubatus: Real-time online ML Introduction
http://jubat.us/en/overview.html
2015-05-26 14:24:47
172
[Original] Neo4j: Remote RESTful API (Java)
#git clone https://github.com/neo4j-contrib/java-rest-binding.git #git tag -l #git checkout neo4j-rest-graphdb-2.0.1 #mvn clean install In mvn project's pom.xml add <dependency> <...
2015-05-25 14:29:01
519
[Original] Solr: Using SolrJ to operate Solr
References http://www.solrtutorial.com/solrj-tutorial.html https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
2015-05-22 13:29:12
130
[Original] Flume: morphline sink with Solr 5.1.0
1. Download the Flume 1.5.2 source code and change the Solr version to 5.1.0 2. Compile and install 3. cp Solr 4.10.1 related jars to the lib dir to solve this error CloudSolrServer' (current frame, stack[2])...
2015-05-21 16:38:37
213
[Original] Storm: Trident fields and tuples
https://storm.apache.org/documentation/Trident-tutorial.html The Trident data model is the TridentTuple which is a named list of values. During a topology, tuples are incrementally built up throu...
2015-04-28 10:14:54
142
[Original] High-quality PPT online
http://www.slideshare.net/yuhuang/large-scale-machine-learning-for-big-data
2015-04-24 15:33:21
132
[Original] Spark: Spark Streaming
Spark Streaming uses a “micro-batch” architecture, where the streaming computation is treated as a continuous series of batch computations on small batches of data. Spark Streaming receives data fro...
2015-04-22 16:02:40
161
[Original] Spark: cluster architecture
In distributed mode, Spark uses a master/slave architecture with one central coordinator and many distributed workers. The central coordinator is called the driver. The driver communicates with a p...
2015-04-22 10:51:33
173
[Original] Spark: deploy cluster in standalone mode
Host: 192.168.0.135 192.168.0.136 192.168.0.137 master: 137 workers:135 136 1.Install spark on all hosts in /opt dir 2.Install SSH Remote Access 137#ssh-keygen 137#ssh-copy-id -i ~/.s...
2015-04-20 12:32:56
147
[Original] Spark: Cluster Mode Overview
https://spark.apache.org/docs/latest/cluster-overview.html This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved. Read throug...
2015-04-20 10:15:03
160
[Original] Flume: Avro source and sink
In order to flow the data across multiple agents or hops, the sink of the previous agent and source of the current hop need to be avro type with the sink pointing to the hostname (or IP address) and ...
2015-04-17 11:12:42
144
[Original] Flume: HBase sink
flume.conf a1.sinks.hbase-sink1.channel = ch1 a1.sinks.hbase-sink1.type = hbase a1.sinks.hbase-sink1.table = users a1.sinks.hbase-sink1.columnFamily= info a1.sinks.hbase-sink1.serializer=org.ap...
2015-04-16 17:04:38
258
[Original] Kite: Morphlines Introduction
http://kitesdk.org/docs/1.0.0/morphlines/ http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/
2015-04-13 11:09:08
241
[Original] Neo4j: full-text search
Model @Indexed(indexType = IndexType.FULLTEXT, indexName = "TaskTile") private String title; Repository @Query("START n=node:TaskTile({0}) return n") Iterable<Task> fin...
2015-04-08 15:03:53
402
hadoop in action
2014-11-24