BigData
SeasonRun
A gentleman is not a mere tool (君子不器)
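ZooKeeper Cluster Configuration Explained
Background: three Linux machines running Red Hat 7.2, zookeeper-3.4.14, jdk1.8.0_221
1. Upload zookeeper-3.4.14 to the server and configure the environment variables:
export ZOOKEEPER_HOME=/usr/local/src/zookeeper-3.4.14
export PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
2. ...
Original · 2019-11-25 14:31:40 · 1911 views · 0 comments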
WARN LoadSnappy: Snappy native library not loaded
[root@master mapreduce_wordcount_python]# ./run.sh
Deleted hdfs://192.168.183.10:9000/output
packageJobJar: [./map.py, ./red.py, /usr/local/src/hadoop-1.2.1/tmp/hadoop-unjar3556020200698085095/...
Original · 2019-08-12 16:25:15 · 248 views · 0 comments
Hive Install
1. Download the Hive package and uncompress it.
2. Create hive-site.xml under the conf path and set the parameters like this:
<configuration> <property> <name>javax.jdo.opti...
Original · 2019-08-21 23:11:22 · 314 views · 0 comments
MySQL install
1. Install MySQL
yum install mysql -y
yum install mysql-server -y
2. Start MySQL
/etc/init.d/mysqld start
3. Configure the user and password
mysqladmin -u root password ****
4. Log in to MySQL
mysql -u root -...
Original · 2019-08-21 23:19:30 · 279 views · 0 comments
Hive commands notes
hive> insert overwrite directory '/result' select * from userbehavior;
hive> insert overwrite local directory '/hive_test/result' select * from userbehavior;
hive> create table userbehavior...
Original · 2019-08-22 20:45:40 · 108 views · 0 comments
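Kafka Installation and Configuration Explained
1. Start ZooKeeper; if jps shows QuorumPeerMain, it started successfully. For details see https://blog.csdn.net/haiziccc/article/details/99468831
2. Download Kafka from http://mirrors.hust.edu.cn/apache/kafka/;
3. Upload the archive to the server directory and extract it;
4. Enter the ka...
Original · 2019-08-27 12:48:09 · 148 views · 1 comment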
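Kafka Cluster Configuration
To configure a Kafka cluster, start from the single-node setup: distribute the Kafka installation package to every node and change broker.id in config/server.properties on each one, numbering from 0 (with three machines, use 0, 1, and 2 in turn). Then start Kafka on each of the three machines and run a test:
./bin/kafka-server-start.sh config/server.properties ## start Kafka
./bin/...
Original · 2019-08-27 12:52:51 · 158 views · 0 comments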
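Flume-Kafka Configuration Explained
Goal: use Flume to capture log messages and pass them to Kafka.
For the Flume and Kafka setup, see the earlier articles; this article mainly explains how to have Flume deliver messages to Kafka.
1. Configure the Flume conf file: create a conf file named flume_kafka.conf in the flume/conf directory
# Name the components on this agent
a1.sources = r1
a1.sinks = k1...
Original · 2019-08-27 14:22:04 · 3263 views · 0 comments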
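Hadoop 2.6.1 Installation and Configuration Explained
1. Configure Java (I use version 1.7 here) and set up the environment variables;
2. Download Hadoop (https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/) and upload it to the server;
3. Go to the /usr/local/src/hadoop-2.6.1/etc/hadoop directory and start configuring:
(1) hadoop-env.sh -> JAVA_HOME...
Original · 2019-08-31 09:16:50 · 377 views · 0 comments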
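Hadoop 2 datanode/nodemanager Startup Failure and Streaming Job Failed: Problem Analysis
I originally used hadoop-1.2.1 and everything worked; after switching to hadoop-2.6.1 I ran into various problems, which are analyzed in detail below:
1. The datanode cannot start
This goes back to running hadoop namenode -format. Hadoop must be initialized before it is started for the first time; after that command runs, in the master's /usr/local/src/hadoop-2.6.1/dfs/name directory (this directory is configured in hdfs-site...
Original · 2019-09-01 15:21:57 · 822 views · 0 comments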
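Classification Model Evaluation and AUC Calculation Explained
1. Evaluation metrics for classification models
When a model is used for classification, Accuracy, Precision, and Recall are generally used as the metrics for judging how good the model is; today's article mainly introduces AUC.
Confusion table (rows: actual class; columns: class predicted by the classifier)
confusion table    A (military)    B (technology)
A (60)             X11 (50)        X12 (10)
B (40)             X...
Original · 2019-09-05 21:58:48 · 4067 views · 0 comments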
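The three metrics named above can be read straight off a two-class confusion table. Here is a minimal sketch, assuming rows are actual classes and columns are predicted classes. X11 = 50 and X12 = 10 come from the excerpt; X21 = 5 and X22 = 35 are hypothetical stand-ins for the truncated row, chosen only so that the row sums to 40. The function name `binary_metrics` is also my own.

```python
def binary_metrics(x11, x12, x21, x22):
    """Return accuracy, plus precision and recall for class A.

    Rows are actual classes, columns are predicted classes:
    x11 = actual A predicted A, x12 = actual A predicted B,
    x21 = actual B predicted A, x22 = actual B predicted B.
    """
    total = x11 + x12 + x21 + x22
    accuracy = (x11 + x22) / total      # share of all samples classified correctly
    precision_a = x11 / (x11 + x21)     # of everything predicted A, how much is really A
    recall_a = x11 / (x11 + x12)        # of everything really A, how much was found
    return accuracy, precision_a, recall_a


# x11 and x12 are from the excerpt; x21 and x22 are HYPOTHETICAL values.
accuracy, precision_a, recall_a = binary_metrics(50, 10, 5, 35)
print(accuracy, precision_a, recall_a)
```

Recall for class A depends only on the first row, so 50/60 holds whatever the truncated row contains; accuracy and precision change with the hypothetical values.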
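Probabilistic Classifiers: the Naive Bayes (NaiveBayesianClassification, NB) Classification Algorithm Explained
A probabilistic classifier computes the probability that the object to be classified belongs to each class and outputs the class with the highest probability.
The Naive Bayes (NaiveBayesianClassification, NB) formula is p(y_i | x) = p(x | y_i) p(y_i) / p(x), where p(y_i) is the prior probability;
Next, an example shows how to compute this classification probability. Suppose there are 1000 articles: 300 military, 200 technology, and 500 medical; then
p(military) = 300/1000 = 0.3
p(technology) = 200/1000 = 0.2
p(med...
Original · 2019-09-05 23:20:55 · 2361 views · 2 comments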
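The prior step of the example can be sketched in a few lines. This covers only the priors, because the likelihood terms are not part of the excerpt; the article counts are the ones given above, and the function name `class_priors` is my own.

```python
def class_priors(doc_counts):
    """Prior probability of each class = its article count / total articles."""
    total = sum(doc_counts.values())
    return {label: count / total for label, count in doc_counts.items()}


# Article counts from the example: 1000 articles in three classes.
counts = {"military": 300, "technology": 200, "medical": 500}
priors = class_priors(counts)

# With no evidence about the text itself, the largest prior would win.
most_likely_prior = max(priors, key=priors.get)
print(priors, most_likely_prior)
```

A full classifier would multiply each prior by the likelihood of the observed words before taking the maximum.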
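Similarity Computation Based on the Cosine Theorem (Part 1)
I am new to data analysis. I use the cosine theorem to compute the similarity of articles, with the following approach:
1. Segment the articles into words, using jieba for the word splitting;
2. Remove all punctuation;
3. Create a blacklist, that is, high-frequency words that carry no meaning, such as “是” (is), “的” (of), and “嗯” (um), and remove them from the segmented words;
4. Run a word count to get the frequency of each word;
5. Take the union of the two articles' segmentation results to get the overall word sequence;
6. Look up each article's words against the overall word sequence to get two term-frequency vectors;
7. ...
Original · 2019-09-19 14:49:27 · 339 views · 0 comments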
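Steps 3 to 6, plus the final cosine step that the title implies, can be sketched as follows. This is a minimal sketch and not the original program: it assumes the articles are already segmented and stripped of punctuation (so jieba is not called here), the two token lists are made-up examples, and the function names are my own.

```python
import math
from collections import Counter

# Blacklist taken from the examples in step 3.
STOPWORDS = {"是", "的", "嗯"}


def term_counts(tokens):
    """Steps 3 and 4: drop blacklisted words, then count word frequencies."""
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine_similarity(tokens_a, tokens_b):
    counts_a = term_counts(tokens_a)
    counts_b = term_counts(tokens_b)
    # Step 5: union of both vocabularies, in a fixed order.
    vocabulary = sorted(set(counts_a) | set(counts_b))
    # Step 6: one term-frequency vector per article (a missing word counts as 0).
    vec_a = [counts_a[w] for w in vocabulary]
    vec_b = [counts_b[w] for w in vocabulary]
    # Cosine of the angle between the two vectors.
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# HYPOTHETICAL pre-segmented articles, for illustration only.
doc_a = ["我", "喜欢", "的", "数据", "分析"]
doc_b = ["我", "喜欢", "数据", "挖掘"]
print(cosine_similarity(doc_a, doc_b))
```

The result lies between 0 (no shared words) and 1 (identical frequency profile) because term frequencies are never negative.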
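Similarity Computation Based on the Cosine Theorem (Part 2)
As described in the previous article, this part covers word segmentation, punctuation removal, blacklist removal, word-frequency counting, building the vocabulary, and constructing the feature vectors, and finally uses the cosine theorem to compute similarity. The logic and algorithm of this similarity computation are comparatively simple, and it gives the similarity of two articles quickly, but its uses are limited and its accuracy needs improvement; later posts will compare other similarity algorithms step by step. (See http://bigdata-madesimple.com/implementing-the-five-most-popular-simil...
Original · 2019-09-20 10:41:19 · 323 views · 1 comment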
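Neural Networks: Data Fitting
I have been studying neural networks recently and have made a first attempt at using one to fit a training set.
import tensorflow as tf
import numpy as np
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import matplotlib.pyplot as plt
# Randomly generate 50 numbers in (-0.5, 0.5) and convert them into a 50*1 matrix
x_data ...
Original · 2019-09-24 14:30:07 · 7819 views · 0 comments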
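Neural Networks: Data Fitting (Part 2)
This article makes the following improvements on the previous one:
1. The step size and the number of training iterations change from fixed values to ranges: the best result is selected with the step size in (0.01, 1) and the iteration count in (1, 100000);
2. A timer is added to track the elapsed time;
import tensorflow as tf
import numpy as np
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import m...
Original · 2019-09-25 08:27:04 · 1883 views · 0 comments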
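Python Matrix Product Operations (multiply/matmul/*/@) Explained
Training on data often involves matrix operations. I had not practiced them for a while and had grown rusty, so today I tested them again. Examples of the various matrix operations in Python follow; they show clearly that tf.matmul(A,C) = np.dot(A,C) = A@C are all matrix multiplication (rows times columns), whereas tf.multiply(A,C) = A*C = A∙C are element-wise multiplication. The Python test code is as follows:
import tensorflow as tf
import numpy as np
a...
Original · 2019-09-25 13:57:07 · 40625 views · 3 comments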
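The same distinction can be checked with NumPy alone. This is a small sketch rather than the original test code: the TensorFlow calls are left out so that it runs without TensorFlow, and the 2x2 arrays A and C are made-up values.

```python
import numpy as np

# HYPOTHETICAL sample matrices, for illustration only.
A = np.array([[1, 2], [3, 4]])
C = np.array([[5, 6], [7, 8]])

# Matrix multiplication: each entry is a row of A dotted with a column of C.
matrix_product = A @ C
# Element-wise multiplication: entries in the same position are multiplied.
elementwise_product = A * C

# For 2-D arrays, @, np.dot and np.matmul agree; * and np.multiply agree.
assert np.array_equal(matrix_product, np.dot(A, C))
assert np.array_equal(matrix_product, np.matmul(A, C))
assert np.array_equal(elementwise_product, np.multiply(A, C))

print(matrix_product)
print(elementwise_product)
```

This holds for np.ndarray; with the older np.matrix type, * means matrix multiplication instead.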
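Garbled Chinese Text in pyplot Output Images
Chinese text in images produced by pyplot is displayed as garbled characters, as the figure in the post shows. Adding the configuration below fixes it; choose a font that is installed on the local computer.
from pylab import mpl
mpl.rcParams['font.sans-serif']=['Microsoft YaHei']
mpl.rcParams['axes.unicode_minus'] = False...
Original · 2019-10-08 08:26:26 · 721 views · 0 comments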
flume load_balance
This article describes how to configure Flume load balancing. Master is the data generator: its Flume agent reads logserver.log and transmits it to slave1 and slave2, which act as collectors and save the log to HDFS. ...
Original · 2019-08-25 08:36:44 · 178 views · 0 comments
flume failover
The architecture looks like this:
Master conf:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1
# describe/configure the source
a1.sources.r1.type = exec
a1.sources...
Original · 2019-08-25 00:19:34 · 155 views · 0 comments
flume to hdfs
With Flume, you can collect logs and store them in HDFS.
First, create an HDFS path, for example "hadoop fs -mkdir /flume_data_pool";
Second, create a conf file in the Flume conf path,
# Name the compon...
Original · 2019-08-24 23:55:32 · 188 views · 0 comments
java.net.NoRouteToHostException: No route
When running a Hadoop program, errors like the following appear.
packageJobJar: [./map.py, ./red.py, /usr/local/src/hadoop-1.2.1/tmp/hadoop-unjar1576276863998395028/] [] /tmp/streamjob8951329431624757277.
19/07/29 07:...
Original · 2019-07-29 16:50:20 · 821 views · 0 comments
Python IP&Num convert
Original · 2019-07-31 22:09:01 · 170 views · 0 comments
Hadoop cluster deploy
1. Prepare 3 VMs (hostname/IP, 1 master, 2 slaves);
2. Install the JDK and configure the Java environment (on all 3 nodes);
vim ~/.bashrc
export JAVA_HOME=/usr/local/src/jdk1.6.0_45
export CLASSPATH=.:$CLASSPATH:$JA...
Original · 2019-07-28 09:47:17 · 186 views · 0 comments
Countword
#!/usr/bin/python
import sys
list = []
current_word = None
count_pool = []
sum = 0
file1 = 'd:/share/1business.seg.txt'
file = open(file1, 'r', encoding='UTF-8')
for line in file:
    ss = line....
Original · 2019-08-06 08:53:34 · 210 views · 0 comments
BinarySearch
#!/usr/bin/python
# define initial index 0, and max index length-1
# find ip_num in ip_lib_list
ip_num = 15
ip_lib_list = [[2, 2, 3], [5, 3, 4], [7, 4, 5], [8, 5, 6], [12, 7, 8],[15, 9, 0], [19, 0, ...
Original · 2019-08-01 20:13:57 · 603 views · 0 comments
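The search described by the comments (start index 0, end index length-1) can be sketched as below. It assumes that each row is sorted by its first element and that an exact match on that element is wanted; the original may instead do a range lookup, which the excerpt does not show. The list holds only the six rows visible in the excerpt, and the function name `binary_search` is my own.

```python
def binary_search(rows, target):
    """Return the index of the row whose first element equals target, or -1."""
    low, high = 0, len(rows) - 1          # initial index 0, max index length-1
    while low <= high:
        mid = (low + high) // 2
        key = rows[mid][0]
        if key == target:
            return mid
        if key < target:
            low = mid + 1                 # target can only be in the upper half
        else:
            high = mid - 1                # target can only be in the lower half
    return -1


# Only the rows visible in the excerpt; the rest of the list is truncated there.
ip_lib_list = [[2, 2, 3], [5, 3, 4], [7, 4, 5], [8, 5, 6], [12, 7, 8], [15, 9, 0]]
ip_num = 15
print(binary_search(ip_lib_list, ip_num))
```

Each pass halves the remaining range, so the lookup takes O(log n) comparisons.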
TF-IDF(3)
#!/usr/bin/python
import os
import sys
import gzip
# suppose testresult1 contains the two articles you want to compare, and testresult2 is the IDF result
file_input_fd = 'd:/share/testresult1'
idf_dict...
Original · 2019-08-06 09:42:20 · 99 views · 0 comments
TF-IDF(1)
#!/usr/bin/python
import sys
# split word
file1 = 'd:/share/articlelib'
file = open(file1, 'r', encoding='UTF-8')
for line in file:
    ss = line.strip().split('\t', 1)
    if len(ss) != 2:
        co...
Original · 2019-08-06 10:36:53 · 103 views · 0 comments
TF-IDF(2)
#!/usr/bin/python
import sys
import math
current_word = None
count_pool = []
sum = 0
# Total article
docs_cnt = 524
# the result of split words
file1 = 'd:/share/testresult2'
file = open(file1, 'r', ...
Original · 2019-08-06 10:38:43 · 98 views · 0 comments
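For reference, the IDF step that this script prepares for can be sketched as below. It assumes the plain textbook definition idf = log(docs_cnt / document frequency); the excerpt is cut off before the actual formula, so the original may use a smoothed variant. docs_cnt = 524 is from the excerpt, while the words, their document frequencies, and the function name are hypothetical.

```python
import math


def inverse_document_frequency(doc_freq, docs_cnt):
    """idf(word) = log(total documents / documents containing the word)."""
    return {word: math.log(docs_cnt / df) for word, df in doc_freq.items()}


docs_cnt = 524                              # total articles, from the excerpt
doc_freq = {"hadoop": 524, "kafka": 131}    # HYPOTHETICAL document frequencies
idf = inverse_document_frequency(doc_freq, docs_cnt)
print(idf)
```

A word that appears in every article gets an IDF of 0, so it contributes nothing to a TF-IDF vector.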
Similarity Computing based on TF-IDF and Cosine theorem
This is my first program for similarity computing, and I will keep optimizing it.
Steps:
1. Collect enough articles and split them into words; at the same time you can get a word_dict (I think it's better with ...
Original · 2019-08-06 15:53:20 · 174 views · 0 comments
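Annotated Part-of-Speech Tag Names for Chinese Word Segmentation
Tag code    POS name                        Notes
Ag          adjectival morpheme (形语素)     An adjectival morpheme. The adjective code is a, and the morpheme code g is prefixed with A.
a           adjective (形容词)               From the first letter of the English word adjective.
ad          adverbial adjective (副形词)     ...
Reposted · 2019-08-06 20:32:37 · 3742 views · 1 comment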
zookeeper Install
I downloaded ZooKeeper from https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.5.5/, then created myid, modified zoo.cfg, and added the environment variable. When I start the service, i...
Original · 2019-08-13 20:13:26 · 154 views · 0 comments
Similarity Computing based on LCS
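# -*- coding: utf-8 -*-
#!/usr/bin/python
import sys

def cal_score(first_str, second_str):
    # build 50 independent rows; [[0]*50]*50 would repeat one shared row 50 times
    len_vv = [[0] * 50 for _ in range(50)]
    len1 = len(first_str)
    len2 = len(second_str)
    for i in range(1, len1...
Original · 2019-08-07 14:47:29 · 100 views · 0 comments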
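A complete LCS-based score can be sketched as follows. This is not the original `cal_score`: the excerpt is cut off before the scoring step, so the normalization 2 * LCS / (len1 + len2) is my assumption, and the function names are my own. The table is sized from the input lengths rather than fixed at 50.

```python
def lcs_length(first_str, second_str):
    """Length of the longest common subsequence, by dynamic programming."""
    len1, len2 = len(first_str), len(second_str)
    # One independent row per prefix of first_str (no shared row objects).
    table = [[0] * (len2 + 1) for _ in range(len1 + 1)]
    for i in range(1, len1 + 1):
        for j in range(1, len2 + 1):
            if first_str[i - 1] == second_str[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len1][len2]


def lcs_similarity(first_str, second_str):
    """ASSUMED score: 2 * LCS / (len1 + len2), which lies in [0, 1]."""
    if not first_str and not second_str:
        return 1.0
    return 2.0 * lcs_length(first_str, second_str) / (len(first_str) + len(second_str))


print(lcs_length("ABCBDAB", "BDCABA"), lcs_similarity("ABCBDAB", "BDCABA"))
```

Sizing the table from the inputs removes the 50-character limit implied by the fixed 50x50 table.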
HBase Install
1. Download the HBase package, like hbase-0.98.0-hadoop.
2. Upload the package to the server.
3. Edit hbase-env.sh,
# The java implementation to use. Java 1.6 required.
# export JAVA_HOME=/usr/java/jdk1.6.0/
ex...
Original · 2019-08-19 12:54:49 · 153 views · 0 comments
Hbase command - shell
hbase shell: enter shell mode
create 'user_table', 'user','age','sex': create a table
describe 'user_table': view the table
alter 'user_table', {NAME=>'age', METHOD=>'delete'}: delete the 'age' column family
alter 'user_table', {NAME=>'flags', VER...
Original · 2019-08-19 14:20:41 · 170 views · 0 comments
supervisors on storm ui display zero
When the Storm UI shows zero supervisors, the slave nodes are not being monitored by ZooKeeper. I checked the ZooKeeper status and found it running in standalone mode, not as a follower.
Checking further, I found something missing in ...
Original · 2019-08-14 13:47:51 · 96 views · 0 comments
Word segment based on Jieba
First, install jieba: run pip install jieba, or download it from http://pypi.python.org/pypi/jieba/, unpack it, cd into the unpacked directory, and install it; alternatively, put it in the site-packages directory.
Second, you can u...
Original · 2019-08-07 20:00:22 · 174 views · 0 comments
Flume deploy
Flume provides powerful functions to collect and move large volumes of logs from different sources to storage such as HDFS or HBase.
The data collection flow looks like this:
Every agent contains a source, a channel, and a sink. ...
Original · 2019-08-24 23:36:40 · 88 views · 0 comments
org.apache.hadoop.hdfs.server.namenode.SafeModeException
In safe mode HDFS is read-only, so operations that modify the file system fail with this exception. You can use the commands below:
# get safemode status
hadoop dfsadmin -safemode get
# change state to safe mode
hadoop dfsadmin -safemode enter
# leave safe mode...
Original · 2019-07-29 13:09:16 · 393 views · 1 comment