jo_joo - CSDN Blog

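Original <7> Hive group by, order by

group by ... having: 1. Non-aggregated columns in the select list must appear in group by. 2. Apart from plain columns, the select list contains only aggregate operations. 3. group by can be followed by an expression, such as substr(col). 4. group by runs as a reduce operation and is limited by the number of reducers; set the reduce parameter mapred.reduce.tasks to change the number of reducers (hive-0.13) set mapreduce.j...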

2018-08-16 21:37:35 1146
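
The group by points in the excerpt above can be tried without a Hive cluster. What follows is a minimal sketch, assuming Python's built-in sqlite3 as a stand-in for Hive; the table `words` and its rows are made up for illustration. SQLite is more lenient than Hive about non-aggregated select columns, so the sketch only shows grouping by an expression and filtering groups with having.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("create table words(word text)")
conn.executemany(
    "insert into words values (?)",
    [("apple",), ("avocado",), ("banana",), ("blueberry",), ("cherry",)],
)

# Group by an expression (the first letter) and keep only groups
# that contain more than one row.
rows = conn.execute(
    """
    select substr(word, 1, 1) as first_letter, count(*) as n
    from words
    group by substr(word, 1, 1)
    having count(*) > 1
    order by first_letter
    """
).fetchall()
print(rows)
```

Every column in the select list is either the grouping expression itself or an aggregate, which is the shape Hive requires.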

Original <6> Hive table property operations

Table property operations. Rename a table: create table if not exists testchange(name string,value string); alter table testchange rename to a2; Add columns (appended at the end by default): alter table tablename add columns(c1 string,c2 long); alter table a2 add ...

2018-08-16 21:35:41 375

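Original <5> Hive data export and dynamic partitions

Hive data export. 1. Using hadoop commands. get: hadoop fs -get path localPath; text (works on several different formats, in effect switching the output stream to text): hadoop fs -text path > e2.txt 2. Using insert...directory: insert overwrite local directory 'path' row format...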

2018-08-16 21:33:56 417

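Original <4> Hive command execution modes, data loading

Ways to run Hive commands. hive -e "hql" runs HQL from the command line: hive -e "select * from mydb2.c1"; hive -S is silent mode, the console prints no logging information: hive -S -e "select * from mydb2.c1"; hive -v is verbose mode, which prints the SQL being executed: hive -v -e "select * from mydb2.c1"; hive...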

2018-08-16 21:32:18 682

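Original <3> Hive queries, sorting, functions

6. Queries. Simple-query configuration: set hive.fetch.task.conversion=more; / hive --hiveconf hive.fetch.task.conversion=more; / edit the hive-site.xml file; Functions: nvl(x,0) returns x when x is not null and 0 when it is null. To test whether a value is null, use is null / is not null. HQL is strictly case-sensitive when comparing character data: select * from view_stu...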

2018-08-16 21:28:58 251
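
The null handling mentioned in the excerpt above can be sketched as follows, again assuming Python's built-in sqlite3 as a stand-in for Hive and a made-up `scores` table. SQLite has no nvl, so the sketch uses coalesce(score, 0), which Hive also provides and which gives the same result as nvl(score, 0) for a single column.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("create table scores(name text, score integer)")
conn.executemany(
    "insert into scores values (?, ?)",
    [("amy", 90), ("bob", None)],
)

# Replace null scores with 0; non-null scores pass through unchanged.
filled = conn.execute(
    "select name, coalesce(score, 0) from scores order by name"
).fetchall()

# Null must be tested with "is null"; "= null" never matches.
missing = conn.execute(
    "select name from scores where score is null"
).fetchall()
print(filled, missing)
```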

Original <2> Hive external tables, partitioned tables, dynamic partitions, bucketed tables, views

1. Partitioned tables: create table sample_data(id int,name string,gender string,x int,y int,z int) row format delimited fields terminated by ','; load data inpath '/student/sampleData.txt' into table sample_data; cr...

2018-08-16 21:11:10 745

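Original <1> Some basic Hive operations

1. Create a database: create database if not exists mydb; Add a comment: create database if not exists mydb2 comment 'this is test database'; View the comment: describe database mydb2; Drop a database: drop database if exists mydb2; (when the database contains no tables) drop data...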

2018-08-16 21:07:06 171

Original Syncing Oracle data with Elasticsearch 2.3.4 + elasticsearch-jdbc-2.3.4.1, with simple tuning

Syncing Oracle data with Elasticsearch 2.3.4 + elasticsearch-jdbc-2.3.4.1, with simple tuning

2017-11-03 15:30:05 1414

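Original Kafka environment setup

Prerequisite: a ZooKeeper cluster that is already set up. Download kafka_2.11-0.10.0.1.tgz; tar -zxvf kafka_2.11-0.10.0.1.tgz; mv kafka_2.11-0.10.0.1 kafka; Enter the directory: cd kafka/config; Edit the configuration file: vim server.properties; broker.id=1 listeners=PLAI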

2017-06-30 10:57:59 240

Original Some basic operations with the Elasticsearch 2.4.3 Java API

Some basic Elasticsearch 2.4.3 operations: import org.elasticsearch.client.Client; import org.elasticsearch.client.transport.TransportClient; import org.elasticsearch.common.settings.Settings; import org.elasticsearch.comm

2017-06-30 10:18:54 839

Original Reading Excel files with Java POI

Requires poi-3.15.jar: import org.apache.poi.hssf.usermodel.HSSFCell; import org.apache.poi.hssf.usermodel.HSSFRow; import org.apache.poi.hssf.usermodel.HSSFSheet; import org.apache.poi.hssf.usermodel.HSSF

2017-06-30 10:00:53 356

Original Transferring files over scp from Java

Requires ganymed-ssh2-build210.jar: import ch.ethz.ssh2.Connection; import ch.ethz.ssh2.SCPClient; import java.io.IOException; /** * Created by Administrator on 2017/6/29. */ public class testScp {

2017-06-30 09:56:16 1315

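Original Sqoop environment configuration and common basic operations 01

Installing Sqoop. Download sqoop-1.4.6.bin__hadoop-0.23.tar.gz from the official site; tar -xzvf sqoop-1.4.6.bin__hadoop-0.23.tar.gz; Set environment variables: export HADOOP_COMMON_HOME=/home/spark/opt/hadoop-2.7 (indicates the directory where Hadoop is installed) export HADOOP_MA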

2017-04-27 10:27:00 480

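Original Spark + ANSJ: basic Chinese word segmentation

ANSJ 5.0.2 is a Java implementation of Chinese word segmentation based on n-Gram + CRF + HMM. It segments roughly 2 million characters per second (measured on a MacBook Air) with accuracy above 96%. It currently implements Chinese word segmentation, Chinese name recognition, user-defined dictionaries, keyword extraction, automatic summarization, and keyword tagging. It can be applied to natural language processing and suits projects that demand high segmentation quality. Download: http://maven.nlpc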

2016-12-16 10:20:22 5205 1

jo_joo's blog