Hive Development

Building Hive from source

Clone the source

git clone https://git-wip-us.apache.org/repos/asf/hive.git
or 
git clone git@github.com:wankunde/hive.git

# ignore chmod changes
git config core.filemode false

Build

git branch -va
// check out the branch you are interested in
git checkout -b branch-0.14 origin/branch-0.14
// or git checkout --track origin/branch-0.14

// compile and dist
mvn clean install -DskipTests -Phadoop-2 -Pdist

//  generate protobuf code
cd ql
mvn clean install -DskipTests -Phadoop-2,protobuf

// generate Thrift code
mvn clean install -Phadoop-2,thriftif -DskipTests -Dthrift.home=/usr/local
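For convenience, the three build variants above can be wrapped in a small helper. This is only a sketch: the profile names simply mirror the commands above and may differ between Hive versions.

```shell
#!/bin/sh
# Print the Maven command line for a given build target (dist | protobuf | thrift).
# The -Phadoop-2 profile matches the branch-0.14-era build; adjust for other branches.
hive_build_cmd() {
  case "$1" in
    dist)     printf '%s\n' "mvn clean install -DskipTests -Phadoop-2 -Pdist" ;;
    protobuf) printf '%s\n' "mvn clean install -DskipTests -Phadoop-2,protobuf" ;;
    thrift)   printf '%s\n' "mvn clean install -Phadoop-2,thriftif -DskipTests -Dthrift.home=/usr/local" ;;
    *)        printf 'usage: hive_build_cmd dist|protobuf|thrift\n' >&2; return 1 ;;
  esac
}

# Example: show (or eval) the dist build command
hive_build_cmd dist
```

To actually run a build, `eval "$(hive_build_cmd dist)"` from the repository root.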

Tips

  • An alternative to activating profiles in pom.xml is to add the following configuration to $MAVEN_HOME/conf/settings.xml. You can check which profiles are active with the mvn help:active-profiles command.
  <profiles>
    <profile>
      <id>hadoop-local-dev</id>
      <properties>
        <maven.test.classpath></maven.test.classpath>
        <test.dfs.mkdir>D:\hadoop_tmp\test_dfs_mkdir</test.dfs.mkdir>
        <test.output.overwrite>true</test.output.overwrite>
        <thrift>D:\hadoop_tmp\thrift</thrift>
        <thrift.home>${thrift}\home</thrift.home>
        <thrift.gen.dir>${thrift}\gen_dir</thrift.gen.dir>
        <thrift.args></thrift.args>
      </properties>
    </profile>
  </profiles>

  <activeProfiles>
    <activeProfile>hadoop-2</activeProfile>
    <activeProfile>hadoop-local-dev</activeProfile>
  </activeProfiles>
  • Configure system environment variable HADOOP_HOME=D:\installs\hadoop-2.5.0-cdh5.2.0.

  • By default, before compiling, Maven downloads many dependency packages and may run into timeout exceptions. I use Nexus and add a <timeout>120000</timeout> entry to the <server> configuration item. (Not tested.)

  • Hive HiveDeveloperFAQ Wiki
  • If you want to compile and test Hive in Eclipse, continue with the following steps.

    • mvn eclipse:eclipse
    • import project
    • Configure - Convert to maven project
    • Some Maven plugins may not work well (you may need to set a local proxy host and port in Eclipse); connect to the m2e marketplace and install the m2e connectors you need (including antlr and build-helper).

    You can access the m2e marketplace from the preferences: Preferences > Maven > Discovery > Open Catalog. Installing the WTP integration solved most plugin issues for me.

Set up the development environment

Install single-node Hadoop

Install Hive

  • hive-site.xml

    <configuration>
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
    </property>
    
    
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://x.x.x.x:9083</value>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://x.x.x.x:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>username</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>password</value>
    </property>
    
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2g</value>
    </property>
    </configuration>
    • Start Hive metastore
    #!/bin/sh
    nohup bin/hive --service metastore >> logs/metastore.log 2>&1 &
    echo $! > logs/hive-metastore.pid
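    A matching status/stop helper built on the same pid file is a natural companion. A sketch; the logs/ paths are the ones assumed by the start script above:

```shell
#!/bin/sh
# Check or stop the metastore via the pid file written by the start script.
PIDFILE=logs/hive-metastore.pid

metastore_status() {
  # kill -0 only tests whether the process exists; it sends no signal.
  if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "metastore running (pid $(cat "$PIDFILE"))"
  else
    echo "metastore not running"
  fi
}

metastore_stop() {
  [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
}

metastore_status
```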
    • Error solutions:
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes
    com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

    Problem: Hive throws the above exception when an old version of MySQL is used as the Hive metastore.

    Solution: set latin1 as the charset for the metastore database:
    mysql> alter database hive character set latin1;

    Hive testing and development

    Change the Hive log level

    bin/hive -hiveconf hive.root.logger=DEBUG,console

    Or change the log4j properties:

    cp conf/hive-log4j.properties.template conf/hive-log4j.properties
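    After copying the template, you can flip the default level in place. A sketch; the hive.root.logger property name is the one used by stock hive-log4j.properties, and sed -i here is GNU sed (on BSD/macOS use sed -i ''):

```shell
#!/bin/sh
# Rewrite hive.root.logger in a copied hive-log4j.properties file.
set_hive_log_level() {
  level="$1"; file="$2"
  sed -i "s/^hive\.root\.logger=.*/hive.root.logger=${level},DRFA/" "$file"
}

# Example against a scratch copy:
printf 'hive.root.logger=WARN,DRFA\n' > /tmp/hive-log4j.properties
set_hive_log_level DEBUG /tmp/hive-log4j.properties
cat /tmp/hive-log4j.properties
# -> hive.root.logger=DEBUG,DRFA
```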

    Connecting a Java debugger to Hive

    Example: Java remote debugging

    • Run the remote Java program using a script
    JVM_OPTS="-server -XX:+UseParNewGC -XX:+HeapDumpOnOutOfMemoryError"
    DEBUG="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=2345"
    JVM_OPTS="$JVM_OPTS $DEBUG"
    
    export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${LOG_DIR}/gc-`date +'%Y%m%d%H%M'` -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=512M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/bh/logs/hbase/hbase.heapdump  -XX:+PrintAdaptiveSizePolicy -XX:+PrintFlagsFinal -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+UnlockExperimentalVMOptions -XX:InitiatingHeapOccupancyPercent=65 -XX:G1HeapRegionSize=32m -XX:G1RSetRegionEntries=16384 -XX:NewSize=1g -XX:MaxNewSize=1g -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=15 -XX:+UseNUMA -Xmx16384M -Xms16384M -Xprof" 
    
    $JAVA_HOME/bin/java $JVM_OPTS -cp tools-1.0.jar com.wankun.tools.hdfs.Test2
    // For JDK 1.5.x or higher
    -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2345
    // For JDK 1.4.x
    -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=2345
    // For JDK 1.3.x
    -Xnoagent -Djava.compiler=NONE -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=2345
    • Add and run a new remote debug configuration in Eclipse
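    The version-specific debug flags above can also be picked programmatically. A sketch helper; the version-to-flag mapping simply mirrors the three variants listed above, and detecting the JDK version is left to the caller:

```shell
#!/bin/sh
# Return the remote-debug JVM flags for a given JDK major version and port.
jdwp_opts() {
  port="$2"
  case "$1" in
    3) printf '%s\n' "-Xnoagent -Djava.compiler=NONE -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=${port}" ;;
    4) printf '%s\n' "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=${port}" ;;
    *) printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=${port}" ;;
  esac
}

# Example: append to the launch options of the script above
# JVM_OPTS="$JVM_OPTS $(jdwp_opts 8 2345)"
jdwp_opts 8 2345
```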

    Start Hive in debug mode

    • Server: hive --help --debug
    • IDEA: Run | Edit Configurations | Defaults: Remote: + | configure the remote host and port | run the selected remote debug configuration

    • In practice: hive --debug:childSuspend=y -hiveconf hive.root.logger=DEBUG,console

    • When the hadoop command runs, it executes HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS". If both variables contain jdwp debug parameters, the child process cannot start (the same debug address gets bound twice), so you can temporarily comment out that line.

    "HADOOP_CLIENT_OPTS=           -XX:+UseParallelGC -agentlib:jdwp=transport=dt_socket,server=y,suspend=n"
    "HADOOP_OPTS= -XX:PermSize=128m -XX:MaxPermSize=128m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,DRFA,RFAAUDIT -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true   -XX:+UseParallelGC -agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y -Dhadoop.security.logger=INFO,NullAppender"

    Run hive without a hadoop cluster

    export HIVE_OPTS="--hiveconf mapred.job.tracker=local --hiveconf fs.default.name=file:///tmp \
        --hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse \
        --hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true"

    Hive unit tests

    Two kinds of unit tests

    • Normal unit test
    mvn test -Dtest=ClassName#methodName -Phadoop-2

    For example:
    mvn test -Dtest=TestAbc -Phadoop-2 (where TestAbc is the test class), or
    mvn test -Dtest='org.apache.hadoop.hive.ql.*' -Phadoop-2

    Help link: Maven Surefire Plugin

    • Query files

    There are many test query scripts. (I have not managed to run these successfully yet.)

    $ ls ql/src/test/queries/
    clientcompare  clientnegative  clientpositive  negative  positive
    
    // run a unit test, using ql as an example
    cd ql
    mvn test -Dtest=TestCliDriver -Dqfile=groupby1.q -Phadoop-2
    
    //Take src/test/queries/clientpositive/groupby1.q for example.
    
    mvn test -Dmodule=ql -Phadoop-2 -Dtest=TestCliDriver -Dqfile=groupby1.q -Dtest.output.overwrite=true
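    When you want to run several query files at once, -Dqfile accepts a comma-separated list. A sketch for assembling one, demonstrated on a scratch directory rather than the real ql/src/test/queries tree:

```shell
#!/bin/sh
# Join the .q files in a directory that match a pattern into a
# comma-separated list suitable for -Dqfile=...
qfiles_matching() {
  dir="$1"; pattern="$2"
  ls "$dir" | grep "$pattern" | paste -s -d, -
}

# Scratch tree standing in for ql/src/test/queries/clientpositive:
mkdir -p /tmp/queries/clientpositive
touch /tmp/queries/clientpositive/groupby1.q \
      /tmp/queries/clientpositive/groupby2.q \
      /tmp/queries/clientpositive/join1.q
qfiles_matching /tmp/queries/clientpositive groupby
# -> groupby1.q,groupby2.q
```

Real use would look like: mvn test -Dtest=TestCliDriver -Dqfile="$(qfiles_matching ql/src/test/queries/clientpositive groupby)" -Phadoop-2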
    

    Debugging MapReduce tasks launched from SQL

    set mapreduce.map.java.opts=-agentlib:jdwp=transport=dt_socket,server=y,address=8002,suspend=y;
    set mapreduce.reduce.java.opts=-agentlib:jdwp=transport=dt_socket,server=y,address=8003,suspend=y;
    set mapreduce.task.timeout=6000000;

    Running a Hive local task

    export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,address=8001,suspend=y"
    hadoop jar /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.5.1.jar org.apache.hadoop.hive.ql.exec.mr.ExecDriver -localtask -plan file:/home/hadoop/tmp2/plan.xml   -jobconffile file:/home/hadoop/tmp2/jobconf.xml

    This enables local debugging.

    YARN job submission flow

    JobClient –> Job.submit() –> .. –> YARNRunner –> .. –> YarnClientImpl.submitApplication() –> ClientRMService –> RMAppManager

    • Event dispatcher in the RM: AsyncDispatcher

    • The AM is started via the rmContext:

        this.rmContext.getDispatcher().getEventHandler()
            .handle(new RMAppEvent(applicationId, RMAppEventType.START));
    Here getDispatcher returns the rmDispatcher.
    • The rmDispatcher is initialized in the ResourceManager class:
      rmDispatcher.register(SchedulerEventType.class, schedulerDispatcher);
          // Register event handler for RmAppEvents
          rmDispatcher.register(RMAppEventType.class,
              new ApplicationEventDispatcher(rmContext));
    
          // Register event handler for RmAppAttemptEvents
          rmDispatcher.register(RMAppAttemptEventType.class,
              new ApplicationAttemptEventDispatcher(rmContext));
    
          // Register event handler for RmNodes
          rmDispatcher.register(
              RMNodeEventType.class, new NodeEventDispatcher(rmContext));
    • ApplicationEventDispatcher.handle –> RMAppImpl.handle; the START event here only performs a state transition.

    Help Links

    https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-Unittestsanddebugging
    https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode
    https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
    https://cwiki.apache.org/confluence/display/Hive/HowToContribute
    https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoIimportintoEclipse?
    https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java