1. Big data cluster component versions
| Component | Version |
| --- | --- |
| Hadoop | 2.6.0 |
| Spark | 2.1.0 |
| Hive | 1.2.1 |
| Scala | 2.11.8 |
| Java | 1.8.0_144 |
| CDH | 5.13.0 |
2. YARN queues and resources
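| Queue | Web UI (reachable only from a Windows Remote Desktop session, which must be requested first) | Resources |
| --- | --- | --- |
| zz | http://172.18.x.xx:8088 | The zz pool, shared by all accounts prefixed with zz (zz_aa, zz_bb, ...): 2560 vcores, 10240 GB memory |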
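Since every zz-prefixed account draws from the same pool, it helps to estimate how much of it a submission reserves before launching. The sketch below assumes Spark 2.1's default executor memory overhead, max(384 MB, 10% of `--executor-memory`), and deliberately ignores the driver, the ApplicationMaster and YARN's container-size rounding; the function name `executor_footprint` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: estimate the executor share of the zz pool (2560 vcores, 10240 GB)
# that a spark-submit reserves. Assumes Spark 2.1's default executor memory
# overhead, max(384 MB, 10% of --executor-memory); ignores the driver, the
# ApplicationMaster, and YARN's rounding of container sizes.

# executor_footprint NUM_EXECUTORS EXECUTOR_MEMORY_MB EXECUTOR_CORES
# Prints "<total vcores> <total memory in MB>".
executor_footprint() {
  local n=$1 mem_mb=$2 cores=$3
  local overhead=$(( mem_mb / 10 ))   # 10% of executor memory, rounded down
  if (( overhead < 384 )); then
    overhead=384                      # Spark's minimum overhead
  fi
  echo "$(( n * cores )) $(( n * (mem_mb + overhead) ))"
}

# WordCount example: --num-executors 40 --executor-memory 1g --executor-cores 1
executor_footprint 40 1024 1    # prints: 40 56320
# Streaming example: --num-executors 4 --executor-memory 3g --executor-cores 3
executor_footprint 4 3072 3     # prints: 12 13824
```

By this estimate the WordCount submission takes about 55 GB and 40 vcores for executors, roughly 0.5% of the pool's memory and 1.6% of its vcores. Actual usage is somewhat higher, because YARN rounds each container request up to the scheduler's allocation increment.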
3. Example Spark submit commands (--deploy-mode accepts client or cluster)
spark-submit \
--conf "spark.executorEnv.JAVA_HOME=/usr/jdk1.8.0_144" \
--conf "spark.yarn.appMasterEnv.JAVA_HOME=/usr/jdk1.8.0_144" \
--conf spark.kryoserializer.buffer.max=512m \
--conf spark.kryoserializer.buffer=256m \
--class com.xxx.spark.WordCount \
--master yarn \
--deploy-mode client \
--driver-memory 4g \
--executor-memory 1g \
--executor-cores 1 \
--num-executors 40 \
--queue zz \
spark-mllib-1.0-jar-with-dependencies.jar
Or, to run a streaming job in the background with nohup:
# On a Kerberos-secured cluster, insert these two options before the jar path.
# Do not comment them out inside the backslash-continued command: a "#" there
# ends the command early, so spark-submit would run without the jar.
#   --keytab /home/zx_realtime/zx_realtime.keytab \
#   --principal zx_realtime@CENTER.DATAPLAT \
nohup spark-submit \
--name KafkaSink2MongoForHeartBeat \
--class im.youni.contact.streaming.KafkaSink2MongoForHeartBeat \
--master yarn \
--deploy-mode client \
--driver-memory 6g \
--executor-memory 3g \
--executor-cores 3 \
--num-executors 4 \
--queue root.zz \
baseprofile-real-1.0-jar-with-dependencies.jar \
false KafkaSink2MongoForHeartBeat 172.18.2.102:18080,172.18.2.133:18080,172.18.16.175:18080 heartbeat userprofile_heartbeat_g1 \
>/dev/null 2>&1 &
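Because this job runs detached with its output discarded, a convenient way to find and stop it later is by its `--name` through the YARN CLI (`yarn application -list -appStates RUNNING` and `yarn application -kill` are standard commands). The minimal sketch below assumes the Hadoop 2.x layout of `yarn application -list`, with tab-separated, space-padded rows whose first two columns are the application ID and name; the helper name `app_id_by_name` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read `yarn application -list` output on stdin and print the ID of
# every application whose name equals $1 exactly (one ID per line).
# Assumes the Hadoop 2.x layout: tab-separated, space-padded columns with
# column 1 = Application-Id and column 2 = Application-Name.
app_id_by_name() {
  awk -F'\t' -v name="$1" '
    {
      id = $1; app = $2
      gsub(/^ +/, "", id); gsub(/ +$/, "", id)    # trim padding around the ID
      gsub(/^ +/, "", app); gsub(/ +$/, "", app)  # trim padding around the name
      if (app == name && id ~ /^application_/) print id
    }'
}

# Usage (needs the YARN client on PATH; assumes a single matching app):
#   id=$(yarn application -list -appStates RUNNING 2>/dev/null \
#        | app_id_by_name KafkaSink2MongoForHeartBeat)
#   [ -n "$id" ] && yarn application -kill "$id"
```

Matching on the exact name rather than a substring avoids killing a similarly named job that shares the zz pool.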
4. Maven dependencies
<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <scala.version>2.11.8</scala.version>
    <hadoop.version>2.6.0</hadoop.version>
    <spark.version>2.1.0</spark.version>
    <hive.version>1.2.1</hive.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>${hive.version}</version>
    </dependency>
</dependencies>
<build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
        <!-- Scala compiler plugin (declared once; compiles main and test sources) -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <configuration>
                <scalaCompatVersion>2.11</scalaCompatVersion>
                <!-- same version as the scala-library dependency -->
                <scalaVersion>${scala.version}</scalaVersion>
                <encoding>UTF-8</encoding>
            </configuration>
            <executions>
                <execution>
                    <id>scala-compile</id>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <!-- Java compiler plugin -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.6.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
        <!-- Packaging plugin: builds a jar that bundles all dependencies -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.6</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <!-- Optionally set the jar's entry (main) class -->
                        <!--<mainClass></mainClass>-->
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
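With the assembly plugin bound to the `package` phase, `mvn package` writes a `*-jar-with-dependencies.jar` into `target/`. Before submitting, it can be worth confirming that the class passed to `--class` is really inside that jar. A small sketch, assuming the JDK `jar` tool is on PATH and that the project's artifactId and version are `spark-mllib` and `1.0`, as the jar name in the first spark-submit example suggests; `class_entry` and `jar_has_class` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check that a fat jar contains the class given to spark-submit --class.
# Assumes the JDK `jar` tool is on PATH.

# Convert a fully qualified class name to its entry path inside a jar,
# e.g. com.xxx.spark.WordCount -> com/xxx/spark/WordCount.class
class_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr . /)"
}

# jar_has_class JAR CLASS: succeeds only if JAR lists exactly that entry.
jar_has_class() {
  jar tf "$1" 2>/dev/null | grep -qxF "$(class_entry "$2")"
}

# Usage:
#   mvn clean package -DskipTests
#   jar_has_class target/spark-mllib-1.0-jar-with-dependencies.jar com.xxx.spark.WordCount \
#     && echo "main class found"
```

A missing or unreadable jar simply makes `jar_has_class` fail, which is the same answer as "class not found".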