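The big data ecosystem depends more and more on CDH. Most companies deploy their big data stack with CDH. That is a blessing for operations, but a nightmare for developers: downloading the CDH builds of the Spark and Hadoop dependencies is painfully slow, and sometimes the download fails altogether.
Downloading directly from the vendor's overseas repository rarely succeeds. The Aliyun Maven repository does not contain the CDH builds of the Spark and Hadoop dependencies. After going through many domestic mirrors, I found that the Huawei Cloud mirror does carry CDH builds of Spark and Hadoop. However, it only carries CDH 6.x artifacts, so if your base environment is still on CDH 5.x, the only option is to use the Huawei mirror and the vendor repository together.
First, let's look at how to use the vendor repository: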
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
<spark.version>2.4.0.cloudera2</spark.version>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_${scala.main.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_${scala.main.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
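This approach is simple and direct, and it covers the widest range of dependency packages, provided you have enough patience.
For configuration details, see: https://docs.cloudera.com/documentation/spark2/2-4-x/topics/spark2_maven_repo.html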
https://docs.cloudera.com/documentation/spark2/2-4-x/topics/cds_24_maven_artifacts.html
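To show where the fragments above sit in a real file, here is a minimal pom.xml sketch. The project coordinates (`com.example`, `cdh-spark-demo`, `0.1.0`) are placeholders that do not come from the original setup, and the `scala.main.version` value of 2.11 is taken from the mixed setup later in this post; adjust both to your environment.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Placeholder coordinates (assumption): replace with your own -->
  <groupId>com.example</groupId>
  <artifactId>cdh-spark-demo</artifactId>
  <version>0.1.0</version>

  <properties>
    <!-- Scala binary version used in the artifactId suffix -->
    <scala.main.version>2.11</scala.main.version>
    <spark.version>2.4.0.cloudera2</spark.version>
  </properties>

  <repositories>
    <!-- Vendor repository that hosts the Cloudera builds -->
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.main.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.main.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
  </dependencies>
</project>
```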
Aliyun offers no support for CDH Spark: https://maven.aliyun.com/mvn/view
To use the Huawei mirror, edit maven_path/conf/settings.xml:
<mirror>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <mirrorOf>central</mirrorOf>
</mirror>
<mirror>
  <id>huaweicloud</id>
  <mirrorOf>*</mirrorOf>
  <url>https://mirrors.huaweicloud.com/repository/maven/</url>
</mirror>
For details, see: https://mirrors.huaweicloud.com/repository/maven/Org/apache/spark/spark-core_2.11/
The mixed setup is as follows:
1. Edit maven_path/conf/settings.xml
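For context, `<mirror>` entries belong inside the `<mirrors>` element of settings.xml. The following is a minimal sketch of the surrounding file, assuming nothing else (proxies, servers, profiles) is configured in it. The same content can also go in the per-user file at `~/.m2/settings.xml`.

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <mirrors>
    <!-- Requests for the repository id "central" go to Aliyun -->
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
    <!-- Every other repository id is routed to Huawei Cloud -->
    <mirror>
      <id>huaweicloud</id>
      <mirrorOf>*</mirrorOf>
      <url>https://mirrors.huaweicloud.com/repository/maven/</url>
    </mirror>
  </mirrors>
</settings>
```

Maven does not merge mirrors for a given repository. It uses a mirror whose `mirrorOf` names the repository id exactly if one exists, and otherwise the first wildcard match.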
<mirror>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <mirrorOf>central</mirrorOf>
</mirror>
<mirror>
  <id>huaweicloud</id>
  <mirrorOf>*</mirrorOf>
  <url>https://mirrors.huaweicloud.com/repository/maven/</url>
</mirror>
2. Edit pom.xml
<properties>
  <scala.main.version>2.11</scala.main.version>
  <scala.version>${scala.main.version}.12</scala.version>
  <spark.version>2.4.0.cloudera2</spark.version>
  <mysql.version>5.1.34</mysql.version>
  <neo4j.version>1.7.5</neo4j.version>
  <scopo_spark>provided</scopo_spark>
</properties>
<repositories>
  <repository>
    <id>cloudera_tmp</id>
    <name>colud</name>
    <url>https://repo.rdc.aliyun.com/repository/82963-release-FnoWLy/</url>
  </repository>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
  <repository>
    <id>scala-tools.org</id>
    <name>Scala-Tools Maven2 Repository</name>
    <url>http://scala-tools.org/repo-releases</url>
  </repository>
</repositories>
<pluginRepositories>
  <pluginRepository>
    <id>scala-tools.org</id>
    <name>Scala-Tools Maven2 Repository</name>
    <url>http://scala-tools.org/repo-releases</url>
  </pluginRepository>
</pluginRepositories>
<dependencies>
  <!-- <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.0.cloudera2</version>
    <scope>provided</scope>
  </dependency>-->
  <dependency>
    <groupId>org.scalaj</groupId>
    <artifactId>scalaj-http_${scala.main.version}</artifactId>
    <version>2.4.1</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.main.version}</artifactId>
    <version>${spark.version}</version>
    <scope>${scopo_spark}</scope>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.main.version}</artifactId>
    <version>${spark.version}</version>
    <scope>${scopo_spark}</scope>
  </dependency>
  <!-- <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_${scala.main.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>-->
  <!-- https://mvnrepository.com/artifact/org.scalatest/scalatest -->
  <dependency>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest_${scala.main.version}</artifactId>
    <version>3.2.0-SNAP1</version>
    <scope>test</scope>
  </dependency>
  <!-- Used by ScalaTest to generate the test report -->
  <dependency>
    <groupId>org.pegdown</groupId>
    <artifactId>pegdown</artifactId>
    <version>1.4.2</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
  <dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>${mysql.version}</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.neo4j.driver/neo4j-java-driver -->
  <dependency>
    <groupId>org.neo4j.driver</groupId>
    <artifactId>neo4j-java-driver</artifactId>
    <version>${neo4j.version}</version>
  </dependency>
</dependencies>
With this in place the downloads work: the CDH 5.x jars come from the Cloudera repository, and the remaining jars come from Huawei Cloud.
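One thing worth checking in this setup: a mirror declared with `<mirrorOf>*</mirrorOf>` applies to every repository id, including the ones declared in pom.xml. If requests for the CDH 5.x artifacts do not seem to reach Cloudera, one possible adjustment is to exclude those repositories from the Huawei mirror. This is only a sketch, and it assumes the repository ids `cloudera` and `cloudera_tmp` used in the pom.xml above:

```xml
<mirror>
  <id>huaweicloud</id>
  <!-- Mirror everything except the two repositories declared in pom.xml -->
  <mirrorOf>*,!cloudera,!cloudera_tmp</mirrorOf>
  <url>https://mirrors.huaweicloud.com/repository/maven/</url>
</mirror>
```

Running `mvn help:effective-settings` prints the merged settings, which helps confirm which mirrors are actually active.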