63. Spark: Reading Data and Writing It to a Database

2401_85111528

Published 2024-05-28 20:26:19

Article link: https://blog.csdn.net/2401_85111528/article/details/139277165

        <version>${scala.version}</version>

    </dependency>

    <!-- Spark Core dependency -->

    <dependency>

        <groupId>org.apache.spark</groupId>

        <artifactId>spark-core_2.12</artifactId>

        <version>${spark.version}</version>

    </dependency>

    <!-- spark-streaming-->

    <dependency>

        <groupId>org.apache.spark</groupId>

        <artifactId>spark-streaming_2.12</artifactId>

        <version>${spark.version}</version>

    </dependency>

    <!-- Spark Streaming + Kafka dependency -->

    <dependency>

        <groupId>org.apache.spark</groupId>

        <artifactId>spark-streaming-kafka-0-10_2.12</artifactId>

        <version>${spark.version}</version>

    </dependency>

    <!-- Spark SQL dependency -->

    <dependency>

        <groupId>org.apache.spark</groupId>

        <artifactId>spark-sql_2.12</artifactId>

        <version>${spark.version}</version>

    </dependency>

    <!-- Spark SQL + Hive dependencies -->

    <dependency>

        <groupId>org.apache.spark</groupId>

        <artifactId>spark-hive_2.12</artifactId>

        <version>${spark.version}</version>

    </dependency>

    <dependency>

        <groupId>org.apache.spark</groupId>

        <artifactId>spark-hive-thriftserver_2.12</artifactId>

        <version>${spark.version}</version>

    </dependency>

    <!-- Structured Streaming + Kafka dependency -->

    <dependency>

        <groupId>org.apache.spark</groupId>

        <artifactId>spark-sql-kafka-0-10_2.12</artifactId>

        <version>${spark.version}</version>

    </dependency>

    <!-- Spark MLlib machine-learning module, which includes the ALS recommendation algorithm -->

    <dependency>

        <groupId>org.apache.spark</groupId>

        <artifactId>spark-mllib_2.12</artifactId>

        <version>${spark.version}</version>

    </dependency>

    <dependency>

        <groupId>org.apache.hadoop</groupId>

        <artifactId>hadoop-client</artifactId>

        <version>2.7.5</version>

    </dependency>

    <dependency>

        <groupId>com.hankcs</groupId>

        <artifactId>hanlp</artifactId>

        <version>portable-1.7.7</version>

    </dependency>

    <dependency>

        <groupId>mysql</groupId>

        <artifactId>mysql-connector-java</artifactId>

        <version>8.0.23</version>

    </dependency>

    <dependency>

        <groupId>redis.clients</groupId>

        <artifactId>jedis</artifactId>

        <version>2.9.0</version>

    </dependency>

    <dependency>

        <groupId>com.alibaba</groupId>

        <artifactId>fastjson</artifactId>

        <version>1.2.47</version>

    </dependency>

    <dependency>

        <groupId>org.projectlombok</groupId>

        <artifactId>lombok</artifactId>

        <version>1.18.2</version>

        <scope>provided</scope>

    </dependency>

</dependencies>

<build>

    <sourceDirectory>src/main/scala</sourceDirectory>

    <plugins>

        <!-- Plugin for compiling Java -->

        <plugin>

            <groupId>org.apache.maven.plugins</groupId>

            <artifactId>maven-compiler-plugin</artifactId>

            <version>3.5.1</version>

        </plugin>

        <!-- Plugin for compiling Scala -->

        <plugin>

            <groupId>net.alchim31.maven</groupId>

            <artifactId>scala-maven-plugin</artifactId>

            <version>3.2.2</version>

            <executions>

                <execution>

                    <goals>

                        <goal>compile</goal>

                        <goal>testCompile</goal>

                    </goals>

                    <configuration>

                        <args>

                            <arg>-dependencyfile</arg>

                            <arg>${project.build.directory}/.scala_dependencies</arg>

                        </args>

                    </configuration>

                </execution>

            </executions>

        </plugin>

        <plugin>

            <groupId>org.apache.maven.plugins</groupId>

            <artifactId>maven-surefire-plugin</artifactId>

            <version>2.18.1</version>

            <configuration>

                <useFile>false</useFile>

                <disableXmlReport>true</disableXmlReport>

                <includes>

                    <include>**/*Test.*</include>

                    <include>**/*Suite.*</include>

                </includes>

            </configuration>

        </plugin>

        <plugin>

            <groupId>org.apache.maven.plugins</groupId>

            <artifactId>maven-shade-plugin</artifactId>

            <version>2.3</version>

            <executions>

                <execution>

                    <phase>package</phase>

                    <goals>

                        <goal>shade</goal>

                    </goals>

                    <configuration>

                        <filters>

                            <filter>

                                <artifact>*:*</artifact>

                                <excludes>

                                    <exclude>META-INF/*.SF</exclude>

                                    <exclude>META-INF/*.DSA</exclude>

                                    <exclude>META-INF/*.RSA</exclude>

                                </excludes>

                            </filter>

                        </filters>

                        <transformers>

                            <transformer

                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">

                                <mainClass></mainClass>

                            </transformer>

                        </transformers>

                    </configuration>

                </execution>

            </executions>

        </plugin>

    </plugins>

</build>

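Note: the pom dependencies are a crucial part of any real project. The pom acts as the project's configuration file: the jar packages the project needs and the Scala language version it uses are all declared here.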

Create the database table


CREATE TABLE `data` (

  `id` int(11) NOT NULL AUTO_INCREMENT,

  `name` varchar(255) DEFAULT NULL,

  `age` int(11) DEFAULT NULL,

  PRIMARY KEY (`id`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Business logic

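1. Create the local environment and set the log level

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.{JdbcRDD, RDD}

val conf: SparkConf = new SparkConf().setAppName("spark").setMaster("local[*]")
val sc: SparkContext = new SparkContext(conf)
sc.setLogLevel("WARN")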

2. Load the data and create an RDD

val dataRDD: RDD[(String, Int)] = sc.makeRDD(List(("tuomasi", 21), ("孙悟空", 19), ("猪八戒", 20)))

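3. Iterate over the partitions

dataRDD.foreachPartition(iter => {
  // Steps 4 to 7 go inside this block, so one connection is opened per partition rather than per record.
})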

4. Load the driver and open a connection

val conn: Connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/bigdata?characterEncoding=UTF-8", "root", "123456")

5. Prepare the SQL statement

val sql: String = "INSERT INTO data (id, name, age) VALUES (NULL, ?, ?);"
val ps: PreparedStatement = conn.prepareStatement(sql)

6. Process the data

iter.foreach(t => { // t is a single record
  val name: String = t._1
  val age: Int = t._2
  ps.setString(1, name)
  ps.setInt(2, age)
  ps.addBatch()
})
ps.executeBatch()

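7. Close the statement and the connection

if (ps != null) ps.close()     // close the statement first
if (conn != null) conn.close() // then the connection that created it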
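Steps 4 to 7 all run inside the `foreachPartition` block from step 3. The sketch below shows one way to package them as a single function that closes its resources even if the batch fails. `PartitionWriter`, `writePartition`, and the `open` connection factory are hypothetical names introduced here for illustration, not part of the original code, and the `id` column is left out of the INSERT on the assumption that `AUTO_INCREMENT` fills it.

```scala
import java.sql.{Connection, PreparedStatement}

object PartitionWriter {
  // Assumption: id is omitted so that MySQL assigns it via AUTO_INCREMENT.
  val InsertSql: String = "INSERT INTO data (name, age) VALUES (?, ?)"

  /** Writes one partition as a single JDBC batch and returns the number of rows queued.
    * `open` is a hypothetical connection factory supplied by the caller. */
  def writePartition(rows: Iterator[(String, Int)], open: () => Connection): Int = {
    val conn: Connection = open()
    try {
      val ps: PreparedStatement = conn.prepareStatement(InsertSql)
      try {
        var queued = 0
        rows.foreach { case (name, age) =>
          ps.setString(1, name)
          ps.setInt(2, age)
          ps.addBatch()
          queued += 1
        }
        // Send the whole partition to the database in one round trip.
        if (queued > 0) ps.executeBatch()
        queued
      } finally {
        ps.close() // close the statement before its connection
      }
    } finally {
      conn.close()
    }
  }
}
```

With the definitions from steps 1 to 4, the call site could then look like `dataRDD.foreachPartition(iter => PartitionWriter.writePartition(iter, () => DriverManager.getConnection(url, user, password)))`, where `url`, `user`, and `password` stand for the connection settings shown in step 4. Creating the connection inside the closure keeps it on the executor, which matters because JDBC connections cannot be serialized and shipped from the driver.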

8. Read from the database

val getConnection = () => DriverManager.getConnection("jdbc:mysql://localhost:3306/bigdata?characterEncoding=UTF-8", "root", "123456")

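9. Set the lower and upper bounds of the SQL query and the number of partitions

val studentTupleRDD: JdbcRDD[(Int, String, Int)] = new JdbcRDD[(Int, String, Int)](
  sc,
  getConnection,
  sql,     // the SELECT query for the read; JdbcRDD requires two ? placeholders for the id bounds
  1,       // lower bound: extract the records whose id is between 1 and 20
  20,      // upper bound (inclusive)
  1,       // number of partitions
  mapRow   // converts each ResultSet row into an (Int, String, Int) tuple
)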
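The definitions of `sql` and `mapRow` for the read are not shown above. The sketch below gives one plausible shape for them and illustrates how an inclusive id range is divided among partitions. `JdbcReadSketch`, `SelectSql`, and `partitionBounds` are hypothetical names introduced here, the query text is an assumption based on the `data` table created earlier, and the arithmetic is meant to mirror how `JdbcRDD` splits its bounds as far as I understand it, not to quote Spark's implementation.

```scala
import java.sql.ResultSet

object JdbcReadSketch {
  // Hypothetical query: JdbcRDD binds the two '?' placeholders
  // to each partition's lower and upper id.
  val SelectSql: String = "SELECT id, name, age FROM data WHERE id >= ? AND id <= ?"

  // Hypothetical row mapper: turns the current ResultSet row into a tuple.
  val mapRow: ResultSet => (Int, String, Int) =
    rs => (rs.getInt("id"), rs.getString("name"), rs.getInt("age"))

  /** Splits the inclusive range [lower, upper] into numPartitions inclusive sub-ranges. */
  def partitionBounds(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
    // Both bounds are inclusive, hence the + 1 here and the - 1 on each end.
    val length = BigInt(1) + upper - lower
    (0 until numPartitions).map { i =>
      val start = BigInt(lower) + (length * i) / numPartitions
      val end = BigInt(lower) + (length * (i + 1)) / numPartitions - 1
      (start.toLong, end.toLong)
    }
  }
}
```

With the arguments used in step 9 (1, 20, and a single partition) the whole range lands in one query. Asking for three partitions would split it into 1 to 6, 7 to 13, and 14 to 20, each fetched by its own task. Under these assumptions the read could be written as `new JdbcRDD(sc, getConnection, JdbcReadSketch.SelectSql, 1, 20, 1, JdbcReadSketch.mapRow)`.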
