idea中spark处理数据存储到mysql

1)首先idea中新建maven工程SparkWC
在pom.xml中导入配置

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.itstaredu.spark</groupId>
    <artifactId>SparkWC</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.2.0</spark.version>
        <hadoop.version>2.8.4</hadoop.version>
        <mysql.version>5.1.38</mysql.version>
        <encoding>UTF-8</encoding>
    </properties>

    <dependencies>
        <!-- scala的依赖导入 -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>

        <!-- spark的依赖导入 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- hadoop-client API的导入 -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <!-- mysql的连接驱动依赖 -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>${mysql.version}</version>
        </dependency>
    </dependencies>

    <build>
        <pluginManagement>
            <plugins>
                <!-- scala的编译插件 -->
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.2</version>
                </plugin>
                <!-- ava的编译插件 -->
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.5.1</version>
                </plugin>
            </plugins>
        </pluginManagement>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>


            <!-- 打jar包插件 -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

这里的hadoop依赖可加可不加,没有用到

数据文件itstar.log格式

20180724101954	http://java.itstar.com/java/course/javaeeadvanced.shtml
20180724101954	http://java.itstar.com/java/course/javaee.shtml
20180724101954	http://java.itstar.com/java/course/android.shtml
20180724101954	http://java.itstar.com/java/video.shtml
20180724101954	http://java.itstar.com/java/teacher.shtml
20180724101954	http://java.itstar.com/java/course/android.shtml
20180724101954	http://bigdata.itstar.com/bigdata/teacher.shtml
20180724101954	http://net.itstar.com/net/teacher.shtml
20180724101954	http://java.itstar.com/java/course/hadoop.shtml
20180724101954	http://java.itstar.com/java/course/base.shtml

需求:将数据按照url中的学院也就是java,net,bigdata分组,学院 学院访问次数存储到mysql中。

SparkWC工程新建一个单例对象UrlGroupCount

import java.net.URL
import java.sql.{Connection, DriverManager}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object UrlGroupCount {
  def main(args: Array[String]): Unit = {
    //1.创建spark的程序入口
    val conf:SparkConf = new SparkConf().setAppName("UrlGroupCont1").setMaster("local[2]")
    val sc:SparkContext = new SparkContext(conf)

    //2.将数据切分
    val rdd1:RDD[String] = sc.textFile("e:/bigdata/itstar.log")

    //3.将数据切分
    val rdd2:RDD[(String,Int)] = rdd1.map(line => {
      val s: Array[String] = line.split("\t")
      //元组输出
      (s(1), 1)
    })

  //4.累加求和
  val rdd3:RDD[(String,Int)] = rdd2.reduceByKey(_+_)

  //5.分组
    val rdd4:RDD[(String,Int)] = rdd3.map(x => {
      val url = x._1
      val host = new URL(url).getHost.split("[.]")(0)
      //元组输出
      (host, x._2)
    })

    //6.根据学院分组
    val rdd5: RDD[(String, List[(String, Int)])] = rdd4.groupBy(_._1).mapValues(it => {
      //根据访问量排序,倒序
      it.toList.sortBy(_._2).reverse.take(1)
    })

    //7.把计算结果保存到mysql中
    rdd5.foreach(x => {
      //把数据写道mysql
      val conn: Connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/urlcount?charatorEncoding=utf-8","root","root")
      //把spark结果插入到mysql中
      val sql = "INSERT INTO url_data (xueyuan,number_one) VALUES (?,?)"
      //执行sql 
      /**prepareStatement 预编译 无论多少次地使用同一个SQL命令,PreparedStatement
      * 都只对它解析和编译一次。当使用Statement对象时,
      * 每次执行一个SQL命令时,都会对它进行解析和编译。
      * PreparedStatement对象将会大大降低运行时间,当然也加快了访问数据库的速度。
      **/
      val statement = conn.prepareStatement(sql)

      statement.setString(1,x._1)
      statement.setString(2,x._2.toString())
      statement.executeUpdate()
      statement.close()
      conn.close()
    })
    sc.stop()
  }
}

打开本地数据库
先启动mysql服务
C:\Program Files\MySQL\MySQL Server 5.5\bin\mysqld.exe

然后使用SQLyog软件连接mysql数据库
在这里插入图片描述
现在mysql中创建存储的数据库urlcount

CREATE DATABASE urlcount;

应用该数据库

USE urlcount;

创建表

CREATE TABLE url_data(
uid INT PRIMARY KEY AUTO_INCREMENT,
xueyuan VARCHAR(50),
number_one VARCHAR(200)
)

idea中右键执行UrlGroupCount对象,待执行成功后在mysql中查询urlcount表

SELECT * FROM url_data;

在这里插入图片描述

  • 0
    点赞
  • 9
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值