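Goal: use Spark to compute the top 10 values for each of three columns (bookId, booktype, readerId) of the lend_list table in the MySQL database library, which holds 513 rows in total. As shown in the figure below: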
Step 1: Create the project in IntelliJ IDEA
pom.xml, as follows:
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-graphx_2.11</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.37</version>
  </dependency>
</dependencies>
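A brief outline of the approach (including but not limited to):
1. Read the data from MySQL and write it to a local txt file (F:\\Desktop\\data_booktype.txt)
2. Read the data back from that file, count the occurrences of each value, sum the counts, and rank the top entries
3. Write the top-ranked results back to MySQL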
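
The three steps above can be sketched roughly as follows for the booktype column. This is only a minimal sketch under several assumptions, not necessarily the exact code used here: the JDBC URL, user name and password are placeholders, and the object name LendListTop10, the output table booktype_top10 and its column lend_count are hypothetical names chosen for illustration. Only the database library, the table lend_list, the column booktype and the file path come from the description above.

```scala
import java.io.PrintWriter
import java.util.Properties

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical object name; a sketch of the three-step outline for one column.
object LendListTop10 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LendListTop10")
      .master("local[*]") // assumption: run locally from the IDE
      .getOrCreate()
    import spark.implicits._

    // Assumed connection settings; replace with your own.
    val url = "jdbc:mysql://localhost:3306/library?useSSL=false"
    val props = new Properties()
    props.setProperty("user", "root")
    props.setProperty("password", "password")
    props.setProperty("driver", "com.mysql.jdbc.Driver")

    // Step 1: read the booktype column from lend_list and dump it,
    // one value per line, into the local text file.
    val path = "F:\\Desktop\\data_booktype.txt"
    val values = spark.read.jdbc(url, "lend_list", props)
      .select(col("booktype").cast("string"))
      .na.drop()  // skip NULLs so they are not counted as a value
      .as[String]
      .collect()  // 513 rows fit comfortably on the driver
    val out = new PrintWriter(path, "UTF-8")
    try values.foreach(v => out.println(v)) finally out.close()

    // Step 2: read the file back, count each value, and keep the 10 largest counts.
    val top10 = spark.sparkContext.textFile(path)
      .map(value => (value, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .take(10)

    // Step 3: write the ranking to MySQL.
    // Overwrite drops and recreates the (hypothetical) target table on each run.
    top10.toSeq.toDF("booktype", "lend_count")
      .write
      .mode(SaveMode.Overwrite)
      .jdbc(url, "booktype_top10", props)

    spark.stop()
  }
}
```

The same pattern should apply to bookId and readerId by changing the selected column, the file path and the target table. Writing to a local file in between is not strictly required by Spark; it is kept here only because the outline above goes through a txt file.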