Spark[03] Testing SparkSQL with spark-shell and IDEA

Preparing the Environment

Resource List

Software/Tool    Version
VMware           VMware® Workstation 16 Pro
Xshell           6
FileZilla        3.7.3
Navicat          11
IDEA             2020.2

Details of the virtual machines are as follows:

No.    Hostname    Domain name    IP address
①      Toozky      Toozky         192.168.64.220
②      Toozky2     Toozky2        192.168.64.221
③      Toozky3     Toozky3        192.168.64.222

Prepare the virtual machines and set up the Hadoop 2.0 environment; install Hadoop 2.0 and ZooKeeper.
See: Hadoop[03] Starting DFS and Zookeeper (Hadoop 2.0)

Set up time synchronization.
See: Linux (CentOS 6) Setting Up Network Time Synchronization

Configure the Spark cluster.
See: Spark[01] Spark Cluster Installation and Configuration

Connect to the virtual machines with Xshell and start the ZooKeeper servers (zkServer), HDFS, and the Spark cluster.

Install MySQL

Virtual machines ①, ②, ③

Install MySQL (version 5.x) on each virtual machine; skip this step if MySQL is already installed.

yum install mysql-server
service mysqld start
chkconfig --level 345 mysqld on

Grant MySQL remote connection privileges (so the host machine can connect to MySQL on the Linux VMs)

Find the host machine's IP address and hostname

Press Win+S, open cmd, and run ipconfig.
Since the virtual machines use NAT networking, copy the IPv4 address of the VMnet8 adapter:

Ethernet adapter VMware Network Adapter VMnet8:
   IPv4 Address . . . . . . . . . . . . : 192.168.64.1

192.168.64.1 is the host machine's IP address that the virtual machines can reach.

Open Windows Settings → System → About. The device name shown there is the host machine's hostname; copy it.

The entry 192.168.64.1 Tuzki must be written into the /etc/hosts file of every virtual machine.

Virtual machine ①

Create an account that allows the host machine to connect to MySQL on virtual machine ①

mysql
use mysql;
CREATE USER 'root'@'Tuzki' IDENTIFIED BY '123456';
grant all privileges  on *.* to root@'Tuzki';
flush privileges;
Test the connection with Navicat

After creating the connection in Navicat, create a test database and a test_user table, and enter some test data.

Write JSON-format test data

Virtual machine ①
cd
vi testjsondb

Press a or Insert to enter insert mode

{"id":1, "name":"leo1", "age":18}
{"id":2, "name":"zhangsan", "age":19}
{"id":3, "name":"lisi", "age":17}

Press Esc, then :wq to save and quit

SparkSQL (spark-shell)

JSON data test

Log in to virtual machine ① via Xshell

spark-shell
val a=spark.read.json("file:///root/testjsondb")

Query the full table

a.show()

Query the name field

a.select("name").show()

Get the attributes of each record (output as a list)

a.collectAsList
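
collectAsList returns a java.util.List[Row]. As a minimal sketch (assuming the id, name, and age fields from the JSON file above), each Row's columns can then be read by name:

// Sketch: iterate the list returned by collectAsList and read columns by name
val rows = a.collectAsList()
for (i <- 0 until rows.size()) {
  val r = rows.get(i)
  val id = r.getAs[Long]("id")
  val name = r.getAs[String]("name")
  val age = r.getAs[Long]("age")
  println(s"$id $name $age")
}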

MySQL data test

Log in to virtual machine ① via Xshell

spark-shell

Load the data table

val jdbc_conf: Map[String, String] = Map(
      "url" -> "jdbc:mysql://Toozky:3306/test",
      "driver" -> "com.mysql.jdbc.Driver",
      "dbtable" -> "test_user",
      "user" -> "root",
      "password" -> "123456"
    )
val t_user=spark.read.format("jdbc").options(jdbc_conf).load()
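
To confirm the table loaded correctly, you can optionally print the schema Spark read from MySQL (the test_user table is assumed here to contain uname and uage columns, which the queries below use):

// Optional check: print the schema Spark inferred from the MySQL table
t_user.printSchema()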

Query the full table

t_user.show()

Query the uname field

t_user.select("uname").show()

Queries with spark.sql

Create a temporary view for the data table

t_user.createOrReplaceTempView("test_user")

Query the full table

spark.sql("select * from test_user").show()

Query the average age with avg(uage)

spark.sql("select avg(uage) from test_user").show()

SparkSQL (IDEA)

Create an ordinary Maven project named SparkSQLDemo in IDEA

pom.xml

Add the relevant dependencies to pom.xml

	<properties>
        <spark.version>2.1.1</spark.version>
        <hadoop.version>2.6.5</hadoop.version>
        <scala.version>2.11.8</scala.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.47</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

log4j.properties

Create log4j.properties in the resources directory

log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to ERROR. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=ERROR

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=ERROR
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

TestSparkSql

Create a TestSparkSql class in the java directory

The address spark://Toozky:7077 is the master URL shown on the home page of the ALIVE Spark master's web UI.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.junit.Before;
import org.junit.Test;

import java.sql.SQLException;
import java.util.Properties;

public class TestSparkSql {
    SQLContext sqlContext;
    Dataset<Row> test_user_set;

    @Before
    public void before() throws SQLException {
        SparkConf conf = new SparkConf().setAppName("test_sparksql_mysql001").setMaster("spark://Toozky:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sqlContext = new SQLContext(sc);

        String url = "jdbc:mysql://Toozky:3306/test";

        String table = "test_user";

        String user="root";
        String password="123456";
        String driver="com.mysql.jdbc.Driver";
        Properties conn = new Properties();
        conn.put("user", user);
        conn.put("password", password);
        conn.put("driver", driver);

        sqlContext.read().jdbc(url, table, conn).createOrReplaceTempView(table);
    }
    
	@Test
    public void testSelectAll() throws SQLException {
        System.out.println("====Test->SELECT * FROM test_user====");
        test_user_set = sqlContext.sql("select * from test_user");
        test_user_set.show();
    }

    @Test
    public void testSelect() throws SQLException {
        System.out.println("====Test->SELECT uname FROM test_user====");
        sqlContext.sql("select uname from test_user").show();

        System.out.println();
        System.out.println("====Test->SELECT uname=\"张三\"->info====");
        sqlContext.sql("select * from test_user where uname=\"张三\"").show();
    }
}

Console

testSelectAll run result (screenshot)

testSelect run result (screenshot)

Spark web UI (ALIVE master)

A record of the job appears under Completed Applications. Its Name is test_sparksql_mysql001, the value passed to setAppName("test_sparksql_mysql001") in the code.


That's all for this installment. May we all learn from each other and improve together!
