Using Apache Spark and MySQL for Data Analysis



Reading data from a MySQL table with spark-shell

Step 1: Launch spark-shell. The --jars option puts the MySQL JDBC connector jar on the classpath of both the driver and the executors. Run the following command to enter the spark-shell prompt:

bigdata@ubuntu1:~/run/spark/bin$ ./spark-shell --master spark://ubuntu1:7077 --jars /home/bigdata/run/spark/mysql-connector-java-5.1.30-bin.jar

The output looks like this:

bigdata@ubuntu1:~/run/spark/bin$ ./spark-shell --master spark://ubuntu1:7077 --jars /home/bigdata/run/spark/mysql-connector-java-5.1.30-bin.jar
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/05/08 01:40:28 WARN spark.SparkConf: The configuration key 'spark.history.updateInterval' has been deprecated as of Spark 1.3 and may be removed in the future. Please use the new key 'spark.history.fs.update.interval' instead.
17/05/08 01:40:46 WARN spark.SparkConf: The configuration key 'spark.history.updateInterval' has been deprecated as of Spark 1.3 and may be removed in the future. Please use the new key 'spark.history.fs.update.interval' instead.
17/05/08 01:40:46 WARN spark.SparkConf: The configuration key 'spark.history.updateInterval' has been deprecated as of Spark 1.3 and may be removed in the future. Please use the new key 'spark.history.fs.update.interval' instead.
17/05/08 01:40:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/08 01:40:57 WARN spark.SparkConf: The configuration key 'spark.history.updateInterval' has been deprecated as of Spark 1.3 and may be removed in the future. Please use the new key 'spark.history.fs.update.interval' instead.
17/05/08 01:41:01 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/bigdata/run/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/bigdata/run/spark-2.1.0-bin-hadoop2.7/jars/datanucleus-api-jdo-3.2.6.jar."
17/05/08 01:41:01 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/bigdata/run/spark-2.1.0-bin-hadoop2.7/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/bigdata/run/spark/jars/datanucleus-rdbms-3.2.9.jar."
17/05/08 01:41:01 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/bigdata/run/spark-2.1.0-bin-hadoop2.7/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/bigdata/run/spark/jars/datanucleus-core-3.2.10.jar."
17/05/08 01:41:10 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.3.19.171:4040
Spark context available as 'sc' (master = spark://ubuntu1:7077, app id = app-20170508014050-0004).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_25)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
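If the driver still fails to find com.mysql.jdbc.Driver when the JDBC connection is opened (behavior varies across Spark versions), the jar can additionally be placed on the driver's classpath explicitly, e.g.:

bigdata@ubuntu1:~/run/spark/bin$ ./spark-shell --master spark://ubuntu1:7077 --driver-class-path /home/bigdata/run/spark/mysql-connector-java-5.1.30-bin.jar --jars /home/bigdata/run/spark/mysql-connector-java-5.1.30-bin.jar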


Step 2: Create a sqlContext variable

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc) 
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6164b3a2
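The deprecation warning is expected: in Spark 2.x the SparkSession exposed as 'spark' (see the startup banner above) subsumes SQLContext, so this step is mainly for compatibility with Spark 1.x-style code. A minimal equivalent that reuses the built-in session instead of constructing a new SQLContext:

scala> val sqlContext = spark.sqlContext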

Step 3: Load the data from MySQL

scala> val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://127.0.0.1:3306/mydatabase").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "mytable").option("user", "myname").option("password", "mypassword").load()
dataframe_mysql: org.apache.spark.sql.DataFrame = [id: string, grouptype: int ... 16 more fields]
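Equivalently, the load can go through the built-in SparkSession and a java.util.Properties object. This is a sketch reusing the same placeholder URL, table name, and credentials as above:

scala> val props = new java.util.Properties()
scala> props.setProperty("user", "myname")
scala> props.setProperty("password", "mypassword")
scala> props.setProperty("driver", "com.mysql.jdbc.Driver")
scala> val df = spark.read.jdbc("jdbc:mysql://127.0.0.1:3306/mydatabase", "mytable", props)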

Step 4: Display the data in the DataFrame

scala> dataframe_mysql.show
+---+---------+-------+---------+------+------+---+--------------------+---+-----------+-----+----+--------------------+--------------------+-----+------+----------+---+
| id|grouptype|groupid|loginname|  name|   pwd|sex|            birthday|tel|mobilephone|email|isOk|       lastLoginTime|             addtime|intro|credit|experience|img|
+---+---------+-------+---------+------+------+---+--------------------+---+-----------+-----+----+--------------------+--------------------+-----+------+----------+---+
|  1|        1|      1|    admin| admin| admin|  1|2016-05-05 14:51:...|  1|          1|    1|   1|2016-05-10 14:52:...|2016-05-08 14:52:...|    1|     1|         1|  1|
|  2|        2|      2|   wanghb|wanghb|wanghb|  2|2016-05-10 14:56:...|  2|          2|    2|   2|2016-05-11 14:57:...|2016-05-10 14:57:...|    2|     2|        22|  2|
+---+---------+-------+---------+------+------+---+--------------------+---+-----------+-----+----+--------------------+--------------------+-----+------+----------+---+
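To check the column types Spark mapped from the MySQL schema, printSchema is also handy:

scala> dataframe_mysql.printSchema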


Step 5: Register the data in the DataFrame as a temporary table for later queries

scala> dataframe_mysql.registerTempTable("tmp_tablename")
warning: there was one deprecation warning; re-run with -deprecation for details
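The deprecation warning here is because registerTempTable was superseded in Spark 2.0; the non-deprecated equivalent is:

scala> dataframe_mysql.createOrReplaceTempView("tmp_tablename")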

Step 6: Data can now be queried from the temporary table "tmp_tablename"

scala> dataframe_mysql.sqlContext.sql("select * from tmp_tablename").collect.foreach(println)
[1,1,1,admin,admin,admin,1,2016-05-05 14:51:58.0,1,1,1,1,2016-05-10 14:52:07.0,2016-05-08 14:52:12.0,1,1,1,1]
[2,2,2,wanghb,wanghb,wanghb,2,2016-05-10 14:56:58.0,2,2,2,2,2016-05-11 14:57:05.0,2016-05-10 14:57:08.0,2,2,22,2]
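The temporary table accepts ordinary SQL beyond select *. For example, a filtered projection through the built-in session (column names taken from the table shown in Step 4):

scala> spark.sql("select id, loginname from tmp_tablename where grouptype = 1").show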


Writing data to MySQL with Spark
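
A minimal sketch of the write path: a DataFrame can be appended to a MySQL table through the same JDBC data source. Assumptions here: the same placeholder URL and credentials as above, and a hypothetical target table mytable_copy (Spark creates it if it does not exist):

scala> val props = new java.util.Properties()
scala> props.setProperty("user", "myname")
scala> props.setProperty("password", "mypassword")
scala> props.setProperty("driver", "com.mysql.jdbc.Driver")
scala> dataframe_mysql.write.mode("append").jdbc("jdbc:mysql://127.0.0.1:3306/mydatabase", "mytable_copy", props)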





