Spark 1.3.0 for Hadoop 2.4.1: Building, Installing, and Initial Testing

--- This was put together by a friend; it is fairly detailed and worth keeping.

-- References: http://spark.apache.org/docs/latest/building-spark.html

--            http://spark.apache.org/docs/latest/sql-programming-guide.html#overview


-- Contents:
-- 1. Building Spark against Hadoop 2.4.1;
-- 2. Installing the Spark cluster;
-- 3. Accessing files on HDFS from Spark;
-- 4. Accessing Hive tables from Spark;
-- 5. Accessing MySQL tables from Spark;


-- Left for the reader to work out:
-- Accessing Oracle and MS SQL Server from Spark


---------------------------------------------------------------------------------------------------
-- ############################################################################################# --
-- 1. Building Spark 1.3.0


-- Preparation:
-- Download the spark-1.3.0.tgz source package from the official site into /opt/software and extract it:
cd /opt/software/
tar -xvf spark-1.3.0.tgz


-- Step 1. Edit pom.xml in the spark-1.3.0 directory so the component versions match your environment; my changes are as follows:
cd /opt/software/spark-1.3.0


vi pom.xml  -- change the versions of the following components


<java.version>1.7</java.version>
<hadoop.version>2.4.1</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
<hbase.version>0.98.9-hadoop2</hbase.version>
<zookeeper.version>3.4.6</zookeeper.version>
<derby.version>10.11.1.1</derby.version>
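-- To double-check which values the build will pick up, a quick (purely optional) sanity check is to grep the properties you just edited:
grep -nE '<(java|hadoop|protobuf|hbase|zookeeper|derby)\.version>' pom.xml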




-- Note: to build against Scala 2.11, run the script below. That version is still quite new and some components do not yet support it, so I do not recommend switching.
sh dev/change-version-to-2.11.sh


---------------------------------------------------------------------------------------------------
-- Step 2. Do a trial Maven build of Spark (installing Maven itself is not covered here)


export MAVEN_OPTS="-Xmx4g -XX:MaxPermSize=1024M -XX:ReservedCodeCacheSize=1024m"
nohup mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -Phive-thriftserver -DskipTests clean package -Dtar &
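-- Because the build runs in the background under nohup, its output is appended to nohup.out in the current directory; you can follow the progress with:
tail -f nohup.out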


-- When the mvn build succeeds, it prints output like the following:


[WARNING] sourceDirectory is not specified or does not exist value=/opt/software/spark-1.3.0/external/kafka-assembly/src/main/scala
Saving to outputFile=/opt/software/spark-1.3.0/external/kafka-assembly/scalastyle-output.xml
Processed 0 file(s)
Found 0 errors
Found 0 warnings
Found 0 infos
Finished in 0 ms
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [ 19.631 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 35.222 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 22.597 s]
[INFO] Spark Project Core ................................. SUCCESS [11:54 min]
[INFO] Spark Project Bagel ................................ SUCCESS [01:05 min]
[INFO] Spark Project GraphX ............................... SUCCESS [03:21 min]
[INFO] Spark Project Streaming ............................ SUCCESS [05:03 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [05:57 min]
[INFO] Spark Project SQL .................................. SUCCESS [07:25 min]
[INFO] Spark Project ML Library ........................... SUCCESS [07:53 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 44.746 s]
[INFO] Spark Project Hive ................................. SUCCESS [05:12 min]
[INFO] Spark Project REPL ................................. SUCCESS [02:38 min]
[INFO] Spark Project YARN ................................. SUCCESS [03:01 min]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [02:50 min]
[INFO] Spark Project Assembly ............................. SUCCESS [03:53 min]
[INFO] Spark Project External Twitter ..................... SUCCESS [01:11 min]
[INFO] Spark Project External Flume Sink .................. SUCCESS [02:43 min]
[INFO] Spark Project External Flume ....................... SUCCESS [01:45 min]
[INFO] Spark Project External MQTT ........................ SUCCESS [03:24 min]
[INFO] Spark Project External ZeroMQ ...................... SUCCESS [01:09 min]
[INFO] Spark Project External Kafka ....................... SUCCESS [02:01 min]
[INFO] Spark Project Examples ............................. SUCCESS [08:49 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 15.687 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 55.975 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:24 h
[INFO] Finished at: 2015-03-20T20:56:16+08:00
[INFO] Final Memory: 105M/1751M
[INFO] ------------------------------------------------------------------------




------------------------------


-- mvn build error 1.


[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [ 21.537 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 31.171 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 16.630 s]
[INFO] Spark Project Core ................................. SUCCESS [11:43 min]
[INFO] Spark Project Bagel ................................ SUCCESS [01:13 min]
[INFO] Spark Project GraphX ............................... SUCCESS [03:45 min]
[INFO] Spark Project Streaming ............................ SUCCESS [06:08 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [05:24 min]
[INFO] Spark Project SQL .................................. SUCCESS [07:18 min]
[INFO] Spark Project ML Library ........................... FAILURE [35:18 min]
[INFO] Spark Project Tools ................................ SKIPPED
[INFO] Spark Project Hive ................................. SKIPPED
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project YARN ................................. SKIPPED
[INFO] Spark Project Hive Thrift Server ................... SKIPPED
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Spark Project External Twitter ..................... SKIPPED
[INFO] Spark Project External Flume Sink .................. SKIPPED
[INFO] Spark Project External Flume ....................... SKIPPED
[INFO] Spark Project External MQTT ........................ SKIPPED
[INFO] Spark Project External ZeroMQ ...................... SKIPPED
[INFO] Spark Project External Kafka ....................... SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] Spark Project External Kafka Assembly .............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:12 h
[INFO] Finished at: 2015-03-20T19:05:36+08:00
[INFO] Final Memory: 83M/1376M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project spark-mllib_2.10: Could not resolve dependencies for project org.apache.spark:spark-mllib_2.10:jar:1.3.0: Could not transfer artifact org.spire-math:spire_2.10:jar:0.7.4 from/to central (https://repo1.maven.org/maven2): GET request of: org/spire-math/spire_2.10/0.7.4/spire_2.10-0.7.4.jar from central failed: Read timed out -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-mllib_2.10


-- Fix for mvn build error 1:
-- Download spire_2.10-0.7.4.jar from the URL below and place it in /root/.m2/repository/org/spire-math/spire_2.10/0.7.4/
-- (I built as root, hence /root; if you build as a different user, the local repository path will differ)
http://search.maven.org/#browse%7C1724544790
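-- A minimal sketch of doing that from the command line (the URL simply follows Maven Central's standard layout, as shown in the error message above):
mkdir -p /root/.m2/repository/org/spire-math/spire_2.10/0.7.4/
cd /root/.m2/repository/org/spire-math/spire_2.10/0.7.4/
wget https://repo1.maven.org/maven2/org/spire-math/spire_2.10/0.7.4/spire_2.10-0.7.4.jar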




---------------------------------------------------------------------------------------------------
-- Step 3. Once the trial build in Step 2 succeeds, generate the binary distribution with the commands below. (Step 2 is not strictly required and you could run Step 3 directly, but to be safe, run the trial build first.)
-- Note: before running this, check that the java and javac versions match, e.g. with the quick check below.
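-- Quick check (both commands should report the same major version):
java -version
javac -version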


cd /opt/software/spark-1.3.0
export MAVEN_OPTS="-Xmx4g -XX:MaxPermSize=1024M -XX:ReservedCodeCacheSize=1024m"
nohup ./make-distribution.sh --tgz --skip-java-test -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -Phive-thriftserver &


-- When Step 3 completes successfully, the spark-1.3.0-bin-2.4.1.tgz package is created under /opt/software/spark-1.3.0.
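-- A quick way to confirm the package was produced:
ls -lh /opt/software/spark-1.3.0/spark-1.3.0-bin-2.4.1.tgz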


---------------------------------------------------------------------------------------------------
-- ############################################################################################# --
-- 2. Installing Spark 1.3.0


-- Before installing: my Hadoop cluster runs under the hadoop user:
----------------------------------------------------------------
| IP              | Hostname           |         Role          |
----------------------------------------------------------------
| 192.168.117.193 | funshion-hadoop193 | NameNode, SparkMaster |
----------------------------------------------------------------
| 192.168.117.194 | funshion-hadoop194 | DataNode, SparkSlave  |
----------------------------------------------------------------
| 192.168.117.195 | funshion-hadoop195 | DataNode, SparkSlave  |
----------------------------------------------------------------
| 192.168.117.196 | funshion-hadoop196 | DataNode, SparkSlave  |
----------------------------------------------------------------




---------------------------------------------------------------------------------------------------
-- Step 1: Extract spark-1.3.0-bin-2.4.1.tgz into /usr/local/ and create the spark symlink.
-- (Run the commands below as root on all four nodes: funshion-hadoop193, funshion-hadoop194, funshion-hadoop195, funshion-hadoop196)


cd /opt/software/spark-1.3.0
tar -xvf spark-1.3.0-bin-2.4.1.tgz
mv spark-1.3.0-bin-2.4.1 /usr/local/
cd /usr/local
chown -R hadoop.hadoop ./spark-1.3.0-bin-2.4.1
rm -rf spark
ln -s spark-1.3.0-bin-2.4.1 spark


---------------------------------------------------------------------------------------------------
-- Step 2: Configure Spark
cd /usr/local/spark/conf


-- 2.1 Edit the slaves file:
[hadoop@funshion-hadoop193 conf]$ vi slaves


# A Spark Worker will be started on each of the machines listed below.
funshion-hadoop194
funshion-hadoop195
funshion-hadoop196


-- 2.2 Create and edit the spark-env.sh file:
-- Copy spark-env.sh.template to spark-env.sh and edit it (note: the last two lines are there to support LZO compression)


[hadoop@funshion-hadoop193 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@funshion-hadoop193 conf]$ vi spark-env.sh
export JAVA_HOME=/usr/java/latest
export SCALA_HOME=/usr/local/scala
export SPARK_MASTER_IP=funshion-hadoop193
export SPARK_WORKER_MEMORY=2g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/usr/local/spark/lib:/usr/local/hadoop/lzo/lib
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar


-- 2.3 Create and edit the spark-defaults.conf file:
cd /usr/local/spark/conf/
cp  spark-defaults.conf.template  spark-defaults.conf


[hadoop@funshion-hadoop193 conf]$ vi spark-defaults.conf    -- add the following lines:


spark.master  spark://funshion-hadoop193:7077
spark.yarn.jar  hdfs://funshion-hadoop193:8020/home/lib/spark.yarn.jar
spark.eventLog.enabled true
spark.eventLog.dir hdfs://funshion-hadoop193:8020/spark_log


-- Note: the HDFS paths referenced above must exist; create the directories first:
hdfs dfs -mkdir -p /home/lib
hdfs dfs -mkdir /spark_log
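-- spark.yarn.jar must point at a file rather than a directory, so after creating /home/lib the Spark assembly jar still
-- has to be uploaded to that location. A minimal sketch, assuming the assembly built above is named
-- spark-assembly-1.3.0-hadoop2.4.1.jar under /usr/local/spark/lib:
hdfs dfs -put /usr/local/spark/lib/spark-assembly-1.3.0-hadoop2.4.1.jar hdfs://funshion-hadoop193:8020/home/lib/spark.yarn.jar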


-- Note: once the configuration in 2.3 is done, remember to sync the conf directory to the other nodes, for example with the loop below:
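-- One possible way (assumes the hadoop user has passwordless SSH between the nodes):
for host in funshion-hadoop194 funshion-hadoop195 funshion-hadoop196; do
    scp /usr/local/spark/conf/* ${host}:/usr/local/spark/conf/
done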




-- 2.4 Add environment variables (the full contents of ~/.bash_profile for my hadoop user are shown below)
-- (Do this as the hadoop user on all four nodes: funshion-hadoop193, funshion-hadoop194, funshion-hadoop195, funshion-hadoop196)


---------------------------


[hadoop@funshion-hadoop193 spark]$ vi ~/.bash_profile


# .bash_profile


# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi


# User specific environment and startup programs


PATH=$PATH:$HOME/bin


# export PATH


export JAVA_HOME=/usr/java/latest


export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:/usr/local/bin


export HADOOP_INSTALL=/usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_DEV_HOME=/usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin


export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native


export HIVE_HOME=/usr/local/hive
# export HBASE_HOME=/usr/local/hbase
# export ZK_HOME=/usr/local/zookeeper


export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export PATH=$PATH:$HIVE_HOME/bin
# export PATH=$PATH:$HBASE_HOME/bin
# export PATH=$PATH:$ZK_HOME/bin


export HADOOP_MAPARED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CLIENT_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop


export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/:$HADOOP_PREFIX/bin:$PATH"


# Native Path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native"


# SET HADOOP_CLASSPATH
for file in `ls $HADOOP_HOME/share/hadoop/common/lib/*jar`
do
        HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$file
done


# SET HIVE_CLASSPATH
for file in `ls $HIVE_HOME/lib/*jar`
do
        HIVE_CLASSPATH=$HIVE_CLASSPATH:$file
done


export HADOOP_CLASSPATH=$HADOOP_CLASSPATH
export CLASSPATH=$CLASSPATH:$HADOOP_CLASSPATH:$HIVE_CLASSPATH


# SET JAVA_LIBRARY_PATH
for file in `ls $JAVA_HOME/lib/*jar`
do
        JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$file
done
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_PREFIX/lib/native


export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/hadoop/lib/native:/usr/lib64
export PYTHONPATH=$PYTHONPATH:/usr/local/hadoop/etc/hadoop
export PATH=$PATH:$PYTHONPATH


export EXINIT='set ts=4 sw=4'


---------------------------
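-- After saving ~/.bash_profile, reload it on each node so the new variables take effect in the current session:
source ~/.bash_profile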


---------------------------------------------------------------------------------------------------
-- Step 3: Start the Spark cluster:
[hadoop@funshion-hadoop193 sbin]$ cd /usr/local/spark/sbin
[hadoop@funshion-hadoop193 sbin]$ pwd
/usr/local/spark/sbin


[hadoop@funshion-hadoop193 sbin]$ ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-1.3.0-bin-2.4.1/sbin/../logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-funshion-hadoop193.out
funshion-hadoop194: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-1.3.0-bin-2.4.1/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-funshion-hadoop194.out
funshion-hadoop196: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-1.3.0-bin-2.4.1/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-funshion-hadoop196.out
funshion-hadoop195: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-1.3.0-bin-2.4.1/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-funshion-hadoop195.out
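-- Optionally, confirm the daemons with jps: the master node should list a Master process and each slave node a Worker process.
jps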


-- To shut down the Spark cluster, run:
[hadoop@funshion-hadoop193 sbin]$ ./stop-all.sh
funshion-hadoop194: stopping org.apache.spark.deploy.worker.Worker
funshion-hadoop195: stopping org.apache.spark.deploy.worker.Worker
funshion-hadoop196: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master


---------------------------------------------------------------------------------------------------
-- ############################################################################################# --
-- 3. Testing Spark 1.3.0


-- 3.1 Test SparkPi


cd /usr/local/spark


./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master spark://funshion-hadoop193:7077 \
--num-executors 3 \
--driver-memory 2g \
--executor-memory 1g \
--executor-cores 1 \
--queue root.hadoop \
lib/spark-examples*.jar \
10


-- The command above prints output like the following (the line "Pi is roughly 3.141544" shows the job returned a result, so everything is OK):
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/21 16:05:23 INFO SparkContext: Running Spark version 1.3.0
15/03/21 16:05:23 WARN SparkConf: 
SPARK_CLASSPATH was detected (set to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar').
This is deprecated in Spark 1.0+.


Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
        
15/03/21 16:05:23 WARN SparkConf: Setting 'spark.executor.extraClassPath' to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/03/21 16:05:23 WARN SparkConf: Setting 'spark.driver.extraClassPath' to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/03/21 16:05:25 INFO SecurityManager: Changing view acls to: hadoop
15/03/21 16:05:25 INFO SecurityManager: Changing modify acls to: hadoop
15/03/21 16:05:25 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/03/21 16:05:26 INFO Slf4jLogger: Slf4jLogger started
15/03/21 16:05:26 INFO Remoting: Starting remoting
15/03/21 16:05:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@funshion-hadoop193:54001]
15/03/21 16:05:27 INFO Utils: Successfully started service 'sparkDriver' on port 54001.
15/03/21 16:05:27 INFO SparkEnv: Registering MapOutputTracker
15/03/21 16:05:27 INFO SparkEnv: Registering BlockManagerMaster
15/03/21 16:05:27 INFO DiskBlockManager: Created local directory at /tmp/spark-3637a018-6da9-446b-9fe6-b4cd75d346c4/blockmgr-4bb0b6f0-a816-46af-b5e3-b8c3ffaa3c04
15/03/21 16:05:27 INFO MemoryStore: MemoryStore started with capacity 1060.3 MB
15/03/21 16:05:28 INFO HttpFileServer: HTTP File server directory is /tmp/spark-5a5dd05a-e1e1-4c68-8517-b7e63ebcaab3/httpd-6eed070f-8636-40c6-8461-4b61a31fb3a0
15/03/21 16:05:28 INFO HttpServer: Starting HTTP Server
15/03/21 16:05:28 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/21 16:05:28 INFO AbstractConnector: Started SocketConnector@0.0.0.0:50159
15/03/21 16:05:28 INFO Utils: Successfully started service 'HTTP file server' on port 50159.
15/03/21 16:05:28 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/21 16:05:29 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/21 16:05:29 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/03/21 16:05:29 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/21 16:05:29 INFO SparkUI: Started SparkUI at http://funshion-hadoop193:4040
15/03/21 16:05:30 INFO SparkContext: Added JAR file:/usr/local/spark-1.3.0-bin-2.4.1/lib/spark-examples-1.3.0-hadoop2.4.1.jar at http://192.168.117.193:50159/jars/spark-examples-1.3.0-hadoop2.4.1.jar with timestamp 1426925130170
15/03/21 16:05:30 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@funshion-hadoop193:7077/user/Master...
15/03/21 16:05:31 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150321160531-0000
15/03/21 16:05:31 INFO AppClient$ClientActor: Executor added: app-20150321160531-0000/0 on worker-20150321160018-funshion-hadoop195-46031 (funshion-hadoop195:46031) with 2 cores
15/03/21 16:05:31 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150321160531-0000/0 on hostPort funshion-hadoop195:46031 with 2 cores, 1024.0 MB RAM
15/03/21 16:05:31 INFO AppClient$ClientActor: Executor added: app-20150321160531-0000/1 on worker-20150321160019-funshion-hadoop196-53113 (funshion-hadoop196:53113) with 2 cores
15/03/21 16:05:31 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150321160531-0000/1 on hostPort funshion-hadoop196:53113 with 2 cores, 1024.0 MB RAM
15/03/21 16:05:31 INFO AppClient$ClientActor: Executor added: app-20150321160531-0000/2 on worker-20150321160018-funshion-hadoop194-56515 (funshion-hadoop194:56515) with 2 cores
15/03/21 16:05:31 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150321160531-0000/2 on hostPort funshion-hadoop194:56515 with 2 cores, 1024.0 MB RAM
15/03/21 16:05:32 INFO AppClient$ClientActor: Executor updated: app-20150321160531-0000/0 is now RUNNING
15/03/21 16:05:32 INFO AppClient$ClientActor: Executor updated: app-20150321160531-0000/0 is now LOADING
15/03/21 16:05:32 INFO AppClient$ClientActor: Executor updated: app-20150321160531-0000/1 is now RUNNING
15/03/21 16:05:32 INFO AppClient$ClientActor: Executor updated: app-20150321160531-0000/2 is now LOADING
15/03/21 16:05:32 INFO AppClient$ClientActor: Executor updated: app-20150321160531-0000/2 is now RUNNING
15/03/21 16:05:32 INFO AppClient$ClientActor: Executor updated: app-20150321160531-0000/1 is now LOADING
15/03/21 16:05:32 INFO NettyBlockTransferService: Server created on 33985
15/03/21 16:05:32 INFO BlockManagerMaster: Trying to register BlockManager
15/03/21 16:05:32 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop193:33985 with 1060.3 MB RAM, BlockManagerId(<driver>, funshion-hadoop193, 33985)
15/03/21 16:05:32 INFO BlockManagerMaster: Registered BlockManager
15/03/21 16:05:35 INFO EventLoggingListener: Logging events to hdfs://funshion-hadoop193:8020/spark_log/app-20150321160531-0000
15/03/21 16:05:35 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/03/21 16:05:36 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
15/03/21 16:05:36 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 10 output partitions (allowLocal=false)
15/03/21 16:05:36 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
15/03/21 16:05:36 INFO DAGScheduler: Parents of final stage: List()
15/03/21 16:05:36 INFO DAGScheduler: Missing parents: List()
15/03/21 16:05:36 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31), which has no missing parents
15/03/21 16:05:37 INFO MemoryStore: ensureFreeSpace(1848) called with curMem=0, maxMem=1111794647
15/03/21 16:05:37 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1848.0 B, free 1060.3 MB)
15/03/21 16:05:37 INFO MemoryStore: ensureFreeSpace(1296) called with curMem=1848, maxMem=1111794647
15/03/21 16:05:37 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1296.0 B, free 1060.3 MB)
15/03/21 16:05:37 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop193:33985 (size: 1296.0 B, free: 1060.3 MB)
15/03/21 16:05:37 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/21 16:05:37 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:839
15/03/21 16:05:37 INFO DAGScheduler: Submitting 10 missing tasks from Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31)
15/03/21 16:05:37 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
15/03/21 16:05:39 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop194:60276/user/Executor#622568548] with ID 2
15/03/21 16:05:39 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, funshion-hadoop194, PROCESS_LOCAL, 1340 bytes)
15/03/21 16:05:39 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, funshion-hadoop194, PROCESS_LOCAL, 1340 bytes)
15/03/21 16:05:39 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop195:51291/user/Executor#1321504504] with ID 0
15/03/21 16:05:39 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, funshion-hadoop195, PROCESS_LOCAL, 1340 bytes)
15/03/21 16:05:39 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, funshion-hadoop195, PROCESS_LOCAL, 1340 bytes)
15/03/21 16:05:39 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop196:42388/user/Executor#1779514149] with ID 1
15/03/21 16:05:39 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, funshion-hadoop196, PROCESS_LOCAL, 1340 bytes)
15/03/21 16:05:39 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, funshion-hadoop196, PROCESS_LOCAL, 1340 bytes)
15/03/21 16:05:40 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop194:42041 with 530.3 MB RAM, BlockManagerId(2, funshion-hadoop194, 42041)
15/03/21 16:05:40 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop195:47926 with 530.3 MB RAM, BlockManagerId(0, funshion-hadoop195, 47926)
15/03/21 16:05:40 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop196:36975 with 530.3 MB RAM, BlockManagerId(1, funshion-hadoop196, 36975)
15/03/21 16:05:49 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop196:36975 (size: 1296.0 B, free: 530.3 MB)
15/03/21 16:05:49 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop194:42041 (size: 1296.0 B, free: 530.3 MB)
15/03/21 16:05:50 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop195:47926 (size: 1296.0 B, free: 530.3 MB)
15/03/21 16:05:50 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, funshion-hadoop196, PROCESS_LOCAL, 1340 bytes)
15/03/21 16:05:50 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, funshion-hadoop196, PROCESS_LOCAL, 1340 bytes)
15/03/21 16:05:50 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 10827 ms on funshion-hadoop196 (1/10)
15/03/21 16:05:50 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 10833 ms on funshion-hadoop196 (2/10)
15/03/21 16:05:50 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, funshion-hadoop196, PROCESS_LOCAL, 1340 bytes)
15/03/21 16:05:50 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, funshion-hadoop196, PROCESS_LOCAL, 1340 bytes)
15/03/21 16:05:50 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 119 ms on funshion-hadoop196 (3/10)
15/03/21 16:05:50 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 156 ms on funshion-hadoop196 (4/10)
15/03/21 16:05:50 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 11123 ms on funshion-hadoop194 (5/10)
15/03/21 16:05:50 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 70 ms on funshion-hadoop196 (6/10)
15/03/21 16:05:50 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 72 ms on funshion-hadoop196 (7/10)
15/03/21 16:05:50 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 11096 ms on funshion-hadoop194 (8/10)
15/03/21 16:05:51 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 11423 ms on funshion-hadoop195 (9/10)
15/03/21 16:05:51 INFO DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 13.243 s
15/03/21 16:05:51 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 11447 ms on funshion-hadoop195 (10/10)
15/03/21 16:05:51 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/03/21 16:05:51 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:35, took 14.928861 s
Pi is roughly 3.141544
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/03/21 16:05:51 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/03/21 16:05:51 INFO SparkUI: Stopped Spark web UI at http://funshion-hadoop193:4040
15/03/21 16:05:51 INFO DAGScheduler: Stopping DAGScheduler
15/03/21 16:05:51 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/03/21 16:05:51 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
15/03/21 16:05:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped!
15/03/21 16:05:51 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/03/21 16:05:51 INFO MemoryStore: MemoryStore cleared
15/03/21 16:05:51 INFO BlockManager: BlockManager stopped
15/03/21 16:05:51 INFO BlockManagerMaster: BlockManagerMaster stopped
15/03/21 16:05:51 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/03/21 16:05:51 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/03/21 16:05:51 INFO SparkContext: Successfully stopped SparkContext
15/03/21 16:05:51 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.


---------------------------------------------------------------------------------------------------
-- 3.2 Test the Spark shell (reading a file on HDFS)
-- 3.2.1 cd to /usr/local/spark
[hadoop@funshion-hadoop193 spark]$ cd /usr/local/spark


-- 3.2.2 Check whether the HDFS directory /user/hadoop exists (create it if it does not):
[hadoop@funshion-hadoop193 spark]$ hdfs dfs -ls hdfs://funshion-hadoop193:8020/user/hadoop/
Found 3 items
drwx------   - hadoop supergroup          0 2015-03-18 08:00 hdfs://funshion-hadoop193:8020/user/hadoop/.Trash
drwxr-xr-x   - hadoop supergroup          0 2015-03-21 15:05 hdfs://funshion-hadoop193:8020/user/hadoop/.sparkStaging
drwxr-xr-x   - hadoop supergroup          0 2015-03-20 10:28 hdfs://funshion-hadoop193:8020/user/hadoop/hive


-- 3.2.3 Copy /usr/local/spark/README.md to the /user/hadoop directory on HDFS:
[hadoop@funshion-hadoop193 spark]$ hdfs dfs -copyFromLocal /usr/local/spark/README.md hdfs://funshion-hadoop193:8020/user/hadoop/


-- 3.2.4 Verify the previous step (we can see that README.md now exists under /user/hadoop):
[hadoop@funshion-hadoop193 spark]$ hdfs dfs -ls hdfs://funshion-hadoop193:8020/user/hadoop/
Found 4 items
drwx------   - hadoop supergroup          0 2015-03-18 08:00 hdfs://funshion-hadoop193:8020/user/hadoop/.Trash
drwxr-xr-x   - hadoop supergroup          0 2015-03-21 15:05 hdfs://funshion-hadoop193:8020/user/hadoop/.sparkStaging
-rw-r--r--   3 hadoop supergroup       3629 2015-03-21 16:28 hdfs://funshion-hadoop193:8020/user/hadoop/README.md
drwxr-xr-x   - hadoop supergroup          0 2015-03-20 10:28 hdfs://funshion-hadoop193:8020/user/hadoop/hive


-- 3.2.5 Test the Spark shell
[hadoop@funshion-hadoop193 spark]$ pwd
/usr/local/spark
[hadoop@funshion-hadoop193 spark]$ ./bin/spark-shell --master spark://funshion-hadoop193:7077
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/21 16:32:33 INFO SecurityManager: Changing view acls to: hadoop
15/03/21 16:32:33 INFO SecurityManager: Changing modify acls to: hadoop
15/03/21 16:32:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/03/21 16:32:33 INFO HttpServer: Starting HTTP Server
15/03/21 16:32:33 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/21 16:32:33 INFO AbstractConnector: Started SocketConnector@0.0.0.0:52784
15/03/21 16:32:33 INFO Utils: Successfully started service 'HTTP class server' on port 52784.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.0
      /_/


Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.
15/03/21 16:32:46 INFO SparkContext: Running Spark version 1.3.0
15/03/21 16:32:46 WARN SparkConf: 
SPARK_CLASSPATH was detected (set to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar').
This is deprecated in Spark 1.0+.


Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
        
15/03/21 16:32:46 WARN SparkConf: Setting 'spark.executor.extraClassPath' to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/03/21 16:32:46 WARN SparkConf: Setting 'spark.driver.extraClassPath' to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/03/21 16:32:47 INFO SecurityManager: Changing view acls to: hadoop
15/03/21 16:32:47 INFO SecurityManager: Changing modify acls to: hadoop
15/03/21 16:32:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/03/21 16:32:48 INFO Slf4jLogger: Slf4jLogger started
15/03/21 16:32:48 INFO Remoting: Starting remoting
15/03/21 16:32:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@funshion-hadoop193:56709]
15/03/21 16:32:48 INFO Utils: Successfully started service 'sparkDriver' on port 56709.
15/03/21 16:32:48 INFO SparkEnv: Registering MapOutputTracker
15/03/21 16:32:48 INFO SparkEnv: Registering BlockManagerMaster
15/03/21 16:32:48 INFO DiskBlockManager: Created local directory at /tmp/spark-fceb073a-5114-4ca1-aa3b-35cdc4905eec/blockmgr-ea7850bc-f902-4f9a-aef2-7228d35b2a2c
15/03/21 16:32:48 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/03/21 16:32:49 INFO HttpFileServer: HTTP File server directory is /tmp/spark-0de1d6ac-b075-4cea-aefa-4e5b7fe492c6/httpd-84193574-041c-4bdc-abbb-6033aa484d92
15/03/21 16:32:49 INFO HttpServer: Starting HTTP Server
15/03/21 16:32:49 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/21 16:32:49 INFO AbstractConnector: Started SocketConnector@0.0.0.0:37981
15/03/21 16:32:49 INFO Utils: Successfully started service 'HTTP file server' on port 37981.
15/03/21 16:32:49 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/21 16:32:49 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/21 16:32:49 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/03/21 16:32:49 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/21 16:32:49 INFO SparkUI: Started SparkUI at http://funshion-hadoop193:4040
15/03/21 16:32:50 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@funshion-hadoop193:7077/user/Master...
15/03/21 16:32:50 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150321163250-0002
15/03/21 16:32:50 INFO AppClient$ClientActor: Executor added: app-20150321163250-0002/0 on worker-20150321160018-funshion-hadoop195-46031 (funshion-hadoop195:46031) with 2 cores
15/03/21 16:32:50 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150321163250-0002/0 on hostPort funshion-hadoop195:46031 with 2 cores, 512.0 MB RAM
15/03/21 16:32:50 INFO AppClient$ClientActor: Executor added: app-20150321163250-0002/1 on worker-20150321160019-funshion-hadoop196-53113 (funshion-hadoop196:53113) with 2 cores
15/03/21 16:32:50 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150321163250-0002/1 on hostPort funshion-hadoop196:53113 with 2 cores, 512.0 MB RAM
15/03/21 16:32:50 INFO AppClient$ClientActor: Executor added: app-20150321163250-0002/2 on worker-20150321160018-funshion-hadoop194-56515 (funshion-hadoop194:56515) with 2 cores
15/03/21 16:32:50 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150321163250-0002/2 on hostPort funshion-hadoop194:56515 with 2 cores, 512.0 MB RAM
15/03/21 16:32:50 INFO AppClient$ClientActor: Executor updated: app-20150321163250-0002/1 is now LOADING
15/03/21 16:32:50 INFO AppClient$ClientActor: Executor updated: app-20150321163250-0002/0 is now LOADING
15/03/21 16:32:50 INFO AppClient$ClientActor: Executor updated: app-20150321163250-0002/2 is now LOADING
15/03/21 16:32:50 INFO AppClient$ClientActor: Executor updated: app-20150321163250-0002/0 is now RUNNING
15/03/21 16:32:50 INFO AppClient$ClientActor: Executor updated: app-20150321163250-0002/1 is now RUNNING
15/03/21 16:32:50 INFO AppClient$ClientActor: Executor updated: app-20150321163250-0002/2 is now RUNNING
15/03/21 16:32:51 INFO NettyBlockTransferService: Server created on 49153
15/03/21 16:32:51 INFO BlockManagerMaster: Trying to register BlockManager
15/03/21 16:32:51 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop193:49153 with 265.4 MB RAM, BlockManagerId(<driver>, funshion-hadoop193, 49153)
15/03/21 16:32:51 INFO BlockManagerMaster: Registered BlockManager
15/03/21 16:32:55 INFO EventLoggingListener: Logging events to hdfs://funshion-hadoop193:8020/spark_log/app-20150321163250-0002
15/03/21 16:32:55 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/03/21 16:32:55 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/03/21 16:32:57 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/03/21 16:32:57 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop194:52091/user/Executor#247904344] with ID 2
15/03/21 16:32:58 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop194:43331 with 265.4 MB RAM, BlockManagerId(2, funshion-hadoop194, 43331)
15/03/21 16:32:58 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop196:38636/user/Executor#-1065092827] with ID 1
15/03/21 16:32:58 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop195:47721/user/Executor#700969315] with ID 0
15/03/21 16:32:58 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop196:58778 with 265.4 MB RAM, BlockManagerId(1, funshion-hadoop196, 58778)
15/03/21 16:32:58 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop195:40409 with 265.4 MB RAM, BlockManagerId(0, funshion-hadoop195, 40409)


scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4b3734e9


scala> val file = sc.textFile("hdfs://funshion-hadoop193:8020/user/hadoop/README.md")
15/03/21 16:33:12 INFO MemoryStore: ensureFreeSpace(238253) called with curMem=0, maxMem=278302556
15/03/21 16:33:12 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 232.7 KB, free 265.2 MB)
15/03/21 16:33:12 INFO MemoryStore: ensureFreeSpace(33723) called with curMem=238253, maxMem=278302556
15/03/21 16:33:12 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 32.9 KB, free 265.2 MB)
15/03/21 16:33:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop193:49153 (size: 32.9 KB, free: 265.4 MB)
15/03/21 16:33:13 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/21 16:33:13 INFO SparkContext: Created broadcast 0 from textFile at <console>:21
file: org.apache.spark.rdd.RDD[String] = hdfs://funshion-hadoop193:8020/user/hadoop/README.md MapPartitionsRDD[1] at textFile at <console>:21


scala> val sparks = file.filter(line => line.contains("Spark"))
sparks: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:23


scala> sparks.count
15/03/21 16:33:45 INFO GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
15/03/21 16:33:45 INFO LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev e8c11c2be93b965abb548411379b203dabcbce79]
15/03/21 16:33:45 INFO FileInputFormat: Total input paths to process : 1
15/03/21 16:33:45 INFO SparkContext: Starting job: count at <console>:26
15/03/21 16:33:45 INFO DAGScheduler: Got job 0 (count at <console>:26) with 2 output partitions (allowLocal=false)
15/03/21 16:33:45 INFO DAGScheduler: Final stage: Stage 0(count at <console>:26)
15/03/21 16:33:45 INFO DAGScheduler: Parents of final stage: List()
15/03/21 16:33:45 INFO DAGScheduler: Missing parents: List()
15/03/21 16:33:45 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at filter at <console>:23), which has no missing parents
15/03/21 16:33:46 INFO MemoryStore: ensureFreeSpace(2880) called with curMem=271976, maxMem=278302556
15/03/21 16:33:46 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.8 KB, free 265.1 MB)
15/03/21 16:33:46 INFO MemoryStore: ensureFreeSpace(2067) called with curMem=274856, maxMem=278302556
15/03/21 16:33:46 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 265.1 MB)
15/03/21 16:33:46 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on funshion-hadoop193:49153 (size: 2.0 KB, free: 265.4 MB)
15/03/21 16:33:46 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/21 16:33:46 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
15/03/21 16:33:46 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[2] at filter at <console>:23)
15/03/21 16:33:46 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/03/21 16:33:46 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, funshion-hadoop195, NODE_LOCAL, 1316 bytes)
15/03/21 16:33:46 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, funshion-hadoop194, NODE_LOCAL, 1316 bytes)
15/03/21 16:33:47 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on funshion-hadoop194:43331 (size: 2.0 KB, free: 265.4 MB)
15/03/21 16:33:47 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on funshion-hadoop195:40409 (size: 2.0 KB, free: 265.4 MB)
15/03/21 16:33:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop194:43331 (size: 32.9 KB, free: 265.4 MB)
15/03/21 16:33:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop195:40409 (size: 32.9 KB, free: 265.4 MB)
15/03/21 16:33:49 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3675 ms on funshion-hadoop195 (1/2)
15/03/21 16:33:49 INFO DAGScheduler: Stage 0 (count at <console>:26) finished in 3.761 s
15/03/21 16:33:49 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 3724 ms on funshion-hadoop194 (2/2)
15/03/21 16:33:49 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/03/21 16:33:49 INFO DAGScheduler: Job 0 finished: count at <console>:26, took 4.055437 s
res1: Long = 19




---------------------------------------------------------------------------------------------------
-- 3.3 Test Spark SQL against Hive tables:


-- 3.3.1 Copy the hive-site.xml configuration file into /usr/local/spark/conf/ (after copying, restarting the Spark cluster is recommended)


cp $HIVE_HOME/conf/hive-site.xml /usr/local/spark/conf/
cp $HIVE_HOME/lib/mysql-connector-java-5.1.17-bin.jar /usr/local/spark/lib/


-- Note 1: I run the Hive metastore service on both funshion-hadoop192 and funshion-hadoop193.
-- (If every Hive client is configured with both metastore endpoints, the two metastore services use two MySQL databases A and B,
--  and A and B replicate to each other bidirectionally, then the Hive layer is genuinely HA (highly available).)


-- The Hive metastore service is started with a command like this:
cd $HIVE_HOME
nohup hive --service metastore -p 10000 &


-- After starting the Hive metastore service, you can check whether port 10000 is listening:
[hadoop@funshion-hadoop192 hive]$ netstat -anl |grep 10000
tcp        0      0 0.0.0.0:10000               0.0.0.0:*                   LISTEN      
tcp        0      0 192.168.117.192:10000       192.168.117.193:38363       ESTABLISHED 


-- Note 2: my hive-site.xml is configured roughly as follows:
-------------------------------------------------
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://funshion-hadoop192:10000,thrift://funshion-hadoop193:10000</value>
</property>


<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.117.193:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>


<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>


<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>


<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs@funshion-hadoop193:8020/user/hadoop/hive/conf/hive.jceks</value>
</property>


<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/home/hadoop/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>


-------------------------------------------------




[hadoop@funshion-hadoop193 spark]$ pwd
/usr/local/spark
[hadoop@funshion-hadoop193 spark]$ ./bin/spark-shell --master spark://funshion-hadoop193:7077
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/21 17:19:27 INFO SecurityManager: Changing view acls to: hadoop
15/03/21 17:19:27 INFO SecurityManager: Changing modify acls to: hadoop
15/03/21 17:19:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/03/21 17:19:27 INFO HttpServer: Starting HTTP Server
15/03/21 17:19:27 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/21 17:19:27 INFO AbstractConnector: Started SocketConnector@0.0.0.0:40063
15/03/21 17:19:27 INFO Utils: Successfully started service 'HTTP class server' on port 40063.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.0
      /_/


Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.
15/03/21 17:19:42 INFO SparkContext: Running Spark version 1.3.0
15/03/21 17:19:42 WARN SparkConf: 
SPARK_CLASSPATH was detected (set to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar').
This is deprecated in Spark 1.0+.


Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
        
15/03/21 17:19:42 WARN SparkConf: Setting 'spark.executor.extraClassPath' to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/03/21 17:19:42 WARN SparkConf: Setting 'spark.driver.extraClassPath' to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/03/21 17:19:42 INFO SecurityManager: Changing view acls to: hadoop
15/03/21 17:19:42 INFO SecurityManager: Changing modify acls to: hadoop
15/03/21 17:19:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/03/21 17:19:44 INFO Slf4jLogger: Slf4jLogger started
15/03/21 17:19:44 INFO Remoting: Starting remoting
15/03/21 17:19:44 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@funshion-hadoop193:45440]
15/03/21 17:19:44 INFO Utils: Successfully started service 'sparkDriver' on port 45440.
15/03/21 17:19:44 INFO SparkEnv: Registering MapOutputTracker
15/03/21 17:19:45 INFO SparkEnv: Registering BlockManagerMaster
15/03/21 17:19:45 INFO DiskBlockManager: Created local directory at /tmp/spark-69b67fe8-574d-4476-b020-06740c36c98a/blockmgr-fec621cd-f28f-424c-8854-9d7e842b5212
15/03/21 17:19:45 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/03/21 17:19:45 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7459d9d7-3f68-4f89-b16b-5a20d44b7fba/httpd-61ca1574-6693-4b38-ad22-bee627890833
15/03/21 17:19:45 INFO HttpServer: Starting HTTP Server
15/03/21 17:19:45 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/21 17:19:45 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54196
15/03/21 17:19:45 INFO Utils: Successfully started service 'HTTP file server' on port 54196.
15/03/21 17:19:45 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/21 17:19:46 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/21 17:19:46 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/03/21 17:19:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/21 17:19:46 INFO SparkUI: Started SparkUI at http://funshion-hadoop193:4040
15/03/21 17:19:46 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@funshion-hadoop193:7077/user/Master...
15/03/21 17:19:47 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150321171947-0000
15/03/21 17:19:47 INFO AppClient$ClientActor: Executor added: app-20150321171947-0000/0 on worker-20150321171905-funshion-hadoop195-43185 (funshion-hadoop195:43185) with 2 cores
15/03/21 17:19:47 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150321171947-0000/0 on hostPort funshion-hadoop195:43185 with 2 cores, 512.0 MB RAM
15/03/21 17:19:47 INFO AppClient$ClientActor: Executor added: app-20150321171947-0000/1 on worker-20150321171905-funshion-hadoop194-34245 (funshion-hadoop194:34245) with 2 cores
15/03/21 17:19:47 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150321171947-0000/1 on hostPort funshion-hadoop194:34245 with 2 cores, 512.0 MB RAM
15/03/21 17:19:47 INFO AppClient$ClientActor: Executor added: app-20150321171947-0000/2 on worker-20150321171905-funshion-hadoop196-48202 (funshion-hadoop196:48202) with 2 cores
15/03/21 17:19:47 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150321171947-0000/2 on hostPort funshion-hadoop196:48202 with 2 cores, 512.0 MB RAM
15/03/21 17:19:48 INFO AppClient$ClientActor: Executor updated: app-20150321171947-0000/0 is now RUNNING
15/03/21 17:19:48 INFO AppClient$ClientActor: Executor updated: app-20150321171947-0000/1 is now RUNNING
15/03/21 17:19:48 INFO AppClient$ClientActor: Executor updated: app-20150321171947-0000/2 is now LOADING
15/03/21 17:19:48 INFO AppClient$ClientActor: Executor updated: app-20150321171947-0000/2 is now RUNNING
15/03/21 17:19:48 INFO AppClient$ClientActor: Executor updated: app-20150321171947-0000/1 is now LOADING
15/03/21 17:19:48 INFO AppClient$ClientActor: Executor updated: app-20150321171947-0000/0 is now LOADING
15/03/21 17:19:48 INFO NettyBlockTransferService: Server created on 56884
15/03/21 17:19:48 INFO BlockManagerMaster: Trying to register BlockManager
15/03/21 17:19:48 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop193:56884 with 265.4 MB RAM, BlockManagerId(<driver>, funshion-hadoop193, 56884)
15/03/21 17:19:48 INFO BlockManagerMaster: Registered BlockManager
15/03/21 17:19:51 INFO EventLoggingListener: Logging events to hdfs://funshion-hadoop193:8020/spark_log/app-20150321171947-0000
15/03/21 17:19:51 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/03/21 17:19:51 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/03/21 17:19:53 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/03/21 17:19:55 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop194:58559/user/Executor#-1666693618] with ID 1
15/03/21 17:19:56 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop195:44023/user/Executor#2077708725] with ID 0
15/03/21 17:19:56 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop196:55503/user/Executor#282621553] with ID 2
15/03/21 17:19:56 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop194:41519 with 265.4 MB RAM, BlockManagerId(1, funshion-hadoop194, 41519)
15/03/21 17:19:56 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop195:35169 with 265.4 MB RAM, BlockManagerId(0, funshion-hadoop195, 35169)
15/03/21 17:19:56 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop196:40584 with 265.4 MB RAM, BlockManagerId(2, funshion-hadoop196, 40584)


scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@53077f45


scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@78624930


scala> sqlContext.sql("FROM web.pv2 SELECT time, ip, fck, mac, userid, fpc, version limit 10").collect().foreach(println)
15/03/21 17:20:24 INFO metastore: Trying to connect to metastore with URI thrift://funshion-hadoop192:10000
15/03/21 17:20:24 INFO metastore: Connected to metastore.
15/03/21 17:20:25 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/03/21 17:20:26 INFO ParseDriver: Parsing command: FROM web.pv2 SELECT time, ip, fck, mac, userid, fpc, version limit 10
15/03/21 17:20:26 INFO ParseDriver: Parse Completed
15/03/21 17:20:30 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/03/21 17:20:30 INFO MemoryStore: ensureFreeSpace(392934) called with curMem=0, maxMem=278302556
15/03/21 17:20:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 383.7 KB, free 265.0 MB)
15/03/21 17:20:31 INFO MemoryStore: ensureFreeSpace(70953) called with curMem=392934, maxMem=278302556
15/03/21 17:20:31 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 69.3 KB, free 265.0 MB)
15/03/21 17:20:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop193:56884 (size: 69.3 KB, free: 265.3 MB)
15/03/21 17:20:31 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/21 17:20:31 INFO SparkContext: Created broadcast 0 from broadcast at TableReader.scala:74
15/03/21 17:20:39 INFO GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
15/03/21 17:20:39 INFO LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev e8c11c2be93b965abb548411379b203dabcbce79]
15/03/21 17:20:39 INFO FileInputFormat: Total input paths to process : 4
-- (the line "INFO FileInputFormat: Total input paths to process : 4" repeats many more times here,
--  evidently once per partition of web.pv2 being scanned; the repeats are omitted)
15/03/21 17:20:49 INFO NetworkTopology: Adding a new node: /default-rack/192.168.117.194:50010
15/03/21 17:20:49 INFO NetworkTopology: Adding a new node: /default-rack/192.168.117.196:50010
15/03/21 17:20:49 INFO NetworkTopology: Adding a new node: /default-rack/192.168.117.195:50010
-- (further repeated FileInputFormat/NetworkTopology lines omitted)
15/03/21 17:20:57 INFO SparkContext: Starting job: runJob at SparkPlan.scala:121
15/03/21 17:20:57 INFO DAGScheduler: Got job 0 (runJob at SparkPlan.scala:121) with 1 output partitions (allowLocal=false)
15/03/21 17:20:57 INFO DAGScheduler: Final stage: Stage 0(runJob at SparkPlan.scala:121)
15/03/21 17:20:57 INFO DAGScheduler: Parents of final stage: List()
15/03/21 17:20:58 INFO DAGScheduler: Missing parents: List()
15/03/21 17:20:58 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[682] at map at SparkPlan.scala:96), which has no missing parents
15/03/21 17:20:59 INFO MemoryStore: ensureFreeSpace(231264) called with curMem=463887, maxMem=278302556
15/03/21 17:20:59 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 225.8 KB, free 264.7 MB)
15/03/21 17:20:59 INFO MemoryStore: ensureFreeSpace(155760) called with curMem=695151, maxMem=278302556
15/03/21 17:20:59 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 152.1 KB, free 264.6 MB)
15/03/21 17:20:59 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on funshion-hadoop193:56884 (size: 152.1 KB, free: 265.2 MB)
15/03/21 17:20:59 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/21 17:20:59 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
15/03/21 17:20:59 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MapPartitionsRDD[682] at map at SparkPlan.scala:96)
15/03/21 17:20:59 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/03/21 17:20:59 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, funshion-hadoop196, NODE_LOCAL, 1476 bytes)
15/03/21 17:21:00 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on funshion-hadoop196:40584 (size: 152.1 KB, free: 265.3 MB)
15/03/21 17:21:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop196:40584 (size: 69.3 KB, free: 265.2 MB)
15/03/21 17:21:06 INFO DAGScheduler: Stage 0 (runJob at SparkPlan.scala:121) finished in 7.570 s
15/03/21 17:21:06 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 7549 ms on funshion-hadoop196 (1/1)
15/03/21 17:21:06 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/03/21 17:21:06 INFO DAGScheduler: Job 0 finished: runJob at SparkPlan.scala:121, took 9.200024 s
[1425700800,106.39.223.13,142566222966d10,,0,,]
[1425700800,171.126.92.234,1419652425640v4,001E90B48B29,0,uoc_0_,3.0.3.45]
[1425700800,115.48.155.99,142278045504f40,48D22446F0FD,0,uoc_0_,3.0.3.45]
[1425700800,42.84.215.124,1425297728b6ec5,8C89A57242F8,0,uoc_0_,3.0.1.30]
[1425700800,27.36.219.185,142570079711a1a,,0,,]
[1425700800,42.63.106.214,142570079690d2b,,0,,]
[1425700800,119.177.15.114,14241507820428d,00245404264E,0,uoc_0_,3.0.1.30]
[1425700800,42.63.106.214,1425700796594da,,0,,]
[1425700800,180.149.143.146,1425700800d0502,,0,,]
[1425700800,111.201.153.164,1378541151a3eea,E0B9A51A05E0,0,oin_0_,3.0.3.45]


scala> 15/03/21 17:25:04 INFO BlockManager: Removing broadcast 1
15/03/21 17:25:04 INFO BlockManager: Removing block broadcast_1_piece0
15/03/21 17:25:04 INFO MemoryStore: Block broadcast_1_piece0 of size 155760 dropped from memory (free 277607405)
15/03/21 17:25:04 INFO BlockManagerInfo: Removed broadcast_1_piece0 on funshion-hadoop193:56884 in memory (size: 152.1 KB, free: 265.3 MB)
15/03/21 17:25:04 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/21 17:25:04 INFO BlockManager: Removing block broadcast_1
15/03/21 17:25:04 INFO MemoryStore: Block broadcast_1 of size 231264 dropped from memory (free 277838669)
15/03/21 17:25:04 INFO BlockManagerInfo: Removed broadcast_1_piece0 on funshion-hadoop196:40584 in memory (size: 152.1 KB, free: 265.3 MB)
15/03/21 17:25:05 INFO ContextCleaner: Cleaned broadcast 1




scala> sqlContext.sql("FROM web.pv2 SELECT count(*) WHERE year='2015' and month='03' and day='09' and hour='10'").collect().foreach(println)
15/03/21 17:33:53 INFO ParseDriver: Parsing command: FROM web.pv2 SELECT count(*) WHERE year='2015' and month='03' and day='09' and hour='10'
15/03/21 17:33:53 INFO ParseDriver: Parse Completed
15/03/21 17:33:54 INFO MemoryStore: ensureFreeSpace(387646) called with curMem=463887, maxMem=278302556
15/03/21 17:33:54 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 378.6 KB, free 264.6 MB)
15/03/21 17:33:55 INFO MemoryStore: ensureFreeSpace(70619) called with curMem=851533, maxMem=278302556
15/03/21 17:33:55 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 69.0 KB, free 264.5 MB)
15/03/21 17:33:55 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on funshion-hadoop193:56884 (size: 69.0 KB, free: 265.3 MB)
15/03/21 17:33:55 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/03/21 17:33:55 INFO SparkContext: Created broadcast 2 from broadcast at TableReader.scala:74
15/03/21 17:33:56 INFO SparkContext: Starting job: collect at SparkPlan.scala:83
15/03/21 17:33:57 INFO FileInputFormat: Total input paths to process : 4
15/03/21 17:33:57 INFO DAGScheduler: Registering RDD 688 (mapPartitions at Exchange.scala:100)
15/03/21 17:33:57 INFO DAGScheduler: Got job 1 (collect at SparkPlan.scala:83) with 1 output partitions (allowLocal=false)
15/03/21 17:33:57 INFO DAGScheduler: Final stage: Stage 2(collect at SparkPlan.scala:83)
15/03/21 17:33:57 INFO DAGScheduler: Parents of final stage: List(Stage 1)
15/03/21 17:33:57 INFO DAGScheduler: Missing parents: List(Stage 1)
15/03/21 17:33:57 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[688] at mapPartitions at Exchange.scala:100), which has no missing parents
15/03/21 17:33:57 INFO MemoryStore: ensureFreeSpace(202320) called with curMem=922152, maxMem=278302556
15/03/21 17:33:57 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 197.6 KB, free 264.3 MB)
15/03/21 17:33:57 INFO MemoryStore: ensureFreeSpace(129167) called with curMem=1124472, maxMem=278302556
15/03/21 17:33:57 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 126.1 KB, free 264.2 MB)
15/03/21 17:33:57 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on funshion-hadoop193:56884 (size: 126.1 KB, free: 265.2 MB)
15/03/21 17:33:57 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/03/21 17:33:57 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:839
15/03/21 17:33:57 INFO DAGScheduler: Submitting 3 missing tasks from Stage 1 (MapPartitionsRDD[688] at mapPartitions at Exchange.scala:100)
15/03/21 17:33:57 INFO TaskSchedulerImpl: Adding task set 1.0 with 3 tasks
15/03/21 17:33:57 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, funshion-hadoop196, NODE_LOCAL, 1465 bytes)
15/03/21 17:33:57 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, funshion-hadoop194, NODE_LOCAL, 1466 bytes)
15/03/21 17:33:57 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3, funshion-hadoop195, NODE_LOCAL, 1466 bytes)
15/03/21 17:33:57 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on funshion-hadoop196:40584 (size: 126.1 KB, free: 265.2 MB)
15/03/21 17:33:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on funshion-hadoop196:40584 (size: 69.0 KB, free: 265.2 MB)
15/03/21 17:33:58 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on funshion-hadoop195:35169 (size: 126.1 KB, free: 265.3 MB)
15/03/21 17:33:58 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on funshion-hadoop194:41519 (size: 126.1 KB, free: 265.3 MB)
15/03/21 17:34:01 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on funshion-hadoop195:35169 (size: 69.0 KB, free: 265.2 MB)
15/03/21 17:34:01 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on funshion-hadoop194:41519 (size: 69.0 KB, free: 265.2 MB)
15/03/21 17:34:02 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 4500 ms on funshion-hadoop196 (1/3)
15/03/21 17:34:05 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 3) in 8102 ms on funshion-hadoop195 (2/3)
15/03/21 17:34:08 INFO DAGScheduler: Stage 1 (mapPartitions at Exchange.scala:100) finished in 10.438 s
15/03/21 17:34:08 INFO DAGScheduler: looking for newly runnable stages
15/03/21 17:34:08 INFO DAGScheduler: running: Set()
15/03/21 17:34:08 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 10440 ms on funshion-hadoop194 (3/3)
15/03/21 17:34:08 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/03/21 17:34:08 INFO DAGScheduler: waiting: Set(Stage 2)
15/03/21 17:34:08 INFO DAGScheduler: failed: Set()
15/03/21 17:34:08 INFO DAGScheduler: Missing parents for Stage 2: List()
15/03/21 17:34:08 INFO DAGScheduler: Submitting Stage 2 (MapPartitionsRDD[692] at map at SparkPlan.scala:83), which is now runnable
15/03/21 17:34:08 INFO MemoryStore: ensureFreeSpace(200192) called with curMem=1253639, maxMem=278302556
15/03/21 17:34:08 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 195.5 KB, free 264.0 MB)
15/03/21 17:34:08 INFO MemoryStore: ensureFreeSpace(127644) called with curMem=1453831, maxMem=278302556
15/03/21 17:34:08 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 124.7 KB, free 263.9 MB)
15/03/21 17:34:08 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on funshion-hadoop193:56884 (size: 124.7 KB, free: 265.0 MB)
15/03/21 17:34:08 INFO BlockManagerMaster: Updated info of block broadcast_4_piece0
15/03/21 17:34:08 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:839
15/03/21 17:34:08 INFO DAGScheduler: Submitting 1 missing tasks from Stage 2 (MapPartitionsRDD[692] at map at SparkPlan.scala:83)
15/03/21 17:34:08 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/03/21 17:34:08 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, funshion-hadoop194, PROCESS_LOCAL, 1056 bytes)
15/03/21 17:34:08 INFO BlockManager: Removing broadcast 3
15/03/21 17:34:08 INFO BlockManager: Removing block broadcast_3_piece0
15/03/21 17:34:08 INFO MemoryStore: Block broadcast_3_piece0 of size 129167 dropped from memory (free 276850248)
15/03/21 17:34:08 INFO BlockManagerInfo: Removed broadcast_3_piece0 on funshion-hadoop193:56884 in memory (size: 126.1 KB, free: 265.2 MB)
15/03/21 17:34:08 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/03/21 17:34:08 INFO BlockManager: Removing block broadcast_3
15/03/21 17:34:08 INFO MemoryStore: Block broadcast_3 of size 202320 dropped from memory (free 277052568)
15/03/21 17:34:08 INFO BlockManagerInfo: Removed broadcast_3_piece0 on funshion-hadoop196:40584 in memory (size: 126.1 KB, free: 265.3 MB)
15/03/21 17:34:08 INFO BlockManagerInfo: Removed broadcast_3_piece0 on funshion-hadoop195:35169 in memory (size: 126.1 KB, free: 265.3 MB)
15/03/21 17:34:08 INFO BlockManagerInfo: Removed broadcast_3_piece0 on funshion-hadoop194:41519 in memory (size: 126.1 KB, free: 265.3 MB)
15/03/21 17:34:08 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on funshion-hadoop194:41519 (size: 124.7 KB, free: 265.2 MB)
15/03/21 17:34:08 INFO ContextCleaner: Cleaned broadcast 3
15/03/21 17:34:08 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@funshion-hadoop194:58559
15/03/21 17:34:08 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 176 bytes
15/03/21 17:34:09 INFO DAGScheduler: Stage 2 (collect at SparkPlan.scala:83) finished in 1.120 s
15/03/21 17:34:09 INFO DAGScheduler: Job 1 finished: collect at SparkPlan.scala:83, took 12.743659 s
15/03/21 17:34:09 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 1055 ms on funshion-hadoop194 (1/1)
15/03/21 17:34:09 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
[1302875]


-- Now verify both queries with Hive:
[hadoop@funshion-hadoop193 lib]$ hive


Logging initialized using configuration in file:/usr/local/apache-hive-1.0.0-bin/conf/hive-log4j.properties
hive> use web;
OK
Time taken: 1.194 seconds
hive> FROM web.pv2 SELECT time, ip, fck, mac, userid, fpc, version limit 10;
OK
1425139200 42.236.234.126 1405150429lj8hn AC220B7F6748 0 uoc_0_ 3.0.1.30
1425139200 218.29.215.246 1425139395a9cef 0
1425139200 58.243.98.165 14251391979c831 0
1425139200 123.125.71.50 142513920049edd 0
1425139200 125.44.54.118 137856542564zl4 20CF30E648AB 0 uoc_0_ 3.0.1.30
1425139200 122.139.44.143 1425139262d0717 0
1425139200 221.215.146.34 1414606324dx62z DFBE2ED3B408 0 uoc_0_ 3.0.3.36
1425139200 42.237.191.77 14251392436991e 0
1425139200 123.119.227.3 1425139201c570b 0
1425139200 42.237.191.77 14251392436991e 0
Time taken: 4.856 seconds, Fetched: 10 row(s)
hive> FROM web.pv2 SELECT count(*) WHERE year='2015' and month='03' and day='09' and hour='10';
Query ID = hadoop_20150321173737_db387447-29af-4199-80c8-85aa01070f67
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1426913373071_0013, Tracking URL = http://funshion-hadoop193:8088/proxy/application_1426913373071_0013/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1426913373071_0013
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2015-03-21 17:37:50,761 Stage-1 map = 0%,  reduce = 0%
2015-03-21 17:38:04,307 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 4.39 sec
2015-03-21 17:38:07,589 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 19.98 sec
2015-03-21 17:38:21,527 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 23.52 sec
MapReduce Total cumulative CPU time: 23 seconds 520 msec
Ended Job = job_1426913373071_0013
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 23.52 sec   HDFS Read: 229940119 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 23 seconds 520 msec
OK
1302875
Time taken: 53.738 seconds, Fetched: 1 row(s)
hive> 


-- For the first query, each engine may apply its LIMIT to rows in a different order, so getting back different rows is expected and not an error.
-- For the second query, both engines return 1302875, an exact match, so this step is complete.
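

-- If an exact row-for-row comparison of the first query were wanted, the row order can be pinned down
-- before the LIMIT by running the same ORDER BY in both engines. A minimal sketch for the Spark shell
-- (the ORDER BY columns are chosen only for illustration; they were not part of the original test):
sqlContext.sql("SELECT time, ip, fck, mac, userid, fpc, version FROM web.pv2 ORDER BY ip, fck LIMIT 10").collect().foreach(println)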


---------------------------------------------------------------------------------------------------
-- 3.4 Test Spark SQL access to a JSON file on HDFS:
-- Reference: http://spark.apache.org/docs/latest/sql-programming-guide.html#running-sql-queries-programmatically
--       (the "JSON Datasets" section)


[hadoop@funshion-hadoop193 spark]$ hdfs dfs -copyFromLocal /usr/local/spark/examples/src/main/resources/people.json hdfs://funshion-hadoop193:8020/user/hadoop/
[hadoop@funshion-hadoop193 spark]$ hdfs dfs -ls /user/hadoop
Found 5 items
drwx------   - hadoop supergroup          0 2015-03-22 08:00 /user/hadoop/.Trash
drwxr-xr-x   - hadoop supergroup          0 2015-03-21 15:05 /user/hadoop/.sparkStaging
-rw-r--r--   3 hadoop supergroup       3629 2015-03-21 16:28 /user/hadoop/README.md
drwxr-xr-x   - hadoop supergroup          0 2015-03-20 10:28 /user/hadoop/hive
-rw-r--r--   3 hadoop supergroup         73 2015-03-22 14:10 /user/hadoop/people.json


[hadoop@funshion-hadoop193 spark]$ pwd
/usr/local/spark


[hadoop@funshion-hadoop193 spark]$ hdfs dfs -cat hdfs://funshion-hadoop193:8020/user/hadoop/people.json
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}


-- As the two steps above show, people.json has been uploaded to /user/hadoop on the Hadoop cluster; next we work with that HDFS file:
[hadoop@funshion-hadoop193 spark]$ pwd
/usr/local/spark
[hadoop@funshion-hadoop193 spark]$ ./bin/spark-shell --master spark://funshion-hadoop193:7077
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/22 14:13:08 INFO SecurityManager: Changing view acls to: hadoop
15/03/22 14:13:08 INFO SecurityManager: Changing modify acls to: hadoop
15/03/22 14:13:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/03/22 14:13:08 INFO HttpServer: Starting HTTP Server
15/03/22 14:13:08 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 14:13:08 INFO AbstractConnector: Started SocketConnector@0.0.0.0:39459
15/03/22 14:13:09 INFO Utils: Successfully started service 'HTTP class server' on port 39459.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.0
      /_/


Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.
15/03/22 14:13:23 INFO SparkContext: Running Spark version 1.3.0
15/03/22 14:13:23 WARN SparkConf: 
SPARK_CLASSPATH was detected (set to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar').
This is deprecated in Spark 1.0+.


Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
        
15/03/22 14:13:23 WARN SparkConf: Setting 'spark.executor.extraClassPath' to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/03/22 14:13:23 WARN SparkConf: Setting 'spark.driver.extraClassPath' to ':/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/03/22 14:13:23 INFO SecurityManager: Changing view acls to: hadoop
15/03/22 14:13:23 INFO SecurityManager: Changing modify acls to: hadoop
15/03/22 14:13:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/03/22 14:13:25 INFO Slf4jLogger: Slf4jLogger started
15/03/22 14:13:25 INFO Remoting: Starting remoting
15/03/22 14:13:25 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@funshion-hadoop193:56107]
15/03/22 14:13:25 INFO Utils: Successfully started service 'sparkDriver' on port 56107.
15/03/22 14:13:25 INFO SparkEnv: Registering MapOutputTracker
15/03/22 14:13:25 INFO SparkEnv: Registering BlockManagerMaster
15/03/22 14:13:25 INFO DiskBlockManager: Created local directory at /tmp/spark-be4233be-4ef0-4251-940b-08c620766731/blockmgr-ae7a8197-0325-4e20-84ec-11391e93fe05
15/03/22 14:13:25 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/03/22 14:13:26 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a44278e3-078f-4587-a0bc-88c152936d7b/httpd-994d47ce-ca02-466d-a8fb-3196d45bcf49
15/03/22 14:13:26 INFO HttpServer: Starting HTTP Server
15/03/22 14:13:26 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 14:13:26 INFO AbstractConnector: Started SocketConnector@0.0.0.0:57374
15/03/22 14:13:26 INFO Utils: Successfully started service 'HTTP file server' on port 57374.
15/03/22 14:13:26 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/22 14:13:27 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 14:13:27 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/03/22 14:13:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/22 14:13:27 INFO SparkUI: Started SparkUI at http://funshion-hadoop193:4040
15/03/22 14:13:28 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@funshion-hadoop193:7077/user/Master...
15/03/22 14:13:29 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150322141329-0003
15/03/22 14:13:29 INFO AppClient$ClientActor: Executor added: app-20150322141329-0003/0 on worker-20150321171905-funshion-hadoop195-43185 (funshion-hadoop195:43185) with 2 cores
15/03/22 14:13:29 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150322141329-0003/0 on hostPort funshion-hadoop195:43185 with 2 cores, 512.0 MB RAM
15/03/22 14:13:29 INFO AppClient$ClientActor: Executor added: app-20150322141329-0003/1 on worker-20150321171905-funshion-hadoop194-34245 (funshion-hadoop194:34245) with 2 cores
15/03/22 14:13:29 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150322141329-0003/1 on hostPort funshion-hadoop194:34245 with 2 cores, 512.0 MB RAM
15/03/22 14:13:29 INFO AppClient$ClientActor: Executor added: app-20150322141329-0003/2 on worker-20150321171905-funshion-hadoop196-48202 (funshion-hadoop196:48202) with 2 cores
15/03/22 14:13:29 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150322141329-0003/2 on hostPort funshion-hadoop196:48202 with 2 cores, 512.0 MB RAM
15/03/22 14:13:29 INFO AppClient$ClientActor: Executor updated: app-20150322141329-0003/0 is now LOADING
15/03/22 14:13:29 INFO AppClient$ClientActor: Executor updated: app-20150322141329-0003/2 is now LOADING
15/03/22 14:13:29 INFO AppClient$ClientActor: Executor updated: app-20150322141329-0003/1 is now LOADING
15/03/22 14:13:29 INFO AppClient$ClientActor: Executor updated: app-20150322141329-0003/0 is now RUNNING
15/03/22 14:13:29 INFO AppClient$ClientActor: Executor updated: app-20150322141329-0003/1 is now RUNNING
15/03/22 14:13:29 INFO AppClient$ClientActor: Executor updated: app-20150322141329-0003/2 is now RUNNING
15/03/22 14:13:29 INFO NettyBlockTransferService: Server created on 53560
15/03/22 14:13:29 INFO BlockManagerMaster: Trying to register BlockManager
15/03/22 14:13:29 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop193:53560 with 265.4 MB RAM, BlockManagerId(<driver>, funshion-hadoop193, 53560)
15/03/22 14:13:29 INFO BlockManagerMaster: Registered BlockManager
15/03/22 14:13:32 INFO EventLoggingListener: Logging events to hdfs://funshion-hadoop193:8020/spark_log/app-20150322141329-0003
15/03/22 14:13:32 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/03/22 14:13:32 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/03/22 14:13:34 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/03/22 14:13:36 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop195:33241/user/Executor#161045615] with ID 0
15/03/22 14:13:37 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop196:42295/user/Executor#-915975088] with ID 2
15/03/22 14:13:37 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop194:39398/user/Executor#495772963] with ID 1
15/03/22 14:13:37 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop195:43851 with 265.4 MB RAM, BlockManagerId(0, funshion-hadoop195, 43851)
15/03/22 14:13:37 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop196:60072 with 265.4 MB RAM, BlockManagerId(2, funshion-hadoop196, 60072)
15/03/22 14:13:37 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop194:32982 with 265.4 MB RAM, BlockManagerId(1, funshion-hadoop194, 32982)


scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@3e45d316


scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@15b7d9b8


scala> val df = sqlContext.jsonFile("hdfs://funshion-hadoop193:8020/user/hadoop/people.json")
15/03/22 14:15:09 INFO MemoryStore: ensureFreeSpace(238253) called with curMem=0, maxMem=278302556
15/03/22 14:15:09 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 232.7 KB, free 265.2 MB)
15/03/22 14:15:09 INFO MemoryStore: ensureFreeSpace(33723) called with curMem=238253, maxMem=278302556
15/03/22 14:15:09 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 32.9 KB, free 265.2 MB)
15/03/22 14:15:09 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop193:53560 (size: 32.9 KB, free: 265.4 MB)
15/03/22 14:15:09 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/22 14:15:09 INFO SparkContext: Created broadcast 0 from textFile at JSONRelation.scala:98
15/03/22 14:15:10 INFO GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
15/03/22 14:15:10 INFO LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev e8c11c2be93b965abb548411379b203dabcbce79]
15/03/22 14:15:10 INFO FileInputFormat: Total input paths to process : 1
15/03/22 14:15:10 INFO SparkContext: Starting job: reduce at JsonRDD.scala:51
15/03/22 14:15:10 INFO DAGScheduler: Got job 0 (reduce at JsonRDD.scala:51) with 2 output partitions (allowLocal=false)
15/03/22 14:15:10 INFO DAGScheduler: Final stage: Stage 0(reduce at JsonRDD.scala:51)
15/03/22 14:15:10 INFO DAGScheduler: Parents of final stage: List()
15/03/22 14:15:10 INFO DAGScheduler: Missing parents: List()
15/03/22 14:15:10 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[3] at map at JsonRDD.scala:51), which has no missing parents
15/03/22 14:15:10 INFO MemoryStore: ensureFreeSpace(3216) called with curMem=271976, maxMem=278302556
15/03/22 14:15:10 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 265.1 MB)
15/03/22 14:15:10 INFO MemoryStore: ensureFreeSpace(2285) called with curMem=275192, maxMem=278302556
15/03/22 14:15:10 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.2 KB, free 265.1 MB)
15/03/22 14:15:10 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on funshion-hadoop193:53560 (size: 2.2 KB, free: 265.4 MB)
15/03/22 14:15:10 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/22 14:15:10 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
15/03/22 14:15:10 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[3] at map at JsonRDD.scala:51)
15/03/22 14:15:10 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/03/22 14:15:10 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, funshion-hadoop195, NODE_LOCAL, 1318 bytes)
15/03/22 14:15:10 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, funshion-hadoop196, NODE_LOCAL, 1318 bytes)
15/03/22 14:15:11 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on funshion-hadoop196:60072 (size: 2.2 KB, free: 265.4 MB)
15/03/22 14:15:11 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on funshion-hadoop195:43851 (size: 2.2 KB, free: 265.4 MB)
15/03/22 14:15:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop196:60072 (size: 32.9 KB, free: 265.4 MB)
15/03/22 14:15:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop195:43851 (size: 32.9 KB, free: 265.4 MB)
15/03/22 14:15:16 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 5406 ms on funshion-hadoop196 (1/2)
15/03/22 14:15:16 INFO DAGScheduler: Stage 0 (reduce at JsonRDD.scala:51) finished in 5.749 s
15/03/22 14:15:16 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 5738 ms on funshion-hadoop195 (2/2)
15/03/22 14:15:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/03/22 14:15:16 INFO DAGScheduler: Job 0 finished: reduce at JsonRDD.scala:51, took 6.122394 s
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]


scala> df.show()
15/03/22 14:15:43 INFO MemoryStore: ensureFreeSpace(238325) called with curMem=277477, maxMem=278302556
15/03/22 14:15:43 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 232.7 KB, free 264.9 MB)
15/03/22 14:15:43 INFO MemoryStore: ensureFreeSpace(33723) called with curMem=515802, maxMem=278302556
15/03/22 14:15:43 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 32.9 KB, free 264.9 MB)
15/03/22 14:15:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on funshion-hadoop193:53560 (size: 32.9 KB, free: 265.3 MB)
15/03/22 14:15:43 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/03/22 14:15:43 INFO SparkContext: Created broadcast 2 from textFile at JSONRelation.scala:98
15/03/22 14:15:43 INFO FileInputFormat: Total input paths to process : 1
15/03/22 14:15:43 INFO SparkContext: Starting job: runJob at SparkPlan.scala:121
15/03/22 14:15:43 INFO DAGScheduler: Got job 1 (runJob at SparkPlan.scala:121) with 1 output partitions (allowLocal=false)
15/03/22 14:15:43 INFO DAGScheduler: Final stage: Stage 1(runJob at SparkPlan.scala:121)
15/03/22 14:15:43 INFO DAGScheduler: Parents of final stage: List()
15/03/22 14:15:43 INFO DAGScheduler: Missing parents: List()
15/03/22 14:15:43 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[8] at map at SparkPlan.scala:96), which has no missing parents
15/03/22 14:15:43 INFO MemoryStore: ensureFreeSpace(4064) called with curMem=549525, maxMem=278302556
15/03/22 14:15:43 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 4.0 KB, free 264.9 MB)
15/03/22 14:15:43 INFO MemoryStore: ensureFreeSpace(2796) called with curMem=553589, maxMem=278302556
15/03/22 14:15:43 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.7 KB, free 264.9 MB)
15/03/22 14:15:43 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on funshion-hadoop193:53560 (size: 2.7 KB, free: 265.3 MB)
15/03/22 14:15:43 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/03/22 14:15:43 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:839
15/03/22 14:15:43 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MapPartitionsRDD[8] at map at SparkPlan.scala:96)
15/03/22 14:15:43 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/03/22 14:15:43 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, funshion-hadoop194, NODE_LOCAL, 1318 bytes)
15/03/22 14:15:44 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on funshion-hadoop194:32982 (size: 2.7 KB, free: 265.4 MB)
15/03/22 14:15:46 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on funshion-hadoop194:32982 (size: 32.9 KB, free: 265.4 MB)
15/03/22 14:15:49 INFO DAGScheduler: Stage 1 (runJob at SparkPlan.scala:121) finished in 5.280 s
15/03/22 14:15:49 INFO DAGScheduler: Job 1 finished: runJob at SparkPlan.scala:121, took 5.329337 s
15/03/22 14:15:49 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 5277 ms on funshion-hadoop194 (1/1)
15/03/22 14:15:49 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/03/22 14:15:49 INFO SparkContext: Starting job: runJob at SparkPlan.scala:121
15/03/22 14:15:49 INFO DAGScheduler: Got job 2 (runJob at SparkPlan.scala:121) with 1 output partitions (allowLocal=false)
15/03/22 14:15:49 INFO DAGScheduler: Final stage: Stage 2(runJob at SparkPlan.scala:121)
15/03/22 14:15:49 INFO DAGScheduler: Parents of final stage: List()
15/03/22 14:15:49 INFO DAGScheduler: Missing parents: List()
15/03/22 14:15:49 INFO DAGScheduler: Submitting Stage 2 (MapPartitionsRDD[8] at map at SparkPlan.scala:96), which has no missing parents
15/03/22 14:15:49 INFO MemoryStore: ensureFreeSpace(4064) called with curMem=556385, maxMem=278302556
15/03/22 14:15:49 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 4.0 KB, free 264.9 MB)
15/03/22 14:15:49 INFO MemoryStore: ensureFreeSpace(2796) called with curMem=560449, maxMem=278302556
15/03/22 14:15:49 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2.7 KB, free 264.9 MB)
15/03/22 14:15:49 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on funshion-hadoop193:53560 (size: 2.7 KB, free: 265.3 MB)
15/03/22 14:15:49 INFO BlockManagerMaster: Updated info of block broadcast_4_piece0
15/03/22 14:15:49 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:839
15/03/22 14:15:49 INFO DAGScheduler: Submitting 1 missing tasks from Stage 2 (MapPartitionsRDD[8] at map at SparkPlan.scala:96)
15/03/22 14:15:49 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/03/22 14:15:49 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 3, funshion-hadoop194, NODE_LOCAL, 1318 bytes)
15/03/22 14:15:49 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on funshion-hadoop194:32982 (size: 2.7 KB, free: 265.4 MB)
15/03/22 14:15:49 INFO DAGScheduler: Stage 2 (runJob at SparkPlan.scala:121) finished in 0.166 s
15/03/22 14:15:49 INFO DAGScheduler: Job 2 finished: runJob at SparkPlan.scala:121, took 0.204439 s
15/03/22 14:15:49 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 3) in 168 ms on funshion-hadoop194 (1/1)
15/03/22 14:15:49 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
age  name   
null Michael
30   Andy   
19   Justin 


scala> df.printSchema()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)




scala> df.select("name").show()
15/03/22 14:22:10 INFO MemoryStore: ensureFreeSpace(238325) called with curMem=1126281, maxMem=278302556
15/03/22 14:22:10 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 232.7 KB, free 264.1 MB)
15/03/22 14:22:10 INFO MemoryStore: ensureFreeSpace(33723) called with curMem=1364606, maxMem=278302556
15/03/22 14:22:10 INFO MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 32.9 KB, free 264.1 MB)
15/03/22 14:22:10 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on funshion-hadoop193:53560 (size: 32.9 KB, free: 265.2 MB)
15/03/22 14:22:10 INFO BlockManagerMaster: Updated info of block broadcast_10_piece0
15/03/22 14:22:10 INFO SparkContext: Created broadcast 10 from textFile at JSONRelation.scala:98
15/03/22 14:22:10 INFO FileInputFormat: Total input paths to process : 1
15/03/22 14:22:10 INFO SparkContext: Starting job: runJob at SparkPlan.scala:121
15/03/22 14:22:10 INFO DAGScheduler: Got job 6 (runJob at SparkPlan.scala:121) with 1 output partitions (allowLocal=false)
15/03/22 14:22:10 INFO DAGScheduler: Final stage: Stage 6(runJob at SparkPlan.scala:121)
15/03/22 14:22:10 INFO DAGScheduler: Parents of final stage: List()
15/03/22 14:22:10 INFO DAGScheduler: Missing parents: List()
15/03/22 14:22:10 INFO DAGScheduler: Submitting Stage 6 (MapPartitionsRDD[23] at map at SparkPlan.scala:96), which has no missing parents
15/03/22 14:22:10 INFO MemoryStore: ensureFreeSpace(5064) called with curMem=1398329, maxMem=278302556
15/03/22 14:22:10 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 4.9 KB, free 264.1 MB)
15/03/22 14:22:10 INFO MemoryStore: ensureFreeSpace(3457) called with curMem=1403393, maxMem=278302556
15/03/22 14:22:10 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 3.4 KB, free 264.1 MB)
15/03/22 14:22:10 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on funshion-hadoop193:53560 (size: 3.4 KB, free: 265.2 MB)
15/03/22 14:22:10 INFO BlockManagerMaster: Updated info of block broadcast_11_piece0
15/03/22 14:22:10 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:839
15/03/22 14:22:10 INFO DAGScheduler: Submitting 1 missing tasks from Stage 6 (MapPartitionsRDD[23] at map at SparkPlan.scala:96)
15/03/22 14:22:10 INFO TaskSchedulerImpl: Adding task set 6.0 with 1 tasks
15/03/22 14:22:10 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 8, funshion-hadoop194, NODE_LOCAL, 1318 bytes)
15/03/22 14:22:10 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on funshion-hadoop194:32982 (size: 3.4 KB, free: 265.3 MB)
15/03/22 14:22:10 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on funshion-hadoop194:32982 (size: 32.9 KB, free: 265.3 MB)
15/03/22 14:22:10 INFO DAGScheduler: Stage 6 (runJob at SparkPlan.scala:121) finished in 0.457 s
15/03/22 14:22:10 INFO DAGScheduler: Job 6 finished: runJob at SparkPlan.scala:121, took 0.495140 s
15/03/22 14:22:10 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 8) in 454 ms on funshion-hadoop194 (1/1)
15/03/22 14:22:10 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 
15/03/22 14:22:10 INFO SparkContext: Starting job: runJob at SparkPlan.scala:121
15/03/22 14:22:10 INFO DAGScheduler: Got job 7 (runJob at SparkPlan.scala:121) with 1 output partitions (allowLocal=false)
15/03/22 14:22:10 INFO DAGScheduler: Final stage: Stage 7(runJob at SparkPlan.scala:121)
15/03/22 14:22:10 INFO DAGScheduler: Parents of final stage: List()
15/03/22 14:22:10 INFO DAGScheduler: Missing parents: List()
15/03/22 14:22:10 INFO DAGScheduler: Submitting Stage 7 (MapPartitionsRDD[23] at map at SparkPlan.scala:96), which has no missing parents
15/03/22 14:22:10 INFO MemoryStore: ensureFreeSpace(5064) called with curMem=1406850, maxMem=278302556
15/03/22 14:22:10 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 4.9 KB, free 264.1 MB)
15/03/22 14:22:10 INFO MemoryStore: ensureFreeSpace(3457) called with curMem=1411914, maxMem=278302556
15/03/22 14:22:10 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 3.4 KB, free 264.1 MB)
15/03/22 14:22:10 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on funshion-hadoop193:53560 (size: 3.4 KB, free: 265.2 MB)
15/03/22 14:22:10 INFO BlockManagerMaster: Updated info of block broadcast_12_piece0
15/03/22 14:22:10 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:839
15/03/22 14:22:10 INFO DAGScheduler: Submitting 1 missing tasks from Stage 7 (MapPartitionsRDD[23] at map at SparkPlan.scala:96)
15/03/22 14:22:10 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks
15/03/22 14:22:10 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 9, funshion-hadoop195, NODE_LOCAL, 1318 bytes)
15/03/22 14:22:10 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on funshion-hadoop195:43851 (size: 3.4 KB, free: 265.3 MB)
15/03/22 14:22:11 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on funshion-hadoop195:43851 (size: 32.9 KB, free: 265.3 MB)
15/03/22 14:22:11 INFO DAGScheduler: Stage 7 (runJob at SparkPlan.scala:121) finished in 0.419 s
15/03/22 14:22:11 INFO DAGScheduler: Job 7 finished: runJob at SparkPlan.scala:121, took 0.473975 s
15/03/22 14:22:11 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 9) in 423 ms on funshion-hadoop195 (1/1)
15/03/22 14:22:11 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 
name   
Michael
Andy   
Justin 




-- Note: since the current OS user is hadoop, HDFS resolves relative paths against that user's home
--       directory (which in HDFS is /user/hadoop), so the file can also be referenced directly like this:
val df = sqlContext.jsonFile("people.json")
df.show()
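

-- The same relative-path resolution applies to the HDFS command line, so as a quick sanity check the
-- file can also be listed without the full hdfs:// URI:
hdfs dfs -ls people.json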




---------------------------------------------------------------------------------------------------
-- 3.5 Further testing of Spark SQL against JSON files on HDFS
-- Reference: http://spark.apache.org/docs/latest/sql-programming-guide.html#running-sql-queries-programmatically
--       (the "JSON Datasets" section)


val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val path = "/user/hadoop/people.json"
// jsonFile infers the schema from the JSON records and returns a DataFrame
val people = sqlContext.jsonFile(path)
people.printSchema()
// register the DataFrame as a temp table so it can be queried with SQL
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name, age+1 as agePlusOne FROM people WHERE age >= 13 AND age <= 19")
teenagers.show()


people.filter(people("age") > 21).show()
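

// The same kind of query can also be written with the DataFrame API instead of SQL; a small sketch in
// the style of the 1.3 DataFrame examples (column expressions rather than the temp table used above):
people.select(people("name"), people("age") + 1).show()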


// Alternatively, a DataFrame can be created for a JSON dataset represented by
// an RDD[String] storing one JSON object per string.
val anotherPeopleRDD = sc.parallelize(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD)
anotherPeople.show()
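

// Nested fields from the inferred JSON schema can be addressed with dot notation once the DataFrame is
// registered as a temp table; a minimal sketch (the temp-table name "anotherPeople" is our own choice):
anotherPeople.registerTempTable("anotherPeople")
sqlContext.sql("SELECT name, address.city FROM anotherPeople").show()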


----------------------
[hadoop@funshion-hadoop193 spark]$ hdfs dfs -copyFromLocal examples/src/main/resources/people.txt hdfs://funshion-hadoop193:8020/user/hadoop/
[hadoop@funshion-hadoop193 spark]$ hdfs dfs -cat /user/hadoop/people.txt
Michael, 29
Andy, 30
Justin, 19


[hadoop@funshion-hadoop193 spark]$ pwd
/usr/local/spark
[hadoop@funshion-hadoop193 spark]$ ./bin/spark-shell --master spark://funshion-hadoop193:7077


val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._    // brings in the implicit rdd.toDF() conversion
case class Person(name: String, age: Int)
// people.txt here is a relative path, resolved against the HDFS home directory /user/hadoop
val people = sc.textFile("people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)




-- Note: the people.txt passed to sc.textFile(...) above is a file under the current HDFS "HOME" directory (/user/hadoop).
-- That sc.textFile line can equivalently be written as:
val people = sc.textFile("/user/hadoop/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
-- or as:
val people = sc.textFile("hdfs://funshion-hadoop193:8020/user/hadoop/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()


val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
case class Person(name: String, age: Int)
val people = sc.textFile("/user/hadoop/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)


val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
case class Person(name: String, age: Int)
val people = sc.textFile("hdfs://funshion-hadoop193:8020/user/hadoop/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
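-- The same "teenagers" query can also be expressed with the DataFrame API instead of SQL
-- (a sketch, reusing the people DataFrame defined above; Column operators as in Spark 1.3):
people.filter(people("age") >= 13 && people("age") <= 19).select("name").show()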




---------------------------------------------------------------------------------------------------
-- ############################################################################################# --


-- 4 Testing Spark SQL access to relational databases:
-- Reference: the "JDBC To Other Databases" section of
--       http://spark.apache.org/docs/latest/sql-programming-guide.html#running-sql-queries-programmatically


---------------------------------------------------------------------------------------------------
-- 4.1 Accessing a MySQL database:


[hadoop@funshion-hadoop193 lib]$ pwd
/usr/local/spark/lib
[hadoop@funshion-hadoop193 lib]$ cd ..
[hadoop@funshion-hadoop193 spark]$ SPARK_CLASSPATH=/usr/local/spark/lib/mysql-connector-java-5.1.17-bin.jar ./bin/spark-shell --master spark://funshion-hadoop193:7077
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/22 18:14:04 INFO SecurityManager: Changing view acls to: hadoop
15/03/22 18:14:04 INFO SecurityManager: Changing modify acls to: hadoop
15/03/22 18:14:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/03/22 18:14:04 INFO HttpServer: Starting HTTP Server
15/03/22 18:14:05 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 18:14:05 INFO AbstractConnector: Started SocketConnector@0.0.0.0:46026
15/03/22 18:14:05 INFO Utils: Successfully started service 'HTTP class server' on port 46026.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.0
      /_/


Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.
15/03/22 18:14:19 INFO SparkContext: Running Spark version 1.3.0
15/03/22 18:14:19 WARN SparkConf: 
SPARK_CLASSPATH was detected (set to '/usr/local/spark/lib/mysql-connector-java-5.1.17-bin.jar:/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar').
This is deprecated in Spark 1.0+.


Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
        
15/03/22 18:14:19 WARN SparkConf: Setting 'spark.executor.extraClassPath' to '/usr/local/spark/lib/mysql-connector-java-5.1.17-bin.jar:/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/03/22 18:14:19 WARN SparkConf: Setting 'spark.driver.extraClassPath' to '/usr/local/spark/lib/mysql-connector-java-5.1.17-bin.jar:/usr/local/hadoop/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/03/22 18:14:19 INFO SecurityManager: Changing view acls to: hadoop
15/03/22 18:14:19 INFO SecurityManager: Changing modify acls to: hadoop
15/03/22 18:14:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/03/22 18:14:20 INFO Slf4jLogger: Slf4jLogger started
15/03/22 18:14:21 INFO Remoting: Starting remoting
15/03/22 18:14:21 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@funshion-hadoop193:57998]
15/03/22 18:14:21 INFO Utils: Successfully started service 'sparkDriver' on port 57998.
15/03/22 18:14:21 INFO SparkEnv: Registering MapOutputTracker
15/03/22 18:14:21 INFO SparkEnv: Registering BlockManagerMaster
15/03/22 18:14:21 INFO DiskBlockManager: Created local directory at /tmp/spark-e447ed57-292d-4f55-ab79-4e848c1c0622/blockmgr-be884f27-0aff-4a1f-80b8-e55deb2bcbf7
15/03/22 18:14:21 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/03/22 18:14:22 INFO HttpFileServer: HTTP File server directory is /tmp/spark-daaa5cde-f622-4da8-b17f-9990c96eb4d8/httpd-8825b3d7-5e1c-4786-9a54-ea6bbcf21f7e
15/03/22 18:14:22 INFO HttpServer: Starting HTTP Server
15/03/22 18:14:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 18:14:22 INFO AbstractConnector: Started SocketConnector@0.0.0.0:55062
15/03/22 18:14:22 INFO Utils: Successfully started service 'HTTP file server' on port 55062.
15/03/22 18:14:22 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/22 18:14:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 18:14:22 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/03/22 18:14:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/22 18:14:22 INFO SparkUI: Started SparkUI at http://funshion-hadoop193:4040
15/03/22 18:14:23 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@funshion-hadoop193:7077/user/Master...
15/03/22 18:14:24 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150322181424-0016
15/03/22 18:14:24 INFO AppClient$ClientActor: Executor added: app-20150322181424-0016/0 on worker-20150321171905-funshion-hadoop195-43185 (funshion-hadoop195:43185) with 2 cores
15/03/22 18:14:24 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150322181424-0016/0 on hostPort funshion-hadoop195:43185 with 2 cores, 512.0 MB RAM
15/03/22 18:14:24 INFO AppClient$ClientActor: Executor added: app-20150322181424-0016/1 on worker-20150321171905-funshion-hadoop194-34245 (funshion-hadoop194:34245) with 2 cores
15/03/22 18:14:24 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150322181424-0016/1 on hostPort funshion-hadoop194:34245 with 2 cores, 512.0 MB RAM
15/03/22 18:14:24 INFO AppClient$ClientActor: Executor added: app-20150322181424-0016/2 on worker-20150321171905-funshion-hadoop196-48202 (funshion-hadoop196:48202) with 2 cores
15/03/22 18:14:24 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150322181424-0016/2 on hostPort funshion-hadoop196:48202 with 2 cores, 512.0 MB RAM
15/03/22 18:14:25 INFO AppClient$ClientActor: Executor updated: app-20150322181424-0016/0 is now LOADING
15/03/22 18:14:25 INFO AppClient$ClientActor: Executor updated: app-20150322181424-0016/2 is now LOADING
15/03/22 18:14:25 INFO AppClient$ClientActor: Executor updated: app-20150322181424-0016/1 is now LOADING
15/03/22 18:14:25 INFO AppClient$ClientActor: Executor updated: app-20150322181424-0016/0 is now RUNNING
15/03/22 18:14:25 INFO AppClient$ClientActor: Executor updated: app-20150322181424-0016/1 is now RUNNING
15/03/22 18:14:25 INFO AppClient$ClientActor: Executor updated: app-20150322181424-0016/2 is now RUNNING
15/03/22 18:14:25 INFO NettyBlockTransferService: Server created on 37710
15/03/22 18:14:25 INFO BlockManagerMaster: Trying to register BlockManager
15/03/22 18:14:25 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop193:37710 with 265.4 MB RAM, BlockManagerId(<driver>, funshion-hadoop193, 37710)
15/03/22 18:14:25 INFO BlockManagerMaster: Registered BlockManager
15/03/22 18:14:28 INFO EventLoggingListener: Logging events to hdfs://funshion-hadoop193:8020/spark_log/app-20150322181424-0016
15/03/22 18:14:28 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/03/22 18:14:29 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/03/22 18:14:31 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/03/22 18:14:32 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop194:50050/user/Executor#1949469311] with ID 1
15/03/22 18:14:32 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop195:41120/user/Executor#1115933355] with ID 0
15/03/22 18:14:32 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop194:42298 with 265.4 MB RAM, BlockManagerId(1, funshion-hadoop194, 42298)
15/03/22 18:14:32 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@funshion-hadoop196:57795/user/Executor#-985756403] with ID 2
15/03/22 18:14:32 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop195:40586 with 265.4 MB RAM, BlockManagerId(0, funshion-hadoop195, 40586)
15/03/22 18:14:33 INFO BlockManagerMasterActor: Registering block manager funshion-hadoop196:35167 with 265.4 MB RAM, BlockManagerId(2, funshion-hadoop196, 35167)


scala> val jdbcDF = sqlContext.load("jdbc", Map(
     |   "url" -> "jdbc:mysql://192.168.117.193:3306/hive?user=hive&password=bee56915",
     |   "dbtable" -> "hive.TBLS",
     |   "driver" -> "com.mysql.jdbc.Driver"))
15/03/22 18:25:13 INFO metastore: Trying to connect to metastore with URI thrift://funshion-hadoop192:10000
15/03/22 18:25:14 INFO metastore: Connected to metastore.
15/03/22 18:25:14 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/03/22 18:25:14 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
jdbcDF: org.apache.spark.sql.DataFrame = [TBL_ID: bigint, CREATE_TIME: int, DB_ID: bigint, LAST_ACCESS_TIME: int, OWNER: string, RETENTION: int, SD_ID: bigint, TBL_NAME: string, TBL_TYPE: string, VIEW_EXPANDED_TEXT: string, VIEW_ORIGINAL_TEXT: string, LINK_TARGET_ID: bigint]


scala> jdbcDF.show()
15/03/22 18:25:25 INFO SparkContext: Starting job: runJob at SparkPlan.scala:121
15/03/22 18:25:25 INFO DAGScheduler: Got job 0 (runJob at SparkPlan.scala:121) with 1 output partitions (allowLocal=false)
15/03/22 18:25:25 INFO DAGScheduler: Final stage: Stage 0(runJob at SparkPlan.scala:121)
15/03/22 18:25:25 INFO DAGScheduler: Parents of final stage: List()
15/03/22 18:25:25 INFO DAGScheduler: Missing parents: List()
15/03/22 18:25:25 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[1] at map at SparkPlan.scala:96), which has no missing parents
15/03/22 18:25:26 INFO MemoryStore: ensureFreeSpace(4632) called with curMem=0, maxMem=278302556
15/03/22 18:25:26 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.5 KB, free 265.4 MB)
15/03/22 18:25:26 INFO MemoryStore: ensureFreeSpace(2909) called with curMem=4632, maxMem=278302556
15/03/22 18:25:26 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.8 KB, free 265.4 MB)
15/03/22 18:25:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop193:37710 (size: 2.8 KB, free: 265.4 MB)
15/03/22 18:25:26 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/22 18:25:26 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:839
15/03/22 18:25:26 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MapPartitionsRDD[1] at map at SparkPlan.scala:96)
15/03/22 18:25:26 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/03/22 18:25:26 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, funshion-hadoop194, PROCESS_LOCAL, 1062 bytes)
15/03/22 18:25:27 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on funshion-hadoop194:42298 (size: 2.8 KB, free: 265.4 MB)
15/03/22 18:25:30 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3863 ms on funshion-hadoop194 (1/1)
15/03/22 18:25:30 INFO DAGScheduler: Stage 0 (runJob at SparkPlan.scala:121) finished in 3.889 s
15/03/22 18:25:30 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/03/22 18:25:30 INFO DAGScheduler: Job 0 finished: runJob at SparkPlan.scala:121, took 5.087737 s
TBL_ID CREATE_TIME DB_ID LAST_ACCESS_TIME OWNER  RETENTION SD_ID TBL_NAME TBL_TYPE       VIEW_EXPANDED_TEXT VIEW_ORIGINAL_TEXT LINK_TARGET_ID
1      1426485587  2     0                hadoop 0         1     pv2      EXTERNAL_TABLE null               null               null    
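

-- Optional follow-up (a sketch, reusing the jdbcDF defined above): a JDBC-backed DataFrame
-- can be registered as a temporary table and queried with Spark SQL like any other DataFrame:
jdbcDF.registerTempTable("hive_tbls")
sqlContext.sql("SELECT TBL_NAME, TBL_TYPE FROM hive_tbls WHERE TBL_TYPE = 'EXTERNAL_TABLE'").show()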


-- Verify that the query above matches a direct query against the MySQL database:
[hadoop@funshion-hadoop193 conf]$ mysql -uhive -pbee56915 -dhive
Warning: Using a password on the command line interface can be insecure.
mysql: unknown option '-d'
[hadoop@funshion-hadoop193 conf]$ mysql -uhive -pbee56915 --database=hive
Warning: Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A


Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 263
Server version: 5.6.17 MySQL Community Server (GPL)


Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.


Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


mysql> select * from TBLS;
+--------+-------------+-------+------------------+--------+-----------+-------+----------+----------------+--------------------+--------------------+----------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER  | RETENTION | SD_ID | TBL_NAME | TBL_TYPE       | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | LINK_TARGET_ID |
+--------+-------------+-------+------------------+--------+-----------+-------+----------+----------------+--------------------+--------------------+----------------+
|      1 |  1426485587 |     2 |                0 | hadoop |         0 |     1 | pv2      | EXTERNAL_TABLE | NULL               | NULL               |           NULL |
+--------+-------------+-------+------------------+--------+-----------+-------+----------+----------------+--------------------+--------------------+----------------+
1 row in set (0.01 sec)
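

-- Note: the SPARK_CLASSPATH warning in the shell startup log above points at the non-deprecated
-- way to add the MySQL driver. A sketch (paths assumed to match this cluster):
./bin/spark-shell --master spark://funshion-hadoop193:7077 \
  --driver-class-path /usr/local/spark/lib/mysql-connector-java-5.1.17-bin.jar \
  --jars /usr/local/spark/lib/mysql-connector-java-5.1.17-bin.jar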


---------------------------------------------------------------------------------------------------


-- Accessing Oracle or SQL Server should work much the same way; it is left for readers to test. A rough sketch for Oracle follows below.
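-- A rough sketch for Oracle (assumptions: the Oracle JDBC driver jar, e.g. ojdbc6.jar, has been
-- added to the classpath the same way as the MySQL driver above, and the host, port, SID and
-- table below are hypothetical):
val oracleDF = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:oracle:thin:@192.168.117.200:1521:orcl",
  "dbtable" -> "SCOTT.EMP",
  "driver" -> "oracle.jdbc.OracleDriver"))
oracleDF.show()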


-- Thanks!








