Spark: running the MLlib examples on a Spark cluster deployed with Cloudera Manager

1. Verify that Spark runs correctly on the CDH cluster

[root@cdh01 ~]#  spark-submit --master local --class  org.apache.spark.examples.SparkPi /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/lib/spark-examples.jar 10
18/10/29 14:39:08 INFO spark.SparkContext: Running Spark version 1.6.0
18/10/29 14:39:09 INFO spark.SecurityManager: Changing view acls to: root
18/10/29 14:39:09 INFO spark.SecurityManager: Changing modify acls to: root
18/10/29 14:39:09 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/10/29 14:39:09 INFO util.Utils: Successfully started service 'sparkDriver' on port 55692.
18/10/29 14:39:09 INFO slf4j.Slf4jLogger: Slf4jLogger started
18/10/29 14:39:09 INFO Remoting: Starting remoting
18/10/29 14:39:10 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.50.202:43516]
18/10/29 14:39:10 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@192.168.50.202:43516]
18/10/29 14:39:10 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 43516.
18/10/29 14:39:10 INFO spark.SparkEnv: Registering MapOutputTracker
18/10/29 14:39:10 INFO spark.SparkEnv: Registering BlockManagerMaster
18/10/29 14:39:10 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-2bf97eb7-1a7e-4df7-b221-4e603dc3a55f
18/10/29 14:39:10 INFO storage.MemoryStore: MemoryStore started with capacity 530.0 MB
18/10/29 14:39:10 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/10/29 14:39:10 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/10/29 14:39:10 INFO ui.SparkUI: Started SparkUI at http://192.168.50.202:4040
18/10/29 14:39:10 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/lib/spark-examples.jar at spark://192.168.50.202:55692/jars/spark-examples.jar with timestamp 1540795150401
18/10/29 14:39:10 INFO executor.Executor: Starting executor ID driver on host localhost
18/10/29 14:39:10 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53969.
18/10/29 14:39:10 INFO netty.NettyBlockTransferService: Server created on 53969
18/10/29 14:39:10 INFO storage.BlockManager: external shuffle service port = 7337
18/10/29 14:39:10 INFO storage.BlockManagerMaster: Trying to register BlockManager
18/10/29 14:39:10 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:53969 with 530.0 MB RAM, BlockManagerId(driver, localhost, 53969)
18/10/29 14:39:10 INFO storage.BlockManagerMaster: Registered BlockManager
18/10/29 14:39:11 INFO scheduler.EventLoggingListener: Logging events to hdfs://cdh01:8020/user/spark/applicationHistory/local-1540795150435
18/10/29 14:39:11 INFO spark.SparkContext: Registered listener com.cloudera.spark.lineage.ClouderaNavigatorListener
18/10/29 14:39:11 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:36
18/10/29 14:39:11 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 10 output partitions
18/10/29 14:39:11 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:36)
18/10/29 14:39:11 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/10/29 14:39:11 INFO scheduler.DAGScheduler: Missing parents: List()
18/10/29 14:39:11 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
18/10/29 14:39:12 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1904.0 B, free 530.0 MB)
18/10/29 14:39:12 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 530.0 MB)
18/10/29 14:39:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:53969 (size: 1202.0 B, free: 530.0 MB)
18/10/29 14:39:12 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1004
18/10/29 14:39:12 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
18/10/29 14:39:12 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 2036 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
18/10/29 14:39:12 INFO executor.Executor: Fetching spark://192.168.50.202:55692/jars/spark-examples.jar with timestamp 1540795150401
18/10/29 14:39:12 INFO spark.ExecutorAllocationManager: New executor driver has registered (new total is 1)
18/10/29 14:39:12 INFO util.Utils: Fetching spark://192.168.50.202:55692/jars/spark-examples.jar to /tmp/spark-e7873ccb-d141-4347-abcd-1b263d364be3/userFiles-89bc4061-62e5-41b0-b1c2-cecbc4d3af73/fetchFileTemp4804387182541284155.tmp
18/10/29 14:39:12 INFO executor.Executor: Adding file:/tmp/spark-e7873ccb-d141-4347-abcd-1b263d364be3/userFiles-89bc4061-62e5-41b0-b1c2-cecbc4d3af73/spark-examples.jar to class loader
18/10/29 14:39:12 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 342 ms on localhost (executor driver) (1/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 51 ms on localhost (executor driver) (2/10)
18/10/29 14:39:12 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2)
18/10/29 14:39:12 INFO executor.Executor: Finished task 2.0 in stage 0.0 (TID 2). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 39 ms on localhost (executor driver) (3/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 3.0 in stage 0.0 (TID 3). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 4.0 in stage 0.0 (TID 4)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 42 ms on localhost (executor driver) (4/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 4.0 in stage 0.0 (TID 4). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 5.0 in stage 0.0 (TID 5)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 37 ms on localhost (executor driver) (5/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 5.0 in stage 0.0 (TID 5). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 71 ms on localhost (executor driver) (6/10)
18/10/29 14:39:12 INFO executor.Executor: Running task 6.0 in stage 0.0 (TID 6)
18/10/29 14:39:12 INFO executor.Executor: Finished task 6.0 in stage 0.0 (TID 6). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 7.0 in stage 0.0 (TID 7)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 32 ms on localhost (executor driver) (7/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 7.0 in stage 0.0 (TID 7). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 8.0 in stage 0.0 (TID 8)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 28 ms on localhost (executor driver) (8/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 8.0 in stage 0.0 (TID 8). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 9.0 in stage 0.0 (TID 9)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 27 ms on localhost (executor driver) (9/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 9.0 in stage 0.0 (TID 9). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 24 ms on localhost (executor driver) (10/10)
18/10/29 14:39:12 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:36) finished in 0.628 s
18/10/29 14:39:12 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 1.046294 s
18/10/29 14:39:12 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
Pi is roughly 3.141903141903142
18/10/29 14:39:13 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.50.202:4040
18/10/29 14:39:13 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/10/29 14:39:13 INFO storage.MemoryStore: MemoryStore cleared
18/10/29 14:39:13 INFO storage.BlockManager: BlockManager stopped
18/10/29 14:39:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/10/29 14:39:13 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/10/29 14:39:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
18/10/29 14:39:13 INFO spark.SparkContext: Successfully stopped SparkContext
18/10/29 14:39:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
18/10/29 14:39:13 INFO util.ShutdownHookManager: Shutdown hook called
18/10/29 14:39:13 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e7873ccb-d141-4347-abcd-1b263d364be3
18/10/29 14:39:13 INFO Remoting: Remoting shut down
18/10/29 14:39:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

The job completed successfully, so Spark is working.
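For reference, what the SparkPi example computes can be sketched in plain Python: it estimates π by Monte Carlo sampling (the assumption here, based on the upstream SparkPi source, is that it samples random points in the unit square and counts how many fall inside the quarter circle; the `10` argument above only splits the sampling across partitions). The sample count below is an arbitrary choice for illustration.

```python
import random

# Monte Carlo estimate of pi, mirroring what SparkPi does per partition
# (assumption: points are sampled uniformly and tested x^2 + y^2 <= 1).
random.seed(42)          # fixed seed so the sketch is reproducible
n = 100000               # arbitrary sample count for this sketch
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
pi_est = 4.0 * inside / n
print(pi_est)            # roughly 3.14
```

With more samples (or more partitions in the Spark version) the estimate tightens around π, which is why the log above prints "Pi is roughly 3.14...".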

2. Download the Spark MLlib test data

wget --no-check-certificate  https://raw.githubusercontent.com/apache/spark/branch-1.5/data/mllib/sample_movielens_data.txt

On a minimal install this fails with an error:

[root@cdh01 ~]# wget --no-check-certificate \
> https://raw.githubusercontent.com/apache/spark/branch-1.5/data/mllib/sample_movielens_data.txt
-bash: wget: command not found

Workarounds:

1. `yum -y install wget`
Because the yum repositories on my CDH cluster had been customized, yum could not download the package, so this step failed.
2. Download the wget RPM and install it manually.
Download page: http://rpmfind.net/linux/rpm2html/search.php?query=wget(x86-64)
Find the build that matches your OS, upload it to the host, and install it with rpm:
[root@cdh01 ~]# rpm -ivh /opt/lixiang/wget-1.14-15.el7_4.1.x86_64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:wget-1.14-15.el7_4.1             ################################# [100%]

Run the wget command again; this time the download completes. Then upload the data to HDFS:

[root@cdh01 ~]#  hdfs dfs -copyFromLocal sample_movielens_data.txt /user/hdfs
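The sample file stores one rating per line in the `::`-delimited MovieLens convention, which the MovieLensALS example parses as `user::movie::rating`. A quick sanity check of that format in Python, using a hypothetical line (the real file's exact values may differ):

```python
# Parse one ::-delimited rating line (the line below is a made-up
# example in the MovieLens sample-data format, not copied from the file).
line = "0::2::3"
user_id, movie_id, rating = line.split("::")
record = (int(user_id), int(movie_id), float(rating))
print(record)  # (0, 2, 3.0)
```

If a line in your copy of the file does not split into exactly three fields, the download was likely truncated or you fetched an HTML error page instead of the raw file.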

3. Run the Spark MLlib MovieLens example application, which computes recommendations from movie ratings:

[root@cdh01 ~]#   spark-submit --master local --class org.apache.spark.examples.mllib.MovieLensALS /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/lib/spark-examples.jar  --rank 5 --numIterations 5 --lambda 1.0 --kryo /user/hdfs/sample_movielens_data.txt
18/10/29 14:16:54 INFO spark.SparkContext: Running Spark version 1.6.0
18/10/29 14:16:54 INFO spark.SecurityManager: Changing view acls to: root
18/10/29 14:16:54 INFO spark.SecurityManager: Changing modify acls to: root
18/10/29 14:16:54 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/10/29 14:16:54 INFO util.Utils: Successfully started service 'sparkDriver' on port 48962.
18/10/29 14:16:55 INFO slf4j.Slf4jLogger: Slf4jLogger started
18/10/29 14:16:55 INFO Remoting: Starting remoting
18/10/29 14:16:55 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.50.202:60843]
18/10/29 14:16:55 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@192.168.50.202:60843]
18/10/29 14:16:55 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 60843.
18/10/29 14:16:55 INFO spark.SparkEnv: Registering MapOutputTracker
18/10/29 14:16:55 INFO spark.SparkEnv: Registering BlockManagerMaster
18/10/29 14:16:55 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-b631979d-d1d3-4b0e-a52c-79f23ae27859
18/10/29 14:16:55 INFO storage.MemoryStore: MemoryStore started with capacity 530.0 MB
18/10/29 14:16:55 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/10/29 14:16:55 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/10/29 14:16:55 INFO ui.SparkUI: Started SparkUI at http://192.168.50.202:4040
18/10/29 14:16:55 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/lib/spark-examples.jar at spark://192.168.50.202:48962/jars/spark-examples.jar with timestamp 1540793815778
18/10/29 14:16:55 INFO executor.Executor: Starting executor ID driver on host localhost
18/10/29 14:16:55 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42182.
18/10/29 14:16:55 INFO netty.NettyBlockTransferService: Server created on 42182
18/10/29 14:16:55 INFO storage.BlockManager: external shuffle service port = 7337
18/10/29 14:16:55 INFO storage.BlockManagerMaster: Trying to register BlockManager
18/10/29 14:16:55 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:42182 with 530.0 MB RAM, BlockManagerId(driver, localhost, 42182)
18/10/29 14:16:55 INFO storage.BlockManagerMaster: Registered BlockManager
18/10/29 14:16:57 INFO scheduler.EventLoggingListener: Logging events to hdfs://cdh01:8020/user/spark/applicationHistory/local-1540793815861
18/10/29 14:16:57 INFO spark.SparkContext: Registered listener com.cloudera.spark.lineage.ClouderaNavigatorListener
Got 1501 ratings from 30 users on 100 movies.
Training: 1184, test: 317.
18/10/29 14:17:00 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
18/10/29 14:17:00 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
18/10/29 14:17:00 WARN netlib.LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
18/10/29 14:17:00 WARN netlib.LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
Test RMSE = 1.424178449372927.
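The reported `Test RMSE` is the root-mean-square error between the model's predicted ratings and the actual ratings on the held-out test set (317 ratings above); lower is better. A minimal illustration of the metric with made-up numbers:

```python
import math

# Hypothetical actual vs. predicted ratings on a held-out test set.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]

# RMSE: square the per-rating errors, average them, take the square root.
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
rmse = math.sqrt(mse)
print(rmse)  # → 0.75
```

On this tiny run the RMSE of ~1.42 is unremarkable; tuning `--rank`, `--numIterations`, and `--lambda` would normally be done against a validation split.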

 
