Big Data Operations and Maintenance Question Bank, 2018

501. Set the hostname of the master node to master and the hostname of the slaver1 node to slaver1. Query the hostname on both nodes and submit the results as text in the answer box.
[root@master ~]# hostname
master
[root@slave ~]# hostname
slave
502. Edit the hosts file on both nodes to map IP addresses to hostnames. Submit the contents of the hosts file as text in the answer box.
[root@master ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.12 master
10.0.0.13 slave

503. Configure both nodes to use the Ambari yum repository and the CentOS 7 yum repository from the IaaS platform. The Ambari repository is provided in the XianDian-BigData-v2.1-BASE.iso package.
[root@master ~]# cat /etc/yum.repos.d/ambari.repo
[centos]
name=centos
baseurl=ftp://192.168.100.10/centos
gpgcheck=0
enabled=1
[ambari]
name=ambari
baseurl=ftp://192.168.100.10/2.2/ambari
gpgcheck=0
enabled=1

504. Install the ntp time service on the master node and configure it through /etc/ntp.conf; install the ntpdate package on the slaver node and synchronize the slaver1 node's clock to the master node.
[root@slave ~]# ntpdate master
21 Jan 02:21:08 ntpdate[10527]: adjust time server 10.0.0.12 offset -0.000312 sec
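The output above only shows the final synchronization. For reference, a minimal sketch of the setup that precedes it, assuming the yum repositories configured in question 503 are in place:
[root@master ~]# yum install -y ntp
[root@master ~]# vi /etc/ntp.conf
[root@master ~]# systemctl start ntpd && systemctl enable ntpd
[root@slave ~]# yum install -y ntpdate
[root@slave ~]# ntpdate master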

505. Check whether the two nodes can reach each other over SSH without a password; if this is not yet configured, set up passwordless SSH public-key authentication.
[root@slave ~]# ssh master
Last login: Mon Jan 21 02:33:16 2019 from 10.0.0.13

#########################
# Welcome to XianDian #
#########################

[root@master ~]# ssh slave
Last login: Mon Jan 21 02:33:25 2019 from 10.0.0.12

#########################
# Welcome to XianDian #
#########################
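If passwordless access is not yet configured, a typical sketch is to generate a key pair on each node and copy the public key to the other node:
[root@master ~]# ssh-keygen -t rsa
[root@master ~]# ssh-copy-id root@slave
[root@slave ~]# ssh-keygen -t rsa
[root@slave ~]# ssh-copy-id root@master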
506. Install the JDK environment on both nodes; the jdk-8u77-linux-x64.tar.gz archive is provided in the XianDian-BigData-v2.1-BASE.iso package.
[root@master ~]# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
[root@slave ~]# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
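A minimal sketch of the JDK installation that would produce the version output above; the extraction target /usr/local and the source path under /opt are assumptions:
[root@master ~]# tar -zxvf /opt/jdk-8u77-linux-x64.tar.gz -C /usr/local/
[root@master ~]# echo 'export JAVA_HOME=/usr/local/jdk1.8.0_77' >> /etc/profile
[root@master ~]# echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
[root@master ~]# source /etc/profile
(repeat the same steps on the slave node)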

507. Install and configure the HTTP service on the master node, copy HDP-2.4-BASE and HDP-UTILS-1.1.0.20 from the XianDian-BigData-v2.1-BASE.iso package into the /var/www/html directory, and start the HTTP service.
[root@master ~]# ls /var/www/html/
HDP-2.6.1.0 HDP-UTILS-1.1.0.21
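A sketch of the steps that lead to the listing above; the mount point /mnt is an assumption, and the directory names follow the listing shown:
[root@master ~]# yum install -y httpd
[root@master ~]# mount -o loop XianDian-BigData-v2.1-BASE.iso /mnt
[root@master ~]# cp -r /mnt/HDP-2.6.1.0 /mnt/HDP-UTILS-1.1.0.21 /var/www/html/
[root@master ~]# systemctl start httpd && systemctl enable httpd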

508. Query the yum repository configuration files and the JDK version on both nodes, the command and result of synchronizing the slaver1 node to the master node, and the running status of the HTTP service; submit the output as text in the answer box.
[root@master ~]# cat /etc/yum.repos.d/ambari.repo
[centos]
name=centos
baseurl=ftp://192.168.100.10/centos
gpgcheck=0
enabled=1
[ambari]
name=ambari
baseurl=ftp://192.168.100.10/2.2/ambari
gpgcheck=0
enabled=1
[root@slave ~]# cat /etc/yum.repos.d/ambari.repo
[centos]
name=centos
baseurl=ftp://192.168.100.10/centos
gpgcheck=0
enabled=1
[ambari]
name=ambari
baseurl=ftp://192.168.100.10/2.2/ambari
gpgcheck=0
enabled=1
[root@master ~]# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
[root@slave ~]# ntpdate master
21 Jan 02:50:01 ntpdate[12868]: adjust time server 10.0.0.12 offset -0.028440 sec
[root@master ~]# systemctl status httpd
httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
Active: active (running) since Sun 2019-01-20 06:55:47 UTC; 19h ago
Docs: man:httpd(8)
man:apachectl(8)
Main PID: 11732 (httpd)
Status: "Total requests: 168; Current requests/sec: 0; Current traffic: 0 B/sec"
CGroup: /system.slice/httpd.service
├─11732 /usr/sbin/httpd -DFOREGROUND
├─11734 /usr/sbin/httpd -DFOREGROUND
├─11735 /usr/sbin/httpd -DFOREGROUND
├─11736 /usr/sbin/httpd -DFOREGROUND
├─11737 /usr/sbin/httpd -DFOREGROUND
├─11738 /usr/sbin/httpd -DFOREGROUND
├─14012 /usr/sbin/httpd -DFOREGROUND
├─14013 /usr/sbin/httpd -DFOREGROUND
├─14014 /usr/sbin/httpd -DFOREGROUND
└─25143 /usr/sbin/httpd -DFOREGROUND

Jan 20 06:55:47 master httpd[11732]: AH00558: httpd: Could not reliably d…ge
Jan 20 06:55:47 master systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.

509. On the master node, install the ambari-server service and the MariaDB database service. Create the ambari database and the ambari user with password bigdata, grant the ambari user access to the ambari database, and import /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql into the ambari database. After the configuration, install the mysql-connector-java package. List all tables in the ambari database on the master node and submit the query result as text in the answer box.
MariaDB [ambari]> show tables;
+-------------------------------+
| Tables_in_ambari |
+-------------------------------+
| ClusterHostMapping |
| QRTZ_BLOB_TRIGGERS |
| QRTZ_CALENDARS |
| QRTZ_CRON_TRIGGERS |
| QRTZ_FIRED_TRIGGERS |
| QRTZ_JOB_DETAILS |
| QRTZ_LOCKS |
| QRTZ_PAUSED_TRIGGER_GRPS |
| QRTZ_SCHEDULER_STATE |
| QRTZ_SIMPLE_TRIGGERS |
| QRTZ_SIMPROP_TRIGGERS |
| QRTZ_TRIGGERS |
| adminpermission |
| adminprincipal |
| adminprincipaltype |
| adminprivilege |
| adminresource |
| adminresourcetype |
| alert_current |
| alert_definition |
| alert_group |
| alert_group_target |
| alert_grouping |
| alert_history |
| alert_notice |
| alert_target |
| alert_target_states |
| ambari_operation_history |
| ambari_sequences |
| artifact |
| blueprint |
| blueprint_configuration |
| blueprint_setting |
| clusterconfig |
| clusters |
| clusterservices |
| clusterstate |
| confgroupclusterconfigmapping |
| configgroup |
| configgrouphostmapping |
| execution_command |
| extension |
| extensionlink |
| groups |
| host_role_command |
| host_version |
| hostcomponentdesiredstate |
| hostcomponentstate |
| hostconfigmapping |
| hostgroup |
| hostgroup_component |
| hostgroup_configuration |
| hosts |
| hoststate |
| kerberos_descriptor |
| kerberos_principal |
| kerberos_principal_host |
| key_value_store |
| members |
| metainfo |
| permission_roleauthorization |
| remoteambaricluster |
| remoteambariclusterservice |
| repo_version |
| request |
| requestoperationlevel |
| requestresourcefilter |
| requestschedule |
| requestschedulebatchrequest |
| role_success_criteria |
| roleauthorization |
| servicecomponent_version |
| servicecomponentdesiredstate |
| serviceconfig |
| serviceconfighosts |
| serviceconfigmapping |
| servicedesiredstate |
| setting |
| stack |
| stage |
| topology_host_info |
| topology_host_request |
| topology_host_task |
| topology_hostgroup |
| topology_logical_request |
| topology_logical_task |
| topology_request |
| upgrade |
| upgrade_group |
| upgrade_history |
| upgrade_item |
| users |
| viewentity |
| viewinstance |
| viewinstancedata |
| viewinstanceproperty |
| viewmain |
| viewparameter |
| viewresource |
| viewurl |
| widget |
| widget_layout |
| widget_layout_user_widget |
+-------------------------------+
103 rows in set (0.00 sec)
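For reference, a minimal sketch of the database preparation the question describes (the listing above was produced after these steps); the password and the SQL file path are the ones given in the question:
[root@master ~]# yum install -y ambari-server mariadb mariadb-server mysql-connector-java
[root@master ~]# systemctl start mariadb && systemctl enable mariadb
[root@master ~]# mysql -uroot
MariaDB [(none)]> create database ambari;
MariaDB [(none)]> grant all privileges on ambari.* to 'ambari'@'localhost' identified by 'bigdata';
MariaDB [(none)]> grant all privileges on ambari.* to 'ambari'@'%' identified by 'bigdata';
MariaDB [(none)]> use ambari;
MariaDB [ambari]> source /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql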

510. On the master node, install the ambari-server service and the MariaDB database service. Create the ambari database and the ambari user with password bigdata, grant the ambari user access to the ambari database, and import /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql into the ambari database. When finished, enter MariaDB and query the contents of the user table in the mysql database; submit the query result as text in the answer box.
MariaDB [mysql]> select * from user;
+-----------+--------+-------------------------------------------+-------------+-------------+-------------+-------------+-------------+-----------+-------------+---------------+--------------+-----------+------------+-----------------+------------+------------+--------------+------------+-----------------------+------------------+--------------+-----------------+------------------+------------------+----------------+---------------------+--------------------+------------------+------------+--------------+------------------------+----------+------------+-------------+--------------+---------------+-------------+-----------------+----------------------+--------+-----------------------+
| Host | User | Password | Select_priv | Insert_priv | Update_priv | Delete_priv | Create_priv | Drop_priv | Reload_priv | Shutdown_priv | Process_priv | File_priv | Grant_priv | References_priv | Index_priv | Alter_priv | Show_db_priv | Super_priv | Create_tmp_table_priv | Lock_tables_priv | Execute_priv | Repl_slave_priv | Repl_client_priv | Create_view_priv | Show_view_priv | Create_routine_priv | Alter_routine_priv | Create_user_priv | Event_priv | Trigger_priv | Create_tablespace_priv | ssl_type | ssl_cipher | x509_issuer | x509_subject | max_questions | max_updates | max_connections | max_user_connections | plugin | authentication_string |
+-----------+--------+-------------------------------------------+-------------+-------------+-------------+-------------+-------------+-----------+-------------+---------------+--------------+-----------+------------+-----------------+------------+------------+--------------+------------+-----------------------+------------------+--------------+-----------------+------------------+------------------+----------------+---------------------+--------------------+------------------+------------+--------------+------------------------+----------+------------+-------------+--------------+---------------+-------------+-----------------+----------------------+--------+-----------------------+
| localhost | root | *C33A05FE652CA69965121A309F0DE7FA785D3916 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | | | | | 0 | 0 | 0 | 0 | | |
| master | root | *C33A05FE652CA69965121A309F0DE7FA785D3916 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | | | | | 0 | 0 | 0 | 0 | | |
| 127.0.0.1 | root | *C33A05FE652CA69965121A309F0DE7FA785D3916 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | | | | | 0 | 0 | 0 | 0 | | |
| ::1 | root | *C33A05FE652CA69965121A309F0DE7FA785D3916 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | | | | | 0 | 0 | 0 | 0 | | |
| localhost | ambari | *C33A05FE652CA69965121A309F0DE7FA785D3916 | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | | | | | 0 | 0 | 0 | 0 | | |
| % | ambari | *C33A05FE652CA69965121A309F0DE7FA785D3916 | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | | | | | 0 | 0 | 0 | 0 | | |
| localhost | hive | *C33A05FE652CA69965121A309F0DE7FA785D3916 | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | | | | | 0 | 0 | 0 | 0 | | |
| % | hive | *C33A05FE652CA69965121A309F0DE7FA785D3916 | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | | | | | 0 | 0 | 0 | 0 | | |
+-----------+--------+-------------------------------------------+-------------+-------------+-------------+-------------+-------------+-----------+-------------+---------------+--------------+-----------+------------+-----------------+------------+------------+--------------+------------+-----------------------+------------------+--------------+-----------------+------------------+------------------+----------------+---------------------+--------------------+------------------+------------+--------------+------------------------+----------+------------+-------------+--------------+---------------+-------------+-----------------+----------------------+--------+-----------------------+
8 rows in set (0.00 sec)

511. Run ambari-server setup on the master node, specifying the JDK installation path and the database host, port, user, password and related parameters, then start the ambari-server service. After configuration, use curl in a Linux shell to fetch the page at http://master:8080 and submit the result as text in the answer box.
[root@master ~]# curl http://192.168.100.32:8080

<!--
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements.  See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership.  The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.  You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
-->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <link rel="stylesheet" href="stylesheets/vendor.css">
  <link rel="stylesheet" href="stylesheets/app.css">
  <script src="javascripts/vendor.js"></script>
  <script src="javascripts/app.js"></script>
  <script>
      $(document).ready(function() {
          require('initialize');
          // make favicon work in firefox
          $('link[type*=icon]').detach().appendTo('head');
          $('#loading').remove();
      });
  </script>
  <title>先电大数据平台</title>
  <link rel="shortcut icon" href="/img/logo.png" type="image/x-icon">
</head>
<body>
    <div id="loading">...加载中...</div>
    <div id="wrapper">
    <!-- ApplicationView -->
    </div>
    <footer>
        <div class="container">
            <a href="http://www.1daoyun.com/" target="_blank">© 南京第五十五所技术开发有限公司 版权所有 版本号:V2.2</a>.<br>
            <a href="/licenses/NOTICE.txt" target="_blank">查看使用的第三方工具/资源,以及各自归属</a>
         </div>
    </footer>
</body>
</html>

512. Run ambari-server setup on the master node, specifying the JDK installation path and the database host, port, user, password and related parameters, then start the ambari-server service. After configuration, query the running status of ambari-server and submit the result as text in the answer box.
[root@master ~]# ambari-server status
Using python /usr/bin/python
Ambari-server status
Ambari Server running
Found Ambari Server PID: 13714 at: /var/run/ambari-server/ambari-server.pid
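For reference, a minimal sketch of the setup and start steps that lead to the status shown above. ambari-server setup is interactive; the JDK path and database values below follow the rest of this answer key and are otherwise assumptions:
[root@master ~]# ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
[root@master ~]# ambari-server setup
(at the prompts, choose Custom JDK and enter the JDK path, e.g. /usr/local/jdk1.8.0_77; choose an existing MySQL/MariaDB database and enter host master, port 3306, database ambari, user ambari, password bigdata)
[root@master ~]# ambari-server start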

513. Install the ambari-agent service on both nodes, edit /etc/ambari-agent/conf/ambari-agent.ini so that the server host is the master node, and start the ambari-agent service. Then check the agent log /var/log/ambari-agent/ambari-agent.log and submit the entries showing a successful heartbeat as text in the answer box.
[root@slave ~]# tail /var/log/ambari-agent/ambari-agent.log
INFO 2019-01-21 03:23:10,096 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2019-01-21 03:23:10,096 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2019-01-21 03:23:10,349 Hardware.py:176 - Some mount points were ignored: /, /dev, /dev/shm, /run, /sys/fs/cgroup
INFO 2019-01-21 03:23:10,351 Controller.py:320 - Sending Heartbeat (id = 76214)
INFO 2019-01-21 03:23:10,358 Controller.py:333 - Heartbeat response received (id = 76215)
INFO 2019-01-21 03:23:10,358 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2019-01-21 03:23:10,358 Controller.py:380 - Updating configurations from heartbeat
INFO 2019-01-21 03:23:10,358 Controller.py:389 - Adding cancel/execution commands
INFO 2019-01-21 03:23:10,358 Controller.py:406 - Adding recovery commands
INFO 2019-01-21 03:23:10,359 Controller.py:475 - Waiting 0.9 for next heartbeat
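A sketch of the agent installation and configuration that precedes the log excerpt above (run on both nodes); only the hostname line in the [server] section of the ini file needs to change:
[root@slave ~]# yum install -y ambari-agent
[root@slave ~]# vi /etc/ambari-agent/conf/ambari-agent.ini   # in the [server] section set: hostname=master
[root@slave ~]# ambari-agent start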

514. In the XianDian big data platform, create a Hadoop cluster named "XIANDIAN HDP" using the HDP 2.4 stack, installing the HDFS, YARN+MapReduce2, Zookeeper and Ambari Metrics services. After installation, check the Hadoop cluster service processes in a Linux shell on the master and slaver nodes and submit the result as text in the answer box.
[root@slave ~]# jps
21344 SecondaryNameNode
4417 QuorumPeerMain
17971 Jps
18821 DataNode
20503 NodeManager
19689 ApplicationHistoryServer
20266 JobHistoryServer
20876 ResourceManager

515. In the XianDian big data platform, create a Hadoop cluster named "XIANDIAN HDP" using the HDP 2.4 stack, installing the HDFS, YARN+MapReduce2, Zookeeper and Ambari Metrics services. After installation, check the basic statistics of the Hadoop cluster in a Linux shell and submit the command and its output as text in the answer box.
[root@master ~]# hdfs fsck /
Connecting to namenode via http://master:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from /10.0.0.12 for path / at Mon Jan 21 04:23:40 UTC 2019
.
/app-logs/ambari-qa/logs/application_1547971095325_0001/slave_45454_1547971260134: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741830_1006. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/app-logs/ambari-qa/logs/application_1547971095325_0002/master_45454_1547971343610: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741842_1018. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/app-logs/ambari-qa/logs/application_1547971095325_0002/slave_45454_1547971344834: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741843_1019. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/hdp/apps/2.6.1.0-129/hive/hive.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741847_1023. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/hdp/apps/2.6.1.0-129/mapreduce/hadoop-streaming.jar: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741848_1024. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741825_1001. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741826_1002. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/hdp/apps/2.6.1.0-129/pig/pig.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741845_1021. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/hdp/apps/2.6.1.0-129/pig/pig.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741846_1022. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/hdp/apps/2.6.1.0-129/tez/tez.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741844_1020. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/mr-history/done/2019/01/20/000000/job_1547971095325_0002-1547971269040-ambari%2Dqa-word+count-1547971336333-1-1-SUCCEEDED-default-1547971292433.jhist: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741840_1016. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/mr-history/done/2019/01/20/000000/job_1547971095325_0002_conf.xml: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741841_1017. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/tmp/id000a0d00_date582019: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741827_1003. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/tmp/idtest.ambari-qa.1547973953.18.in: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741850_1026. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/tmp/idtest.ambari-qa.1547973953.18.pig: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741849_1025. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/user/ambari-qa/DistributedShell/application_1547971095325_0001/AppMaster.jar: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741828_1004. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/user/ambari-qa/DistributedShell/application_1547971095325_0001/shellCommands: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741829_1005. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/user/ambari-qa/mapredsmokeinput: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741831_1007. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/user/ambari-qa/mapredsmokeoutput/part-r-00000: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741838_1014. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
Status: HEALTHY
Total size: 502825041 B
Total dirs: 48
Total files: 18
Total symlinks: 0
Total blocks (validated): 19 (avg. block size 26464475 B)
Minimally replicated blocks: 19 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 19 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 5
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 33 (46.478874 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Mon Jan 21 04:23:40 UTC 2019 in 16 milliseconds

The filesystem under path '/' is HEALTHY

516. Verify that the hostname of the master node is master and the hostname of the slaver1 node is slaver1. Edit the hosts file on both nodes to map IP addresses to hostnames. Query the hosts file on both nodes and submit its contents as text in the answer box.
[root@master ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.12 master
10.0.0.13 slave
[root@slave ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.12 master
10.0.0.13 slave

517. Check whether the ntp service installed on the master node is running, and synchronize the slaver1 node's clock with the master node. Submit the synchronization command and its output as text in the answer box.
[root@slave ~]# ntpdate master
21 Jan 04:27:46 ntpdate[23426]: adjust time server 10.0.0.12 offset -0.146752 sec

518. Check the running status of ambari-server on the master node and start it if it is not running. Use curl in a Linux shell to fetch the page at http://master:8080 and submit the result as text in the answer box.
[root@master ~]# curl http://master:8080

<!--
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements.  See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership.  The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.  You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
-->


<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <link rel="stylesheet" href="stylesheets/vendor.css">
  <link rel="stylesheet" href="stylesheets/app.css">
  <script src="javascripts/vendor.js"></script>
  <script src="javascripts/app.js"></script>
  <script>
      $(document).ready(function() {
          require('initialize');
          // make favicon work in firefox
          $('link[type*=icon]').detach().appendTo('head');
          $('#loading').remove();
      });
  </script>
  <title>先电大数据平台</title>
  <link rel="shortcut icon" href="/img/logo.png" type="image/x-icon">
</head>
<body>
    <div id="loading">...加载中...</div>
    <div id="wrapper">
    <!-- ApplicationView -->
    </div>
    <footer>
        <div class="container">
            <a href="http://www.1daoyun.com/" target="_blank">© 南京第五十五所技术开发有限公司 版权所有 版本号:V2.2</a>.<br>
            <a href="/licenses/NOTICE.txt" target="_blank">查看使用的第三方工具/资源,以及各自归属</a>
         </div>
    </footer>
</body>
</html>

519. Check the running status of ambari-server on the master node and start it if it is not running. Submit the ambari-server status output as text in the answer box.
Using python /usr/bin/python
Ambari-server status
Ambari Server running
Found Ambari Server PID: 13714 at: /var/run/ambari-server/ambari-server.pid

520. Check the running status of ambari-agent on the slaver node and start it if it is not running. Then check the agent log /var/log/ambari-agent/ambari-agent.log and submit the entries showing a successful heartbeat as text in the answer box.
[root@slave ~]# tail /var/log/ambari-agent/ambari-agent.log
INFO 2019-01-21 04:29:36,593 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2019-01-21 04:29:36,593 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2019-01-21 04:29:36,841 Hardware.py:176 - Some mount points were ignored: /, /dev, /dev/shm, /run, /sys/fs/cgroup
INFO 2019-01-21 04:29:36,844 Controller.py:320 - Sending Heartbeat (id = 80504)
INFO 2019-01-21 04:29:36,850 Controller.py:333 - Heartbeat response received (id = 80505)
INFO 2019-01-21 04:29:36,850 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2019-01-21 04:29:36,851 Controller.py:380 - Updating configurations from heartbeat
INFO 2019-01-21 04:29:36,851 Controller.py:389 - Adding cancel/execution commands
INFO 2019-01-21 04:29:36,851 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2019-01-21 04:29:37,751 Controller.py:482 - Wait for next heartbeat over

521. After the cluster has started successfully, check the Hadoop cluster service processes in a Linux shell on the master and slaver nodes and submit the result as text in the answer box.
[root@slave ~]# jps
21344 SecondaryNameNode
23601 Jps
4417 QuorumPeerMain
18821 DataNode
20503 NodeManager
19689 ApplicationHistoryServer
20266 JobHistoryServer
20876 ResourceManager

522. After the cluster has started successfully, check the basic statistics of the Hadoop cluster in a Linux shell and submit the command and its output as text in the answer box.
[root@master ~]# hdfs fsck /
Connecting to namenode via http://master:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from /10.0.0.12 for path / at Mon Jan 21 04:31:09 UTC 2019
.
/app-logs/ambari-qa/logs/application_1547971095325_0001/slave_45454_1547971260134: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741830_1006. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/app-logs/ambari-qa/logs/application_1547971095325_0002/master_45454_1547971343610: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741842_1018. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/app-logs/ambari-qa/logs/application_1547971095325_0002/slave_45454_1547971344834: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741843_1019. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/hdp/apps/2.6.1.0-129/hive/hive.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741847_1023. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/hdp/apps/2.6.1.0-129/mapreduce/hadoop-streaming.jar: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741848_1024. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741825_1001. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741826_1002. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/hdp/apps/2.6.1.0-129/pig/pig.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741845_1021. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/hdp/apps/2.6.1.0-129/pig/pig.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741846_1022. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/hdp/apps/2.6.1.0-129/tez/tez.tar.gz: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741844_1020. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/mr-history/done/2019/01/20/000000/job_1547971095325_0002-1547971269040-ambari%2Dqa-word+count-1547971336333-1-1-SUCCEEDED-default-1547971292433.jhist: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741840_1016. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/mr-history/done/2019/01/20/000000/job_1547971095325_0002_conf.xml: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741841_1017. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/tmp/id000a0d00_date582019: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741827_1003. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/tmp/idtest.ambari-qa.1547973953.18.in: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741850_1026. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/tmp/idtest.ambari-qa.1547973953.18.pig: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741849_1025. Target Replicas is 5 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/user/ambari-qa/DistributedShell/application_1547971095325_0001/AppMaster.jar: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741828_1004. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/user/ambari-qa/DistributedShell/application_1547971095325_0001/shellCommands: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741829_1005. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/user/ambari-qa/mapredsmokeinput: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741831_1007. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).

/user/ambari-qa/mapredsmokeoutput/part-r-00000: Under replicated BP-1577517373-10.0.0.12-1547971048845:blk_1073741838_1014. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
Status: HEALTHY
Total size: 502825041 B
Total dirs: 48
Total files: 18
Total symlinks: 0
Total blocks (validated): 19 (avg. block size 26464475 B)
Minimally replicated blocks: 19 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 19 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 5
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 33 (46.478874 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Mon Jan 21 04:31:09 UTC 2019 in 7 milliseconds

The filesystem under path '/' is HEALTHY

601. Create the directory path "1daoyun/file" recursively under the root of the HDFS file system and upload the provided BigDataSkills.txt file into the 1daoyun/file directory, then list the files under 1daoyun/file with the appropriate command. Submit the commands and their output as text in the answer box.
[root@master ~]# hadoop fs -ls /1daoyun/file
Found 1 items
-rw-r--r-- 3 root hdfs 24811184 2019-01-12 08:23 /1daoyun/file/BigDataSkills.txt
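The listing above is produced after the directory is created and the file uploaded; a sketch of those two commands (the local path of BigDataSkills.txt is an assumption):
[root@master ~]# hadoop fs -mkdir -p /1daoyun/file
[root@master ~]# hadoop fs -put BigDataSkills.txt /1daoyun/file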

602. Create the directory path "1daoyun/file" recursively under the root of the HDFS file system and upload the provided BigDataSkills.txt file into the 1daoyun/file directory, then use the HDFS file system checking tool to verify that the file is not corrupted. Submit the commands and their output as text in the answer box.
[root@master ~]# hadoop fsck /1daoyun/file/BigDataSkills.txt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://master:50070/fsck?ugi=root&path=%2F1daoyun%2Ffile%2FBigDataSkills.txt
FSCK started by root (auth:SIMPLE) from /192.168.2.12 for path /1daoyun/file/BigDataSkills.txt at Sat Jan 12 08:42:55 CST 2019
.
/1daoyun/file/BigDataSkills.txt: Under replicated BP-1205401636-192.168.2.12-1547217372496:blk_1073742690_1867. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
Status: HEALTHY
Total size: 24811184 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 24811184 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 1 (33.333332 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Sat Jan 12 08:42:55 CST 2019 in 4 milliseconds

The filesystem under path '/1daoyun/file/BigDataSkills.txt' is HEALTHY
603. Create the directory path "1daoyun/file" recursively under the root of the HDFS file system and upload the provided BigDataSkills.txt file into the 1daoyun/file directory, specifying a replication factor of 2 for the file during the upload; then use the fsck tool to check the number of block replicas. Submit the commands and their output as text in the answer box.
[root@master ~]# hadoop fs -D dfs.replication=2 -put BigDataSkills.txt /1daoyun/file
[root@master ~]# hadoop fsck /1daoyun/file/BigDataSkills.txt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://master:50070/fsck?ugi=root&path=%2F1daoyun%2Ffile%2FBigDataSkills.txt
FSCK started by root (auth:SIMPLE) from /192.168.2.12 for path /1daoyun/file/BigDataSkills.txt at Sat Jan 12 08:44:57 CST 2019
.Status: HEALTHY
Total size: 24811184 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 24811184 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Sat Jan 12 08:44:57 CST 2019 in 0 milliseconds

The filesystem under path '/1daoyun/file/BigDataSkills.txt' is HEALTHY

604. The HDFS file system has a /apps directory under its root. Enable the snapshot feature on this directory and create a snapshot of it named apps_1daoyun, then list the snapshot with the appropriate command. Submit the commands and their output as text in the answer box.
[hdfs@master root]$ hadoop dfsadmin -allowSnapshot /apps
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Allowing snaphot on /apps succeeded
[hdfs@master root]$ hadoop fs -createSnapshot /apps apps_1daoyun
Created snapshot /apps/.snapshot/apps_1daoyun
[hdfs@master root]$ hadoop fs -ls /apps/.snapshot
Found 1 items
drwxrwxrwx - hdfs hdfs 0 2019-01-12 08:53 /apps/.snapshot/apps_1daoyun

605. The /user/root/small-file directory in the HDFS file system contains a number of small files. Use the Hadoop Archive tool to archive these small files into a single archive named xiandian-data.har. After archiving, list the contents of xiandian-data.har. Submit the commands and their output as text in the answer box.
[root@master ~]# hadoop archive -archiveName xiandian-data.har -p /user/ambari-qa /user/root
[root@master ~]# hadoop fs -ls /user/root/xiandian-data.har
Found 4 items
-rw-r--r-- 3 root hdfs 0 2019-01-12 09:05 /user/root/xiandian-data.har/_SUCCESS
-rw-r--r-- 3 root hdfs 959 2019-01-12 09:05 /user/root/xiandian-data.har/_index
-rw-r--r-- 3 root hdfs 23 2019-01-12 09:05 /user/root/xiandian-data.har/_masterindex
-rw-r--r-- 3 root hdfs 49747 2019-01-12 09:05 /user/root/xiandian-data.har/part-0

606. When a Hadoop cluster starts, it first enters safe mode, which it leaves by default after 30 seconds. While the system is in safe mode, the HDFS file system can only be read; writes, modifications and deletions are not allowed. Suppose the Hadoop cluster needs maintenance and must be placed in safe mode, and its state then checked. Submit the commands for entering safe mode and checking its status as text in the answer box.
[hdfs@master root]$ hadoop dfsadmin -safemode enter
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Safe mode is ON
[hdfs@master root]$ hadoop dfsadmin -safemode get
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Safe mode is ON

607. To guard against accidental deletion, the HDFS file system provides a trash (recycle bin) feature, but too many trashed files take up a large amount of storage space. In the web interface of the XianDian big data platform, set the interval after which files in the HDFS trash are permanently deleted to 7 days. Submit the name of the modified configuration file, the parameter and its value as text in the answer box.
Configuration file: core-site (Advanced core-site in the Ambari web UI)
Parameter: fs.trash.interval
Value: 10080

608. To guard against accidental deletion, the HDFS file system provides a trash (recycle bin) feature, but too many trashed files take up a large amount of storage space. Use "vi" in a Linux shell to modify the relevant configuration file and parameter, then restart the corresponding services. Submit the commands and the modified parameter as text in the answer box.
[root@master ~]# vi /etc/hadoop/2.6.1.0-129/0/core-site.xml
<property>
    <name>fs.trash.interval</name>
    <value>10080</value>
</property>

[root@master ~]# su - hdfs

Last login: Mon May 8 09:31:52 UTC 2017

[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf stop namenode

[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode

[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf stop datanode

[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode
609. To guard against accidental deletion, the HDFS file system provides a trash (recycle bin) feature. Suppose an engineer discovers that, the previous day, they accidentally deleted a file named cetc55.txt from the HDFS file system as the root user. Use the find command to locate the file and restore it to its original location. Submit the commands and the restored file listing as text in the answer box.
[root@master ~]# hadoop fs -find / -name cetc55.txt
find: Permission denied: user=root, access=READ_EXECUTE, inode="/apps/falcon/extensions/mirroring":falcon:users:drwxrwx---
find: Permission denied: user=root, access=READ_EXECUTE, inode="/apps/hbase/staging":hbase:hdfs:drwx--x--x
find: Permission denied: user=root, access=READ_EXECUTE, inode="/ats/done":yarn:hadoop:drwx------
find: Permission denied: user=root, access=READ_EXECUTE, inode="/mr-history/done/2019/01/12":mapred:hadoop:drwxrwx---
find: Permission denied: user=root, access=READ_EXECUTE, inode="/tmp/hive/hive/721658a5-bd0e-4d51-95ce-db0cadd38cfb":hive:hdfs:drwx------
find: Permission denied: user=root, access=READ_EXECUTE, inode="/user/ambari-qa":ambari-qa:hdfs:drwxrwx---
find: Permission denied: user=root, access=READ_EXECUTE, inode="/webhdfs/v1":hive:hadoop:drwx------
[root@master ~]# hadoop fs -mv /user/root/.Trash/Current/cetc55.txt /

610. Hosts in a Hadoop cluster may crash or suffer system damage, in which case data files in the HDFS file system can be damaged or lost. To ensure the reliability of the HDFS file system, set the cluster's replication factor to 5 in the web interface of the XianDian big data platform. Submit the modified parameter and its value as text in the answer box.
Advanced
General
Block replication
5

611. Hosts in a Hadoop cluster may crash or suffer system damage, in which case data files in the HDFS file system can be damaged or lost. To ensure the reliability of the HDFS file system, the cluster's replication factor must be set to 5. Use "vi" in a Linux shell to modify the relevant configuration file and parameter, then restart the corresponding services. Submit the commands and the modified parameter as text in the answer box.
[root@master ~]# vi /etc/hadoop/2.6.1.0-129/0/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>5</value>
</property>

612. The directory /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ on the cluster nodes contains an example JAR package, hadoop-mapreduce-examples.jar. Run the PI program in that JAR to approximate the value of pi (π), using 5 map tasks with 5 samples per map task. When the job finishes, submit the command and its output as text in the answer box.
[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar pi 5 5
Number of Maps = 5
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
19/01/12 12:11:29 INFO client.RMProxy: Connecting to ResourceManager at slaver/192.168.2.13:8050
19/01/12 12:11:29 INFO client.AHSProxy: Connecting to Application History server at slaver/192.168.2.13:10200
19/01/12 12:11:31 INFO input.FileInputFormat: Total input paths to process : 5
19/01/12 12:11:31 INFO mapreduce.JobSubmitter: number of splits:5
19/01/12 12:11:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547258293975_0001
19/01/12 12:11:32 INFO impl.YarnClientImpl: Submitted application application_1547258293975_0001
19/01/12 12:11:32 INFO mapreduce.Job: The url to track the job: http://slaver:8088/proxy/application_1547258293975_0001/
19/01/12 12:11:32 INFO mapreduce.Job: Running job: job_1547258293975_0001
19/01/12 12:11:52 INFO mapreduce.Job: Job job_1547258293975_0001 running in uber mode : false
19/01/12 12:11:52 INFO mapreduce.Job: map 0% reduce 0%
19/01/12 12:12:06 INFO mapreduce.Job: map 40% reduce 0%
19/01/12 12:12:19 INFO mapreduce.Job: map 40% reduce 13%
19/01/12 12:12:20 INFO mapreduce.Job: map 100% reduce 13%
19/01/12 12:12:24 INFO mapreduce.Job: map 100% reduce 100%
19/01/12 12:12:24 INFO mapreduce.Job: Job job_1547258293975_0001 completed successfully
19/01/12 12:12:24 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=116
FILE: Number of bytes written=888813
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1305
HDFS: Number of bytes written=215
HDFS: Number of read operations=23
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=5
Launched reduce tasks=1
Data-local map tasks=5
Total time spent by all maps in occupied slots (ms)=94972
Total time spent by all reduces in occupied slots (ms)=30068
Total time spent by all map tasks (ms)=94972
Total time spent by all reduce tasks (ms)=15034
Total vcore-milliseconds taken by all map tasks=94972
Total vcore-milliseconds taken by all reduce tasks=15034
Total megabyte-milliseconds taken by all map tasks=64770904
Total megabyte-milliseconds taken by all reduce tasks=20506376
Map-Reduce Framework
Map input records=5
Map output records=10
Map output bytes=90
Map output materialized bytes=140
Input split bytes=715
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=140
Reduce input records=10
Reduce output records=0
Spilled Records=20
Shuffled Maps =5
Failed Shuffles=0
Merged Map outputs=5
GC time elapsed (ms)=977
CPU time spent (ms)=11310
Physical memory (bytes) snapshot=2232025088
Virtual memory (bytes) snapshot=15569510400
Total committed heap usage (bytes)=2539126784
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=590
File Output Format Counters
Bytes Written=97
Job Finished in 55.522 seconds
Estimated value of Pi is 3.68000000000000000000

613. The directory /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ on the cluster nodes contains an example JAR package, hadoop-mapreduce-examples.jar. Run the wordcount program in that JAR to count the words in /1daoyun/file/BigDataSkills.txt, writing the result to the /1daoyun/output directory, then query the word-count result with the appropriate command. Submit the commands and their output as text in the answer box.
[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar wordcount /1daoyun/file/BigDataSkills.txt /1daoyun/output
[root@master hadoop-mapreduce]# hadoop fs -cat /1daoyun/output/part-r-00000

614. The directory /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ on the cluster nodes contains an example JAR package, hadoop-mapreduce-examples.jar. Run the sudoku program in that JAR to solve the sudoku puzzle shown below. When the job finishes, submit the command and its output as text in the answer box.

[root@master hadoop-mapreduce]# cat /opt/txt/puzzle.dta
8 ? ? ? ? ? ? ? ?
? ? 3 6 ? ? ? ? ?
? 7 ? ? 9 ? 2 ? ?
? 5 ? ? ? 7 ? ? ?
? ? ? ? 4 5 7 ? ?
? ? ? 1 ? ? ? 3 ?
? ? 1 ? ? ? ? 6 8
? ? 8 5 ? ? ? 1 ?
? 9 ? ? ? ? 4 ? ?
[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.3.2.6.1.0-129.jar sudoku /root/puzzle.dta
Solving /root/puzzle.dta
8 6 1 5 7 2 3 9 4
5 2 4 3 8 9 1 7 6
3 7 9 1 4 6 5 8 2
4 3 6 2 5 8 9 1 7
7 9 8 6 3 1 2 4 5
1 5 2 4 9 7 8 6 3
2 4 7 9 1 5 6 3 8
9 8 5 7 6 3 4 2 1
6 1 3 8 2 4 7 5 9

Found 1 solutions

615. The directory /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ on the cluster nodes contains an example JAR package, hadoop-mapreduce-examples.jar. Run the grep program in that JAR to count the occurrences of "Hadoop" in /1daoyun/file/BigDataSkills.txt, then query the result. Submit the commands and their output as text in the answer box.
[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar grep /1daoyun/file/BigDataSkills.txt /1daoyun/output Hadoop
[root@master hadoop-mapreduce]# hadoop fs -cat /1daoyun/output/*

616. Start the HBase database of the XianDian big data platform, using the RegionServer on the master node. Start the HBase shell in a Linux shell and check the HBase version. Submit the commands (all database commands in lowercase) as text in the answer box.
[root@master ~]# hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 1.1.2.2.6.1.0-129, r718c773662346de98a8ce6fd3b5f64e279cb87d4, Wed May 31 03:27:31 UTC 2017

hbase(main):004:0> version
1.1.2.2.6.1.0-129, r718c773662346de98a8ce6fd3b5f64e279cb87d4, Wed May 31 03:27:31 UTC 2017

617. Start the HBase database of the XianDian big data platform, using the RegionServer on the master node. Start the HBase shell in a Linux shell and check the HBase status. Submit the commands (all database commands in lowercase) as text in the answer box.
[root@master ~]# hbase shell
hbase(main):015:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 4.0000 average load

618. Start the HBase database of the XianDian big data platform, using the RegionServer on the master node. Start the HBase shell in a Linux shell and check which system user is currently logged in to the HBase shell. Submit the commands (all database commands in lowercase) as text in the answer box.
[root@master ~]# hbase shell
hbase(main):019:0> whoami
root (auth:SIMPLE)
groups: root

619. In the HBase database, create a table named xiandian_user with a column family named info, then view the table's description. Submit the commands (all database commands in lowercase) as text in the answer box.
hbase(main):036:0> create 'xiandian_user','info'
0 row(s) in 7.4640 seconds
=> Hbase::Table - xiandian_user

hbase(main):045:0> describe 'xiandian_user'
Table xiandian_user is ENABLED
xiandian_user
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.4000 seconds

620. Enable HBase security authorization, then in the HBase shell grant the root user read, write and execute permissions on the xiandian_user table, and view the resulting permissions with the appropriate command. Submit the parameter and value that enable HBase security authorization, together with the commands (all database commands in lowercase) and the query result, as text in the answer box.
Parameter: hbase.security.authorization
Value: true
hbase(main):001:0> grant 'root','RWX','xiandian_user'
hbase(main):002:0> user_permission 'xiandian_user'

621. Use the org.apache.hadoop.hbase.mapreduce.ImportTsv class of HBase to generate HFiles from the provided xiandian_user.csv file, then load the generated HFiles into the xiandian_user table with the org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles class. Use the scan command to view the data in xiandian_user. Submit the commands (all database commands in lowercase) and the query result as text in the answer box.
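No answer is recorded for this item. A minimal sketch of the usual approach, assuming the CSV has been copied to HDFS as /1daoyun/xiandian_user.csv, uses comma-separated columns rowkey,info:age,info:name, and writes intermediate HFiles to /tmp/hfile (all three paths and the column list are assumptions):
[root@master ~]# hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.bulk.output=/tmp/hfile -Dimporttsv.columns=HBASE_ROW_KEY,info:age,info:name xiandian_user /1daoyun/xiandian_user.csv
[root@master ~]# hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfile xiandian_user
hbase(main):001:0> scan 'xiandian_user'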

622. In the HBase shell, use the get command to query the info data of the row with rowkey 88 in the xiandian_user table. Submit the commands (all database commands in lowercase) and the query result as text in the answer box.
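No answer is recorded for this item; the query itself would be along these lines (sketch):
hbase(main):001:0> get 'xiandian_user','88','info'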

623. In the HBase shell, count the rows of the xiandian_user table using a counting interval of 100 and a scan cache of 500. Submit the commands (all database commands in lowercase) and the query result as text in the answer box.
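No answer is recorded for this item; a sketch of the count with the requested interval and cache settings:
hbase(main):001:0> count 'xiandian_user', INTERVAL => 100, CACHE => 500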

624. In the HBase shell, insert a row into the xiandian_user table with rowkey 620, info:age 58 and info:name user620, then query the inserted row with the get command. Submit the commands (all database commands in lowercase) and the query result as text in the answer box.
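No answer is recorded for this item; a sketch of the insert and the follow-up query:
hbase(main):001:0> put 'xiandian_user','620','info:age','58'
hbase(main):002:0> put 'xiandian_user','620','info:name','user620'
hbase(main):003:0> get 'xiandian_user','620'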

625. In the HBase shell, delete the info:age value of the row with rowkey 73 in the xiandian_user table, then query the row with rowkey 73 using the get command. Submit the commands (all database commands in lowercase) and the query result as text in the answer box.
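No answer is recorded for this item; a sketch of the delete and the follow-up query:
hbase(main):001:0> delete 'xiandian_user','73','info:age'
hbase(main):002:0> get 'xiandian_user','73'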

626. Start the Hive data warehouse of the XianDian big data platform, start the Hive client, and list all Hadoop file paths through Hive (all database commands in lowercase). Submit the query result as text in the answer box.
hive> dfs -ls;
Found 4 items
drwx------ - root hdfs 0 2019-01-12 08:31 .Trash
drwxr-xr-x - root hdfs 0 2019-01-13 17:42 .hiveJars
drwx------ - root hdfs 0 2019-01-12 12:34 .staging
drwxr-xr-x - root hdfs 0 2019-01-12 09:05 xiandian-data.har

627. Use Hive to create the table xd_phy_course with the structure shown in the table below and import phy_course_xd.txt into it. After the import, query through Hive the HDFS location of the table's data files. Submit the commands (all database commands in lowercase) and the output as text in the answer box.
stname(string) stID(int) class(string) opt_cour(string)
hive> create table xd_phy_course(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
OK
Time taken: 13.391 seconds
hive> load data local inpath '/opt/txt/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
OK
Time taken: 1.177 seconds
hive> dfs -ls /apps/hive/warehouse;
Found 1 items
drwxrwxrwx - root hadoop 0 2019-01-14 21:53 /apps/hive/warehouse/xd_phy_course

628. Use Hive to create the table xd_phy_course as an external table stored at /1daoyun/data/hive, and import phy_course_xd.txt into it; the structure of xd_phy_course is shown in the table below. After the import, query the table's structure in Hive. Submit the commands (all database commands in lowercase) and the output as text in the answer box.
stname(string) stID(int) class(string) opt_cour(string)
hive> create external table xd_phy_course(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/1daoyun/data/hive';
OK
Time taken: 0.278 seconds
hive> load data local inpath '/opt/txt/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, totalSize=450]
OK
Time taken: 0.945 seconds
hive> desc xd_phy_course;
OK
stname string
stid int
class string
opt_cour string
Time taken: 0.495 seconds, Fetched: 4 row(s)

629. Use Hive to find all records in phy_course_xd.txt for members of class Software_1403 who signed up for the volleyball elective; the structure of phy_course_xd.txt is shown in the table below, with the elective field opt_cour and the class field class. Submit the commands (all database commands in lowercase) and the output as text in the answer box.
stname(string) stID(int) class(string) opt_cour(string)
hive> create table xd_phy_course(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.646 seconds
hive> load data local inpath '/opt/txt/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
OK
Time taken: 0.999 seconds
hive> select * from xd_phy_course where class='Software_1403' and opt_cour='volleyball';
OK
student409 10120408 Software_1403 volleyball
student411 10120410 Software_1403 volleyball
student413 10120412 Software_1403 volleyball
student419 10120418 Software_1403 volleyball
student421 10120420 Software_1403 volleyball
student422 10120421 Software_1403 volleyball
student424 10120423 Software_1403 volleyball
student432 10120431 Software_1403 volleyball
student438 10120437 Software_1403 volleyball
student447 10120446 Software_1403 volleyball
Time taken: 0.98 seconds, Fetched: 10 row(s)

630. Use Hive to count, from phy_course_xd.txt, the total number of students who signed up for each sports elective; the file structure is shown in the table below, with the elective field opt_cour. Load the statistics into the table phy_opt_count and query that table with a SELECT statement. Submit the statistics statement and the query command (all database commands in lowercase) together with the output as text in the answer box.
stname(string) stID(int) class(string) opt_cour(string)
hive> create table xd_phy_course(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.225 seconds
hive> load data local inpath '/opt/txt/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
OK
Time taken: 0.91 seconds
hive> create table phy_opt_count(opt_cour string,cour_count int) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.206 seconds
hive> insert overwrite table phy_opt_count select xd_phy_course.opt_cour,count(distinct xd_phy_course.stID) from xd_phy_course group by xd_phy_course.opt_cour;
Query ID = root_20190115051024_6c7a70fe-a7b0-49a8-b8ab-cc7ee2854c47
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening…
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1547338155253_0005)

Loading data to table default.phy_opt_count
Table default.phy_opt_count stats: [numFiles=1, numRows=1, totalSize=14, rawDataSize=13]
OK
Time taken: 35.618 seconds
hive> select * from phy_opt_count;
OK
volleyball 10
Time taken: 0.094 seconds, Fetched: 1 row(s)

631. Use Hive to find all records in phy_course_score_xd.txt for members of class Software_1403 whose sports elective score is above 90; the file structure is shown in the table below, with the elective field opt_cour and the score field score. Submit the commands (all database commands in lowercase) and the output as text in the answer box.
stname(string) stID(int) class(string) opt_cour(string) score(float)
hive> create table phy_course_score_xd(stname string,stID int,class string,opt_cour string,score float) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.202 seconds
hive> load data local inpath '/opt/txt/phy_course_score_xd.txt' into table phy_course_score_xd;
Loading data to table default.phy_course_score_xd
Table default.phy_course_score_xd stats: [numFiles=1, numRows=0, totalSize=354, rawDataSize=0]
OK
Time taken: 0.836 seconds
hive> select * from phy_course_score_xd where class='Software_1403' and score>90;
OK
student433 10120432 Software_1403 football 98.0
student444 10120443 Software_1403 swimming 99.0
student445 10120444 Software_1403 tabletennis 97.0
student450 10120449 Software_1403 basketball 97.0
Time taken: 0.087 seconds, Fetched: 4 row(s)

632. Use Hive to compute, from phy_course_score_xd.txt, the average sports score of each class, keeping two decimal places with the round function; the file structure is shown in the table below, with the class field class and the score field score. Submit the commands (all database commands in lowercase) and the output as text in the answer box.
stname(string) stID(int) class(string) opt_cour(string) score(float)
hive> select class,round(avg(score),2) from phy_course_score_xd group by class;
Query ID = root_20190115054732_ff2d91be-9eb8-44e1-b824-0fcf736bb512
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1547338155253_0006)

OK
Software_1403 98.0
Software_1403 badminton NULL
Software_1403 tabletennis NULL
Software_1403 volleyball NULL
Time taken: 6.85 seconds, Fetched: 4 row(s)

633. Use Hive to compute, from phy_course_score_xd.txt, the highest sports score of each class; the file structure is shown in the table below, with the class field class and the score field score. Submit the commands (all database commands in lowercase) and the output as text in the answer box.
stname(string) stID(int) class(string) opt_cour(string) score(float)
hive> select class,max(score) from phy_course_score_xd group by class;
Query ID = root_20190115054845_87524d45-31bb-46ea-8ed4-173fc84c1b25
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1547338155253_0006)

OK
Software_1403 99.0
Software_1403 badminton NULL
Software_1403 tabletennis NULL
Software_1403 volleyball NULL
Time taken: 16.028 seconds, Fetched: 4 row(s)

634. In the Hive data warehouse, merge the separate request_date and request_time fields of the web log weblog_entries.txt into one field joined by an underscore "_", as shown in the output below; the structure of weblog_entries.txt is shown in the table below. Submit the commands (all database commands in lowercase) and the last ten lines of the output as text in the answer box.

md5(STRING) url(STRING) request_date (STRING) request_time (STRING) ip(STRING)
hive> create table weblog_entries(md5 string,url string,request_date string,request_time string,ip string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/data/hive/weblog/';
OK
Time taken: 0.384 seconds
hive> load data local inpath '/opt/txt/weblog_entries.txt' into table weblog_entries;
Loading data to table default.weblog_entries
Table default.weblog_entries stats: [numFiles=1, totalSize=251130]
OK
Time taken: 0.868 seconds
hive> select concat_ws('_',request_date,request_time) from weblog_entries;
2012-05-10_21:20:51
2012-05-10_21:34:54
2012-05-10_21:23:00
2012-05-10_21:10:22
2012-05-10_21:18:48
2012-05-10_21:12:25
2012-05-10_21:29:01
2012-05-10_21:13:47
2012-05-10_21:12:37
2012-05-10_21:34:20
2012-05-10_21:27:00
2012-05-10_21:33:53
2012-05-10_21:10:19
2012-05-10_21:12:05
2012-05-10_21:25:58
2012-05-10_21:34:28
Time taken: 0.123 seconds, Fetched: 3000 row(s)

635. In the Hive data warehouse, perform a simple inner join between the ip field of the web log weblog_entries.txt and the country associated with each IP in ip_to_country, producing output as shown below; the structure of weblog_entries.txt is shown in the table below. Submit the commands (all database commands in lowercase) and the last ten lines of the output as text in the answer box.

md5(STRING) url(STRING) request_date (STRING) request_time (STRING) ip(STRING)
hive> create table ip_to_country(ip string,country string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/data/hive/ip_to_country/';
OK
Time taken: 0.211 seconds
hive> load data local inpath '/opt/txt/ip_to_country.txt' into table ip_to_country;
Loading data to table default.ip_to_country
Table default.ip_to_country stats: [numFiles=1, totalSize=7552856]
OK
Time taken: 5.443 seconds
hive> select wle.*,itc.country from weblog_entries wle join ip_to_country itc on wle.ip=itc.ip;
Query ID = root_20190115100452_7e9f9e8f-8c18-4546-83cf-3557d8ff9bbf
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening…
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1547338155253_0009)

OK
Time taken: 24.07 seconds

636. Use Hive to create a table dynamically from a query over the web log weblog_entries.txt. Create a new table named weblog_entries_url_length that defines three fields of the web log, url, request_date and request_time, plus a new field named url_length holding the length of the url string; the structure of weblog_entries.txt is shown in the table below. When finished, query the contents of weblog_entries_url_length. Submit the commands (all database commands in lowercase) and the last ten lines of the output as text in the answer box.
md5(STRING) url(STRING) request_date (STRING) request_time (STRING) ip(STRING)
hive> create table weblog_entries_url_length as select url,request_date,request_time,length(url) as url_length from weblog_entries;
Query ID = root_20190115101234_1b0ca195-da6a-481f-9276-52812e2ff32d
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1547338155253_0009)

Moving data to directory hdfs://master:8020/apps/hive/warehouse/weblog_entries_url_length
Table default.weblog_entries_url_length stats: [numFiles=1, numRows=3000, totalSize=121379, rawDataSize=118379]
OK
Time taken: 8.019 seconds
hive> select * from weblog_entries_url_length;
/apliivnfonuq.html 2012-05-10 21:20:51 18
/cvjcxq.html 2012-05-10 21:34:54 12
/oduuw.html 2012-05-10 21:23:00 11
/uytd.html 2012-05-10 21:10:22 10
/frpnqyqqa.html 2012-05-10 21:18:48 15
/n.html 2012-05-10 21:12:25 7
/qnrxlxqacgiudbtfggcg.html 2012-05-10 21:29:01 26
/sbbiuot.html 2012-05-10 21:13:47 13
/ofxi.html 2012-05-10 21:12:37 10
/hjmdhaoogwqhp.html 2012-05-10 21:34:20 19
/angjbmea.html 2012-05-10 21:27:00 14
/mmdttqsnjfifkihcvqu.html 2012-05-10 21:33:53 25
/eorxuryjadhkiwsf.html 2012-05-10 21:10:19 22
/e.html 2012-05-10 21:12:05 7
/khvc.html 2012-05-10 21:25:58 10
/c.html 2012-05-10 21:34:28 7
Time taken: 0.087 seconds, Fetched: 3000 row(s)

637. Install Sqoop Clients on the master and slaver nodes, then check the Sqoop version on the master node. Submit the command and its output as text in the answer box.
[root@master ~]# sqoop version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/01/15 10:33:58 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.1.0-129
Sqoop 1.4.6.2.6.1.0-129
git commit id 99af1205a99646445a9c7254ad2770706e1cc6a4
Compiled by jenkins on Wed May 31 03:22:43 UTC 2017

638. Use the Sqoop tool to list all databases in MySQL on the master node. Submit the command and its output as text in the answer box.
[root@master ~]# sqoop list-databases --connect jdbc:mysql://localhost --username root --password bigdata
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/01/15 10:38:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.1.0-129
19/01/15 10:38:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/01/15 10:38:03 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
ambari
hive
mysql
oozie
performance_schema

639. Use the Sqoop tool to list all tables of the ambari database in MySQL on the master node. Submit the command and its output as text in the answer box.
[root@master ~]# sqoop list-tables --connect jdbc:mysql://localhost/ambari --username root --password bigdata
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/01/15 10:39:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.1.0-129
19/01/15 10:39:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/01/15 10:39:31 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
ClusterHostMapping
QRTZ_BLOB_TRIGGERS
QRTZ_CALENDARS
QRTZ_CRON_TRIGGERS
QRTZ_FIRED_TRIGGERS
QRTZ_JOB_DETAILS
QRTZ_LOCKS

640. Create a database named xiandian in MySQL, and in it create the table xd_phy_course with the structure shown in Table 1. Use Hive to create a table xd_phy_course and import phy_course_xd.txt into it; its structure is shown in Table 2. Then use the Sqoop tool to export the xd_phy_course table from the Hive data warehouse to the xd_phy_course table of the xiandian database in MySQL on the master node. Submit the commands and their output as text in the answer box.
Table 1
stname VARCHAR(20) stID INT(1) class VARCHAR(20) opt_cour VARCHAR(20)
Table 2
stname(string) stID(int) class(string) opt_cour(string)
[root@master ~]# mysql -uroot -pbigdata
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 195
Server version: 5.5.44-MariaDB MariaDB Server

Copyright © 2000, 2015, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database xiandian;
Query OK, 1 row affected (0.01 sec)

MariaDB [(none)]> use xiandian;
Database changed
MariaDB [xiandian]> create table xd_phy_course(stname varchar(20),stID int(1),class varchar(20),opt_cour varchar(20));
Query OK, 0 rows affected (0.03 sec)
hive> create table xd_phy_course3(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 2.773 seconds
hive> load data local inpath '/opt/txt/phy_course_xd.txt' into table xd_phy_course3;
Loading data to table default.xd_phy_course3
Table default.xd_phy_course3 stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
OK
Time taken: 0.853 seconds
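The transcript above stops after the Hive load; the final step the question asks for, exporting the Hive table back to the xd_phy_course table in MySQL, is not shown. A minimal sketch of that export, assuming the table data sits under the default Hive warehouse path /apps/hive/warehouse/xd_phy_course3 (the export directory and -m 1 are assumptions, not taken from the transcript):
[root@master ~]# sqoop export --connect jdbc:mysql://localhost:3306/xiandian --username root --password bigdata --table xd_phy_course --export-dir /apps/hive/warehouse/xd_phy_course3 --input-fields-terminated-by '\t' -m 1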

641. Create the xd_phy_course table in Hive with the structure shown in the table below, then use the Sqoop tool to import the xd_phy_course table of the xiandian database in MySQL into the xd_phy_course table in the Hive data warehouse.
stname(string) stID(int) class(string) opt_cour(string)
[root@master ~]# hive

WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.3.0-227/0/hive-log4j.properties
hive> create table xd_phy_course4 (stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 2.329 seconds
[root@master ~]# sqoop import --connect jdbc:mysql://localhost:3306/xiandian --username root --password bigdata --table xd_phy_course --hive-import --hive-overwrite --hive-table xd_phy_course4 -m 1 --fields-terminated-by '\t' --lines-terminated-by '\n'

642. Install Pig Clients on the master node, then open a Linux shell and start its Grunt shell in MapReduce mode. Submit the startup command and its output as text in the answer box.
[root@master ~]# pig -x mapreduce
19/01/15 12:11:19 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
19/01/15 12:11:19 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
19/01/15 12:11:19 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2019-01-15 12:11:19,262 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.6.1.0-129 (rexported) compiled May 31 2017, 03:39:20
2019-01-15 12:11:19,262 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1547525479260.log
2019-01-15 12:11:19,293 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2019-01-15 12:11:19,962 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020
2019-01-15 12:11:20,771 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-2b845d6b-dda1-446f-8bb0-4c0f1fccfd40
2019-01-15 12:11:21,272 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver:8188/ws/v1/timeline/
2019-01-15 12:11:21,414 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
grunt>
643. Install Pig Clients on the master node, then open a Linux shell and start its Grunt shell in Local mode. Submit the startup command and its output as text in the answer box.
[root@master ~]# pig -x local
19/01/15 12:10:29 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
19/01/15 12:10:29 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2019-01-15 12:10:29,354 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.6.1.0-129 (rexported) compiled May 31 2017, 03:39:20
2019-01-15 12:10:29,355 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1547525429353.log
2019-01-15 12:10:29,394 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2019-01-15 12:10:29,784 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2019-01-15 12:10:30,188 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-310a210d-52ce-4823-8cab-8129fb97da01
2019-01-15 12:10:30,189 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt>

644. Use the Pig tool in Local mode to count the number of hits per IP in the system log access_log.txt: group by IP with a GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to count the total rows in each group, and finally view the statistics with a DUMP statement. Submit the commands and the query result as text in the answer box.
[root@master ~]# pig -x local
19/01/15 12:10:29 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
19/01/15 12:10:29 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2019-01-15 12:10:29,354 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.6.1.0-129 (rexported) compiled May 31 2017, 03:39:20
2019-01-15 12:10:29,355 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1547525429353.log
2019-01-15 12:10:29,394 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2019-01-15 12:10:29,784 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2019-01-15 12:10:30,188 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-310a210d-52ce-4823-8cab-8129fb97da01
2019-01-15 12:10:30,189 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt> copyFromLocal /opt/txt/access.txt /user/root/input/log1.txt
grunt> A = LOAD '/user/root/input/log1.txt' USING PigStorage(' ') as (ip,others);
grunt> group_ip = group A by ip;
grunt> result = foreach group_ip generate group,COUNT(A);
grunt> dump result;
(58.248.201.125,1)
(60.216.140.121,1)
(61.151.206.221,4)
(117.184.250.100,4)
(180.163.220.100,1)
(180.163.220.124,1)
(180.163.220.125,1)
(180.163.220.126,1)
(195.208.221.237,2)

645. Use the Pig tool to compute the annual maximum temperature in the weather dataset temperature.txt: group by year with a GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to find the maximum of each group, and finally view the result with a DUMP statement. Submit the commands and the query result as text in the answer box.
grunt> copyFromLocal /opt/txt/temperature.txt /user/root/temprature.txt
grunt> A = LOAD '/user/root/temprature.txt' USING PigStorage(' ') AS (year:int,temperature:int);
grunt> B = GROUP A BY year;
grunt> C = FOREACH B GENERATE group,MAX(A.temperature);
grunt> dump C;
Vertex Stats:
VertexId Parallelism TotalTasks InputRecords ReduceInputRecords OutputRecords FileBytesRead FileBytesWritten HdfsBytesRead HdfsBytesWritten Alias Feature Outputs
scope-20 1 1 357 0 357 32 61 2852 0 A,B,C
scope-21 1 1 0 1 1 61 0 0 6 C GROUP_BY hdfs://master:8020/tmp/temp1707840247/tmp-1432654154,

Input(s):
Successfully read 357 records (2852 bytes) from: “/user/root/temprature.txt”

Output(s):
Successfully stored 1 records (6 bytes) in: “hdfs://master:8020/tmp/temp1707840247/tmp-1432654154”

2019-01-15 18:10:37,781 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-01-15 18:10:37,781 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

646. Use the Pig tool to count the number of IP addresses per country in the ip_to_country dataset: group by country with a GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to count the IP addresses in each group, store the result in the /data/pig/output directory, and view the resulting data. Submit the commands and the query result as text in the answer box.
grunt> copyFromLocal /opt/txt/ip_to_country.txt /user/root/ip_to_country.txt
grunt> ip_countries = LOAD '/user/root/ip_to_country.txt' AS (ip:chararray,country:chararray);
grunt> country_grpd = GROUP ip_countries BY country;
grunt> country_counts = FOREACH country_grpd GENERATE FLATTEN(group),COUNT(ip_countries) as counts;
grunt> STORE country_counts INTO '/data/pig/output';
Vertex Stats:
VertexId Parallelism TotalTasks InputRecords ReduceInputRecords OutputRecords FileBytesRead FileBytesWritten HdfsBytesRead HdfsBytesWritten Alias Feature Outputs
scope-19 1 1 248284 0 248284 32 1935 3922915 0 country_counts,country_grpd,ip_countries
scope-20 1 1 0 246 246 1935 0 0 1618 country_counts GROUP_BY /data/pig/output,

Input(s):
Successfully read 248284 records (3922915 bytes) from: “/user/root/ip_to_country.txt”

Output(s):
Successfully stored 246 records (1618 bytes) in: “/data/pig/output”
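The question also asks to view the stored result, which the transcript does not show. A sketch of that step from the same Grunt session, assuming the STORE output uses the usual part-* file naming:
grunt> fs -cat /data/pig/output/part*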

647. Install the Mahout Client on the master node, then open a Linux shell and run the mahout command to view the example programs bundled with Mahout. Submit the query result as text in the answer box.
[root@master ~]# mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/hdp/2.6.1.0-129/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/hdp/2.6.1.0-129/hadoop/conf
MAHOUT-JOB: /usr/hdp/2.6.1.0-129/mahout/mahout-examples-0.9.0.2.6.1.0-129-job.jar
An example program must be given as the first argument.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
buildforest: : Build the random forest classifier
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix

648. Use the Mahout tool to convert the extracted contents of 20news-bydate.tar.gz into sequence files, save them to the /data/mahout/20news/output/20news-seq/ directory, and list that directory. Submit the commands and the query result as text in the answer box.
[root@master ~]# mkdir 20news
[root@master ~]# tar -xzf 20news-bydate.tar.gz -C 20news
[root@master ~]# hadoop fs -mkdir -p /data/mahout/20news/20news-all
[root@master ~]# hadoop fs -put 20news/* /data/mahout/20news/20news-all
[root@master ~]# mahout seqdirectory -i /data/mahout/20news/20news-all -o /data/mahout/20news/output/20news-seq
19/01/16 04:36:03 INFO mapreduce.Job: map 100% reduce 0%
19/01/16 04:36:04 INFO mapreduce.Job: Job job_1547338155253_0016 completed successfully
19/01/16 04:36:05 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=151642
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=37878493
HDFS: Number of bytes written=13631587
HDFS: Number of read operations=75388
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=93308
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=93308
Total vcore-milliseconds taken by all map tasks=93308
Total megabyte-milliseconds taken by all map tasks=63636056
Map-Reduce Framework
Map input records=18846
Map output records=18846
Input split bytes=2023490
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=745
CPU time spent (ms)=71930
Physical memory (bytes) snapshot=226930688
Virtual memory (bytes) snapshot=2505920512
Total committed heap usage (bytes)=108003328
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=13631587
19/01/16 04:36:05 INFO driver.MahoutDriver: Program took 134759 ms (Minutes: 2.245983333333333)

649. Use the Mahout tool to convert the extracted contents of 20news-bydate.tar.gz into sequence files, save them to the /data/mahout/20news/output/20news-seq/ directory, and view the sequence file contents with the -text command (the first 20 lines are sufficient). Submit the commands and the query result as text in the answer box.
[root@master ~]# mkdir 20news
[root@master ~]# tar -xzf 20news-bydate.tar.gz -C 20news
[root@master ~]# hadoop fs -mkdir -p /data/mahout/20news/20news-all
[root@master ~]# hadoop fs -put 20news/* /data/mahout/20news/20news-all
[root@master ~]# mahout seqdirectory -i /data/mahout/20news/20news-all -o /data/mahout/20news/output/20news-seq
[root@master ~]# hadoop fs -text /data/mahout/20news/output/20news-seq/part-m-00000 | head -n 20
19/01/16 04:39:47 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
19/01/16 04:39:47 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
19/01/16 04:39:47 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
19/01/16 04:39:47 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
19/01/16 04:39:47 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
/20news-bydate-test/alt.atheism/53068 From: decay@cbnewsj.cb.att.com (dean.kaflowitz)
Subject: Re: about the bible quiz answers
Organization: AT&T
Distribution: na
Lines: 18

In article healta.153.735242337@saturn.wwc.edu, healta@saturn.wwc.edu (Tammy R Healy) writes:

.>
.>
.> #12) The 2 cheribums are on the Ark of the Covenant. When God said make no
.> graven image, he was refering to idols, which were created to be worshipped.
.> The Ark of the Covenant wasn’t wrodhipped and only the high priest could
.> enter the Holy of Holies where it was kept once a year, on the Day of
.> Atonement.

I am not familiar with, or knowledgeable about the original language,
but I believe there is a word for “idol” and that the translator
would have used the word “idol” instead of “graven image” had
the original said “idol.” So I think you’re wrong here, but
then again I could be too. I just suggesting a way to determine
text: Unable to write to output stream.

650. Use the Mahout mining tool to make item recommendations on the dataset user-item-score.txt (user-item-score): use the item-based collaborative filtering algorithm with the Euclidean distance similarity measure, recommend 3 items per user, treat the data as non-boolean, set the maximum preference value to 4 and the minimum preference value to 1, save the recommendation output to the output directory, and view the contents of part-r-00000 in the output with the -cat command. Submit the command that runs the recommendation algorithm and the query result as text in the answer box.
[hdfs@master ~]$ hadoop fs -mkdir -p /data/mahout/project
[hdfs@master ~]$ hadoop fs -put user-item-score.txt /data/mahout/project
[hdfs@master ~]$ mahout recommenditembased -i /data/mahout/project/user-item-score.txt -o /data/mahout/project/output -n 3 -b false -s SIMILARITY_EUCLIDEAN_DISTANCE --maxPrefsPerUser 4 --minPrefsPerUser 1 --maxPrefsInItemSimilarity 4 --tempDir /data/mahout/project/temp
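The -cat step the question asks for is not shown above. A sketch of it, assuming the job writes its part files directly under the output directory named in the command:
[hdfs@master ~]$ hadoop fs -cat /data/mahout/project/output/part-r-00000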

651. Install and start the Flume component on the master node, then open a Linux shell and run the flume-ng help command to view flume-ng usage information. Submit the query result as text in the answer box.
[root@master ~]# flume-ng help

652. Based on the provided template file log-example.conf, use the Flume NG tool to collect the system log /var/log/secure of the master node, store the collected log files in the /1daoyun/file/flume directory of the HDFS file system with the file-name prefix "xiandian-sec", and set the timestamp rounding of the files produced in HDFS to 10 minutes. After collection, list the contents of /1daoyun/file/flume in the HDFS file system. Submit the commands and results as well as the modified log-example.conf file contents in the answer box.
[root@master ~]# hadoop fs -ls /1daoyun/file/flume

Found 1 items

-rw-r--r--   3 root hdfs       1142 2017-05-08 10:29 /1daoyun/file/flume/xiandian-sec.1494239316323

[root@master ~]# cat log-example.conf
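The transcript ends before showing the file contents. A minimal log-example.conf sketch that matches the requirements, assuming an agent named a1 with a memory channel (the agent and component names, the channel sizing, and the NameNode URI hdfs://master:8020 are assumptions, not taken from the transcript):
# components of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# exec source tailing the secure log
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
# in-memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# HDFS sink: xiandian-sec prefix, 10-minute timestamp rounding
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://master:8020/1daoyun/file/flume
a1.sinks.k1.hdfs.filePrefix = xiandian-sec
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
With this agent name, the collection would be started with something like: flume-ng agent --conf-file log-example.conf --name a1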

653. Based on the provided template file hdfs-example.conf, use the Flume NG tool to make the path /opt/xiandian/ on the master node a real-time upload path to the HDFS file system, set the HDFS storage path to /data/flume/, keep the uploaded file names unchanged, set the file type to DataStream, and then start the flume-ng agent. Submit the commands as well as the modified hdfs-example.conf file contents in the answer box.
[root@master ~]# flume-ng agent --conf-file hdfs-example.conf --name master
[root@master ~]# cat hdfs-example.conf
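The file contents are not shown above. A minimal hdfs-example.conf sketch matching the requirements, using the agent name master that the flume-ng command above passes with --name (the channel settings, the %{basename} prefix used to keep the original file name, and the NameNode URI are assumptions):
# components of agent "master"
master.sources = r1
master.channels = c1
master.sinks = k1
# spooling-directory source watching /opt/xiandian/
master.sources.r1.type = spooldir
master.sources.r1.spoolDir = /opt/xiandian/
master.sources.r1.basenameHeader = true
master.sources.r1.channels = c1
# in-memory channel
master.channels.c1.type = memory
master.channels.c1.capacity = 1000
master.channels.c1.transactionCapacity = 100
# HDFS sink writing plain DataStream files named after the source file
master.sinks.k1.type = hdfs
master.sinks.k1.channel = c1
master.sinks.k1.hdfs.path = hdfs://master:8020/data/flume
master.sinks.k1.hdfs.filePrefix = %{basename}
master.sinks.k1.hdfs.fileType = DataStream
master.sinks.k1.hdfs.rollInterval = 0
master.sinks.k1.hdfs.rollSize = 0
master.sinks.k1.hdfs.rollCount = 0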

654. Deploy the Spark service component on the XianDian big data platform, then open a Linux shell and start the spark-shell terminal. Submit the process information of the started program as text in the answer box.
[root@master ~]# spark-shell
Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default
19/01/15 21:04:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
19/01/15 21:04:17 INFO spark.SecurityManager: Changing view acls to: root
19/01/15 21:04:17 INFO spark.SecurityManager: Changing modify acls to: root
19/01/15 21:04:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
19/01/15 21:04:18 INFO spark.HttpServer: Starting HTTP Server
19/01/15 21:04:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
19/01/15 21:04:18 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:42191
19/01/15 21:04:18 INFO util.Utils: Successfully started service 'HTTP class server' on port 42191.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.3
      /_/
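Besides the banner above, the running process can be confirmed from another shell with jps; spark-shell runs as a SparkSubmit JVM, so a sketch of that check (the grep pattern is just an illustration) would be:
[root@master ~]# jps -lm | grep -i sparksubmit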

655. After starting spark-shell, load the data "1,2,3,4,5,6,7,8,9,10" in Scala, multiply the data by 2 and find which of the doubled values are divisible by 3, and use the toDebugString method to view the RDD lineage. Submit the commands and the result as text in the answer box.

scala> val number=sc.parallelize(1 to 10)
number: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at :27
scala> val doublenum=number.map(_*2)
doublenum: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at :29
scala> val threenum=doublenum.filter(_%3==0)
threenum: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[2] at filter at :31
scala> threenum.collect
19/01/15 21:16:38 INFO spark.SparkContext: Starting job: collect at :34
19/01/15 21:16:38 INFO scheduler.DAGScheduler: Got job 0 (collect at :34) with 4 output partitions
19/01/15 21:16:38 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (collect at :34)
res0: Array[Int] = Array(6, 12, 18)
scala> threenum.toDebugString
res5: String =
(4) MapPartitionsRDD[2] at filter at :31 []
| MapPartitionsRDD[1] at map at :29 []
| ParallelCollectionRDD[0] at parallelize at :27 []

656. After starting spark-shell, load the key-value data ("A",1),("B",2),("C",3),("A",4),("B",5),("C",4),("A",3),("A",9),("B",4),("D",5) in Scala, sort it in ascending order by key, and group it by key. Submit the commands and the result as text in the answer box.
scala> val kv1=sc.parallelize(List(("A",1),("B",2),("C",3),("A",4),("B",5),("C",4),("A",3),("A",9),("B",4),("D",5)))
kv1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[3] at parallelize at :27
scala> kv1.sortByKey().collect
19/01/15 22:33:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
19/01/15 22:33:03 INFO scheduler.DAGScheduler: ResultStage 3 (collect at :30) finished in 0.140 s
19/01/15 22:33:03 INFO scheduler.DAGScheduler: Job 2 finished: collect at :30, took 0.347574 s
res6: Array[(String, Int)] = Array((A,1), (A,4), (A,3), (A,9), (B,2), (B,5), (B,4), (C,3), (C,4), (D,5))
scala> kv1.groupByKey().collect
19/01/15 22:34:33 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
19/01/15 22:34:33 INFO scheduler.DAGScheduler: ResultStage 5 (collect at :30) finished in 1.701 s
19/01/15 22:34:33 INFO scheduler.DAGScheduler: Job 3 finished: collect at :30, took 1.873657 s
res7: Array[(String, Iterable[Int])] = Array((D,CompactBuffer(5)), (A,CompactBuffer(1, 4, 3, 9)), (B,CompactBuffer(2, 5, 4)), (C,CompactBuffer(3, 4)))

657. After starting spark-shell, load the key-value data ("A",1),("B",3),("C",5),("D",4),("B",7),("C",4),("E",5),("A",8),("B",4),("D",5) in Scala, sort it in ascending order by key, and sum the values of identical keys. Submit the commands and the result as text in the answer box.
scala> val kv2=sc.parallelize(List(("A",1),("B",3),("C",5),("D",4),("B",7),("C",4),("E",5),("A",8),("B",4),("D",5)))
kv2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[8] at parallelize at :27
scala> kv2.sortByKey().collect
19/01/15 22:40:15 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks have all completed, from pool
19/01/15 22:40:15 INFO scheduler.DAGScheduler: ResultStage 8 (collect at :30) finished in 0.040 s
19/01/15 22:40:15 INFO scheduler.DAGScheduler: Job 5 finished: collect at :30, took 0.234400 s
res8: Array[(String, Int)] = Array((A,1), (A,8), (B,3), (B,7), (B,4), (C,5), (C,4), (D,4), (D,5), (E,5))
scala> kv2.reduceByKey(_+_).collect
19/01/15 22:41:32 INFO scheduler.DAGScheduler: ResultStage 10 (collect at :30) finished in 0.020 s
19/01/15 22:41:32 INFO scheduler.DAGScheduler: Job 6 finished: collect at :30, took 0.091262 s
res9: Array[(String, Int)] = Array((D,9), (A,9), (E,5), (B,14), (C,9))

658. After starting spark-shell, load the key-value data ("A",4),("A",2),("C",3),("A",4),("B",5),("C",3),("A",4) in Scala, deduplicate the data by key, and use the toDebugString method to view the RDD lineage. Submit the commands and the result as text in the answer box.
scala> val kv1=sc.parallelize(List(("A",4),("A",2),("C",3),("A",4),("B",5),("C",3),("A",4)))
kv1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at :27

scala> kv1.distinct.collect
19/01/16 05:02:08 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
19/01/16 05:02:08 INFO scheduler.DAGScheduler: ResultStage 1 (collect at :30) finished in 0.072 s
19/01/16 05:02:08 INFO scheduler.DAGScheduler: Job 0 finished: collect at :30, took 0.647297 s
res0: Array[(String, Int)] = Array((A,4), (B,5), (A,2), (C,3))
scala> kv1.toDebugString
res1: String = (4) ParallelCollectionRDD[0] at parallelize at :27 []

659. After starting spark-shell, load the two sets of key-value data ("A",1),("B",2),("C",3),("A",4),("B",5) and ("A",1),("B",2),("C",3),("A",4),("B",5) in Scala, and JOIN the two sets on the key. Submit the commands and the result as text in the answer box.
scala> val kv5=sc.parallelize(List(("A",1),("B",2),("C",3),("A",4),("B",5)))
kv5: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[4] at parallelize at :27
scala> val kv6=sc.parallelize(List(("A",1),("B",2),("C",3),("A",4),("B",5)))
kv6: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[5] at parallelize at :27
scala> kv5.join(kv6).collect
19/01/16 05:08:02 INFO scheduler.DAGScheduler: ResultStage 4 (collect at :32) finished in 0.070 s
19/01/16 05:08:02 INFO scheduler.DAGScheduler: Job 1 finished: collect at :32, took 0.173849 s
res2: Array[(String, (Int, Int))] = Array((A,(1,1)), (A,(1,4)), (A,(4,1)), (A,(4,4)), (B,(2,2)), (B,(2,5)), (B,(5,2)), (B,(5,5)), (C,(3,3)))

660. In spark-shell, use Scala to flatten the files in the sample-data directory with a flatMap statement, splitting all of the data on spaces, and then count the resulting tokens as key:value pairs (token as key, occurrence count as value). Submit the commands and the result as text in the answer box.
scala> var rdd4 = sc.textFile("hdfs://192.168.2.12:8020/sample-data/")
19/01/16 05:13:38 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 90.8 KB, free 511.0 MB)
19/01/16 05:13:38 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 29.9 KB, free 511.0 MB)
19/01/16 05:13:38 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:60121 (size: 29.9 KB, free: 511.1 MB)
19/01/16 05:13:38 INFO spark.SparkContext: Created broadcast 5 from textFile at :27
rdd4: org.apache.spark.rdd.RDD[String] = hdfs://192.168.2.12:8020/sample-data/ MapPartitionsRDD[10] at textFile at :27
scala> rdd4.toDebugString
scala> val words=rdd4.flatMap(_.split(" "))
words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[15] at flatMap at :29
scala> val wordscount=words.map(word => (word,1)).reduceByKey(_+_)
scala> wordscount.collect
scala> wordscount.toDebugString

661. In spark-shell, use Scala to load the data in the file search.txt, whose structure is described in the table below. After loading, filter out rows with fewer than 6 columns, as well as rows whose fourth column (rank) is 2 and whose fifth column (click order) is 1, then count the remaining rows. Submit the commands and the result as text in the answer box.
access time | user ID | query term | rank of the URL in the returned results | order in which the user clicked | URL clicked by the user
scala> val ardd = sc.textFile("/data/search.txt")
19/01/16 05:30:11 INFO storage.MemoryStore: Block broadcast_9 stored as values in memory (estimated size 349.6 KB, free 509.6 MB)
19/01/16 05:30:11 INFO storage.MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 29.9 KB, free 509.5 MB)
19/01/16 05:30:11 INFO storage.BlockManagerInfo: Added broadcast_9_piece0 in memory on localhost:60121 (size: 29.9 KB, free: 511.0 MB)
19/01/16 05:30:11 INFO spark.SparkContext: Created broadcast 9 from textFile at :27
ardd: org.apache.spark.rdd.RDD[String] = /data/search.txt MapPartitionsRDD[22] at textFile at :27
scala> val mapardd = ardd.map(_.split('\t')).filter(_.length >= 6)
mapardd: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[24] at filter at :29
scala> val filterardd = mapardd.filter(_(3).toString != "2").filter(_(4).toString != "1")
filterardd: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[26] at filter at :31
scala> filterardd.count
