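I. Pre-installation Preparation
1. Install and Configure Databases
See Cloudera Manager and Managed Service Data Stores. Install and configure a database as described in MySQL Database, Oracle Database, or External PostgreSQL Database.
2. (CDH 5 only) On RHEL 5 and CentOS 5, install Python 2.6 or 2.7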
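Before going further, it can help to confirm which Python a host actually provides. The following is a minimal sketch, not part of the Cloudera tooling: python_version_ok is a hypothetical helper that accepts the text printed by python -V and succeeds only for 2.6 or 2.7.

```sh
# Succeeds when the given version string names Python 2.6 or 2.7.
# $1: output of `python -V`, for example "Python 2.6.6"
python_version_ok() {
  case $1 in
    "Python 2.6"|"Python 2.6."*|"Python 2.7"|"Python 2.7."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use on a host (Python 2 prints its version to stderr):
#   python_version_ok "$(python -V 2>&1)" || echo "install Python 2.6 or 2.7"
```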
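II. Install the Cloudera Manager Server and Agents
The tarball is extracted into a subdirectory named after the Cloudera Manager version, for example /opt/cloudera-manager/cm-5.0/. This full path is needed later and is referred to as the tarball_root directory.
1. Perform Configuration Required by Single User Mode
For the single user mode configuration steps, see Single User Mode Requirements.
2. Create Users
Manually create the cloudera-scm account on all hosts: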
$ sudo useradd --system --home=/opt/cloudera-manager/cm-5.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
Make sure the --home path matches your environment. It must correspond to the directory where the tarball was extracted; for example, your --home might be /opt/cm-5.0/run/cloudera-scm-server.
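Because the --home value depends on where the tarball was extracted, deriving it from tarball_root avoids a mismatch. The sketch below is one possible wrapper, under stated assumptions: ensure_scm_user and the DRY_RUN variable are hypothetical names, and the function skips creation when the account already exists.

```sh
# Create the cloudera-scm system account with --home derived from tarball_root.
# $1: tarball_root, $2: account name (defaults to cloudera-scm)
# Set DRY_RUN=1 to print the command instead of running it (hypothetical switch).
ensure_scm_user() {
  tarball_root=$1
  user=${2:-cloudera-scm}
  if id "$user" >/dev/null 2>&1; then
    echo "user $user already exists"
    return 0
  fi
  # Same options as the useradd command shown above
  set -- useradd --system \
    --home="$tarball_root/run/cloudera-scm-server" \
    --no-create-home --shell=/bin/false \
    --comment "Cloudera SCM User" "$user"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "sudo $*"
  else
    sudo "$@"
  fi
}

# Example: DRY_RUN=1 ensure_scm_user /opt/cloudera-manager/cm-5.0
```

Running it once with DRY_RUN=1 on each host shows the exact command before anything is changed.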
3. Create the Cloudera Manager Server Local Data Storage Directory
- Create the directory /var/lib/cloudera-scm-server.
- Change the ownership of the directory so that the cloudera-scm user and group own it:
$ sudo mkdir /var/lib/cloudera-scm-server
$ sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-server
4. Configure Cloudera Manager Agents
- On each Cloudera Manager Agent host, edit tarball_root/etc/cloudera-scm-agent/config.ini so that the Agent points to the Cloudera Manager Server:
  - server_host: Name of the host where Cloudera Manager Server is running.
  - server_port: Port on the host where Cloudera Manager Server is running.
- By default, a tarball installation has a var subdirectory where state is stored. In a non-tarball installation, state is stored in /var.
- Cloudera recommends that you reconfigure the tarball installation to use an external directory as the /var equivalent (/var or any other directory outside the tarball) so that when you upgrade Cloudera Manager, the new tarball installation can access this state.
- Configure the installation to use an external directory for storing state by editing tarball_root/etc/default/cloudera-scm-agent and setting the CMF_VAR variable to the location of the /var equivalent.
- If you do not reuse the state directory between different tarball installations, duplicate Cloudera Manager Agent entries can occur in the Cloudera Manager database.
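The edits to config.ini and to the defaults file can be scripted so that every Agent host gets the same values. The following is a minimal sketch, assuming both files store settings as KEY=value lines (optionally prefixed with export); set_kv is a hypothetical helper, and the host name, port, and state directory in the usage comment are placeholders.

```sh
# Set KEY=value in a file: replace the first-class KEY=... line(s) if present,
# otherwise append the setting at the end of the file.
# $1: file, $2: key, $3: value
set_kv() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  if awk -v k="$key" -v v="$value" '
    {
      line = $0
      prefix = ""
      # Keep an optional leading "export " when rewriting the line
      if (index(line, "export ") == 1) { prefix = "export "; line = substr(line, 8) }
      if (index(line, k "=") == 1) { print prefix k "=" v; found = 1; next }
      print $0
    }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp"; then
    # Overwrite in place so ownership and permissions are preserved
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}

# Example (placeholder values; adjust to your environment):
#   set_kv tarball_root/etc/cloudera-scm-agent/config.ini server_host cm-server.example.com
#   set_kv tarball_root/etc/cloudera-scm-agent/config.ini server_port 7182
#   set_kv tarball_root/etc/default/cloudera-scm-agent CMF_VAR /var/cm-agent
```

Commented-out lines are left alone, so a setting that exists only as a comment is appended at the end of the file; check the result if the file is divided into sections.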
5. Customize the Cloudera Manager User and Directories
By default, Cloudera Manager uses the following directories:
- /var/log/cloudera-scm-headlamp
- /var/log/cloudera-scm-firehose
- /var/log/cloudera-scm-alertpublisher
- /var/log/cloudera-scm-eventserver
- /var/lib/cloudera-scm-headlamp
- /var/lib/cloudera-scm-firehose
- /var/lib/cloudera-scm-alertpublisher
- /var/lib/cloudera-scm-eventserver
- /var/lib/cloudera-scm-server
- Change ownership of existing directories:
- Use the chown command to change ownership of all existing directories to the Cloudera Manager user. If the Cloudera Manager username and group are cloudera-scm, to change the ownership of the headlamp log directory, you issue a command similar to the following:
$ sudo chown -R cloudera-scm:cloudera-scm /var/log/cloudera-scm-headlamp
- Use alternate directories:
- If the directories you plan to use do not exist, create them. For example, to create /var/cm_logs/cloudera-scm-headlamp for use by the cloudera-scm user, you can use the following commands:
mkdir -p /var/cm_logs/cloudera-scm-headlamp
chown cloudera-scm /var/cm_logs/cloudera-scm-headlamp
- Connect to the Cloudera Manager Admin Console.
- Select Clusters > Cloudera Management Service
- Select Scope > role name.
- Click the Configuration tab.
- Enter a term in the Search field to find the settings to be changed. For example, you might enter /var or directory.
- Update each value with the new locations for Cloudera Manager to use.
Note: The configuration property for the Cloudera Manager Server Local Data Storage Directory (default value is: /var/lib/cloudera-scm-server) is located on a different page:
- Select Administration > Settings.
- Type directory in the Search box.
- Enter the directory path in the Cloudera Manager Server Local Data Storage Directory property.
- Click Save Changes to commit the changes.
III. Configure a Database for the Cloudera Manager Server
IV. Create Parcel Directories
- On the Cloudera Manager Server host, create a parcel repository directory:
$ sudo mkdir -p /opt/cloudera/parcel-repo
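- Change the directory ownership:
$ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
If you use a custom user name, substitute the corresponding user and group names.
- On each host in the cluster, create a parcels directory: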
$ sudo mkdir -p /opt/cloudera/parcels
- Change the directory ownership:
$ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
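A quick ownership check on each host can catch a missed chown before the Agents start using these directories. This is a minimal sketch using only POSIX find; owned_by is a hypothetical helper name.

```sh
# Succeeds when the directory exists and is owned by the given user (name or numeric uid).
# $1: directory, $2: user
owned_by() {
  dir=$1
  user=$2
  [ -d "$dir" ] || return 1
  # -prune keeps find from descending; the path is printed only if the owner matches
  [ -n "$(find "$dir" -prune -user "$user" 2>/dev/null)" ]
}

# Example:
#   owned_by /opt/cloudera/parcels cloudera-scm || echo "fix ownership of /opt/cloudera/parcels"
```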
V. Start the Cloudera Manager Server
- Stop HDFS and MapReduce. See Stopping Services (CDH 4) or Stopping Services (CDH 5) for the commands to stop these services.
- Configure the init scripts not to start on boot. Use commands similar to those shown in Configuring init to Start Core Hadoop System Services (CDH 4) or Configuring init to Start Hadoop System Services (CDH 5), but disable the start on boot (for example, $ sudo chkconfig hadoop-hdfs-namenode off).
- As root:
$ sudo tarball_root/etc/init.d/cloudera-scm-server start
- As another user. If you run as another user, ensure the user you created for Cloudera Manager owns the location to which you extracted the tarball including the newly created database files. If you followed the earlier examples and created the directory /opt/cloudera-manager and the user cloudera-scm, you could use the following command to change ownership of the directory:
$ sudo chown -R cloudera-scm:cloudera-scm /opt/cloudera-manager
Once you have established ownership of directory locations, you can start Cloudera Manager Server using the user account you chose. For example, you might run the Cloudera Manager Server as cloudera-service. In this case, you have the following options:
- Run the following command:
$ sudo -u cloudera-service tarball_root/etc/init.d/cloudera-scm-server start
- Edit the configuration files so the script internally changes the user. Then run the script as root:
- Remove the following line from tarball_root/etc/default/cloudera-scm-server:
export CMF_SUDO_CMD=" "
- Change the user and group in tarball_root/etc/init.d/cloudera-scm-server to the user you want the server to run as. For example, to run as cloudera-service, change the user and group as follows:
USER=cloudera-service
GROUP=cloudera-service
- Run the server script as root:
$ sudo tarball_root/etc/init.d/cloudera-scm-server start
- To start the Cloudera Manager Server automatically at boot:
- Run the following commands on the Cloudera Manager Server host:
- RHEL-compatible and SLES
$ cp tarball_root/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
$ chkconfig cloudera-scm-server on
- Debian/Ubuntu
$ cp tarball_root/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
$ update-rc.d cloudera-scm-server defaults
- On the Cloudera Manager Server host, open the /etc/init.d/cloudera-scm-server file and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to tarball_root/etc/default.
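The CMF_DEFAULTS change can also be made without opening an editor. The following is a sketch under stated assumptions: point_defaults is a hypothetical helper, and it only does a literal replacement of the string ${CMF_DEFAULTS:-/etc/default} wherever it appears in the given init script.

```sh
# Replace the literal default ${CMF_DEFAULTS:-/etc/default} with <tarball_root>/etc/default.
# $1: path to the init script, $2: tarball_root
point_defaults() {
  script=$1
  tarball_root=$2
  tmp=$(mktemp) || return 1
  if awk -v new="$tarball_root/etc/default" '
    BEGIN { old = "${CMF_DEFAULTS:-/etc/default}" }
    {
      # Literal (non-regex) search so the braces and colon need no escaping
      i = index($0, old)
      if (i > 0) $0 = substr($0, 1, i - 1) new substr($0, i + length(old))
      print
    }
  ' "$script" > "$tmp"; then
    # Overwrite in place so the script keeps its mode and owner
    cat "$tmp" > "$script"
  fi
  rm -f "$tmp"
}

# Example (run as root; the tarball_root value is a placeholder):
#   point_defaults /etc/init.d/cloudera-scm-server /opt/cloudera-manager/cm-5.0
```

Keeping a backup copy of the init script before running this makes it easy to revert.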
VI. Start the Cloudera Manager Agents
- Run the following command on each Agent host to start the Cloudera Manager Agent:
$ sudo tarball_root/etc/init.d/cloudera-scm-agent start
When the Agent starts, it contacts the Cloudera Manager Server.
- If you are running single user mode, start Cloudera Manager Agent using the user account you chose. For example, to run the Cloudera Manager Agent as cloudera-scm, you have the following options:
- Run the following command:
$ sudo -u cloudera-scm tarball_root/etc/init.d/cloudera-scm-agent start
- Edit the configuration files so the script internally changes the user, and then run the script as root:
- Remove the following line from tarball_root/etc/default/cloudera-scm-agent:
export CMF_SUDO_CMD=" "
- Change the user and group in tarball_root/etc/init.d/cloudera-scm-agent to the user you want the Agent to run as. For example, to run as cloudera-scm, change the user and group as follows:
USER=cloudera-scm
GROUP=cloudera-scm
- Run the Agent script as root:
$ sudo tarball_root/etc/init.d/cloudera-scm-agent start
- To start the Cloudera Manager Agents automatically at boot:
- Run the following commands on each Agent host:
- RHEL-compatible and SLES
$ cp tarball_root/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
$ chkconfig cloudera-scm-agent on
- Debian/Ubuntu
$ cp tarball_root/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
$ update-rc.d cloudera-scm-agent defaults
- On each Agent, open the tarball_root/etc/init.d/cloudera-scm-agent file and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to tarball_root/etc/default.
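Since the Agent start command has to be run on every Agent host, it is convenient to generate the per-host commands from a host list. The sketch below only prints the commands so they can be reviewed first. It assumes SSH access and sudo rights on each host, neither of which is covered by this guide; agent_start_cmds and the host names are hypothetical.

```sh
# Print one remote start command per Agent host.
# $1: tarball_root on the Agent hosts, remaining arguments: host names
agent_start_cmds() {
  tarball_root=$1
  shift
  for host in "$@"; do
    printf 'ssh %s sudo %s/etc/init.d/cloudera-scm-agent start\n' "$host" "$tarball_root"
  done
}

# Example (placeholder hosts):
#   agent_start_cmds /opt/cloudera-manager/cm-5.0 host1.company.com host2.company.com
# Review the output, then pipe it to sh to execute.
```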
VII. Install Dependencies
- Red Hat-compatible
- chkconfig
- python (2.6 required for CDH 5)
- bind-utils
- psmisc
- libxslt
- zlib
- sqlite
- cyrus-sasl-plain
- cyrus-sasl-gssapi
- fuse
- portmap
- fuse-libs
- redhat-lsb
- SLES
- chkconfig
- python (2.6 required for CDH 5)
- bind-utils
- psmisc
- libxslt
- zlib
- sqlite
- cyrus-sasl-plain
- cyrus-sasl-gssapi
- fuse
- portmap
- python-xml
- libfuse2
- Debian/Ubuntu
- lsb-base
- psmisc
- bash
- libsasl2-modules
- libsasl2-modules-gssapi-mit
- zlib1g
- libxslt1.1
- libsqlite3-0
- libfuse2
- fuse-utils or fuse
- rpcbind
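The package lists above can be turned into a single install command per platform. This is a minimal sketch: the package names come from the lists, while the family labels (rhel, sles, debian), the dep_install_cmd name, and the choice of yum, zypper, and apt-get as installers are assumptions. For Debian/Ubuntu it picks fuse; substitute fuse-utils on releases that use that name.

```sh
# Print the install command for one platform family.
# $1: rhel | sles | debian (hypothetical labels)
dep_install_cmd() {
  case $1 in
    rhel)
      echo "yum install -y chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb"
      ;;
    sles)
      echo "zypper --non-interactive install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap python-xml libfuse2"
      ;;
    debian)
      echo "apt-get install -y lsb-base psmisc bash libsasl2-modules libsasl2-modules-gssapi-mit zlib1g libxslt1.1 libsqlite3-0 libfuse2 fuse rpcbind"
      ;;
    *)
      echo "unknown platform family: $1" >&2
      return 1
      ;;
  esac
}

# Example: sudo sh -c "$(dep_install_cmd rhel)"
```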
VIII. Start and Log In to the Cloudera Manager Admin Console
- Wait several minutes for the Cloudera Manager Server to complete its startup. To observe the startup process, run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
- In a web browser, enter http://Server host:7180, where Server host is the fully qualified domain name or IP address of the host where the Cloudera Manager Server is running. The login screen for Cloudera Manager Admin Console displays.
- Log into Cloudera Manager Admin Console. The default credentials are: Username: admin Password: admin. Cloudera Manager does not support changing the admin username for the installed account. You can change the password using Cloudera Manager after you run the installation wizard. Although you cannot change the admin username, you can add a new user, assign administrative privileges to the new user, and then delete the default admin account.
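Instead of watching the log by hand, a script can poll port 7180 until the Server answers. The following is a rough sketch, assuming curl is installed; wait_for_cm is a hypothetical helper, and any HTTP response at all is treated as "up".

```sh
# Poll a URL until it answers or the timeout elapses.
# $1: URL, $2: timeout in seconds (default 300), $3: polling interval in seconds (default 5)
wait_for_cm() {
  url=$1
  timeout=${2:-300}
  interval=${3:-5}
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # curl exits 0 as soon as the server returns any HTTP response
    if curl -s -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}

# Example (placeholder host name):
#   wait_for_cm http://cm-server.example.com:7180/ 600 10 && echo "Admin Console is reachable"
```

If the function times out, the Server log mentioned above is still the place to look for the cause.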
IX. Choose the Cloudera Manager Edition and Hosts
- When you start the Cloudera Manager Admin Console, the install wizard starts up. Click Continue to get started.
- Choose which edition to install:
- Cloudera Express, which does not require a license, but provides a limited set of features.
- Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed.
- Cloudera Enterprise with one of the following license types:
- Basic Edition
- Flex Edition
- Data Hub Edition
- If you elect Cloudera Enterprise, install a license:
- Click Upload License.
- Click the document icon to the left of the Select a License File text field.
- Navigate to the location of your license file, click the file, and click Open.
- Click Upload.
- Information is displayed indicating what the CDH installation includes. At this point, you can access online Help or the Support Portal. Click Continue to proceed with the installation.
- Click the Currently Managed Hosts tab.
- Choose the hosts to add to the cluster.
- Click Continue.
X. Choose the Software Installation Method and Install Software
- Install CDH and managed services using parcels:
- Use Parcels
- Choose the parcels to install. The choices depend on the repositories you have chosen; a repository can contain multiple parcels. Only the parcels for the latest supported service versions are configured by default.
You can add additional parcels for previous versions by specifying custom repositories. For example, you can find the locations of the previous CDH 4 parcels at http://archive.cloudera.com/cdh4/parcels/. Or, if you are installing CDH 4.3 and want to use policy-file authorization, you can add the Sentry parcel using this mechanism.
- To specify the parcel directory, specify the local parcel repository, add a parcel repository, or specify the properties of a proxy server through which parcels are downloaded, click the More Options button and do one or more of the following:
- Parcel Directory and Local Parcel Repository Path - Specify the location of parcels on cluster hosts and the Cloudera Manager Server host. If you change the default value for Parcel Directory and have already installed and started Cloudera Manager Agents, restart the Agents:
$ sudo service cloudera-scm-agent restart
- Parcel Repository - In the Remote Parcel Repository URLs field, click the button and enter the URL of the repository. The URL you specify is added to the list of repositories listed in the Configuring Cloudera Manager Server Parcel Settings page and a parcel is added to the list of parcels on the Select Repository page. If you have multiple repositories configured, you see all the unique parcels contained in all your repositories.
- Proxy Server - Specify the properties of a proxy server.
- Click OK.
- Select the release of Cloudera Manager Agent. You can choose either the version that matches the Cloudera Manager Server you are currently using or specify a version in a custom repository. If you opted to use custom repositories for installation files, you can provide a GPG key URL that applies for all repositories. Click Continue.
- Click Continue. Cloudera Manager installs the CDH and managed service parcels. During parcel installation, progress is indicated for the phases of the parcel installation process in separate progress bars. If you are installing multiple parcels, you see progress bars for each parcel. When the Continue button at the bottom of the screen turns blue, the installation process is completed. Click Continue.
- Click Continue. The Host Inspector runs to validate the installation and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Finish.
XI. Add Services
Use the Cloudera Manager wizard to configure and start CDH and the other managed services.
- In the first page of the Add Services wizard, choose the combination of services to install and whether to install Cloudera Navigator:
- Click the radio button next to the combination of services to install:
CDH 4:
- Core Hadoop - HDFS, MapReduce, ZooKeeper, Oozie, Hive, and Hue
- Core with HBase
- Core with Impala
- All Services - HDFS, MapReduce, ZooKeeper, HBase, Impala, Oozie, Hive, Hue, and Sqoop
- Custom Services - Any combination of services.
CDH 5:
- Core Hadoop - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, and Sqoop
- Core with HBase
- Core with Impala
- Core with Search
- Core with Spark
- All Services - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, Sqoop, HBase, Impala, Solr, Spark, and Key-Value Store Indexer
- Custom Services - Any combination of services.
- Some services depend on other services; for example, HBase requires HDFS and ZooKeeper. Cloudera Manager tracks dependencies and installs the correct combination of services.
- In a Cloudera Manager deployment of a CDH 4 cluster, the MapReduce service is the default MapReduce computation framework. Choose Custom Services to install YARN, or use the Add Service functionality to add YARN after installation completes.
Note: You can create a YARN service in a CDH 4 cluster, but it is not considered production ready.
- In a Cloudera Manager deployment of a CDH 5 cluster, the YARN service is the default MapReduce computation framework. Choose Custom Services to install MapReduce, or use the Add Service functionality to add MapReduce after installation completes.
Note: In CDH 5, the MapReduce service has been deprecated. However, the MapReduce service is fully supported for backward compatibility through the CDH 5 lifecycle.
- The Flume service can be added only after your cluster has been set up.
- If you have chosen Data Hub Edition Trial or Cloudera Enterprise, optionally select the Include Cloudera Navigator checkbox to enable Cloudera Navigator. See the Cloudera Navigator Documentation.
- Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. You can reassign role instances if necessary.
Click a field below a role to display a dialog containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the pageable hosts dialog.
The following shortcuts for specifying hostname patterns are supported:
- Range of hostnames (without the domain portion). Range definitions and the hosts they match:
  - 10.1.1.[1-4] matches 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
  - host[1-3].company.com matches host1.company.com, host2.company.com, host3.company.com
  - host[07-10].company.com matches host07.company.com, host08.company.com, host09.company.com, host10.company.com
- IP addresses
- Rack name
Click the View By Host button for an overview of the role assignment by hostname ranges.
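To make the range shortcut concrete, the sketch below expands a pattern into the host names it stands for. It illustrates the notation only and is not Cloudera Manager's own parser: expand_range is a hypothetical function, it handles a single [lo-hi] range, and it assumes the zero-padding width is taken from the lower bound, which agrees with the host[07-10] example.

```sh
# Expand one [lo-hi] range in a host pattern, one host per output line.
# $1: pattern such as host[07-10].company.com
expand_range() {
  pattern=$1
  case $pattern in
    *\[*-*\]*) ;;
    *) printf '%s\n' "$pattern"; return 0 ;;   # no range: print unchanged
  esac
  prefix=${pattern%%\[*}     # text before the first "["
  rest=${pattern#*\[}        # text after the first "["
  range=${rest%%\]*}         # "lo-hi"
  suffix=${rest#*\]}         # text after the first "]"
  lo=${range%%-*}
  hi=${range#*-}
  width=${#lo}               # padding width comes from the lower bound
  # Strip leading zeros so values such as 08 are not misread as octal
  i=$(printf '%s' "$lo" | sed 's/^0*//'); i=${i:-0}
  end=$(printf '%s' "$hi" | sed 's/^0*//'); end=${end:-0}
  while [ "$i" -le "$end" ]; do
    printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
    i=$((i + 1))
  done
}

# Example: expand_range 'host[07-10].company.com'
```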
- When you are satisfied with the assignments, click Continue.
- On the Database Setup page, configure settings for required databases:
- Enter the database host, database type, database name, username, and password for the database that you created when you set up the database.
- Click Test Connection to confirm that Cloudera Manager can communicate with the database using the information you have supplied. If the test succeeds in all cases, click Continue; otherwise, check and correct the information you have provided for the database and then try the test again. (For some servers, if you are using the embedded database, you will see a message saying the database will be created at a later step in the installation process.) The Review Changes screen displays.
- Review the configuration changes to be applied. Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. If you chose to add the Sqoop service, indicate whether to use the default Derby database or the embedded PostgreSQL database. If the latter, type the database name, host, and user credentials that you specified when you created the database.
Warning: Do not place DataNode data directories on NAS devices. When resizing an NAS, block replicas can be deleted, which will result in reports of missing blocks.
- Click Continue. The wizard starts the services.
- When all of the services are started, click Continue. You see a success message indicating that your cluster has been successfully started.
- Click Finish to proceed to the Home Page.
(Optional) Change the Cloudera Manager User
- Connect to the Cloudera Manager Admin Console.
- Do one of the following:
- Select Clusters > Cloudera Management Service > Cloudera Management Service.
- On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management Service link.
- Click the Configuration tab.
- Use the search box to find the property to change. For example, you might enter "system" to find the System User and System Group properties.
- Make any changes required to the System User and System Group to ensure Cloudera Manager uses the proper user accounts.
- Click Save Changes.
- Start the Cloudera Management Service roles.
Change the Default Administrator Password
- Right-click the logged-in username at the far right of the top navigation bar and select Change Password.
- Enter the current password and a new password twice, and then click Update.
Configure Oozie Data Purge Settings
If you added an Oozie service, you can change your Oozie configuration to control when data is purged in order to improve performance, cut down on database disk usage, or to keep the history for a longer period of time. Limiting the size of the Oozie database can also improve performance during upgrades. See Configuring Oozie Data Purge Settings Using Cloudera Manager.
Test the Installation
You can test the installation following the instructions in Testing the Installation.