Sqoop 2 Installation

# Decompress Sqoop distribution tarball
tar -xvf sqoop-<version>-bin-hadoop<hadoop-version>.tar.gz

# Move decompressed content to any location
mv sqoop-<version>-bin-hadoop<hadoop-version> /usr/lib/sqoop

# Change working directory
cd /usr/lib/sqoop

The Sqoop server uses environment variables to find Hadoop libraries. If the environment variable $HADOOP_HOME is set, Sqoop will look for jars in the following locations: $HADOOP_HOME/share/hadoop/common, $HADOOP_HOME/share/hadoop/hdfs, $HADOOP_HOME/share/hadoop/mapreduce and $HADOOP_HOME/share/hadoop/yarn. You can specify where the Sqoop server should look for the common, hdfs, mapreduce, and yarn jars independently with the $HADOOP_COMMON_HOME, $HADOOP_HDFS_HOME, $HADOOP_MAPRED_HOME and $HADOOP_YARN_HOME environment variables.

# Export HADOOP_HOME variable
export HADOOP_HOME=/...

# Or alternatively HADOOP_*_HOME variables
export HADOOP_COMMON_HOME=/...
export HADOOP_HDFS_HOME=/...
export HADOOP_MAPRED_HOME=/...
export HADOOP_YARN_HOME=/...
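
Before starting the server, it can help to confirm that the four jar directories listed above actually exist. The following is only a sketch for the $HADOOP_HOME case; check_hadoop_jar_dirs is a hypothetical helper name, not something shipped with Sqoop:

```shell
# Hypothetical helper (not part of Sqoop): report which of the jar
# directories searched under a given Hadoop home are missing.
check_hadoop_jar_dirs() {
  hadoop_home="$1"
  missing=0
  for sub in common hdfs mapreduce yarn; do
    dir="$hadoop_home/share/hadoop/$sub"
    if [ ! -d "$dir" ]; then
      echo "missing: $dir"
      missing=1
    fi
  done
  # Exit status 0 only when all four directories exist.
  return "$missing"
}

# Example: check_hadoop_jar_dirs "$HADOOP_HOME"
```

The function prints one line per missing directory and returns a non-zero status if any is absent.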

Hadoop configuration

The Sqoop server needs to impersonate users so that it accesses HDFS and other resources inside or outside the cluster as the user who started a given job, rather than as the user who runs the server. You need to configure Hadoop to explicitly allow this impersonation via the so-called proxyuser system. Create two properties in the core-site.xml file, hadoop.proxyuser.$SERVER_USER.hosts and hadoop.proxyuser.$SERVER_USER.groups, where $SERVER_USER is the user who will be running the Sqoop 2 server. In most scenarios, configuring * is sufficient. Please refer to the Hadoop documentation for details on how to use these properties.

Example fragment that needs to be present in core-site.xml when the server is running under the sqoop2 user:

<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>
</property>
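
A quick way to confirm that both properties are present is a plain text search of core-site.xml. This is a rough sketch: has_proxyuser_props is a hypothetical helper, and because it matches text rather than parsing XML it would also match a property that has been commented out.

```shell
# Hypothetical helper (not part of Hadoop or Sqoop): succeed only if both
# proxyuser property names for the given user appear in the given file.
has_proxyuser_props() {
  file="$1"
  user="$2"
  grep -qF "<name>hadoop.proxyuser.$user.hosts</name>" "$file" &&
    grep -qF "<name>hadoop.proxyuser.$user.groups</name>" "$file"
}

# Example (the path to core-site.xml depends on your installation):
# has_proxyuser_props /path/to/core-site.xml sqoop2 && echo "proxyuser configured"
```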

If you're running the Sqoop 2 server under a so-called system user (a user with an ID less than min.user.id, which is 1000 by default), then YARN will by default refuse to run Sqoop 2 jobs. You will need to add the name of the user running the Sqoop 2 server (most likely sqoop2) to the allowed.system.users property of container-executor.cfg. Please refer to the YARN documentation for further details.

Example fragment that needs to be present in container-executor.cfg when the server is running under the sqoop2 user:

allowed.system.users=sqoop2
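
To decide whether this setting is needed at all, you can compare the server user's numeric ID with the threshold. A minimal sketch, assuming the default min.user.id of 1000 unless another value is passed in; is_system_uid is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the given numeric user ID is below the
# min.user.id threshold (second argument, defaulting to 1000).
is_system_uid() {
  uid="$1"
  min_uid="${2:-1000}"
  [ "$uid" -lt "$min_uid" ]
}

# Example, using the standard id command to look up the sqoop2 user:
# is_system_uid "$(id -u sqoop2)" && echo "add sqoop2 to allowed.system.users"
```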

Third party jars

To add any third-party jars to the Sqoop server classpath, create a directory anywhere on the file system and export its location in the SQOOP_SERVER_EXTRA_LIB variable.

# Create directory for extra jars
mkdir -p /var/lib/sqoop2/

# Copy all your JDBC drivers to this directory
cp mysql-jdbc*.jar /var/lib/sqoop2/
cp postgresql-jdbc*.jar /var/lib/sqoop2/

# And finally export this directory to SQOOP_SERVER_EXTRA_LIB
export SQOOP_SERVER_EXTRA_LIB=/var/lib/sqoop2/

Sqoop doesn’t ship with any JDBC drivers due to incompatible licenses. You will need to use this mechanism to install all JDBC drivers that are needed.
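
After copying the drivers, it is worth confirming what actually ended up in the directory. The sketch below only lists the .jar files found directly inside a directory; list_extra_jars is a hypothetical helper and makes no claim about how the server itself scans the location:

```shell
# Hypothetical helper: print the names of the .jar files that sit directly
# inside the given directory (for example "$SQOOP_SERVER_EXTRA_LIB").
list_extra_jars() {
  dir="${1%/}"   # drop a trailing slash, if any
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  for jar in "$dir"/*.jar; do
    # When nothing matches, the pattern stays literal and is skipped here.
    if [ -f "$jar" ]; then
      basename "$jar"
    fi
  done
  return 0
}

# Example: list_extra_jars "$SQOOP_SERVER_EXTRA_LIB"
```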

Configuring PATH

All user- and administrator-facing shell commands are stored in the bin/ directory. It is recommended to add this directory to your $PATH for easier execution, for example:

PATH=$PATH:`pwd`/bin/

Configuring Server

Server configuration files are stored in the conf directory. The file sqoop_bootstrap.properties specifies which configuration provider should be used to load the configuration for the rest of the Sqoop server. The default value, PropertiesConfigurationProvider, should be sufficient.

The second configuration file, sqoop.properties, contains the remaining configuration properties that can affect the Sqoop server. The file is well documented, so check whether all configuration properties fit your environment. The defaults, or very little tweaking, should be sufficient in most common cases.

Repository Initialization

The metadata repository needs to be initialized before starting the Sqoop 2 server for the first time. Use the Upgrade tool to initialize the repository:

sqoop2-tool upgrade

You can verify that everything has been configured correctly using the Verify tool:

sqoop2-tool verify
...
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly

Error encountered:

Tool class org.apache.sqoop.tools.tool.UpgradeTool has failed.

Check @LOGDIR@/sqoop.log:

2018-09-20 15:42:15,123 ERROR [org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:67)] Failure in server initialization
org.apache.sqoop.common.SqoopException: MAPREDUCE_0002:Failure on submission engine initialization - Invalid Hadoop configuration directory (not a directory or permission issues): /etc/hadoop/conf/
        at org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine.initialize(MapreduceSubmissionEngine.java:97)
        at org.apache.sqoop.driver.JobManager.initialize(JobManager.java:257)
        at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:64)
        at org.apache.sqoop.tools.tool.VerifyTool.runTool(VerifyTool.java:36)
        at org.apache.sqoop.tools.ToolRunner.main(ToolRunner.java:72)
2018-09-20 15:42:15,130 ERROR [org.apache.sqoop.tools.tool.VerifyTool.runTool(VerifyTool.java:41)] Got exception while initializing/destroying Sqoop server:
java.lang.RuntimeException: Failure in server initialization
        at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:68)
        at org.apache.sqoop.tools.tool.VerifyTool.runTool(VerifyTool.java:36)
        at org.apache.sqoop.tools.ToolRunner.main(ToolRunner.java:72)
Caused by: org.apache.sqoop.common.SqoopException: MAPREDUCE_0002:Failure on submission engine initialization - Invalid Hadoop configuration directory (not a directory or permission issues): /etc/hadoop/conf/
        at org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine.initialize(MapreduceSubmissionEngine.java:97)
        at org.apache.sqoop.driver.JobManager.initialize(JobManager.java:257)
        at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:64)
        ... 2 more

Fix: change the following parameter in conf/sqoop.properties so that it points to your Hadoop configuration directory:

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/hadoop/apps/hadoop/etc/hadoop
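
The error message names two possible causes: the path is not a directory, or the server user lacks permission to access it. Both can be checked from the shell before rerunning the tool. This is a sketch only; check_hadoop_conf_dir is a hypothetical helper, and it should be run as the user that runs the Sqoop server for the permission check to be meaningful:

```shell
# Hypothetical helper: check that a path is a directory the current user
# can read and enter, mirroring the two causes named in the error message.
check_hadoop_conf_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir"
    return 1
  fi
  if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
    echo "permission issue: $dir"
    return 1
  fi
  echo "ok: $dir"
}

# Example: check_hadoop_conf_dir /home/hadoop/apps/hadoop/etc/hadoop
```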

 

Server Life Cycle

After installation and configuration, you can start the Sqoop server with the following command:

sqoop2-server start

You can stop the server using the following command:

sqoop2-server stop

By default, the Sqoop server daemon uses port 12000. You can set org.apache.sqoop.jetty.port in the configuration file conf/sqoop.properties to use a different port.
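
To see which port a given installation will use, you can read the property back from the file and fall back to the default when it is not set. A minimal sketch, assuming simple key=value lines with # comments; sqoop_port is a hypothetical helper, not a Sqoop command:

```shell
# Hypothetical helper: print the value of org.apache.sqoop.jetty.port from
# the given properties file, or 12000 when the property is not set.
sqoop_port() {
  props="$1"
  port=$(awk -v key="org.apache.sqoop.jetty.port" '
    /^[ \t]*#/ { next }    # skip comment lines
    index($0, "=") > 0 {
      name = substr($0, 1, index($0, "=") - 1)
      value = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (name == key) found = value    # the last matching line wins
    }
    END { print found }
  ' "$props")
  echo "${port:-12000}"
}

# Example: sqoop_port conf/sqoop.properties
```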

Client installation

Copy the Sqoop distribution artifact to the target machine and unpack it in the desired location. You can start the client with the following command:

sqoop2-shell

The client does not act as a Hadoop client, so it does not need to be installed on a node with Hadoop libraries and configuration files.

You can find more documentation for Sqoop shell in Command Line Shell.

Reference: http://sqoop.apache.org/docs/1.99.7/admin/Installation.html

 

 
