CitusDB, PostgreSQLs Use Hadoop Distribute Query - 3 : hadoop-sync install

Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:

  • Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
  • Postgres-XL的项目发起人Mason Sharp
  • pgpool的作者石井达夫(Tatsuo Ishii)
  • PG-Strom的作者海外浩平(Kaigai Kohei)
  • Greenplum研发总监姚延栋
  • 周正中(德哥), PostgreSQL中国用户会创始人之一
  • 汪洋,平安科技数据库技术部经理
  • ……


 
  • 2015年度PG大象会报名地址:http://postgres2015.eventdove.com/
  • PostgreSQL中国社区: http://postgres.cn/
  • PostgreSQL专业1群: 3336901(已满)
  • PostgreSQL专业2群: 100910388
  • PostgreSQL专业3群: 150657323


CitusDB本身支持表的分布式存储, 同时利用PostgreSQL 9.1的FDW功能, 还可以与Hadoop结合使用.
目前CitusDB只能与Hadoop 1.x配合使用. Hadoop 2.x支持多NameNode的结构暂时还不能配合使用.

【CitusDB + Hadoop 架构图如下 : 】
CitusDB, A PostgreSQL Use Hadoop Distribute Query - 3 : hadoop-sync install - 德哥@Digoal - The Heart,The World.
目前的版本可以在 http://www.citusdata.com/main-download下载.
下载到的实际上是修改过的PostgreSQL数据库版本.
CitusDB的运行还需要依赖两个插件, 如下 : 
1. 其中FDW用做PostgreSQL和HDFS的接口, 需要另外安装. ( https://github.com/citusdata/file_fdw)
2. 同时CitusDB还需要从Hadoop的NameNode中同步HDFS的元数据到CitusDB的Master节点(PostgreSQL DB). 
还需要将DataNode的元数据同步到CitusDB的worker节点(PostgreSQL DB). 也需要另外安装. ( https://github.com/citusdata/hadoop-sync)

【部署建议 : 】
1. hadoop-sync最好与Hadoop NameNode部署在同一主机.
2. CitusDB Master node最好与 Hadoop NameNode部署在同一主机.
3. CitusDB Worker node与Hadoop DataNode一对一部署在同一主机.

【本文主要讲一下CitusDB第二个插件hadoop-sync的安装方法 : 】
1. maven架构
CitusDB, A PostgreSQL Use Hadoop Distribute Query - 3 : hadoop-sync install - 德哥@Digoal - The Heart,The World.
 
下载maven, 用于安装hadoop-sync.

# wget http://mirror.bjtu.edu.cn/apache/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
# tar -zxvf apache-maven-3.0.5-bin.tar.gz
# mv apache-maven-3.0.5 /opt/


2. maven依赖java, javac, 所以还需要安装java, javac.

# yum install java
# java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.9) (rhel-1.36.1.11.9.el5_9-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
# which java
/usr/bin/java

# 安装javac.

# yum install java-devel
# javac -version
   javac 1.6.0_24
# which javac
  /usr/bin/javac


3. 安装maven
配置环境变量 : 
# vi ~/.bash_profile

export M2_HOME=/opt/apache-maven-3.0.5
export M2=$M2_HOME/bin
export MAVEN_OPTS="-Xms256m -Xmx512m"
export JAVA_HOME=/usr
export PATH=$M2:$JAVA_HOME/bin:$PATH:.

# . ~/.bash_profile
# mvn --version

Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 21:51:28+0800)
Maven home: /opt/apache-maven-3.0.5
Java version: 1.6.0_24, vendor: Sun Microsystems Inc.
Java home: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.18-274.el5", arch: "amd64", family: "unix"


4. 下载hadoop-sync

# wget --no-check-certificate https://github.com/citusdata/hadoop-sync/archive/master.zip -O master.zip
# unzip master.zip


5. 安装hadoop-sync
使用maven安装hadoop-sync时需要连接外网, 下载依赖包. 所以首先要确保主机能通外网.
如果不能通外网, 那么请自行下载依赖包.
# cd hadoop-sync-master
ll

total 12
-rw-r--r-- 1 root root 2533 Feb 16 04:56 pom.xml
-rw-r--r-- 1 root root 3047 Feb 16 04:56 README.md
drwxr-xr-x 3 root root 4096 Feb 16 04:56 src

# cat pom.xml 

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.citusdata.sync</groupId>
  <artifactId>hadoop-sync</artifactId>
  <version>0.1</version>
  <packaging>jar</packaging>
  <name>hadoop-sync</name>
  <url>http://maven.apache.org</url>

  <build>
        <plugins>
          <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                  <source>1.6</source>
                  <target>1.6</target>
                </configuration>
          </plugin>            
          <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                  <archive>
                        <manifest>
                          <addClasspath>true</addClasspath>
                          <classpathPrefix>lib/</classpathPrefix>
                          <mainClass>com.citusdata.sync.hdfs.HdfsSynchronizer</mainClass>
                        </manifest>
                  </archive>
                </configuration>
          </plugin>
          <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                  <execution>
                        <id>copy</id>
                        <phase>package</phase>
                        <goals>
                          <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                          <outputDirectory>${project.build.directory}/lib</outputDirectory>
                        </configuration>
                  </execution>
                </executions>
          </plugin>
        </plugins>
    <resources>
    <resource><directory>src/main/config</directory></resource>
    </resources>
  </build>

  <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>commons-configuration</groupId>
      <artifactId>commons-configuration</artifactId>
      <version>1.6</version>
      <scope>compile</scope>
    </dependency>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
          <version>1.2.17</version>
          <type>jar</type>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-core</artifactId>
          <version>1.0.4</version>
          <type>jar</type>
        </dependency>
        <dependency>
          <groupId>postgresql</groupId>
          <artifactId>postgresql</artifactId>
          <version>9.1-901.jdbc4</version>
        </dependency>
  </dependencies>

</project>

安装hadoop-sync
# mvn install

[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building hadoop-sync 0.1
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ hadoop-sync ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ hadoop-sync ---
[INFO] Compiling 7 source files to /root/hadoop-sync-master/target/classes
[INFO] 
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ hadoop-sync ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /root/hadoop-sync-master/src/test/resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ hadoop-sync ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ hadoop-sync ---
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-api/2.0.9/maven-plugin-api-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-api/2.0.9/maven-plugin-api-2.0.9.pom (2 KB at 1.4 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven/2.0.9/maven-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven/2.0.9/maven-2.0.9.pom (19 KB at 16.5 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-parent/8/maven-parent-8.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-parent/8/maven-parent-8.pom (24 KB at 9.7 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-booter/2.10/surefire-booter-2.10.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-booter/2.10/surefire-booter-2.10.pom (3 KB at 5.3 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-api/2.10/surefire-api-2.10.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-api/2.10/surefire-api-2.10.pom (3 KB at 4.0 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/maven-surefire-common/2.10/maven-surefire-common-2.10.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/maven-surefire-common/2.10/maven-surefire-common-2.10.pom (4 KB at 1.3 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/2.1/plexus-utils-2.1.pom
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/2.1/plexus-utils-2.1.pom (4 KB at 4.7 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/sonatype/spice/spice-parent/16/spice-parent-16.pom
Downloaded: http://repo.maven.apache.org/maven2/org/sonatype/spice/spice-parent/16/spice-parent-16.pom (9 KB at 7.3 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/sonatype/forge/forge-parent/5/forge-parent-5.pom
Downloaded: http://repo.maven.apache.org/maven2/org/sonatype/forge/forge-parent/5/forge-parent-5.pom (9 KB at 9.7 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-artifact/2.0.9/maven-artifact-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-artifact/2.0.9/maven-artifact-2.0.9.pom (2 KB at 2.8 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-project/2.0.9/maven-project-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-project/2.0.9/maven-project-2.0.9.pom (3 KB at 4.7 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-settings/2.0.9/maven-settings-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-settings/2.0.9/maven-settings-2.0.9.pom (3 KB at 3.6 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-model/2.0.9/maven-model-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-model/2.0.9/maven-model-2.0.9.pom (4 KB at 5.2 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-profile/2.0.9/maven-profile-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-profile/2.0.9/maven-profile-2.0.9.pom (3 KB at 3.6 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-artifact-manager/2.0.9/maven-artifact-manager-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-artifact-manager/2.0.9/maven-artifact-manager-2.0.9.pom (3 KB at 1.6 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-repository-metadata/2.0.9/maven-repository-metadata-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-repository-metadata/2.0.9/maven-repository-metadata-2.0.9.pom (2 KB at 3.3 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-registry/2.0.9/maven-plugin-registry-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-registry/2.0.9/maven-plugin-registry-2.0.9.pom (2 KB at 3.4 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-core/2.0.9/maven-core-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-core/2.0.9/maven-core-2.0.9.pom (8 KB at 6.8 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-parameter-documenter/2.0.9/maven-plugin-parameter-documenter-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-parameter-documenter/2.0.9/maven-plugin-parameter-documenter-2.0.9.pom (2 KB at 3.4 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-api/2.0.9/maven-reporting-api-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-api/2.0.9/maven-reporting-api-2.0.9.pom (2 KB at 3.1 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting/2.0.9/maven-reporting-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting/2.0.9/maven-reporting-2.0.9.pom (2 KB at 2.6 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-error-diagnostics/2.0.9/maven-error-diagnostics-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-error-diagnostics/2.0.9/maven-error-diagnostics-2.0.9.pom (2 KB at 3.0 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-descriptor/2.0.9/maven-plugin-descriptor-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-descriptor/2.0.9/maven-plugin-descriptor-2.0.9.pom (3 KB at 3.6 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-monitor/2.0.9/maven-monitor-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-monitor/2.0.9/maven-monitor-2.0.9.pom (2 KB at 2.2 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-toolchain/2.0.9/maven-toolchain-2.0.9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-toolchain/2.0.9/maven-toolchain-2.0.9.pom (4 KB at 6.0 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.3/maven-common-artifact-filters-1.3.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.3/maven-common-artifact-filters-1.3.pom (4 KB at 6.4 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-components/12/maven-shared-components-12.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-components/12/maven-shared-components-12.pom (10 KB at 6.6 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-parent/13/maven-parent-13.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-parent/13/maven-parent-13.pom (23 KB at 13.2 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/apache/6/apache-6.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/apache/6/apache-6.pom (13 KB at 14.9 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-container-default/1.0-alpha-9/plexus-container-default-1.0-alpha-9.pom
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-container-default/1.0-alpha-9/plexus-container-default-1.0-alpha-9.pom (2 KB at 2.1 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-booter/2.10/surefire-booter-2.10.jar
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/maven-surefire-common/2.10/maven-surefire-common-2.10.jar
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.3/maven-common-artifact-filters-1.3.jar
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-api/2.10/surefire-api-2.10.jar
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/2.1/plexus-utils-2.1.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-booter/2.10/surefire-booter-2.10.jar (34 KB at 19.7 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-api/2.0.9/maven-reporting-api-2.0.9.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.3/maven-common-artifact-filters-1.3.jar (31 KB at 18.0 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-api/2.0.9/maven-reporting-api-2.0.9.jar (10 KB at 5.2 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/maven-surefire-common/2.10/maven-surefire-common-2.10.jar (60 KB at 13.0 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-api/2.10/surefire-api-2.10.jar (158 KB at 14.3 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/2.1/plexus-utils-2.1.jar (220 KB at 13.6 KB/sec)
[INFO] No tests to run.
[INFO] Surefire report directory: /root/hadoop-sync-master/target/surefire-reports
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit3/2.10/surefire-junit3-2.10.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit3/2.10/surefire-junit3-2.10.pom (2 KB at 2.9 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-providers/2.10/surefire-providers-2.10.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-providers/2.10/surefire-providers-2.10.pom (3 KB at 2.1 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit3/2.10/surefire-junit3-2.10.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit3/2.10/surefire-junit3-2.10.jar (26 KB at 11.3 KB/sec)

-------------------------------------------------------
 T E S T S
-------------------------------------------------------

Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] --- maven-jar-plugin:2.3.2:jar (default-jar) @ hadoop-sync ---
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-archiver/2.4.2/maven-archiver-2.4.2.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-archiver/2.4.2/maven-archiver-2.4.2.pom (4 KB at 6.7 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-archiver/2.0.1/plexus-archiver-2.0.1.pom
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-archiver/2.0.1/plexus-archiver-2.0.1.pom (3 KB at 4.9 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/sonatype/spice/spice-parent/17/spice-parent-17.pom
Downloaded: http://repo.maven.apache.org/maven2/org/sonatype/spice/spice-parent/17/spice-parent-17.pom (7 KB at 11.8 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/sonatype/forge/forge-parent/10/forge-parent-10.pom
Downloaded: http://repo.maven.apache.org/maven2/org/sonatype/forge/forge-parent/10/forge-parent-10.pom (14 KB at 8.0 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0/plexus-utils-3.0.pom
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0/plexus-utils-3.0.pom (4 KB at 3.2 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-io/2.0.1/plexus-io-2.0.1.pom
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-io/2.0.1/plexus-io-2.0.1.pom (2 KB at 3.0 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-components/1.1.19/plexus-components-1.1.19.pom
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-components/1.1.19/plexus-components-1.1.19.pom (3 KB at 4.7 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus/3.0.1/plexus-3.0.1.pom
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus/3.0.1/plexus-3.0.1.pom (19 KB at 7.3 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/commons-lang/commons-lang/2.1/commons-lang-2.1.pom
Downloaded: http://repo.maven.apache.org/maven2/commons-lang/commons-lang/2.1/commons-lang-2.1.pom (10 KB at 8.7 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/maven-archiver/2.4.2/maven-archiver-2.4.2.jar
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-io/2.0.1/plexus-io-2.0.1.jar
Downloading: http://repo.maven.apache.org/maven2/commons-lang/commons-lang/2.1/commons-lang-2.1.jar
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0/plexus-utils-3.0.jar
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-archiver/2.0.1/plexus-archiver-2.0.1.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/maven-archiver/2.4.2/maven-archiver-2.4.2.jar (20 KB at 9.1 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-io/2.0.1/plexus-io-2.0.1.jar (57 KB at 20.5 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-archiver/2.0.1/plexus-archiver-2.0.1.jar (176 KB at 19.4 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0/plexus-utils-3.0.jar (221 KB at 14.8 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/commons-lang/commons-lang/2.1/commons-lang-2.1.jar (203 KB at 12.7 KB/sec)
[INFO] Building jar: /root/hadoop-sync-master/target/hadoop-sync-0.1.jar
[INFO] 
[INFO] --- maven-dependency-plugin:2.1:copy-dependencies (copy) @ hadoop-sync ---
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/1.5.1/plexus-utils-1.5.1.pom
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/1.5.1/plexus-utils-1.5.1.pom (3 KB at 4.0 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-sink-api/1.0-alpha-10/doxia-sink-api-1.0-alpha-10.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-sink-api/1.0-alpha-10/doxia-sink-api-1.0-alpha-10.pom (2 KB at 2.3 KB/sec)
...............................略...........................................
Downloaded: http://repo.maven.apache.org/maven2/plexus/plexus-utils/1.0.2/plexus-utils-1.0.2.jar (157 KB at 22.3 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/velocity/velocity/1.4/velocity-1.4.jar (353 KB at 13.5 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/velocity/velocity-dep/1.4/velocity-dep-1.4.jar (506 KB at 15.0 KB/sec)
[INFO] Copying ant-1.6.5.jar to /root/hadoop-sync-master/target/lib/ant-1.6.5.jar
[INFO] Copying commons-beanutils-1.7.0.jar to /root/hadoop-sync-master/target/lib/commons-beanutils-1.7.0.jar
[INFO] Copying commons-beanutils-core-1.8.0.jar to /root/hadoop-sync-master/target/lib/commons-beanutils-core-1.8.0.jar
[INFO] Copying commons-cli-1.2.jar to /root/hadoop-sync-master/target/lib/commons-cli-1.2.jar
[INFO] Copying commons-codec-1.4.jar to /root/hadoop-sync-master/target/lib/commons-codec-1.4.jar
[INFO] Copying commons-collections-3.2.1.jar to /root/hadoop-sync-master/target/lib/commons-collections-3.2.1.jar
[INFO] Copying commons-configuration-1.6.jar to /root/hadoop-sync-master/target/lib/commons-configuration-1.6.jar
[INFO] Copying commons-digester-1.8.jar to /root/hadoop-sync-master/target/lib/commons-digester-1.8.jar
[INFO] Copying commons-el-1.0.jar to /root/hadoop-sync-master/target/lib/commons-el-1.0.jar
[INFO] Copying commons-httpclient-3.0.1.jar to /root/hadoop-sync-master/target/lib/commons-httpclient-3.0.1.jar
[INFO] Copying commons-lang-2.4.jar to /root/hadoop-sync-master/target/lib/commons-lang-2.4.jar
[INFO] Copying commons-logging-1.1.1.jar to /root/hadoop-sync-master/target/lib/commons-logging-1.1.1.jar
[INFO] Copying commons-net-1.4.1.jar to /root/hadoop-sync-master/target/lib/commons-net-1.4.1.jar
[INFO] Copying hsqldb-1.8.0.10.jar to /root/hadoop-sync-master/target/lib/hsqldb-1.8.0.10.jar
[INFO] Copying junit-3.8.1.jar to /root/hadoop-sync-master/target/lib/junit-3.8.1.jar
[INFO] Copying log4j-1.2.17.jar to /root/hadoop-sync-master/target/lib/log4j-1.2.17.jar
[INFO] Copying jets3t-0.7.1.jar to /root/hadoop-sync-master/target/lib/jets3t-0.7.1.jar
[INFO] Copying kfs-0.3.jar to /root/hadoop-sync-master/target/lib/kfs-0.3.jar
[INFO] Copying commons-math-2.1.jar to /root/hadoop-sync-master/target/lib/commons-math-2.1.jar
[INFO] Copying hadoop-core-1.0.4.jar to /root/hadoop-sync-master/target/lib/hadoop-core-1.0.4.jar
[INFO] Copying jackson-core-asl-1.0.1.jar to /root/hadoop-sync-master/target/lib/jackson-core-asl-1.0.1.jar
[INFO] Copying jackson-mapper-asl-1.0.1.jar to /root/hadoop-sync-master/target/lib/jackson-mapper-asl-1.0.1.jar
[INFO] Copying core-3.1.1.jar to /root/hadoop-sync-master/target/lib/core-3.1.1.jar
[INFO] Copying jetty-6.1.26.jar to /root/hadoop-sync-master/target/lib/jetty-6.1.26.jar
[INFO] Copying jetty-util-6.1.26.jar to /root/hadoop-sync-master/target/lib/jetty-util-6.1.26.jar
[INFO] Copying jsp-2.1-6.1.14.jar to /root/hadoop-sync-master/target/lib/jsp-2.1-6.1.14.jar
[INFO] Copying jsp-api-2.1-6.1.14.jar to /root/hadoop-sync-master/target/lib/jsp-api-2.1-6.1.14.jar
[INFO] Copying servlet-api-2.5-20081211.jar to /root/hadoop-sync-master/target/lib/servlet-api-2.5-20081211.jar
[INFO] Copying servlet-api-2.5-6.1.14.jar to /root/hadoop-sync-master/target/lib/servlet-api-2.5-6.1.14.jar
[INFO] Copying oro-2.0.8.jar to /root/hadoop-sync-master/target/lib/oro-2.0.8.jar
[INFO] Copying postgresql-9.1-901.jdbc4.jar to /root/hadoop-sync-master/target/lib/postgresql-9.1-901.jdbc4.jar
[INFO] Copying jasper-compiler-5.5.12.jar to /root/hadoop-sync-master/target/lib/jasper-compiler-5.5.12.jar
[INFO] Copying jasper-runtime-5.5.12.jar to /root/hadoop-sync-master/target/lib/jasper-runtime-5.5.12.jar
[INFO] Copying xmlenc-0.52.jar to /root/hadoop-sync-master/target/lib/xmlenc-0.52.jar
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ hadoop-sync ---
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-digest/1.0/plexus-digest-1.0.pom
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-digest/1.0/plexus-digest-1.0.pom (2 KB at 1.9 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-components/1.1.7/plexus-components-1.1.7.pom
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-components/1.1.7/plexus-components-1.1.7.pom (5 KB at 5.8 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-digest/1.0/plexus-digest-1.0.jar
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-digest/1.0/plexus-digest-1.0.jar (12 KB at 10.4 KB/sec)
[INFO] Installing /root/hadoop-sync-master/target/hadoop-sync-0.1.jar to /root/.m2/repository/com/citusdata/sync/hadoop-sync/0.1/hadoop-sync-0.1.jar
[INFO] Installing /root/hadoop-sync-master/pom.xml to /root/.m2/repository/com/citusdata/sync/hadoop-sync/0.1/hadoop-sync-0.1.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:41.824s
[INFO] Finished at: Mon Mar 18 22:22:00 CST 2013
[INFO] Final Memory: 14M/247M
[INFO] ------------------------------------------------------------------------

# 安装完后的目录结构 : 

# ll
total 16
-rw-r--r-- 1 root root 2533 Feb 16 04:56 pom.xml
-rw-r--r-- 1 root root 3047 Feb 16 04:56 README.md
drwxr-xr-x 3 root root 4096 Feb 16 04:56 src
drwxr-xr-x 7 root root 4096 Mar 18 22:21 target
# cd target/
# ll -R
.:
total 44
drwxr-xr-x 3 root root  4096 Mar 18 22:18 classes
drwxr-xr-x 3 root root  4096 Mar 18 22:14 generated-sources
-rw-r--r-- 1 root root 24084 Mar 18 22:19 hadoop-sync-0.1.jar
drwxr-xr-x 2 root root  4096 Mar 18 22:21 lib
drwxr-xr-x 2 root root  4096 Mar 18 22:19 maven-archiver
drwxr-xr-x 2 root root  4096 Mar 18 22:22 surefire

./classes:
total 12
drwxr-xr-x 3 root root 4096 Mar 18 22:18 com
-rw-r--r-- 1 root root 1228 Mar 18 22:14 log4j.properties
-rw-r--r-- 1 root root  266 Mar 18 22:14 sync.properties

./classes/com:
total 4
drwxr-xr-x 3 root root 4096 Mar 18 22:18 citusdata

./classes/com/citusdata:
total 4
drwxr-xr-x 3 root root 4096 Mar 18 22:18 sync

./classes/com/citusdata/sync:
total 4
drwxr-xr-x 2 root root 4096 Mar 18 22:18 hdfs

./classes/com/citusdata/sync/hdfs:
total 60
-rw-r--r-- 1 root root 10403 Mar 18 22:18 CitusMasterNode.class
-rw-r--r-- 1 root root  5728 Mar 18 22:18 CitusWorkerNode.class
-rw-r--r-- 1 root root  6867 Mar 18 22:18 HdfsMasterNode.class
-rw-r--r-- 1 root root 12373 Mar 18 22:18 HdfsSynchronizer.class
-rw-r--r-- 1 root root  1332 Mar 18 22:18 HdfsSynchronizer$MetadataDifference.class
-rw-r--r-- 1 root root  2265 Mar 18 22:18 HdfsWorkerNode.class
-rw-r--r-- 1 root root   642 Mar 18 22:18 MinMaxValue.class
-rw-r--r-- 1 root root  1687 Mar 18 22:18 ShardPlacement.class

./generated-sources:
total 4
drwxr-xr-x 2 root root 4096 Mar 18 22:14 annotations

./generated-sources/annotations:
total 0

./lib:
total 16920
-rw-r--r-- 1 root root 1034049 Mar 18 22:21 ant-1.6.5.jar
-rw-r--r-- 1 root root  188671 Mar 18 22:21 commons-beanutils-1.7.0.jar
-rw-r--r-- 1 root root  206035 Mar 18 22:21 commons-beanutils-core-1.8.0.jar
-rw-r--r-- 1 root root   41123 Mar 18 22:21 commons-cli-1.2.jar
-rw-r--r-- 1 root root   58160 Mar 18 22:21 commons-codec-1.4.jar
-rw-r--r-- 1 root root  575389 Mar 18 22:21 commons-collections-3.2.1.jar
-rw-r--r-- 1 root root  298829 Mar 18 22:21 commons-configuration-1.6.jar
-rw-r--r-- 1 root root  143602 Mar 18 22:21 commons-digester-1.8.jar
-rw-r--r-- 1 root root  112341 Mar 18 22:21 commons-el-1.0.jar
-rw-r--r-- 1 root root  279781 Mar 18 22:21 commons-httpclient-3.0.1.jar
-rw-r--r-- 1 root root  261809 Mar 18 22:21 commons-lang-2.4.jar
-rw-r--r-- 1 root root   60686 Mar 18 22:21 commons-logging-1.1.1.jar
-rw-r--r-- 1 root root  832410 Mar 18 22:21 commons-math-2.1.jar
-rw-r--r-- 1 root root  180792 Mar 18 22:21 commons-net-1.4.1.jar
-rw-r--r-- 1 root root 3566844 Mar 18 22:21 core-3.1.1.jar
-rw-r--r-- 1 root root 3929148 Mar 18 22:21 hadoop-core-1.0.4.jar
-rw-r--r-- 1 root root  706710 Mar 18 22:21 hsqldb-1.8.0.10.jar
-rw-r--r-- 1 root root  136059 Mar 18 22:21 jackson-core-asl-1.0.1.jar
-rw-r--r-- 1 root root  270781 Mar 18 22:21 jackson-mapper-asl-1.0.1.jar
-rw-r--r-- 1 root root  405086 Mar 18 22:21 jasper-compiler-5.5.12.jar
-rw-r--r-- 1 root root   76698 Mar 18 22:21 jasper-runtime-5.5.12.jar
-rw-r--r-- 1 root root  377780 Mar 18 22:21 jets3t-0.7.1.jar
-rw-r--r-- 1 root root  539912 Mar 18 22:21 jetty-6.1.26.jar
-rw-r--r-- 1 root root  177131 Mar 18 22:21 jetty-util-6.1.26.jar
-rw-r--r-- 1 root root 1024680 Mar 18 22:21 jsp-2.1-6.1.14.jar
-rw-r--r-- 1 root root  134910 Mar 18 22:21 jsp-api-2.1-6.1.14.jar
-rw-r--r-- 1 root root  121070 Mar 18 22:21 junit-3.8.1.jar
-rw-r--r-- 1 root root   11981 Mar 18 22:21 kfs-0.3.jar
-rw-r--r-- 1 root root  489884 Mar 18 22:21 log4j-1.2.17.jar
-rw-r--r-- 1 root root   65261 Mar 18 22:21 oro-2.0.8.jar
-rw-r--r-- 1 root root  539705 Mar 18 22:21 postgresql-9.1-901.jdbc4.jar
-rw-r--r-- 1 root root  134133 Mar 18 22:21 servlet-api-2.5-20081211.jar
-rw-r--r-- 1 root root  132368 Mar 18 22:21 servlet-api-2.5-6.1.14.jar
-rw-r--r-- 1 root root   15010 Mar 18 22:21 xmlenc-0.52.jar

./maven-archiver:
total 4
-rw-r--r-- 1 root root 112 Mar 18 22:19 pom.properties

./surefire:
total 0


6. 配置hadoop-sync
# 配置文件范例
# cat sync.properties 

# HDFS related cluster configuration settings
HdfsMasterNodeName = localhost
HdfsMasterNodePort = 9000
HdfsWorkerNodePort = 50020

# CitusDB related cluster configuration settings
CitusMasterNodeName = localhost
CitusMasterNodePort = 5432
CitusWorkerNodePort = 9700


7. 用法

# java -jar hadoop-sync-0.1.jar table_name [--fetch-min-max]


【参考】
5. hadoop-sync readme

hadoop-sync
hadoop-sync is a Java application that synchronizes block metadata from the HDFS NameNode to the CitusDB master node. The application also creates a colocated foreign table on CitusDB worker nodes for each block in the Hadoop File System (HDFS). Optionally, the application also collects statistics about the underlying HDFS blocks.

hadoop-sync is idempotent. You can run it multiple times; it calculates metadata differences between the HDFS NameNode and CitusDB master node, and only syncs metadata for newly added or removed HDFS blocks. If no such blocks exist, the application does nothing.

hadoop-sync also leaves metadata on the CitusDB master node in a consistent state. When you run the application, it either completes by propagating enough metadata that makes recent HDFS blocks queriable; or it errors out by reverting recent metadata updates and leaves CitusDB in the queriable state it was in before hadoop-sync was run.

hadoop-sync assumes that you are already running a Hadoop cluster and you have your CitusDB databases colocated on this cluster. For step by step instructions on how to get started, please see our documentation page at http://citusdata.com/docs/sql-on-hadoop

Usage
Our documentation page explains in detail running the hadoop-sync application. The following section talks about a few details that relate to hadoop-sync's internals. Please note that these details may change over time as hadoop-sync is still in Beta form.

Now, assuming that you already have a Hadoop cluster with data loaded into it, CitusDB deployed on the Hadoop cluster, and a distributed foreign table created on the CitusDB master node, you synchronize metadata like the following:

java -jar target/hadoop-sync-0.1.jar table_name [--fetch-min-max]
The distributed foreign table name is a required argument, and hadoop-sync will only synchronize block metadata for that particular table. The fetch-min-max is an optional argument that triggers the collection of min/max statistics for each new HDFS block. Since this requires going over all records in the HDFS block, passing this argument notably increases synchronization times. On the other hand, with these statistics, CitusDB can perform optimizations such as partition or join pruning that dramatically reduce query execution times.

It is worth mentioning that these min/max statistics are only relevant in the context of distributed queries. For local queries, the worker nodes themselves already use PostgreSQL's statistics collector to optimize queries, if the foreign data wrapper supports data sampling.

It is also worth noting that hadoop-sync relies on a properties file to determine node names and port numbers that Hadoop and CitusDB clusters use. This sync.properties file is bundled with the JAR file; and you can change the default settings by using emacs or vi on the JAR bundle, or by copying the file from target/classes/sync.properties to the current directory and editing it.
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值