Common problems when configuring Hadoop on a Mac

kunjxl

Category: hadoop. Tags: mac, setup, hadoop, eclipse, MapReduce

Article link: https://blog.csdn.net/gongkunjxl/article/details/52563035

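I recently set up Hadoop in pseudo-distributed mode on a Mac and ran into a number of problems. The most stubborn one was that, once everything was configured, Eclipse would not recognize the hadoop-eclipse-plugin-2.6.4.jar I had added. I am new to Hadoop, so I am writing down what worked for me. Treat it as a reference only, and please be kind!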

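1. Install JDK 1.8

Download and install the JDK. I used jdk-8u101-macosx-x64.dmg, the latest release at the time. Pay close attention to the version: most of the errors that show up later in HDFS turned out to be caused by mismatched JDK and Eclipse versions, and I tried many combinations before the setup worked. Download it from the official site, http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html, and click through the installer. By default it installs to /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home.

Run java -version to confirm that the installation succeeded.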

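2. Download Hadoop 2.6.4

I configured Hadoop 2.6.4. For later versions you have to build the Eclipse plugin yourself; there are tutorials online that explain how. Download address: http://www-eu.apache.org/dist/hadoop/common/

Pick the version you need and download it.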

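3. Configure passwordless SSH login and the Hadoop environment variables

macOS ships with ssh, so nothing needs to be installed. You only need to set up passwordless login:

ssh-keygen -t rsa -P ""                              # generate the key pair
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys      # append the public key to the authorized keys file
ssh localhost                                        # test the login

If ssh localhost is refused, check that Remote Login is enabled under System Preferences > Sharing.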
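Running the cat ... >> line more than once appends the same key again, and sshd may ignore authorized_keys when its permissions are too open. A minimal sketch of an idempotent version follows; authorize_key is a hypothetical helper name, and its optional second argument exists only so the function can be tried against a scratch directory.

```shell
# Append a public key to authorized_keys only if it is not already there,
# and set the permissions sshd expects on the directory and the file.
authorize_key() {
  pub_key="$1"                  # e.g. ~/.ssh/id_rsa.pub
  ssh_dir="${2:-$HOME/.ssh}"    # hypothetical override, handy for a dry run
  [ -f "$pub_key" ] || { echo "no such key file: $pub_key" >&2; return 1; }
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$ssh_dir/authorized_keys" && chmod 600 "$ssh_dir/authorized_keys" || return 1
  # -x matches the whole line, -F treats the key as a fixed string
  if ! grep -qxF -e "$(cat "$pub_key")" "$ssh_dir/authorized_keys"; then
    cat "$pub_key" >> "$ssh_dir/authorized_keys"
  fi
}
```

Typical use would be authorize_key ~/.ssh/id_rsa.pub followed by ssh localhost.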

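To set the environment variables, run vim ~/.bash_profile and add the Hadoop variables (such as the JAVA_HOME, HADOOP_HOME and PATH settings shown for hadoop-env.sh in step 4), then run source ~/.bash_profile so that the changes take effect.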
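The variable list itself did not survive in this write-up, so here is a minimal sketch of what ~/.bash_profile could contain, assuming the same paths that are used for hadoop-env.sh in step 4. Adding sbin to PATH is an assumption on top of that; it only saves typing the full path to the start and stop scripts.

```shell
# Sketch of ~/.bash_profile entries; adjust the paths to your own install.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
# Alternative on macOS: export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
export HADOOP_HOME=/Users/gongkun/work/hadoop-2.6.4
# bin holds the hadoop and hdfs commands; sbin (an assumption, not in the
# original list) holds start-dfs.sh, start-yarn.sh and the stop scripts.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

After source ~/.bash_profile, echo $HADOOP_HOME should print the install path.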

4. Configure the pseudo-distributed environment

Only five files need to be changed.

hadoop-env.sh:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home

export HADOOP_HOME=/Users/gongkun/work/hadoop-2.6.4

export PATH=$PATH:/Users/gongkun/work/hadoop-2.6.4/bin

core-site.xml:

<configuration>
  <property>
    <name>hadoop.native.lib</name>
    <value>true</value>
    <description>Should native hadoop libraries, if present, be used.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/gongkun/work/hadoop-2.6.4/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- A single-node setup has only one DataNode, so 1 would also work
         here and avoids under-replicated block warnings. -->
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/Users/gongkun/work/hadoop-2.6.4/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/Users/gongkun/work/hadoop-2.6.4/hdfs/data</value>
  </property>
</configuration>

mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- The mapred.* entries below are legacy MRv1 settings; with the
       framework set to yarn they are most likely ignored. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

yarn-site.xml:

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
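A stray or missing angle bracket in any of these files is easy to overlook and only surfaces later as a confusing startup error. The sketch below checks that the four site files are well-formed XML using xmllint, which ships with macOS. check_site_xml is a hypothetical helper name, and the default directory assumes the Hadoop 2.x layout, where the configuration lives in $HADOOP_HOME/etc/hadoop.

```shell
# Report whether each site file parses as XML. A missing file is reported as
# BROKEN too; note that Hadoop 2.x ships only mapred-site.xml.template, so
# mapred-site.xml has to be created from it first.
check_site_xml() {
  conf_dir="${1:-$HADOOP_HOME/etc/hadoop}"
  status=0
  for name in core-site hdfs-site mapred-site yarn-site; do
    file="$conf_dir/$name.xml"
    if xmllint --noout "$file" 2>/dev/null; then
      echo "ok      $file"
    else
      echo "BROKEN  $file"
      status=1
    fi
  done
  return "$status"
}
```

Calling check_site_xml with no argument checks the installed configuration. It only verifies the XML syntax, not whether the property names or values make sense.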

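5. Start and test

Before formatting, delete whatever earlier runs left behind (logs, old HDFS data and so on). In pseudo-distributed mode, stale files can make the HDFS format step fail.

1. Format HDFS: $HADOOP_HOME/bin/hdfs namenode -format. I ran this every time I restarted Hadoop; strictly speaking it is only required before the first start, and reformatting erases whatever is stored in HDFS.
2. Start the NameNode and DataNode daemons: $HADOOP_HOME/sbin/start-dfs.sh
3. Start the ResourceManager and NodeManager daemons: $HADOOP_HOME/sbin/start-yarn.sh
   Steps 2 and 3 can also be run in one go with $HADOOP_HOME/sbin/start-all.sh. The matching stop command is $HADOOP_HOME/sbin/stop-all.sh.
4. Check the result with jps, which lists the running daemons.
5. Open localhost:50070 and localhost:8088 in a browser to check that both web interfaces respond.

The usual problem at this point is a port that is already in use. Run sudo lsof -i -P | grep -i "listen" to list the listening ports, then sudo kill -9 61342 to kill the offending process (the number is the process ID; 61342 is just the one I had).

6. Test the word-count program. hadoop fs -mkdir /input creates an input directory, and hadoop fs -put *.txt /input copies some English text files into it. Use hdfs dfs -ls [-R] <args> to check that the copy succeeded, then run the example: hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /input /output. View the result with hadoop fs -cat /output/part-r-00000. For more HDFS operations see http://blog.csdn.net/u010220089/article/details/45937417#t17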
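The cleanup, startup and port check described here can be sketched as a few shell functions. The function names are hypothetical, the directories are the ones configured in core-site.xml and hdfs-site.xml, and $HADOOP_HOME/logs is assumed to be the default log location. fresh_format is destructive, so it is only meant for a first start or a deliberate reset.

```shell
HADOOP_HOME="${HADOOP_HOME:-/Users/gongkun/work/hadoop-2.6.4}"

# Wipe leftovers from earlier attempts, then format the NameNode.
# This deletes everything stored in HDFS.
fresh_format() {
  rm -rf "${HADOOP_HOME:?}/tmp" \
         "${HADOOP_HOME:?}/hdfs/name" \
         "${HADOOP_HOME:?}/hdfs/data" \
         "${HADOOP_HOME:?}"/logs/*
  "$HADOOP_HOME/bin/hdfs" namenode -format
}

# Start HDFS, then YARN, then list the Java processes. A healthy run should
# show NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.
start_hadoop() {
  "$HADOOP_HOME/sbin/start-dfs.sh" &&
  "$HADOOP_HOME/sbin/start-yarn.sh" &&
  jps
}

# Print the PIDs listening on a TCP port, e.g. pids_on_port 50070.
# -t prints bare PIDs, -sTCP:LISTEN keeps only listening sockets.
pids_on_port() {
  lsof -nP -iTCP:"$1" -sTCP:LISTEN -t
}
```

If pids_on_port 50070 prints a PID that is not a Hadoop daemon, try a plain kill on it first and fall back to kill -9 only if the process does not exit.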

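Other people's explanations of this step are not very clear, and the blog posts online are vague, so here are the exact steps I used to test word count:

1. Create a directory: bin/hdfs dfs -mkdir /input

2. Upload a file: bin/hdfs dfs -put LICENSE.txt /input

3. Run the word-frequency example: hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /input /output

4. View the result: at localhost:50070, open the Utilities tab --> Browse the file system, go into output, then download the result file and open it.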
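The same run can be repeated from the terminal without going through the web page. The sketch below wraps it in a function; run_wordcount is a hypothetical name, and the part-r-* pattern assumes the usual naming of reducer output files. It removes the output directory first because a MapReduce job refuses to start when that directory already exists.

```shell
HADOOP_HOME="${HADOOP_HOME:-/Users/gongkun/work/hadoop-2.6.4}"
EXAMPLES_JAR="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar"

# Run the bundled wordcount example and print the first 20 result lines.
run_wordcount() {
  input="${1:-/input}"
  output="${2:-/output}"
  # -f keeps rm quiet when the directory does not exist yet
  "$HADOOP_HOME/bin/hdfs" dfs -rm -r -f "$output"
  "$HADOOP_HOME/bin/hadoop" jar "$EXAMPLES_JAR" wordcount "$input" "$output" &&
  "$HADOOP_HOME/bin/hdfs" dfs -cat "$output/part-r-*" | head -n 20
}
```

With the daemons running and LICENSE.txt uploaded to /input, calling run_wordcount with no arguments repeats steps 3 and 4.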

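6. Configure Hadoop in Eclipse

1. Download the right Eclipse version. The version matters a lot: many versions do not recognize the hadoop-eclipse-plugin-2.6.4.jar built for Hadoop. I used Eclipse 4.4 for Mac OS X, which is available from the official site. Install it.

2. Download hadoop-eclipse-plugin-2.6.4.jar and copy it into XXXX/eclipse/Eclipse.app/Contents/Eclipse/plugins. To find this directory, locate the Eclipse application, right-click it and choose Show Package Contents.

3. In Eclipse, choose Window ---> Open Perspective ---> Other and Map/Reduce should appear in the list. I set both ports to 9000, the same value that is configured in core-site.xml.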
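Copying the plugin can also be done from the terminal, which avoids digging through the application bundle in Finder. This is a minimal sketch: install_plugin is a hypothetical helper, and both arguments are placeholders for wherever the jar was downloaded and wherever Eclipse.app actually lives on your machine.

```shell
# Copy the plugin jar into the plugins directory inside the Eclipse bundle.
install_plugin() {
  jar="$1"    # path to hadoop-eclipse-plugin-2.6.4.jar
  app="$2"    # path to Eclipse.app
  plugins="$app/Contents/Eclipse/plugins"
  [ -f "$jar" ] || { echo "plugin jar not found: $jar" >&2; return 1; }
  [ -d "$plugins" ] || { echo "plugins directory not found: $plugins" >&2; return 1; }
  cp "$jar" "$plugins/"
}
```

If Map/Reduce still does not show up after a restart, starting Eclipse once with the -clean option, which clears its cached plugin data, may help.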

Here is the test code:

package wordCount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class wordCount {

  /**
   * The Mapper class.
   * INPUT:  KEY: line
   *         VALUE: text--lineString
   *
   * OUTPUT: KEY: text--token
   *         VALUE: count--1
   */
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    // Define the constant number ONE.
    private final static IntWritable one = new IntWritable(1);

    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      // Split the line on whitespace and emit (token, 1) for every token.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  /**
   * The Reducer class.
   * Calculates the count sum of each word.
   *
   * INPUT:  KEY: text
   *         VALUE: IntWritable
   *
   * OUTPUT: KEY: text
   *         VALUE: IntWritable
   */
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    // values are the counts collected for this key from each node or file
    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  /**
   * @param args the first argument is the input path,
   *             the second argument is the output path
   * @throws Exception
   */
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordCount <in> <out>");
      System.exit(2);
    }

    // Job.getInstance replaces the deprecated Job constructor.
    Job job = Job.getInstance(conf, "Word count");
    job.setJarByClass(wordCount.class);
    job.setJobName("Word count test");

    job.setMapperClass(TokenizerMapper.class);

    // The combiner uses the same method as the reducer and calculates the local count sum.
    job.setCombinerClass(IntSumReducer.class);

    // The reducer task gets the total count of each word.
    job.setReducerClass(IntSumReducer.class);

    // Set the output KEY and VALUE types.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Use the arguments to set the job input and output paths. I ran it with
    // /Users/gongkun/work/hadoop-2.6.4/README.txt as input and
    // /Users/gongkun/work/hadoop-2.6.4/result1.txt as output
    // (the output path must not exist before the job starts).
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
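As a quick cross-check of the job's output, the same counting can be reproduced locally with a shell pipeline. This is only a sketch for small files and not how Hadoop runs the job; local_wordcount is a hypothetical helper name. It splits on whitespace, much as StringTokenizer does by default, sorts so that equal words end up next to each other, and counts each run, printing word and count separated by a tab.

```shell
# Local stand-in for WordCount: tokenize (map), sort (shuffle), count (reduce).
local_wordcount() {
  tr -s '[:space:]' '\n' |       # one token per line
    grep -v '^$' |               # drop the empty line left by leading whitespace
    LC_ALL=C sort |              # byte order, so equal words are adjacent
    uniq -c |                    # count each run of identical words
    awk '{print $2 "\t" $1}'     # word<TAB>count
}

printf 'hello hadoop\nhello mac\n' | local_wordcount
# hadoop	1
# hello	2
# mac	1
```

Running local_wordcount < README.txt and comparing a few lines against the job's result file is an easy way to confirm that the job counted what you expected.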
