Hadoop Review (6) --- Interacting with MySQL via DBWritable, Rack Awareness, and Configuring High Availability (HA)

Latest recommended article published 2019-11-26 14:14:34

疯狂学习的白菜


Category columns: Big Data, Hadoop

Article link: https://blog.csdn.net/xcvbxv01/article/details/82426187

I. Interacting with MySQL via DBWritable
------------------------------------------------------------
0. Add the MySQL driver to pom.xml
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.17</version>
</dependency>

1. Prepare the database
-- use mydata;
-- create table words(id int primary key auto_increment , txt varchar(255));
-- desc words;
-- insert into words (txt) values ('hello world tom');
-- insert into words (txt) values ('hello1 world1 tom1');
-- insert into words (txt) values ('hello world tom2');
-- insert into words (txt) values ('hello3 world3 tom3');
-- alter table words add column name varchar(255);
-- update words set name = 'tomeslee';

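2. Write the Hadoop DBWritable
Note: when reading from and writing to the database, the column index (for example ppst.setString(2, name)) must match the position of that column (name) in the select query issued from the application's main function; otherwise the value read may come from a different column.
// DB serialization: write data to the database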
@Override
public void write(PreparedStatement ppst) throws SQLException {
ppst.setString(2, name);
ppst.setString(3,txt);
}

// DB deserialization: read data from the database
@Override
public void readFields(ResultSet rs) throws SQLException {
name = rs.getString(2);
txt = rs.getString(3);
}
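
The index rule in the note above can be checked with a small sketch. It does not use Hadoop or a database: the helper `columnIndex` is hypothetical and only reports the 1-based position of a column in a select list, which is the number that `rs.getString(n)` has to use.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop or JDBC. It only illustrates that
// ResultSet indices follow the order of the select list and start at 1.
class ColumnIndexDemo {

    // Returns the 1-based position of `column` in a comma-separated select list,
    // or -1 if the column is not in the list.
    static int columnIndex(String selectList, String column) {
        List<String> cols = Arrays.asList(selectList.trim().split("\\s*,\\s*"));
        int pos = cols.indexOf(column);
        return pos < 0 ? -1 : pos + 1;
    }
}
```

For the select list `id,name,txt`, `columnIndex("id,name,txt", "name")` gives 2 and `columnIndex("id,name,txt", "txt")` gives 3, which matches `rs.getString(2)` and `rs.getString(3)`.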

-----------------------------------------------------------------------------------------------------------------------

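import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class MyDBWritable implements Writable,DBWritable {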

    private String name;
    private String txt;

    private String word;
    private int count;

    public String getWord() {
        return word;
    }

    public void setWord(String word) {
        this.word = word;
    }

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getTxt() {
        return txt;
    }

    public void setTxt(String txt) {
        this.txt = txt;
    }

    // DB serialization: write the data into columns 1 and 2 of the stats table
    @Override
    public void write(PreparedStatement ppst) throws SQLException {
        ppst.setString(1, word);
        ppst.setInt(2,count);
    }

    // DB deserialization: read columns 2 and 3 of the query on the words table
    @Override
    public void readFields(ResultSet rs) throws SQLException {
        name = rs.getString(2);
        txt = rs.getString(3);
    }

    // Hadoop serialization and deserialization
    @Override
    public void write(DataOutput out) throws IOException {

        out.writeUTF(name);
        out.writeUTF(txt);
        out.writeUTF(word);
        out.writeInt(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        txt = in.readUTF();
        word = in.readUTF();
        count = in.readInt();
    }
}

3. Set up the DB in MyApp's main function; the input source is the database table words
// Database connection settings
DBConfiguration.configureDB(job.getConfiguration(),
"com.mysql.jdbc.Driver",
"jdbc:mysql://localhost:3306/mydata",
"mysql",
"mysql");

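// Input source. Note that the select order (id = 1, name = 2, txt = 3) must match the indices used in the DBWritable's serialization and deserialization methods; do not mix them up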
DBInputFormat.setInput(job,MyDBWritable.class,
"select id,name,txt from words ",
"select count(*) from words");

-------------------------------------------------------------------------------------------------------------------

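import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;

public class MyApp {

    public static void main(String [] args)
    {
        try {
            Configuration conf = new Configuration();

            Job job = Job.getInstance(conf);
            if(args.length > 1)
            {
                // Remove a leftover output directory if one was passed in (recursive delete)
                FileSystem.get(conf).delete(new Path(args[1]), true);
            }
            // Set up the job
            job.setJobName("WC");
            job.setJarByClass(MyApp.class);
            //FileOutputFormat.setOutputPath(job,new Path(args[1]));

            // Database connection settings
            DBConfiguration.configureDB(job.getConfiguration(),
                    "com.mysql.jdbc.Driver",
                    "jdbc:mysql://192.168.0.104:3306/mydata",
                    "mysql",
                    "mysql");

            // Input source: the words table
            DBInputFormat.setInput(job,MyDBWritable.class,
                    "select id,name,txt from words ",
                    "select count(*) from words");

            // Output target: the stats table in the database
            DBOutputFormat.setOutput(job,"stats","word","count");

            // Mapper and reducer
            job.setMapperClass(MyMapper.class);
            job.setReducerClass(MyReducer.class);
            // Job properties
            job.setNumReduceTasks(2);
            // Map output types (MyMapper emits Text / IntWritable)
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            // Final output types (MyReducer emits MyDBWritable / NullWritable)
            job.setOutputKeyClass(MyDBWritable.class);
            job.setOutputValueClass(NullWritable.class);
            // Submit the job and wait for it to finish
            job.waitForCompletion(true);

        } catch (Exception e) {
            e.printStackTrace();
        }

    }
}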

------------------------------------------------------------------------------------------------------

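import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Mapper class: splits the txt column into words and emits (word, 1)
 */
public class MyMapper extends Mapper<LongWritable, MyDBWritable,Text,IntWritable> {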

    @Override
    protected void map(LongWritable key, MyDBWritable value, Context context) throws IOException, InterruptedException {

        System.out.println("key: " + key + "valuename: " + value.getName() + "_" +value.getTxt());
        String v = value.getTxt();
        String [] strs = v.split(" ");
        for (String s : strs) {
            context.write(new Text(s), new IntWritable(1));
        }
    }
}

-------------------------------------------------------------------------------------------------

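import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Reducer class: sums the counts for each word and emits one MyDBWritable per word
 */
public class MyReducer extends Reducer<Text, IntWritable,MyDBWritable, NullWritable> {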

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

        int count = 0;
        Iterator<IntWritable> it = values.iterator();
        while(it.hasNext())
        {
            int i = it.next().get();
            count += i;
        }
        MyDBWritable db = new MyDBWritable();
        db.setWord(key.toString());
        db.setCount(count);
        context.write(db,NullWritable.get());
    }
}
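
To see what the job should produce without a cluster, the map and reduce logic above can be sketched locally. This is a minimal Hadoop-free sketch; the class `LocalWordCount` and its methods are hypothetical names, and the input is the four txt values inserted in step 1.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local, Hadoop-free sketch of the map and reduce steps above.
class LocalWordCount {

    // Splits each txt value on single spaces (as MyMapper does) and adds
    // one count per occurrence of a word (as MyReducer does).
    static Map<String, Integer> count(List<String> txtValues) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String txt : txtValues) {
            for (String word : txt.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The four txt values inserted into the words table in step 1.
    static List<String> sampleRows() {
        return Arrays.asList(
                "hello world tom",
                "hello1 world1 tom1",
                "hello world tom2",
                "hello3 world3 tom3");
    }
}
```

With the sample rows, `hello` and `world` are each counted twice and every other word once, so the job would be expected to insert ten rows into the output table.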

4. Write the MR results to MySQL; the output target is the database table stats
a. Prepare the stats table
create table stats(word varchar(255),count int);

b. Set the database output format
DBOutputFormat.setOutput(job,"stats","word","count");
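
The field names passed to setOutput determine the order of the placeholders that write(PreparedStatement) fills in. The sketch below is illustrative only and is not Hadoop's source code: `buildInsert` is a hypothetical helper that builds the kind of parameterized INSERT such a call implies, which shows why `word` is placeholder 1 and `count` is placeholder 2.

```java
// Illustrative sketch only, not Hadoop's implementation. It builds a
// parameterized INSERT with one "?" per field, in the order the fields are given.
class InsertStatementSketch {

    static String buildInsert(String table, String... fieldNames) {
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" (");
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < fieldNames.length; i++) {
            if (i > 0) {
                sql.append(",");
                marks.append(",");
            }
            sql.append(fieldNames[i]);   // column name, in call order
            marks.append("?");           // matching placeholder, index i + 1
        }
        return sql.append(") VALUES (").append(marks).append(")").toString();
    }
}
```

Calling `buildInsert("stats", "word", "count")` yields `INSERT INTO stats (word,count) VALUES (?,?)`, so `ppst.setString(1, word)` and `ppst.setInt(2, count)` line up with the two columns.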

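II. Rack awareness, one way to optimize a Hadoop cluster
---------------------------------------------
1. Default replica placement policy
With a replication factor of 3, the first replica is placed on the writer's node (local rack), the second on a node in a different rack, and the third on another node in that same remote rack

2. Setting up rack awareness
a. Write a custom implementation class
b. In [core-site.xml], set topology.node.switch.mapping.impl (named net.topology.node.switch.mapping.impl in Hadoop 2.x and later) to the fully qualified name of that class
c. Export it as a jar
d. Copy the jar to /soft/hadoop/share/hadoop/common/lib
e. Restart the NameNode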
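
The custom class in step a is essentially a lookup from host name to rack path. The sketch below shows only that lookup, without Hadoop on the classpath; in a real cluster the class would implement Hadoop's `org.apache.hadoop.net.DNSToSwitchMapping` interface, whose `resolve` method has the same shape. The class name and the host-to-rack assignments are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hadoop-free sketch of the lookup a custom rack mapping performs.
// The rack assignments below are assumptions, not taken from a real cluster.
class RackResolverSketch {

    static final String DEFAULT_RACK = "/default-rack";
    private final Map<String, String> hostToRack = new HashMap<>();

    RackResolverSketch() {
        hostToRack.put("s200", "/rack1");
        hostToRack.put("s300", "/rack1");
        hostToRack.put("s400", "/rack2");
    }

    // Returns one rack path per input name, in the same order.
    // Hosts that are not in the table fall back to DEFAULT_RACK.
    List<String> resolve(List<String> names) {
        List<String> racks = new ArrayList<>(names.size());
        for (String name : names) {
            racks.add(hostToRack.getOrDefault(name, DEFAULT_RACK));
        }
        return racks;
    }
}
```

A table hard-coded in the constructor keeps the sketch short; a real implementation would more likely load the mapping from a file so that adding a node does not require rebuilding the jar.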

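III. Configuring High Availability (HA)
---------------------------------------------------
1. Two states: active and standby

2. The active NameNode serves clients; the standby does not. The standby only keeps in sync with the active node's operations,
so that it can take over immediately if the active node goes down

3. Both NameNodes communicate with a group of JN (JournalNode) daemons, usually on three hosts. In HA mode the shared edit log is written to the JN group rather than kept only on the active node
(each NameNode still keeps its own fsimage locally); the group usually has three nodes so that the edits are replicated.

4. Every operation on the active node is written to the JNs, and the standby continuously reads from the JNs to stay in sync with the active node.
When a failure occurs, the standby first reads all remaining edit log entries from the JNs so that its state matches the active node's at the moment it went down, and then takes over as active

5. In HA mode, to let the standby take over quickly, DataNodes no longer send heartbeats and block reports only to the active NameNode; they send them to both the active and the standby

6. Only one NameNode may be active at any time, to avoid split-brain. Whichever node is active is the one that writes to the JNs

7. Hardware requirements:
a. NameNodes: two (active and standby), with equivalent hardware.
b. JNs: JournalNodes, at least three nodes and normally an odd number. They run the JN daemon (a lightweight process that
usually does not get its own host but runs on other nodes of the cluster). With N JNs, at most (N-1)/2 of them may fail.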
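
The (N-1)/2 rule in item 7b follows from requiring a strict majority of JNs to acknowledge each write. A minimal sketch of the arithmetic, with a hypothetical class name:

```java
// Quorum arithmetic for a group of N JournalNodes.
class JournalQuorum {

    // Number of JNs that must acknowledge a write: a strict majority.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // Maximum number of JNs that may fail while a majority remains:
    // (N - 1) / 2 using integer division.
    static int tolerated(int n) {
        return (n - 1) / 2;
    }
}
```

Three JNs tolerate one failure and five tolerate two, while four still tolerate only one. That is why an odd number of JNs is the usual choice: the extra even node adds cost without adding fault tolerance.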

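8. In HA mode there is no need to configure a separate secondary NameNode (2NN), because the standby node takes over the 2NN's work

9. Deploying HA
-. The two NameNodes should have the same configuration, including passwordless SSH to all nodes

a. Configure dfs.nameservices [hdfs-site.xml]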
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>

b. Configure the NameNodes nn1 and nn2 of the nameservice: dfs.ha.namenodes.[nameservice ID]
[hdfs-site.xml]


<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>

c. Configure the RPC address of each NameNode [hdfs-site.xml]
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>s100:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>s500:8020</value>
</property>

d. Configure the web UI addresses [hdfs-site.xml]
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>s100:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>s500:50070</value>
</property>

e. Configure the JN addresses of the NameNodes' shared edits directory [hdfs-site.xml]
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://s200:8485;s300:8485;s400:8485/mycluster</value>
</property>

f. Configure dfs.client.failover.proxy.provider.[nameservice ID]
The Java class that clients use to determine which NameNode is active.
[hdfs-site.xml]
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

g. Configure dfs.ha.fencing.methods
A list of scripts or Java classes used to fence the previously active NameNode during failover.
[hdfs-site.xml]
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>

<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/ubuntu/.ssh/id_rsa</value>
</property>

h. Point the default file system at the nameservice: fs.defaultFS
[core-site.xml]
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>

i. Configure the local path where the JNs store edits: dfs.journalnode.edits.dir
[hdfs-site.xml]
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/ubuntu/hadoop/journal</value>
</property>
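
The shared edits value configured in step e packs two things into one string: the list of JN addresses and a journal ID. The sketch below takes that string apart with plain string handling to make the layout explicit. It is not how Hadoop parses the value; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that splits a qjournal:// value into its parts.
// Expected layout: qjournal://host1:port1;host2:port2;.../journalId
class QJournalUriSketch {

    static final String PREFIX = "qjournal://";

    // Returns the JN addresses, e.g. ["s200:8485", "s300:8485", "s400:8485"].
    static List<String> journalNodes(String uri) {
        String rest = stripPrefix(uri);
        String authority = rest.substring(0, rest.indexOf('/'));
        return Arrays.asList(authority.split(";"));
    }

    // Returns the journal ID after the last address, e.g. "mycluster".
    static String journalId(String uri) {
        String rest = stripPrefix(uri);
        return rest.substring(rest.indexOf('/') + 1);
    }

    // Validates the overall shape and removes the scheme prefix.
    private static String stripPrefix(String uri) {
        if (!uri.startsWith(PREFIX) || uri.indexOf('/', PREFIX.length()) < 0) {
            throw new IllegalArgumentException(
                    "expected qjournal://host:port;.../journalId but got " + uri);
        }
        return uri.substring(PREFIX.length());
    }
}
```

For the value used above, the three addresses are s200:8485, s300:8485 and s400:8485, and the journal ID is mycluster, the same name as the nameservice.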

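10. Deployment notes
a. Modify the configuration files and distribute them to all nodes. Point the symbolic link on every node to hadoop_ha

b. Start the JN daemon on every JN node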
$>hadoop-daemon.sh start journalnode

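c. After the JNs are started, synchronize the metadata between the two NameNodes
-- For a new cluster, first format the file system on one of the NameNodes
-- When converting a non-HA cluster to HA, copy the image files and edit logs of the original nn1 (s100) to nn2 (s500).
After copying, run $>hdfs namenode -bootstrapStandby on nn2 (s500) to bootstrap the standby.
Note: nn1 (s100) must be running, and when asked whether to re-format, answer N.

d. On nn1 or nn2, run the following command to transfer the shared edit log to the JN nodes, then check that the JN nodes (s200, s300, s400) contain edit data.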
$>hdfs namenode -initializeSharedEdits

e. Start all nodes

11. Managing HA
$>hdfs haadmin -transitionToActive nn1 // switch nn1 to active
$>hdfs haadmin -transitionToStandby nn1 // switch nn1 to standby
$>hdfs haadmin -transitionToActive --forceactive nn2 // force nn2 to active
$>hdfs haadmin -failover nn1 nn2 // simulate a failure: fail over from nn1 to nn2
