SparkSQL + HBase + HDFS: Fully Encapsulating SQL (Part 2)

Note: Part 2 builds on Part 1 with a number of optimizations.

1. Overview

The framework reads the SQL statements stored in a script file and runs the business logic on the big-data platform. With this encapsulation, team members who have no Java (or other programming) background can implement their processing by writing nothing but SQL.

Note:

      Pros: once deployed, the rest of the team only needs to know how to write SQL.

      Cons: performance tuning is likewise limited to optimizing the SQL that is written.

2. Environment

Spark: spark-2.2.1-bin-hadoop2.7

3. Approach

The program reads a SQL script file from HDFS (it can also sit directly on the local Linux file system) and parses it to obtain everything SparkSQL needs: the source tables, the target table, the target table's column names, and the SELECT statement. The SELECT is executed with SparkSession.sql() and the result is saved to the target table. Along the way, the source tables and the columns to register are broadcast to every node.

Note: SparkSQL needs the column names (the header) when a table is registered, while HBase stores data as key-value pairs. If a field of a record has no value, the corresponding key simply does not exist for that record, and SparkSQL would then fail with an array-index-out-of-bounds error at query time. We therefore register only the columns we actually need and fill any missing ones with an empty string.
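For reference, a script consumed by this pipeline might look like the hypothetical example below (all table and column names are invented). Each statement is an INSERT ... SELECT terminated by a semicolon, the SELECT aliases must match the target table's column names, the column used as the HBase rowkey is expected to be named tgtid (see the insert code further down), and both -- and /** ... */ style comments are stripped before parsing:

-- load the order/user join result into tgt_order_wide
INSERT INTO tgt_order_wide (tgtid, applyno, name, create_time)
SELECT o.id AS tgtid, o.applyno, u.name, o.create_time /** aliases match the target columns */
FROM src_order o, src_user u
WHERE o.user_id = u.id;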

4. Key Techniques

1. SQL parsing

2. Broadcast variables

5. Implementation

(1) Main class: RealTimeTransaction

Description: the entry point.

public class RealTimeTransaction {
  
  
    public static void main(String[] args){
        // "deal": name of the folder that contains the SQL file
        // "m03_test": name of the SQL file
        // true: whether to insert the result into the target table
        SparkSQLUtils.sparkSQLMain("deal", "m03_test", true);
    }
}
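A typical way to launch the job would be spark-submit; the jar name below is a placeholder (not from the original article) and the class may need its fully qualified package name:

spark-submit \
  --class RealTimeTransaction \
  --master yarn \
  --deploy-mode cluster \
  realtime-sparksql.jar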

(2) SparkSQLUtils

Description: the core of the implementation. The contents of the configuration file are loaded into the ConfigEntity class.

import com.chs.realtime.entity.ConfigEntity;
  import net.sf.jsqlparser.JSQLParserException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.Cell;
  import org.apache.hadoop.hbase.CellUtil;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.JavaSparkContext;
  import org.apache.spark.api.java.function.FlatMapFunction;
  import org.apache.spark.api.java.function.VoidFunction;
  import org.apache.spark.broadcast.Broadcast;
  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.RowFactory;
  import org.apache.spark.sql.SparkSession;
  import org.apache.spark.sql.catalyst.TableIdentifier;
  import org.apache.spark.sql.catalyst.parser.ParseException;
  import org.apache.spark.sql.types.DataTypes;
  import org.apache.spark.sql.types.StructField;
  import org.apache.spark.sql.types.StructType;
  import scala.Tuple2;
  
  import java.util.*;
  
  /**
 * Created by chsong on 2018/2/9.
 */
  public class SparkSQLUtils {
  
  
    private static SparkSession getSparkSession(String appName) {
        SparkSession spark = SparkSession.builder()
                .appName(appName)
  //                .master("local[5]")
                .config("spark.sql.codegen", "true")
                .config("spark.sql.inMemoryColumnStorage.compressed", "true")
  //                .config("spark.sql.shuffle.partitions", "1000")
                //       .config("spark.kryoserializer.buffer.max", "1280")
                //       .config("spark.sql.inMemoryColumnarStorage.batchSize", "1000")
                .getOrCreate();
        return spark;
    }
  
  
    /**
     * Wraps the whole flow; see the comments inside the method for the individual steps.
     *
     * @param path     : name of the folder that contains the SQL file
     * @param filename : name of the SQL file
     * @param isInsert : whether to insert the result into the target table
     */
    public static void sparkSQLMain(String path, String filename, boolean isInsert) {
  
        //path to the SQL file
        String sqlPath = null;
        //load the configuration and resolve the file path
        ConfigEntity centity = DataConfigUtils.dataConfigInit();
        if (path == null) {
            sqlPath = centity.getFilePath() + "/" + filename + ".sql";
        } else {
            sqlPath = centity.getFilePath() + "/" + path + "/" + filename + ".sql";
        }
        //read the table/column mapping configuration file
        String sqlColumnFile = centity.getFilePath() + "/" + centity.getSqlColumnFile();
        Map<String, List<String>> sqlColumnMap = SQLAnalysisUtils.getSelectColumnMap(sqlColumnFile, centity.getFileSystemType());
        //commit batch size
        int commitNum = Integer.parseInt(centity.getCommitNum());
        System.out.println("======="+commitNum);
  
  
        /****************************** 1. Read the SQL file and parse it ******************/
        //read the whole file
        List<String> sqlTeamDetailedList = FileReadUtils.getFileValue(sqlPath, centity.getFileSystemType());
        //strip comments
        List<String> sqlTeamDetailedClearList = FileReadUtils.fileAnalysis(sqlTeamDetailedList);
        //split into one string per INSERT statement
        List<String> sqlTeamList = SQLAnalysisUtils.getSqlTeamList(sqlTeamDetailedClearList);
  
        List<Map> sqlConfigList = new ArrayList<>();
        try {
            //parse each INSERT ... SELECT statement
            for (String insertSQL : sqlTeamList) {
                Map insertSQLConfig = SQLAnalysisUtils.getSQLPares(insertSQL);
                //carry the commit batch size along in the same Map
                insertSQLConfig.put("commitNum", commitNum);
                sqlConfigList.add(insertSQLConfig);
            }

        } catch (JSQLParserException e) {
            e.printStackTrace();
        }
        //execute the statements with SparkSQL
        SparkSession sparkSession = getSparkSession(filename);
        JavaSparkContext javaSparkContext = JavaSparkContext.fromSparkContext(sparkSession.sparkContext());
  
        int tableNameNum = 0;
        List<String> tableNameList=new ArrayList<>();
        String selSQL="";
        String tgtTableName="";
        Broadcast<Map> tgtBroadcastMap=null;
        Dataset<Row> nameDf=null;
        for (Map insertSQLMap : sqlConfigList) {
            //the source table names, under the key set by the parser
            tableNameList = (ArrayList<String>) insertSQLMap.get("srcTableName");
            //the business SQL, i.e. only the SELECT part of the statement
            selSQL = (String) insertSQLMap.get("SQL");
            //the target table name
            tgtTableName = (String) insertSQLMap.get("tgtTableName");
            //broadcast the per-statement configuration
            tgtBroadcastMap = javaSparkContext.broadcast(insertSQLMap);
            //register every source table as a temporary view
//            try {
                for (String srcTableName : tableNameList) {
//                    TableIdentifier tableIdentifier = sparkSession.sessionState().sqlParser().parseTableIdentifier(srcTableName);
//                    if (!sparkSession.sessionState().catalog().tableExists(tableIdentifier)) {
                        SparkSQLUtils.initSparkSQLTable(sparkSession, javaSparkContext, srcTableName, sqlColumnMap);
//                    }
                }
//            } catch (ParseException e) {
//                e.printStackTrace();
//            }
             nameDf = sparkSession.sql(selSQL);
            //for every statement except the last, materialize its target table and
            //register it as a view so later statements can query it; the last
            //statement is handled below according to isInsert
            if (tableNameNum != sqlConfigList.size() - 1) {
                SparkSQLUtils.insertTgtData(tgtTableName, nameDf, tgtBroadcastMap);
                SparkSQLUtils.initSparkSQLTable(sparkSession, javaSparkContext, tgtTableName, sqlColumnMap);
            }
            tableNameNum +=1;
        }
        if (isInsert) {
            SparkSQLUtils.insertTgtData(tgtTableName, nameDf, tgtBroadcastMap);
        } else {
            nameDf.show();
        }
    }
  
  
    /**
     * Registers an HBase table as a Spark temporary view, exposing only the columns configured for it.
     *
     * @param sparkSession
     * @param javaSparkContext
     * @param tableNameStr   name of the HBase table to register
     * @param sqlColumnMap   table name -> list of columns to expose
     */
    public static void initSparkSQLTable(SparkSession sparkSession, JavaSparkContext javaSparkContext, String tableNameStr, Map<String, List<String>> sqlColumnMap) {
  
  
        //get the HBase configuration
        Configuration hbaseConfig = SparkHBaseUtils.getConfiguration();
  
        //tell TableInputFormat which table to scan
        hbaseConfig.set(TableInputFormat.INPUT_TABLE, tableNameStr);
  
        JavaPairRDD<ImmutableBytesWritable, Result> hbaseRDD = javaSparkContext.newAPIHadoopRDD(hbaseConfig, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
        List<String> columnList = sqlColumnMap.get(tableNameStr);
        Broadcast<List<String>> broadcastList = javaSparkContext.broadcast(columnList);
        //convert each HBase record into a Spark Row
        JavaRDD<Row> recordValuesRDD = hbaseRDD/*.repartition(50)*/.mapPartitions(new FlatMapFunction<Iterator<Tuple2<ImmutableBytesWritable, Result>>, Row>() {
            public Iterator<Row> call(Iterator<Tuple2<ImmutableBytesWritable, Result>> iter) throws Exception {
                //the columns configured for this source table (from the broadcast)
                List<String> srcColumnList = broadcastList.getValue();
  
                //rows built from this partition's records
                List<Row> recordList = new ArrayList<Row>();
  
                while (iter.hasNext()) {
                    Result result = iter.next()._2;
                    //values of one record, used to build its Row
                    List<String> columnList = new ArrayList<String>();
                    //column name -> value map for one record
                    Map columnMap = new HashMap();
                    Cell[] cells = result.rawCells();
  
                    for (Cell cell : cells) {
                        columnMap.put(new String(CellUtil.cloneQualifier(cell)), new String(CellUtil.cloneValue(cell)));
                    }
                    for (String columnStr : srcColumnList) {
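                        //a field with no value has no cell in HBase; fill it with an empty string so every Row has all columns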
                        if (!columnMap.containsKey(columnStr)) {
                            columnMap.put(columnStr, "");
                        }
                        columnList.add((String) columnMap.get(columnStr));
                    }
                    recordList.add((Row) RowFactory.create(columnList.toArray()));
                }
                return recordList.iterator();
            }
        });
  
        //build the schema fields (every column as a nullable string)
        List<StructField> structFields = new ArrayList<StructField>();
        for (String columnStr : columnList) {
            structFields.add(DataTypes.createStructField(columnStr, DataTypes.StringType, true));
        }
        //create the schema
        StructType schema = DataTypes.createStructType(structFields);
        Dataset employeesDataset = sparkSession.createDataFrame(recordValuesRDD, schema);
        employeesDataset.printSchema();
        //register the DataFrame as a Spark temporary view
        employeesDataset.createOrReplaceTempView(tableNameStr);
        System.out.println("开始cache表:" + tableNameStr);
        sparkSession.sqlContext().cacheTable(tableNameStr);
        System.out.println("完成cache表:" + tableNameStr);
    }
  
    /**
     * Writes the query result into the target HBase table in batches.
     *
     * @param tgtTableName
     * @param nameDf
     * @param broadcastMap
     */
    private static void insertTgtData(String tgtTableName, Dataset<Row> nameDf, Broadcast<Map> broadcastMap) {
        //optionally clear the target table first
//        SparkHBaseUtils.truncateData(tgtTableName);
        //write the data
        nameDf.javaRDD().coalesce(100).foreachPartition(new VoidFunction<Iterator<Row>>() {
            @Override
            public void call(Iterator<Row> rowIterator) throws Exception {
                Put recordPut = null;
                Map confMap = broadcastMap.getValue();
                //target table name
                String tableName = (String) confMap.get("tgtTableName");
                Table tgtTableClass = SparkHBaseUtils.getTable(tableName);
                //the target table's column names, as produced by the SQL parser
                List<String> columnList = (ArrayList<String>) confMap.get("tgtTableColumn");
                //commit batch size
                int commitNum = (int) confMap.get("commitNum");
                List<Put> recordPutList = new ArrayList<Put>();
                while (rowIterator.hasNext()) {
                    Row record = rowIterator.next();
                    recordPut = new Put(Bytes.toBytes(record.getAs("tgtid").toString()));
                    //populate every column except the rowkey column (tgtid)
                    for (String columnName : columnList) {
                        if (!columnName.equals("tgtid")) {
                            if (record.getAs(columnName) == null) {
                                recordPut.addColumn(Bytes.toBytes("data"), Bytes.toBytes(columnName), Bytes.toBytes(""));
                            } else {
                                recordPut.addColumn(Bytes.toBytes("data"), Bytes.toBytes(columnName), Bytes.toBytes(record.getAs(columnName).toString()));
                            }
                        }
                    }
                    recordPutList.add(recordPut);
                    if (recordPutList.size() == commitNum) {
                        SparkHBaseUtils.insertData(tgtTableClass, recordPutList);
                        recordPutList.clear();
                    }
                }
                SparkHBaseUtils.insertData(tgtTableClass, recordPutList);
            }
        });
    }
  
  
}

 

(3) ConfigEntity

Description: an entity class that holds the contents of the configuration file.

public class ConfigEntity {
    private String filePath;
    private String fileSystemType;
    private String sqlColumnFile;
    private String commitNum;
  
    public String getCommitNum() {
        return commitNum;
    }
  
    public void setCommitNum(String commitNum) {
        this.commitNum = commitNum;
    }
  
    public String getSqlColumnFile() {
        return sqlColumnFile;
    }
  
    public void setSqlColumnFile(String sqlColumnFile) {
        this.sqlColumnFile = sqlColumnFile;
    }
  
    public String getFilePath() {
        return filePath;
    }
  
    public void setFilePath(String filePath) {
        this.filePath = filePath;
    }
  
    public String getFileSystemType() {
        return fileSystemType;
    }
  
    public void setFileSystemType(String fileSystemType) {
        this.fileSystemType = fileSystemType;
    }
}

 

 

(4) Configuration reader: DataConfigUtils

Description: reads the configuration file and stores its contents in the entity class.

import com.chs.realtime.entity.ConfigEntity;
  
  import java.io.IOException;
  import java.util.Properties;
  
  /**
 * Created by chsong on 2018/2/27.
 */
  public class DataConfigUtils {
  
    public static ConfigEntity dataConfigInit(){
        Properties mqconfig = new Properties();
        ConfigEntity  ce = new ConfigEntity();
        try {
            mqconfig.load(DataConfigUtils.class.getClassLoader().getResourceAsStream("config.properties"));
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }
        String filePath =mqconfig.getProperty("filePath");
        String fileSystemType=mqconfig.getProperty("fileSystemType");
        String sqlColumnFile = mqconfig.getProperty("sqlColumnFile");
        String commitNum = mqconfig.getProperty("commitNum");
        ce.setFilePath(filePath);
        ce.setFileSystemType(fileSystemType);
        ce.setSqlColumnFile(sqlColumnFile);
        ce.setCommitNum(commitNum);
        return ce;
    }
}

 

 

(5) File reader: FileReadUtils

When reading data from HBase, the table names and column names loaded from the configuration file determine which columns get registered for each table. This class also contains the method that strips comments out of the SQL file (see the usage sketch after the class).

import net.sf.jsqlparser.JSQLParserException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  
  import java.io.*;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  
/**
 * Reader utilities for the SQL and configuration files.
 */
  public class FileReadUtils {
  
  
  
    /**
     * Reads the contents of a file from the given file system type (HDFS or OS).
     *
     * @param filePath
     * @param fileSystemType
     * @return
     */
    public static List<String> getFileValue(String filePath, String fileSystemType) {
        List<String> fileValueList = new ArrayList<>();
        Configuration conf = new Configuration();
        BufferedReader confBuff = null;
        try {
            InputStream confIn = null;
            if (fileSystemType.equals("HDFS")) {
                //read the file from HDFS
                FileSystem fs = FileSystem.get(conf);
                //path of the file to read
                Path confPath = new Path(filePath);
                //open an input stream on it
                confIn = fs.open(confPath);
            }
            if (fileSystemType.equals("OS")) {
                File readFile = new File(filePath);
                confIn = new FileInputStream(readFile);
            }

            //read line by line through a buffered reader
            confBuff = new BufferedReader(new InputStreamReader(confIn));
            String confStr = null;
            while ((confStr = confBuff.readLine()) != null) {
                fileValueList.add(confStr);
            }
            confBuff.close();
            confIn.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return fileValueList;
  
    }
  
    /**
     * Strips comments out of the lines of a SQL file.
     *
     * @param fileSrcList
     * @return
     */
    public static List<String> fileAnalysis(List<String> fileSrcList) {
        List<String> fileSQLList = new ArrayList<>();
        String keyOneStr = null;
        String keyTwoStr = null;
        for (String confStr : fileSrcList) {
            //strip comments from the line
            if (confStr.trim().length() != 0) {
                if (confStr.trim().length() >= 3) {
                    keyOneStr = confStr.trim().substring(0, 3);
                    keyTwoStr = confStr.trim().substring(0, 2);
                    if (!keyOneStr.equals("/**") && !keyTwoStr.equals("--")) {
                        //no comment on this line
                        if (confStr.indexOf("/**") == -1 && confStr.indexOf("--") == -1) {
                            fileSQLList.add(confStr);
                        }
                        //both markers present and /** comes first: keep the text before /**
                        if ((confStr.indexOf("/**") > -1 && confStr.indexOf("--") > -1) && (confStr.indexOf("/**") < confStr.indexOf("--"))) {
                            fileSQLList.add(confStr.substring(0, confStr.indexOf("/**")) + " ");
                        }
                        //both markers present and -- comes first: keep the text before --
                        if ((confStr.indexOf("/**") > -1 && confStr.indexOf("--") > -1) && (confStr.indexOf("/**") > confStr.indexOf("--"))) {
                            fileSQLList.add(confStr.substring(0, confStr.indexOf("--")) + " ");
                        }
                        //only a /** comment: keep the text before it
                        if (confStr.indexOf("/**") > -1 && confStr.indexOf("--") == -1) {
                            fileSQLList.add(confStr.substring(0, confStr.indexOf("/**")) + " ");
                        }
                        //only a -- comment: keep the text before it
                        if (confStr.indexOf("/**") == -1 && confStr.indexOf("--") > -1) {
                            fileSQLList.add(confStr.substring(0, confStr.indexOf("--")) + " ");
                        }
                    }
                } else {
                    fileSQLList.add(confStr + " ");
                }
            }
        }
        return fileSQLList;
    }
  
  
}
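A quick, self-contained sketch of the comment handling (the input lines are invented; it assumes FileReadUtils is on the classpath):

import java.util.Arrays;
import java.util.List;

public class FileReadUtilsDemo {
    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "-- this whole line is dropped",
                "INSERT INTO tgt_table (tgtid, name) /** trailing comment removed */",
                "SELECT id AS tgtid, name FROM src_table;");
        //only the SQL text survives: full-line comments are dropped and trailing comments are cut off
        List<String> cleaned = FileReadUtils.fileAnalysis(raw);
        cleaned.forEach(System.out::println);
    }
}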

 

 

(6) SQL parser: SQLAnalysisUtils

SQL statements are parsed with JSqlParser (see the sketch of the parse result after the class).

import com.chs.realtime.entity.ConfigEntity;
  import net.sf.jsqlparser.JSQLParserException;
  import net.sf.jsqlparser.parser.CCJSqlParserManager;
  import net.sf.jsqlparser.schema.Column;
  import net.sf.jsqlparser.statement.insert.Insert;
  import net.sf.jsqlparser.statement.select.*;
  import net.sf.jsqlparser.util.TablesNamesFinder;
  
  import java.io.StringReader;
  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  
  import static com.chs.realtime.dataset.comm.FileReadUtils.getFileValue;
  
  /**
 * Created by chsong on 2018/2/12.
 */
  public class SQLAnalysisUtils {
  
    /**
     * Parses an INSERT ... SELECT statement and extracts the target table, its columns,
     * the SELECT part, and the source table names.
     * @param sqlStr
     * @return
     * @throws JSQLParserException
     */
    public static Map getSQLPares(String sqlStr) throws JSQLParserException {
        //Map that collects everything extracted from the statement
        Map sqlPareMap = new HashMap();
        //parser instance
        CCJSqlParserManager pm = new CCJSqlParserManager();
        //parse the statement into an Insert object
        System.out.println("sqlStr ================ " + sqlStr);
        Insert insertStatement = (Insert) pm.parse(new StringReader(sqlStr));
        //name of the target table
        String insertTableName = insertStatement.getTable().getName();
        sqlPareMap.put("tgtTableName", insertTableName);
        //look up the target table's column names in HBase
        List<String> tgtTableColumnList = new ArrayList<>();
        tgtTableColumnList = SparkHBaseUtils.getTableColumn(insertTableName);
        //if the target table is still empty, fall back to the columns listed in the INSERT clause
        if (tgtTableColumnList == null || tgtTableColumnList.size() == 0) {
            tgtTableColumnList = getColumnName(insertStatement);
        }
        //column names of the target table
        sqlPareMap.put("tgtTableColumn", tgtTableColumnList);
        //the SELECT that follows the INSERT
        sqlPareMap.put("SQL", insertStatement.getSelect().toString());
        //re-parse the SELECT to extract the source table names
        Select selectStatement = (Select) pm.parse(new StringReader(insertStatement.getSelect().toString()));
        TablesNamesFinder tablesNamesFinder = new TablesNamesFinder();
        //names of the source tables referenced by the SELECT
        List<String> tableNameList = tablesNamesFinder.getTableList(selectStatement);
        sqlPareMap.put("srcTableName", tableNameList);

        return sqlPareMap;
    }
  
    /**
     * Returns the column names listed in the INSERT clause.
     * @param insertStatement
     * @return
     * @throws JSQLParserException
     */
    public static List<String> getColumnName(Insert insertStatement) throws JSQLParserException {
        List<String> columnNameList = new ArrayList<String>();
        List<Column> columns=insertStatement.getColumns();
        for(Column column:columns){
            columnNameList.add(column.getColumnName());
        }
        return columnNameList;
    }
  
    public  static Map<String,List<String>>  getSelectColumnMap(String confFile,String fileSystemType){
        String tableName =null;
        String[] columns =null;
        List<String> columnList = null;
        Map<String,List<String>> tableConfigMap = new HashMap<>();
        List<String> fileValueList = getFileValue(confFile,fileSystemType);
        for(String tableStr:fileValueList){
            tableName = tableStr.split("\\|")[0];
            columns = tableStr.split("\\|")[1].split(",");
            columnList = new ArrayList<>();
            for(String column:columns){
                columnList.add(column);
            }
            tableConfigMap.put(tableName,columnList);
        }
        return tableConfigMap;
    }
  
    /**
     * Reads a whole SQL file, strips comments, and returns the parse result as a Map.
     *
     * @param sqlPath
     * @param fileSystemType
     * @return
     */
    public static Map getSQLConfig(String sqlPath, String fileSystemType) {
        Map tableConfigMap = null;
        List<String> tableList = new ArrayList<String>();
        List<String> fileValueList = FileReadUtils.fileAnalysis(getFileValue(sqlPath,fileSystemType));
        String sqlValueStr = "";
        for (String lineStr : fileValueList) {
            sqlValueStr = sqlValueStr + lineStr + "\n";
        }
        try {
            tableConfigMap = SQLAnalysisUtils.getSQLPares(sqlValueStr);
        } catch (JSQLParserException e) {
            e.printStackTrace();
        }
        return tableConfigMap;
    }
  
    /**
     * Groups the cleaned lines into complete SQL statements, splitting on ';'.
     * @param sqlTeamDetailedList
     * @return
     */
    public static List<String> getSqlTeamList(List<String> sqlTeamDetailedList){
        List<String> sqlTeamList = new ArrayList<String>();
        String sqlStr = "";
        for (String lineStr : sqlTeamDetailedList) {
            sqlStr = sqlStr + lineStr + "\n";
            if(lineStr.toLowerCase().indexOf(";")!=-1){
                sqlTeamList.add(sqlStr);
                sqlStr = "";
            }
        }
        return sqlTeamList;
    }
}
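To make the parse result concrete, the hypothetical call below shows what getSQLPares puts into the Map for one statement (the table and column names are invented, and the method needs a reachable HBase because it first tries to read the target table's columns):

// call this from code that handles JSQLParserException
Map parsed = SQLAnalysisUtils.getSQLPares(
        "INSERT INTO tgt_order_wide (tgtid, applyno, name) "
        + "SELECT o.id AS tgtid, o.applyno, u.name FROM src_order o, src_user u WHERE o.user_id = u.id");
// parsed.get("tgtTableName")   -> "tgt_order_wide"
// parsed.get("tgtTableColumn") -> the columns read from HBase, or ["tgtid", "applyno", "name"] if the table is empty
// parsed.get("SQL")            -> the SELECT part of the statement
// parsed.get("srcTableName")   -> ["src_order", "src_user"]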
 
 

(7) HBase helper: SparkHBaseUtils

Contains the HBase connection handling, inserts, and related operations (a short usage sketch follows the class).

import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.*;
  import org.apache.hadoop.hbase.client.*;
  import org.apache.hadoop.hbase.filter.CompareFilter;
  import org.apache.hadoop.hbase.filter.FilterList;
  import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
  import org.apache.hadoop.hbase.filter.SubstringComparator;
  import org.apache.hadoop.hbase.util.Bytes;
  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  
  
  public class SparkHBaseUtils {
  
    private static final String QUORUM = "192.168.1.7";
    private static final String CLIENTPORT = "2181";
    private static Configuration conf = null;
    private static Connection conn = null;
  
  
    /**
     * Returns the single, lazily created Configuration instance.
     * @return
     */
    public static synchronized Configuration getConfiguration()
    {
        if (conf == null)
        {
            conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", QUORUM);
            conf.set("hbase.zookeeper.property.clientPort", CLIENTPORT);
            conf.addResource("hbase-site.xml");
            conf.addResource("core-site.xml");
        }
        return conf;
    }
  
    /**
     * Returns the single Connection instance, creating it if necessary.
     * @return
     */
    public static synchronized Connection getHConnection() {
  
        if (conf == null){
            getConfiguration();
        }
  
        if(conn == null || conn.isClosed())
        {
           
            try {
                conn  = ConnectionFactory.createConnection(conf);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
  
        return conn;
    }
  
  
    /**
     * Returns a table's column names by reading one row from its "data" column family.
     * @param tableName
     * @return
     */
    public static List<String> getTableColumn(String tableName){
        List<String> columnList = new ArrayList<>();
        Connection conn = SparkHBaseUtils.getHConnection();
        TableName hbaseTableName = TableName.valueOf(tableName);
        try {
            Table table = conn.getTable(hbaseTableName);
  
            ResultScanner scan = table.getScanner(Bytes.toBytes("data"));
  
            Result result = scan.next();
            if (result != null) {
                Cell[] cells = result.rawCells();
                for (Cell cell : cells) {
                    columnList.add(new String(CellUtil.cloneQualifier(cell)));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
  
        return columnList;
    }
  
    /**
     * Truncates the given table.
     * @param tableName
     */
    public static void truncateData(String tableName){
        Connection conn = SparkHBaseUtils.getHConnection();
        TableName hbaseTableName = TableName.valueOf(tableName);
        try {
            //admin interface for table metadata operations
            Admin admin = conn.getAdmin();
            //disable the table first
            admin.disableTable(hbaseTableName);
            //then truncate it, preserving the region splits
            admin.truncateTable(hbaseTableName, true);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
  
    /**
     * Inserts the given Puts into the named table, one at a time.
     * @param tableName
     * @param putList
     */
    public static void insertData(String tableName,List<Put> putList){
        Connection conn = SparkHBaseUtils.getHConnection();
        TableName hbaseTableName = TableName.valueOf(tableName);
        try {
            Table table = conn.getTable(hbaseTableName);
            for(Put put:putList){
                table.put(put);
            }
  
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
  
    /**
     * Inserts the given Puts through an already opened Table.
     * @param table
     * @param putList
     */
    public static void insertData( Table table,List<Put> putList){
  
        try {
            table.put(putList);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    public static Table getTable(String tableName){
        Connection conn = SparkHBaseUtils.getHConnection();
        TableName hbaseTableName = TableName.valueOf(tableName);
        try {
            Table table = conn.getTable(hbaseTableName);
            return table;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
  
    // close the shared connection
    public static void connClose() {
        try {
            if (null != conn)
                conn.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
  
    }
}
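A minimal usage sketch (the table name, rowkey, and value are invented; "data" is the column family used throughout the code above):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.Collections;

public class SparkHBaseUtilsDemo {
    public static void main(String[] args) {
        //open the table through the shared connection
        Table table = SparkHBaseUtils.getTable("tgt_order_wide");
        //build a single Put: rowkey plus one column in the "data" family
        Put put = new Put(Bytes.toBytes("rowkey-0001"));
        put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("name"), Bytes.toBytes("demo"));
        //write it and close the shared connection
        SparkHBaseUtils.insertData(table, Collections.singletonList(put));
        SparkHBaseUtils.connClose();
    }
}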

 

(8) Configuration file:

filePath=/home/app/sqlfile
fileSystemType=HDFS

#filePath=./sqlfile/
#fileSystemType=OS
sqlColumnFile=columns.properties

#commit every 1000 rows when inserting into the target table
commitNum=1000

 

(9) Table-to-column mapping file (columns.properties)

Each line holds a table name, a '|' separator, and the comma-separated list of columns to register for that table:

Table_name1|id,create_time,update_time,name
table_name2|id,create_time,update_time,applyno,status

 

 

