Concurrent Table Modification in CarbonData

I have been using CarbonData recently, and the business logic requires several threads to write into the same table at the same time, i.e., concurrent writes.
The official documentation has very little to say about concurrent operations on a table:
carbon.lock.type

This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation:
- LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently.
- HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking.
A rough paraphrase:
This property specifies the type of lock acquired during concurrent operations on a table. There are two options, LOCALLOCK and HDFSLOCK:
- LOCALLOCK is created as a file on the local file system. It is appropriate when only one Spark driver (thrift server) runs on the machine and no other Spark application runs concurrently (as I understand it: concurrent jobs submitted from a single machine).
- HDFSLOCK is created as a file on HDFS. It can be used when multiple Spark applications run concurrently and no ZooKeeper manages the cluster (i.e., concurrent jobs submitted from multiple client machines).
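In code, picking the lock type comes down to a single property that must be set before the CarbonSession is created. A minimal sketch (the constant CarbonCommonConstants.LOCK_TYPE resolves to carbon.lock.type; the full builder code follows below):

import org.apache.carbondata.core.constants.CarbonCommonConstants;
import org.apache.carbondata.core.util.CarbonProperties;

// Set the lock type before getOrCreateCarbonSession() is called,
// so that the very first lock already uses the HDFS implementation.
CarbonProperties.getInstance()
        .addProperty(CarbonCommonConstants.LOCK_TYPE, "HDFSLOCK");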

Below is the Java code used to build a SparkSession:

import org.apache.carbondata.core.constants.CarbonCommonConstants;
import org.apache.carbondata.core.util.CarbonProperties;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.configuration.PropertiesConfiguration;
import org.apache.commons.configuration.reloading.FileChangedReloadingStrategy;
import org.apache.commons.lang3.StringUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.CarbonSession;
import org.apache.spark.sql.SparkSession;

    // Initialization: read the connection settings from the configuration file
    private static PropertiesConfiguration cfg = null;
    private static String sparkIP;
    private static String hdfsIP;
    private static String hadoopUserName;
    private static String dataPath;
    private static String maxCores;

    static {
        try {
            cfg = new PropertiesConfiguration("cloud.properties");
            // Reload the configuration automatically when the file changes;
            // this must only run after cfg has been created successfully.
            cfg.setReloadingStrategy(new FileChangedReloadingStrategy());
            sparkIP = cfg.getString("spark.ip");
            hdfsIP = cfg.getString("hdfs.ip");
            hadoopUserName = cfg.getString("hadoop.user.name");
            dataPath = cfg.getString("data.path");
            maxCores = cfg.getString("spark.cores.max");
        } catch (ConfigurationException e) {
            e.printStackTrace();
        }
    }

    /**
     * Obtain the SparkSession used to operate on the database.
     * @return the shared (singleton) SparkSession
     */
    public static SparkSession getSparkSession() {
        // 1. HADOOP_USER_NAME must match the owner of the remote files
        System.setProperty("HADOOP_USER_NAME", hadoopUserName);
        // 2. Create the SparkConf object and set the relevant configuration
        SparkConf conf = new SparkConf()
                .setAppName("carbon")
                .setMaster("spark://" + sparkIP + ":7077")
                .set("spark.cores.max", maxCores);

        // 3. Set the lock type
        CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE, "HDFSLOCK");
        // 4. Build the SparkSession (a CarbonSession under the hood)
        SparkSession sparkSession = CarbonSession
                .CarbonBuilder(
                        SparkSession
                                .builder()
                                .config(conf)
                                .config("hive.metastore.uris","thrift://" + sparkIP + ":9083")
                )
                .getOrCreateCarbonSession("hdfs://" + hdfsIP + ":8020" + dataPath);
        return sparkSession;
    }
    /**
     * Load a CSV file into the database.
     * @param csv path of the CSV file on HDFS
     * @param tableName name of the target table
     * @param delimiter field delimiter; defaults to "," when blank
     * @return true once the LOAD DATA statement has been executed
     */
    public static boolean loadDataToTable(String csv, String tableName, String delimiter) {
        SparkSession sparkSession = getSparkSession();
        if (StringUtils.isBlank(delimiter)) {
            delimiter = ",";
        }
        String options = " OPTIONS('DELIMITER'='" + delimiter + "')";
        sparkSession.sql("LOAD DATA INPATH 'hdfs://" + hdfsIP + ":8020" + csv + "' INTO TABLE " + tableName + options);
        // Do NOT close the session here: it is a singleton shared by all threads.
//        sparkSession.close();
        return true;
    }

Note that the SparkSession obtained here is a singleton!!! So the session must not be closed lightly: one thread closing it would break every other thread still using it.
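If several threads may call getSparkSession() at the same time during startup, a small holder class makes the shared-singleton contract explicit. A minimal sketch, assuming the builder method above lives in the CloudUtils class used by the test below (SparkSessionHolder is an illustrative name, not a CarbonData API):

import org.apache.spark.sql.SparkSession;

public final class SparkSessionHolder {

    // One SparkSession shared by every writer thread; volatile gives
    // safe publication under double-checked locking.
    private static volatile SparkSession session;

    private SparkSessionHolder() { }

    public static SparkSession get() {
        if (session == null) {
            synchronized (SparkSessionHolder.class) {
                if (session == null) {
                    // Delegates to the builder shown above.
                    session = CloudUtils.getSparkSession();
                }
            }
        }
        return session;
    }
}

getOrCreateCarbonSession already returns the existing session on repeated calls, so the holder mainly guards against several threads racing through the builder at the same time.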

Testing five threads concurrently inserting CSV data into a CarbonData table:

@Test
public void testConcurrentWriteToHDFS() throws InterruptedException {

    // One count per writer thread; the main thread blocks until all finish.
    CountDownLatch latch = new CountDownLatch(5);
    for (int i = 0; i < 5; i++) {
        new Thread(() -> {
            CloudUtils.loadDataToTable("/opt/event_log_01.csv", "event_log", ",");
            String threadName = Thread.currentThread().getName();
            System.out.println("==============" + eventLogDao.countByWhereClause(""));
            System.out.println(threadName + " is finished");
            latch.countDown();
        }).start();
    }
    latch.await();
    System.out.println("all threads are finished");
    List<EventLog> eventLogs = eventLogDao.selectAll();
    System.out.println(eventLogs);
}

Test results:

18/01/27 17:06:05 INFO LoadTable: Thread-22 Initiating Direct Load for the Table : (default.event_log)
18/01/27 17:06:05 INFO LoadTable: Thread-23 Initiating Direct Load for the Table : (default.event_log)
18/01/27 17:06:05 INFO LoadTable: Thread-20 Initiating Direct Load for the Table : (default.event_log)
18/01/27 17:06:05 INFO LoadTable: Thread-21 Initiating Direct Load for the Table : (default.event_log)
18/01/27 17:06:05 INFO LoadTable: Thread-19 Initiating Direct Load for the Table : (default.event_log)
18/01/27 17:06:05 INFO CarbonLockFactory: Thread-22 Configured lock type is: HDFSLOCK
18/01/27 17:06:05 INFO HdfsFileLock: Thread-22 HDFS lock path:hdfs://192.168.0.181:8020/opt/default/event_log/tablestatus.lock
18/01/27 17:06:05 INFO HdfsFileLock: Thread-21 HDFS lock path:hdfs://192.168.0.181:8020/opt/default/event_log/tablestatus.lock
18/01/27 17:06:05 INFO HdfsFileLock: Thread-23 HDFS lock path:hdfs://192.168.0.181:8020/opt/default/event_log/tablestatus.lock
18/01/27 17:06:05 INFO CarbonLoaderUtil: Thread-23 Acquired lock for tabledefault.event_log for table status updation
18/01/27 17:06:05 INFO HdfsFileLock: Thread-20 HDFS lock path:hdfs://192.168.0.181:8020/opt/default/event_log/tablestatus.lock
18/01/27 17:06:05 ERROR HdfsFileLock: Thread-20 failed to create file /opt/default/event_log/tablestatus.lock for DFSClient_NONMAPREDUCE_-720553362_71 for client 172.16.30.20 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3175)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:3005)
...
18/01/27 17:06:05 INFO HdfsFileLock: Thread-19 HDFS lock path:hdfs://192.168.0.181:8020/opt/default/event_log/tablestatus.lock
18/01/27 17:06:05 ERROR HdfsFileLock: Thread-19 failed to create file /opt/default/event_log/tablestatus.lock for DFSClient_NONMAPREDUCE_-720553362_71 for client 172.16.30.20 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3175)

As the log shows, each thread tries to acquire the table status lock while running; if the table is already locked, the attempt fails and an error is logged.
But in the end every thread completes, and all five threads get their data into the table! The ERROR lines are transient: the failed lock acquisitions are retried internally, which is why each load still succeeds.
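How persistently CarbonData retries is configurable. A hedged sketch, assuming the property names listed in the CarbonData configuration documentation (verify them against the version you run):

import org.apache.carbondata.core.util.CarbonProperties;

// Assumption: property names as listed in the CarbonData configuration docs.
CarbonProperties props = CarbonProperties.getInstance();
// maximum number of attempts to acquire a lock
props.addProperty("carbon.lock.retries", "5");
// seconds to wait between two attempts
props.addProperty("carbon.lock.retry.timeout.sec", "10");

Raising these values makes heavily concurrent loads fail less often, at the cost of waiting longer on a busy table.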
