ClickHouse Source Code Reading (0001 0000): How the CK Server Handles SQL (INSERT SQL - 2)

B_e_a_u_tiful1205

Category: Dive into ClickHouse

Article link: https://blog.csdn.net/B_e_a_u_tiful1205/article/details/106882653

For an INSERT statement, the interpreter calls execute() to build the block streams. The actual data write then happens in TCPHandler::processInsertQuery(), which is what this post walks through.

First, the annotated code:

    void TCPHandler::processInsertQuery(const Settings &global_settings) {
        /** Called before everything else so that, if `writePrefix` throws an exception,
          *  the client receives the exception before it starts sending data.
          */
        state.io.out->writePrefix();

        /// Send the ColumnsDescription of the insertion target table.
        if (client_revision >= DBMS_MIN_REVISION_WITH_COLUMN_DEFAULTS_METADATA) {
            const auto &db_and_table = query_context->getInsertionTable();
            if (query_context->getSettingsRef().input_format_defaults_for_omitted_fields)
                sendTableColumns(query_context->getTable(db_and_table.first, db_and_table.second)->getColumns());
        }

        /// Send a block describing the table structure to the client; the client sends its data to the server in this layout.
        // On the client side, Client.cpp processInsertQuery() -> receiveSampleBlock() receives it, then formats the data and sends it to the server.
        sendData(state.io.out->getHeader());

        // This is the key call. Main call chain: readData() -> receivePacket() -> receiveData() -> write()
        readData(global_settings);
        state.io.out->writeSuffix();
        state.io.onFinish();
    }

The key call here is readData(global_settings). Its main call chain is readData() -> receivePacket() -> receiveData() -> write().

Next, look at receiveData():
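The order of calls in processInsertQuery() can be illustrated with a small stand-alone model. This is only a sketch: `RecordingOutput`, `processInsert`, and the string-based `Block` are hypothetical names invented for the example, not ClickHouse types, and the loop assumes that an empty block marks the end of the data (which matches receiveData() returning false for an empty block).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in: a "block" is just a list of rows.
using Block = std::vector<std::string>;

// Toy output stream that records the order of the calls it receives.
struct RecordingOutput {
    std::vector<std::string> log;

    void writePrefix() { log.push_back("prefix"); }
    void write(const Block & block) {
        assert(!block.empty());  // only non-empty blocks are ever written
        log.push_back("write:" + std::to_string(block.size()));
    }
    void writeSuffix() { log.push_back("suffix"); }
};

// Same order of steps as processInsertQuery(): prefix first, then one write()
// per received block, then suffix. An empty block ends the loop (assumption).
inline void processInsert(RecordingOutput & out, const std::vector<Block> & incoming) {
    out.writePrefix();
    for (const Block & block : incoming) {
        if (block.empty())
            break;
        out.write(block);
    }
    out.writeSuffix();
}
```

Feeding it two non-empty blocks followed by an empty one yields the log `prefix`, `write:2`, `write:1`, `suffix` for blocks of two rows and one row, and anything after the empty block is ignored.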

    bool TCPHandler::receiveData() {
        initBlockInput();

        /// The name of the temporary table for writing data, default to empty string
        String external_table_name;
        readStringBinary(external_table_name, *in);

        /// Read one block from the network and write it to the underlying stream/table.
        Block block = state.block_in->read();

        if (block) {
            /// If there is an insert request, then the data should be written directly to `state.io.out`.
            /// Otherwise, we write the blocks in the temporary `external_table_name` table.
            // INSERT with client data: need_receive_data_for_insert = true, so the block goes straight to state.io.out.
            // Otherwise need_receive_data_for_insert = false: the block is external-table data sent along with the query,
            // and it is written to the temporary table external_table_name first.
            if (!state.need_receive_data_for_insert) {
                StoragePtr storage;
                /// If such a table does not exist, create it.
                if (!(storage = query_context->tryGetExternalTable(external_table_name))) {
                    NamesAndTypesList columns = block.getNamesAndTypesList();
                    // The temporary table external_table_name is a Memory-engine table.
                    storage = StorageMemory::create(external_table_name, ColumnsDescription{columns});
                    storage->startup();
                    query_context->addExternalTable(external_table_name, storage);
                }
                /// The data will be written directly to the table.
                state.io.out = storage->write(ASTPtr(), *query_context);
            }
            if (block)
                state.io.out->write(block); // the data is written directly to the table
            return true;
        } else
            return false;
    }

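The logic is fairly clear:

1. After a block is read, it is handled in one of two ways, depending on whether the server is waiting for INSERT data (need_receive_data_for_insert).

2. For an INSERT, the block is written to the table directly through IBlockOutputStream's write() method.

3. Otherwise the block is external-table data sent along with the query. When the first block arrives, a temporary Memory-engine table named external_table_name is created and its startup() is called. The block is then written through that temporary table's write() method (which in the end is still IBlockOutputStream's write()).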

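(Note the startup() method here: each table engine can provide its own implementation. Some do real work, and engines that do not implement it simply do nothing.)

Looking at the concrete implementations of IBlockOutputStream's write(), some of them pass the data on to an underlying stream, while others write the data into an actual table.

Let's go through a few of these write() implementations.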
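The dispatch in receiveData() and the "do nothing by default" behavior of startup() can be sketched with a stand-alone model. Everything here is a simplification under stated assumptions: `IStorage`, `MemoryStorage`, `Session`, and `receive` are hypothetical names for illustration, a block is reduced to a list of strings, and the real code obtains an output stream from storage->write() rather than writing to the storage object directly.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

using Block = std::vector<std::string>;  // hypothetical: a block is a list of rows

// Hypothetical minimal storage interface. startup() defaults to a no-op,
// so engines that need no initialization simply do not override it.
struct IStorage {
    virtual ~IStorage() = default;
    virtual void startup() {}
    virtual void write(const Block & block) = 0;
};

// In-memory table: keeps every written block in a vector.
struct MemoryStorage : IStorage {
    std::vector<Block> data;
    bool started = false;

    void startup() override { started = true; }
    void write(const Block & block) override { data.push_back(block); }
};

struct Session {
    bool need_receive_data_for_insert = false;
    std::shared_ptr<IStorage> out;  // current write target
    std::map<std::string, std::shared_ptr<MemoryStorage>> external_tables;

    // Returns false for an empty block (end of data), true otherwise.
    bool receive(const std::string & table_name, const Block & block) {
        if (block.empty())
            return false;
        if (!need_receive_data_for_insert) {
            // Create the temporary table on first use, then redirect output to it.
            auto & table = external_tables[table_name];
            if (!table) {
                table = std::make_shared<MemoryStorage>();
                table->startup();
            }
            out = table;
        }
        assert(out && "a write target must exist before data arrives");
        out->write(block);
        return true;
    }
};
```

With the flag set to true, blocks go to whatever target `out` already points at and no temporary table is created. With the flag false, the first block for a given name creates and starts the table, and later blocks reuse it.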

1. MemoryBlockOutputStream. This corresponds to Memory-engine tables, whose data is kept in memory as a list of blocks.

    class MemoryBlockOutputStream : public IBlockOutputStream
    {
    public:
        explicit MemoryBlockOutputStream(StorageMemory & storage_) : storage(storage_) {}

        Block getHeader() const override { return storage.getSampleBlock(); }

        void write(const Block & block) override
        {
            storage.check(block, true);
            std::lock_guard lock(storage.mutex);
            storage.data.push_back(block);
        }
    private:
        StorageMemory & storage;
    };

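StorageMemory has a member variable `BlocksList data;` that holds all of the blocks. Writing to a Memory-engine table is just a call to `storage.data.push_back(block);` while holding the storage mutex.

2. MergeTreeBlockOutputStream. This corresponds to MergeTree-engine tables.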

    void MergeTreeBlockOutputStream::write(const Block &block) {
        // If a partition has too many parts (merges may be falling behind), the insert may be delayed or an exception thrown.
        storage.delayInsertOrThrowIfNeeded();

        /// Split the block into several blocks, one per part
        /// (all rows in a part share the same partition value; parts with the same partition value make up a partition).
        auto part_blocks = storage.writer.splitBlockIntoParts(block, max_parts_per_block);
        for (auto &current_block : part_blocks) {
            Stopwatch watch;

            // Write the temporary part.
            MergeTreeData::MutableDataPartPtr part = storage.writer.writeTempPart(current_block);
            // Rename the temporary part and add it to the storage.
            storage.renameTempPartAndAdd(part, &storage.increment);
            // Record the new part in the part log.
            PartLog::addNewPart(storage.global_context, part, watch.elapsed());

            /// Initiate async merge - it will be done if it's a good time for a merge and there is space in 'background_pool'.
            storage.background_task_handle->wake(); // wake the merge thread in background_pool
        }
    }

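Every step is commented, so I won't go through them one by one, although the methods called here are worth reading in their own right.

3. DistributedBlockOutputStream. This corresponds to Distributed-engine tables.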
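As an illustration of the "split one block into one block per part" step, here is a minimal sketch. It is not the ClickHouse implementation: `Row` and its integer `partition` field are hypothetical, and the real function works on columns and a partition key expression. It also assumes that exceeding `max_parts` raises an error and that 0 means "no limit", which are choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row: a partition key value plus a payload.
struct Row {
    int partition;
    std::string value;
};
using Block = std::vector<Row>;

// Group rows by partition value so that each resulting block maps to one part.
// max_parts == 0 means "no limit" in this sketch.
inline std::vector<Block> splitBlockIntoParts(const Block & block, std::size_t max_parts) {
    std::map<int, Block> by_partition;
    for (const Row & row : block)
        by_partition[row.partition].push_back(row);

    if (max_parts != 0 && by_partition.size() > max_parts)
        throw std::runtime_error("too many partitions for a single block");

    std::vector<Block> result;
    result.reserve(by_partition.size());
    for (auto & entry : by_partition) {
        assert(!entry.second.empty());  // every group holds at least one row
        result.push_back(std::move(entry.second));
    }
    return result;
}
```

Rows keep their original relative order inside each group, and the groups come out ordered by partition value because `std::map` iterates in key order.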

    void DistributedBlockOutputStream::write(const Block & block)
    {
        if (insert_sync)
            writeSync(block);
        else
            writeAsync(block);
    }

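Depending on whether the insert is synchronous or asynchronous (insert_sync), write() dispatches to writeSync() or writeAsync(), which run different logic.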
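The dispatch pattern itself can be shown with a toy model. This is a sketch under loud assumptions: `DistributedSink`, `delivered`, `pending`, and `flush()` are hypothetical and only illustrate the general idea that a synchronous write completes delivery before returning while an asynchronous write defers it. It does not reproduce what writeSync() and writeAsync() actually do in ClickHouse.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

using Block = std::vector<std::string>;  // hypothetical: a block is a list of rows

// Hypothetical sink: sync writes are delivered immediately, async writes are
// queued locally and delivered later by flush().
struct DistributedSink {
    bool insert_sync = false;
    std::vector<Block> delivered;
    std::deque<Block> pending;

    void write(const Block & block) {
        if (insert_sync)
            writeSync(block);
        else
            writeAsync(block);
    }

    // Deliver everything that was queued by asynchronous writes.
    void flush() {
        while (!pending.empty()) {
            delivered.push_back(std::move(pending.front()));
            pending.pop_front();
        }
        assert(pending.empty());
    }

private:
    void writeSync(const Block & block) { delivered.push_back(block); }
    void writeAsync(const Block & block) { pending.push_back(block); }
};
```

In this model the caller of an asynchronous write returns before the block is delivered, so it only becomes visible in `delivered` after `flush()` runs.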
