ClickHouse Source Code Reading (0000 1010): How the CK Server Handles SQL (INSERT SQL)

B_e_a_u_tiful1205

Category: Dive into ClickHouse

Article link: https://blog.csdn.net/B_e_a_u_tiful1205/article/details/105883476

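The previous article noted that the interpreter's execute() method runs the query and returns a BlockIO (block streams); that step builds the block streams according to the SQL type. This article walks through the execute() method for INSERT SQL.

Here is the execute() method in InterpreterInsertQuery.cpp: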

BlockIO InterpreterInsertQuery::execute()
{
    const auto & query = query_ptr->as<ASTInsertQuery &>();
    checkAccess(query);
    StoragePtr table = getTable(query);

    // Take a shared lock on the table structure.
    auto table_lock = table->lockStructureForShare(true, context.getCurrentQueryId());

    /// We create a pipeline of several streams, into which we will write data.
    BlockOutputStreamPtr out;

    // Build the PushingToViewsBlockOutputStream.
    out = std::make_shared<PushingToViewsBlockOutputStream>(query.database, query.table, table, context, query_ptr, query.no_destination);

    /// Do not squash blocks if it is a sync INSERT into Distributed, since it leads to double bufferization on client and server side.
    /// Client-side bufferization might cause excessive timeouts (especially in case of big blocks).
    // insert_distributed_sync (false/true, default false): if enabled, an INSERT into a Distributed table waits until the data has been sent to all nodes in the cluster.
    // insert_distributed_timeout (default 0): timeout for an INSERT into a Distributed table; used only when insert_distributed_sync is enabled. Zero means no timeout.
    // If insert_distributed_sync is true and table->isRemote() is true (a sync INSERT into a remote table), the squashing step below is skipped.
    if (!(context.getSettingsRef().insert_distributed_sync && table->isRemote()))
    {
        out = std::make_shared<SquashingBlockOutputStream>(
            out, table->getSampleBlock(), context.getSettingsRef().min_insert_block_size_rows, context.getSettingsRef().min_insert_block_size_bytes);
    }
    auto query_sample_block = getSampleBlock(query, table);

    /// Actually we don't know structure of input blocks from query/table, because some clients break insertion protocol (columns != header)
    out = std::make_shared<AddingDefaultBlockOutputStream>(
        out, query_sample_block, table->getSampleBlock(), table->getColumns().getDefaults(), context);

    auto out_wrapper = std::make_shared<CountingBlockOutputStream>(out);
    out_wrapper->setProcessListElement(context.getProcessListElement());
    out = std::move(out_wrapper);

    BlockIO res;
    res.out = std::move(out);
    // res.out is now a pipeline of several streams: CountingBlockOutputStream -> AddingDefaultBlockOutputStream -> (SquashingBlockOutputStream) -> PushingToViewsBlockOutputStream -> ...

    /// What type of query: INSERT or INSERT SELECT?
    if (query.select) // For INSERT SELECT, query.select is non-null and this branch runs.
    {
        /// Passing 1 as subquery_depth will disable limiting size of intermediate result.
        InterpreterSelectWithUnionQuery interpreter_select{query.select, context, SelectQueryOptions(QueryProcessingStage::Complete, 1)};

        res.in = interpreter_select.execute().in;

        res.in = std::make_shared<ConvertingBlockInputStream>(context, res.in, res.out->getHeader(), ConvertingBlockInputStream::MatchColumnsMode::Position);
        res.in = std::make_shared<NullAndDoCopyBlockInputStream>(res.in, res.out);

        res.out = nullptr;

        if (!allow_materialized)
        {
            Block in_header = res.in->getHeader();
            for (const auto & column : table->getColumns())
                if (column.default_desc.kind == ColumnDefaultKind::Materialized && in_header.has(column.name))
                    throw Exception("Cannot insert column " + column.name + ", because it is MATERIALIZED column.", ErrorCodes::ILLEGAL_COLUMN);
        }
    }
    else if (query.data && !query.has_tail) /// can execute without additional data
    {
        res.in = std::make_shared<InputStreamFromASTInsertQuery>(query_ptr, nullptr, query_sample_block, context);
        res.in = std::make_shared<NullAndDoCopyBlockInputStream>(res.in, res.out);
        res.out = nullptr;
    }

    return res;
}
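
The way execute() keeps reassigning `out` is a decorator chain: each new stream wraps the previous one, so the stream created last is the one that receives data first. The following is a minimal sketch of that idea, not ClickHouse code. Every name in it (ToyBlock, ToyOutputStream, ToySink, ToySquashing, ToyCounting, buildToyPipeline) is hypothetical, a block is reduced to a vector of integers, and the squashing rule is simplified to a row-count threshold only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a ClickHouse Block: just a batch of row values.
using ToyBlock = std::vector<int>;

// Minimal output-stream interface (hypothetical, not IBlockOutputStream).
struct ToyOutputStream
{
    virtual ~ToyOutputStream() = default;
    virtual void write(const ToyBlock & block) = 0;
    virtual void writeSuffix() {}
};
using ToyOutputStreamPtr = std::shared_ptr<ToyOutputStream>;

// Innermost stream: records every block it receives.
struct ToySink : ToyOutputStream
{
    std::vector<ToyBlock> received;
    void write(const ToyBlock & block) override { received.push_back(block); }
};

// Buffers rows and forwards them only once at least min_rows have accumulated.
struct ToySquashing : ToyOutputStream
{
    ToySquashing(ToyOutputStreamPtr next_, std::size_t min_rows_)
        : next(std::move(next_)), min_rows(min_rows_) {}

    void write(const ToyBlock & block) override
    {
        buffer.insert(buffer.end(), block.begin(), block.end());
        if (buffer.size() >= min_rows)
            flush();
    }

    // Called once at the end of the insert: push out whatever is still buffered.
    void writeSuffix() override
    {
        if (!buffer.empty())
            flush();
        next->writeSuffix();
    }

private:
    void flush()
    {
        next->write(buffer);
        buffer.clear();
    }

    ToyOutputStreamPtr next;
    std::size_t min_rows;
    ToyBlock buffer;
};

// Outermost stream: counts rows, then passes the block on unchanged.
struct ToyCounting : ToyOutputStream
{
    explicit ToyCounting(ToyOutputStreamPtr next_) : next(std::move(next_)) {}

    void write(const ToyBlock & block) override
    {
        rows += block.size();
        next->write(block);
    }
    void writeSuffix() override { next->writeSuffix(); }

    std::size_t rows = 0;

private:
    ToyOutputStreamPtr next;
};

// Wraps the sink in the same order execute() wraps `out`: innermost first, outermost last.
std::shared_ptr<ToyCounting> buildToyPipeline(ToyOutputStreamPtr sink, std::size_t min_rows)
{
    ToyOutputStreamPtr out = std::move(sink);
    out = std::make_shared<ToySquashing>(out, min_rows);
    return std::make_shared<ToyCounting>(out);
}
```

With a threshold of 4 rows, writing blocks of 2, 1 and 2 rows reaches the sink as a single 5-row block, while the counting stream on the outside still sees every row as it arrives.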

First, look at getTable(query). The main reason to mention it is to introduce the concept of table functions:

StoragePtr InterpreterInsertQuery::getTable(const ASTInsertQuery & query)
{
    // A table function is used, e.g. remote(): SELECT count() FROM remote('example01-01-1', merge, hits) or INSERT INTO remote('example01-01-1', merge, hits)
    if (query.table_function)
    {
        const auto * table_function = query.table_function->as<ASTFunction>();
        const auto & factory = TableFunctionFactory::instance();
        return factory.get(table_function->name, context)->execute(query.table_function, context);
    }

    /// Into what table to write.
    return context.getTable(query.database, query.table);
}
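
To make the table-function branch concrete, here is a small sketch of a name-keyed factory that builds a temporary storage from the function's arguments, with a fallback to an ordinary lookup by database and table name. It only illustrates the shape of the logic. ToyStorage, ToyTableFunctionFactory, ToyCatalog and resolveToyTable are invented for this example, and arguments are passed as plain strings instead of AST nodes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for IStorage: only remembers where it came from.
struct ToyStorage
{
    std::string description;
};
using ToyStoragePtr = std::shared_ptr<ToyStorage>;

// A "table function" takes its argument text and produces a temporary storage.
using ToyTableFunction = std::function<ToyStoragePtr(const std::string & args)>;

// Hypothetical registry keyed by function name.
class ToyTableFunctionFactory
{
public:
    void registerFunction(const std::string & name, ToyTableFunction fn)
    {
        functions[name] = std::move(fn);
    }

    // Looks the function up by name and runs it; unknown names are an error.
    ToyStoragePtr execute(const std::string & name, const std::string & args) const
    {
        auto it = functions.find(name);
        if (it == functions.end())
            throw std::invalid_argument("Unknown table function " + name);
        return it->second(args);
    }

private:
    std::map<std::string, ToyTableFunction> functions;
};

// Toy catalog of ordinary tables, keyed by "database.table".
using ToyCatalog = std::map<std::string, ToyStoragePtr>;

// Mirrors the branch in getTable(): a table function wins, otherwise look the table up by name.
ToyStoragePtr resolveToyTable(
    const ToyTableFunctionFactory & factory, const ToyCatalog & catalog,
    const std::string & table_function, const std::string & args,
    const std::string & database, const std::string & table)
{
    if (!table_function.empty())
        return factory.execute(table_function, args);
    return catalog.at(database + "." + table);
}
```

The point of the design is that the INSERT path does not care which branch produced the storage: both return the same pointer type, so the rest of execute() is identical for a named table and for remote(...).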

Next the table structure is locked and the PushingToViewsBlockOutputStream is built, so its constructor is worth a look:

PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
    const String & database, const String & table, const StoragePtr & storage_,
    const Context & context_, const ASTPtr & query_ptr_, bool no_destination)
    : storage(storage_), context(context_), query_ptr(query_ptr_)
{
    /** TODO
      * This is a very important line. At any insertion into the table one of streams should own lock.
      * Although now any insertion into the table is done via PushingToViewsBlockOutputStream, but it's clear that here is not the best place for this functionality.
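      *  The table structure is locked for the duration of the insertion, and one of the streams must hold that lock so the structure cannot change.
      *  The lock is taken in shared mode; whether it also stops other threads from changing the table's data is not verified here.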
      */
    addTableLock(storage->lockStructureForShare(true, context.getCurrentQueryId()));

    /// If the "root" table deduplicates blocks, there is no need to make deduplication for children
    /// Moreover, deduplication for AggregatingMergeTree children could produce false positives due to low size of inserting blocks
    bool disable_deduplication_for_children = !no_destination && storage->supportsDeduplication();

    if (!table.empty())
    {
        Dependencies dependencies = context.getDependencies(database, table);

        /// We need special context for materialized views insertions
        if (!dependencies.empty())
        {
            views_context = std::make_unique<Context>(context);
            // Do not deduplicate insertions into MV if the main insertion is Ok
            if (disable_deduplication_for_children)
                views_context->getSettingsRef().insert_deduplicate = false;
        }

        for (const auto & database_table : dependencies)
        {
            auto dependent_table = context.getTable(database_table.first, database_table.second);
            auto & materialized_view = dynamic_cast<const StorageMaterializedView &>(*dependent_table);

            if (StoragePtr inner_table = materialized_view.tryGetTargetTable())
                addTableLock(inner_table->lockStructureForShare(true, context.getCurrentQueryId()));

            auto query = materialized_view.getInnerQuery();
            BlockOutputStreamPtr out = std::make_shared<PushingToViewsBlockOutputStream>(
                database_table.first, database_table.second, dependent_table, *views_context, ASTPtr());
            views.emplace_back(ViewInfo{std::move(query), database_table.first, database_table.second, std::move(out)});
        }
    }

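    // At this point the constructor has only built the output streams for the dependent views; no block has been written yet.
    // Next it decides whether an output stream for the table itself is needed.
    // no_destination = false: data must be inserted into the table, so this branch runs.
    // no_destination = true: data is not inserted into the table, so this branch is skipped.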
    /* Do not push to destination table if the flag is set */
    if (!no_destination)
    {
        output = storage->write(query_ptr, context);
        replicated_output = dynamic_cast<ReplicatedMergeTreeBlockOutputStream *>(output.get());
    }
}
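
The constructor builds a tree: one stream for the table, plus one nested stream per dependent materialized view, with no_destination deciding whether the table itself receives data. The sketch below models only that fan-out. ToyPushingToViews and ToyLog are hypothetical names, a block is reduced to a row count, and the write order used here (destination first, then views) is an assumption of the sketch, not something the constructor above establishes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Shared log so a caller can observe which target received data, and in what order.
using ToyLog = std::vector<std::string>;

// Hypothetical stand-in for PushingToViewsBlockOutputStream.
class ToyPushingToViews
{
public:
    // `views_` holds the streams for dependent materialized views.
    ToyPushingToViews(std::string table_, std::vector<std::shared_ptr<ToyPushingToViews>> views_,
                      std::shared_ptr<ToyLog> log_, bool no_destination_ = false)
        : table(std::move(table_)), views(std::move(views_)), log(std::move(log_)),
          no_destination(no_destination_)
    {
    }

    void write(int rows)
    {
        // Assumption of this sketch: the destination table is written first, then the views.
        if (!no_destination)
            log->push_back(table + ":" + std::to_string(rows));

        // Every dependent view receives the same block, recursively.
        for (const auto & view : views)
            view->write(rows);
    }

private:
    std::string table;
    std::vector<std::shared_ptr<ToyPushingToViews>> views;
    std::shared_ptr<ToyLog> log;
    bool no_destination;
};
```

Constructing the root stream with no_destination set to true leaves the views untouched but skips the table entry in the log, which matches the "only into attached views" meaning of the flag.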

Here is what no_destination, a field of ASTInsertQuery, means:

        /// Set to true if the data should only be inserted into attached views
        // no_destination = true: the data is inserted only into the views, not into the table.
        // no_destination = false: the data is inserted into both the table and the views.
        // So is data written to the views first and then to the table?
        bool no_destination = false;

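So it appears that data is written to the views first and then to the table. This is only my current understanding and still needs to be verified: the constructor creates the view streams before the table's output stream, but that fixes only the order in which the streams are built. The order in which blocks are actually written is decided in PushingToViewsBlockOutputStream::write(), which is not shown here.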
