Test environment
The INSERT statement under test:
-- Create the table
CREATE TABLE NIGHT_INSERT(C1 INT, C2 INT) ENGINE=Memory;
-- The INSERT statement whose execution path is traced below
INSERT INTO NIGHT_INSERT VALUES(10, 20);
ClickHouse version:
ClickHouse-20.12.3.3-stable
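Core classes involved and their roles
Core classes in the SQL parser layer
- ParserInsertQuery: parses the syntax of an INSERT statement.
- ASTInsertQuery: the abstract syntax tree produced by parsing an INSERT statement.
Core classes in the SQL interpreter layer
- InterpreterInsertQuery: interprets and executes an INSERT statement.
Core classes in the storage layer
- StorageMemory: implements the Memory table engine, which keeps data in RAM and is suited to temporary data.
Core classes in DataStreams
- PushingToViewsBlockOutputStream: writes data to the target table and to all materialized views that depend on it.
- SquashingBlockOutputStream: merges consecutive blocks in the stream until they reach a specified minimum size.
- CountingBlockOutputStream: a proxy that counts the blocks, rows, and bytes written through it.
- AddingDefaultBlockOutputStream:
This stream adds three kinds of columns to each block:
1. Columns omitted from the request that have no default in the table (missing columns)
2. Columns omitted from the request that have a default in the table (columns with default values)
3. Columns computed from other columns (materialized columns)
All three kinds are materialized as full columns (not constants).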
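The three column kinds above can be illustrated with a small stand-alone model. This is a sketch, not ClickHouse code: Block, ColumnSpec and addDefaults are hypothetical names, every column holds INT values, and a MATERIALIZED expression is modeled as a C++ lambda. Each column the INSERT omits is filled in as a full column: computed from other columns, set to the table's declared default, or set to the type's zero value.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified block: column name -> column values (all INT).
using Column = std::vector<int>;
using Block = std::map<std::string, Column>;

// Hypothetical table column description.
struct ColumnSpec
{
    std::string name;
    std::optional<int> default_value;                       // models DEFAULT <constant>
    std::function<int(const Block &, size_t)> materialize;  // models MATERIALIZED <expr>
};

// Returns a copy of `block` in which every table column is present as a full column.
Block addDefaults(const Block & block, const std::vector<ColumnSpec> & table)
{
    if (block.empty())
        return block;
    const size_t rows = block.begin()->second.size();
    Block result = block;
    for (const auto & spec : table)
    {
        if (result.count(spec.name))
            continue;  // column was supplied by the INSERT itself
        Column col(rows);
        for (size_t i = 0; i < rows; ++i)
        {
            if (spec.materialize)
                col[i] = spec.materialize(result, i);  // 3. materialized column
            else if (spec.default_value)
                col[i] = *spec.default_value;          // 2. column with a default value
            else
                col[i] = 0;                            // 1. missing column: the type's default
        }
        result[spec.name] = std::move(col);
    }
    return result;
}
```

For the test table NIGHT_INSERT both columns are supplied by the INSERT, so this step adds nothing; it only matters when the column list is partial or the table declares DEFAULT or MATERIALIZED columns.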
Execution path of the statement
SQL parser layer
Call stack
#0 DB::ParserInsertQuery::parseImpl (this=<optimized out>, pos=..., node=..., expected=...) at ../src/Parsers/ParserInsertQuery.cpp:177
#1 0x000000000e924d92 in DB::IParserBase::parse(DB::IParser::Pos&, std::__1::shared_ptr<DB::IAST>&, DB::Expected&)::$_0::operator()() const (this=<optimized out>)
at ../src/Parsers/IParserBase.cpp:13
#2 DB::IParserBase::wrapParseImpl<DB::IParserBase::parse(DB::IParser::Pos&, std::__1::shared_ptr<DB::IAST>&, DB::Expected&)::$_0>(DB::IParser::Pos&, DB::IParserBase::IncreaseDepthTag, DB::IParserBase::parse(DB::IParser::Pos&, std::__1::shared_ptr<DB::IAST>&, DB::Expected&)::$_0 const&) (pos=..., func=...) at ../src/Parsers/IParserBase.h:31
#3 DB::IParserBase::parse (this=0x7f5a1f65cc48, pos=..., node=..., expected=...) at ../src/Parsers/IParserBase.cpp:11
#4 0x000000000e9515b0 in DB::ParserQuery::parseImpl (this=<optimized out>, pos=..., node=..., expected=...) at ../src/Parsers/ParserQuery.cpp:45
#5 0x000000000e924d92 in DB::IParserBase::parse(DB::IParser::Pos&, std::__1::shared_ptr<DB::IAST>&, DB::Expected&)::$_0::operator()() const (this=<optimized out>)
at ../src/Parsers/IParserBase.cpp:13
#6 DB::IParserBase::wrapParseImpl<DB::IParserBase::parse(DB::IParser::Pos&, std::__1::shared_ptr<DB::IAST>&, DB::Expected&)::$_0>(DB::IParser::Pos&, DB::IParserBase::IncreaseDepthTag, DB::IParserBase::parse(DB::IParser::Pos&, std::__1::shared_ptr<DB::IAST>&, DB::Expected&)::$_0 const&) (pos=..., func=...) at ../src/Parsers/IParserBase.h:31
#7 DB::IParserBase::parse (this=0x7f5a1f65d548, pos=..., node=..., expected=...) at ../src/Parsers/IParserBase.cpp:11
#8 0x000000000e98049c in DB::tryParseQuery (parser=..., pos=@0x7f5a1f65dca0: 0x7f5a8f5dc900 "insert into night_insert values", end=0x7f5a8f5dc91f "", out_error_message=...,
hilite=false, query_description=..., allow_multi_statements=<optimized out>, max_query_size=262144, max_parser_depth=1000) at ../src/Parsers/parseQuery.cpp:249
#9 0x000000000e98863d in DB::parseQueryAndMovePosition (parser=..., pos=<error reading variable>, end=0x0, query_description=..., allow_multi_statements=<optimized out>,
max_query_size=262144, max_parser_depth=1000) at ../src/Parsers/parseQuery.cpp:320
#10 0x000000000deeb265 in DB::parseQuery (parser=..., begin=0x7f5a8f5dc900 "insert into night_insert values", end=0x7f5a8f5dc91f "", query_description=..., max_query_size=262144,
max_parser_depth=140025050549320) at ../src/Parsers/parseQuery.cpp:338
#11 DB::executeQueryImpl (begin=0x7f5a8f5dc900 "insert into night_insert values", end=0x7f5a8f5dc91f "", context=..., internal=<optimized out>, stage=<optimized out>,
has_query_tail=<optimized out>, istr=<optimized out>) at ../src/Interpreters/executeQuery.cpp:339
#12 0x000000000deeaedd in DB::executeQuery (query=..., context=..., internal=false, stage=320, may_have_embedded_data=<optimized out>) at ../src/Interpreters/executeQuery.cpp:813
#13 0x000000000e5c0666 in DB::TCPHandler::runImpl (this=0x7f5a8e32c000) at ../src/Server/TCPHandler.cpp:254
#14 0x000000000e5ccb67 in DB::TCPHandler::run (this=0x7f5a8e32c000) at ../src/Server/TCPHandler.cpp:1334
#15 0x0000000010d76a3f in Poco::Net::TCPServerConnection::start (this=0x7f5a8e41f320) at ../contrib/poco/Net/src/TCPServerConnection.cpp:43
#16 0x0000000010d78451 in Poco::Net::TCPServerDispatcher::run (this=0x7f5a2cd3c900) at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:112
#17 0x0000000010ea5a29 in Poco::PooledThread::run (this=0x7f5a91900800) at ../contrib/poco/Foundation/src/ThreadPool.cpp:199
#18 0x0000000010ea19ba in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:345
#19 0x00007f5a920cc609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007f5a91fe2293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Relevant code
bool ParserInsertQuery::parseImpl(...)
{
    ParserKeyword s_insert_into("INSERT INTO");
    ParserKeyword s_table("TABLE");
    ParserKeyword s_function("FUNCTION");
    ParserToken s_dot(TokenType::Dot);
    ParserKeyword s_values("VALUES");
    ParserKeyword s_format("FORMAT");
    ParserKeyword s_settings("SETTINGS");
    ParserKeyword s_select("SELECT");
    ParserKeyword s_watch("WATCH");
    ParserKeyword s_with("WITH");
    ......
    auto query = std::make_shared<ASTInsertQuery>();
    query->columns = columns;
    query->select = select;
    query->watch = watch;
    query->settings_ast = settings_ast;
    // Points at inline data after VALUES, or nullptr when the query text carries none
    query->data = data != end ? data : nullptr;
    query->end = end;
    ......
    return true;
}
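Summary
- The ClickHouse instance receives the client's request to execute the INSERT statement.
- The instance parses the INSERT statement with the ParserInsertQuery class.
- ParserInsertQuery produces an ASTInsertQuery abstract syntax tree for the later stages.
Note that the query text parsed here contains only the INSERT statement itself, not the row data after VALUES (the stack trace shows just "insert into night_insert values"); the client sends that data separately afterward.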
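A toy model makes this split visible: the parser only records where inline data would begin, in the spirit of query->data = data != end ? data : nullptr in the excerpt above. ToyInsert and parseInsert are hypothetical; they accept only the INSERT INTO <table> VALUES ... shape and do no real tokenizing.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical, heavily simplified stand-in for ASTInsertQuery.
struct ToyInsert
{
    std::string table;
    std::optional<std::string_view> data;  // text after VALUES, if any (like query->data)
};

// Uppercases byte by byte, so indices in the result match the input.
static std::string toUpper(std::string_view s)
{
    std::string out(s);
    for (char & c : out)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return out;
}

// Parses "INSERT INTO <table> VALUES <data>"; returns nullopt for any other shape.
std::optional<ToyInsert> parseInsert(std::string_view q)
{
    const std::string u = toUpper(q);
    const std::string prefix = "INSERT INTO ";
    const std::string values_kw = "VALUES";
    if (u.rfind(prefix, 0) != 0)
        return std::nullopt;  // not an INSERT INTO statement
    const size_t values_pos = u.find(values_kw, prefix.size());
    if (values_pos == std::string::npos)
        return std::nullopt;  // toy parser: only the VALUES form is handled

    ToyInsert res;
    std::string_view name = q.substr(prefix.size(), values_pos - prefix.size());
    while (!name.empty() && name.back() == ' ')
        name.remove_suffix(1);
    res.table = std::string(name);

    size_t data_pos = values_pos + values_kw.size();
    while (data_pos < q.size() && q[data_pos] == ' ')
        ++data_pos;
    if (data_pos < q.size())
        res.data = q.substr(data_pos);  // no data left means the client sends it later
    return res;
}
```

Run on the server-side text from the stack trace, the toy finds no data; run on the full statement typed by the user, it finds "(10, 20)", which is what the client later turns into Blocks.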
SQL interpreter layer
Call stack (creating the INSERT interpreter)
#0 DB::InterpreterInsertQuery::InterpreterInsertQuery (this=0x7f5a8f6dc990, query_ptr_=..., context_=..., allow_materialized_=false, no_squash_=false, no_destination_=false)
at ../src/Interpreters/InterpreterInsertQuery.cpp:62
#1 std::__1::make_unique<DB::InterpreterInsertQuery, std::__1::shared_ptr<DB::IAST>&, DB::Context&, bool&> (__args=<optimized out>, __args=<optimized out>, __args=<optimized out>)
at ../contrib/libcxx/include/memory:3028
#2 DB::InterpreterFactory::get (query=..., context=..., stage=<optimized out>) at ../src/Interpreters/InterpreterFactory.cpp:113
#3 0x000000000deec142 in DB::executeQueryImpl (begin=<optimized out>, end=<optimized out>, context=..., internal=<optimized out>, stage=<optimized out>,
has_query_tail=<optimized out>, istr=<optimized out>) at ../src/Interpreters/executeQuery.cpp:462
#4 0x000000000deeaedd in DB::executeQuery (query=..., context=..., internal=false, stage=DB::QueryProcessingStage::WithMergeableState, may_have_embedded_data=<optimized out>)
at ../src/Interpreters/executeQuery.cpp:813
#5 0x000000000e5c0666 in DB::TCPHandler::runImpl (this=0x7f5a8e32c000) at ../src/Server/TCPHandler.cpp:254
#6 0x000000000e5ccb67 in DB::TCPHandler::run (this=0x7f5a8e32c000) at ../src/Server/TCPHandler.cpp:1334
#7 0x0000000010d76a3f in Poco::Net::TCPServerConnection::start (this=0x58) at ../contrib/poco/Net/src/TCPServerConnection.cpp:43
#8 0x0000000010d78451 in Poco::Net::TCPServerDispatcher::run (this=0x7f5a2cd3c900) at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:112
#9 0x0000000010ea5a29 in Poco::PooledThread::run (this=0x7f5a91900800) at ../contrib/poco/Foundation/src/ThreadPool.cpp:199
#10 0x0000000010ea19ba in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:345
#11 0x00007f5a920cc609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f5a91fe2293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Relevant code (creating the INSERT interpreter)
std::unique_ptr<IInterpreter> InterpreterFactory::get(...)
{
    ......
    if (query->as<ASTInsertQuery>())
    {
        ProfileEvents::increment(ProfileEvents::InsertQuery);
        bool allow_materialized = static_cast<bool>(context.getSettingsRef().insert_allow_materialized_columns);
        return std::make_unique<InterpreterInsertQuery>(query, context, allow_materialized);
    }
    ......
}
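InterpreterFactory::get picks an interpreter by testing the dynamic type of the AST (query->as<ASTInsertQuery>()). A minimal stand-alone sketch of the same dispatch pattern follows. The class names mirror ClickHouse's but these are hypothetical toy types, dynamic_cast stands in for IAST::as<T>(), and the allow_materialized argument plays the role of the insert_allow_materialized_columns setting read in the excerpt.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Toy AST hierarchy (hypothetical; names mirror ClickHouse's).
struct IAST { virtual ~IAST() = default; };
struct ASTInsertQuery : IAST {};
struct ASTSelectQuery : IAST {};

struct IInterpreter
{
    virtual ~IInterpreter() = default;
    virtual std::string name() const = 0;
};

struct InterpreterInsertQuery : IInterpreter
{
    explicit InterpreterInsertQuery(bool allow_materialized_) : allow_materialized(allow_materialized_) {}
    std::string name() const override { return "InterpreterInsertQuery"; }
    bool allow_materialized;
};

struct InterpreterSelectQuery : IInterpreter
{
    std::string name() const override { return "InterpreterSelectQuery"; }
};

// Dispatch on the dynamic type of the AST node, one branch per query kind.
std::unique_ptr<IInterpreter> getInterpreter(const std::shared_ptr<IAST> & query, bool allow_materialized)
{
    if (dynamic_cast<const ASTInsertQuery *>(query.get()))
        return std::make_unique<InterpreterInsertQuery>(allow_materialized);
    if (dynamic_cast<const ASTSelectQuery *>(query.get()))
        return std::make_unique<InterpreterSelectQuery>();
    throw std::runtime_error("Unknown type of query");
}
```

The design keeps the parser ignorant of execution: the AST only describes the statement, and the factory is the single place that maps a node type to the class that knows how to run it.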
Call stack (executing the INSERT interpreter)
#0 DB::InterpreterInsertQuery::execute (this=0x7f5a8f43b6e0) at ../src/Interpreters/InterpreterInsertQuery.cpp:175
#1 0x000000000deec307 in DB::executeQueryImpl (begin=<optimized out>, end=<optimized out>, context=..., internal=<optimized out>, stage=<optimized out>,
has_query_tail=<optimized out>, istr=<optimized out>) at ../src/Interpreters/executeQuery.cpp:482
#2 0x000000000deeaedd in DB::executeQuery (query=..., context=..., internal=false, stage=2386010728, may_have_embedded_data=<optimized out>)
at ../src/Interpreters/executeQuery.cpp:813
#3 0x000000000e5c0666 in DB::TCPHandler::runImpl (this=0x7f5a8e399000) at ../src/Server/TCPHandler.cpp:254
#4 0x000000000e5ccb67 in DB::TCPHandler::run (this=0x7f5a8e399000) at ../src/Server/TCPHandler.cpp:1334
#5 0x0000000010d76a3f in Poco::Net::TCPServerConnection::start (this=0x7f5a1ee5c270) at ../contrib/poco/Net/src/TCPServerConnection.cpp:43
#6 0x0000000010d78451 in Poco::Net::TCPServerDispatcher::run (this=0x7f5a2cd3c900) at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:112
#7 0x0000000010ea5a29 in Poco::PooledThread::run (this=0x7f5a2ccb8000) at ../contrib/poco/Foundation/src/ThreadPool.cpp:199
#8 0x0000000010ea19ba in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:345
#9 0x00007f5a920cc609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#10 0x00007f5a91fe2293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Relevant code (executing the INSERT interpreter)
BlockIO InterpreterInsertQuery::execute()
{
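    auto & query = query_ptr->as<ASTInsertQuery &>();
    BlockIO res;
    StoragePtr table = getTable(query);
    // Take a shared lock on the table and check INSERT privileges
    auto table_lock = table->lockForShare(...);
    context.checkAccess(...);
    BlockOutputStreams out_streams;
    size_t out_streams_size = 1;
    for (size_t i = 0; i < out_streams_size; i++)
    {
        BlockOutputStreamPtr out;
        // Each stream wraps the previous one, so they are listed in reverse of data-flow order.
        // Innermost: writes to the target table and its materialized views.
        out = std::make_shared<PushingToViewsBlockOutputStream>(...);
        // Adds columns the INSERT did not supply (missing, default, materialized).
        out = std::make_shared<AddingDefaultBlockOutputStream>(out, ...);
        // Squashes small blocks up to a minimum size before passing them on.
        out = std::make_shared<SquashingBlockOutputStream>(out, ...);
        // Outermost: counts the blocks, rows, and bytes written.
        auto out_wrapper = std::make_shared<CountingBlockOutputStream>(out);
        out = std::move(out_wrapper);
        out_streams.emplace_back(std::move(out));
    }
    res.out = std::move(out_streams.at(0));
    return res;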
}
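Summary
- An INSERT interpreter is created from the abstract syntax tree produced by the SQL parser.
- The INSERT interpreter takes a shared lock on the table (lockForShare) and checks access rights.
- The INSERT interpreter builds a chain of output DataStreams; Blocks are passed down through the OutputStreams, layer by layer, to the storage layer.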
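The wrapping order in execute() forms a decorator chain: each stream holds the next one and forwards Blocks to it. The sketch below models that chain with hypothetical toy classes (Sink, Squashing, Counting) over a Block that is simply a vector of rows. It is not ClickHouse's IBlockOutputStream API, and the AddingDefault and PushingToViews layers are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Row = std::vector<int>;
using Block = std::vector<Row>;  // hypothetical: a block is just a list of rows

struct OutputStream
{
    virtual ~OutputStream() = default;
    virtual void write(const Block & block) = 0;
    virtual void writeSuffix() {}
};
using OutputStreamPtr = std::shared_ptr<OutputStream>;

// Innermost stream: stands in for the storage's output stream.
struct Sink : OutputStream
{
    std::vector<Block> received;
    void write(const Block & block) override { received.push_back(block); }
};

// Buffers rows until at least min_rows have accumulated, then forwards them as one block.
struct Squashing : OutputStream
{
    Squashing(OutputStreamPtr next_, size_t min_rows_) : next(std::move(next_)), min_rows(min_rows_) {}
    void write(const Block & block) override
    {
        buffer.insert(buffer.end(), block.begin(), block.end());
        if (buffer.size() >= min_rows)
            flush();
    }
    void writeSuffix() override
    {
        flush();  // push out whatever is still buffered
        next->writeSuffix();
    }

private:
    void flush()
    {
        if (!buffer.empty())
            next->write(buffer);
        buffer.clear();
    }
    OutputStreamPtr next;
    size_t min_rows;
    Block buffer;
};

// Outermost stream: counts blocks and rows, then forwards each block unchanged.
struct Counting : OutputStream
{
    explicit Counting(OutputStreamPtr next_) : next(std::move(next_)) {}
    void write(const Block & block) override
    {
        ++blocks;
        rows += block.size();
        next->write(block);
    }
    void writeSuffix() override { next->writeSuffix(); }
    size_t blocks = 0;
    size_t rows = 0;

private:
    OutputStreamPtr next;
};
```

With a large minimum size, neither single-row write reaches the sink; only writeSuffix() pushes the merged block down. This matches the storage-layer stack trace later in this article, where MemoryBlockOutputStream::write is reached from SquashingBlockOutputStream::writeSuffix.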
Storage layer
Call stack
#0 DB::StorageMemory::write (this=0x7f730b107140, metadata_snapshot=...) at ../src/Storages/StorageMemory.cpp:196
#1 0x000000000dafb183 in DB::PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream (this=<optimized out>, storage_=..., metadata_snapshot_=..., context_=...,
query_ptr_=..., no_destination=false) at ../src/DataStreams/PushingToViewsBlockOutputStream.cpp:121
#2 0x000000000daebbab in std::__1::__compressed_pair_elem<DB::PushingToViewsBlockOutputStream, 1, false>::__compressed_pair_elem<std::__1::shared_ptr<DB::IStorage>&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>&, DB::Context const&, std::__1::shared_ptr<DB::IAST>&, bool const&, 0ul, 1ul, 2ul, 3ul, 4ul> (this=0x7f73180ba4d8, __args=...)
at ../contrib/libcxx/include/memory:2214
#3 std::__1::__compressed_pair<std::__1::allocator<DB::PushingToViewsBlockOutputStream>, DB::PushingToViewsBlockOutputStream>::__compressed_pair<std::__1::allocator<DB::PushingToViewsBlockOutputStream>&, std::__1::shared_ptr<DB::IStorage>&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>&, DB::Context const&, std::__1::shared_ptr<DB::IAST>&, bool const&>
(this=0x7f73180ba4d8, __first_args=..., __second_args=..., __pc=...) at ../contrib/libcxx/include/memory:2298
#4 std::__1::__shared_ptr_emplace<DB::PushingToViewsBlockOutputStream, std::__1::allocator<DB::PushingToViewsBlockOutputStream> >::__shared_ptr_emplace<std::__1::shared_ptr<DB::IStorage>&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>&, DB::Context const&, std::__1::shared_ptr<DB::IAST>&, bool const&> (this=0x7f73180ba4c0,
__args=@0x7f7375b52a22: false, __args=@0x7f7375b52a22: false, __args=@0x7f7375b52a22: false, __args=@0x7f7375b52a22: false, __args=@0x7f7375b52a22: false, __a=...)
at ../contrib/libcxx/include/memory:3569
#5 std::__1::make_shared<DB::PushingToViewsBlockOutputStream, std::__1::shared_ptr<DB::IStorage>&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>&, DB::Context const&, std::__1::shared_ptr<DB::IAST>&, bool const&> (__args=@0x7f7375b52a22: false, __args=@0x7f7375b52a22: false, __args=@0x7f7375b52a22: false, __args=@0x7f7375b52a22: false,
__args=@0x7f7375b52a22: false) at ../contrib/libcxx/include/memory:4400
#6 DB::InterpreterInsertQuery::execute (this=<optimized out>) at ../src/Interpreters/InterpreterInsertQuery.cpp:339
#7 0x000000000de4d0f8 in DB::executeQueryImpl (begin=<optimized out>, end=<optimized out>, context=..., internal=<optimized out>, stage=<optimized out>,
has_query_tail=<optimized out>, istr=<optimized out>) at ../src/Interpreters/executeQuery.cpp:422
#8 0x000000000de4bf9d in DB::executeQuery (query=..., context=..., internal=false, stage=16, may_have_embedded_data=<optimized out>) at ../src/Interpreters/executeQuery.cpp:718
#9 0x000000000e4f38b6 in DB::TCPHandler::runImpl (this=0x7f72fc5e2000) at ../src/Server/TCPHandler.cpp:254
#10 0x000000000e5006b7 in DB::TCPHandler::run (this=0x7f72fc5e2000) at ../src/Server/TCPHandler.cpp:1311
#11 0x0000000010cd95ef in Poco::Net::TCPServerConnection::start (this=0x7f7301ecc1a0) at ../contrib/poco/Net/src/TCPServerConnection.cpp:43
#12 0x0000000010cdb001 in Poco::Net::TCPServerDispatcher::run (this=0x7f7375a4d000) at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:112
#13 0x0000000010e0c1a9 in Poco::PooledThread::run (this=0x7f7315e00000) at ../contrib/poco/Foundation/src/ThreadPool.cpp:199
#14 0x0000000010e080da in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:345
#15 0x00007f7376aa7609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#16 0x00007f73769bd293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Relevant code
BlockOutputStreamPtr StorageMemory::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, const Context & /*context*/)
{
    return std::make_shared<MemoryBlockOutputStream>(*this, metadata_snapshot);
}
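Summary
Because the test table uses the Memory table engine, the storage involved is StorageMemory. While the INSERT interpreter builds its stream chain, the PushingToViewsBlockOutputStream constructor calls StorageMemory::write(), which returns a MemoryBlockOutputStream. The caller then pushes the data to be inserted into this DataStream, and the stream writes the data into the table.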
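The shape described above, where the storage hands out a stream rather than accepting rows itself, can be sketched as follows. ToyStorageMemory and ToyMemoryOutputStream are hypothetical stand-ins for StorageMemory and MemoryBlockOutputStream, and Block is reduced to rows of int; the list-plus-mutex layout follows the MemoryBlockOutputStream::write excerpt in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <vector>

using Block = std::vector<std::vector<int>>;  // hypothetical: rows of INT values

class ToyStorageMemory;

// Stream handed out by ToyStorageMemory::write(); appends each block to the table.
class ToyMemoryOutputStream
{
public:
    explicit ToyMemoryOutputStream(ToyStorageMemory & storage_) : storage(storage_) {}
    void write(const Block & block);

private:
    ToyStorageMemory & storage;
};

class ToyStorageMemory
{
public:
    // Like StorageMemory::write(): returns a stream instead of taking data directly.
    std::shared_ptr<ToyMemoryOutputStream> write()
    {
        return std::make_shared<ToyMemoryOutputStream>(*this);
    }

    size_t totalRows() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        size_t n = 0;
        for (const auto & block : data)
            n += block.size();
        return n;
    }

private:
    friend class ToyMemoryOutputStream;
    mutable std::mutex mutex;
    std::list<Block> data;  // one list node per inserted block
};

void ToyMemoryOutputStream::write(const Block & block)
{
    std::lock_guard<std::mutex> lock(storage.mutex);
    storage.data.push_back(block);
}
```

Returning a stream lets every engine plug into the same DataStream chain: the interpreter never needs to know whether rows end up in a list in RAM or in parts on disk.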
Feeding client data into the DataStream
How the client (clickhouse-client) builds Block data
void sendData(Block & sample, const ColumnsDescription & columns_description)
{
    /// If INSERT data must be sent.
    auto * parsed_insert_query = parsed_query->as<ASTInsertQuery>();
    if (parsed_insert_query->data)
    {
        /// Send data contained in the query.
        ReadBufferFromMemory data_in(parsed_insert_query->data, parsed_insert_query->end - parsed_insert_query->data);
        sendDataFrom(data_in, sample, columns_description);
        // Remember where the data ended. We use this info later to determine
        // where the next query begins.
        parsed_insert_query->end = data_in.buffer().begin() + data_in.count();
    }
    ......
}
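On the client side, the row-oriented VALUES text has to become a column-oriented Block before it is sent. The sketch below shows that transposition for two INT columns like those of NIGHT_INSERT. parseValues is a hypothetical helper that only understands comma-separated integer tuples; in the real client, sendDataFrom does this parsing through ClickHouse's input-format machinery and handles every type and escaping rule.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Column = std::vector<int>;
using Block = std::vector<Column>;  // hypothetical: one vector per table column

// Parses "(a, b), (c, d), ..." into `num_columns` columns (row text -> columnar block).
Block parseValues(const std::string & text, size_t num_columns)
{
    Block block(num_columns);
    size_t pos = 0;
    auto skipSpaces = [&] {
        while (pos < text.size() && std::isspace(static_cast<unsigned char>(text[pos])))
            ++pos;
    };
    auto expect = [&](char c) {
        skipSpaces();
        if (pos >= text.size() || text[pos] != c)
            throw std::runtime_error(std::string("expected '") + c + "'");
        ++pos;
    };

    skipSpaces();
    while (pos < text.size())
    {
        expect('(');
        for (size_t col = 0; col < num_columns; ++col)
        {
            if (col > 0)
                expect(',');
            skipSpaces();
            size_t used = 0;
            block[col].push_back(std::stoi(text.substr(pos), &used));  // value goes to its column
            pos += used;
        }
        expect(')');
        skipSpaces();
        if (pos < text.size())
            expect(',');  // separator between tuples
        skipSpaces();
    }
    return block;
}
```

Storing one vector per column rather than one record per row is what lets the server later append or compress each column independently.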
How the server receives client data
void TCPHandler::processInsertQuery(...)
{
    state.io.out->writePrefix();
    ......
    /// Send block to the client - table structure.
    sendData(state.io.out->getHeader());
    // Read the INSERT data sent by the client
    readData(connection_settings);
    // The chain contains a stream that squashes blocks, so data received so far may only
    // have been buffered, not yet written; this call flushes it down to storage
    state.io.out->writeSuffix();
}
bool TCPHandler::receiveData(bool scalar)
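{
    // Set up the DataStream that reads Blocks from the network
    initBlockInput();
    ......
    // Read one Block from that DataStream
    Block block = state.block_in->read();
    if (block)
    {
        ......
        // Push the Block into the DataStream chain built by InterpreterInsertQuery
        state.io.out->write(block);
        return true;
    }
    // An empty Block means the client has finished sending data
    return false;
}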
Data lands in the table engine
Call stack
#0 DB::MemoryBlockOutputStream::write (this=0x7f7316088ea8, block=...) at ../src/Storages/StorageMemory.cpp:98
#1 0x000000000dafd874 in DB::PushingToViewsBlockOutputStream::write (this=0x7f73180ba7d8, block=...) at ../src/DataStreams/PushingToViewsBlockOutputStream.cpp:156
#2 0x000000000db3e733 in DB::AddingDefaultBlockOutputStream::write (this=0x7f7373abf318, block=...) at ../src/DataStreams/AddingDefaultBlockOutputStream.cpp:10
#3 0x000000000db3d4bc in DB::SquashingBlockOutputStream::finalize (this=0x7f73180b7418) at ../src/DataStreams/SquashingBlockOutputStream.cpp:30
#4 0x000000000db3d539 in DB::SquashingBlockOutputStream::writeSuffix (this=0x7f73180b7418) at ../src/DataStreams/SquashingBlockOutputStream.cpp:50
#5 0x000000000e4f8420 in DB::TCPHandler::processInsertQuery (this=0x7f72fc5e2000, connection_settings=...) at ../src/Server/TCPHandler.cpp:511
#6 0x000000000e4f3aa1 in DB::TCPHandler::runImpl (this=0x7f72fc5e2000) at ../src/Server/TCPHandler.cpp:264
#7 0x000000000e5006b7 in DB::TCPHandler::run (this=0x7f72fc5e2000) at ../src/Server/TCPHandler.cpp:1311
#8 0x0000000010cd95ef in Poco::Net::TCPServerConnection::start (this=0x7f7316088ea8) at ../contrib/poco/Net/src/TCPServerConnection.cpp:43
#9 0x0000000010cdb001 in Poco::Net::TCPServerDispatcher::run (this=0x7f7375a4d000) at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:112
#10 0x0000000010e0c1a9 in Poco::PooledThread::run (this=0x7f7315e00000) at ../contrib/poco/Foundation/src/ThreadPool.cpp:199
#11 0x0000000010e080da in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:345
#12 0x00007f7376aa7609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007f73769bd293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Relevant code
class MemoryBlockOutputStream : public IBlockOutputStream
{
public:
    ......
    void write(const Block & block) override
    {
        std::lock_guard lock(storage.mutex);
        storage.data.push_back(block);
    }
    ......
};
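The Memory engine keeps its data in a list of Blocks (storage.data); each write appends one Block while holding storage.mutex.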
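Because every write takes storage.mutex before appending, several INSERTs into the same Memory table can run at once without corrupting the list. The stand-alone sketch below demonstrates that property with std::thread; ToyMemoryTable and insertMany are hypothetical names, and each Block is reduced to a vector of int values.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

using Block = std::vector<int>;  // hypothetical: a block is just a list of values

struct ToyMemoryTable
{
    std::mutex mutex;
    std::list<Block> data;  // each inserted block becomes one list node

    void write(const Block & block)
    {
        std::lock_guard<std::mutex> lock(mutex);  // same pattern as MemoryBlockOutputStream::write
        data.push_back(block);
    }
};

// Runs `threads` concurrent writers, each appending `blocks_per_thread` small blocks.
void insertMany(ToyMemoryTable & table, size_t threads, size_t blocks_per_thread)
{
    std::vector<std::thread> workers;
    for (size_t t = 0; t < threads; ++t)
        workers.emplace_back([&table, t, blocks_per_thread] {
            for (size_t i = 0; i < blocks_per_thread; ++i)
                table.write({static_cast<int>(t), static_cast<int>(i)});
        });
    for (auto & w : workers)
        w.join();
}
```

The lock is held only for the push_back, so the critical section stays short; the cost of building each Block happens outside it.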
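Summary
- The client turns the VALUES data into Blocks and sends them to the server.
- After receiving a Block, the server passes it to the DataStream chain built by InterpreterInsertQuery.
- Finally, the table data lands in the Memory table engine.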
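Conclusion
The execution path of an INSERT statement in ClickHouse can be summarized roughly as follows:
- The ClickHouse instance receives the client's request to execute the INSERT statement.
- The instance parses the statement with the ParserInsertQuery class.
- ParserInsertQuery produces an ASTInsertQuery abstract syntax tree for the later stages.
- An INSERT interpreter is created from that abstract syntax tree.
- The INSERT interpreter takes a shared lock on the table and checks access rights.
- The INSERT interpreter builds a chain of output DataStreams that passes Blocks, layer by layer, down to the storage layer.
- The client sends the VALUES data as Blocks; the server pushes them through that chain, and the rows finally land in the Memory table engine.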