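1. Preface
The Kudu C++ API turns out to be quite obscure: apart from the official site there is almost nothing online, whereas the Java API at least has some material. After about a week of trial and error I got the basics working (connecting, creating a table, insert, scan), so here they are.
Official documentation: Kudu C++ client API: Class List (apache.org)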
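2. Preparation
First you need a Kudu cluster. Mine has 2 masters and 3 tservers (an odd number of masters seems to be expected, but this setup does run). Installation guides are easy to find online.
Next, add /usr/include/kudu to your include path, either through an environment variable or in CMakeLists, and link against libkudu_client.so. Its location may differ on your system, so search for it.
My final CMakeLists looks like this: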
SET(CMAKE_C_COMPILER "/usr/local/bin/gcc")
SET(CMAKE_CXX_COMPILER "/usr/local/bin/g++")
project(kudutest)
include_directories(/usr/include/kudu/client)
include_directories(/usr/include/kudu/common)
include_directories(/usr/include/kudu/util)
add_executable(kudutest main.cpp)
target_link_libraries(kudutest /usr/lib64/libkudu_client.so)
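Include all the Kudu headers, and also select the pre-C++11 libstdc++ ABI (this does not turn off C++11 itself). Otherwise the C++11-ABI std::string does not match the one libkudu_client.so was built with, and linking fails with errors such as undefined reference to `kudu::client::KuduClientBuilder::add_master_server_addr(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'. The define has to come before any standard library header is included.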
#define _GLIBCXX_USE_CXX11_ABI 0
#include <callbacks.h>
#include <client.h>
#include <resource_metrics.h>
#include <row_result.h>
#include <scan_batch.h>
#include <scan_predicate.h>
#include <schema.h>
#include <shared_ptr.h>
#include <stubs.h>
#include <value.h>
#include <write_op.h>
#include <partial_row.h>
#include <kudu_export.h>
#include <monotime.h>
#include <slice.h>
#include <status.h>
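If you are not sure which std::string ABI a source file ends up with, a small check like the one below can help. It is only a sketch: the function name StdStringAbi is made up for illustration, and the macro it inspects is specific to libstdc++ (with other standard libraries it simply reports "unknown").

```cpp
#include <string>

// Reports which libstdc++ std::string ABI this translation unit is compiled with.
// _GLIBCXX_USE_CXX11_ABI is a libstdc++ macro; it is visible once any standard
// header (here <string>) has been included.
std::string StdStringAbi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
#  if _GLIBCXX_USE_CXX11_ABI
    return "cxx11";      // new ABI: symbols contain std::__cxx11::basic_string
#  else
    return "pre-cxx11";  // old ABI: what "#define _GLIBCXX_USE_CXX11_ABI 0" selects
#  endif
#else
    return "unknown";    // not libstdc++, or a version without the dual ABI
#endif
}
```

Printing StdStringAbi() from the same file that includes the Kudu headers shows whether the define actually took effect there.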
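3. Connecting
To connect to Kudu you create a KuduClient, and to create a KuduClient you need a KuduClientBuilder.
Use KuduClientBuilder::add_master_server_addr(const std::string& addr) to add each master's address.
Build() returns a kudu::Status, which can be converted to a string with ToString() for printing. On success it is OK.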
kudu::client::sp::shared_ptr<kudu::client::KuduClient> KuduClPointer;
kudu::client::KuduClientBuilder KuduClBuilder;
std::string result = KuduClBuilder.add_master_server_addr(kuduMaster1).
add_master_server_addr(kuduMaster2).Build(&KuduClPointer).ToString();
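4. Creating a table
Here I create a simple table with two columns: int32 ID and string name.
Creating a table requires a KuduTableCreator, which needs a table name, a schema, and partitioning rules.
The schema is built with KuduSchemaBuilder and AddColumn().
Partitioning can be hash, range, or a combination of both; pick one.
After creation you can call TableExists() to confirm the table is there.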
kudu::client::KuduSchemaBuilder test;
// AddColumn() returns a KuduColumnSpec*, not a Status, so it is not wrapped in KUDU_CHECK_OK
test.AddColumn("ID")->Type(kudu::client::KuduColumnSchema::INT32)->NotNull()->PrimaryKey();
test.AddColumn("name")->Type(kudu::client::KuduColumnSchema::STRING)->NotNull();
kudu::client::KuduSchema schema;
result = test.Build(&schema).ToString();
// Hash-partition on the primary key column into 2 buckets
std::vector<std::string> hashvec;
hashvec.push_back("ID");
kudu::client::KuduTableCreator* KuduTbCreator = KuduClPointer->NewTableCreator();
KuduTbCreator->table_name(TABLE_NAME);
KuduTbCreator->schema(&schema);
KuduTbCreator->add_hash_partitions(hashvec, 2);
result = KuduTbCreator->Create().ToString();
delete KuduTbCreator; // the caller owns the table creator
bool exists = false;
result = KuduClPointer->TableExists(TABLE_NAME, &exists).ToString();
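To get a feel for what add_hash_partitions(hashvec, 2) means, the toy function below maps an ID to one of num_buckets buckets. This is an illustration only: BucketFor is a made-up name, and std::hash is not the hash function Kudu uses internally, so the actual bucket assignment in a real table will differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Toy model of hash partitioning: hash the key, then take it modulo the
// number of buckets. NOT Kudu's real hash; for intuition only.
int BucketFor(int32_t id, int num_buckets) {
    std::size_t h = std::hash<int32_t>()(id);
    return static_cast<int>(h % static_cast<std::size_t>(num_buckets));
}
```

The point is that the same ID always lands in the same bucket and rows are spread across the buckets, which is what lets writes and scans be distributed across tablets.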
5. Inserting
Before operating on a table you have to open it. The function is kudu::client::KuduClient::OpenTable(const std::string& table_name, sp::shared_ptr<KuduTable>* table).
kudu::client::sp::shared_ptr<kudu::client::KuduTable> KuduTbPointer;
result = KuduClPointer->OpenTable(TABLE_NAME, &KuduTbPointer).ToString();
An insert needs a KuduSession and a KuduInsert, and is submitted with kudu::client::KuduSession::Apply(KuduWriteOperation* write_op). Other write operations work the same way.
kudu::client::sp::shared_ptr<kudu::client::KuduSession> KuduSePointer = KuduClPointer->NewSession();
kudu::client::KuduInsert* KuduInPointer = KuduTbPointer->NewInsert();
KUDU_CHECK_OK(KuduInPointer->mutable_row()->SetInt32("ID",1));
KUDU_CHECK_OK(KuduInPointer->mutable_row()->SetStringCopy("name","A"));
result = KuduSePointer->Apply(KuduInPointer).ToString();
6. Querying
A query starts from KuduScanToken. We need its builder, and the condition to query on goes into a KuduPredicate. The example here selects all rows with ID >= 0.
kudu::client::KuduScanTokenBuilder KuduSTBuilder(KuduTbPointer.get());
kudu::client::KuduPredicate* KuduPrPointer = KuduTbPointer->NewComparisonPredicate("ID",
kudu::client::KuduPredicate::GREATER_EQUAL, kudu::client::KuduValue::FromInt(0));
result = KuduSTBuilder.AddConjunctPredicate(KuduPrPointer).ToString();
std::vector<kudu::client::KuduScanToken*> KuduSTVector;
result = KuduSTBuilder.Build(&KuduSTVector).ToString();
Then comes a big pile of loops; let's go through them piece by piece:
kudu::client::KuduScanner* KuduScPointer = NULL;
// First loop: iterate over the vector of ScanToken* and turn each token into a Scanner
for (std::vector<kudu::client::KuduScanToken*>::iterator KuduSTVit = KuduSTVector.begin();
     KuduSTVit != KuduSTVector.end(); KuduSTVit++)
{
    result = (*KuduSTVit)->IntoKuduScanner(&KuduScPointer).ToString();
    // Scanner::Open() is what actually starts the query
    result = KuduScPointer->Open().ToString();
    // Second loop: fetch the scanner's results one ScanBatch at a time
    while (KuduScPointer->HasMoreRows())
    {
        kudu::client::KuduScanBatch KuduScBatch;
        result = KuduScPointer->NextBatch(&KuduScBatch).ToString();
        // Third loop: a ScanBatch holds multiple rows; parse and print each one
        for (kudu::client::KuduScanBatch::const_iterator KuduSBit = KuduScBatch.begin();
             KuduSBit != KuduScBatch.end(); KuduSBit++)
        {
            kudu::client::KuduScanBatch::RowPtr row(*KuduSBit);
            int32_t i; kudu::Slice s;
            // The first argument of the RowPtr getters is either a column index (int)
            // or a column name (kudu::Slice), e.g. row.GetInt32("ID", &i).
            result = row.GetInt32(0, &i).ToString();
            std::cout << "ID: " << i << " " << result << std::endl;
            result = row.GetString(1, &s).ToString();
            std::cout << "name: " << s.ToString() << " " << result << std::endl;
        }
    }
    // Close and free each scanner inside the loop (not only the last one);
    // the caller owns both the scanner and the scan token.
    KuduScPointer->Close();
    delete KuduScPointer;
    delete *KuduSTVit;
}
As for what ScanToken, Scanner, and ScanBatch actually mean, I have not fully worked that out yet; this is just a simple walkthrough of using the API.