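dbkernel's Blog

Focused on sharing database technology principles and kernel code analysis, mainly covering relational and distributed databases such as MySQL, PostgreSQL, Greenplum, and TiDB.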

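[Original] Tech Sharing | How to Choose a Suitable Open-Source License for Your Code

Author: 卢文双, senior database kernel developer. My company recently embraced open source across the board and ran into some problems when choosing an open-source license. After consulting a lot of material, I summarized my findings here. Developers who are new to open-source software will inevitably use other people's work while coding. If you look carefully, you will notice that good authors attach a copyright notice at the top of a file even for a small piece of code, for example […]. Likewise, some blogs state "this article is licensed under […]". If we copy someone else's code or article without paying attention to copyright, then in countries where legal awareness is strong (copyright awareness in China is also growing), our work may infringe on others' rights and break the law.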

2022-07-08 11:14:22 1034

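[Original] Feature Introduction | A Survey of Existing Computation Pushdown Solutions in the MySQL Ecosystem

Computation pushdown is a common technique that database optimizers use to improve query performance. In early database systems, computation pushdown generally meant predicate pushdown, whose theory derives from relational algebra. After 2000, with the popularity of Oracle RAC and the rise of many open-source distributed databases, the separation of storage and compute gradually took hold, and the scope of computation pushdown expanded from basic predicate and projection pushdown to pushing down every computation the database can support (JOINs, aggregation, full queries, partial queries, and so on).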

2024-03-18 11:15:09 1759 1
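
The predicate and projection pushdown mentioned in the entry above can be illustrated with a minimal sketch. It assumes a toy in-memory "storage layer"; the names `storage_scan`, `TABLE`, and the two query functions are hypothetical and do not correspond to any MySQL API. The point is only that applying the filter and column list inside the scan reduces how many rows and columns reach the compute layer.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]

# Hypothetical table living in the "storage layer".
TABLE: List[Row] = [
    {"id": i, "city": "A" if i % 4 == 0 else "B", "amount": i * 10}
    for i in range(8)
]


def storage_scan(
    rows: Iterable[Row],
    predicate: Optional[Callable[[Row], bool]] = None,
    columns: Optional[List[str]] = None,
) -> Iterator[Row]:
    """Scan rows, optionally applying a pushed-down filter and projection."""
    for row in rows:
        if predicate is not None and not predicate(row):
            continue  # filtered out before leaving the storage layer
        yield {c: row[c] for c in columns} if columns else row


def query_without_pushdown():
    # The compute layer receives every row, then filters and projects itself.
    shipped = list(storage_scan(TABLE))
    result = [{"id": r["id"]} for r in shipped if r["city"] == "A"]
    return result, len(shipped)


def query_with_pushdown():
    # The filter and the column list travel down to the scan.
    shipped = list(
        storage_scan(TABLE, predicate=lambda r: r["city"] == "A", columns=["id"])
    )
    return shipped, len(shipped)
```

Both functions return the same result set; the second element of each tuple is the number of rows shipped from storage to compute, which is where the two plans differ.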

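[Original] Problem Analysis | Why Does Waiting for semi-sync ACK from slave on the Primary Block Execution of set global super_read_only=ON

In MDL, an MDL_KEY is represented as namespace+DB+OBJECT_NAME, and the namespace is also important. The other call sites of the Global_read_lock::lock_global_read_lock function have little to do with transaction commit, so they should be unrelated to this issue. The lock requested is an MDL_key::GLOBAL lock in S mode. Consulting the manual ([…] 2. If the primary executes a transactional statement, such as […] process, after which new transactions executed on the primary will be in […] lock, in S mode; these two modes conflict, and the transaction in the […] state has not yet committed and therefore has not released […].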

2024-03-18 10:38:28 1980

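[Original] Bug Hunting Diary | In Some Cases a MySQL 8.0 Replica Logs Replayed CREATE TABLE and DROP TABLE Statements to the Slow Log

When replication uses the row-based binlog format, if the replica enables slow_query_log and log_slow_replica_statements, and replaying a CREATE TABLE or DROP TABLE on the replica takes a long time for some special reason (for example, another SQL statement on the replica holds the MDL lock), the statement is written to the replica's slow log, whereas ALTER TABLE is not. According to the official description, when binlog_format is row-based, even if […] is enabled. […] related bug fix, which indicates that the issue has not yet been fixed upstream.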

2024-03-18 09:44:45 1450

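[Original] Long Read | Mainstream MySQL Thread Pool Implementations in Detail - MariaDB/Percona/AliSQL/TXSQL/MySQL Enterprise Edition

Comparison of MySQL Enterprise Edition, MariaDB, Percona, Tencent TXSQL, and Alibaba Cloud AliSQL (values listed in that column order):
- Implementation: plugin; non-plugin; non-plugin; non-plugin; presumably non-plugin
- Version (three values given): introduced in 5.5; introduced in 5.5, refined in 10.2; 5.7/8.0
- Open source: no; yes; yes; no; no
- Dynamic thread pool on/off switch: plugin-based, not supported; not supported; not supported; supported; supported
- Priority handling: high and low priorities, where low-priority events are promoted to the high-priority queue after waiting for a while; high and low priorities, where low-priority events are promoted to the high-priority queue after waiting for a while; high and low priorities, with a limit on the number of tickets each connection has in the high-priority queue.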

2023-10-29 15:45:41 1643

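[Original] Feature Introduction | MySQL Test Framework MTR Tutorial Series (4): Syntax

Author: 卢文双, senior database kernel developer. I used to use MTR, the MySQL test framework, mainly to verify SQL correctness. Recently, because of work, I studied MTR in depth and found that it can do much more: it also supports unit testing, stress testing, code coverage testing, memory error detection, and detection of thread races and deadlocks. In the spirit of sharing, I have written this up as a series. This article is Part 4: Syntax.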

2023-07-06 23:23:37 1187

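[Original] Source Code Analysis | MySQL Test Framework MTR Tutorial Series (3): Source Code

Author: 卢文双, senior database kernel developer. I used to use MTR, the MySQL test framework, mainly to verify SQL correctness. Recently, because of work, I studied MTR in depth and found that it can do much more: it also supports unit testing, stress testing, code coverage testing, memory error detection, and detection of thread races and deadlocks. In the spirit of sharing, I have written this up as a series. This article is Part 3: Source Code.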

2023-07-06 23:17:37 1214

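[Original] Feature Introduction | MySQL Test Framework MTR Tutorial Series (2): Advanced - Memory/Thread/Code Coverage/Unit/Stress Testing

I used to use MTR, the MySQL test framework, mainly to verify SQL correctness. Recently, because of work, I studied MTR in depth and found that it can do much more: it also supports unit testing, stress testing, code coverage testing, memory error detection, and detection of thread races and deadlocks. In the spirit of sharing, I have written this up as a series. The series covers:
- Getting Started: how it works, building and installing, options, command examples, recommended usage, adding cases, common problems, debugging failures
- Advanced: advanced usage, including unit testing, stress testing, code coverage testing, memory error detection, and thread races and deadlocks
- Source Code
- Syntax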

2023-05-07 21:28:13 1034 1

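[Original] Feature Introduction | MySQL Test Framework MTR Tutorial Series (1): Getting Started

Author: 卢文双, senior database kernel developer. I used to use MTR, the MySQL test framework, mainly to verify SQL correctness. Recently, because of work, I studied MTR in depth and found that it can do much more: it also supports unit testing, stress testing, code coverage testing, memory error detection, and detection of thread races and deadlocks. In the spirit of sharing, I have written this up as a series. The series covers:
- Getting Started: how it works, building and installing, options, commands, recommended usage, adding cases, debugging failures
- Advanced: advanced usage, including unit testing, stress testing, code coverage testing, memory error detection, and thread races and deadlocks
- Source Code
- Syntax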

2023-04-17 13:42:59 1079 1

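[Repost] Source Code Analysis | ClickHouse and Friends (15): Why Is Group By So Fast

This article was first published on 2021-01-26 21:31:12. Before revealing the secrets of ClickHouse Group By, let's talk about database performance comparison tests. In 虎哥's view, what information should a fair performance comparison provide? First, respect the facts: in which scenarios is x faster than y? Second, why is x faster than y? If both are covered, one more point matters: how long can x's advantage last? Is it a long-term advantage that comes from the architecture, or the result of a quick optimization that took no longer than smoking a pipe, and can it keep up over time? Simply posting a few flashy numbers is not a benchmark but a benchmarket. All right ...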

2022-07-11 17:33:29 890

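[Repost] Source Code Analysis | ClickHouse and Friends (14): A Storage-Compute Separation Design and Implementation

This article was first published on 2020-09-21 22:01:12. If multiple ClickHouse servers could mount the same data (on distributed storage, for example) and every server could write, what would the benefits be? First, the replica mechanism can be left to the distributed storage, which keeps the upper-layer architecture simple. Second, clickhouse-server instances can be added or removed on any machine, so storage and compute capacity can each be used to the full. This article explores a storage-compute separation design for ClickHouse; the implementation is not complicated. ClickHouse runtime data consists of two parts: in-memory metadata and on-disk data. Let's look at the write path first: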

2022-07-11 17:27:46 648

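[Repost] Source Code Analysis | ClickHouse and Friends (13): The ReplicatedMergeTree Table Engine and Its Synchronization Mechanism

This article was first published on 2020-09-15 20:15:14. In MySQL, a primary-replica setup is used for high availability and data safety, with data synchronized through the binlog. In ClickHouse, we can use the ReplicatedMergeTree engine, where data synchronization is done through zookeeper. This article starts by setting up a multi-replica cluster and then takes a quick look at the underlying mechanism. To set up a 2-replica test cluster with limited resources, we start clickhouse-server (2 replicas) + zookee ...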

2022-07-11 17:23:37 601

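[Repost] Source Code Analysis | ClickHouse and Friends (12): The Magical Materialized View and How It Works

This article was first published on 2020-09-03 21:22:14. In ClickHouse, the Materialized View is a magical and powerful thing with rather distinctive uses. This article analyzes the underlying mechanism to see how ClickHouse's Materialized View works, so that it can be used more effectively. For most people the concept of a materialized view is rather abstract. Materialized? View? To understand it better, let's look at a scenario. Suppose you are a "happy" programmer, and one day the product manager asks for real-time statistics of video downloads per hour. The user download detail table: Compute the downloads per hour: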

2022-07-11 17:20:36 380

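[Repost] Source Code Analysis | ClickHouse and Friends (11): MySQL Real-Time Replication in GTID Mode

This article was first published on 2020-08-28 20:40:14. MySQL Real-Time Replication: Principles. A few days ago ClickHouse officially released v20.8.1.4447-testing, which already includes the MaterializeMySQL engine and gives ClickHouse the ability to replicate MySQL data in real time. Those interested can try it with the official packages; for installation see https://clickhouse.tech/#quick-start, and note that you need to choose the testing branch. MaterializeMySQL in v20.8.1.444 ...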

2022-07-11 17:18:27 255

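[Repost] Source Code Analysis | ClickHouse and Friends (10): MergeTree Write-Ahead Log

This article was first published on 2020-08-20 19:55:14. To improve write performance, a database system writes data to memory first and flushes it to disk once enough has accumulated, as MySQL's buffer pool mechanism does. Because data goes to memory first, we need a Write-Ahead Log (WAL) to keep the in-memory data safe. Today we look at ClickHouse's newly added MergeTreeWriteAheadLog module and the problem it solves. For the ClickHouse MergeTree engine, every write (even of a single row) will, on disk ...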

2022-07-11 17:15:28 328
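
The write-ahead idea described in the entry above (log first, then update memory, and replay the log after a restart) can be sketched as follows. This is a toy key-value store under stated assumptions, not ClickHouse's MergeTreeWriteAheadLog: the class name `TinyWalStore`, the JSON-lines record format, and the recovery policy are all hypothetical.

```python
import json
import os


class TinyWalStore:
    """Toy key-value store: every update is appended to a log before memory changes."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._replay()  # rebuild in-memory state from the log
        self._wal = open(wal_path, "ab")

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        good_bytes = 0
        with open(self.wal_path, "rb") as f:
            for raw in f:
                # A record without a trailing newline, or one that does not
                # parse, is treated as a torn write from a crash.
                if not raw.endswith(b"\n"):
                    break
                try:
                    record = json.loads(raw.decode("utf-8"))
                except ValueError:
                    break
                self.data[record["key"]] = record["value"]
                good_bytes += len(raw)
        # Drop the torn tail so later appends start on a clean record boundary.
        with open(self.wal_path, "r+b") as f:
            f.truncate(good_bytes)

    def put(self, key, value):
        line = json.dumps({"key": key, "value": value}) + "\n"
        self._wal.write(line.encode("utf-8"))
        self._wal.flush()
        os.fsync(self._wal.fileno())  # make the record durable first
        self.data[key] = value        # only then apply it in memory

    def close(self):
        self._wal.close()
```

Reopening a `TinyWalStore` on the same path after a crash restores every update whose log record was fully written, which is the safety property the entry attributes to a WAL.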

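[Repost] Source Code Analysis | ClickHouse and Friends (8): A Purely Hand-Crafted SQL Parser

This article was first published on 2020-07-26 21:55:10. In real life, anything labeled "purely hand-crafted" gives the first impression of a premium product, in a word "expensive", like Beijing's traditional cloth shoes. But in the computer world, if someone told you that ClickHouse's SQL parser is purely hand-crafted, wouldn't you be surprised? This question drew attention from many readers, so this article discusses ClickHouse's hand-written parser, looking at how it works underneath and at its pros and cons. Dry as it may be, let's start with a SQL statement: Token: first, each character of the SQL is examined one by one, and the text is split into tokens according to how the characters relate, for example consecutive Word ...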

2022-07-11 17:10:00 477
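
The tokenization step described in the entry above (inspect characters one by one and group related ones into tokens) can be sketched as a small hand-written lexer. This is an illustrative sketch only: the token kinds (`word`, `number`, `string`, `symbol`) and the function name `tokenize` are assumptions and are not taken from the ClickHouse source.

```python
def tokenize(sql):
    """Split a SQL string into (kind, text) tokens by scanning character by character."""
    tokens = []
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch.isspace():
            i += 1  # whitespace only separates tokens
        elif ch.isalpha() or ch == "_":
            j = i
            # consecutive word characters form one word token
            while j < n and (sql[j].isalnum() or sql[j] == "_"):
                j += 1
            tokens.append(("word", sql[i:j]))
            i = j
        elif ch.isdigit():
            j = i
            while j < n and sql[j].isdigit():
                j += 1
            tokens.append(("number", sql[i:j]))
            i = j
        elif ch == "'":
            j = i + 1
            while j < n and sql[j] != "'":
                j += 1
            if j >= n:
                raise ValueError("unterminated string literal")
            tokens.append(("string", sql[i:j + 1]))
            i = j + 1
        else:
            tokens.append(("symbol", ch))  # any other character stands alone
            i += 1
    return tokens
```

For instance, `tokenize("SELECT a FROM t")` yields three word tokens followed by one more word token for `t`; a real parser would then classify words into keywords and identifiers in a later step.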

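[Repost] Source Code Analysis | ClickHouse and Friends (6): MergeTree Storage Structure

This article was first published on 2020-06-30 21:41:12. The previous article, "Evolution of Storage Engine Technology and MergeTree", introduced the evolution of storage algorithms. The storage engine is a database's chassis; it must be stable and powerful. Next we explore ClickHouse's MergeTree columnar storage engine and take apart the most important component of this "sports car". Every storage engine, whether well crafted or shoddy, ultimately has to write data back to disk to serve storage and indexing. The layout of the on-disk files is essentially the physical embodiment of the algorithm, and we can even reverse-engineer the algorithm from these storage structures. So the best starting point for understanding a storage engine in depth is its on-disk storage structure ...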

2022-07-11 17:05:42 282

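[Repost] Source Code Analysis | ClickHouse and Friends (5): Evolution of Storage Engine Technology and MergeTree

This article was first published on 2020-06-22 21:55:10. In the second decade of the 21st century, 虎哥 has been fighting on the front line of storage engines for nearly 10 years. Driven by strong interest, over the years he has hardly missed a storage-related paper on arXiv. Spotting a paper marked as a draft brings the same delight as a beggar hearing coins clink. Reading papers is like appraising antiques: most are "fakes", and you need the skill to tell the genuine ones apart. Otherwise, today it is Zhang San's algorithm that surpasses xx, tomorrow it is Wang Er's hardware that improves yy, you can never keep up with zz, and you drown in this technical junk with no nutritional value, wasting your best years. Back to the topic: the next 3 articles ...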

2022-07-11 16:56:37 286

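[Repost] Source Code Analysis | ClickHouse and Friends (4): Pipeline Processors and Scheduler

This article was first published on 2020-06-12 20:57:10. Last updated: 2020-08-15. This article discusses ClickHouse's core technology: the Processor and the directed acyclic graph scheduler (DAG Scheduler). These concepts were not invented by ClickHouse; interested readers can look at materialize's timely-dataflow, and 虎哥 also wrote a prototype in golang. What matters is the implementation details, and it is the fine design of these modules that gives ClickHouse its overall high performance. In a traditional database system, a Query is processed roughly as follows: where in P ...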

2022-07-10 21:26:02 531

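[Repost] Source Code Analysis | ClickHouse and Friends (3): MySQL Protocol and the Write Call Stack

This article was first published on 2020-06-08 19:57:10. The previous article, "MySQL Protocol and the Read Call Stack", described the call stack of a ClickHouse query statement. This article continues with the write call stack. Let's get started. Create a table: Write data: Call stack analysis 1. Get the storage engine's OutputStream 2. Assemble an InputStream from the SQL. How is it assembled into an inputstream structure? Then a Block is constructed through the copyData method of NullAndDoCopyBlockInputStream: 3. Assemble OutputSt ...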

2022-07-10 21:21:48 139

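[Repost] Source Code Analysis | ClickHouse and Friends (2): MySQL Protocol and the Read Call Stack

This article was first published on 2020-06-07 17:17:10. For an OLAP DBMS, two ends are very important: […] With the inside and outside connected in this way, more friends means more options, enabling "data"-level orchestration. Today's topic is the MySQL protocol on the ingress side, which is also ClickHouse's first good friend in this series. Users can connect directly to ClickHouse through a MySQL client or a compatible driver to read and write data. Using a MySQL Query request, this article walks the call stack to understand ClickHouse's full data read path. The entry file is: MySQLHandler.cpp My ...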

2022-07-10 21:17:41 299

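[Repost] Source Code Analysis | ClickHouse and Friends (1): Building, Developing, and Testing

This article was first published on 2020-06-05 19:37:10. By chance, I had an in-person conversation with the ClickHouse team, and Alexey mentioned ClickHouse's design philosophy: […] A model of solving business problems with an engineering mindset! Users do not care about dazzling, fancy high technology; they just need a solution that solves their own problems well. Growing "wild" on its own strength like this is rare in the open-source community. So I became very curious about this powerful tool that smells of vodka and joined the ClickHouse community to find out more. My first impression was open, friendly, and formidable (AK47 vs CK16, ClickHouse 2 ...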

2022-07-10 21:06:42 332

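[Original] Feature Introduction | MySQL select count(*), count(1), and count(column) in Detail (1): Concepts and Differences

Since I first started using MySQL, I have read articles on the subject on and off, and opinions on the […] operation vary widely. The main point of disagreement is which of […] and […] is more efficient: some say […] is faster than […] ([…]), and some say the two are equally fast. My understanding is that the two behaviors may apply to different versions; I only care about how newer MySQL versions behave, as detailed below. First, the common operations and their meanings: The relevant description in the MySQL manual is as follows: The key points of this official description are: At this point we understand that […] and […] are essentially the same, so what about […]? Based on the description above, for querying the total row count of an InnoDB table, the recommendations are: Conversely, if an exact total row count is required, the recommendation is: Space is limited; in-depth ...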

2022-07-10 21:00:39 1169

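[Original] Feature Introduction | MySQL Auto-Increment Columns in Detail (1): Concepts and Usage

An auto-increment column, i.e. AUTO_INCREMENT, can be used to generate a unique identifier for new records. Requirements: 2.2. Inserting data 2.3. How do you check how far a table's AUTO_INCREMENT has advanced? 2.4. Can there be gaps when inserting data? Yes, but take care. 2.5. Can duplicate records be inserted? Since the auto-increment column uniquely identifies a record, duplicate records certainly cannot be inserted. 2.6. How do you change the AUTO_INCREMENT value? Note: AUTO_INCREMENT cannot be smaller than the current maximum value in the auto-increment column. 3. Questions 3.1. Does an auto-increment column have an upper limit? As shown above, the auto-increment column keeps growing, so is there ...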

2022-07-10 20:55:58 5644

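[Original] Engine Features | The MySQL MEMORY (HEAP) Storage Engine Causes Local Transactions on a Slave Node

Author: 卢文双, senior database kernel developer. Some readers may not be familiar with the MEMORY storage engine, so here is an introduction first (the following description comes from the official documentation): Q: What is the difference between a MEMORY table and a temporary table? 2. Fault analysis. Symptom: we recently ran into a user whose use of the MEMORY storage engine left the primary and replica GTIDs inconsistent, with the replica having one more GTID than the primary. Analysis: after we reported the cause to the user, the user changed the MEMORY table to an InnoDB table. This description means: for example, a cluster has three nodes A, B, and C, with node A as the primary. Also, node A's G ...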

2022-07-10 20:50:13 539

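[Original] Practical Tools | The Linux Scheduled-Task Command crontab in Detail

Task scheduling on Linux falls into two categories: system task scheduling and user task scheduling. Linux system tasks are controlled by the […] system service, which is started by default. Tasks scheduled by users themselves use the […] command. On Ubuntu/Debian, the configuration file path is […] (CentOS is similar), and its content is: The […] environment variable specifies the shell the system uses, here […]. The […] environment variable specifies the path in which the system looks for commands. You can also add the […] variable; if it is specified, crond's task execution information is emailed to the given user. The remaining parts are described in detail later. Work that a user needs to run periodically, such as user data ...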

2022-07-10 20:46:19 1145

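[Original] Feature Analysis | GreenPlum's Parallel Query Optimization Strategy in Detail

Author: 卢文双, senior database kernel developer. GreenPlum adopts a Share Nothing architecture and makes good use of inexpensive PCs. As a result, I/O is no longer the bottleneck of a DW (data warehouse); instead, the pressure on the network is much greater. However, GreenPlum's query optimization strategy keeps network exchange to a minimum, which is refreshing for anyone encountering GreenPlum for the first time. GreenPlum's master node is responsible for parsing SQL and generating the execution plan; specifically, the query optimizer turns the SQL into the physical execution plan that each node (segment) must execute.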

2022-07-10 20:42:47 1319

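[Original] Troubleshooting | PostgreSQL Error: requested WAL segment has already been removed

When using a PostgreSQL database configured with a hot standby, running a large number of transactions, especially an insert transaction that inserts tens of millions of rows (the typical approach is to keep […]), produces the following error in the background csv log: Problem analysis: based on the error message, the suspected cause is that the large transaction on the primary generated a large amount of xlog, because during a transaction PostgreSQL does not send it to the standby until commit. Since the transaction takes too long, exceeding the default checkpoint interval, some xlog is removed before it has been sent to the standby. To solve this problem, the commonly available options are: set the GUC parameter ...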

2022-07-10 16:50:10 2292

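[Original] Source Code Analysis | Measuring PostgreSQL Code Coverage with gcov and lcov

One of the criteria for judging a test case is code coverage; a good test case should cover all the code. So how do we know whether a test case covers all the code? Taking PostgreSQL as an example, let's see how to measure code coverage for a C program. C code coverage testing requires a companion tool of gcc as well as a visualization tool. First install the dependencies gcov and lcov. gcov is already included in the gcc package; lcov is a gcov extension from ltp that generates HTML reports. 2. Build, ...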

2022-07-09 08:23:52 666

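[Original] Source Code Analysis | PostgreSQL Regression Testing in Detail

Regression testing is one of PostgreSQL's testing methods. It requires test scripts defined in advance (usually SQL scripts, placed in the sql directory) together with files holding the expected correct output of running those scripts (usually placed in the expected directory). Tests are run with […] or […], which invokes the SQL in the sql directory through a program and collects the output (usually into the results directory). Finally, pg_regress uses diff to compare the files in the expected and results directories one by one. If a comparison finds that file contents differ, the differences are written to the file ...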

2022-07-09 08:18:48 764
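
The comparison step described in the entry above (diff each file in the expected directory against its counterpart in the results directory) can be sketched as follows. This is not pg_regress itself, only a minimal illustration using Python's standard `difflib`; the function name `compare_results` and the assumption that output files use the `.out` suffix are labeled here as assumptions.

```python
import difflib
from pathlib import Path


def compare_results(expected_dir, results_dir):
    """Return {test_name: unified diff text} for every test whose output differs."""
    failures = {}
    # Assumption: each test has one "<name>.out" file in both directories.
    for expected in sorted(Path(expected_dir).glob("*.out")):
        actual = Path(results_dir) / expected.name
        expected_lines = expected.read_text().splitlines(keepends=True)
        # A missing results file is treated as empty output.
        actual_lines = (
            actual.read_text().splitlines(keepends=True) if actual.exists() else []
        )
        diff = list(
            difflib.unified_diff(
                expected_lines, actual_lines,
                fromfile=str(expected), tofile=str(actual),
            )
        )
        if diff:
            failures[expected.stem] = "".join(diff)
    return failures
```

An empty dictionary means every test matched its expected output; otherwise each value holds the diff text that a harness could write to a report file.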

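[Original] Best Practices | Building, Installing, and Configuring a Postgres-XC Cluster from Source and Setting Up Datanode Hot Standby with pg_basebackup

Note: when this article was written, Postgres-XC had not yet been renamed Postgres-X2. 2. Install dependencies For Ubuntu/Debian: For CentOS: 3. Build and install 4. Initialize and start 4.1. Initialize GTM 4.2. Initialize database nodes Initialize all database nodes (CO, DN): 4.3. Edit configuration files Edit data/co1/postgresql.conf: Edit data/co2/postgresql.conf: Edit data/dn1/postgresql.conf: Edit data/dn2 ...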

2022-07-09 08:13:20 438

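[Original] Feature Analysis | GreenPlum Primary/Mirror Synchronization Mechanism

PostgreSQL implements primary/standby synchronization through streaming replication; for the principle, see "PG Primary/Standby Streaming Replication Mechanism". Greenplum is built on PostgreSQL, and its master/standby also uses streaming replication, but data synchronization between the Primary and Mirror of a Segment node is implemented as file-level synchronization. Greenplum's architecture uses an MPP shared-nothing design. In an MPP system, each data node has its own CPU, disk, and memory (share nothing), and the CPUs in one node cannot access another node's memory. Nodes exchange information over the interconnect network, ...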

2022-07-09 08:05:10 637

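[Original] Best Practices | Installing and Configuring a GreenPlum Database Cluster on CentOS and Ubuntu - Source & Packages

This article describes how to install and configure a GreenPlum cluster on CentOS/RedHat and Ubuntu/Debian, both from installation packages and from source. It is installed under the […] directory. Install the dependencies on h93 and h94 as follows. For Ubuntu/Debian: For CentOS: 1.3. Installing from a package Download from the official website. Unpack: Install as a regular user: 1.4. Installing from source 1.4.1. Clone the source 1.4.2. Build and install If some python packages (lockfile, paramiko, PSI, etc.) cannot be found during installation, you can refer to HAW ...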

2022-07-09 08:01:18 831

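[Original] Practical Tools | Usage Examples for pgbench, the PostgreSQL Stress-Testing Tool

PostgreSQL provides a lightweight stress-testing tool called […], which is really just an extension executable once compiled. Test environment: Database parameters: Enter the source package, then build and install: After installation, the newly generated pgbench file appears under the bin folder: Parameter introduction Meaning of some parameters: Initialize test data Initialize the data: View the table data: View the table structure: Notes: 2. 50 sessions 3. 100 sessions More than 100 causes an error, because the database's maximum number of sessions is currently set to 100. Reference: http://www.postgresql.org/docs/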

2022-07-09 07:51:45 2682

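[Original] Feature Analysis | PostgreSQL Primary/Standby Streaming Replication Mechanism

PostgreSQL introduced primary/standby streaming replication in 9.0. With streaming replication, the standby continuously synchronizes data from the primary and applies each […] on the standby; the unit of transfer here is a WAL record. Before 9.0, PostgreSQL shipped a WAL file to the standby only after the primary had finished writing it, which caused very large primary/standby lag. In addition, since 9.0 PostgreSQL has provided […], so the standby can serve read-only queries while applying changes, greatly improving the user experience. The core of PostgreSQL primary/standby streaming replication consists of […], […], and […], three ...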

2022-07-09 07:48:07 594

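[Original] Feature Introduction | PostgreSQL Dependency Constraints in Detail - System Catalogs pg_depend & pg_constraint

This article was written some time ago and is based on PostgreSQL 9.3; later versions may be incompatible, but the core principles are the same and can serve as a reference. pg_depend is a postgres system catalog that records dependencies between database objects. Besides the usual primary and foreign keys, other internal dependencies can also be seen through this catalog. Since version 9.1 the […] field has gained an extension type; the current types are: SQL for querying dependencies The following SQL lists the dependencies of system and user objects: Example Create a table: Run the dependency query SQL: Add a primary key constraint: Normal ...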

2022-07-09 07:45:01 1817 1

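[Original] Programmer's Life | How a C Compiler Allocates Memory

The memory used by a program compiled from C/C++ is divided into the following parts: For example: variables defined inside a function body are usually on the stack, while memory obtained from allocation functions such as malloc, calloc, and realloc is on the heap. Variables defined outside all function bodies are globals; with the static modifier, a variable is stored in the global (static) area no matter where it is defined. A static variable defined outside all function bodies is valid only within that file and cannot be accessed from other files via extern; a static variable defined inside a function body is valid only within that function. String literals such as "123456" in a function are stored in the constant area. In addition, a function call involves a series of operations on the stack to preserve ...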

2022-07-09 07:42:35 1239

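[Repost] Recommended | How StoneDB, an Integrated Real-Time HTAP Database, Replaces MySQL and Achieves Nearly 100x Performance Improvement

This article was first published on the "CSDN WeChat official account" by 高日耀, a friend from the database community. In recent years, Chinese databases built on PostgreSQL have sprung up like mushrooms after rain, whereas MySQL's GPL Licence (if interested, see my post "Tech Sharing | How to Choose a Suitable Open-Source License for Your Code") requires derivative work to be open-sourced, so there are relatively few Chinese databases built on MySQL (for example, 万里开源's GreatSQL). So when I heard that StoneDB had been open-sourced, I was personally quite excited. Without further ado, the article follows. As is well known, MySQL is the world's most popular OLTP ...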

2022-07-09 00:01:16 649

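[Original] Programmer's Life | UNIX Network Programming: The getaddrinfo Function Explained with Usage Examples

In IPv4, the […] function resolves host names to addresses. It supports only IPv4, does not let the caller specify any information about the desired address type, and the structure it returns only has room for IPv4 addresses. IPv6 introduced the new […] API, which is protocol independent and works for both IPv4 and IPv6. The […] function handles both name-to-address and service-to-port translation, and returns a pointer to a list of […] structures rather than a list of addresses. These structures can then be used directly by the socket functions. The […] function hides the protocol dependence safely inside the library function. The application only has to deal with the socket address structures filled in by getaddrinfo. This function is in P ...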

2022-07-08 12:13:27 1110
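
The protocol-independent resolution described in the entry above can be tried from Python, whose standard `socket.getaddrinfo` wraps the same C library call. The sketch below is illustrative rather than taken from the article; the helper names `resolve` and `connect_first` are hypothetical.

```python
import socket


def resolve(host, port):
    """Return (address family, socket address) pairs for a TCP endpoint."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); the same call
    # can yield IPv4 and IPv6 entries, so the caller stays protocol independent.
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def connect_first(host, port, timeout=3.0):
    """Try each returned address in order and return the first connected socket."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = None
        try:
            # The returned triple feeds straight into socket(), as in C.
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("getaddrinfo returned no addresses")
```

Calling `resolve("127.0.0.1", 80)` needs no DNS lookup because the host is already numeric; with a host name the result may contain several entries, which is why `connect_first` loops instead of using only the first one.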

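[Original] Programmer's Life | A Linux Daemon Program Design Example

A daemon program, also called a daemon process, usually runs in the system background for long periods and, having no controlling terminal, cannot interact with the foreground. Daemons are typically used as system services; Linux systems run many such daemons, for example iptables, nfs, ypbind, and dhcpd. Daemon management script: a daemon can be managed with the service tool, including start, stop, and status, but this requires writing a simple shell script like the following: Daemon commands: as the script above shows, the daemon supports the commands start|stop|re ...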

2022-07-08 12:08:23 273

DB - Challenges and Solutions for Fast Remote Memory Access.pdf

Non-volatile main memory DIMMs (NVMMs), such as Intel’s Optane DC Persistent Memory modules, provide data durability with orders of magnitude higher performance than prior durable technologies. This paper explores the unique challenges that arise when building high-performance networked systems for NVMM. Compared to DRAM, we find that NVMMs have distinctive fundamental properties that pose unique challenges for networked access

2022-07-09

DB - The Database State Machine Approach.pdf

Database replication protocols have historically been built on top of distributed database systems, and have consequently been designed and implemented using distributed transactional mechanisms, such as atomic commitment. We present the Database State Machine approach, a new way to deal with database replication in a cluster of servers. This approach relies on a powerful atomic broadcast primitive to propagate transactions between database servers, and alleviates the need for atomic commitment.

2022-07-08

DB - Scalable Replay-Based Replication For Fast Databases.pd

Primary-backup replication is commonly used for providing fault tolerance in databases. It is performed by replaying the database recovery log on a backup server. Such a scheme raises several challenges for modern, high-throughput multicore databases. It is hard to replay the recovery log concurrently, and so the backup can become the bottleneck. Moreover, with the high transaction rates on the primary, the log transfer can cause network bottlenecks. Both these bottlenecks can significantly slow

2022-07-08

DB - Middleware-based Database Replication.pdf

The need for high availability and performance in data management systems has been fueling a long running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. This has created over time a gap between academic research and industria

2022-07-08

DB - Practical lock-freedom.pdf

Mutual-exclusion locks are currently the most popular mechanism for interprocess synchronisation, largely due to their apparent simplicity and ease of implementation. In the parallel-computing environments that are increasingly commonplace in high-performance applications, this simplicity is deceptive: mutual exclusion does not scale well with large numbers of locks and many concurrent threads of execution.

2022-07-08

DB - A String Adaptive Hash Table for Analytical Databases.pdf

Hash tables are the fundamental data structure for analytical database workloads, such as aggregation, joining, set filtering and records deduplication. The performance aspects of hash tables differ drastically with respect to what kind of data are being processed or how many inserts, lookups and deletes are constructed. In this paper, we address some common use cases of hash tables: aggregating and joining over arbitrary string data. We designed a new hash table, SAHA, which is tightly integrate

2022-07-09

DB - Failure Trends in a Large Disk Drive Population.pdf

It is estimated that over 90% of all new information produced in the world is being stored on magnetic media, most of it on hard disk drives. Despite their importance, there is relatively little published work on the failure patterns of disk drives, and the key factors that affect their lifetime. Most available data are either based on extrapolation from accelerated aging experiments or from relatively modest sized field studies. Moreover, larger population studies rarely have the infrastructure

2022-07-09

DB - Spanner - Google’s Globally-Distributed Database

Spanner is Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful ...

2022-07-11

DB - Access Path Selection in a Relational DMS

DB - Access Path Selection in a Relational Database Management System.pdf In a high level query and data manipulation language such as SQL, requests are stated non-procedurally, without reference to access paths. This paper describes how System R chooses access paths for both simple (single relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates. System R is an experimental database management system developed to carry ..

2022-07-12

DB - Spanner, TrueTime and The CAP Theorem.pdf

Spanner is Google’s highly available global SQL database [CDE+12]. It manages replicated data at great scale, both in terms of size of data and volume of transactions. It assigns globally consistent real-time timestamps to every datum written to it, and clients can do globally consistent reads across the entire database without locking. This leads to three kinds of systems: CA, CP and AP, based on what letter you leave out. Note that you are not entitled to 2 of 3, and many systems have ...

2022-07-11

DB - Dynamic Programming Strikes Back - Hypergraph.pdf

Two highly efficient algorithms are known for optimally ordering joins while avoiding cross products: DPccp, which is based on dynamic programming, and Top-Down Partition Search, based on memoization. Both have two severe limitations: They handle only (1) simple (binary) join predicates and (2) inner joins. However, real queries may contain complex join predicates, involving more than two relations, and outer joins as well as other non-inner joins.

2022-07-12

DB - The Volcano Optimizer Generator - Extensibility and ...

DB - The Volcano Optimizer Generator - Extensibility and Efficient Search.pdf Emerging database application domains demand not only new functionality but also high performance. To satisfy these two requirements, the Volcano project provides efficient, extensible tools for query and request processing, particularly for object-oriented and scientific database systems. One of these tools is a new optimizer generator. Data model, logical algebra, physical algebra, and optimization rules are ...

2022-07-12

DB - The Cascades Framework for Query Optimization.pdf

This doc describes a new extensible query optimization framework that resolves many of the shortcomings of the EXODUS and Volcano optimizer generators. In addition to extensibility, dynamic programming, and memorization based on and extended from the EXODUS and Volcano prototypes, this new optimizer provides (i) manipulation of operator arguments using rules or functions, (ii) operators that are both logical and physical for predicates etc., (iii) schema-specific rules for materialized views,

2022-07-12

DB - The End of a Myth Distributed Transactions Can Scale.pdf

The common wisdom is that distributed transactions do not scale. But what if distributed transactions could be made scalable using the next generation of networks and a redesign of distributed databases? There would be no need for developers anymore to worry about co-partitioning schemes to achieve decent performance. Application development would become easier as data placement would no longer determine how scalable an application is. Hardware provisioning would be simplified as the system ...

2022-07-11

DB - F1 Lightning- HTAP as a Service.pdf

The ongoing and increasing interest in HTAP (Hybrid Transactional and Analytical Processing) systems documents the intense interest from data owners in simultaneously running transactional and analytical workloads over the same data set. Much of the reported work on HTAP has arisen in the context of “greenfield” systems, answering the question “if we could design a system for HTAP from scratch, what would it look like?” While there is great merit in such an approach, and a lot of valuable ...

2022-07-11

ARIES - A Transaction Recovery Method Supporting xxxx

ARIES - A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging.pdf ARIES (Algorithm for Recovery and Isolation Exploiting Semantics), which supports partial rollbacks of transactions, fine-granularity (e.g., record) locking and recovery using write-ahead logging (WAL). We introduce the paradigm of repeating history to redo all missing updates before performing the rollbacks of the loser transactions during restart after a system failure.

2022-07-11

DB - A Critique of ANSI SQL Isolation Levels

DB - A Critique of ANSI SQL Isolation Levels.pdf ANSI SQL-92 [MS, ANSI] defines Isolation Levels in terms of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantoms. This document shows that these phenomena and the ANSI SQL definitions fail to characterize several popular isolation levels, including the standard locking implementations of the levels. Investigating the ambiguities of the phenomena leads to clearer definitions.

2022-07-11

DB - Just Say NO to Paxos Overhead- Replacing Consensus xxx.pdf

DB - Just Say NO to Paxos Overhead- Replacing Consensus with Network Ordering.pdf Distributed applications use replication, implemented by protocols like Paxos, to ensure data availability and transparently mask server failures. This paper presents a new approach to achieving replication in the data center without the performance cost of traditional methods. Our work carefully divides replication responsibility between the network and protocol layers.

2022-07-10

DB - Paxos Replicated State Machines as the Basis of a High

DB - Paxos Replicated State Machines as the Basis of a High-Performance.pdf Conventional wisdom holds that Paxos is too expensive to use for high-volume, high-throughput, data-intensive applications. Consequently, fault-tolerant storage systems typically rely on special hardware, semantics weaker than sequential consistency, a limited update interface (such as append-only), primary-backup replication schemes that serialize all reads through the primary, clock synchronization for correctness

2022-07-10

DB - Towards Low Latency State Machine Replication for Wide-area

DB - Towards Low Latency State Machine Replication for Uncivil Wide-area Networks.pdf We consider the problem of building state machines in a multi-site environment in which there is lack of trust between sites, but not within a site. This system model recognizes the fact that if a server is attacked, then there are larger issues at play than simply masking the failure of the server. We describe the design principles of a low-latency Byzantine state machine protocol, called RAM

2022-07-10

DB - Unbounded Pipelining in Dynamically Paxos Clusters.pdf

DB - Unbounded Pipelining in Dynamically Reconfigurable Paxos Clusters.pdf Consensus is an essential ingredient of a fault-tolerant distributed system. When equipped with a consensus algorithm a distributed system can act as a replicated state machine (RSM), duplicating its state across a cluster of redundant components to avoid the failure of any single component leading to a system-wide failure. Paxos and Raft are examples of algorithms for achieving distributed consensus.

2022-07-10

DB - Consistency Tradeoffs in Modern Distributed Database System

DB - Consistency Tradeoffs in Modern Distributed Database System Design The CAP theorem’s impact on modern distributed database system design is more limited than is often perceived. Another tradeoff—between consistency and latency —has had a more direct influence on several well-known DDBSs. A proposed new formulation, PACELC, unifies this tradeoff with CAP.

2022-07-09

DB - Designing Distributed Systems Using Approximate Synchrony

Title : Designing Distributed Systems Using Approximate Synchrony in Data Center Networks.pdf Distributed systems are traditionally designed independently from the underlying network, making worst-case assumptions (e.g., complete asynchrony) about its behavior. However, many of today’s distributed applications are deployed in data centers, where the network is more reliable, predictable, and extensible. In these environments, it is possible to co-design distributed systems with their network

2022-07-09

DB - State Machine Replication is More Expensive than Consensus

State Machine Replication is More Expensive than Consensus.pdf Consensus and State Machine Replication (SMR) are generally considered to be equivalent problems. In certain system models, indeed, the two problems are computationally equivalent: any solution to the former problem leads to a solution to the latter, and vice versa. In this paper, we study the relation between consensus and SMR from a complexity perspective.

2022-07-10
