原文:Pipelining versus Parallel Query Execution with MySQL 5.7 X Plugin
作者: Alexander Rubin
译者:仲培艺,关注数据库领域,纠错、寻求报道或者投稿请致邮:zhongpy@csdn.net。
本文将以MySQL 5.7 X Plugin为例,对比分析流水线(pipelining)和并行查询技术。
另一篇博文《MySQL 5.7 X Plugin支持异步查询》(Asynchronous Query Execution with MySQL 5.7 X Plugin),介绍了运行MySQL 5.7 X Plugin的方法:
- Hash分区
- 开放MySQL的CPU内核数连接
由于5.7 X Plugin只支持流水线技术(缩短往返延时),且不支持MySQL连接复用(MySQL在执行单项查询时,不启动CPU多核),所以需要手动操作(包括最终的结果排序)。
TL:DR; 版本
本文将分析MySQL 5.7 X Plugin、X协议和文档存储,并加以总结:
- 5.7 X Plugin不支持MySQL连接/会话复用。类似初始协议,每条X插件的连接都会打开一次MySQL会话;
- 一条5.7 X Plugin指令(在库支持的情况下)会立刻返回,且查询终止之前,仍可继续操作(异步调用)。MySQL以队列形式运行;
- 5.7 X Plugin没有附加的服务器级持久性配置。若不进行核查或等待服务器确认(异步),则不能确保数据写入MySQL(“fire and forget”模式)。
同时,X协议适用于以下情况:
- 在MySQL表锁定时,想要实现一个异步客户端(不希望限制网络通信,如下载或API调用);
- 想要使用MySQL队列,缩短往返延时。
Benchmark评价结果:流水线vs.并行vs.单项查询
多组测试对比流水线、并行处理和单项查询,结论如下:
1. 并行查询支持NodeJS:
$ time node async_wikistats.js
...
All done! Total: 17753
...
real 0m30.668s
user 0m0.256s
sys 0m0.028s
2. 流水线技术支持NojeJS:
$ time node async_wikistats_pipeline.js
...
All done! Total: 17753
...
real 5m39.666s
user 0m0.212s
sys 0m0.024s
测试中针对NojeJS环境下运行的流水线,复用相同连接(并且不对线程开放新连接)。
3. 直接查询——分区表:
mysql> select sum(tot_visits) from wikistats.wikistats_by_day_spark_part where url like ‘%postgresql%’;
+-----------------+
| sum(tot_visits) |
+-----------------+
| 17753 |
+-----------------+
1 row in set (5 min 31.44 sec)
4. 直接查询——非分区表
mysql> select sum(tot_visits) from wikistats.wikistats_by_day_spark where url like ‘%postgresql%’;
+-----------------+
| sum(tot_visits) |
+-----------------+
| 17753 |
+-----------------+
1 row in set (4 min 38.16 sec)
5.7 X Plugin流水线技术的优点:
尽管5.7 X Plugin的流水线技术没有增加太多查询响应时间(可以降低总延迟),其仍适用于某些场合,例如从网上下载东西的时候,需要缩减下载进程和文档元数据,这时就可使用youtube-dl进行搜索并下载YouTube的视频元数据。然后,将原数据JSON存入MySQL 5.7文本库。代码如下:
var mysqlx = require('mysqlx');
# This is the same as running $ youtube-dl -j -i ytsearch100:"mysql 5.7"
const spawn = require('child_process').spawn;
const yt = spawn('youtube-dl', ['-j', '-i', 'ytsearch100:"mysql 5.7"'], {maxBuffer: 1024 * 1024 * 128});
var mySession =
mysqlx.getSession({
host: 'localhost',
port: 33060,
dbUser: 'root',
dbPassword: '<your password>'
});
yt.stdout.on('data', (data) => {
try {
dataObj = JSON.parse(data);
console.log(dataObj.fulltitle);
mySession.then(session => {
session.getSchema("yt").getCollection("youtube").add( dataObj )
.execute(function (row) {
}).catch(err => {
console.log(err);
})
.then( function (notices) { console.log("Wrote to MySQL: " + JSON.stringify(notices)) });
}).catch(function (err) {
console.log(err);
process.exit();
});
} catch (e) {
console.log(" --- Can't parse json" + e );
}
});
yt.stderr.on('data', (data) => {
console.log("Error receiving data");
});
yt.on('close', (code) => {
console.log(`child process exited with code ${code}`);
mySession.then(session => {session.close() } );
});
如上例所示,执行二进制youtube-dl(需要提前安装)来搜索MySQL 5.7相关视频。无需下载视频,只需获取JSON格式的元数据(“-j”flag)。因为是JSON格式,所以可以保存至MySQL文本库。表架构如下:
CREATE TABLE `youtube` (
`doc` json DEFAULT NULL,
`_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
UNIQUE KEY `_id` (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
执行案例:
$ node yt.js
What's New in MySQL 5.7
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["3f312c3b-b2f3-55e8-0ee9-b706eddf"]}}
MySQL 5.7: MySQL JSON data type example
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["88223742-9875-59f1-f535-f1cfb936"]}}
MySQL Performance Tuning: Part 1. Configuration (Covers MySQL 5.7)
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["c377e051-37e6-8a63-bec7-1b81c6d6"]}}
Dave Stokes — MySQL 5.7 - New Features and Things That Will Break — php[world] 2014
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["96ae0dd8-9f7d-c08a-bbef-1a256b11"]}}
MySQL 5.7 & JSON: New Opportunities for Developers - Thomas Ulin - Forum PHP 2015
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["ccb5c53e-561c-2ed5-6deb-1b325739"]}}
Cara Instal MySQL 5.7.10 NoInstaller pada Windows Manual Part3
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["95efbd79-8d79-e7b6-a535-271640c8"]}}
MySQL 5.7 Install and Configuration on Ubuntu 14.04
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["b8cfe132-aca4-1eba-c2ae-69e48db8"]}}
有趣的是:当NodeJS + X Plugin = Asynchronous + Pipelining时,一旦表被锁定,程序操作不会停止。现已打开两个会话:
- session 1:
$ node yt.js > test_lock_table.log
- session 2:
mysql> lock table youtube read; select sleep(10); unlock tables;
Query OK, 0 rows affected (0.00 sec)
+-----------+
| sleep(10) |
+-----------+
| 0 |
+-----------+
1 row in set (10.01 sec)
Query OK, 0 rows affected (0.00 sec)
结果:
...
Upgrade MySQL Server from 5.5 to 5.7
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["d4d62a8a-fbfa-05ab-2110-2fd5cf6d"]}}
OSC15 - Georgi Kodinov - Secure Deployment Changes Coming in MySQL 5.7
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["8ac1cdb9-1499-544c-da2a-5db1ccf5"]}}
MySQL 5.7: Create JSON string using mysql
FreeBSD 10.3 - Instalación de MySQL 5.7 desde Código Fuente - Source Code
Webinar replay: How To Upgrade to MySQL 5.7 - The Best Practices - part 1
How to install MySQL Server on Mac OS X Yosemite - ltamTube
Webinar replay: How To Upgrade to MySQL 5.7 - The Best Practices - part 4
COMO INSTALAR MYSQL VERSION 5.7.13
MySQL and JSON
MySQL 5.7: Merge JSON data using MySQL
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["a11ff369-6f23-11e9-187b-e3713e6e"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["06143a61-4add-79da-0e1d-c2b52cf6"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["1eb94ef4-db63-cb75-767e-e1555549"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e25f15b5-8c19-9531-ed69-7b46807a"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["02b5a4c9-6a21-f263-90d5-cd761906"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e0bef958-10af-b181-81cd-5debaaa0"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["f48fa635-fa63-7481-0668-addabbac"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["557fa5c5-3c8a-fe01-c17c-549c557e"]}}
MySQL 5.7 Install and Configuration on Ubuntu 14.04
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["456b11d8-ba03-0aec-8e06-9517c6e1"]}}
MySQL WorkBench 6.3 installation on Ubuntu 14.04
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["0b651987-9b23-b5e0-f8f7-49b8ba5c"]}}
Going through era of IoT with MySQL 5.7 - FOSSASIA 2016
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e133746c-836c-a7e0-3893-292a7429"]}}
MySQL 5.7: MySQL JSON operator example
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["4d13830d-7b30-5b31-d068-c7305e0a"]}}
由此可见,前两项是即时写入。随后,表被锁定,且无MySQL查询。同时,下载进程(此处为速度最慢的环节)继续进行,没有锁定(上述titles没有附加lines“… => wrote to MySQL:”)。表解锁后,大量等待查询通过。
这有助于下载进程的运行,而网络则成了瓶颈。传统同步查询中,表锁定则应用程序锁定(包括网络通信)。但如果有NodeJS和5.7 X Plugin,则MySQL做队列运行,下载环节继续。
流水线持久性
为了研究流水线的持久性,或者说连接中断后会发生的事,做了一些测试。这一次,在开始NodeJS之前锁定表格,切断连接,最后再解锁表格。结果如下:
Session 1:
----------
mysql> truncate table youtube_new;
Query OK, 0 rows affected (0.25 sec)
mysql> lock table youtube_new read;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
Session 2:
----------
(when table is locked)
$ node yt1.js
11 03 MyISAM
Switching to InnoDB from MyISAM
tablas InnoDB a MyISAM
MongoDB vs MyISAM (MariaDB/MySQL)
MySQL Tutorial 35 - Foreign Key Constraints for the InnoDB Storage Engine
phpmyadmin foreign keys myisam innodb
Convert or change database manual from Myisam to Innodb
... >100 other results omited ...
^C
Session 1:
----------
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
Id: 4916
User: root
Host: localhost:33221
db: NULL
Command: Query
Time: 28
State: Waiting for table metadata lock
Info: PLUGIN: INSERT INTO `iot`.`youtube_new` (doc) VALUES ('{"upload_date":"20140319","protocol":"
mysql> unlock table;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
| 2 |
+----------+
1 row in set (0.00 sec)
mysql> select json_unquote(doc->'$.title') from youtube_new;
+---------------------------------+
| json_unquote(doc->'$.title') |
+---------------------------------+
| 11 03 MyISAM |
| Switching to InnoDB from MyISAM |
+---------------------------------+
2 rows in set (0.00 sec)
注意:上例中,没有出现MySQL服务器发出的确认字符。代码接收MySQL回复,列出指令“Wrote to MySQL: {“_state”:{“rows_affected”:1,”doc_ids”:[“…”]}}”
。此外,切断连接后,MySQL进程还在表锁等待。
有趣的是此处只有两行被插入文本库。X协议的作者Jan Kneschke被问及此处是否有“history length”或是其它可扩展的缓存(buffer)时,回答如下:
- Q:是否有history length或缓存,而且我们能够加以调控吗?
- A:此处完全不存在“history”或“缓存”,且其都处于连接器层面。
- Q:为什么最终插入两行?
- A:对于这个问题,我已经连接了tcpdump与port 33060(X协议),如下:
这条信息非常重要,要记住异步流水线的配置不是一尘不变的:如果应用程序出错且存在pending writes,则writes可能丢失(或被写入)。
为了充分了解该协议的运行方式,截获tcpdump(并在Jan Kneschke的帮助下进行分析):
tcpdump -i lo -s0 -w tests/node-js-pipelining.pcap "tcp port 33060"
(tcpdump可视化更新如下)
- 点击CTRL+C,NodeJS关闭连接。由于表仍锁定,MySQL不能写,且无法返回插入结果。
- 表解锁后,尽管连接已锁定,仍可开始第一个statement。随后确认第一项插入,并开始下一项。
- 然而,此时脚本(客户端)已经关闭连接,且最后的数据包(此处是id)被否定。5.7 X Plugin由此发现客户端关闭连接并停止执行流水线。
实际上,这与初始MySQL协议的运行方式非常相像。如果切断脚本/应用程序,MySQL连接不会自动中断(除非在MySQL客户端点击CTRL+C,发出切断信号),且该连接会等待表解锁。表解锁后,它会插入一个文件中的第一个statement。
Session 1
---------
mysql> select * from t_sql;
Empty set (0.00 sec)
mysql> lock table t_sql read;
Query OK, 0 rows affected (0.00 sec)
Session 2:
----------
$ mysql iot < t.sql
$ kill -9 ...
[3] Killed mysql iot < t.sql
Session 1:
----------
mysql> show processlist;
+------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+
| 4913 | root | localhost | iot | Query | 41 | Waiting for table metadata lock | insert into t_sql values('{"test_field":0}') |
+------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+
4 rows in set (0.00 sec)
mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from t_sql;
+-------------------+
| doc |
+-------------------+
| {"test_field": 0} |
+-------------------+
1 row in set (0.00 sec)
实施unique check
若重启脚本,会再次找出相同的视频,故而可能需要提升数据一致性。默认条件下,插件生成文档唯一键,从而阻止副本插入。
另一条执行unique check的途径是为youtube id创建唯一键。更新表结构如下:
CREATE TABLE `youtube` (
`doc` json DEFAULT NULL,
`youtube_id` varchar(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.id'))) STORED NOT NULL,
UNIQUE KEY `youtube_id` (`youtube_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
将默认“id”列改为YouTube唯一ID。重启脚本,显示如下:
MySQL 5.7: Merge JSON data using MySQL
{ [Error: Document contains a field value that is not unique but required to be]
info:
{ severity: 0,
code: 5116,
msg: 'Document contains a field value that is not unique but required to be',
sql_state: 'HY000' } }
... => wrote to MySQL: undefined
……文档已加载完毕。
结论
尽管X插件流水线不一定能显著提升查询回复(可能缩短往返延时),但仍然能有助于一些应用程序的运行。例如当MySQL表被锁定时,可能不希望网络通信受阻(例如下载或API调用)。同时,除非查询/等待服务器发出的确认字符,否则数据不一定会写入MySQL。
数据分析
下载结果如下:
"is_live": null,
"license": "Standard YouTube License",
"duration": 2965,
"end_time": null,
"playlist": ""mysql 5.7"",
"protocol": "https",
"uploader": "YUI Library",
"_filename": "Douglas Crockford - The JSON Saga--C-JoyNuQJs.mp4",
"age_limit": 0,
"alt_title": null,
"extractor": "youtube",
"format_id": "18",
"fulltitle": "Douglas Crockford: The JSON Saga",
"n_entries": 571,
"subtitles": {},
"thumbnail": "https://i.ytimg.com/vi/-C-JoyNuQJs/hqdefault.jpg",
"categories": ["Science & Technology"],
"display_id": "-C-JoyNuQJs",
"like_count": 251,
"player_url": null,
"resolution": "640x360",
"start_time": null,
"thumbnails": [{
"id": "0",
"url": "https://i.ytimg.com/vi/-C-JoyNuQJs/hqdefault.jpg"
}],
"view_count": 36538,
"annotations": null,
"description": "Yahoo! JavaScript architect Douglas Crockford tells the story of how JSON was discovered and how it became a major standard for describing data.",
"format_note": "medium",
"playlist_id": ""mysql 5.7"",
"upload_date": "20110828",
"uploader_id": "yuilibrary",
"webpage_url": "https://www.youtube.com/watch?v=-C-JoyNuQJs",
"uploader_url": "http://www.youtube.com/user/yuilibrary",
"dislike_count": 5,
"extractor_key": "Youtube",
"average_rating": 4.921875,
"playlist_index": 223,
"playlist_title": null,
"automatic_captions": {},
"requested_subtitles": null,
"webpage_url_basename": "-C-JoyNuQJs"
为了看到最流行的视频,在view_count基础上又添加了一个virtual field,并据此构建了一个索引。
CREATE TABLE `youtube` (
`doc` json DEFAULT NULL,
`youtube_id` varchar(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.id'))) STORED NOT NULL,
`view_count` int(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.view_count'))) VIRTUAL,
UNIQUE KEY `youtube_id` (`youtube_id`),
KEY `view_count` (`view_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
查询运行如下:
mysql> select json_unquote(doc->'$.title'),
-> view_count,
-> json_unquote(doc->'$.dislike_count') as dislikes
-> from youtube
-> order by view_count desc
-> limit 10;
+----------------------------------------------------------------------------------------------------+------------+----------+
| json_unquote(doc->'$.title') | view_count | dislikes |
+----------------------------------------------------------------------------------------------------+------------+----------+
| Beginners MYSQL Database Tutorial 1 # Download , Install MYSQL and first SQL query | 664153 | 106 |
| MySQL Tutorial | 533983 | 108 |
| PHP and MYSQL - Connecting to a Database and Adding Data | 377006 | 50 |
| PHP MySQL Tutorial | 197984 | 41 |
| Installing MySQL (Windows 7) | 196712 | 28 |
| Understanding PHP, MySQL, HTML and CSS and their Roles in Web Development - CodersCult Webinar 001 | 195464 | 24 |
| jQuery Ajax Tutorial #1 - Using AJAX & API's (jQuery Tutorial #7) | 179198 | 25 |
| How To Root Lenovo A6000 | 165221 | 40 |
| MySQL Tutorial 1 - What is MySQL | 165042 | 45 |
| How to Send Email in Blackboard Learn | 144948 | 28 |
+----------------------------------------------------------------------------------------------------+------------+----------+
10 rows in set (0.00 sec)
为了找出最受欢迎的决议,如下:
mysql> select count(*) as cnt,
-> sum(view_count) as sum_views,
-> json_unquote(doc->'$.resolution') as resolution
-> from youtube
-> group by resolution
-> order by cnt desc, sum_views desc
-> limit 10;
+-----+-----------+------------+
| cnt | sum_views | resolution |
+-----+-----------+------------+
| 273 | 3121447 | 1280x720 |
| 80 | 1195865 | 640x360 |
| 18 | 33958 | 1278x720 |
| 15 | 18560 | 1152x720 |
| 11 | 14800 | 960x720 |
| 5 | 6725 | 1276x720 |
| 4 | 18562 | 1280x682 |
| 4 | 1581 | 1280x616 |
| 4 | 348 | 1280x612 |
| 3 | 2024 | 1200x720 |
+-----+-----------+------------+
10 rows in set (0.02 sec)
在此特别感谢Oracle的Jan Kneschke和Morgan Tocker在X协议内部构件方面提供的帮助。
更新:Jan Kneschke也生成了tcpdump可视化(连接切断时):
相关阅读
Asynchronous Query Execution with MySQL 5.7 X Plugin
Looking inside the MySQL 5.7 document store
MySQL 5.7 key features