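esrally --offline --pipeline=from-distribution --distribution-version=5.2.0 --track=geonames

Understanding the parameters:

Pipeline
A pipeline is a series of steps that are performed to get benchmark results. This is not intended to customize the actual benchmark but rather what happens before and after a benchmark.
    from-distribution: Rally starts an ES cluster itself (here version 5.2.0, given by --distribution-version) and benchmarks it.
    benchmark-only: Rally benchmarks an existing, externally managed ES cluster.

Track
A track is the description of one or more benchmarking scenarios with a specific document corpus. It defines for example the involved indices, data files and which operations are invoked. In other words, the track selects the document corpus (the test data source) to use.

Challenge
The challenges section contains a list of challenges which describe the benchmark scenarios for this data set. It can reference all operations that are defined in the operations section.

Data source:
    /home/admin/.rally/benchmarks/data/geonames/documents.json

How to run a benchmark:

Benchmark a local ES cluster:
    esrally --offline --pipeline=benchmark-only --target-hosts=127.0.0.1:9200 --track=tiny --challenge=append-fast-no-conflicts
    esrally --pipeline=benchmark-only --target-hosts=127.0.0.1:9200

Load test a remote ES cluster:
    esrally --pipeline=benchmark-only --target-hosts=192.168.1.1:9200,192.168.1.2:9200

Adjusting the benchmark tasks:
If you only care about part of the performance picture, for example indexing but not search, you can edit the task definitions of the track yourself.
The track definition file is ~/.rally/benchmarks/tracks/default/geonames/track.json.
Rather than editing it in place, it is better to create a new track directory, for example mytest/track.json.
Going by the geonames definition, a track consists of the following parts:
    meta: defines the URL of the data source.
    indices: defines the index name, the location of the index mapping file, the location of the data, and the values used to verify it (document count, compressed and uncompressed size).
    operations: defines each operation's name, type, index and request parameters. For an operation of type index, the available indexing parameters include the number of concurrent clients, the bulk size and whether to force a merge. For an operation of type search, the request parameter is a queries array holding the query DSL bodies in the order they should run.
    challenges: defines the challenge name, which operations it invokes, and in what order.

Custom dataset:
    {
      "meta": {
        "data-url": "/Users/raochenlin/.rally/benchmarks/data/splunklog/1468766825_10.json.bz2"
      },
      "indices": [
        {
          "name": "splunklog",
          "types": [
            {
              "name": "type",
              "mapping": "mappings.json",
              "documents": "1468766825_10.json.bz2",
              "document-count": 924645,
              "compressed-bytes": 19149532,
              "uncompressed-bytes": 938012996
            }
          ]
        }
      ]
    }

Preparing a custom test data source: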
Download the data source from http://download.geonames.org/export/dump/allCountries.zip and unzip it to obtain allCountries.txt.

toJSON.py:
    import json

    # (column name, type) for the leading columns of allCountries.txt, in file order
    cols = (('geonameid', 'int'),
            ('name', 'string'),
            ('asciiname', 'string'),
            ('alternatenames', 'string'),
            ('latitude', 'double'),
            ('longitude', 'double'),
            ('feature_class', 'string'),
            ('feature_code', 'string'),
            ('country_code', 'string'),
            ('cc2', 'string'),
            ('admin1_code', 'string'),
            ('admin2_code', 'string'),
            ('admin3_code', 'string'),
            ('admin4_code', 'string'),
            ('population', 'long'),
            ('elevation', 'int'),
            ('dem', 'string'),
            ('timezone', 'string'))

    with open('allCountries.txt', encoding='utf-8') as f:
        for line in f:
            # Strip only the newline so that empty trailing columns are preserved
            tup = line.rstrip('\n').split('\t')
            d = {}
            # zip() stops at the shorter sequence, so a short row cannot raise IndexError
            for (name, col_type), value in zip(cols, tup):
                if value != '':
                    if col_type in ('int', 'long'):
                        d[name] = int(value)
                    elif col_type == 'double':
                        d[name] = float(value)
                    else:
                        d[name] = value
            # One JSON document per output line
            print(json.dumps(d))

Convert the file to JSON, then compress it:
    python3 toJSON.py > documents.json
    bzip2 -9 -c documents.json > documents.json.bz2

References:
    Rally documentation: http://esrally.readthedocs.io/en/latest/index.html
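
Computing the track metadata:
The document-count, compressed-bytes and uncompressed-bytes fields of a custom track definition have to describe the actual corpus files. Below is a minimal sketch of how those three values could be derived from the compressed file produced by the bzip2 step above. It assumes one JSON document per line, as toJSON.py writes them. corpus_stats is a hypothetical helper name used for illustration only; it is not part of Rally.

```python
import bz2
import json
import os
import sys


def corpus_stats(archive_path):
    """Return the size fields for a bz2-compressed corpus.

    Hypothetical helper, not part of Rally. Assumes one JSON document
    per line, which is what toJSON.py produces.
    """
    doc_count = 0
    uncompressed_bytes = 0
    # Stream the archive so the whole corpus never has to fit in memory.
    with bz2.open(archive_path, "rb") as f:
        for line in f:
            uncompressed_bytes += len(line)
            # Count only non-blank lines as documents.
            if line.strip():
                doc_count += 1
    return {
        "documents": os.path.basename(archive_path),
        "document-count": doc_count,
        "compressed-bytes": os.path.getsize(archive_path),
        "uncompressed-bytes": uncompressed_bytes,
    }


# Print the fields as JSON when an archive path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(json.dumps(corpus_stats(sys.argv[1]), indent=2))
```

Saved under an arbitrary name such as stats.py, it could be run as python3 stats.py documents.json.bz2, and the printed values copied into the matching entry under "types" in track.json. Reading the archive as a stream keeps memory use flat even for a corpus the size of geonames.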