Splunk Test Report

wf1982

Article link: https://blog.csdn.net/wf1982/article/details/7210326

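Splunk Usage Test Report

I. Technical Components and Principles

1. Indexer: indexes local or remote log data.

How it works:

The indexer can index log data in any format, as long as the data carries timestamps. Indexing breaks the data into events based on those timestamps, and each event has a timestamp, host, source, and source type. Usually one log line is one event, but some inputs, such as XML logs, have events that span multiple lines. When a user runs a search, these events are what Splunk finds and returns.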

Events：A single piece of data in Splunk, similar to a record in a log file or other data input. When Splunk eats data, it breaks the data up into individual pieces and gives each piece a timestamp, host, source, and source type. Often, a single event corresponds nicely to a single line in your inputs, but some inputs have multiline events (for example, XML logs) and some inputs actually have multiple events on a single line. When you run a search, events are what you get back.
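
The event model described above can be illustrated with a short Python sketch. This is not Splunk's implementation: the `Event` class, the `split_into_events` function, the sample log, and the assumed timestamp layout (`YYYY-MM-DD HH:MM:SS` at the start of a line) are all hypothetical. It only shows the idea that a new event starts where a timestamp is found and that lines without one belong to the previous (multiline) event.

```python
from dataclasses import dataclass
from datetime import datetime
import re

# Assumed timestamp layout for this sketch: "2012-01-18 10:15:02 ..."
TS_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

@dataclass
class Event:
    timestamp: datetime
    host: str
    source: str
    sourcetype: str
    raw: str

def split_into_events(text, host, source, sourcetype):
    """Start a new event at each line that begins with a timestamp;
    lines without one are appended to the current event (multiline events)."""
    events = []
    for line in text.splitlines():
        match = TS_PATTERN.match(line)
        if match:
            ts = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
            events.append(Event(ts, host, source, sourcetype, line))
        elif events:
            events[-1].raw += "\n" + line
    return events

sample = (
    "2012-01-18 10:15:02 ERROR request failed\n"
    "  at com.example.Handler.run\n"
    "2012-01-18 10:15:03 INFO retrying\n"
)
events = split_into_events(sample, "web01", "/var/log/app.log", "app_log")
```

Here the three input lines yield two events, because the indented stack-trace line is attached to the first one.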

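Event processing:

Event processing has two stages: parsing and indexing. All data entering Splunk passes through the parsing pipeline in large chunks (typically about 10 KB). During parsing, Splunk breaks these chunks into events. Parsing includes the following actions: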

· Extracting a set of default fields for each event, including host, source, and sourcetype.

· Configuring character set encoding.

· Identifying line termination using linebreaking rules. While many events are short and only take up a line or two, others can be long.

· Identifying timestamps or creating them if they don't exist. At the same time that it processes timestamps, Splunk identifies event boundaries.

· Splunk can be set up to mask sensitive event data (such as credit card or social security numbers) at this stage. It can also be configured to apply custom metadata to incoming events.
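
As a rough illustration of the masking step in the list above, the following Python sketch hides all but the last four digits of anything that looks like a card number. The pattern, the function name `mask_card_numbers`, and the sample event are assumptions made for this example; in Splunk itself masking is set up through configuration, not Python code.

```python
import re

# Assumed pattern: 16 digits, optionally grouped in fours by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}(\d{4})\b")

def mask_card_numbers(raw):
    """Replace all but the last four digits of anything that looks like a card number."""
    return CARD_PATTERN.sub(lambda m: "XXXX-XXXX-XXXX-" + m.group(1), raw)

masked = mask_card_numbers("user=alice card=4111-1111-1111-1234 status=ok")
```

Because the substitution runs before the event is stored, the original digits never reach the index in this sketch.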

The indexing pipeline performs some additional operations:

· Breaking all events into segments that can then be searched upon. You can determine the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of disk compression.

· Building the index data structures.

· Writing the raw data and index files to disk, where post-indexing compression occurs.
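
The segmentation and index-building steps above can be pictured as a tiny inverted index. The sketch below is a simplified assumption, not Splunk's actual on-disk format: `segment`, `build_index`, and `search` are hypothetical names, and the segmentation rule (split on non-alphanumeric characters) is far cruder than Splunk's configurable segmentation.

```python
import re
from collections import defaultdict

def segment(raw):
    """Split an event into lowercase search terms on non-alphanumeric characters
    (an assumed, much simplified segmentation rule)."""
    return {token.lower() for token in re.split(r"[^A-Za-z0-9_]+", raw) if token}

def build_index(events):
    """Map each term to the positions of the events that contain it."""
    index = defaultdict(set)
    for position, raw in enumerate(events):
        for term in segment(raw):
            index[term].add(position)
    return index

def search(index, *terms):
    """Return positions of events containing every term (implicit AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

raw_events = [
    "GET /index.html 200",
    "GET /missing.html 404",
    "POST /login 200",
]
index = build_index(raw_events)
hits = search(index, "get", "200")
```

Finer segmentation produces more terms per event, which is one way to see why the segmentation level trades index size and indexing speed against search capability.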

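The original diagram shows only a simplified view of the main flow. The parsing stage actually comprises three pipelines: parsing, merging, and typing.

Event data: the indexed data produced by indexing log data.

Event: usually a single record of log data.

Index: an index contains two types of data: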

· The raw data in compressed form ("rawdata")

· Indexes that point to the raw data ("index files")

This data is stored in directories called buckets, which are organized by age. This layout makes it easier for Splunk to manage old data.

· A Splunk "index" contains compressed raw data and associated indexes.

· A Splunk index resides across many age-designated index directories.

· An index directory is a bucket.

Buckets roll from one stage to the next as they age. The stages are:

· hot

· warm

· cold

· frozen

As buckets age, they "roll" from one stage to the next. Newly indexed data goes into a hot bucket, which is a bucket that's both searchable and actively being written to. After the hot bucket reaches a certain size, it becomes a warm bucket, and a new hot bucket is created. Warm buckets are searchable, but are not actively written to. There are many warm buckets.

Once Splunk has created some maximum number of warm buckets, it begins to roll the warm buckets to cold based on their age. Always, the oldest warm bucket rolls to cold. Buckets continue to roll to cold as they age in this manner. After a set period of time, cold buckets roll to frozen, at which point they are either archived or deleted. By editing attributes in indexes.conf, you can specify the bucket aging policy, which determines when a bucket moves from one stage to the next.

Bucket stage	Description	Searchable?
Hot	Contains newly indexed data. Open for writing. One or more hot buckets for each index.	Yes.
Warm	Data rolled from hot. There are many warm buckets.	Yes.
Cold	Data rolled from warm. There are many cold buckets.	Yes.
Frozen	Data rolled from cold. Splunk deletes frozen data by default, but you can also archive it.	No.
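
The bucket lifecycle in the table can be simulated in a few lines of Python. This is a toy model under stated assumptions: the `Index` class and its `max_hot_size` and `max_warm_count` parameters are hypothetical stand-ins for the real aging policy in indexes.conf, sizes are counted in events instead of bytes, and freezing is triggered by an explicit call instead of by age.

```python
from dataclasses import dataclass, field

@dataclass
class Index:
    max_hot_size: int      # hypothetical: events per hot bucket before it rolls
    max_warm_count: int    # hypothetical: warm buckets kept before the oldest rolls
    hot: list = field(default_factory=list)
    warm: list = field(default_factory=list)   # oldest first
    cold: list = field(default_factory=list)   # oldest first

    def add(self, event):
        self.hot.append(event)
        if len(self.hot) >= self.max_hot_size:
            # Hot bucket is full: it becomes warm and a new hot bucket starts.
            self.warm.append(self.hot)
            self.hot = []
        while len(self.warm) > self.max_warm_count:
            # Too many warm buckets: the oldest one rolls to cold.
            self.cold.append(self.warm.pop(0))

    def freeze_oldest(self):
        """Remove the oldest cold bucket (to be deleted or handed to an archiver)."""
        return self.cold.pop(0) if self.cold else None

idx = Index(max_hot_size=2, max_warm_count=1)
for n in range(7):
    idx.add(f"event-{n}")
```

After seven events the model holds one partly filled hot bucket, one warm bucket, and two cold buckets, with the oldest data always furthest along the chain.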

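Depending on its stage, a bucket can be referred to as a db, for example hotdb, warmdb, and so on.

A db can be further partitioned, and the size of each bucket is configurable.

For the detailed index directory structure, see: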

http://docs.splunk.com/Documentation/Splunk/4.3/Admin/HowSplunkstoresindexes

2. Forwarder

A forwarder is deployed on the end servers and sends log data to an indexer. It can also forward to other Splunk servers or to non-Splunk systems. There are two types:

· Universal forwarders. These have a very light footprint and forward only unparsed data.

· Heavy forwarders. These have a larger footprint but can parse, and even index, data before forwarding it.

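Strictly speaking there are three types, since the documentation also lists the light forwarder. For details, see: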

http://docs.splunk.com/Documentation/Splunk/4.3/Deploy/Typesofforwarders

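The receiver is usually an indexer, which can accept data from one or more forwarders. The receiver can also be another forwarder. This is similar to Scribe, except that Splunk may also parse and index the data along the way.

This architecture provides the basis for data consolidation, load balancing, and data routing. For details, see: http://docs.splunk.com/Documentation/Splunk/4.3/Deploy/Forwarderdeploymenttopologies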
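
The load-balancing topology mentioned above can be sketched as a forwarder that rotates through a list of receivers. This is only an illustration: the class `LoadBalancingForwarder`, the receiver addresses, and the per-chunk round-robin rule are assumptions for the example, and an in-memory dictionary stands in for network delivery.

```python
from collections import defaultdict
from itertools import cycle

class LoadBalancingForwarder:
    """Toy forwarder that rotates through receivers, one chunk per receiver."""

    def __init__(self, receivers):
        self._next_receiver = cycle(receivers)
        self.sent = defaultdict(list)   # stands in for network delivery

    def forward(self, chunk):
        receiver = next(self._next_receiver)
        self.sent[receiver].append(chunk)
        return receiver

# Hypothetical receiver addresses in ip:port form.
fwd = LoadBalancingForwarder(["indexer1:9997", "indexer2:9997"])
targets = [fwd.forward(f"chunk-{n}") for n in range(4)]
```

Each indexer ends up with roughly half of the data, which is what later allows searches to be spread across the indexers as well.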

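3. Search head

The search head is used in distributed search. For example, when there is a very large volume of data and many users search it concurrently, spreading the indexing load across several indexers lets search queries run on different servers, which distributes and lightens the load. The component that dispatches search requests to the different indexers is the search head.

For details, see: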

http://docs.splunk.com/Documentation/Splunk/4.3/Deploy/Whatisdistributedsearch

Captions of a few diagrams from the documentation:

This diagram shows a simple distributed search scenario for horizontal scaling, with one search head searching across three peers:

In this diagram showing a distributed search scenario for access control, a "security" department search head has visibility into all the indexing search peers. Each search peer also has the ability to search its own data. In addition, the department A search peer has access to both its data and the data of department B:

Finally, this diagram shows the use of load balancing and distributed search to provide high availability access to data:
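
The first scenario, one search head searching across three peers, follows a scatter-gather pattern that can be sketched in Python. Everything here is hypothetical: the `PEERS` data, the function names, and the substring match stand in for real search peers and the real search language.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search peers, each holding (timestamp, raw event) pairs.
PEERS = {
    "peer1": [(3, "error disk full"), (1, "login ok")],
    "peer2": [(2, "error timeout")],
    "peer3": [(4, "logout")],
}

def search_peer(name, term):
    """Run the search on one peer against its own data only."""
    return [(ts, raw, name) for ts, raw in PEERS[name] if term in raw]

def distributed_search(term):
    """Fan the query out to every peer in parallel, then merge newest first."""
    with ThreadPoolExecutor(max_workers=len(PEERS)) as pool:
        partials = pool.map(lambda name: search_peer(name, term), PEERS)
        merged = [hit for partial in partials for hit in partial]
    return sorted(merged, reverse=True)

results = distributed_search("error")
```

The search head does no indexing in this model; it only dispatches the query and merges the partial results, so adding peers spreads the search work horizontally.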

4. Deployment server

A centralized configuration manager. Any Splunk instance can act as a deployment server.

The deployment server handles configuration and content updates to existing Splunk installations. You cannot use it for initial or upgrade installations of Splunk components.

Splunk instances that are remotely configured by deployment servers are called deployment clients. A Splunk instance can be both a deployment server and client at the same time.

Terminology:

Term	Meaning
deployment server	A Splunk instance that acts as a centralized configuration manager. It pushes configuration updates to other Splunk instances.
deployment client	A remotely configured Splunk instance. It receives updates from the deployment server.
server class	A deployment configuration category shared by a group of deployment clients. A deployment client can belong to multiple server classes.
deployment app	A unit of content deployed to one or more members of a server class or classes.
multi-tenant environment	A deployment environment involving multiple deployment servers.
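
The relationship between server classes, deployment clients, and deployment apps in the table can be modeled as a simple lookup. The class names, client names, and app names below are invented for illustration; a real deployment server defines these in its configuration.

```python
# Hypothetical server classes: which clients belong, which apps they receive.
SERVER_CLASSES = {
    "web_servers": {"clients": {"web01", "web02"}, "apps": {"apache_inputs"}},
    "all_linux": {"clients": {"web01", "web02", "db01"}, "apps": {"syslog_inputs"}},
}

def apps_for_client(client):
    """A client gets the union of apps from every server class it belongs to."""
    apps = set()
    for server_class in SERVER_CLASSES.values():
        if client in server_class["clients"]:
            apps |= server_class["apps"]
    return sorted(apps)

web01_apps = apps_for_client("web01")
```

Because a client can belong to several server classes, `web01` receives the apps of both classes, while `db01` receives only the one from `all_linux`.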

II. Key Splunk Features

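1. Data input: files, directories, Apache logs, and so on can be imported by source type. Other files can be imported too, with a regular expression defining the timestamp format. Data can also be collected over UDP, TCP, or through custom scripts.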

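2. Indexes:

New indexes can be created. By default an index uses the default directory layout:

Home path: $SPLUNK_DB/INDEX_NAME/db

Cold path (colddb): $SPLUNK_DB/INDEX_NAME/colddb

Thawed path (restored data): $SPLUNK_DB/INDEX_NAME/thaweddb

A maximum total size can be set.

Statistics are kept for each index (total size, event count, times of the first and last events, and owning app).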
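
The default layout listed above can be expressed as a small helper that derives the three directories from `$SPLUNK_DB` and the index name. The function `index_paths`, the index name `weblogs`, and the example `$SPLUNK_DB` value are assumptions for this sketch.

```python
from pathlib import PurePosixPath

def index_paths(splunk_db, index_name):
    """Return the default directories for one index under $SPLUNK_DB."""
    base = PurePosixPath(splunk_db) / index_name
    return {
        "home": base / "db",           # main directory
        "cold": base / "colddb",       # cold buckets
        "thawed": base / "thaweddb",   # restored (thawed) data
    }

# Example value for $SPLUNK_DB; the actual location depends on the installation.
paths = index_paths("/opt/splunk/var/lib/splunk", "weblogs")
```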

3. Forwarding and receiving

The management UI configures forwarding and receiving (the forwarder and indexer described above), specified as ip:port.

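4. Distributed search is supported: you provide each node's host ip:port plus an administrator username and password so that Splunk can authenticate for remote searches.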

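5. Free-form search from the search box is supported, searching application log data directly. Interactive search is supported too: highlighting words in a log entry with the mouse adds them to the search criteria automatically.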

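6. Field search is supported. The fields host, source, and sourcetype are provided by default, and custom fields can be added. Displayed fields can also be added at any time on the search page. It is unclear why fields are not instead specified when the user uploads the data.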

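7. Search results can be shown on a timeline, which displays how frequently the search matched over time. The timeline can be adjusted to narrow the search further.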
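
The timeline amounts to a histogram of matching events over time. A minimal sketch, assuming per-minute bars and a hypothetical `timeline` helper, could look like this:

```python
from collections import Counter
from datetime import datetime

def timeline(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count matching events per minute; each count is one bar on the timeline."""
    return Counter(ts.strftime(fmt) for ts in timestamps)

# Hypothetical timestamps of events that matched a search.
hits = [
    datetime(2012, 1, 18, 10, 15, 2),
    datetime(2012, 1, 18, 10, 15, 40),
    datetime(2012, 1, 18, 10, 17, 5),
]
bars = timeline(hits)
```

Selecting one bar would then correspond to rerunning the search restricted to that minute.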

8. Search supports operators, HTTP status codes, and so on. It is very convenient to use and carefully designed.

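9. A knowledge base is provided; users can add event types, fields, tags, and so on.

10. Scheduled searches are supported.

11. Alert notifications are supported.

12. Results can be presented in many ways: raw data, field tables, and charts. The settings are user-friendly.

13. Dashboards, other views, and reports can be created dynamically, and online printing is supported.

14. A getting-started guide helps users become familiar with the system.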

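III. Advantages and Disadvantages

1. Advantages:

Searching log content becomes very convenient.

Distributed search is supported, with strong scalability, fault tolerance, and load balancing.

Search results can be shown graphically, and reports are supported.

Scheduled searches are supported.

Alert notifications are supported.

The timeline gives an intuitive view of search trends and results.

2. Disadvantages:

Statistical analysis of log data is a weak point.

Index data and search data are stored on the indexers, so data is lost if an indexer goes down. Forwarders, however, can optionally keep a local copy.

After reading the official architecture documentation, I am still unsure about one case. When log volume is extremely high, forwarders send data to one or more indexers, but it is unknown whether search performance would drop significantly, even though a search head spreads searches across multiple indexers.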
