Elasticsearch

shu2020

Article link: https://blog.csdn.net/shu2020/article/details/46521371

INTRODUCTION

what is it?

document oriented
schema free
distributed
multi-tenancy
API-centric & RESTful
unstructured & structured search
analytics, combinable with search
(near) real-time

glossary

cluster
node
index: a collection of documents; a logical namespace that maps to one or more shards
shard: a single Apache Lucene instance
- primary: the number of primary shards cannot be changed after index creation
- replica: a copy of a primary shard; increases high availability and read throughput
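
The primary count is fixed because every document is routed to a shard by hashing its routing value (the document _id by default) modulo the number of primary shards. Below is a minimal Python sketch. It assumes zlib.crc32 as a stand-in hash; Elasticsearch uses its own, version-dependent hash function, so the real shard numbers will differ.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (sketch).

    Assumption: zlib.crc32 stands in for Elasticsearch's internal hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]

# Same routing value and same primary count -> always the same shard.
before = {d: shard_for(d, 5) for d in doc_ids}

# With a different primary count, most documents map to another shard,
# so existing documents could no longer be found where they were indexed.
after = {d: shard_for(d, 6) for d in doc_ids}
moved = sum(1 for d in doc_ids if before[d] != after[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Replicas carry no such constraint, because they copy whole shards. Their number can be changed at any time.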

in a distribution

executable scripts
node config file
data storage
- data path
internal libs
log

settings

discovery: multicast / unicast
network
- network.host: bind address, publish address
HTTP: ports [9200-9300)
transport: ports [9300-9400)

plugins

DATA IN DATA OUT

data structure

a document is a JSON object
metadata fields:
- _id, _type, _source, _timestamp, _size…
field data types: core types, complex types, others

APIs

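Create index API: -XPUT + target index name
Index API: create or update a document
- -XPUT + … document type
- response: 201 (created) or 200 (updated)
- _id in the URL is optional; if it is omitted (with -XPOST), one is generated automatically
- the index is created automatically if it does not exist
- timestamp
- TTL: expiration time
- distributed execution: 1) index request, 2) reroute to the primary shard, 3) replicate to the replica shards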
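Get API
- -XGET: retrieve a document from an index by _type and _id
- real-time: a document can be fetched before a refresh makes it searchable
- response: 200 (found) or 404 (not found)
- request specific fields using _source
- distributed execution: 1) get request, 2) execute on a shard copy (primary or replica)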
Exists API: -XHEAD
Delete API: -XDELETE
Document versioning
- optimistic concurrency control for read-then-write cycles
- applies to create, reindex, update, and delete operations
- versions can come from an external system (version_type=external)
Update API
- -XPOST with a partial document or a script
- internally a get-then-reindex operation; to reduce version conflicts, set retry_on_conflict
- script named parameters (params); index-related parameters
- upsert
Multi Get API
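- get multiple documents in one request: _mget
Bulk API
- minimize round trips by batching many operations into one request
- newline-delimited body: every line, including the last one, ends with a line break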
Search API:
- find documents with free-text search: -XGET
- query DSL
- control search context
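
The Bulk API's line-break rule is the easiest one to get wrong. Each action is one JSON line. It is followed by one source line, except for delete. The body must also end with a newline. Below is a minimal Python sketch that builds such a body. The index, type, and id values are hypothetical.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) tuples into a Bulk API body.

    Every action line and source line is one JSON object on its own line,
    and the body ends with a trailing newline. delete has no source line.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if action != "delete":
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical index/type/id values, for illustration only.
body = bulk_body([
    ("index", {"_index": "blog", "_type": "post", "_id": "1"}, {"title": "hello"}),
    ("update", {"_index": "blog", "_type": "post", "_id": "1"}, {"doc": {"title": "hi"}}),
    ("delete", {"_index": "blog", "_type": "post", "_id": "2"}, None),
])
print(body, end="")
```

This text is what gets POSTed to /_bulk. When sending it from a file with curl, use --data-binary rather than -d, because -d strips the newlines the format depends on.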

ELASTICSEARCH & LUCENE

Lucene index

memory buffer
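flush: issue a Lucene commit and clear the translog
refresh: write the memory buffer to a new segment, making its documents searchable
N segments: each one an immutable inverted index
- segments API
transaction log (translog)
deletes: documents are marked as deleted in their segment, not removed in place
merge segments: merging is throttled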
- optimize API: explicit merge
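
The write path above can be modeled with a toy class:

- memory buffer and translog
- refresh into immutable segments
- flush
- marked deletes
- merges

This is a simplification for intuition only. The class and method names are hypothetical, "search" is a plain term lookup, and nothing here mirrors Lucene's real data structures.

```python
from types import MappingProxyType


class ToyShard:
    """Toy model of a shard's write path (hypothetical, not Lucene's code)."""

    def __init__(self):
        self.buffer = {}              # memory buffer: doc_id -> text, not searchable yet
        self.pending_deletes = set()  # deletes waiting for the next refresh
        self.translog = []            # operations since the last commit
        self.segments = []            # read-only segments: doc_id -> text
        self.deleted = set()          # (segment number, doc_id) marked as deleted

    def index(self, doc_id, text):
        self.buffer[doc_id] = text
        self.pending_deletes.discard(doc_id)
        self.translog.append(("index", doc_id, text))

    def delete(self, doc_id):
        self.buffer.pop(doc_id, None)
        self.pending_deletes.add(doc_id)
        self.translog.append(("delete", doc_id))

    def refresh(self):
        """Turn the buffer into a new immutable segment and apply deletes."""
        changed = set(self.buffer) | self.pending_deletes
        for seg_no, segment in enumerate(self.segments):
            # Old copies are only marked deleted; the segment itself is never edited.
            for doc_id in changed.intersection(segment):
                self.deleted.add((seg_no, doc_id))
        if self.buffer:
            self.segments.append(MappingProxyType(dict(self.buffer)))
        self.buffer.clear()
        self.pending_deletes.clear()

    def search(self, term):
        """Only refreshed, non-deleted documents are visible."""
        return sorted(
            doc_id
            for seg_no, segment in enumerate(self.segments)
            for doc_id, text in segment.items()
            if (seg_no, doc_id) not in self.deleted and term in text.split()
        )

    def flush(self):
        """Refresh, treat the segments as committed, and start a new translog."""
        self.refresh()
        self.translog.clear()

    def merge(self):
        """Rewrite all segments into one, physically dropping deleted documents."""
        merged = {}
        for seg_no, segment in enumerate(self.segments):
            for doc_id, text in segment.items():
                if (seg_no, doc_id) not in self.deleted:
                    merged[doc_id] = text
        self.segments = [MappingProxyType(merged)] if merged else []
        self.deleted.clear()


shard = ToyShard()
shard.index("1", "quick brown fox")
shard.index("2", "lazy dog")
print(shard.search("fox"))                       # [] (not refreshed yet)
shard.refresh()
print(shard.search("fox"))                       # ['1']
shard.index("1", "quick red fox")                # update: new copy, old one marked deleted
shard.delete("2")
shard.refresh()
print(len(shard.segments), shard.search("fox"))  # 2 ['1']
shard.merge()
shard.flush()
print(len(shard.segments), len(shard.translog))  # 1 0
```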

Detour

writes are sequential

TEXT ANALYSIS

why analysis is needed: stop words, uppercase, plural forms, synonyms…
anatomy of an analyzer: a tokenizer followed by token filters
- ICU plugins
- pre-built analyzers
Analyze API
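
An analyzer is applied both when indexing and when querying, so both sides produce comparable terms. Below is a toy pipeline in Python. The regex tokenizer, stop-word list, crude plural stripping, and one-entry synonym table are all illustrative assumptions, not Elasticsearch's built-in components.

```python
import re

# Hypothetical stop-word list and synonym table, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of"}
SYNONYMS = {"quick": "fast"}


def tokenizer(text):
    """Split text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def naive_plural_filter(tokens):
    """Strip a trailing 's' (a crude stand-in for a real stemmer)."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
            for t in tokens]


def synonym_filter(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]


def analyze(text, token_filters=(lowercase_filter, stop_filter,
                                 naive_plural_filter, synonym_filter)):
    """Run the tokenizer, then each token filter in order."""
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Quick Brown Dogs and the Lazy Cats"))  # ['fast', 'brown', 'dog', 'lazy', 'cat']
print(analyze("fast DOG"))                                # ['fast', 'dog']
```

The document and the query share the terms "fast" and "dog", so the query can match even though neither string contains the other's exact words.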

MAPPINGS

mapping

defines how a document and its fields are indexed
dynamic mapping
- when it applies
- how to configure it
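
Dynamic mapping guesses a field's type from the first JSON value it sees:

- booleans become boolean
- whole numbers become long
- floating-point numbers become double
- strings become date if they look like a date, otherwise string
- objects become object
- arrays take the type of their first non-null element

Below is a Python sketch of that idea. The type names follow Elasticsearch 1.x; later versions map strings to text and floats to float. Date detection here is simplified to ISO-8601 strings, which is an assumption.

```python
from datetime import datetime


def looks_like_date(value):
    """Simplified date detection: ISO-8601 strings only (an assumption)."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def field_mapping(value):
    """Guess a field mapping for one JSON value, in the spirit of dynamic mapping."""
    if isinstance(value, bool):       # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "date"} if looks_like_date(value) else {"type": "string"}
    if isinstance(value, dict):
        return {"type": "object", "properties": dynamic_mapping(value)}
    if isinstance(value, list):
        # An array is mapped by the type of its first non-null element.
        first = next((v for v in value if v is not None), None)
        return field_mapping(first)
    return None                       # null: no mapping can be derived yet


def dynamic_mapping(doc):
    """Map every field of a document whose type can be guessed."""
    mapping = {}
    for name, value in doc.items():
        field = field_mapping(value)
        if field is not None:
            mapping[name] = field
    return mapping


# Hypothetical document.
doc = {
    "title": "Elasticsearch notes",
    "views": 42,
    "rating": 4.5,
    "published": "2015-06-16",
    "draft": False,
    "tags": ["search", "lucene"],
    "author": {"name": "alice"},
}
for field, spec in dynamic_mapping(doc).items():
    print(field, spec)
```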

mapping API

basic mapping
types: text, numeric, date, boolean, object; common attributes
dynamic nature
multi-fields, metadata fields
customize

SEARCH

pagination
sorting
search types
- query then fetch
- dfs query then fetch
- count
- scan
query DSL
- queries: match, multi_match, bool, range, match_all, query_string …
- filters, warmer API
highlighting
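
Pagination, sorting, the query DSL, and highlighting all go into one search request body. Below is a Python sketch that builds such a body as a dict. The field names title, tags, and date are hypothetical, and the query uses only the bool, match, and term queries.

```python
import json


def search_body(text, page=1, size=10):
    """Build a search request body (field names are hypothetical)."""
    return {
        "from": (page - 1) * size,    # pagination: offset of the first hit
        "size": size,                 # hits per page
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "must_not": [{"term": {"tags": "draft"}}],
            }
        },
        "sort": [{"date": {"order": "desc"}}, "_score"],
        "highlight": {"fields": {"title": {}}},
    }


print(json.dumps(search_body("elasticsearch", page=3), indent=2))
```

Deep pages are expensive. Each shard must return its top from + size hits to the node coordinating the search. This is one reason the scan search type exists for exporting large result sets.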

SUGGEST

suggesters: term, phrase, completion (including fuzzy completion)
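
The term suggester proposes indexed terms within a small edit distance of each input term. Below is a toy version using plain Levenshtein distance, ranked by distance and then by document frequency. Elasticsearch's suggester offers several string-distance options and works on Lucene's own structures. The term table here is hypothetical.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(term, term_freqs, max_edits=2, size=3):
    """Suggest terms within max_edits, closest first, then most frequent."""
    candidates = []
    for candidate, freq in term_freqs.items():
        distance = levenshtein(term, candidate)
        if candidate != term and distance <= max_edits:
            candidates.append((distance, -freq, candidate))
    return [c for _, _, c in sorted(candidates)[:size]]


# Hypothetical term -> document-frequency table.
index_terms = {"search": 120, "elastic": 80, "starch": 3, "lucene": 40, "sear": 5}
print(suggest("serch", index_terms))  # ['search', 'starch']
```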

RELEVANCY & BOOSTING

relevancy

vector space model
TF-IDF
Lucene similarity
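
Lucene's classic TF-IDF similarity, the default in Elasticsearch 1.x, scores a document roughly as a sum over matching query terms of tf · idf² · fieldNorm, where:

- tf = √freq
- idf = 1 + ln(numDocs / (docFreq + 1))
- fieldNorm = 1 / √(number of terms in the field)

Below is a Python sketch. It drops queryNorm, coord, and boosts, and it ignores the lossy one-byte encoding of the norm.

```python
import math


def idf(term, docs):
    """Classic idf: 1 + ln(numDocs / (docFreq + 1))."""
    doc_freq = sum(1 for d in docs if term in d)
    return 1 + math.log(len(docs) / (doc_freq + 1))


def score(query_terms, doc, docs):
    """Simplified classic score: sum of sqrt(freq) * idf^2 * fieldNorm."""
    field_norm = 1 / math.sqrt(len(doc))
    total = 0.0
    for term in query_terms:
        freq = doc.count(term)
        if freq:
            total += math.sqrt(freq) * idf(term, docs) ** 2 * field_norm
    return total


# Hypothetical, already-analyzed documents (lists of terms).
docs = [
    ["quick", "brown", "fox"],
    ["quick", "quick", "fox", "jumps", "over", "lazy", "dog"],
    ["lazy", "dog", "sleeps"],
    ["brown", "dog"],
]
query = ["quick", "fox"]
ranked = sorted(range(len(docs)), key=lambda i: score(query, docs[i], docs), reverse=True)
print(ranked)  # [0, 1, 2, 3]
```

Document 0 outranks document 1 even though document 1 contains "quick" twice. The shorter field gets a larger fieldNorm, which outweighs the extra term frequency.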

boosting

function score: boost factor, decay function, script score, random score
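
A decay function lowers the score as a field value moves away from an origin. For the gauss decay:

- score = exp(-max(0, |value - origin| - offset)² / (2σ²))
- σ² = -scale² / (2 · ln(decay))

This choice of σ² makes the score exactly decay at distance offset + scale. Below is a Python sketch. The price example values are hypothetical.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 within offset, equals `decay` at offset + scale."""
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-distance ** 2 / (2 * sigma_sq))


# Hypothetical: prefer prices near 100, full score within +/-10,
# half the score 20 units beyond that.
for price in (100, 110, 130, 160):
    print(price, round(gauss_decay(price, origin=100, scale=20, offset=10), 3))
```

Inside a function_score query, this factor is combined with the query score (multiplied by default). Near items gain relative to far ones without filtering anything out.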

AGGREGATIONS

facets (deprecated in favor of aggregations)
scope
- query scope
- filtered_query, post_filter
categories of aggregation
- Buckets
  - filter
  - sub-aggregations
  - missing
  - terms
  - range, *_range
  - histogram, date_histogram
- Metrics
  - extended_stats
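
A bucket aggregation groups documents, and a metric sub-aggregation computes numbers per bucket. In a histogram, each value falls into the bucket keyed floor(value / interval) * interval. Below is a Python sketch of a histogram with an extended_stats-like sub-aggregation. Variance is computed here as a population variance, and empty buckets are not filled in. The prices are hypothetical.

```python
import math
from collections import defaultdict


def histogram(values, interval):
    """Bucket values by key = floor(value / interval) * interval."""
    buckets = defaultdict(list)
    for v in values:
        buckets[math.floor(v / interval) * interval].append(v)
    return dict(sorted(buckets.items()))


def extended_stats(values):
    """Count, min, max, avg, sum, sum of squares, variance, std deviation."""
    n = len(values)
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    avg = total / n
    variance = max(0.0, sum_sq / n - avg ** 2)  # guard against tiny negative rounding
    return {"count": n, "min": min(values), "max": max(values), "avg": avg,
            "sum": total, "sum_of_squares": sum_sq,
            "variance": variance, "std_deviation": math.sqrt(variance)}


# Hypothetical prices; each histogram bucket gets its own stats.
prices = [3, 7, 12, 15, 18, 21, 45]
for key, bucket in histogram(prices, interval=10).items():
    stats = extended_stats(bucket)
    print(key, stats["count"], stats["avg"], round(stats["std_deviation"], 2))
```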

DOCUMENT RELATIONS

inner objects, nested, parent/child
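
The difference between inner objects and nested shows up with arrays of objects. Plain inner objects are flattened into multi-valued dotted fields, so the link between fields of the same inner object is lost. Nested documents are matched one inner object at a time. Below is a toy Python sketch of both behaviors. The data and function names are hypothetical.

```python
# Hypothetical blog post with an array of inner objects.
post = {
    "title": "notes",
    "comments": [
        {"author": "alice", "stars": 5},
        {"author": "bob", "stars": 1},
    ],
}


def flatten(doc, prefix=""):
    """Flatten inner objects into dotted, multi-valued fields."""
    fields = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        for v in value if isinstance(value, list) else [value]:
            if isinstance(v, dict):
                for sub_path, sub_values in flatten(v, path + ".").items():
                    fields.setdefault(sub_path, []).extend(sub_values)
            else:
                fields.setdefault(path, []).append(v)
    return fields


def object_match(doc, conditions):
    """Inner-object semantics: each condition is checked on the flattened fields."""
    fields = flatten(doc)
    return all(value in fields.get(path, []) for path, value in conditions.items())


def nested_match(doc, array_field, conditions):
    """Nested semantics: all conditions must hold within ONE inner object."""
    return any(all(inner.get(k) == v for k, v in conditions.items())
               for inner in doc.get(array_field, []))


# No comment by alice has 1 star, yet the flattened view matches.
print(object_match(post, {"comments.author": "alice", "comments.stars": 1}))  # True
print(nested_match(post, "comments", {"author": "alice", "stars": 1}))        # False
```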

GEO LOCATION

geo point
geo shape: indexed with geohash or quadtree prefix trees
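
A geohash encodes a point by repeatedly halving the longitude and latitude ranges and interleaving the resulting bits, five bits per base-32 character. Each extra character narrows the cell, and nearby points tend to share prefixes, which is what prefix-tree indexing builds on. Below is an encoder for the standard public algorithm. It is not Elasticsearch's code.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=12):
    """Encode a point as a geohash by interleaving lon/lat bisection bits."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # the first bit (and every other bit after it) refines longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits -> one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=11))
print(geohash_encode(57.64911, 10.40744, precision=4))  # a coarser cell around the same point
```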

PERCOLATOR

registering a query
_percolate API, _mpercolate API
routing, filtering, scoring and sorting, highlighting, aggregation, storing
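
The percolator inverts normal search. Queries are registered first, and then a document is sent to _percolate to find which stored queries match it. Below is a toy in-memory version in Python. It assumes every registered query is a set of field/term pairs, and the class and method names are hypothetical.

```python
class ToyPercolator:
    """Reverse search: store queries, then ask which ones match a document."""

    def __init__(self):
        self.queries = {}  # query id -> {field: required term}

    def register(self, query_id, required_terms):
        self.queries[query_id] = required_terms

    def percolate(self, doc):
        """Return the ids of all registered queries whose terms all occur in doc."""
        tokens = {field: set(str(value).lower().split()) for field, value in doc.items()}
        return sorted(
            qid for qid, terms in self.queries.items()
            if all(term in tokens.get(field, set()) for field, term in terms.items())
        )


p = ToyPercolator()
p.register("alert-es", {"title": "elasticsearch"})
p.register("alert-lucene", {"title": "lucene"})
p.register("alert-es-geo", {"title": "elasticsearch", "tags": "geo"})
print(p.percolate({"title": "Elasticsearch in action", "tags": "search geo"}))
# ['alert-es', 'alert-es-geo']
```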

DISTRIBUTED MODEL

discovering nodes, electing a master node
- cluster meta API
cluster state
- cluster state API
- shard allocation: on index creation, when nodes are added or removed; allocation filtering; allocation awareness
node types: data node, master node, client node, tribe node
routing
replication
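
Shard allocation always respects one hard rule: two copies of the same shard (primary and replica, or two replicas) are never placed on the same node. Below is a toy Python allocator that keeps only that rule plus "least-loaded node first". Elasticsearch's real allocator has many more deciders, including the filtering and awareness settings listed above. The function and node names are hypothetical.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy shard allocation (not Elasticsearch's real balancer).

    Copies that cannot be placed without sharing a node stay unassigned.
    """
    allocation = {node: [] for node in nodes}  # node -> [(shard, role)]
    unassigned = []
    for shard in range(num_primaries):
        for copy in range(1 + num_replicas):
            role = "primary" if copy == 0 else "replica"
            # Only nodes holding no copy of this shard are eligible.
            eligible = [n for n in nodes if all(s != shard for s, _ in allocation[n])]
            if eligible:
                target = min(eligible, key=lambda n: len(allocation[n]))  # least loaded
                allocation[target].append((shard, role))
            else:
                unassigned.append((shard, role))
    return allocation, unassigned


for nodes in (["node-1"], ["node-1", "node-2", "node-3"]):
    allocation, unassigned = allocate(3, 1, nodes)
    # Unassigned primaries -> red; only unassigned replicas -> yellow.
    if any(role == "primary" for _, role in unassigned):
        health = "red"
    else:
        health = "yellow" if unassigned else "green"
    print(nodes, allocation, "unassigned:", unassigned, health)
```

On a single node, the replicas have nowhere to go, which is why a one-node cluster with replicas configured reports yellow health.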

INDEX MANAGEMENT

create, update settings, delete, and open/close an index
index templates
snapshot/restore: backup mechanism for indices
index aliases
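
A common use of aliases is switching over after a reindex. One request to the _aliases endpoint removes the alias from the old index and adds it to the new one. The actions in that request are applied atomically, so clients querying the alias never see it pointing at zero or two indices. Below is a small Python helper that builds the request body. The alias and index names are hypothetical.

```python
import json


def swap_alias_body(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: after reindexing into blog_v2, point "blog" at it.
print(json.dumps(swap_alias_body("blog", "blog_v1", "blog_v2"), indent=2))
```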

DATA MANAGEMENT

overallocation: more shards than nodes, leaving room to scale out
replicas
multiple indices
capacity planning
user-based data flow
time-based data flow
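
For time-based data, a common pattern is one index per period (for example per day). Searches then touch only the relevant indices, and retention means deleting whole old indices instead of individual documents. Below is a Python sketch that assumes a hypothetical logs-YYYY.MM.DD naming scheme.

```python
from datetime import date, timedelta


def daily_index(prefix, day):
    """Daily index name such as logs-2015.06.16 (naming scheme is an assumption)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_to_search(prefix, start, end):
    """Only the daily indices covering [start, end] need to be queried."""
    return [daily_index(prefix, start + timedelta(days=d))
            for d in range((end - start).days + 1)]


def indices_to_delete(existing, prefix, today, keep_days):
    """Indices older than the retention window (zero-padded names sort by date)."""
    cutoff = daily_index(prefix, today - timedelta(days=keep_days - 1))
    return sorted(n for n in existing if n.startswith(prefix + "-") and n < cutoff)


today = date(2015, 6, 16)
print(indices_to_search("logs", date(2015, 6, 14), today))
existing = [daily_index("logs", today - timedelta(days=d)) for d in range(10)]
print(indices_to_delete(existing, "logs", today, keep_days=7))
```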

MOVING TO PRODUCTION
