Elasticsearch

INTRODUCTION

what is it?

  • document oriented
  • schema free
  • distributed
  • multi-tenancy
  • api centric & RESTful
  • unstructured & structured search
  • analytics, combined with search
  • realtime

glossary

  • cluster
  • node
  • index: collection of documents, logical namespace which maps to shards
  • shard: single Apache Lucene instance
    • primary: number of primary shards cannot be changed after index creation
    • replica: increases availability and read throughput

in a distribution

  • executable scripts
  • node config file
  • data storage
    • data path
  • internal libs
  • log

setting

  • discovery: multicast / unicast
  • network
    • network.host: bind address, publish address
  • HTTP: ports [9200-9300)
  • transport: ports [9300-9400)

plugin

DATA IN DATA OUT

data structure

  • document as JSON object
  • metadata fields:
    • id, type, source, timestamp, size…
    • field data type: core data type, complex type, others

APIs

  • Create index API: -XPUT + target index name
  • Index API: create & update doc
    • -XPUT + … document type
    • operation response: 201 or 200
    • _id in the URL is optional
    • index is auto-created if it does not exist
    • timestamp
    • TTL: expiration time
    • distributed execution: 1) index request, 2) reroute, 3) replicate
  • Get API
    • -XGET + retrieve doc from index using _type and _id
    • realtime
    • operation response: 200 or 404
    • request specific fields using _source
    • distributed execution: 1) get, 2) execute
  • Exists API: -XHEAD
  • Delete API: -XDELETE
  • Document Versioning:
    • concurrency control: read-then-write
    • creation, reindex, update, delete ops
    • can be from external system
  • Update API
    • -XPOST + partial data or scripts
    • internal get-then-reindex op; minimize conflicts with retry_on_conflict
    • named parameters, index related parameter
    • upsert
  • Multi Get API
    • get multiple doc: _mget
  • Bulk API
    • minimizes round trips for bulk operations
    • newline-delimited body: one JSON object per line
  • Search API:
    • finding doc based on free text search: -XGET
    • query DSL
    • control search context

ELASTICSEARCH & LUCENE

Lucene index

  • memory buffer
  • flush: issue Lucene commit & clear translog
  • refresh
  • N segments: immutable inverted index
    • segments API
  • transaction log
  • delete doc/seg
  • merge segments: throttle
    • optimize API: explicit merge
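
The interplay of buffer, refresh, flush, translog and merge can be sketched with a toy in-memory model. This is an illustration of the ideas listed above, not how Lucene or Elasticsearch are implemented; the class `ToyShard` and all of its attributes are hypothetical.

```python
class ToyShard:
    """Toy model of one shard: buffer, translog and immutable segments."""

    def __init__(self):
        self.buffer = []         # indexed, but not yet searchable
        self.translog = []       # operations not yet covered by a commit
        self.segments = []       # immutable, searchable segments (tuples)
        self.committed_docs = 0  # docs covered by the last commit

    def index(self, doc):
        # Every write goes to the translog and to the memory buffer.
        self.translog.append(doc)
        self.buffer.append(doc)

    def refresh(self):
        # Refresh turns the buffer into a new searchable segment.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        # Flush: commit everything, then clear the translog.
        self.refresh()
        self.committed_docs = sum(len(seg) for seg in self.segments)
        self.translog = []

    def merge(self):
        # Merge rewrites many small segments into a single larger one.
        merged = tuple(doc for seg in self.segments for doc in seg)
        self.segments = [merged] if merged else []

    def search(self, term):
        # Only segments are searched; the buffer is invisible until refresh.
        return [doc for seg in self.segments for doc in seg
                if term in doc.split()]


shard = ToyShard()
shard.index("quick brown fox")
print(shard.search("fox"))   # nothing yet: the doc is still in the buffer
shard.refresh()
print(shard.search("fox"))   # now visible
```

The point of the model is the gap between "indexed" and "searchable": a document is durable in the translog immediately, but only appears in search results after the next refresh.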

Detour

  • writes are sequential

TEXT ANALYSIS

  • need: stop word, uppercase, plural form, synonym…
  • anatomy of analyzer: tokenizer, token filter
    • ICU plugins
    • pre-built
  • Analyze API
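
The tokenizer plus token filter anatomy can be sketched as a small pipeline. This is a toy with deliberately naive rules (the plural handling only strips a trailing "s"); the stop word list, the synonym table and every function name here are assumptions made for illustration, not Elasticsearch's built-in analyzers.

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "of"}   # hypothetical stop list
SYNONYMS = {"quick": "fast"}                    # hypothetical synonym table

def tokenize(text):
    """Tokenizer: split the text into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)

def lowercase(tokens):
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def strip_plural(tokens):
    # Naive plural handling: drop one trailing "s" from longer tokens.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]

def apply_synonyms(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]

def analyze(text, filters=(lowercase, remove_stop_words,
                           strip_plural, apply_synonyms)):
    """Analyzer: one tokenizer followed by a chain of token filters."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

print(analyze("The Quick Cats and a dog"))
```

Filter order matters: stop words are removed after lowercasing, so "The" is caught by the lowercase entry in the stop list.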

MAPPINGS

mapping

  • index based on doc and fields
  • dynamic mapping
  • when
  • config

mapping API

  • basic mapping
  • type: text, numeric, date, boolean, object, common attribute
  • dynamic nature
  • multi field, metadata
  • customize
  • pagination
  • sorting
  • search types
    • query then fetch
    • dfs query then fetch
    • count
    • scan
  • query DSL
    • query: match, multi-match, bool, range, match all, query strings …
    • filters: warmer API
  • highlighting
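
Pagination, sorting, query DSL and highlighting all travel in one search request body. A minimal sketch that builds such a body as a Python dict; the field names `title`, `body` and `published` and the helper `build_search` are hypothetical.

```python
import json

def build_search(text, page=0, page_size=10):
    """Build a search request body for a free-text query."""
    return {
        # Pagination: skip `from` hits, return `size` hits.
        "from": page * page_size,
        "size": page_size,
        # Sorting: newest first, then by relevance score.
        "sort": [{"published": {"order": "desc"}}, "_score"],
        # Query DSL: a bool query combining match clauses.
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "should": [{"match": {"body": text}}],
            }
        },
        # Highlighting: return matching fragments of the title field.
        "highlight": {"fields": {"title": {}}},
    }

print(json.dumps(build_search("lucene", page=2, page_size=5), indent=2))
```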

SUGGEST

suggester: term, phrase, completion, fuzzy

RELEVANCY & BOOSTING

relevancy

  • vector space model
  • TF-IDF
  • lucene similarity
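
A rough sketch of TF-IDF scoring, using the tf and idf shapes of Lucene's classic similarity (tf as the square root of the term frequency, idf as 1 + ln(numDocs / (docFreq + 1))). Field-length norms, coordination factor, query normalization and boosts are left out, so the numbers will not match real scores; the function names and the tiny corpus are hypothetical.

```python
import math
from collections import Counter

def tf(freq):
    """Term frequency factor: grows with the square root of the count."""
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    """Inverse document frequency: rare terms weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(query_terms, doc_tokens, corpus):
    """Sum of tf * idf^2 over the query terms (simplified)."""
    counts = Counter(doc_tokens)
    total = 0.0
    for term in query_terms:
        doc_freq = sum(1 for doc in corpus if term in doc)
        total += tf(counts[term]) * idf(doc_freq, len(corpus)) ** 2
    return total

corpus = [
    ["lucene", "search"],
    ["search", "engine"],
    ["search", "search", "index"],
]
for doc in corpus:
    print(doc, round(score(["lucene", "search"], doc, corpus), 3))
```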

boosting

function score: boost factor, decay function, script score, random score
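
As an illustration of a decay function, here is a sketch of the Gaussian decay shape: the score is 1 at the origin (and within `offset` of it) and drops to `decay` at a distance of `scale` beyond the offset. The combination with a boost factor by plain multiplication is an assumption for this example, and the function names are hypothetical.

```python
import math

def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 near the origin, `decay` at distance scale + offset."""
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-distance ** 2 / (2.0 * sigma_sq))

def function_score(query_score, boost_factor, decay_value):
    """Combine the pieces by multiplication (assumed combination mode)."""
    return query_score * boost_factor * decay_value

# Hypothetical example: documents priced near 10 keep most of their score.
for price in (10, 12, 15, 25):
    d = gauss_decay(price, origin=10, scale=5)
    print(price, round(function_score(2.0, 1.5, d), 4))
```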

AGGREGATIONS

  • facet
  • scope
    • query scope
    • filtered_query, post_filter
  • categories of aggregation
    • Buckets
      • filter
      • sub-aggregations
      • missing
      • terms
      • range, *_range
      • histogram, date_histogram
    • Metrics
      • extended_stats
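
The bucket and metric categories can be mimicked in memory to show what each one computes. This is a toy over a list of dicts, not the aggregation framework itself; the field names and helper names are hypothetical, and the variance here is the population variance.

```python
import math
from collections import defaultdict

def terms_buckets(docs, field):
    """Bucket aggregation: one bucket per distinct value of `field`."""
    buckets = defaultdict(list)
    for doc in docs:
        if field in doc:
            buckets[doc[field]].append(doc)
    return dict(buckets)

def histogram_buckets(docs, field, interval):
    """Bucket aggregation: fixed-width numeric buckets keyed by lower bound."""
    buckets = defaultdict(list)
    for doc in docs:
        if field in doc:
            key = math.floor(doc[field] / interval) * interval
            buckets[key].append(doc)
    return dict(buckets)

def extended_stats(values):
    """Metric aggregation: count, min, max, avg, sum, variance, std deviation."""
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    variance = sum_of_squares / count - avg * avg
    return {
        "count": count, "min": min(values), "max": max(values),
        "avg": avg, "sum": total, "sum_of_squares": sum_of_squares,
        "variance": variance, "std_deviation": math.sqrt(variance),
    }

docs = [
    {"tag": "a", "price": 10},
    {"tag": "a", "price": 20},
    {"tag": "b", "price": 40},
]
# Sub-aggregation: a metric computed inside each terms bucket.
for tag, bucket in terms_buckets(docs, "tag").items():
    print(tag, extended_stats([d["price"] for d in bucket]))
```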

DOCUMENT RELATIONS

inner objects, nested, parent/child

GEO LOCATION

geo point
geo shape: geohashes, quad tree
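
A geohash encodes a point by repeatedly halving the longitude and latitude ranges and packing the resulting bits into base-32 characters. A minimal sketch of the standard encoding follows; the function name is hypothetical, and it says nothing about how Elasticsearch stores geo shapes internally.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash of `precision` chars."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1   # upper half: emit a 1 bit
            rng[0] = mid
        else:
            value = value << 1         # lower half: emit a 0 bit
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:                  # five bits make one base-32 char
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744, precision=4))
```

Because each extra character only refines the previous cell, nearby points tend to share a prefix, which is what makes geohashes usable as index terms.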

PERCOLATOR

  • registering a query
  • _percolate API, _mpercolate API
  • routing, filtering, scoring and sorting, highlighting, aggregation, storing

DISTRIBUTED MODEL

  • finding nodes, elect master node
    • cluster meta API
  • cluster state
    • cluster state API
    • shard allocation: create index, add nodes, remove node, filtering, awareness
  • node types: data node, master node, client node, tribe node
  • routing
  • replication
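
Routing picks the primary shard for a document as a hash of the routing value (the document _id by default) modulo the number of primary shards. The sketch below uses `zlib.crc32` purely as a stand-in hash, which is an assumption and not the hash Elasticsearch uses; the function name is hypothetical.

```python
import zlib

def shard_for(routing, num_primary_shards):
    """Pick a shard: hash(routing) modulo the number of primary shards."""
    # crc32 is a stand-in hash chosen for illustration only.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

ids = [str(i) for i in range(100)]

# The same routing value always lands on the same shard.
print({doc_id: shard_for(doc_id, 5) for doc_id in ids[:5]})

# Changing the shard count changes where existing documents would be looked up.
moved = sum(1 for doc_id in ids if shard_for(doc_id, 5) != shard_for(doc_id, 6))
print(f"{moved} of {len(ids)} ids map to a different shard with 6 primaries")
```

This is also why the number of primary shards is fixed at index creation: with a different modulus, lookups for already-indexed documents would go to the wrong shard.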

INDEX MANAGEMENT

create index, update index settings, deleting index, open/close index
index template
snapshot/restore - backup mechanism for indices
index aliases

DATA MANAGEMENT

  • overallocation
  • replica
  • multiple indices
  • capacity planning
  • user data flow
  • time data flow

MOVING TO PRODUCTION
