A Comprehensive Guide to OpenSearch and Elasticsearch Architecture

Elasticsearch Architecture

Lucene

Elasticsearch Functionality

Distributed Framework

Scalability

High Availability

Database-Like Capabilities

Faceting/Aggregations

User Interface

Security and Access Control

Elasticsearch Cluster

Master Node

Master-Eligible Nodes

Data Nodes

Client Nodes

Ingest Nodes

Data Organization

Elasticsearch Index

Index

Document

Fields

Internal Data Structures

Shards

Primary and Replica Shards

Segments

Segment Merging

Translog

Document Indexing

Data Analysis

Character Filters

Tokenizers

Token Filters

Normalizer

Field Data Types

Inverted Index

Term Dictionary

  • Term: Term produced by the analyzer/normalizer
  • Document Count: Count of documents that contain the term. Required for scoring.
  • Frequencies: Number of times the term appears in each document. Required for scoring.
  • Positions: Positions at which the term appears in the field. Required to support phrase and proximity queries.
  • Offsets: Start and end character offsets of the term in the original text. Used to make search highlighting faster.
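
To make the entries above concrete, here is a minimal Python sketch of a term dictionary that records document count, frequencies, positions, and offsets. It is an illustration only, not how Lucene stores its data on disk: the function name `build_term_dictionary`, the sample documents, and the toy analyzer (lowercase, split on word characters) are all assumptions made for this example.

```python
import re
from collections import defaultdict


def build_term_dictionary(docs):
    """Build a toy term dictionary from {doc_id: text}.

    Hypothetical helper for illustration; real Lucene structures differ.
    """
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        # Toy analyzer (an assumption): one token per run of word characters,
        # lowercased. The token's ordinal is its position in the field.
        for position, match in enumerate(re.finditer(r"\w+", text)):
            term = match.group().lower()
            entry = postings[term].setdefault(
                doc_id, {"freq": 0, "positions": [], "offsets": []}
            )
            entry["freq"] += 1                                       # frequency
            entry["positions"].append(position)                      # position
            entry["offsets"].append((match.start(), match.end()))    # offsets
    # Document count is the number of documents that contain the term.
    return {
        term: {"doc_count": len(by_doc), "postings": by_doc}
        for term, by_doc in postings.items()
    }


# Made-up sample documents.
docs = {1: "The quick brown fox", 2: "The lazy dog and the fox"}
term_dictionary = build_term_dictionary(docs)
print(term_dictionary["fox"])
```

Positions count tokens, while offsets count characters, which is why phrase queries rely on the former and highlighting on the latter.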

Term Index

Document Searching

Query Phase

Fetch Phase

Document Scoring

Aggregations

Metrics Aggregations

Numeric Metric Aggregations

GET book_store_orders/_search
{
  "size": 0,
  "aggs": {
    "total_orders": {
      "sum": { "field": "order_price" }
    }
  }
}

Non-Numeric Metric Aggregation

GET book_store_orders/_search
{
  "size": 0,
  "aggs": {
    "category": {
      "terms": { "field": "category.keyword", "size": 5 },
      "aggs": {
        "top_orders_by_price": {
          "top_hits": {
            "size": 10,
            "sort": [
              { "order_price": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}

Bucket Aggregations

GET book_store_orders/_search
{
  "size": 0,
  "aggs": {
    "sale_by_category": {
      "terms": { "field": "category.keyword", "size": 10 },
      "aggs": {
        "sales_stats": {
          "stats": { "field": "order_price" }
        }
      }
    }
  }
}
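
As a rough mental model of the query above, the following Python sketch groups orders by category (the terms buckets) and computes count, min, max, avg, and sum per bucket (the stats sub-aggregation). It is a simplified in-memory sketch, not what Elasticsearch runs across shards. The function name `stats_by_category` and the sample orders are hypothetical; the field names `category` and `order_price` mirror the query.

```python
from collections import defaultdict


def stats_by_category(orders, size=10):
    """Group orders into per-category buckets and compute stats per bucket."""
    prices_by_category = defaultdict(list)
    for order in orders:
        prices_by_category[order["category"]].append(order["order_price"])

    # Keep the `size` largest buckets by document count, largest first.
    # Ties are broken by key here so the sketch is deterministic.
    top = sorted(
        prices_by_category.items(), key=lambda kv: (-len(kv[1]), kv[0])
    )[:size]

    return {
        category: {
            "count": len(prices),
            "min": min(prices),
            "max": max(prices),
            "avg": sum(prices) / len(prices),
            "sum": sum(prices),
        }
        for category, prices in top
    }


# Made-up sample data.
orders = [
    {"category": "fiction", "order_price": 12.0},
    {"category": "fiction", "order_price": 20.0},
    {"category": "science", "order_price": 35.0},
]
sales = stats_by_category(orders)
print(sales["fiction"])
```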

Pipeline Aggregations

GET book_store_orders/_search
{
  "size": 0,
  "aggs": {
    "sale_by_category": {
      "terms": { "field": "category.keyword" },
      "aggs": {
        "max_order_price": {
          "max": { "field": "order_price" }
        }
      }
    },
    "max_order_price_across_categories": {
      "max_bucket": {
        "buckets_path": "sale_by_category>max_order_price"
      }
    }
  }
}
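
The two stages of the query above can be sketched in plain Python: first compute the maximum order price inside each category bucket, then take the maximum across those bucket results, which is the role `max_bucket` plays as a pipeline aggregation. This is an in-memory illustration under assumptions: the function name `max_price_across_categories` and the sample orders are hypothetical.

```python
def max_price_across_categories(orders):
    """Return (per-bucket maxima, max across buckets with the winning keys)."""
    # Stage 1: the "max" sub-aggregation inside each category bucket.
    max_by_category = {}
    for order in orders:
        category, price = order["category"], order["order_price"]
        if category not in max_by_category or price > max_by_category[category]:
            max_by_category[category] = price

    # Stage 2: the pipeline step works on the bucket outputs, not on documents.
    best = max(max_by_category.values())
    keys = sorted(c for c, value in max_by_category.items() if value == best)
    return max_by_category, {"value": best, "keys": keys}


# Made-up sample data; two categories tie for the highest price.
orders = [
    {"category": "fiction", "order_price": 12.0},
    {"category": "fiction", "order_price": 20.0},
    {"category": "science", "order_price": 35.0},
    {"category": "history", "order_price": 35.0},
]
per_bucket, overall = max_price_across_categories(orders)
print(per_bucket, overall)
```

Returning the winning bucket keys alongside the value makes ties visible, since more than one bucket can share the maximum.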

Conclusion

Managed platform for open source technologies including Apache Cassandra, Apache Kafka, Apache ZooKeeper, Redis, Elasticsearch and PostgreSQL

Instaclustr