Elasticsearch
February 5, 2025 ยท View on GitHub
NOT PORTED YET
Distributed ring clustered search service.
- Docs
- Sample Sizing
- Docker Quickstart
- Heap
- Linux Limits
- Monitoring
- Performance Tuning Tips
- Diagram
- Elasticsearch on Kubernetes
- OpenSearch
- Bulk Indexing from Hadoop MapReduce (Pig) - Performance
- Client Libaries
Docs
https://logz.io/learn/complete-guide-elk-stack/
https://aws.amazon.com/opensearch-service/resources/
Sample Sizing
Cluster from a bank: 470TB 88 nodes 30k events/sec
Docker Quickstart
docker run --rm --network host -p 9200:9200 elasticsearch
X-Pack 30 days trial with authentication:
docker run --rm --network host -p 9200:9200 -e ELASTIC_PASSWORD=password docker.elastic.co/elasticsearch/elasticsearch-platinum:6.2.2
Heap
Used 30% of 6 nodes at 10g, increased to 75% with 30g, odd failed task retried, some dups.
/etc/sysconfig/elasticsearch:
ES_HEAP_SIZE=30g # was doing Full GCs at 10g
Get rid of stupid Marvel superhero names for easier debugging, export HOSTNAME for use in elasticsearch.yml:
export HOSTNAME=$(hostname -s)
Linux Limits
To allow mlockall: true to work in elasticsearch config:
/etc/security/limits.d/elasticsearch:
* - memlock unlimited
* - nproc 5000
* - nofile 32000
Monitoring
-
Marvel
Performance Tuning Tips
Single tenant dedicated servers more appropriate for Elasticsearch workloads.
- 600-750 shards per node max per 30GB instance, best <= 500
- shard max = 50GB (bigger makes replication slower)
- too much data = slow recovery shard migration (2-2.5h for 5TB nodes!!)
Segregate roles for best performance:
- master node - manages shard routing table + distributes to nodes - send cluster health checks here
- data node - stores data + fullfils queries - send writes here
- client node - non-master non-data handles merge-sort (CPU + RAM intensive) aggregation from distributed shard queries to data nodes - point LBs + send queries to client nodes
Diagram
Elasticsearch Queries
From the HariSekhon/Diagrams-as-Code repo:
Elasticsearch on Kubernetes
In tests Elasticsearch performance on Kubernetes was significantly lower than on bare metal servers.
This is to be expected, due to both virtualization and lower spec disk storage.
If trying to use over-spec'd servers while avoiding large heap JVM issues by running multiple Elasticsearch instances on same host, set this to check not all shard replicas go on same host:
cluster.routing.allocation.same_shard.host = true
- query:
- node (becomes query gateway node)
node2returns top 10node3returns top 10
- node (becomes query gateway node)
Gateway node does merge-sort (CPU + RAM intensive) - this should mean that theoretically it doesn't scale quite linearly because of gateway node's merge-sort = more nodes = more overhead
OpenSearch
Open source fork of Elasticsearch and Kibana caused by their license change in early 2021.
OpenSearch Index State Management (ISM)
ISM policies automate OpenSearch routine maintenance like marking index read only or deleting it after retention period, without needing to use add-on tools like Curator.
OpenSearch Snapshot Management
Automates taking snapshots including index data and Elasticsearch persistent settings to an S3 bucket eg. daily snapshot with 30 days of snapshots retention, see this doc.
Bulk Indexing from Hadoop MapReduce (Pig) - Performance
14.5B docs, 18TB data across 6 high spec servers
- data spread was not even across nodes
- Linux
loadcommand shows uneven load across nodes during bulk insertions from Pig
Dynamic settings per index:
index.refresh_interval: -1
index.number_of_replicas: 0
10GigE - increase recovery speed - PUT /_cluster/setting into persistent
indices.recovery.max_bytes_per_sec = '400mb'
Files of shard are spread across disks like Raid 0 - this will change to one data path per shard in Elasticsearch 2.0
path.data- 3 disks or 1 disk similar very high CPU 800-2000%, went back to using 12 disks to at least spread the I/O and stop raising dozens % I/O on the fewer disks
Client Libaries
The Java Elasticsearch client that is routing table aware to skip one hop, equivalent to a client-side Load Balancer. Use this instead of a Load Balancer as it's more efficient 1 less level of network indirection.
Perl CPAN Search::Elasticsearch
Partial port from private Knowledge Base page 2013+