Improving query performance in ElasticSearch at scale

blogs.halodoc.io

Saved by Andrés

RelatedHighlightsCollectionsImages

The number of partitions is fixed at the time of the index creation. Each partition is an indivisible chunk of data which can grow in size but not be split into smaller chunks. As the data grows, more nodes may be added to the cluster, but the number of partitions which eventually move to the newer nodes are limited (Fig 3)

In some sense, as the da

Improving query performance in ElasticSearch at scale

The "Painless" data migration

Another challenge was to incorporate the above changes for the existing data i.e. increasing the number of shards and defining a custom routing key for each of the document. The existing index had close to 33 MN documents at the time of the change

Given that the routing key and the number of shards needed to changes, a

Improving query performance in ElasticSearch at scale

The choice of the routing key in our case was a combination of the unique identifier (UUID) and type of the primary participant of each event in a calendar. This choice was made based on the querying patterns of the consuming application and to maintain a degree of uniqueness.

Improving query performance in ElasticSearch at scale

Pros

The query speeds improve as, instead of instructing Elasticsearch to Get all the data matching... , you say Go there and get all the data matching... ( Fig 6 ). If you are querying data with multiple unique routing keys, this guarantees better or equivalent query performance as that of the default routing strategy (reducing the scatter and gat

Improving query performance in ElasticSearch at scale

In Elasticsearch, these partitions are known as shards . Each node of the Elasticsearch cluster can serve queries independently for the partition it manages, thereby increasing query throughput.