The number of partitions is fixed at the time of the index creation. Each partition is an indivisible chunk of data which can grow in size but not be split into smaller chunks. As the data grows, more nodes may be added to the cluster, but the number of partitions which eventually move to the newer nodes are limited (Fig 3)
Another challenge was to incorporate the above changes for the existing data i.e. increasing the number of shards and defining a custom routing key for each of the document. The existing index had close to 33 MN documents at the time of the change
Given that the routing key and the number of shards needed to changes, a
The choice of the routing key in our case was a combination of the unique identifier (UUID) and type of the primary participant of each event in a calendar. This choice was made based on the querying patterns of the consuming application and to maintain a degree of uniqueness.
The query speeds improve as, instead of instructing Elasticsearch to Get all the data matching... , you say Go there and get all the data matching... ( Fig 6 ). If you are querying data with multiple unique routing keys, this guarantees better or equivalent query performance as that of the default routing strategy (reducing the scatter and gat
In Elasticsearch, these partitions are known as shards . Each node of the Elasticsearch cluster can serve queries independently for the partition it manages, thereby increasing query throughput.