Improve ElasticSearch CCS Performance With Minimize Roundtrips

Starting with the ElasticSearch 5.3 release, it became possible to federate queries between clusters using the cross cluster search (CCS) capability instead of using Tribe nodes.

The original implementation of CCS would essentially treat remote shards as if they were part of the local cluster. This would cause multiple requests to the remote cluster that was approximately related to the number of remote shards included in the search. If the network latency between clusters is negligible, then this is not noticeable. Unfortunately, if the network latency between clusters is significant then the effect of the multiple round-trip requests can cause queries to become drastically slow.

Since it is a natural to use CCS to connect distant clusters together across a wide-area-network (WAN) this became a significant bottle neck in CCS query performance. With the ElasticSearch 7.0 release the default behavior has been changed to minimize network round-trips which should drastically improve the query.

We are going to test the improved CCS by federating queries between clusters in AWS regions that are very far apart, such as Ohio and Sydney where the round-trip ping time is around 190ms. Our test configuration uses two t2.medium nodes that are running the standard ElasticSearch 7.2.0 Docker images. NOTE: RUNNING ELASTICSEARCH IN THIS MANNER PROVIDES NO SECURITY AND YOUR CLUSTER IS OPEN TO EVERYBODY.

We will make a search request from our host computer to the Ohio cluster and have it send the query to the Sydney cluster using CCS.

For testing purposes, each cluster has one index called 'test_data' that holds 500,000 records that look like this.

Our test query simulates using the Kibana map to display grid aggregation results, which looks approximately like this:

Because we want to test what happens when a large number of shards are invoked, the indexes are set to use 200 shards, which is definitely overkill for the amount of data used in this test.

Measurements are made by executing the query via `curl` and using the "took" time that was reported in the query and the total execution time as reported by the `time` command. The query is executed once to warm up the caches before executing the query three more times to take measurements.

When the query is executed directly against the Ohio cluster, the reported "took" time ranges between 40-70ms and the total time to execute the curl command is around 130-170ms.

When the query is executed directly against the Sydney cluster, the reported "took" time ranges between 40-70ms and the total time to execute the curl command is around 520-550ms. We can see the impact of the network latency because the query itself (measured by "took") is about the same speed regardless of what cluster it runs on.

When the query is executed against the Ohio cluster, but using CCS to query the Sydney cluster the reported "took" time ranges between 240-270ms and the total time to execute the curl command is around 330-360ms. When using CCS the "took" time has increased because it now includes network latency between clusters to make the query. Surprisingly, the overall execution of curl is lower. While this seems counter intuitive, the reason is that CCS is keeping an established TCP connection between the clusters but when we make a direct `curl` request to the Sydney cluster there is an entire TCP setup that needs to occur over the high latency link.

However, when the exact same query is executed with `ccs_minimize_roundtrips` set to false (i.e. the original CCS implementation) the query is very slow. The "took" time is now in the range of 8800-8900ms and the overall execution is around 8900-9000ms!

These results demonstrate the benefit of upgrading to ElasticSearch 7.0 or higher when using CCS over a high-latency network.


Recent Posts

See All