Scaling CoreDNS in Kubernetes Clusters

A guide to tuning CoreDNS resource requirements in Kubernetes clusters

I’m sharing the results of some tests I ran with CoreDNS (1.2.5) in Kubernetes (1.12) to provide some reference points for tuning CoreDNS to your cluster. In addition to testing CoreDNS in its default configuration, I tested CoreDNS with the optional autopath plugin enabled. The autopath plugin is an optimization that transparently mitigates the DNS performance penalty Pods incur due to Kubernetes’ infamous ndots:5 issue. These tests quantify the memory/performance trade-off of enabling autopath.

The guidelines and formulas in this post are based on a set of tests of clusters in GCE; your mileage may vary. This blog post is an excerpt of the complete results; you can see more detail here.

Memory and Pods

In large-scale Kubernetes clusters, CoreDNS’s memory usage is predominantly affected by the number of Pods and Services in the cluster.

Figure: CoreDNS in Kubernetes Memory Use

With default CoreDNS settings

To estimate the amount of memory required for a CoreDNS instance (using default settings), you can use the following formula:

MB required (default settings) = (Number of Pods + Services) / 1000 + 54
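
The formula translates directly into a small helper. This is a minimal Python sketch; the function name and the example cluster size are illustrative and not part of the test results.

```python
def coredns_memory_mb_default(pods: int, services: int) -> float:
    """Estimate memory (MB) for one CoreDNS instance with default settings.

    Implements: (Number of Pods + Services) / 1000 + 54
    """
    return (pods + services) / 1000 + 54


# Hypothetical cluster: 5,000 Pods and 1,000 Services.
print(coredns_memory_mb_default(5000, 1000))  # 60.0
```

The constant term dominates for small clusters: even with no Pods or Services at all, the estimate is still 54 MB.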

With the autopath plugin

The autopath plugin is an optional optimization that improves performance for queries of names external to the cluster (e.g. infoblox.com). Enabling the autopath plugin requires CoreDNS to use significantly more memory to store information about Pods.
Enabling the autopath plugin also puts additional load on the Kubernetes API, since it must monitor all changes to Pods.

To estimate the amount of memory required for a CoreDNS instance (using the autopath plugin), you can use the following formula:

MB required (w/ autopath) = (Number of Pods + Services) / 250 + 56
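
To get a feel for the extra memory autopath costs, the sketch below evaluates both formulas from this post side by side. The function names and the cluster sizes are made up for illustration; only the two formulas come from the tests.

```python
def memory_mb_default(objects: int) -> float:
    """(Number of Pods + Services) / 1000 + 54"""
    return objects / 1000 + 54


def memory_mb_autopath(objects: int) -> float:
    """(Number of Pods + Services) / 250 + 56"""
    return objects / 250 + 56


# `objects` is the combined count of Pods and Services (hypothetical sizes).
for objects in (1_000, 10_000, 100_000):
    default_mb = memory_mb_default(objects)
    autopath_mb = memory_mb_autopath(objects)
    print(f"{objects:>7} objects: default {default_mb:.0f} MB, "
          f"autopath {autopath_mb:.0f} MB, extra {autopath_mb - default_mb:.0f} MB")
```

Because the autopath formula grows four times faster per object (1/250 versus 1/1000), the gap is small in small clusters and becomes the dominant cost in large ones.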

CPU and QPS

Max QPS was measured with the kubernetes/perf-tests/dns tool on a cluster using CoreDNS. Two types of queries were used: internal queries (e.g. kubernetes) and external queries (e.g. infoblox.com).

With default CoreDNS settings

Single instance of CoreDNS (default settings) on a GCE n1-standard-2 node:

| Query Type | QPS | Avg Latency (ms) |
| external | 6733 [1] | 12.02 [1] |
| internal | 33669 | 2.608 |

[1] From the server’s perspective, CoreDNS is processing 33667 QPS with 2.404 ms latency, but from the client’s perspective, each single name lookup actually comprises 5 serial lookups.
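
The relationship between the server-side and client-side figures is simple arithmetic. The sketch below reproduces it in Python from the numbers in the footnote; the variable names are mine.

```python
SERVER_QPS = 33667          # queries per second handled by CoreDNS
SERVER_LATENCY_MS = 2.404   # latency of one query, as seen by the server
LOOKUPS_PER_NAME = 5        # serial lookups the client makes per external name

# Each name costs the client 5 queries, one after another, so the
# throughput in names is divided by 5 and the latency is multiplied by 5.
client_qps = SERVER_QPS / LOOKUPS_PER_NAME
client_latency_ms = SERVER_LATENCY_MS * LOOKUPS_PER_NAME

print(f"{client_qps:.0f} names/s, {client_latency_ms:.2f} ms per name")
# prints "6733 names/s, 12.02 ms per name"
```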

With the autopath plugin

The autopath plugin in CoreDNS is an option that mitigates the ClusterFirst search list penalty. When enabled, it reduces the number of DNS queries a client makes when looking up an external name.
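
To see where the penalty comes from, here is a rough Python sketch of how a stub resolver expands a name using a search list and ndots. The search list is an assumption for illustration (three cluster domains plus a placeholder domain inherited from the node); the real list depends on the Pod's namespace and the node's own DNS configuration.

```python
def candidate_names(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Return the names a stub resolver would try, in order, for `name`."""
    if name.endswith("."):
        return [name]  # already fully qualified: the search list is skipped
    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + searched  # enough dots: try the name as-is first
    return searched + [absolute]      # too few dots: walk the search list first


# Hypothetical search list for a Pod in the "default" namespace.
SEARCH = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "example.internal",  # placeholder for a node-provided search domain
]

for candidate in candidate_names("infoblox.com", SEARCH):
    print(candidate)
```

With this search list, infoblox.com has only one dot, so the client tries four names that do not exist before reaching infoblox.com. itself. With autopath enabled, CoreDNS performs that walk on the server side and answers the client's first query directly.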

Single instance of CoreDNS (with the autopath plugin enabled) on a GCE n1-standard-2 node:

| Query Type | QPS | Avg Latency (ms) |
| external | 31428 | 2.605 |
| internal | 33918 | 2.62 |

Note that external queries fare much better here than with the default settings: QPS is higher and average latency is lower from the client’s perspective. This is due to the autopath plugin optimization.

The server-side latency for external queries goes up slightly when autopath is enabled (from 2.404 ms to 2.605 ms, about +8%).
This is because CoreDNS is doing the extra work of checking each search domain on the server side.
But since it can answer in one round trip instead of five, the overall performance from the client’s perspective is much improved.

More…

For more information about the test environments and how the data was collected, see the full results here.

Chris O'Haver