This document provides guidance for capacity planning and how to choose the right configuration–standalone, high availability, or tiered–for the Chef Infra Server. This document provides guidance and not hard/fast rules. This is because some requests to the Chef Infra Server API are more computationally expensive than others. In general, it’s better to start small and then scale the Chef Infra Server as needed. Premature optimization can hinder more than help because it may introduce unnecessary complexity.
The Chef Infra Server itself is highly scalable. A single virtual machine running the Chef Infra Server can handle requests for many thousands of nodes. As the scale increases, it’s a straightforward process to expand into a tiered front-end, back-end architecture with horizontally scaled front-ends to relieve pressure on system bottlenecks.
That said, it’s best to isolate failure domains with their own Chef Infra Server, rather than trying to run every node in an infrastructure from a single central, monolithic Chef Infra Server instance/cluster.
For instance, if there are West coast and East coast data centers, it is best to have one Chef Infra Server instance in each datacenter. Deploys to each Chef Infra Server can be synchronized upstream by CI software. The primary limiting bottleneck for Chef Infra Server installations is almost always input/output operations per second (IOPS) performance for the database filesystem.
The key unit of measure for scaling the Chef Infra Server is the number of Chef Infra Client runs per minute: CCRs/min. For example, 500 nodes set to check in every 30 minutes is equivalent to 16.66 CCRs/min.
Typically, the Chef Infra Server does not require a high availability or tiered topology until the number of CCRs/min is higher than 333/min (approximately 10k nodes).
While synthetic benchmarks should be taken with a grain of salt, as they don’t typically represent real-world performance, internal synthetic benchmarks at Chef have seen a standalone Chef Infra Server installed on a c3.2xlarge Amazon Web Services (AWS) instance handle more than 1,000 CCRs/min (30k nodes).
Several factors may influence server scalability. All server sizing recommendations are based on these assumptions:
partial_search are utilized, but not heavilyThe following sections describe the host specifications for various sizes of CCRs/min and help show when to consider moving from a standalone topology to a high availability or tiered topology.
UP TO 33 CCRs/Min (approx. 1,000 nodes):
m3.large instanceUP TO 167 CCRs/Min (approx. 5,000 nodes):
m3.xlarge instanceUP TO 333 CCRs/Min (Approx. 10,000 nodes):
m3.2xlarge instanceUP TO 667 CCRs/Min (Approx. 20,000 nodes):
m3.2xlarge instanceScaling beyond 20,000 nodes on a single cluster:
© Chef Software, Inc.
Licensed under the Creative Commons Attribution 3.0 Unported License.
The Chef™ Mark and Chef Logo are either registered trademarks/service marks or trademarks/servicemarks of Chef, in the United States and other countries and are used with Chef Inc's permission.
We are not affiliated with, endorsed or sponsored by Chef Inc.
https://docs.chef.io/server/capacity_planning/