Kubernetes v1.10 [stable]
kubeadm init and kubeadm join together provide a nice user experience for creating a best-practice but bare Kubernetes cluster from scratch. However, it might not be obvious how kubeadm does that.
This document provides additional details on what happens under the hood, with the aim of sharing knowledge on Kubernetes cluster best practices.
The cluster that kubeadm init and kubeadm join set up should be usable after running only a few commands:
kubeadm init
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl apply -f <network-of-choice.yaml>
kubeadm join --token <token> <endpoint>:<port>
To reduce complexity and to simplify the development of higher-level tools that build on top of kubeadm, kubeadm uses a limited set of constant values for well-known paths and file names.
The Kubernetes directory /etc/kubernetes is a constant in the application, since it is clearly the given path in a majority of cases and the most intuitive location. Other constant paths and file names are:
- /etc/kubernetes/manifests as the path where the kubelet should look for static Pod manifests. Names of static Pod manifests are:
  - etcd.yaml
  - kube-apiserver.yaml
  - kube-controller-manager.yaml
  - kube-scheduler.yaml
- /etc/kubernetes/ as the path where kubeconfig files with identities for control plane components are stored. Names of kubeconfig files are:
  - kubelet.conf (bootstrap-kubelet.conf during TLS bootstrap)
  - controller-manager.conf
  - scheduler.conf
  - admin.conf for the cluster admin and kubeadm itself
- Names of certificate and key files are:
  - ca.crt, ca.key for the Kubernetes certificate authority
  - apiserver.crt, apiserver.key for the API server certificate
  - apiserver-kubelet-client.crt, apiserver-kubelet-client.key for the client certificate used by the API server to connect to the kubelets securely
  - sa.pub, sa.key for the key used by the controller manager when signing ServiceAccount tokens
  - front-proxy-ca.crt, front-proxy-ca.key for the front proxy certificate authority
  - front-proxy-client.crt, front-proxy-client.key for the front proxy client
The kubeadm init internal workflow consists of a sequence of atomic work tasks to perform, as described in the kubeadm init reference.
The kubeadm init phase command allows users to invoke each task individually, and ultimately offers a reusable and composable API/toolbox that can be used by other Kubernetes bootstrap tools, by any IT automation tool, or by an advanced user for creating custom clusters.
Kubeadm executes a set of preflight checks before starting the init, with the aim of verifying preconditions and avoiding common cluster startup problems. The user can skip specific preflight checks, or all of them, with the --ignore-preflight-errors option.
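As a rough illustration of the shape of such checks, the sketch below tests two kinds of precondition: that a kernel setting file exists and contains 1, and that a set of commands can be found on the command path. The function names are hypothetical, and this is not kubeadm's own code (kubeadm is written in Go); it only mirrors the idea of collecting problems before starting.

```python
import shutil
from pathlib import Path


def file_contains_one(path):
    """Return True if the file exists and its content is exactly '1'."""
    p = Path(path)
    return p.is_file() and p.read_text().strip() == "1"


def missing_commands(commands):
    """Return the commands that cannot be found on the command path."""
    return [c for c in commands if shutil.which(c) is None]


def run_preflight(setting_file, commands):
    """Collect every problem found instead of stopping at the first one."""
    problems = []
    if not file_contains_one(setting_file):
        problems.append(f"{setting_file} does not exist or does not contain 1")
    for cmd in missing_commands(commands):
        problems.append(f"{cmd} not found in the command path")
    return problems
```

For example, run_preflight("/proc/sys/net/bridge/bridge-nf-call-iptables", ["conntrack", "ip", "iptables"]) returns an empty list on a host that satisfies these particular conditions.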
The preflight checks flag conditions such as the following:
- the Kubernetes version to use (specified with the --kubernetes-version flag) is at least one minor version higher than the kubeadm CLI version
- the /etc/kubernetes/manifests folder already exists and is not empty
- the /proc/sys/net/bridge/bridge-nf-call-iptables file does not exist or does not contain 1
- the /proc/sys/net/bridge/bridge-nf-call-ip6tables file does not exist or does not contain 1
- the conntrack, ip, iptables, mount, nsenter commands are not present in the command path
- the ebtables, ethtool, socat, tc, touch, crictl commands are not present in the command path
Please note that preflight checks can be invoked individually with the kubeadm init phase preflight command.
Kubeadm generates certificate and private key pairs for different purposes:
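- The Kubernetes certificate authority, saved into the ca.crt file and the ca.key private key file
- The API server certificate, generated using ca.crt as the CA and saved into the apiserver.crt file with its private key apiserver.key. This certificate should contain the following alternative names:
  - the first address of the service subnet (e.g. 10.96.0.1 if the service subnet is 10.96.0.0/12)
  - Kubernetes DNS names, e.g. kubernetes.default.svc.cluster.local if the --service-dns-domain flag value is cluster.local, plus the default DNS names kubernetes.default.svc, kubernetes.default, kubernetes
  - the --apiserver-advertise-address
- The client certificate used by the API server to connect to the kubelets securely, generated using ca.crt as the CA and saved into the apiserver-kubelet-client.crt file with its private key apiserver-kubelet-client.key. This certificate should be in the system:masters organization
- The key used by the controller manager when signing ServiceAccount tokens, saved into the sa.key file along with its public key sa.pub
- The front proxy certificate authority, saved into the front-proxy-ca.crt file with its key front-proxy-ca.key
- The front proxy client certificate, generated using front-proxy-ca.crt as the CA and saved into the front-proxy-client.crt file with its private key front-proxy-client.key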
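The alternative names of the API server certificate follow mechanically from a few inputs. The sketch below derives them with Python's standard ipaddress module; the function name and the sample advertise address are hypothetical, and the result covers only the names mentioned here, not any additional names a user may supply.

```python
import ipaddress


def apiserver_sans(service_subnet, dns_domain, advertise_address):
    """Return (dns_names, ip_addresses) for the API server certificate."""
    network = ipaddress.ip_network(service_subnet)
    # The Kubernetes Service takes the first address of the service subnet.
    service_ip = str(network.network_address + 1)
    dns_names = [
        "kubernetes",
        "kubernetes.default",
        "kubernetes.default.svc",
        f"kubernetes.default.svc.{dns_domain}",
    ]
    # ip_address() validates the advertise address before it is used.
    ips = [service_ip, str(ipaddress.ip_address(advertise_address))]
    return dns_names, ips
```

With the service subnet 10.96.0.0/12 and the DNS domain cluster.local, the first IP entry is 10.96.0.1 and the last DNS name is kubernetes.default.svc.cluster.local.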
Certificates are stored by default in /etc/kubernetes/pki, but this directory is configurable using the --cert-dir flag.
Please note that:
- An existing CA can be copied to /etc/kubernetes/pki/ca.{crt,key}, and then kubeadm will use those files for signing the rest of the certs. See also using custom certificates
- If the ca.crt file is provided but not the ca.key file, and all other certificates and kubeconfig files are already in place, kubeadm recognizes this condition and activates External CA mode, which also implies that the csrsigner controller in the controller manager won't be started
- When kubeadm is executed in --dry-run mode, certificate files are written to a temporary folder
- Certificate generation can be invoked individually with the kubeadm init phase certs all command
Kubeadm generates kubeconfig files with identities for control plane components:
- A kubeconfig file for the kubelet, kubelet.conf (bootstrap-kubelet.conf during TLS bootstrap). The client certificate for the kubelet should:
  - be in the system:nodes organization, as required by the Node Authorization module
  - have the CN system:node:<hostname-lowercased>
- A kubeconfig file for the controller manager, /etc/kubernetes/controller-manager.conf; inside this file is embedded a client certificate with the controller-manager identity. This client cert should have the CN system:kube-controller-manager, as defined by the default RBAC core components roles
- A kubeconfig file for the scheduler, /etc/kubernetes/scheduler.conf; inside this file is embedded a client certificate with the scheduler identity. This client cert should have the CN system:kube-scheduler, as defined by the default RBAC core components roles
Additionally, a kubeconfig file for kubeadm itself and the admin is generated and saved into the /etc/kubernetes/admin.conf file. The "admin" here is defined as the actual person(s) administering the cluster who want full control (root) over the cluster. The embedded client certificate for admin should be in the system:masters organization, as defined by the default RBAC user-facing role bindings. It should also include a CN. Kubeadm uses the kubernetes-admin CN.
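The identities above boil down to a certificate subject per kubeconfig file. The sketch below records them as plain data; the names kubelet_identity and CLIENT_IDENTITIES are hypothetical, and None marks an organization that this document does not specify.

```python
def kubelet_identity(hostname):
    """Subject of the kubelet client certificate for a given node."""
    return {"CN": f"system:node:{hostname.lower()}", "O": "system:nodes"}


# Subjects of the other embedded client certificates.
# None means the text does not name an organization for that identity.
CLIENT_IDENTITIES = {
    "controller-manager.conf": {"CN": "system:kube-controller-manager", "O": None},
    "scheduler.conf": {"CN": "system:kube-scheduler", "O": None},
    "admin.conf": {"CN": "kubernetes-admin", "O": "system:masters"},
}
```

The hostname is lowercased before it is placed in the CN, so kubelet_identity("Node-1") yields the CN system:node:node-1.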
Please note that:
- The ca.crt certificate is embedded in all the kubeconfig files
- When kubeadm is executed in --dry-run mode, kubeconfig files are written to a temporary folder
- Kubeconfig file generation can be invoked individually with the kubeadm init phase kubeconfig all command
Kubeadm writes static Pod manifest files for control plane components to /etc/kubernetes/manifests. The kubelet watches this directory for Pods to create on startup.
Static Pod manifests share a set of common properties:
- All static Pods are deployed in the kube-system namespace
- All static Pods get the tier:control-plane and component:{component-name} labels
- All static Pods use the system-node-critical priority class
- hostNetwork: true is set on all static Pods to allow control plane startup before a network is configured; as a consequence:
  - the address that the controller-manager and the scheduler use to refer to the API server is 127.0.0.1
  - when a local etcd is used, the etcd-servers address will be set to 127.0.0.1:2379
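The shared properties can be pictured as a common skeleton that each component's manifest fills in. The sketch below builds that skeleton as a Python dictionary; the function name is hypothetical and the containers list is left empty because it is component-specific.

```python
def static_pod_skeleton(component):
    """Fields shared by the control plane static Pod manifests."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": component,
            "namespace": "kube-system",
            "labels": {"tier": "control-plane", "component": component},
        },
        "spec": {
            # Lets the control plane start before a Pod network exists.
            "hostNetwork": True,
            "priorityClassName": "system-node-critical",
            "containers": [],  # component-specific, omitted in this sketch
        },
    }
```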
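Please note that:
- When kubeadm is executed in --dry-run mode, static Pod files are written to a temporary folder
- Static Pod manifest generation for control plane components can be invoked individually with the kubeadm init phase control-plane all command
The static Pod manifest for the API server is affected by the following parameters provided by the users: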
- The apiserver-advertise-address and apiserver-bind-port to bind to; if not provided, those values default to the IP address of the default network interface on the machine and port 6443
- The service-cluster-ip-range to use for services
- If an external etcd server is specified, the etcd-servers address and related TLS settings (etcd-cafile, etcd-certfile, etcd-keyfile); if an external etcd server is not provided, a local etcd will be used (via host network)
- If a cloud provider is specified, the corresponding --cloud-provider is configured, together with the --cloud-config path if such a file exists (this is experimental, alpha, and will be removed in a future version)
Other API server flags that are set unconditionally are:
- --insecure-port=0 to avoid insecure connections to the API server
- --enable-bootstrap-token-auth=true to enable the BootstrapTokenAuthenticator authentication module. See TLS Bootstrapping for more details
- --allow-privileged to true (required e.g. by kube-proxy)
- --enable-admission-plugins to:
  - NamespaceLifecycle, e.g. to avoid deletion of system reserved namespaces
  - LimitRanger and ResourceQuota to enforce limits on namespaces
  - ServiceAccount to enforce service account automation
  - PersistentVolumeLabel, which attaches region or zone labels to PersistentVolumes as defined by the cloud provider (this admission controller is deprecated and will be removed in a future version; from v1.9 onwards kubeadm does not deploy it by default unless gce or aws is explicitly chosen as the cloud provider)
  - DefaultStorageClass to enforce a default storage class on PersistentVolumeClaim objects
  - DefaultTolerationSeconds
  - NodeRestriction to limit what a kubelet can modify (e.g. only pods on this node)
- --kubelet-preferred-address-types to InternalIP,ExternalIP,Hostname; this makes kubectl logs and other API server-kubelet communication work in environments where the hostnames of the nodes aren't resolvable
- --client-ca-file to ca.crt
- --tls-cert-file to apiserver.crt
- --tls-private-key-file to apiserver.key
- --kubelet-client-certificate to apiserver-kubelet-client.crt
- --kubelet-client-key to apiserver-kubelet-client.key
- --service-account-key-file to sa.pub
- --requestheader-client-ca-file to front-proxy-ca.crt
- --proxy-client-cert-file to front-proxy-client.crt
- --proxy-client-key-file to front-proxy-client.key
- --requestheader-username-headers=X-Remote-User
- --requestheader-group-headers=X-Remote-Group
- --requestheader-extra-headers-prefix=X-Remote-Extra-
- --requestheader-allowed-names=front-proxy-client
The static Pod manifest for the controller manager is affected by the following parameters provided by the users:
- If kubeadm is invoked specifying a --pod-network-cidr, the subnet manager feature required for some CNI network plugins is enabled by setting:
  - --allocate-node-cidrs=true
  - the --cluster-cidr and --node-cidr-mask-size flags according to the given CIDR
- If a cloud provider is specified, the corresponding --cloud-provider is specified, together with the --cloud-config path if such a configuration file exists (this is experimental, alpha, and will be removed in a future version)
Other flags that are set unconditionally are:
- --controllers enabling all the default controllers plus the BootstrapSigner and TokenCleaner controllers for TLS bootstrap. See TLS Bootstrapping for more details
- --use-service-account-credentials to true
- --root-ca-file to ca.crt
- --cluster-signing-cert-file to ca.crt if External CA mode is disabled, otherwise to ""
- --cluster-signing-key-file to ca.key if External CA mode is disabled, otherwise to ""
- --service-account-private-key-file to sa.key
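How the controller manager flags depend on the user's input can be sketched as a small function. This is an illustration under assumptions rather than kubeadm's code: the function name is hypothetical, file paths assume the default /etc/kubernetes/pki directory, and --node-cidr-mask-size and --controllers are left out.

```python
import ipaddress


def controller_manager_flags(pod_network_cidr=None, external_ca=False,
                             pki="/etc/kubernetes/pki"):
    """Return a subset of the controller manager flags as a dict."""
    flags = {
        "--use-service-account-credentials": "true",
        "--root-ca-file": f"{pki}/ca.crt",
        "--service-account-private-key-file": f"{pki}/sa.key",
        # In External CA mode there is no ca.key to sign with.
        "--cluster-signing-cert-file": "" if external_ca else f"{pki}/ca.crt",
        "--cluster-signing-key-file": "" if external_ca else f"{pki}/ca.key",
    }
    if pod_network_cidr:
        ipaddress.ip_network(pod_network_cidr)  # raises ValueError if malformed
        flags["--allocate-node-cidrs"] = "true"
        flags["--cluster-cidr"] = pod_network_cidr
    return flags
```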
The static Pod manifest for the scheduler is not affected by parameters provided by the users.
If the user specified an external etcd, this step will be skipped; otherwise kubeadm generates a static Pod manifest file for creating a local etcd instance running in a Pod with the following attributes:
- listen on localhost:2379 and use HostNetwork=true
- make a hostPath mount from the dataDir to the host's filesystem
Please note that:
- The etcd container image is pulled from k8s.gcr.io by default. See using custom images for customizing the image repository
- When kubeadm is executed in --dry-run mode, the etcd static Pod manifest is written to a temporary folder
- Static Pod manifest generation for local etcd can be invoked individually with the kubeadm init phase etcd local command
kubeadm waits (up to 4m0s) until localhost:6443/healthz (kube-apiserver liveness) returns ok. However, in order to detect deadlock conditions, kubeadm fails fast if localhost:10255/healthz (kubelet liveness) or localhost:10255/healthz/syncloop (kubelet readiness) don't return ok within 40s and 60s respectively.
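The wait-with-fail-fast behaviour can be sketched as a polling loop. The version below is a simplification under stated assumptions: the function name is hypothetical, the health checks are caller-supplied callables (for instance wrappers around HTTP GETs of the healthz endpoints), and only a single kubelet deadline is modelled rather than separate liveness and readiness deadlines.

```python
import time


def wait_for_control_plane(api_ok, kubelet_ok, timeout=240.0, kubelet_grace=40.0,
                           interval=0.5, clock=time.monotonic, sleep=time.sleep):
    """Poll until api_ok() is true; fail fast if the kubelet stays unhealthy."""
    start = clock()
    while True:
        elapsed = clock() - start
        if api_ok():
            return True
        # After the grace period, an unhealthy kubelet means waiting is pointless.
        if elapsed >= kubelet_grace and not kubelet_ok():
            raise RuntimeError("kubelet is not healthy")
        if elapsed >= timeout:
            raise TimeoutError("API server did not become healthy in time")
        sleep(interval)
```

Passing clock and sleep as parameters keeps the loop testable without real delays.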
kubeadm relies on the kubelet to pull the control plane images and run them properly as static Pods. After the control plane is up, kubeadm completes the tasks described in the following paragraphs.
kubeadm saves the configuration passed to kubeadm init in a ConfigMap named kubeadm-config under the kube-system namespace.
This ensures that kubeadm actions executed in the future (e.g. kubeadm upgrade) will be able to determine the actual/current cluster state and make new decisions based on that data.
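Please note that the upload of the configuration can be invoked individually with the kubeadm init phase upload-config command.
As soon as the control plane is available, kubeadm executes the following actions:
- labels the node with node-role.kubernetes.io/master=""
- taints the node with node-role.kubernetes.io/master:NoSchedule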
Please note that marking the control plane node can be invoked individually with the kubeadm init phase mark-control-plane command.
Kubeadm uses Authenticating with Bootstrap Tokens for joining new nodes to an existing cluster; for more details, see also the design proposal.
kubeadm init ensures that everything is properly configured for this process, and this includes the following steps as well as setting the API server and controller flags as already described in previous paragraphs. Please note that TLS bootstrapping for nodes can be configured with the kubeadm init phase bootstrap-token command, which executes all the configuration steps described in the following paragraphs; alternatively, each step can be invoked individually.
kubeadm init creates a first bootstrap token, either generated automatically or provided by the user with the --token flag; as documented in the bootstrap token specification, the token should be saved as a Secret with the name bootstrap-token-<token-id> under the kube-system namespace. Please note that:
- The default token created by kubeadm init will be used to validate temporary users during the TLS bootstrap process; those users will be members of the system:bootstrappers:kubeadm:default-node-token group
- The token has a limited validity, 24 hours by default (the interval may be changed with the --token-ttl flag)
- Additional tokens can be created with the kubeadm token command, which also provides other useful functions for token management
Kubeadm ensures that users in the system:bootstrappers:kubeadm:default-node-token group are able to access the certificate signing API.
This is implemented by creating a ClusterRoleBinding named kubeadm:kubelet-bootstrap between the group above and the default RBAC role system:node-bootstrapper.
Kubeadm ensures that the Bootstrap Token will get its CSR automatically approved by the csrapprover controller.
This is implemented by creating a ClusterRoleBinding named kubeadm:node-autoapprove-bootstrap between the system:bootstrappers:kubeadm:default-node-token group and the default role system:certificates.k8s.io:certificatesigningrequests:nodeclient.
The role system:certificates.k8s.io:certificatesigningrequests:nodeclient should be created as well, granting POST permission to /apis/certificates.k8s.io/certificatesigningrequests/nodeclient.
Kubeadm ensures that certificate rotation is enabled for nodes, and that new certificate requests for nodes will get their CSRs automatically approved by the csrapprover controller.
This is implemented by creating a ClusterRoleBinding named kubeadm:node-autoapprove-certificate-rotation between the system:nodes group and the default role system:certificates.k8s.io:certificatesigningrequests:selfnodeclient.
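The three bindings described above all have the same shape: a group bound to a ClusterRole. The sketch below builds them as manifest dictionaries, assuming the rbac.authorization.k8s.io/v1 API; the helper and constant names are hypothetical, and nothing here talks to a cluster.

```python
def cluster_role_binding(name, role, group):
    """Build a ClusterRoleBinding manifest binding a group to a ClusterRole."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role,
        },
        "subjects": [
            {"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": group}
        ],
    }


BOOTSTRAPPERS = "system:bootstrappers:kubeadm:default-node-token"
CSR_ROLE = "system:certificates.k8s.io:certificatesigningrequests"

BINDINGS = [
    cluster_role_binding("kubeadm:kubelet-bootstrap",
                         "system:node-bootstrapper", BOOTSTRAPPERS),
    cluster_role_binding("kubeadm:node-autoapprove-bootstrap",
                         f"{CSR_ROLE}:nodeclient", BOOTSTRAPPERS),
    cluster_role_binding("kubeadm:node-autoapprove-certificate-rotation",
                         f"{CSR_ROLE}:selfnodeclient", "system:nodes"),
]
```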
This phase creates the cluster-info ConfigMap in the kube-public namespace.
Additionally, it creates a Role and a RoleBinding granting access to the ConfigMap for unauthenticated users (i.e. users in the RBAC group system:unauthenticated).
Please note that access to the cluster-info ConfigMap is not rate-limited. This may or may not be a problem if you expose your cluster's API server to the internet; the worst-case scenario here is a DoS attack where an attacker uses all the in-flight requests the kube-apiserver can handle to serve the cluster-info ConfigMap.
Kubeadm installs the internal DNS server and the kube-proxy addon components via the API server. Please note that this step can be invoked individually with the kubeadm init phase addon all command.
A ServiceAccount for kube-proxy is created in the kube-system namespace; then kube-proxy is deployed as a DaemonSet:
- The credentials (ca.crt and token) to the control plane come from the ServiceAccount
- The kube-proxy ServiceAccount is bound to the privileges in the system:node-proxier ClusterRole
For the internal DNS server:
- The CoreDNS Service is named kube-dns. This is done to prevent any interruption in service when the user switches the cluster DNS from kube-dns to CoreDNS using the --config method
- A ServiceAccount for CoreDNS is created in the kube-system namespace
- The coredns ServiceAccount is bound to the privileges in the system:coredns ClusterRole
In Kubernetes version 1.21, support for using kube-dns with kubeadm was removed. You can use CoreDNS with kubeadm even when the related Service is named kube-dns.
Similarly to kubeadm init, the kubeadm join internal workflow also consists of a sequence of atomic work tasks to perform.
This is split into discovery (having the Node trust the Kubernetes Master) and TLS bootstrap (having the Kubernetes Master trust the Node).
See Authenticating with Bootstrap Tokens or the corresponding design proposal.
kubeadm executes a set of preflight checks before starting the join, with the aim of verifying preconditions and avoiding common cluster startup problems.
Please note that:
- kubeadm join preflight checks are basically a subset of the kubeadm init preflight checks
- The user can skip specific preflight checks, or all of them, with the --ignore-preflight-errors option
There are 2 main schemes for discovery. The first is to use a shared token along with the IP address of the API server. The second is to provide a file (that is a subset of the standard kubeconfig file).
If kubeadm join is invoked with --discovery-token, token discovery is used; in this case the node basically retrieves the cluster CA certificates from the cluster-info ConfigMap in the kube-public namespace.
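Because the CA certificate is fetched over a connection that is not yet trusted, it can be pinned by the hash of its public key, which is what the --discovery-token-ca-cert-hash flag carries in the form sha256:<hex>. The sketch below shows only the comparison step: it assumes the DER-encoded Subject Public Key Info bytes have already been extracted from the certificate by some other means, and the function names are hypothetical.

```python
import hashlib


def spki_pin(spki_der):
    """Format the SHA-256 digest of the SPKI bytes as 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(spki_der).hexdigest()


def ca_is_pinned(spki_der, allowed_pins):
    """True if the CA public key matches any of the allowed pins."""
    actual = spki_pin(spki_der)
    # More than one pin may be supplied, e.g. during a CA changeover.
    return actual in {pin.lower() for pin in allowed_pins}
```

Accepting a collection of pins mirrors the fact that the flag may be given more than once.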
In order to prevent "man in the middle" attacks, several steps are taken:
- The CA certificate is first retrieved without authentication (this is possible because kubeadm init granted access to cluster-info to users in system:unauthenticated)
- The public key of the CA is then validated against the provided --discovery-token-ca-cert-hash. This value is available in the output of kubeadm init or can be calculated using standard tools (the hash is calculated over the bytes of the Subject Public Key Info (SPKI) object as in RFC7469). The --discovery-token-ca-cert-hash flag may be repeated multiple times to allow more than one public key
Please note that public key validation can be skipped by passing the --discovery-token-unsafe-skip-ca-verification flag; this weakens the kubeadm security model since others can potentially impersonate the Kubernetes Master.
If kubeadm join is invoked with --discovery-file, file discovery is used; this file can be a local file or downloaded via an HTTPS URL; in the case of HTTPS, the host's installed CA bundle is used to verify the connection.
With file discovery, the cluster CA certificate is provided in the file itself; in fact, the discovery file is a kubeconfig file with only the server and certificate-authority-data attributes set, as described in the kubeadm join reference doc; when the connection with the cluster is established, kubeadm tries to access the cluster-info ConfigMap and, if it is available, uses it.
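A discovery file of this kind is small enough to sketch in full. The function below builds the structure as a Python dictionary, assuming the usual kubeconfig layout in which certificate-authority-data holds the base64-encoded PEM bundle; the function name and the cluster name are hypothetical.

```python
import base64


def discovery_kubeconfig(server, ca_pem, cluster_name="kubernetes"):
    """Build a kubeconfig with only server and certificate-authority-data set."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [
            {
                "name": cluster_name,
                "cluster": {
                    "server": server,
                    # kubeconfig stores the PEM bundle base64-encoded.
                    "certificate-authority-data": base64.b64encode(ca_pem).decode("ascii"),
                },
            }
        ],
    }
```

No user credentials appear in the result, which is why such a file only establishes trust in the cluster and does not authenticate the joining node.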
Once the cluster info is known, the file bootstrap-kubelet.conf is written, thus allowing the kubelet to do TLS Bootstrapping.
The TLS bootstrap mechanism uses the shared token to temporarily authenticate with the Kubernetes API server to submit a certificate signing request (CSR) for a locally created key pair.
The request is then automatically approved and the operation completes by saving the ca.crt file and the kubelet.conf file to be used by the kubelet for joining the cluster, while bootstrap-kubelet.conf is deleted.
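The shared token used here is a bootstrap token of the form <token-id>.<token-secret>, stored on the cluster as a Secret named bootstrap-token-<token-id>. The sketch below generates a token in that format (six and sixteen characters from a-z and 0-9) and derives the Secret name; the function names are hypothetical and this is not how kubeadm itself is implemented.

```python
import re
import secrets
import string

TOKEN_RE = re.compile(r"^([a-z0-9]{6})\.([a-z0-9]{16})$")
ALPHABET = string.ascii_lowercase + string.digits


def generate_token():
    """Generate a random token of the form <token-id>.<token-secret>."""
    token_id = "".join(secrets.choice(ALPHABET) for _ in range(6))
    token_secret = "".join(secrets.choice(ALPHABET) for _ in range(16))
    return f"{token_id}.{token_secret}"


def secret_name(token):
    """Name of the Secret that stores the token in kube-system."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError("token must match [a-z0-9]{6}.[a-z0-9]{16}")
    # Only the public token ID appears in the Secret name.
    return f"bootstrap-token-{match.group(1)}"
```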
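Please note that:
- The temporary authentication is validated against the token saved during the kubeadm init process (or against additional tokens created with kubeadm token)
- The temporary authentication resolves to a user that is a member of the system:bootstrappers:kubeadm:default-node-token group, which was granted access to the CSR API during the kubeadm init process
- The automatic CSR approval is managed by the csrapprover controller, according to the configuration done during the kubeadm init process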
© 2022 The Kubernetes Authors
Documentation Distributed under CC BY 4.0.
https://kubernetes.io/docs/reference/setup-tools/kubeadm/implementation-details/