Kubernetes ships with a default scheduler, kube-scheduler. If the default scheduler does not suit your needs, you can implement your own scheduler. You can also run multiple schedulers alongside the default scheduler and tell Kubernetes which scheduler to use for each of your pods. Let's learn how to run multiple schedulers in Kubernetes with an example.
A detailed description of how to implement a scheduler is outside the scope of this document. Please refer to the kube-scheduler implementation in pkg/scheduler in the Kubernetes source directory for a canonical example.
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use a Kubernetes playground.
To check the version, enter kubectl version.
Package your scheduler binary into a container image. For the purposes of this example, you can use the default scheduler (kube-scheduler) as your second scheduler. Clone the Kubernetes source code from GitHub and build the source.
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
make
Create a container image containing the kube-scheduler binary. Here is the Dockerfile to build the image:
FROM busybox
ADD ./_output/local/bin/linux/amd64/kube-scheduler /usr/local/bin/kube-scheduler
Save the file as Dockerfile, build the image, and push it to a registry. This example pushes the image to Google Container Registry (GCR). For more details, please read the GCR documentation.
docker build -t gcr.io/my-gcp-project/my-kube-scheduler:1.0 .
gcloud docker -- push gcr.io/my-gcp-project/my-kube-scheduler:1.0
Now that you have your scheduler in a container image, create a pod configuration for it and run it in your Kubernetes cluster. Instead of creating a pod directly in the cluster, this example uses a Deployment. A Deployment manages a ReplicaSet, which in turn manages the pods, thereby making the scheduler resilient to failures. Here is the deployment config. Save it as my-scheduler.yaml:
admin/sched/my-scheduler.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-volume-scheduler
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-scheduler-config
  namespace: kube-system
data:
  my-scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: my-scheduler
    leaderElection:
      leaderElect: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: my-scheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  replicas: 1
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
        version: second
    spec:
      serviceAccountName: my-scheduler
      containers:
      - command:
        - /usr/local/bin/kube-scheduler
        - --config=/etc/kubernetes/my-scheduler/my-scheduler-config.yaml
        image: gcr.io/my-gcp-project/my-kube-scheduler:1.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10251
          initialDelaySeconds: 15
        name: kube-second-scheduler
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10251
        resources:
          requests:
            cpu: '0.1'
        securityContext:
          privileged: false
        volumeMounts:
          - name: config-volume
            mountPath: /etc/kubernetes/my-scheduler
      hostNetwork: false
      hostPID: false
      volumes:
        - name: config-volume
          configMap:
            name: my-scheduler-config
In the above manifest, you use a KubeSchedulerConfiguration to customize the behavior of your scheduler implementation. This configuration is passed to the kube-scheduler during initialization with the --config option. The my-scheduler-config ConfigMap stores the configuration file. The Pod of the my-scheduler Deployment mounts the my-scheduler-config ConfigMap as a volume.
In the aforementioned Scheduler Configuration, your scheduler implementation is represented via a KubeSchedulerProfile.
For a scheduler to be responsible for scheduling a specific Pod, the spec.schedulerName field in a PodTemplate or Pod manifest must match the schedulerName field of the KubeSchedulerProfile. All schedulers running in the cluster must have unique names. Also, note that you create a dedicated service account my-scheduler and bind the ClusterRole system:kube-scheduler to it so that it can acquire the same privileges as kube-scheduler. The manifest also binds the ClusterRole system:volume-scheduler to the same service account.
Please see the kube-scheduler documentation for a detailed description of other command line arguments, and the Scheduler Configuration reference for a detailed description of other customizable kube-scheduler configurations.
To run your scheduler, create the Deployment specified in the config above in your Kubernetes cluster:
kubectl create -f my-scheduler.yaml
Verify that the scheduler pod is running:
kubectl get pods --namespace=kube-system
NAME                           READY     STATUS    RESTARTS   AGE
....
my-scheduler-lnf4s-4744f       1/1       Running   0          2m
...
You should see a "Running" my-scheduler pod, in addition to the default kube-scheduler pod in this list.
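If you would rather not scan the whole kube-system listing, the check can be narrowed to the Deployment itself. The following is a minimal sketch, assuming the Deployment name, namespace, and pod labels from my-scheduler.yaml above; the function name check_my_scheduler and the 120s timeout are illustrative choices, not part of the original example.

```shell
# Hypothetical helper: wait for the my-scheduler Deployment to finish rolling
# out, then list only the pods carrying the labels set in my-scheduler.yaml.
check_my_scheduler() {
  # Blocks until the rollout completes or the (arbitrary) timeout expires.
  kubectl rollout status deployment/my-scheduler \
    --namespace=kube-system --timeout=120s || return 1
  # component=scheduler and version=second come from the pod template labels.
  kubectl get pods --namespace=kube-system \
    --selector=component=scheduler,version=second
}
```

Calling check_my_scheduler after kubectl create -f my-scheduler.yaml should list a single pod for the second scheduler once the rollout has succeeded.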
To run multiple schedulers with leader election enabled, you must do the following:
Update the following fields for the KubeSchedulerConfiguration in the my-scheduler-config ConfigMap in your YAML file:
leaderElection.leaderElect to true
leaderElection.resourceNamespace to <lock-object-namespace>
leaderElection.resourceName to <lock-object-name>
Note: The control plane creates the lock objects for you, but the namespace must already exist. You can use the kube-system namespace.
If RBAC is enabled on your cluster, you must update the system:kube-scheduler cluster role. Add your scheduler name to the resourceNames of the rule applied for endpoints and leases resources, as in the following example:
kubectl edit clusterrole system:kube-scheduler
admin/sched/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-scheduler
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - create
  - apiGroups:
      - coordination.k8s.io
    resourceNames:
      - kube-scheduler
      - my-scheduler
    resources:
      - leases
    verbs:
      - get
      - update
  - apiGroups:
      - ""
    resourceNames:
      - kube-scheduler
      - my-scheduler
    resources:
      - endpoints
    verbs:
      - delete
      - get
      - patch
      - update
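The leader-election changes described above can be sketched as follows. This is a sketch under stated assumptions, not a definitive configuration: the lock namespace kube-system and the lock name my-scheduler are illustrative values chosen to match the resourceNames in the ClusterRole example, and the helper name can_use_lease is hypothetical. The first part writes a KubeSchedulerConfiguration whose contents could replace the my-scheduler-config.yaml value in the ConfigMap; the second part asks the API server whether the my-scheduler service account may get and update that lease.

```shell
# Write a leader-election-enabled scheduler configuration to a local file.
# resourceNamespace and resourceName are assumed values for this sketch.
cat > my-scheduler-config.yaml <<'EOF'
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-scheduler
leaderElection:
  leaderElect: true
  resourceNamespace: kube-system
  resourceName: my-scheduler
EOF

# Hypothetical helper: check that RBAC lets the my-scheduler service account
# get and update a lease named my-scheduler (the assumed lock object).
can_use_lease() {
  for verb in get update; do
    kubectl auth can-i "$verb" leases.coordination.k8s.io/my-scheduler \
      --namespace=kube-system \
      --as=system:serviceaccount:kube-system:my-scheduler || return 1
  done
}
```

If can_use_lease fails after you have edited the cluster role, the resourceNames entry most likely does not match the resourceName set in the scheduler configuration.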
Now that your second scheduler is running, create some pods, and direct them to be scheduled by either the default scheduler or the one you deployed. In order to schedule a given pod using a specific scheduler, specify the name of the scheduler in that pod spec. Let's look at three examples.
Pod spec without any scheduler name
admin/sched/pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-annotation
  labels:
    name: multischeduler-example
spec:
  containers:
  - name: pod-with-no-annotation-container
    image: k8s.gcr.io/pause:2.0
When no scheduler name is supplied, the pod is automatically scheduled using the default-scheduler.
Save this file as pod1.yaml and submit it to the Kubernetes cluster.
kubectl create -f pod1.yaml
Pod spec with default-scheduler
admin/sched/pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: annotation-default-scheduler
  labels:
    name: multischeduler-example
spec:
  schedulerName: default-scheduler
  containers:
  - name: pod-with-default-annotation-container
    image: k8s.gcr.io/pause:2.0
A scheduler is specified by supplying the scheduler name as a value to spec.schedulerName. In this case, we supply the name of the default scheduler, which is default-scheduler.
Save this file as pod2.yaml and submit it to the Kubernetes cluster.
kubectl create -f pod2.yaml
Pod spec with my-scheduler
admin/sched/pod3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: annotation-second-scheduler
  labels:
    name: multischeduler-example
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-with-second-annotation-container
    image: k8s.gcr.io/pause:2.0
In this case, we specify that this pod should be scheduled using the scheduler that we deployed, my-scheduler. Note that the value of spec.schedulerName should match the name supplied for the scheduler in the schedulerName field of the corresponding KubeSchedulerProfile.
Save this file as pod3.yaml and submit it to the Kubernetes cluster.
kubectl create -f pod3.yaml
Verify that all three pods are running.
kubectl get pods
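To see at a glance which scheduler each example pod requested and where it landed, the listing can be narrowed to the three pods and extended with the scheduler name. A minimal sketch, assuming the name=multischeduler-example label from the pod manifests above; the function name and the column headings are illustrative.

```shell
# Hypothetical helper: list the three example pods together with the scheduler
# named in their spec, the node they were bound to, and their phase.
show_pod_schedulers() {
  kubectl get pods --selector=name=multischeduler-example \
    --output=custom-columns=NAME:.metadata.name,SCHEDULER:.spec.schedulerName,NODE:.spec.nodeName,STATUS:.status.phase
}
```

The pod created from pod1.yaml is expected to show default-scheduler in the SCHEDULER column even though its manifest omits the field, because the API server fills in the default value.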
To keep these examples easy to work through, we did not verify that the pods were actually scheduled by the desired schedulers. We can verify that by changing the order of the pod and deployment config submissions above. If we submit all the pod configs to a Kubernetes cluster before submitting the scheduler deployment config, the pod annotation-second-scheduler remains in the "Pending" state while the other two pods get scheduled. Once we submit the scheduler deployment config and our new scheduler starts running, the annotation-second-scheduler pod gets scheduled as well.
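When trying the reordered experiment, one way to spot the waiting pod is to filter the example pods by phase. This is a small sketch, assuming the pods live in the current namespace and carry the name=multischeduler-example label; the function name is hypothetical.

```shell
# Hypothetical helper: show only the example pods that are still Pending,
# that is, pods no scheduler has bound to a node yet (or that have not started).
show_pending_examples() {
  kubectl get pods --selector=name=multischeduler-example \
    --field-selector=status.phase=Pending
}
```

Before my-scheduler is deployed, only annotation-second-scheduler should be listed; after the scheduler starts, the list should become empty.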
Alternatively, you can look at the "Scheduled" entries in the event logs to verify that the pods were scheduled by the desired schedulers.
kubectl get events
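The full event list can be long, so it may help to keep only the "Scheduled" entries and print who reported them. The following is a sketch rather than a definitive recipe: depending on the Kubernetes version, the scheduler name may appear in the event's reportingComponent field, its source.component field, or both, so this example prints both instead of assuming one. The function name and column headings are illustrative.

```shell
# Hypothetical helper: list only events with reason "Scheduled" and show which
# component reported each one. Fields that are unset print as <none>.
show_scheduled_events() {
  kubectl get events --field-selector=reason=Scheduled \
    --output=custom-columns=POD:.involvedObject.name,REPORTED_BY:.reportingComponent,SOURCE:.source.component,MESSAGE:.message
}
```

For the annotation-second-scheduler pod, one of the two component columns is expected to read my-scheduler, while the other two pods should show default-scheduler.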
You can also use a custom scheduler configuration or a custom container image for the cluster's main scheduler by modifying its static pod manifest on the relevant control plane nodes.
© 2022 The Kubernetes Authors
Documentation Distributed under CC BY 4.0.
https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/