Good security procedures for Kubernetes operators - Course Monster Blog

What are Kubernetes Operators?

A Kubernetes Operator is a way to package, deploy, and manage a Kubernetes application, that is, an application that is deployed on Kubernetes and managed through the Kubernetes API. Operators automate the life cycle management of applications or services on behalf of a human operator, allowing for automation at every level of the stack, from the platform’s own components to managed services.

Engineering teams can use Operators for faster installation and more frequent, robust upgrades, because Operators provide autonomous administration by exposing settings directly through Kubernetes objects. Beyond these platform management automation benefits, Red Hat OpenShift makes it easy to discover, deploy, and manage the Operators running on a cluster.

Using Operators in Red Hat OpenShift

OpenShift is an enterprise Kubernetes platform for managing hybrid cloud deployments, with full-stack automated operations.

Red Hat OpenShift includes the integrated OperatorHub, a registry of Operators from various software vendors and open source projects. Through OperatorHub you can browse and install a library of Operators that have been verified to work with Red Hat OpenShift and packaged for simpler lifecycle management.

Depending on the objective, OpenShift provides two distinct mechanisms for managing operators:

Platform Operators, which are managed by the Cluster Version Operator (CVO) and installed by default to perform cluster functions.
Add-on Operators, which are managed by the Operator Lifecycle Manager (OLM) and can be made available for use in user applications.

Furthermore, appropriately privileged users can manage Operators by other means, such as YAML manifests or Helm charts.

Why is security important to Operators?

More engineering teams are using Operators to deploy to production environments. While the full potential of Operators has yet to be realized, it is important not to lose sight of the benefits of adopting more secure code and practices as early in the development process as feasible.

Good security practices for Operators

Minimize cluster-scope and namespace-scope permissions

Operators fall into two categories by scope:

Namespace-scoped: the Operator watches and manages resources inside a namespace and requires permissions to do so.
- There are several subtypes:
  - in a single, predefined namespace chosen by the developer
  - in a single namespace specified at installation
  - in multiple namespaces
Cluster-scoped: the Operator watches and manages resources across all namespaces in a cluster and requires cluster-scoped permissions to do so.

In general, following the Principle of Least Privilege (PoLP), you should restrict permissions as far as possible while still allowing your Operator to work. Permissions are granted by creating role bindings and cluster role bindings that connect the Operator’s service account to the required roles and cluster roles. The Operator Lifecycle Manager (OLM) bundle deployment model supports this.
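
As a rough sketch of what such a binding looks like for a namespace-scoped Operator, the manifests below connect a service account to a Role in a single namespace. The names (example-operator, example-operator-ns) and the rule list are hypothetical placeholders; a real Operator would list only the resources it actually reconciles.

```yaml
# Hypothetical names throughout; adjust to the Operator's real needs.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: example-operator
  namespace: example-operator-ns
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-operator
  namespace: example-operator-ns
rules:
  # Only the resources and verbs the Operator needs in this namespace.
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-operator
  namespace: example-operator-ns
subjects:
  - kind: ServiceAccount
    name: example-operator
    namespace: example-operator-ns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: example-operator
```

Because both the Role and the RoleBinding are namespaced, nothing in this sketch grants access outside example-operator-ns.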

An Operator bundle is an OLM-specified format for storing metadata about an Operator, separate from the Operator image itself. The metadata contains everything Kubernetes needs to know to use the Operator: its custom resource definitions (CRDs), required role-based access control (RBAC) roles and bindings, dependency tree, and other information, as described in Deploying Operators with OLM bundles.
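
To make the RBAC part of that metadata concrete, here is an abridged sketch of the ClusterServiceVersion (CSV) manifest that a bundle typically carries. All names and rules are hypothetical, and fields unrelated to permissions (including the Deployment definition) are omitted, so the fragment is illustrative rather than installable.

```yaml
# Abridged ClusterServiceVersion; hypothetical names, many fields omitted.
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v0.1.0
spec:
  customresourcedefinitions:
    owned:
      - name: examples.example.com
        kind: Example
        version: v1alpha1
  install:
    strategy: deployment
    spec:
      # Namespace-scoped rules requested for the Operator's service account.
      permissions:
        - serviceAccountName: example-operator
          rules:
            - apiGroups: ["apps"]
              resources: ["deployments"]
              verbs: ["get", "list", "watch", "create", "update"]
      # Cluster-scoped rules; keep this list as short as possible.
      clusterPermissions:
        - serviceAccountName: example-operator
          rules:
            - apiGroups: [""]
              resources: ["namespaces"]
              verbs: ["get", "list", "watch"]
```

Keeping the requested rules in one reviewable place like this makes it easier to audit exactly what an Operator asks for before it is installed.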

An advantage of OLM is that it manages the permissions required to install and run the Operator. OLM performs the installation with the cluster-admin role and isolates install-time requirements such as the APIService and CustomResourceDefinition resources, which OLM always creates using cluster-admin. This reduces the overall permission surface of the Operator.

Using OLM, cluster administrators can specify a service account for an Operator group, so that all Operators associated with the group are deployed and run with the permissions granted to that service account. The service account of an Operator group should never be given permission to write the APIService and CustomResourceDefinition resources. Every Operator that is a member of the Operator group is then limited to the permissions granted to the specified service account. If an Operator requests permissions outside the scope of the service account, the installation fails with appropriate errors.
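
A minimal sketch of such an Operator group follows. The group name, namespace, and service account name are hypothetical, and the service account is assumed to exist already with only the permissions the administrator wants Operators in the group to have.

```yaml
# Hypothetical names; the referenced service account must already exist
# in the same namespace and carry only the intended permissions.
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: scoped-group
  namespace: example-operator-ns
spec:
  # Operators in this group are installed with this account's permissions.
  serviceAccountName: scoped-installer
  # Namespaces the member Operators are scoped to.
  targetNamespaces:
    - example-operator-ns
```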

Reduce the usage of cluster-scope permissions

The use of a cluster-scoped Operator must be justified. If cluster scope is not essential, run namespace-scoped Operators with the bare minimum of permissions.

Cluster-scoped Operators require access to resources throughout the cluster, including the control plane, via permissions gained through cluster roles and cluster role bindings.

Namespace-scoped Operators require access only to resources in a single namespace, which can be granted through roles and role bindings. The exception is that a namespace-scoped Operator must still create a CustomResourceDefinition (CRD), which is a cluster-scoped resource.

If there are static cluster-scoped resources whose definition does not change based on the Operator’s inputs, you can shift their creation to the Operator Lifecycle Manager (OLM) catalog. For example, because a CRD definition does not change over the Operator’s lifespan, you can move its creation to OLM.

RBAC permissions

Both Kubernetes and OpenShift provide authorization through role-based access control (RBAC). The security context is a crucial part of Kubernetes pod and container specifications. It is distinct from the OpenShift security feature known as security context constraints (SCCs).

Operators also declare their own permissions, usually in a YAML file named role.yaml. Because a Role is granted at the namespace level, any privilege escalation is inherently confined to that namespace. A ClusterRole, on the other hand, deserves closer scrutiny because it applies to the whole cluster.

One way privileges can be escalated is for a non-privileged user (any member of the system:authenticated group) to gain access to the Operator’s service account token. A common mitigation is to deploy the Operator in a separate namespace from its Operands, where non-privileged users cannot read secrets; if the Operator is deployed in a namespace shared with non-privileged users, those users should not be able to read secrets in that namespace. Operators should never be installed in a shared namespace, especially one that non-privileged users can access.

We recommend that code reviews include a search for RBAC rules that can be used to gain additional privileges:

The bind verb on Roles or ClusterRoles: bind allows a principal to bypass the general restriction on creating role bindings and cluster role bindings, which prevents users who can create bindings from escalating their privileges by binding to high-privilege roles such as cluster-admin. This restriction is described in the Kubernetes documentation: Restrictions on role binding creation or update.
The escalate verb on cluster roles: escalate bypasses the Kubernetes RBAC check that stops users who can create roles or cluster roles from granting those objects more permissions than they hold themselves.
Define multiple Roles to limit the scope of each action that the Operator’s containers may perform on the cluster. For example, if your component creates a TLS Secret on startup, a Role that permits create but not list on Secrets is safer than a single all-powerful service account.
Permission to update or modify SCCs is effectively equivalent to cluster-admin.
If you have access to a particular SCC and can also create pods, you can create a new pod that takes advantage of everything the SCC allows.
If you have RBAC permissions that allow modifying RBAC, you can change your own restrictions.
- For example:

resources:
- roles
- rolebindings
verbs:
- patch
- create

This rule allows the Operator to grant unprivileged users access to privileged namespaces by granting them roles on request. (Note: by default, an Operator can grant others only the permissions that it holds itself.)
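
For contrast, the TLS Secret case mentioned in the list above could be covered by a narrowly scoped Role along the following lines. This is a sketch with hypothetical names, assuming the component only ever needs to create the Secret and never to read Secrets back.

```yaml
# Hypothetical Role for a component that creates a TLS Secret on startup.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tls-secret-creator
  namespace: example-operator-ns
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    # create only: no get, list, or watch, so a leaked token
    # cannot be used to read other Secrets in the namespace.
    verbs: ["create"]
```

A reviewer scanning this file can confirm quickly that it contains no bind or escalate verbs and no rules on roles or rolebindings.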

Avoid wildcards

Instead of using the wildcard character (*), as seen in the figure below, it is best to list each verb or resource explicitly. Each item in the list can then be checked more easily to confirm where and why the permission is required, or whether it was added only as a convenience during earlier work.

In the verbs section, for example, instead of using “*” you can write the verbs out in full, such as get, list, and watch. If the Operator knows the name of the resource it needs to modify, its access can be restricted to get and update on that resource, and it seldom needs list.

Explicit lists also protect you in the future, when “*” may come to match additional resources or verbs that do not exist today.
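
The difference can be sketched as follows. The Role and resource names are hypothetical; the commented-out rule shows the wildcard form to avoid, and the rules beneath it show explicit alternatives, including a rule pinned to a single named object.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-operator-config
  namespace: example-operator-ns
rules:
  # Avoid: matches every current and future resource and verb.
  # - apiGroups: ["*"]
  #   resources: ["*"]
  #   verbs: ["*"]

  # Prefer: explicit lists that can be reviewed item by item.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]

  # A known object addressed by name, so list is not needed.
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["example-operator-config"]
    verbs: ["get", "update"]
```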

Using RBAC to define and apply permissions

The following diagram shows the relationships between cluster roles, roles, cluster role bindings, role bindings, users, groups, and service accounts.
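
In manifest form, those relationships look roughly like the sketch below: a binding connects one role to any mix of users, groups, and service accounts. All names here (alice, example-auditors, example-operator) are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: example-node-reader
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: example-node-reader
subjects:
  # The three kinds of subject a binding can reference.
  - kind: User
    name: alice
    apiGroup: rbac.authorization.k8s.io
  - kind: Group
    name: example-auditors
    apiGroup: rbac.authorization.k8s.io
  - kind: ServiceAccount
    name: example-operator
    namespace: example-operator-ns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: example-node-reader
```

A RoleBinding has the same shape but is namespaced, and it can reference either a Role or a ClusterRole, in which case the permissions apply only within the binding's namespace.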

Descoped Operator

Because Operators run under a service account in a namespace, anyone who can create workloads in that namespace can escalate to the Operator’s permissions. To address this, the OperatorGroup object introduced the concept of scoping Operators. An OperatorGroup specifies a set of namespaces in a cluster within which all installed Operators share the same scope. To avoid collisions, the Operator Lifecycle Manager (OLM) ensures that only one Operator within a namespace owns a given CRD.

The issue is that cluster-scoped APIs exist cluster-wide and are discoverable by any user who wants to see them. Even Operators that agree on a specific Group, Version, Kind (GVK) may disagree on how those objects should be admitted to a cluster or how conversion between API versions should take place. This raises the possibility that the cluster holds more than one “opinion” about an API.

Pod and container security context and Security Context Constraints (SCCs)

When containerizing third-party programs, it may be necessary to bend to their expectations and run as specific UIDs, perhaps even as root. Container-native Operators should never set UID expectations and should instead accept the standard “billion+” high UID that the OpenShift cluster allocates to the namespace the Operator runs in.

Set a numeric USER in the Containerfile to avoid defaulting to, or assuming, uid=0 as the expected user.
To control shared file permissions, use group ID permissions rather than user ID.

Similarly, hostPath volumes allow a container to access files on the host node. If such a container is misconfigured and becomes compromised, the attacker may try to attack the host and the other containers running on it.

For hostPath specifically:

hostPath volumes should never be required by Operators unless they are part of the control plane itself.

Other deployment recommendations:

readOnlyRootFilesystem: set to true
- Do not write local files to the root filesystem. Use /tmp or an emptyDir instead. Watch for PID files and any log output that does not go to STDOUT.
runAsNonRoot: set to true
- This can be specified in the security context of either the pod spec or the container spec. When it is enabled, the container refuses to start if it would otherwise run with uid=0.
automountServiceAccountToken: set to false
- By default, the service account token is mounted as a file within the container. In most cases the Operator itself needs access to a service account to function (so set this to true for the Operator). However, any pods created by the Operator may benefit from the added security of setting it to false.
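
Put together, these recommendations might look like the following pod manifest for an Operand that does not need to talk to the Kubernetes API. The pod name, namespace, and image are hypothetical; mounting an emptyDir at /tmp is one way to give the process a writable scratch directory while the root filesystem stays read-only.

```yaml
# Hypothetical Operand pod created by an Operator.
apiVersion: v1
kind: Pod
metadata:
  name: example-operand
  namespace: example-operand-ns
spec:
  # The Operand does not call the Kubernetes API, so no token is mounted.
  automountServiceAccountToken: false
  securityContext:
    # Refuse to start any container that would run as uid=0.
    # No runAsUser is set, so the UID allocated to the namespace is used.
    runAsNonRoot: true
  containers:
    - name: app
      image: registry.example.com/example/app:1.0.0
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        # Writable scratch space instead of the root filesystem.
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}
```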

Security context constraints (SCC) in OpenShift are gatekeepers that limit which pods may be admitted to the cluster. Because the Operator process operates as a pod in a cluster, you can apply the same ideas to improve the security posture of your Operator container.

The Udica tool was designed to make it easier to develop custom SELinux policies that can then be linked to custom SCCs.

Continuous security scans

Continuous scanning helps detect vulnerabilities and pick up the latest security fixes in Go, Kubernetes, and the Operator container’s base image.

You can use an Operator to list the vulnerabilities of container images running in OpenShift that were pulled from Red Hat Quay registries. The Container Security Operator extends OpenShift with vulnerability reporting for images added to selected namespaces.

The Clair security scanner performs container image scanning for Red Hat Quay. In Red Hat Quay, Clair can search for and report vulnerabilities in images built from RHEL, CentOS, Oracle, Alpine, Debian, and Ubuntu operating system software.

Deployment Location

Where should an Operator run? We recommend that an Operator run in a location appropriate to its function. For example, tolerations can be used to schedule an Operator that is part of the control plane onto control plane nodes.
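
As an illustration, a Deployment for such an Operator might combine a node selector with a matching toleration, as sketched below. The names and image are hypothetical, and the node-role.kubernetes.io/control-plane key is an assumption about how the cluster labels and taints its control plane nodes; some clusters use node-role.kubernetes.io/master instead, so check the actual node labels and taints first.

```yaml
# Hypothetical Deployment for an Operator that belongs to the control plane.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator
  namespace: example-operator-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      name: example-operator
  template:
    metadata:
      labels:
        name: example-operator
    spec:
      serviceAccountName: example-operator
      # Restrict scheduling to control plane nodes (label key is an assumption).
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      # Tolerate the taint that normally keeps workloads off those nodes.
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: operator
          image: registry.example.com/example/operator:0.1.0
```

The toleration alone only permits scheduling on tainted nodes; the nodeSelector is what actually restricts the pod to them.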

Even though Operators and Operands are separated by namespaces, if the Operator uses a highly privileged service account for its Kubernetes API interaction, any breach of the worker node it runs on may leak the service account credentials. Separating workloads by node as well as by namespace is therefore advantageous.
