How to Secure an OpenShift Multi-Cluster Infrastructure with RHACM

The more OpenShift clusters you have, the harder it is to secure them. Without a dedicated platform to manage these clusters, you need to pay special attention to each one of them in order to validate it and make sure that it has not drifted from the organization’s security guidelines and regulations.

Managing OpenShift clusters separately, without any management tool, is error-prone and can easily lead to breaches, from both a management and a security perspective. Let’s take the following scenario to understand why.

Let’s say ‘company A’ decided to deploy multiple OpenShift clusters in different sites to mitigate the latency for its customers. All of the clusters run the same application, and are managed by the organization’s administration team.

What are the risks of this setup?

Cluster versioning: To keep the clusters aligned on a single version, the administrator has to go through all of the clusters and manage their versions independently. This is a long and difficult procedure that may result in version drift between the clusters, leaving the environment unaligned and open to vulnerabilities.
Application versioning: From time to time, a vulnerability comes up in an application’s image. It’s important to patch the application equally on all OpenShift instances in order to avoid breaches caused by the vulnerability. Without a proper tool, it is much harder to control security patches to applications in a large multi-cluster environment.
RBAC: In a production environment, it is crucial to create users and assign them the exact privileges they need to perform their tasks. Over-privileged users can have devastating effects on the environment in the long run. Furthermore, RBAC policies in some clusters may change over time, while the same RBAC rules in other clusters remain untouched. This creates a drift in RBAC policies between the clusters, which is a nightmare for both management and security personnel.
Visibility: In large-scale environments, one of the most important things to consider is visibility. Visibility shows us whether a cluster or application is misbehaving, or whether an anomaly or failure has occurred in a cluster. Managing a large-scale multi-cluster OpenShift environment without a proper visibility tool is the same as putting a blindfold over your administrator’s eyes, which puts your environment at huge risk.
SCC: Just like in the previous bullets, you need to align your SCCs between your clusters in order to control the workloads on the clusters correctly. SCCs (Security Context Constraints) are a major part of OpenShift’s security: they restrict pods to a certain set of capabilities, specific UIDs, SELinux contexts and more. If a certain pod in one cluster drifts from the defined SCC, it could result in a huge breach in the environment.

All of these points focus on one idea: alignment. An unaligned environment is an insecure environment, because drift means that there are holes in certain parts of it. Even if a hole is small, an attacker will be more than happy to use it to gain access to your organization.

RHACM — Red Hat Advanced Cluster Management

So far, we have discussed the security issues that come with a large multi-cluster environment. This section of the article will discuss the solution — RHACM.

As discussed in my previous article (link attached below paragraph), RHACM is a tool that sheds light over multi-cluster OpenShift deployments. It provides advanced management and visibility capabilities over groups of clusters and the applications deployed on top of them.

How to Manage Multiple OpenShift Clusters with RHACM — Hybrid Cloud

Alongside the great management capabilities RHACM provides, it allows the administrators in the organization to maintain and promote security capabilities in the environment using RHACM’s ‘Governance Risk and Compliance’ feature.

Governance Risk and Compliance

In order to regulate a large multi-cluster environment, RHACM brings an important tool to the table: Governance Risk and Compliance (GRC). GRC allows RHACM to monitor certain resources on its managed clusters and, based on the resource status, decide whether each cluster meets the security standards declared by the security team. If a managed cluster violates a security standard, RHACM is informed and an alert appears in RHACM’s dashboard.

Now that we’ve gone through the basics of GRC, let’s understand how RHACM GRC works under the hood:

GRC is defined by sets of policies that define a rule set for K8S resources to follow. A policy is created on the HUB cluster and declares a certain resource that is going to be monitored on a managed cluster. For example, I can create a policy that monitors a ClusterRole K8S resource; GRC will then monitor the defined resource and its contents (in the case of a ClusterRole resource, GRC will verify that the specific rules, apiGroups and verbs are present).
The policy is propagated from the hub to the managed cluster using the PlacementRule and PlacementBinding RHACM CRs (custom resources).
Policy Controllers are deployed on the managed cluster in order to monitor and regulate the propagated policy.
The Policy Controller monitors the resource defined by the GRC Policy. If the policy is violated, the Policy Controller can act in one of two ways: if it is set to inform, a report of the violation is created; if it is set to enforce, the controller tries to remediate the violation by restoring the desired object state.
The status of the Policy is synced to the RHACM hub, which aggregates the status reported by the Policy Controllers on all managed clusters.
RHACM displays the status of the policies in the GRC dashboard. Any violation is colored red, and is shown with the exact description of the resource that violated the Policy alongside the name of the cluster that drifted from the desired state.
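
The difference between inform and enforce can be illustrated with a small toy model. This is only a sketch and not RHACM's controller code: the function names, the cluster names and the example ClusterRole are invented for illustration, and the comparison is a plain equality check, which is simpler than what the real Policy Controller does.

```python
from copy import deepcopy


def reconcile(desired, cluster_objects, remediation_action):
    """Check one object on one managed cluster and return its compliance status."""
    name = desired["metadata"]["name"]
    if cluster_objects.get(name) == desired:
        return "Compliant"
    if remediation_action == "enforce":
        # Enforce: overwrite the drifted (or missing) object with the desired state.
        cluster_objects[name] = deepcopy(desired)
        return "Compliant"
    # Inform: leave the cluster untouched and only report the violation.
    return "NonCompliant"


def violating_clusters(statuses):
    """Hub-side roll-up: names of the clusters that reported a violation."""
    return sorted(name for name, status in statuses.items() if status == "NonCompliant")


# Hypothetical desired object, as it would be declared in a policy on the hub.
desired_role = {
    "kind": "ClusterRole",
    "metadata": {"name": "example-reader"},
    "rules": [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]}],
}

# Two hypothetical managed clusters: one aligned, one that has drifted.
drifted = deepcopy(desired_role)
drifted["rules"][0]["verbs"].append("delete")
clusters = {
    "cluster-a": {"example-reader": deepcopy(desired_role)},
    "cluster-b": {"example-reader": drifted},
}

informed = {name: reconcile(desired_role, objs, "inform") for name, objs in clusters.items()}
print(violating_clusters(informed))  # ['cluster-b']

enforced = {name: reconcile(desired_role, objs, "enforce") for name, objs in clusters.items()}
print(violating_clusters(enforced))  # []
```

In inform mode the drifted cluster is only reported, while in enforce mode the same drift is reverted and nothing is left to report.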

Governance Risk and Compliance in Practice — DEMO

Now that we know how Policies work in RHACM and have gone through the basics, let’s look at how we can integrate GRC Policies in a way that benefits an environment’s security.

In the next demo, we are going to take a preexisting environment and further secure it using the RHACM GRC mechanism. The idea of the demo is to show how we can secure both the platform side of OpenShift and the application that runs on top of the cluster. The environment will contain:

1 HUB Cluster — In the demo, we are going to use a cluster that will act as an RHACM HUB. The HUB will be used to import and manage other OpenShift clusters.

1 Managed Cluster — For this demo I’m going to manage only one cluster. Please note that even though I’ll conduct this demo on one cluster, it’s important to know that this demo could have been performed on a large scale multi-cluster OpenShift environment.

1 Application — In the demo, we are going to use one application that is going to run on top of the managed cluster. The application will have one DeploymentConfig resource, and one Secret resource.

PART 1 — OpenShift Platform Security with GRC

In the first part of the demo, we will focus on the security of the OpenShift platform itself. We are going to create a Role policy and a RoleBinding policy in order to maintain an RBAC strategy in our clusters.

In order to demonstrate RBAC policies, I’m going to create a user, named user1. We will allow user1 to only perform rollouts to the DeploymentConfig resource in our application that runs on the managed cluster demo-openshift-cluster.

Let’s go through the resources I’ll need to create on the HUB cluster in order to make this happen.

Namespace — In order to create the RHACM resources we will need to create a separate namespace for them to be in.

Policy x2 - In order to enforce a role on a user, we will need to create 2 policies: a policy for a Role resource, and another one for a RoleBinding resource.

Note that the Policy is in an enforce state. This means that if the Policy Controller does not find the defined Role resource, it is going to create it by itself. The Role’s rules define basic operations that allow a user to rollout a deploymentconfig, and view pods in the mariadb namespace on the managed cluster.

The RoleBinding Policy binds the dc-rollout-policy (mentioned above) to user1. Note that the Policy Controller is configured to enforce the Policy if the RoleBinding is not in the state described above. The RoleBinding will be effective in the mariadb namespace on the managed cluster.
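
As a rough idea of what the two Policies could look like, here is a sketch that builds them as Python dictionaries and prints them as JSON (which `oc apply -f` accepts as well as YAML). Only the Policy name policy-role-mariadb-rollout, user1 and the mariadb namespace come from this article. The hub namespace, the Role name, the RoleBinding Policy name, the exact rules and the complianceType values are assumptions made for illustration and are not the files used in the demo.

```python
import json

HUB_NAMESPACE = "rhacm-policies"  # hypothetical namespace on the hub
ROLE_NAME = "dc-rollout-role"     # hypothetical Role name


def configuration_policy(name, obj, compliance_type):
    """Wrap a Kubernetes object in a ConfigurationPolicy template."""
    return {
        "objectDefinition": {
            "apiVersion": "policy.open-cluster-management.io/v1",
            "kind": "ConfigurationPolicy",
            "metadata": {"name": name},
            "spec": {
                "remediationAction": "enforce",
                "severity": "high",
                "namespaceSelector": {"include": ["mariadb"]},
                "object-templates": [
                    {"complianceType": compliance_type, "objectDefinition": obj}
                ],
            },
        }
    }


def policy(name, template):
    """Build an enforcing Policy that carries a single template."""
    return {
        "apiVersion": "policy.open-cluster-management.io/v1",
        "kind": "Policy",
        "metadata": {"name": name, "namespace": HUB_NAMESPACE},
        "spec": {
            "remediationAction": "enforce",
            "disabled": False,
            "policy-templates": [template],
        },
    }


# Assumed rules: work with DeploymentConfigs and view pods. Not verified as minimal.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": ROLE_NAME},
    "rules": [
        {"apiGroups": ["apps.openshift.io"], "resources": ["deploymentconfigs"],
         "verbs": ["get", "list", "watch", "patch", "update"]},
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
    ],
}

role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": ROLE_NAME + "-binding"},
    "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": ROLE_NAME},
    "subjects": [{"apiGroup": "rbac.authorization.k8s.io", "kind": "User", "name": "user1"}],
}

# "mustonlyhave" is assumed for the Role so that extra verbs count as drift.
role_policy = policy(
    "policy-role-mariadb-rollout",
    configuration_policy("role-mariadb-rollout", role, "mustonlyhave"),
)
binding_policy = policy(
    "policy-rolebinding-mariadb-rollout",  # hypothetical name
    configuration_policy("rolebinding-mariadb-rollout", role_binding, "musthave"),
)

print(json.dumps([role_policy, binding_policy], indent=2))
```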

PlacementRule — A PlacementRule is created in order to aggregate specific clusters with the same label. It will later be used to specify which cluster is going to be affected by which policy.

A PlacementRule that affects all clusters that are tagged with the “dev” label.
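
To make the selection logic concrete, the sketch below builds a PlacementRule as a Python dictionary and evaluates its label selector against two clusters. The label key `environment`, the hub namespace and the second cluster are assumptions for illustration; the matching function is a simplified stand-in for the real selector logic and supports only `matchLabels` and the `In` operator.

```python
placement_rule = {
    "apiVersion": "apps.open-cluster-management.io/v1",
    "kind": "PlacementRule",
    "metadata": {"name": "dev-clusters", "namespace": "rhacm-policies"},  # namespace is hypothetical
    "spec": {
        "clusterConditions": [{"type": "ManagedClusterConditionAvailable", "status": "True"}],
        "clusterSelector": {
            # Assumed label key; the article only says the clusters carry a "dev" label.
            "matchExpressions": [{"key": "environment", "operator": "In", "values": ["dev"]}]
        },
    },
}


def selector_matches(selector, labels):
    """Simplified label-selector check: matchLabels plus 'In' expressions only."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        if expr["operator"] != "In":
            raise ValueError("unsupported operator in this sketch: " + expr["operator"])
        if labels.get(expr["key"]) not in expr["values"]:
            return False
    return True


# Hypothetical cluster labels, as they might be set on the hub.
cluster_labels = {
    "demo-openshift-cluster": {"environment": "dev"},
    "some-prod-cluster": {"environment": "prod"},
}

selector = placement_rule["spec"]["clusterSelector"]
selected = [name for name, labels in cluster_labels.items() if selector_matches(selector, labels)]
print(selected)  # ['demo-openshift-cluster']
```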

PlacementBinding x2 - A PlacementBinding is created for each Policy in order to assign it to a specific PlacementRule. In practice, it acts as the glue between the Policies and the PlacementRules.

A PlacementBinding resource that binds between the policy-role-mariadb-rollout Policy and the dev-clusters PlacementRule.
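
A binding of this kind could be generated roughly as follows. The Policy and PlacementRule names are the ones used in this article; the binding name and the hub namespace are hypothetical, and the helper function is only an illustration.

```python
import json


def placement_binding(name, namespace, policy_name, placement_rule_name):
    """Build a PlacementBinding that attaches one Policy to one PlacementRule."""
    return {
        "apiVersion": "policy.open-cluster-management.io/v1",
        "kind": "PlacementBinding",
        "metadata": {"name": name, "namespace": namespace},
        "placementRef": {
            "name": placement_rule_name,
            "kind": "PlacementRule",
            "apiGroup": "apps.open-cluster-management.io",
        },
        "subjects": [
            {
                "name": policy_name,
                "kind": "Policy",
                "apiGroup": "policy.open-cluster-management.io",
            }
        ],
    }


binding = placement_binding(
    "binding-role-mariadb-rollout",  # hypothetical binding name
    "rhacm-policies",                # hypothetical hub namespace
    "policy-role-mariadb-rollout",
    "dev-clusters",
)
print(json.dumps(binding, indent=2))
```

A second call with the RoleBinding Policy's name would produce the other binding, so both Policies end up on the same set of clusters.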

All of the resources mentioned above need to be applied in order for GRC to be able to monitor the RBAC Policies.

The files can be found in the following GitLab repository:

Testing!

Let’s perform some tests on the managed cluster:

If I log in to the cluster as user1 before applying the Policies, I see no projects and no resources.

Now, let’s check the RHACM dashboard after applying the resources mentioned above! From the main dashboard navigate to Governance Risk & Compliance —

We can see that there are no violations on the created Policies.

Now that we’ve seen that the policies are present, let’s take a look again at the resources that user1 has access to —

We can see that the user can perform the defined actions, but is restricted to the defined role. For example, if I try to delete a pod, the request is denied.
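
The reason the delete request fails can be sketched with a simplified version of RBAC rule matching. The rules below are assumed for illustration and are not copied from the demo's Role; the check also ignores details such as resourceNames and subresources that real Kubernetes authorization takes into account.

```python
def is_allowed(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    return any(
        (api_group in rule["apiGroups"] or "*" in rule["apiGroups"])
        and (resource in rule["resources"] or "*" in rule["resources"])
        and (verb in rule["verbs"] or "*" in rule["verbs"])
        for rule in rules
    )


# Assumed rules for user1's Role in the mariadb namespace.
rules = [
    {"apiGroups": ["apps.openshift.io"], "resources": ["deploymentconfigs"],
     "verbs": ["get", "list", "watch", "patch", "update"]},
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
]

print(is_allowed(rules, "", "pods", "list"))    # True
print(is_allowed(rules, "", "pods", "delete"))  # False
```

RBAC is purely additive, so anything that no rule grants is denied by default.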

Now that we know that the user is restricted to the defined role, let’s check the Policy Controller’s enforcement mechanism, and try to modify the GRC defined role with a local cluster administrator (local in the managed cluster scope) —

Even though I edited the Role using a local administrator user, the Policy Controller monitors the Role resource, and as soon as it noticed a change in the Role, it modified it back to its previous state.

This behavior makes GRC extra powerful since it even overrides local administrator operations against Policy defined K8S objects.

Let’s see how RHACM reacts if I change the policy-role-mariadb-rollout Policy setting from ‘enforce’ to ‘inform’.

As you can see, remediationAction has changed from ‘enforce’ to ‘inform’.
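
The same edit can be expressed programmatically. The helper below is a hypothetical illustration that works on a Policy held as a Python dictionary; it updates the Policy-level setting and each template-level setting together so the two cannot disagree. The trimmed-down Policy is a placeholder, not the demo's file.

```python
from copy import deepcopy


def with_remediation(policy, action):
    """Return a copy of the Policy with remediationAction set everywhere."""
    if action not in ("inform", "enforce"):
        raise ValueError("remediationAction must be 'inform' or 'enforce'")
    updated = deepcopy(policy)
    updated["spec"]["remediationAction"] = action
    for template in updated["spec"]["policy-templates"]:
        template["objectDefinition"]["spec"]["remediationAction"] = action
    return updated


# Placeholder Policy with only the fields this sketch touches.
enforcing_policy = {
    "kind": "Policy",
    "metadata": {"name": "policy-role-mariadb-rollout"},
    "spec": {
        "remediationAction": "enforce",
        "policy-templates": [
            {"objectDefinition": {"kind": "ConfigurationPolicy",
                                  "spec": {"remediationAction": "enforce"}}}
        ],
    },
}

informing_policy = with_remediation(enforcing_policy, "inform")
print(informing_policy["spec"]["remediationAction"])  # inform
```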

After editing the policy yaml file, I’ll apply the changes in the HUB cluster —

Just like before, let’s try to edit the RHACM-created role using the local administrator user in the managed cluster. I will try to add the delete verb to the pods resource, thereby increasing user1’s privileges.

This time, unlike the previous attempt, I can see that the privilege has been added to the RHACM defined role —

Let’s take a look at the RHACM GRC dashboard:

We can see that a violation has been triggered in the Policy that is responsible for the role modified in the previous step. Security personnel will now be able to check the violation out and take the necessary steps to remediate it.

Different organizations might implement Policies differently, but as can be seen from both examples, both enforce and inform bring added value to an administrator who wants to align RBAC in an environment.

PART 2 — OpenShift Application Security with GRC

So far we have talked about how we can secure the OpenShift platform on the managed clusters. In this section I’ll go through a demo that enables security at the application level. The demo will show how we can make applications that run on clusters managed by RHACM more secure.

The demo will show how we can scan the images of the deployed applications and send the security report back to RHACM for evaluation. If there are any security vulnerabilities in the image, RHACM will trigger a violation and let the security administrator know that an application poses a security risk to the environment.

The security scans that RHACM provides are done by Clair, an image scanning tool that integrates into Red Hat Quay image registry. In our case, when the policy is deployed on the managed cluster, the container-security-operator will be deployed on it. The container-security-operator will integrate with quay.io for security reports about the container image.

A rough diagram of the application scanning process

The resources that I’ll need to create in this case are a Policy, and a PlacementBinding (Since I’ve created the Namespace and PlacementRule resources in the previous part, I won’t be creating them again) —

Policy — This time, the Policy will be monitoring an object with the ImageManifestVuln type. If the object exists, it means that a certain container image contains vulnerabilities in it. As soon as vulnerabilities pop up, violations will be triggered in the RHACM GRC dashboard.

The Policy will be enforced onto the managed cluster, and will monitor all of the namespaces that do not start with the **openshift** keyword.
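
A template for this kind of check, together with the namespace selection, could be sketched as follows. The template name, the `openshift*` exclude pattern and the choice of `inform` for this particular template are assumptions for illustration, and the operator's Subscription template is left out. The matching uses Python's `fnmatchcase`, which is only an approximation of how the controller evaluates its include and exclude patterns.

```python
from fnmatch import fnmatchcase


def selected_namespaces(namespaces, include, exclude):
    """Keep namespaces that match an include pattern and no exclude pattern."""
    return [
        ns for ns in namespaces
        if any(fnmatchcase(ns, pattern) for pattern in include)
        and not any(fnmatchcase(ns, pattern) for pattern in exclude)
    ]


vulnerability_template = {
    "objectDefinition": {
        "apiVersion": "policy.open-cluster-management.io/v1",
        "kind": "ConfigurationPolicy",
        "metadata": {"name": "policy-imagemanifestvuln-example"},  # hypothetical name
        "spec": {
            # Assumed: report vulnerability objects rather than delete them.
            "remediationAction": "inform",
            "severity": "high",
            "namespaceSelector": {"include": ["*"], "exclude": ["openshift*"]},
            "object-templates": [
                {
                    # The object must NOT exist: any ImageManifestVuln is a violation.
                    "complianceType": "mustnothave",
                    "objectDefinition": {
                        "apiVersion": "secscan.quay.redhat.com/v1alpha1",
                        "kind": "ImageManifestVuln",
                    },
                }
            ],
        },
    }
}

selector = vulnerability_template["objectDefinition"]["spec"]["namespaceSelector"]
namespaces = ["default", "mariadb", "openshift", "openshift-monitoring"]  # example names
monitored = selected_namespaces(namespaces, selector["include"], selector["exclude"])
print(monitored)  # ['default', 'mariadb']
```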

PlacementBinding - Just as in the previous part, the PlacementBinding resource acts as the “glue” between the Policy and the PlacementRule that defines which clusters are going to be affected by the Policy. Again, I’m going to use the demo-openshift-cluster Cluster and the dev-clusters PlacementRule.

The resources mentioned above need to be applied in order for GRC to be able to monitor the image vulnerability Policy.

Testing!

As soon as the resources are deployed, the managed cluster starts to work. It installs the container-security-operator and goes through the projects to monitor their images. In this blog we will focus on scanning the mariadb application that I have pre-configured.

After deploying the resources, if I navigate from the main ACM dashboard to Governance Risk & Compliance, I’ll immediately see that there is a violation for the policy-imagemanifestvuln policy.

If I click on demo-openshift-cluster under Decisions, I’ll be able to see that the namespace mariadb is violating the image scanning policy.

Let’s log into the Managed Cluster OpenShift web console. Immediately after logging in as an admin user, we will notice the small square in the dashboard which says Quay Image Security.

If you click on Quay Image Security, it will mention the namespaces that Clair has found vulnerabilities in. In our case, one of them will be the mariadb namespace. Let’s navigate to the mariadb reference.

We can see the reference to the Image Manifest Vulnerabilities that RHACM monitors. The page shows us the image name, the namespace, and the number of fixable vulnerabilities. Let’s click on the Manifest link to find more information.

The link will redirect you to Clair’s full scan report on quay.io. Here you will be able to see the CVEs behind the vulnerabilities, and whether there is a fix for them.

It’s important to remember that we manage just one cluster in these demos. Just imagine the effect of image scanning in a multi-cluster environment — using RHACM to monitor images in a fleet of OpenShift clusters. It basically means that we are taking the DevSecOps methodology, and we’re powering it up to work in a multi-cluster & multi-site formation — A total game changer in my opinion for security at scale.

Conclusion

I had a really fun time exploring the capabilities of RHACM and how we can apply them from a security perspective. RHACM is one of the first tools that really puts the multi-cluster management idea into practice.

In my opinion, using RHACM will probably soon be a must in most organizations, as more and more environments scale out to multiple OpenShift and K8S clusters. Without proper management tools, it’s going to be very hard for these organizations to cope.

A major consideration that comes with a large number of clusters is alignment. When we do not take alignment into consideration, we end up with a messy environment, which is the perfect treat for an attacker who wants to hurt your organization.

RHACM has much more to offer than discussed in this article. And if you haven’t already, make sure to check it out. RHACM is, and will be a huge game changer both in the world of management and security. It is customizable, easy to navigate, and provides a lot of added value from day 1.

Thanks for reading the article, feel free to add comments and ask questions, I’m always available!

I’ll see you next time!

Michael Kotelnikov

Cloud and Devops enthusiast.

Experienced in designing and implementing large scale cloud infrastructures and Devops solutions.

Providing solutions regarding architecture, network design, monitoring, middleware and security.

Specialized in Linux server management alongside cloud native technologies.