Governance in the Cloud World

More and more customers are migrating large parts of their existing IT infrastructure to the Cloud. As a result, more and more challenges are becoming visible.

For example, misconfigurations.

Misconfigurations are the number one issue in current Cloud implementations and often lead to significant data breaches, which you can frequently read about in the news.

But is this really necessary?

Before we jump directly into the recommended approach for Cloud Governance, let us look at the different areas of Governance to get a common understanding of what Governance is:

Take some time now to think about the areas and try to find answers to the following questions:

Did you already consider all of the mentioned areas?
Can you find good reasons why you would like to implement each of them?
Can you name some good examples for each area?
Do you already know how you want to address these examples?

But - Automation as a Governance area?

I have added Automation as a Governance area on purpose. When you look at Cloud-Native implementations and increasing Cloud Maturity, you will always hear about topics like DevOps, Infrastructure as Code, Release Pipelines, Immutable Infrastructure, and Automation. The area "Automation" includes all of these and is, therefore, a Governance area that you should strive for and enforce. There is almost no added value in treating the Cloud as a Datacenter extension and focusing only on IaaS. You will need to increase your maturity over time to benefit from what the Cloud offers, and that is why Automation should be part of your Governance strategy from the moment you establish it.

In the next table, you can find the typical motivations, as well as the primary focus scope for each area.

A typical mistake I see is that some of those areas are entirely or partially missed. The reason is very often a lack of role diversity in the leading and Architecture teams.

A good recommendation is to run cross-functional ideation sessions to identify all tasks and requirements for all Governance areas. It is essential to have a good understanding and a broad overview of each of those areas, but also to understand the opportunities that could be leveraged, for example, by increasing the level of Automation. After that, you will quickly recognize that the complexity of the whole field is relatively high, and that there are far too many topics that need to be addressed.

But do you need to have the whole Governance implemented from the very beginning?

I would say:

"No, but you should define a strategy to set up the right controls at the right time."

This means that you need to identify which Governance implementations are essential, which are supplemental, and whether they depend on each other. As a result, you should come up with a roadmap, e.g., starting with urgent Technical Security controls, Standardization, and, if needed, Regulatory Compliance, and then subsequently addressing all other areas based on demand.
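To make the roadmap idea more tangible, here is a minimal Python sketch. It assumes that every control is recorded with an "essential" flag and a list of the controls it depends on; the control names and the `build_roadmap` helper are invented for illustration and are not taken from any real framework. The sketch groups the controls into phases so that nothing is scheduled before its prerequisites, and lists essential controls first within each phase.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical controls: whether each one is essential and what it depends on.
CONTROLS = {
    "technical-security-baseline": {"essential": True, "depends_on": []},
    "naming-and-tagging-standard": {"essential": True, "depends_on": []},
    "regulatory-compliance-controls": {
        "essential": True,
        "depends_on": ["technical-security-baseline"],
    },
    "cost-management-reports": {
        "essential": False,
        "depends_on": ["naming-and-tagging-standard"],
    },
    "automated-deployments": {
        "essential": False,
        "depends_on": ["technical-security-baseline", "naming-and-tagging-standard"],
    },
}


def build_roadmap(controls):
    """Group controls into phases that respect their dependencies."""
    sorter = TopologicalSorter(
        {name: spec["depends_on"] for name, spec in controls.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    phases = []
    while sorter.is_active():
        # Everything returned here has all of its prerequisites completed.
        ready = sorted(
            sorter.get_ready(),
            key=lambda name: (not controls[name]["essential"], name),
        )
        phases.append(ready)
        sorter.done(*ready)
    return phases


for number, phase in enumerate(build_roadmap(CONTROLS), start=1):
    print(f"Phase {number}: {', '.join(phase)}")
```

With the sample data above, the two foundational controls land in the first phase and everything that builds on them in the second. A real roadmap would also weigh demand and team capacity, which this sketch ignores.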

But what is Governance exactly?

Establishing Governance is nothing less than defining a number of quality gates. These quality gates are either strict or informal and can be categorized into the following four types:

The main idea is to define a Governance lifecycle and capture all stages:

Proactive Governance aims to catch non-compliant configurations before implementation. It is a quality gate that forces your users and teams to adhere to defined standards and blocks non-compliant settings technically or procedurally, e.g., input validations, deny policies, privileged rights request processes, guidelines.
Implemented Governance prevents misconfigurations by using Automation or templates that already integrate the centrally managed Governance requirements, e.g., Cloud policies with modify actions, automated deployments, solution templates, configuration baselines, DR guidance.
Continuous Governance, as the name suggests, continuously checks settings and definitions and prevents wrong configurations after the initial setup, e.g., policies, quality gates in the release pipelines such as unit tests and best practice analyzers, DSC, Monitoring and Alerting.
Reactive Governance tries to identify non-compliant states that were not caught by the previous quality gates and either highlights the issues or fixes them automatically, e.g., frequent pentests or scans, regular reports.
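The following Python sketch illustrates how these stages differ in practice. It is only an illustration under assumptions: the resource format, the tag names, the naming pattern, and the baseline values are all made up, and a real Cloud platform would express such rules in its own policy engine rather than in Python. `proactive_gate` acts like a deny rule before deployment, `apply_baseline` acts like a modify rule during deployment, and `find_drift` is the kind of check you would run continuously or as a reactive scan afterward.

```python
import re

# All values below are hypothetical examples, not a recommended standard.
REQUIRED_TAGS = {"owner", "cost-center"}
NAME_PATTERN = re.compile(r"^[a-z]+-(dev|prod)-\d{2}$")
BASELINE = {"encryption": True, "public_access": False}


def proactive_gate(resource):
    """Proactive: list the reasons to block a request (empty list = allowed)."""
    reasons = []
    if not NAME_PATTERN.match(resource["name"]):
        reasons.append("name violates naming convention")
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        reasons.append("missing tags: " + ", ".join(sorted(missing)))
    return reasons


def apply_baseline(resource):
    """Implemented: return a copy with the central baseline enforced."""
    settings = {**resource.get("settings", {}), **BASELINE}
    return {**resource, "settings": settings}


def find_drift(resources):
    """Continuous or reactive: names of resources deviating from the baseline."""
    return [
        resource["name"]
        for resource in resources
        if any(
            resource.get("settings", {}).get(key) != value
            for key, value in BASELINE.items()
        )
    ]


request = {
    "name": "shop-dev-01",
    "tags": {"owner": "team-a"},
    "settings": {"public_access": True},
}
print(proactive_gate(request))                # ['missing tags: cost-center']
print(find_drift([request]))                  # ['shop-dev-01']
print(find_drift([apply_baseline(request)]))  # []
```

The same rule set is reused in every stage. Only the point in the lifecycle at which it runs, and whether it blocks, fixes, or merely reports, changes.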

The higher your Governance maturity is from the very beginning, the less friction you will have with newly and manually created solutions and resources afterward. But you should always cross-check that you are not blocking product teams with some of those rules and delaying their ramp-up.

But what actually is Governance Maturity?

Governance Maturity

You will start your Governance path with many blind spots, not much Automation, and a lack of technical quality gates. Applied to our previous model, I add one additional type, which I call "Missed". The reactive approach means that you search for and find issues on a regular basis. The "missed" ones, however, are requirements you did not even consider, so your currently established reactive approaches will not always identify them. You will always have some missed topics that somehow come up reactively, but their number and frequency should drop significantly over time.

Let us have a look at an example of how Governance Maturity evolves:
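One way to track this development is to record, for every finding, the stage at which it was caught, and to watch how the shares shift over time. The Python sketch below assumes exactly that: each finding is a small record with a "stage" field, where "missed" marks issues that surfaced outside any planned check. The record format and the `stage_shares` helper are my own illustration, not an established metric.

```python
from collections import Counter

STAGES = ("proactive", "implemented", "continuous", "reactive", "missed")


def stage_shares(findings):
    """Return the fraction of findings caught at each stage."""
    counts = Counter(finding["stage"] for finding in findings)
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    total = sum(counts.values())
    return {
        stage: (counts[stage] / total if total else 0.0)
        for stage in STAGES
    }


# Hypothetical findings from one reporting period.
findings = [
    {"id": 1, "stage": "proactive"},
    {"id": 2, "stage": "reactive"},
    {"id": 3, "stage": "missed"},
    {"id": 4, "stage": "missed"},
]
for stage, share in stage_shares(findings).items():
    print(f"{stage:12s} {share:.0%}")
```

If the shares for "reactive" and "missed" shrink from period to period while the earlier stages grow, your maturity is moving in the right direction.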

Low Maturity:

You start with some well-defined rules like required tags and naming conventions and build up some shared services for Networking, but the majority of topics are missed or come up on a reactive basis. Your security team complains about insecure workloads and privileged rights issues that seem a bit out of control.

Medium Maturity:

Over time, you start to build a Policy-Framework, increase the level of Automation, and create documentation, guidelines, and templates that are used by more and more teams. There are still far too many topics that are highlighted on a reactive basis. For example, your security team now regularly complains about minor, and sometimes still significant, issues. The significant ones come up less frequently, but still more often than you would like. Topics like Cost Management now come up repeatedly, as costs are rising rapidly. Also, more and more teams are evaluating whether to bring PII or other sensitive data to the Cloud and ask for more robust Regulatory Compliance controls.

High Maturity:

Most Governance areas are covered across the whole lifecycle. Violations of the essential requirements are blocked directly and continuously revalidated. The level of Automation is high, with most of the other requirements implemented directly in it. You have a good resource base with standard patterns and a knowledge base with guidelines and templates, approved by the relevant teams and ready for use by your Product teams. In addition, you have Monitoring and Alerting established to identify any deviations almost instantly. And even if your rules do not catch every deviation, you have a strategy for running reactive scans and reports frequently to identify Governance violations quickly.

So, this was one example of how maturity can evolve. Let us now break this down into the key tasks for reaching a mature Governance strategy:

Increase proactive and continuous implementations to mitigate deviations from the very beginning.

Make use of policies, desired state implementations, Monitoring, and Alerting, and prefer continuously checking policies over single quality gates. A good recommendation is to define a technical Policy-Framework. Your documentation and guidelines should be centralized and well communicated. Usually, proactive rules are hard requirements that should not be circumvented by anyone. Don't set up too many strict rules, and leave your development teams room for flexibility.

Increase the implemented Governance by using templates and an increasing level of Automation. IT should become a partner for your business and enable a fast and secure ramp-up instead of being a service provider with strict rules.

Processes and patterns (e.g., for DR) should be standardized and easily reusable. You have centralized repositories available with these templates, including documentation, ready to use for your product teams. By doing so, you can ramp up the teams quickly and securely and reduce the overhead of redesigning similar requirements from scratch. Repetitive standard requests are fully automated and implemented in the Cloud Architecture.

Run reactive checks frequently and try to carry these findings over into the other stages to prevent them from recurring.

You have a detailed schedule for when to run which reactive tasks and validate the current implementation. Dedicated people own reactive tasks, and transparency is recognized as something positive in your environment. After identifying issues, you check whether the findings can be covered by non-reactive approaches to prevent these deviations from recurring.

And that's it. Now it is up to you to implement it. Get an overview of all necessary tasks and controls, classify them as proactive, implemented, continuous, or reactive, and plan when to implement which of them. And don't miss too many! ;-)

I hope this helped, and I am happy to hear your feedback!

All the best,

David das Neves
