Cloud Governance and Operational Excellence… What the heck is that? 🤔

Cloud has been the new normal for years now, and it has brought three major concepts for designing and building cloud infrastructure. Knowing where you stand on each of them can help you understand your cloud maturity and readiness level. Let’s look at these three concepts.

1 — Cloud Migration

Cloud Migration is a transition, not an instant cut-over. For an organization, a hybrid cloud is the norm with physical, virtual, and cloud workloads. Usually, companies begin by migrating their monolithic applications or physical and virtual servers to cloud providers. The main goal here is to start using cloud services like IaaS, PaaS, and SaaS from large cloud providers like AWS, Azure, GCP, and Oracle.

"Cloud Migration is very commonly associated with lifting and shifting of servers and monolithic applications."

Organizations usually struggle to adapt their current applications, information security, and management systems from the data center to cloud service providers. This happens mainly because those technologies are not well adapted to these new environments.

2 — Cloud-Native Application

This is the stage of delivering fast and iterating often, using Infrastructure-as-Code, DevOps pipelines, code re-use, open source, containers, and public code repositories. At this stage, organizations start consuming cloud-native solutions like containers-as-a-service, serverless, cloud storage, CI/CD tools, and Kubernetes, sometimes across multiple cloud service providers, to try to get the best performance, cost, and functionality out of each of them.

Usually, there is a change in application architecture, from monolithic to microservices, for a more agile, highly scalable, and independently deployable design, which makes it easier to release new features and maintain the current code.

A great way to find more technologies under this topic is to look at the CNCF (Cloud Native Computing Foundation) landscape: https://landscape.cncf.io/

3 — Cloud Governance and Operational Excellence

At this stage, organizations are forming teams with the goal of achieving cloud governance and centers of operational excellence. Typically at this level, companies are leveraging cloud-native applications, but those applications are “NOT” optimized and do not follow the best practices recommended by frameworks from cloud providers or by standards like PCI DSS, NIST, and HIPAA.

They build these teams to standardize processes, making the environment more repeatable and consistent and avoiding errors when applications are created or updated for their business units. They also want to optimize infrastructure costs, support consistent security checks across multiple cloud providers, and manage business risk across multiple geographic regions and services in highly complex environments.

They need assurance that cloud services are not left insecure, that they meet internal governance requirements, and that they comply with specific standards and compliance frameworks.

"The cloud offers innumerable benefits for your business and organization, as long as you have the right policies and guardrails to protect it from possible mistakes or misconfigurations in the cloud services."

Why is Cloud Governance and Operational Excellence important?

The old model of keeping systems only on-premises allowed companies to control who had access to specific data, servers, and networks, and to measure security risk and costs. This was possible because all the infrastructure, networking, and storage systems sat under one team, the "Infrastructure and Networking Team" (the name could differ depending on the company). In the cloud model, you can easily lose that control, because most of the time business units have their own DevOps, SRE, and cloud architect teams, which in turn may use different cloud providers for the applications they are working on.

However, creating rules for access, cost, security, and compliance for data and applications in the cloud is a little tricky. If it is not done correctly, businesses can lose the agility and advantages of cloud services.

Cloud governance ensures that everything from asset deployment to systems interactions to data security is properly considered, examined, monitored, secured, and managed. The shift from a regular data center to a cloud environment adds layers of complexity to your architecture, which need to be considered and validated. Cloud operational excellence, on the other hand, includes the ability to support the development of cloud workloads effectively while also helping the business gain insight into their operations. The result will be continuous improvement in processes and procedures which will deliver true business value.

How can organizations create a Cloud Governance and Operational Excellence strategy?

Below you will find a few guidelines that I recommend as a starting point for building the strategy. They are based on the organizations I’ve had the opportunity to work with on cloud security and cloud projects over the last 6 years.

1- Awareness and Visibility

For organizations in the early stages of cloud adoption, or even in mature stages, it is very important to have visibility into the cloud environment and applications in order to understand and identify potential business risks. Organizations need to be aware of their applications’ dynamics, their cloud teams’ processes, and the security risks across multiple cloud platforms in order to enable their DevOps/Cloud teams to drive business value through desired outcomes.

2- Establish And Maintain Security (Security Risks and Compliance)

Every time cloud is mentioned, one of the top concerns is "security" and how we can deliver the same security capabilities that we have in our datacenters in the cloud providers. Using the same tools that you have in your datacenter in the cloud is quite a challenge, which is why it is important to review whether those tools can achieve your goals in this new environment.

“It is important to remember that security should be considered a shared responsibility, if one is to keep data safe in the cloud.”

The figure above describes how shared responsibility works across the cloud service models according to Microsoft

It is important to remember that, depending on which services you use from the cloud providers, the regular tools used in the datacenter are sometimes not the right fit for the cloud environment you are running.

In the last six years that I have spent talking with companies in the United States, Canada, Latin America, Brazil, and the Middle East, I have noticed misconfigurations becoming a bigger and bigger issue. They are also becoming a headache for security professionals concerned with data leaks. As Gartner states, “through 2022, at least 95% of cloud security failures will be the customer’s fault.”

“Soon, most of the attacks in the cloud environment will be the result of misconfigurations, a lack of customizable security profiles, and a lack of auto-remediation by organizations in their day-to-day applications.”

Here are some interesting security findings from Cloud One — Conformity (based on the AWS Well-Architected Framework), with links for more details:

  • S3 Bucket Public Access Via Policy — Link
  • Canary Access Token — Link
  • AWS IAM Server Certificate Size — Link
  • AWS Root user has signed in without MFA — Link
  • ECR Repository Exposed — Link
  • AWS AMI Encryption — Link
  • EC2 Instance Not In Public Subnet (Ensure that no backend EC2 instances are provisioned in public subnets) — Link
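
To make the first finding in the list more concrete, here is a minimal Python sketch of the kind of check behind "public access via policy". It is my own illustration and not how Conformity implements the rule: the function names, the example bucket, and the account ID are made up, and it only looks for `Allow` statements that are open to any principal and have no `Condition`.

```python
import json


def _is_wildcard_principal(principal):
    """Return True if the principal grants access to everyone."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        values = aws if isinstance(aws, list) else [aws]
        return "*" in values
    return False


def public_statements(policy_json):
    """Return Allow statements open to any principal with no Condition."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    return [
        s for s in statements
        if s.get("Effect") == "Allow"
        and _is_wildcard_principal(s.get("Principal"))
        and not s.get("Condition")
    ]


# Hypothetical policy: one public statement, one scoped to a single account.
EXAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {
            "Sid": "AccountOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
})

if __name__ == "__main__":
    for statement in public_statements(EXAMPLE_POLICY):
        print("Public statement:", statement["Sid"])
```

A real check would also need to consider bucket ACLs and account-level public access settings, which this sketch deliberately ignores.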

Be sure to enable the built-in security tools provided by your cloud service providers, and use Cloud Security Posture Management (CSPM) tools to gain visibility and to identify possible drift from your best practices and compliance requirements. Introduce security into your DevOps pipeline (here is an article talking about the possible ways to add security into your CI/CD pipelines), and integrate all those logs with a SIEM solution to get much better visibility across all security layers.
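
One simple way to picture security inside a pipeline is a gate that fails the build when a scan reports serious findings. The Python sketch below is only an illustration under my own assumptions: the severity scale, the shape of a finding, and the function names are hypothetical and do not come from any specific CSPM or CI/CD product.

```python
# Hypothetical severity scale; real tools use their own labels.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def blocking_findings(findings, fail_at="high"):
    """Return the findings whose severity is at or above the threshold."""
    threshold = SEVERITY_RANK[fail_at]
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= threshold]


def pipeline_exit_code(findings, fail_at="high"):
    """Return 1 (fail the build) if any finding blocks, otherwise 0."""
    return 1 if blocking_findings(findings, fail_at) else 0


# Made-up scan output for illustration.
EXAMPLE_FINDINGS = [
    {"rule": "S3 bucket public via policy", "severity": "critical"},
    {"rule": "Unused EBS volume", "severity": "low"},
]

if __name__ == "__main__":
    for finding in blocking_findings(EXAMPLE_FINDINGS):
        print("Blocking:", finding["rule"])
    raise SystemExit(pipeline_exit_code(EXAMPLE_FINDINGS))
```

Most CI systems treat a non-zero exit code as a failed step, so the last line is what would stop a deployment in this sketch.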

3- Classify And Structure Your Data

An important, but far from simple, task is introducing guidelines for data classification in the cloud. Today, we have important privacy regulations such as GDPR, CCPA, and LGPD, among others, that allow consumers to request access to the data collected about them, as well as its deletion.

Imagine how complicated it could be to handle these requests without a good process for classifying, cleaning, and organizing customer data.

  • Check out this document from Amazon Web Services for help with data classification — Link
  • Also check out this document from Microsoft Azure for help with data classification — Link
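
A common way to put classification guidelines into practice is to tag every resource with its classification level and then audit for anything left untagged. Here is a minimal Python sketch of that audit. The tag key `data-classification`, the four levels, and the resource records are assumptions I made for the example, so replace them with whatever your own guidelines define.

```python
# Hypothetical tag key and levels; use the ones from your own guidelines.
TAG_KEY = "data-classification"
ALLOWED_LEVELS = ("public", "internal", "confidential", "restricted")


def audit_classification(resources):
    """Group resource IDs by classification level and list unclassified ones."""
    by_level = {level: [] for level in ALLOWED_LEVELS}
    unclassified = []
    for resource in resources:
        level = resource.get("tags", {}).get(TAG_KEY, "").lower()
        if level in by_level:
            by_level[level].append(resource["id"])
        else:
            # Missing tag or a value outside the allowed levels.
            unclassified.append(resource["id"])
    return by_level, unclassified


# Made-up inventory for illustration.
EXAMPLE_RESOURCES = [
    {"id": "bucket-customer-exports", "tags": {"data-classification": "Confidential"}},
    {"id": "bucket-website-assets", "tags": {"data-classification": "public"}},
    {"id": "db-legacy-crm", "tags": {}},
]

if __name__ == "__main__":
    levels, missing = audit_classification(EXAMPLE_RESOURCES)
    print("Confidential:", levels["confidential"])
    print("Needs classification:", missing)
```

The unclassified list is the useful part: those are the resources you could not confidently include or exclude when answering a data access or deletion request.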

4- Controlled Access and Access Management

After the process of data classification, it is important to understand who has access to the data in your cloud environments. Identity and Access Management (IAM) helps organizations govern users by applying user policies and protects companies against unauthorized access to data and applications.

Many organizations across the globe use Active Directory for this task, or the cloud-native IAM tools from their cloud provider. Okta, Centrify, Ping, and others can also help you build great access management guidelines within your organization.
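
Understanding who has access often starts with finding policies that grant far more than anyone needs. The Python sketch below flags statements in an AWS-style IAM policy document that allow every action on every resource. It is a simplified illustration with hypothetical function names, and it does not catch narrower wildcards such as `s3:*`.

```python
def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def overly_permissive(policy):
    """Return Allow statements that grant all actions on all resources."""
    flagged = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        if "*" in actions and "*" in resources:
            flagged.append(statement)
    return flagged


# Made-up policy: one admin-style statement and one narrowly scoped statement.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "FullAdmin", "Effect": "Allow", "Action": "*", "Resource": "*"},
        {
            "Sid": "ReadReports",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-reports/*",
        },
    ],
}

if __name__ == "__main__":
    for statement in overly_permissive(EXAMPLE_POLICY):
        print("Too broad:", statement["Sid"])
```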

Here are some tips on how to implement IAM in your company:

5- Manage Your Cloud Costs

Without good control measures around who can create cloud environments, and without validating that services are cost-optimized (following best practice frameworks), it is easy for costs to spike unnecessarily by tens of thousands to hundreds of thousands of dollars. Here are some examples of findings associated with extra costs:

  • Idle EC2 Instance (Low CPU and Network Utilization) — Link
  • Reserved Instance Lease Expiration In The Next 7 Days — Link
  • Unused EBS Volumes — Link
  • Idle Elastic Load Balancer — Link
  • Idle RDS Instance — Link
  • RDS Reserved DB Instance Lease Expiration In The Next 7 Days — Link
  • Unused Redshift Reserved Nodes — Link

Findings from Cloud One — Conformity (based on the AWS Well-Architected Framework), with links for more details
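
Two of the findings above, idle instances and unused volumes, are easy to sketch in Python. The thresholds below (average CPU under 2% and average network traffic under 5 MB) are illustrative values I picked for the example and should not be read as the exact values any tool uses. The sample data and function names are also made up.

```python
from statistics import mean

# Illustrative thresholds; tune them to your own workloads.
CPU_IDLE_PERCENT = 2.0
NETWORK_IDLE_MB = 5.0


def is_idle(cpu_percent_samples, network_mb_samples):
    """Return True when both average CPU and average network I/O are below threshold."""
    if not cpu_percent_samples or not network_mb_samples:
        return False  # no metrics: do not guess
    return (
        mean(cpu_percent_samples) < CPU_IDLE_PERCENT
        and mean(network_mb_samples) < NETWORK_IDLE_MB
    )


def unused_volumes(volumes):
    """Return IDs of volumes not attached to any instance (EBS state 'available')."""
    return [v["id"] for v in volumes if v["state"] == "available"]


# Made-up inventory for illustration.
EXAMPLE_VOLUMES = [
    {"id": "vol-aaa", "state": "in-use"},
    {"id": "vol-bbb", "state": "available"},
]

if __name__ == "__main__":
    print("Idle instance:", is_idle([1.0, 1.5, 0.5], [2.0, 3.0, 1.0]))
    print("Unused volumes:", unused_volumes(EXAMPLE_VOLUMES))
```

In practice the samples would come from your provider's monitoring service over a window of several days, so a short quiet period is not mistaken for an idle instance.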

Management overhead can get even more complicated when you are using multiple cloud providers and multiple accounts in each of them. I recently worked with a customer that had over 1,500 cloud accounts and no automated solution for them, which made management very complicated.

It is critical that you track, predict, and monitor your cloud spend. Whether you are a smaller organization or a larger organization, it can help you gain visibility, understand cloud usage, facilitate better decisions around cloud strategy, and save on costs going forward. Please see below for solution examples.

  • APPTIO — Link (check out the Apptio Cost Transparency feature, very cool)
  • CLOUDYN — Link
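
Even before adopting a tool, the "track, predict, and monitor" idea can be sketched in a few lines of Python. The example below projects month-end spend from the daily costs seen so far and compares it with a budget. It assumes spend continues at the month-to-date daily average, which is a deliberately naive model, and the function names, figures, and budget are hypothetical.

```python
import calendar
from datetime import date


def project_month_end(daily_costs, today):
    """Project total spend for the month from the month-to-date daily average."""
    if not daily_costs:
        return 0.0
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_average = sum(daily_costs) / len(daily_costs)
    return daily_average * days_in_month


def over_budget(daily_costs, today, budget):
    """Return True when the projected month-end spend exceeds the budget."""
    return project_month_end(daily_costs, today) > budget


if __name__ == "__main__":
    # Made-up figures: 10 days at 100.0 per day in a 30-day month.
    costs = [100.0] * 10
    today = date(2024, 4, 10)
    print("Projected spend:", project_month_end(costs, today))
    print("Over a 2500.0 budget:", over_budget(costs, today, 2500.0))
```

Running a projection like this per account or per business unit is one way to spot a spike early instead of discovering it on the invoice.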

Conclusion

Once you have created the governance strategy and set the policies that you’ve defined, review the rules regularly to make sure they are up to date with any policy changes and compliance updates, and that they follow the best practices. This will help your business realize the true benefits and value of cloud services and DevOps practices. Fine-tuning the strategy frequently is important to achieve balance and the desired risk tolerance for the organization with respect to access, control, data privacy, costs, security, and compliance.

Acknowledgment

I want to say a BIG thank you to the people who helped me with fantastic feedback to improve this article:

  • David Clement
  • Russ Cahoon
  • Stephanie Laranjeira
  • Yama Saadat
  • Tabitha Doyle
  • Ian Messiter
  • Ben Masso

I'm a Computer Engineer 👨‍💻 with a passion for Cybersecurity, DevOps, and Cloud. When I'm not at my 💻 , I'm traveling and taking photos across the globe 🌎
