Getting a grip on your AWS permission policies

Recently I’ve been getting a lot of questions from our customers about how to deal with the different layers in the IAM permissions model. For example: how to find the right balance between service control policies, identity-based policies and a permissions boundary, and how does a central cloud platform team protect their platform resources given the different policy limitations?

In this post I’ll briefly recap the theory, make you aware of some of the caveats, and share some of our guidelines when implementing these policies. At the end I will send you off with a list of useful articles in case you want to get a bit more in-depth on the topic.

Refreshing the theory

Before we start refreshing the theory on service control policies (SCPs), identity-based policies (IAM policies) and the permissions boundary, I want to note that I’ve purposefully left out resource-based policies and session policies from this blog post. Even though these policies are of utmost importance in the overall permission structure in AWS, I feel the demarcation of what goes into them is much clearer than for the policies that are the topic of this post.

At the center of the AWS permission placement challenge is the order in which policies are evaluated. The flow chart below shows how the decision is made whether an action is permitted or not. This flow chart excludes the impact of resource-based policies as well as implicit denies in the other policy types.

https://docs.aws.amazon.com/IAM/latest/UserGuide/images/PolicyEvaluationHorizontal111621.png

The explicit ‘deny’ is king, as it overrides any ‘allow’ statement made in any of the policies that apply to a principal. This is especially important to keep in mind when you start explicitly denying actions in generic policies you want to apply to multiple principals such as service control policies. When you decide to add explicit deny statements in those generic policies, you have to be very sure that that is what you want for all of those principals. Luckily there are ways to exclude principals from these statements, which will be covered later in this post.

If no explicit deny applies, the request is still implicitly denied unless it is explicitly allowed, setting the stage for adhering to the principle of least privilege. No principal is able to do anything in our AWS account unless we explicitly allow it.

From that point forward, the overlap of explicit allow statements determines the effective permissions of a principal. Only actions that have been explicitly allowed by all three policy types (assuming all three are in place here, because they should be!) will be permitted:

https://docs.aws.amazon.com/IAM/latest/UserGuide/images/EffectivePermissions-scp-boundary-id.png
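To make the evaluation order a bit more tangible, here is a minimal Python sketch of the two rules described above: an explicit deny in any layer wins, and otherwise every layer has to allow the action. It is a simplification and not the actual AWS evaluation engine. It only looks at actions (no resources or conditions), the `layers` structure and the example policies are made up for illustration, and I'm assuming `fnmatch`-style matching is a close enough stand-in for IAM's `*` and `?` wildcards.

```python
from fnmatch import fnmatchcase


def matches(patterns, action):
    """True if the action matches any pattern (IAM action names are case-insensitive)."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def is_permitted(action, layers):
    """layers: one dict per policy type, each with 'allow' and 'deny' pattern lists."""
    # Rule 1: an explicit deny in any layer overrides every allow.
    if any(matches(layer["deny"], action) for layer in layers):
        return False
    # Rule 2: without an explicit deny, every layer must explicitly allow the action.
    return all(matches(layer["allow"], action) for layer in layers)


# Hypothetical example policies, reduced to action patterns only.
scp = {"allow": ["*"], "deny": ["ec2:DeleteVpc"]}
boundary = {"allow": ["ec2:*", "s3:*"], "deny": []}
identity = {"allow": ["ec2:Describe*", "ec2:DeleteVpc", "iam:ListRoles"], "deny": []}
layers = [scp, boundary, identity]

print(is_permitted("ec2:DescribeInstances", layers))  # True: allowed by all three layers
print(is_permitted("ec2:DeleteVpc", layers))          # False: explicit deny in the SCP
print(is_permitted("iam:ListRoles", layers))          # False: not allowed by the boundary
```

Note how `iam:ListRoles` is denied even though the identity-based policy allows it: the permissions boundary never mentions it, so it falls outside the overlap.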

One of the biggest challenges

I think it goes without saying that one of the biggest challenges we have to deal with while implementing permissions policies is policy document size. Regardless of what policy type we’re dealing with, we have a limited amount of space in the documents we use to define those policies. This forces us to be creative with the content of those documents, while making sure we aren’t accidentally over-permissive.

Service control policies in particular are difficult to deal with in terms of size. We have to make sure we fit everything we want into a policy document of at most 5,120 characters, which can get tricky when you want to make them a little more complex. A thing to keep in mind is that whitespace (spaces and line breaks) counts towards that character limit. Something we often do at Xebia is to transform a readable SCP in a code repository into a less-readable SCP during deployment by removing all those characters. You could apply this to every policy document you deploy, but if you’re using native CloudFormation that makes your committed code particularly difficult to read.
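As an illustration of that deployment-time transformation, the sketch below minifies a readable policy and checks it against the size limit. This is not our actual tooling: the function names, the `SCP_MAX_CHARS` constant and the example policy are assumptions made up for this example.

```python
import json

SCP_MAX_CHARS = 5120  # the SCP size limit discussed above


def minify_policy(readable_policy: str) -> str:
    """Re-serialize a JSON policy without any insignificant whitespace."""
    return json.dumps(json.loads(readable_policy), separators=(",", ":"))


def assert_fits(policy_text: str, limit: int = SCP_MAX_CHARS) -> int:
    """Return the policy length, or raise if it exceeds the limit."""
    size = len(policy_text)
    if size > limit:
        raise ValueError(f"policy is {size} characters, limit is {limit}")
    return size


# Hypothetical readable policy, as it might be committed to a repository.
readable = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ExampleDeny",
            "Effect": "Deny",
            "Action": ["ec2:CreateVpc", "ec2:DeleteSubnet"],
            "Resource": ["*"]
        }
    ]
}
"""

minified = minify_policy(readable)
print(len(readable), "->", assert_fits(minified))
```

Because the policy is parsed and re-serialized rather than stripped with a regular expression, spaces inside string values are left untouched.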

Especially with the broad adoption of AWS SSO, space becomes a challenge. As we are not able to directly attach a permissions boundary to a permission set, we now have to move whatever statements we’ve defined in our permissions boundary to the permission set, which obviously uses more space.

However, a lack of space in policies often indicates we are implementing less-than-optimal policies. Overly complex policies often require more space, so let’s have a look at how we can keep the policies clean, simple and short.

Our guidelines

In order to construct an effective combination of policies, while keeping in mind the caveats we have to deal with, we adhere to a set of guidelines when designing these policies. It is important to emphasize these are guidelines and should not be treated as strict rules. This allows us to customize the implementation based on customer requirements.

When restricting API activities for the majority of principals on the platform, use service control policies and use the aws:PrincipalArn policy condition to create exclusions from that policy

I see a lot of customers using their precious character space in identity-based policies or permission boundaries to implement preventive guardrails to protect their landing zone / platform-related resources. As much as platform administrators/operators think they should have some freedom in that area, there’s no reason not to include these entities and force any changes to be made to these resources through infrastructure-as-code. Other than the IAM principal that is used by the automation there is no entity that should be able to change these resources, which makes this kind of policy a perfect match for a service control policy.

Because we want to exclude the IAM role used to provision platform components, we need to use the aws:PrincipalArn policy condition like so:

{
   "Statement": [
       {
           "Sid": "BlacklistedLandingZoneActions",
           "Effect": "Deny",
           "Action": [
               "ec2:CreateTraffic*",
               "ec2:CreateTransit*",
               "ec2:CreateVpc",
               "ec2:CreateVpcPeer*",
               "ec2:DeleteNetworkAcl*",
               "ec2:DeleteRoute*",
               "ec2:DeleteSubnet",
               "ec2:DeleteTraffic*",
               "ec2:DeleteTransit*"
               ....
           ],
           "Resource": [
               "*"
           ],
           "Condition": {
               "ArnNotLike": {
                   "aws:PrincipalARN": [
                       "arn:aws:iam::*:role/customer-platform-provisioning"
                   ]
               }
           }
       }
   ]
}
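To show what the condition does, here is a small Python sketch of when this deny statement applies. It is a simplification under a few assumptions: the action list is shortened, the account ID and the developer role name are hypothetical, and `fnmatch`-style matching is used as an approximation of IAM's wildcard matching.

```python
from fnmatch import fnmatchcase

# Shortened version of the action list in the policy above.
DENIED_ACTIONS = ["ec2:CreateVpc", "ec2:DeleteSubnet", "ec2:DeleteRoute*"]
# Principals that the condition excludes from the deny.
EXEMPT_ARN_PATTERNS = ["arn:aws:iam::*:role/customer-platform-provisioning"]


def deny_applies(principal_arn: str, action: str) -> bool:
    """True if the deny statement would block this principal/action combination."""
    action_matches = any(
        fnmatchcase(action.lower(), pattern.lower()) for pattern in DENIED_ACTIONS
    )
    # The negated match is true only when the ARN matches none of the patterns.
    condition_true = not any(
        fnmatchcase(principal_arn, pattern) for pattern in EXEMPT_ARN_PATTERNS
    )
    return action_matches and condition_true


developer = "arn:aws:iam::111122223333:role/developer"  # hypothetical role
provisioning = "arn:aws:iam::111122223333:role/customer-platform-provisioning"

print(deny_applies(developer, "ec2:CreateVpc"))      # True: blocked
print(deny_applies(provisioning, "ec2:CreateVpc"))   # False: excluded by the condition
print(deny_applies(developer, "ec2:DescribeVpcs"))   # False: action is not in the list
```

The wildcard in the account field of the ARN is what makes the same SCP reusable across every account in the organization.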

Use the service control policies to protect platform resources, otherwise resort to permissions boundaries

As I mentioned before in this post, I believe we should use SCPs to protect platform resources, but sometimes you’ll find yourself in a situation where that’s just not possible for a variety of reasons (one might think about document size limits :)). In those cases I think it’s perfectly fine to resort to permissions boundaries, as long as you enforce usage of them.

Use a clear naming convention for AWS resources

Please, do yourself and your colleagues a big favor and implement a clear naming convention for the AWS resources you deploy. Adhering to a clear standard and using prefixes, let’s say customer-, allows you to leverage wildcards in policies, greatly reducing the policy space you need. This also sets the stage for a separation of duties between different entities that use your platform, because we can use prefixes to facilitate access control to certain resources. Generally, a naming convention works better than using tags to control access to resources, as not all resources support tagging.

Let’s imagine this: we have a few CloudFormation stacks deployed by the cloud platform team that are not using a certain naming convention:

route53-resolvers
config-bucket
vpc
ssm-patch-management

In order to protect these CloudFormation stacks, you’d have to specify the following policy:

{
  "Statement": [
      {
          "Sid": "DontTouchOurCloudFormationStacks",
          "Effect": "Deny",
          "Action": [
              "cloudformation:*"
          ],
          "Resource": [
              "arn:aws:cloudformation:region:accountid:stack/route53-resolvers/*",
              "arn:aws:cloudformation:region:accountid:stack/config-bucket/*",
              "arn:aws:cloudformation:region:accountid:stack/vpc/*",
              "arn:aws:cloudformation:region:accountid:stack/ssm-patch-management/*"
          ]
      }
  ]
}

With a consistent naming convention for your stacks, you’d be able to shorten that policy by a significant number of characters:

{
  "Statement": [
      {
          "Sid": "DontTouchOurCloudFormationStacks",
          "Effect": "Deny",
          "Action": [
              "cloudformation:*"
          ],
          "Resource": [
              "arn:aws:cloudformation:region:accountid:stack/customer-*"
          ]
      }
  ]
}

Although the number of characters saved seems limited right now, imagine having to specify all CloudFormation stacks that make up the landing zone in an account.
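A quick way to get a feel for the difference is to generate both variants and compare their minified lengths. The sketch below does that. The helper names and the extra stack names are made up for illustration, and the ARNs keep the region and accountid placeholders from the examples above (with a trailing /* on the explicit entries, since a stack ARN ends in a stack ID).

```python
import json

ARN_PREFIX = "arn:aws:cloudformation:region:accountid:stack/"


def deny_cfn_policy(resources):
    """Build the minified deny policy for the given resource ARNs."""
    policy = {
        "Statement": [
            {
                "Sid": "DontTouchOurCloudFormationStacks",
                "Effect": "Deny",
                "Action": ["cloudformation:*"],
                "Resource": list(resources),
            }
        ]
    }
    return json.dumps(policy, separators=(",", ":"))


def explicit_policy(stack_names):
    """One resource entry per stack; grows with every stack that is added."""
    return deny_cfn_policy(f"{ARN_PREFIX}{name}/*" for name in stack_names)


stacks = ["route53-resolvers", "config-bucket", "vpc", "ssm-patch-management"]
# Hypothetical additional landing zone stacks.
more_stacks = stacks + ["cloudtrail", "guardduty", "security-hub", "backup-vault"]

prefixed = deny_cfn_policy([f"{ARN_PREFIX}customer-*"])

print("explicit, 4 stacks:", len(explicit_policy(stacks)))
print("explicit, 8 stacks:", len(explicit_policy(more_stacks)))
print("prefix wildcard:   ", len(prefixed))
```

The explicit variant grows with every stack you add, while the prefixed variant stays the same size no matter how many stacks follow the convention.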

Use conscious wildcarding while defining actions in policies

When specifying actions in your policy document it doesn’t hurt to be aggressive in your usage of wildcarding, but be careful not to be overly permissive (in allow statements) or overly restrictive (in deny statements) by accident.

Let’s say we want to prevent our development teams from accepting invitations/attachments related to our VPC. We could do so with the following policy:

{
   "Statement": [
       {
           "Sid": "ProtectOurVPC",
           "Effect": "Deny",
           "Action": [
               "ec2:AcceptReservedInstancesExchangeQuote",
               "ec2:AcceptTransitGatewayMulticastDomainAssociations",
               "ec2:AcceptTransitGatewayPeeringAttachment",
               "ec2:AcceptTransitGatewayVpcAttachment",
               "ec2:AcceptVpcEndpointConnections",
               "ec2:AcceptVpcPeeringConnection"
           ],
           "Resource": [
               "*"
           ]
       }
   ]
}

This policy is significantly shortened by the aforementioned wildcarding:

{
  "Statement": [
      {
          "Sid": "ProtectOurVPC",
          "Effect": "Deny",
          "Action": [
              "ec2:Acc*"
          ],
          "Resource": [
              "*"
          ]
      }
  ]
}
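Being conscious about wildcards means checking what a pattern actually matches before you deploy it. The sketch below expands a pattern against a list of action names. The list here is a small hand-picked sample (the six actions from the policy above plus two unrelated ones), not the full set of EC2 actions, and `fnmatch`-style matching is assumed to approximate IAM's wildcard behavior.

```python
from fnmatch import fnmatchcase

# Small sample of EC2 actions; a real check would use the complete action list.
KNOWN_EC2_ACTIONS = [
    "ec2:AcceptReservedInstancesExchangeQuote",
    "ec2:AcceptTransitGatewayMulticastDomainAssociations",
    "ec2:AcceptTransitGatewayPeeringAttachment",
    "ec2:AcceptTransitGatewayVpcAttachment",
    "ec2:AcceptVpcEndpointConnections",
    "ec2:AcceptVpcPeeringConnection",
    "ec2:AllocateAddress",
    "ec2:AssociateRouteTable",
]


def expand(patterns, actions=KNOWN_EC2_ACTIONS):
    """Return every known action matched by at least one of the patterns."""
    return [
        action
        for action in actions
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns)
    ]


broad = expand(["ec2:Acc*"])
narrow = expand(["ec2:AcceptTransit*", "ec2:AcceptVpc*"])

# Actions the broad pattern catches that the narrower patterns leave alone.
print(sorted(set(broad) - set(narrow)))
```

In this sample the broad pattern also catches ec2:AcceptReservedInstancesExchangeQuote, which has nothing to do with networking. Whether that is acceptable is exactly the kind of decision you want to make consciously, and any action AWS adds later that starts with the same characters will be matched as well.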

Leverage nested OUs to apply more service control policies

OUs can be nested up to five levels deep. Given that SCPs are inherited from parent OUs, this allows us to layer SCPs on top of each other, which helps because only a limited number of SCPs (five) can be attached directly to a single root, OU or account. Combined with the additional levels of permissions introduced by identity-based policies and permissions boundaries, this gives you plenty of room to effectuate the desired permission set.
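The layering can be sketched as follows: an action passes the SCP check only if no SCP anywhere on the path from the root to the account denies it, and at least one SCP at every level allows it. This is a simplified model (actions only, no conditions), and the OU names and policies are hypothetical.

```python
from fnmatch import fnmatchcase


def _matches(patterns, action):
    """True if the action matches any pattern (case-insensitive)."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def scp_permits(action, levels):
    """levels: SCP lists ordered from the root down to the account.

    Each SCP is a dict with 'allow' and 'deny' pattern lists.
    """
    for scps in levels:
        # A deny at any level of the hierarchy blocks the action.
        if any(_matches(scp["deny"], action) for scp in scps):
            return False
        # Every level needs at least one SCP that allows the action.
        if not any(_matches(scp["allow"], action) for scp in scps):
            return False
    return True


full_access = {"allow": ["*"], "deny": []}
protect_network = {"allow": [], "deny": ["ec2:CreateVpc", "ec2:DeleteSubnet"]}
no_iam_users = {"allow": [], "deny": ["iam:CreateUser"]}

# Hypothetical hierarchy: root -> "workloads" OU -> nested "sandbox" OU.
path_to_sandbox_account = [
    [full_access],                   # root
    [full_access, protect_network],  # workloads OU
    [full_access, no_iam_users],     # sandbox OU, nested under workloads
]

print(scp_permits("s3:GetObject", path_to_sandbox_account))    # True
print(scp_permits("ec2:CreateVpc", path_to_sandbox_account))   # False: denied at workloads
print(scp_permits("iam:CreateUser", path_to_sandbox_account))  # False: denied at sandbox
```

Each level keeps its own budget of attached SCPs, so splitting guardrails over the hierarchy gives you more room than attaching everything at a single OU.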

Conclusion

In this blog post I’ve outlined the main challenges you face while implementing service control policies, identity-based policies and a permissions boundary. I gave you some insights into how Xebia approaches these challenges and things to be aware of while implementing your different layers of policies.

Although a lot of these guidelines seem trivial, we regularly see people struggling with implementing policies in a clear and consistent manner. Even though every customer case is unique, hopefully the pointers I shared within this article are enough to comfortably get you going. However, if you feel you could still use some help with this – you know where to find us!