In the last few years the cost of running security has skyrocketed. We’ve seen an explosion of security solutions, all fighting for their own place in cybersecurity. I’m pretty sure each solution has its purpose, but from a CISO’s perspective the landscape is getting more complex and more expensive each year or, to put it kindly, less cost-effective.
Public cloud is the main driver for innovation. With the arrival of new thinking and new capabilities, and with the reinvention of how we run IT, things have changed tremendously.
I would like to take the opportunity to mix two hot topics in a series of blogs: security and cost optimization. And yes (spoiler), the two can strengthen each other. In this and coming blogs I’m sharing the experience we gained over the last years across dozens of migrations and implementations.
One of the most painful topics during a migration is Active Directory integration, not only from a financial perspective but also from a technical one. For decades we’ve been implementing Active Directory as our primary user directory and subsequently joined all our resources to the directory. This gave us the feeling that we had some kind of fine-grained control over who had access to which resource.
The main technical problem with AD is that it is tightly integrated with the infrastructure. For example, it replaces your DNS resolver and it relies on IP communication. This forces you to create an attack surface that more or less spans all your networks, since you have to expose AD to different types of networks: exposing your private AD to resources hosted in a public subnet, for example. Also, if you have multiple domain controllers, they all need to be reachable by every resource. This leads to a very complex routing setup as your environment grows, especially when running in hybrid mode. This is something you want to avoid when using a platform with very strong isolation capabilities.
Another challenge is the pets versus cattle analogy. Resources in public cloud (if done the correct way) are dynamic and short-lived. They come and go as demand changes. I’m aware that you can script domain joining, but when you look at AD’s design you notice that it was never built to facilitate such a dynamic environment. After a while you end up with thousands of orphaned resources in the directory. Nor is it highly scalable. And don’t forget that AD prefers all clients to use it as their DNS resolver.
I could probably talk for hours about why extending your AD setup for administrative purposes to a capable platform is not a very good idea in general. Let’s focus instead on how to solve this in a more effective way. One of the key uses of AD is to control RDP/SSH access to instances for remote desktop/shell sessions. AD takes care of the authentication and authorization process, so for each connection attempt the user is prompted to log in (or is authenticated transparently via Kerberos). Based on policies, a user or group is allowed to enter an instance. This is not a scalable model, since each instance or group of instances needs a Group Policy Object allowing the user identity remote access. The process of domain joining is also cumbersome: think of hacky PowerShell scripts and very odd Linux configuration (different per distribution).
To simplify remote management AWS launched two features: SSM Session Manager (for EC2 instances) and ECS Exec (for containers). Session Manager works on Linux, Windows and macOS instances. Both features have strong integration with the platform and are easy to implement and maintain. Session Manager works via the AWS Systems Manager Agent (SSM Agent), which comes preinstalled on recent AWS-provided AMIs. In AWS it’s best practice to attach an instance profile to each compute instance. The role behind this profile can be granted access to AWS resources. To allow the SSM Agent to ‘call home’ and register itself with the SSM service, simply attach the AWS managed policy AmazonSSMManagedInstanceCore to the instance profile. By default the agent connects to the public API endpoints of SSM, but of course you can deploy VPC endpoints to avoid traffic traveling across the Internet. Now launch your instance and the preparations on the platform are complete. Easy, isn’t it?
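To make the instance profile setup a bit more concrete, here is a minimal sketch in Python that only builds the pieces involved: the trust policy that lets EC2 assume the role, the ARN of the managed policy named above, and the interface endpoint service names usually created for Session Manager. The function names and the example region are my own assumptions for illustration, and whether you need all three endpoints depends on your agent version, so treat this as a starting point rather than a finished setup.

```python
import json

# AWS managed policy named in the text. AWS managed policies live under the "aws" account alias.
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def ec2_trust_policy() -> dict:
    """Trust policy that lets EC2 assume the role behind the instance profile."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def ssm_endpoint_services(region: str) -> list[str]:
    """Interface endpoint service names commonly created for Session Manager.

    Assumption: older SSM Agent versions also need ec2messages; newer ones may not.
    """
    names = ("ssm", "ssmmessages", "ec2messages")
    return [f"com.amazonaws.{region}.{name}" for name in names]


if __name__ == "__main__":
    print(json.dumps(ec2_trust_policy(), indent=2))
    print(SSM_CORE_POLICY_ARN)
    # "eu-west-1" is only an example region.
    print(ssm_endpoint_services("eu-west-1"))
```

You would feed the trust policy and the policy ARN into whatever tooling you already use (CloudFormation, Terraform, the SDK) to create the role and instance profile.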
Now it’s time to allow your IAM users/roles to access the instances. Depending on your needs, you decide who is allowed to call the ssm:StartSession action. This can be combined with your AWS account structure, federation, and IAM conditions such as tags, time, source IP, etc. to make access as fine-grained as needed.
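As an illustration of the tag-based approach, the sketch below builds an IAM policy document that allows ssm:StartSession only on instances carrying a given tag, plus the ability to resume and terminate your own sessions. The tag key and value are placeholders I made up. The session resource pattern with ${aws:username} suits IAM users; federated roles typically need a different policy variable, and a real policy may also have to grant access to the session document you use, so please check this against the AWS documentation before relying on it.

```python
import json


def start_session_policy(tag_key: str, tag_value: str) -> dict:
    """Allow ssm:StartSession only on EC2 instances tagged tag_key=tag_value."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StartSessionOnTaggedInstances",
                "Effect": "Allow",
                "Action": "ssm:StartSession",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    # Condition key on the tag of the target instance.
                    "StringLike": {f"ssm:resourceTag/{tag_key}": [tag_value]}
                },
            },
            {
                "Sid": "ManageOwnSessions",
                "Effect": "Allow",
                "Action": ["ssm:TerminateSession", "ssm:ResumeSession"],
                # Assumption: IAM users; session IDs start with the user name.
                "Resource": "arn:aws:ssm:*:*:session/${aws:username}-*",
            },
        ],
    }


if __name__ == "__main__":
    # "Environment" / "dev" are hypothetical tag values for illustration.
    print(json.dumps(start_session_policy("Environment", "dev"), indent=2))
```

Further IAM conditions (time of day, source IP) would be added to the same Condition block of the first statement.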
Note: this also works for hybrid environments. You can install the SSM Agent on your on-premises resources and manage your whole fleet via Session Manager. It is also compatible with outbound/forward proxies, so technically it fits almost any environment.
From an audit perspective, all API actions are recorded in AWS CloudTrail, and session data can be logged to S3 or Amazon CloudWatch Logs. If you want real-time event response, you can use Amazon EventBridge in conjunction with Amazon Simple Notification Service to forward events to Slack, your SIEM or ITSM tooling, Lambda (for auto-remediation), etc.
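For the real-time part, a possible EventBridge event pattern for new sessions could look like the sketch below. It assumes CloudTrail management events are being delivered to EventBridge in the account. The small matches helper is my own, deliberately simplified stand-in for EventBridge matching (exact values and nested objects only) so you can sanity-check a pattern against a sample event locally; it is not how the service itself is implemented.

```python
import json

# Event pattern for StartSession calls recorded by CloudTrail.
START_SESSION_PATTERN = {
    "source": ["aws.ssm"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ssm.amazonaws.com"],
        "eventName": ["StartSession"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified local matcher: lists of exact values and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event must have a nested object that matches too.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical, heavily trimmed sample event for illustration.
    sample = {
        "source": "aws.ssm",
        "detail-type": "AWS API Call via CloudTrail",
        "detail": {"eventSource": "ssm.amazonaws.com", "eventName": "StartSession"},
    }
    print(json.dumps(START_SESSION_PATTERN, indent=2))
    print(matches(START_SESSION_PATTERN, sample))
```

The pattern itself is what you would attach to an EventBridge rule, with an SNS topic or a Lambda function as the target.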
As a final step, don’t forget to remove the RDP and SSH ingress rules from your Security Groups, since direct access is no longer required. If, for any reason, you use SSH for port forwarding: don’t worry, this is supported by SSM as well; the AWS documentation on Session Manager port forwarding has the details. Session Manager provides access via the web console and via the AWS Command Line Interface, so you can choose which one to use.
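For the Security Group cleanup, a small helper like the one below can flag the ingress rules that are worth reviewing. It is a sketch that assumes the rules are in the IpPermissions shape returned by the EC2 DescribeSecurityGroups API (IpProtocol, FromPort, ToPort); the function name is my own. It only reports candidates and removes nothing, and all-traffic rules are flagged for review rather than assumed to be wrong.

```python
ADMIN_PORTS = {22, 3389}  # SSH and RDP


def admin_ingress_rules(ip_permissions: list[dict]) -> list[dict]:
    """Return ingress rules whose port range covers SSH (22) or RDP (3389)."""
    flagged = []
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports, so SSH/RDP are included.
            flagged.append(rule)
            continue
        if protocol != "tcp":
            continue
        low, high = rule.get("FromPort"), rule.get("ToPort")
        if low is None or high is None:
            continue
        if any(low <= port <= high for port in ADMIN_PORTS):
            flagged.append(rule)
    return flagged


if __name__ == "__main__":
    # Hypothetical rules for illustration.
    rules = [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443},
        {"IpProtocol": "tcp", "FromPort": 3000, "ToPort": 4000},
    ]
    for rule in admin_ingress_rules(rules):
        print(rule)
```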
Note: ECS Exec currently only supports access via the AWS CLI.
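To give an idea of what the CLI side looks like, the sketch below builds the argument lists for three AWS CLI calls: an interactive session, a port forwarding session, and an ECS Exec shell. The helper names and all IDs are placeholders of mine. Running the resulting commands assumes the AWS CLI and the Session Manager plugin are installed locally and, for ECS Exec, that the feature is enabled on the task.

```python
import json
import shlex


def start_session_cmd(instance_id: str) -> list[str]:
    """Interactive shell on an instance via Session Manager."""
    return ["aws", "ssm", "start-session", "--target", instance_id]


def port_forward_cmd(instance_id: str, remote_port: int, local_port: int) -> list[str]:
    """Forward a local port to a port on the instance, replacing an SSH tunnel."""
    params = {"portNumber": [str(remote_port)], "localPortNumber": [str(local_port)]}
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSession",
        "--parameters", json.dumps(params),
    ]


def ecs_exec_cmd(cluster: str, task: str, container: str, command: str = "/bin/sh") -> list[str]:
    """Interactive command in a running container via ECS Exec."""
    return [
        "aws", "ecs", "execute-command",
        "--cluster", cluster,
        "--task", task,
        "--container", container,
        "--interactive",
        "--command", command,
    ]


if __name__ == "__main__":
    # All identifiers below are hypothetical.
    print(shlex.join(start_session_cmd("i-0123456789abcdef0")))
    print(shlex.join(port_forward_cmd("i-0123456789abcdef0", 3389, 13389)))
    print(shlex.join(ecs_exec_cmd("my-cluster", "my-task-id", "app")))
```

With the port forwarding example, an RDP client pointed at localhost:13389 would reach port 3389 on the instance without any inbound rule in the Security Group.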
The running costs for the solution above are probably way below the price of your lunch (even when working from home). SSM Session Manager itself is free of charge; you pay for CloudWatch Logs and S3 storage, and VPC endpoints carry an hourly charge.
Introducing AWS Systems Manager unlocks a number of new capabilities, while your security footprint (operational management, running costs, number of tools, etc.) is reduced to a minimum. The beauty of this solution is also that you don’t have to expose ports or establish cross-network/cross-environment connectivity, which would normally harm your level of isolation and increase the blast radius. I strongly advise reading about the other capabilities of SSM in the AWS Systems Manager documentation, as it offers a lot of powerful features. Because the agent is lightweight, it can also run on architectures with fewer resources, such as IoT devices.