This is a guest post from Prodigal DevSecOps Engineer Prashant Banthia. It originally appeared on Medium - follow him there or on LinkedIn!
In the fast-paced environment of startups, cloud costs often get overlooked in the rush to build and deploy products, leading to inflated expenses that become a burden for management. This was the case for us at Prodigal, where our cloud costs ballooned and became unsustainable for our scale.
Taking inspiration from Benjamin Franklin’s quote, “A penny saved is a penny earned,” we recently started cloud cost optimization measures, tackling our growing AWS bill and successfully cutting our monthly cloud expenses by 30%.
In this blog post, I’ll share the strategies we implemented to optimize our AWS services, which could help other companies manage their cloud infrastructure more efficiently.
S3 storage can quickly become costly if not managed properly. We used various storage classes and lifecycle policies to optimize costs.
S3 is the cheapest, go-to storage option in AWS, but if left unmanaged it can become a dumping ground, with critical customer data, logs, test data, and more scattered across hundreds of S3 buckets, leaving you clueless about what to do with it.
To tackle this, first target the buckets that contribute most to your costs, and classify the data stored in them by how long it is actively used and whether it can be deleted or archived.
Based on this analysis, you can use S3 storage classes to save costs. Amazon S3 offers a range of storage classes designed for different storage needs and access patterns. Each storage class (Standard, Standard-Infrequent Access, One Zone-Infrequent Access, Intelligent-Tiering, Glacier, Glacier Deep Archive) differs in storage cost, retrieval fees, data transfer costs, and minimum storage duration, all published on the AWS S3 pricing page.
We implemented lifecycle policies to transition objects between storage classes and eventually delete them if they were no longer needed. This ensured that we were only paying for the storage we actually used.
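As an illustration of such a policy, here is a minimal sketch (the prefix, day thresholds, and bucket name are hypothetical, not our actual configuration):

```python
# Sketch of an S3 lifecycle configuration. The prefix and day
# thresholds are illustrative -- tune them to your access patterns.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

# Applying it with boto3 (requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-logs-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=lifecycle_config,
# )
```

This single rule moves log objects to Standard-IA after 30 days, to Glacier after 90, and deletes them after a year.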
Tip: before applying lifecycle policies directly to the data, calculate the lifecycle transition costs. Transitions are billed per object, so the cost can be significant for a large number of objects. In that case, you can instead download the data to an EC2 instance, compress it into a single archive file, and upload that one object to the cheaper storage class.
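That check is simple arithmetic. A small estimator, with the per-1,000-request rate passed in as a parameter rather than a quoted price (look up the current rate on the S3 pricing page):

```python
def lifecycle_transition_cost(num_objects: int, rate_per_1000: float) -> float:
    """Estimated one-time cost of transitioning objects between storage
    classes: S3 bills lifecycle transitions per 1,000 requests."""
    return (num_objects / 1000.0) * rate_per_1000

# Example: 50 million small objects at an illustrative $0.05 / 1,000 requests.
cost = lifecycle_transition_cost(50_000_000, 0.05)  # 2500.0
```

With tens of millions of small objects the transition itself can cost thousands of dollars, which is exactly when archiving into fewer, larger objects pays off.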
We analyzed our EC2 instances and identified those that were over-provisioned. By right-sizing them, we ensured that we only paid for the resources we actually needed.
Steps taken:

- Consider changing the instance family where applicable. For instance, if you are using t- or m-type instances and memory utilization is high relative to CPU utilization, switch to r-type instances.
- If your workloads support ARM-based CPUs, consider moving to AWS Graviton instances, as they provide better price-performance, leading to significant cost savings.
- If your workloads do not support ARM-based CPUs, consider instance types with AMD processors, such as r5a and t3a, which are significantly cheaper than the equivalent r5 and t3 instance types.
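The family-selection logic above can be sketched as a simple heuristic (the utilization thresholds here are illustrative, not AWS guidance):

```python
def suggest_family(avg_cpu_pct: float, avg_mem_pct: float) -> str:
    """Rough heuristic for picking an instance family from average
    utilization. Thresholds are illustrative, not AWS guidance."""
    if avg_cpu_pct < 20 and avg_mem_pct < 20:
        return "downsize"                  # over-provisioned: pick a smaller size
    if avg_mem_pct > 2 * avg_cpu_pct:
        return "r (memory-optimized)"      # memory-bound workload
    if avg_cpu_pct > 2 * avg_mem_pct:
        return "c (compute-optimized)"     # CPU-bound workload
    return "m/t (general purpose)"

print(suggest_family(15, 70))  # memory-heavy -> r family
```

In practice you would feed this from two weeks or more of CloudWatch utilization data rather than a single snapshot.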
Regular audits helped us identify and remove unused or idle resources such as AMIs and snapshots older than a year, unattached EBS volumes, and load balancers and Elastic IPs. This step alone contributed to a significant reduction in our monthly AWS spend.
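Audits like this are easy to script. As a sketch, the helper below filters an EC2 DescribeVolumes-shaped response for unattached volumes; the boto3 call is shown commented out because it needs AWS credentials:

```python
def unattached_volumes(describe_volumes_response: dict) -> list[str]:
    """Return IDs of EBS volumes in the 'available' state, i.e. not
    attached to any instance (shape follows EC2 DescribeVolumes)."""
    return [
        v["VolumeId"]
        for v in describe_volumes_response.get("Volumes", [])
        if v.get("State") == "available"
    ]

# With boto3 (requires AWS credentials):
# import boto3
# ec2 = boto3.client("ec2")
# resp = ec2.describe_volumes(
#     Filters=[{"Name": "status", "Values": ["available"]}]
# )

# Illustrative sample in the DescribeVolumes response shape:
sample = {
    "Volumes": [
        {"VolumeId": "vol-001", "State": "in-use"},
        {"VolumeId": "vol-002", "State": "available"},
    ]
}
print(unattached_volumes(sample))  # ['vol-002']
```

The same pattern works for old snapshots and AMIs: list, filter by age or attachment, review, then delete.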
We evaluated all active services and realized that some, like AWS Config and Amazon Inspector, weren’t essential for our current operations yet made up a significant part of our AWS bill. Disabling these services cut our costs without impacting our performance or security.
Also, check whether you have two management-event CloudTrail trails running in different AWS regions. AWS delivers one copy of management events free per account and charges for additional copies, so a second trail recording the same events costs extra. If you are running multiple management trails, consider keeping only one of them.
A Spot Instance is an instance that uses spare EC2 capacity that is available for less than the On-Demand price. Because Spot Instances enable you to request unused EC2 instances at steep discounts, you can lower your Amazon EC2 costs significantly. The only downside is that they can be interrupted by AWS at any time.
Spot Instances, when used with Auto Scaling groups or with EKS/ECS, can help you save a lot. In our case we were using EKS, and we significantly reduced costs by leveraging solutions like Karpenter and Pod Disruption Budgets to utilize Spot Instances without affecting availability. Check out my blog on EKS cost savings for more details.
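For reference, a Pod Disruption Budget is just a small manifest. Here is a minimal sketch expressed as a Python dict, as you would pass it to the Kubernetes API (the app label and minAvailable value are illustrative):

```python
# A PodDisruptionBudget manifest as a Python dict (labels and
# minAvailable are illustrative). Applied to the cluster, it caps how
# many replicas a voluntary disruption -- such as Karpenter draining a
# reclaimed spot node -- may evict at once.
pdb = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "api-pdb"},
    "spec": {
        "minAvailable": 2,
        "selector": {"matchLabels": {"app": "api"}},
    },
}
```

With minAvailable set to 2, Kubernetes refuses evictions that would leave fewer than two matching pods running, which is what keeps spot interruptions from taking a service fully down.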
We optimized data transfer costs by keeping data transfers within the same AWS region and using Amazon CloudFront to reduce charges for data leaving AWS. Keeping high-bandwidth workloads in public subnets to bypass NAT gateways can also help in some cases. To reduce inter-AZ data transfer, you can also create a NAT gateway per Availability Zone so that internet-bound traffic stays within its own Availability Zone.
Also, try to leverage VPC endpoints as much as possible, as they bypass the NAT gateway. A VPC endpoint creates a private connection between your VPC and supported AWS services such as S3 and DynamoDB, or VPC endpoint services powered by PrivateLink, using private IP addresses. Traffic between the VPC and the AWS service does not leave the Amazon network.
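Whether a NAT gateway per AZ actually saves money is simple arithmetic: weigh the extra gateways' hourly cost against the inter-AZ transfer charges avoided. A sketch, with all rates passed in as parameters rather than quoted prices:

```python
def nat_per_az_monthly_delta(
    extra_gateways: int,
    nat_hourly_rate: float,       # $/hour per NAT gateway (check current pricing)
    cross_az_gb: float,           # GB/month currently crossing AZs to reach NAT
    cross_az_rate_per_gb: float,  # $/GB of inter-AZ transfer avoided
    hours_per_month: float = 730.0,
) -> float:
    """Positive result = adding per-AZ NAT gateways saves money overall."""
    added_cost = extra_gateways * nat_hourly_rate * hours_per_month
    avoided_cost = cross_az_gb * cross_az_rate_per_gb
    return avoided_cost - added_cost

# Illustrative numbers only -- substitute your own traffic and rates:
delta = nat_per_az_monthly_delta(2, 0.045, 8000, 0.02)
```

Low-traffic VPCs often come out negative here, which is why the per-AZ pattern is worth modeling before adopting it.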
Once your EC2 instances are optimized, do capacity planning for the baseline capacity you need to run your workloads, and based on that assessment, purchase Savings Plans and reservations.
Reserved Instances provide significant savings on your Amazon EC2 costs compared to On-Demand Instance pricing. Reserved Instances are not physical instances, but rather a billing discount applied to the use of On-Demand Instances in your account when you commit to a specific instance configuration for a 1- or 3-year term.
Savings Plans offer significant savings over On-Demand Instances, just like EC2 Reserved Instances, in exchange for a commitment to a specific amount of compute usage (measured in $/hour) for a 1- or 3-year term.
AWS recommends Savings Plans over Reserved Instances because with Reserved Instances you commit to a specific instance configuration, whereas with Savings Plans you have the flexibility to use the instance configurations that best meet your needs.
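The savings from either commitment reduce to the same arithmetic. A minimal sketch (the hourly rates in the example are illustrative, not quoted AWS prices):

```python
def commitment_savings(
    on_demand_hourly: float,
    committed_hourly: float,
    hours: float = 8760.0,  # one year of continuous usage
) -> float:
    """Annual savings from a Savings Plan / Reserved Instance commitment
    versus running the same usage On-Demand (rates are inputs, not quotes)."""
    return (on_demand_hourly - committed_hourly) * hours

# Example: an illustrative $0.40/hr on-demand workload committed at $0.26/hr.
annual = commitment_savings(0.40, 0.26)
```

The key caveat: only commit to the baseline you are confident will run continuously, since unused commitment is billed regardless.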
Amazon CloudWatch is an essential tool for monitoring and managing your AWS resources and applications. However, its costs can quickly add up if not managed properly. Here’s a breakdown of the different CloudWatch costs and some strategies to optimize and save on these expenses.
CloudWatch charges are primarily based on three factors:
i. Data ingestion
ii. Storage
iii. Queries
Data ingestion costs are incurred when you send custom metrics, logs, or events to CloudWatch; pricing is based on the volume of data ingested (see the AWS CloudWatch pricing page for current rates).
To save on data ingestion costs, reduce log verbosity at the source (for example, log at INFO or WARN rather than DEBUG in production), filter out noisy or low-value log lines before shipping them, and avoid publishing high-cardinality custom metrics you don't actually use in alarms or dashboards.
Storage costs are incurred for retaining your logs and metrics in CloudWatch, billed per GB-month of stored log data.
To save on storage costs, set an explicit retention period on every log group (the default is to retain logs forever) and export logs you rarely access to S3, where long-term storage is much cheaper.
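Setting retention is a one-line API call per log group. The sketch below also rounds a requested period up to one of the values CloudWatch Logs accepts (the set shown is a commonly used subset from the PutRetentionPolicy documentation, and the log group name is hypothetical):

```python
# CloudWatch Logs only accepts specific retention periods (in days).
# This is a commonly used subset of the values PutRetentionPolicy allows.
ALLOWED_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180,
    365, 400, 545, 731, 1827, 3653,
}

def pick_retention(requested_days: int) -> int:
    """Round a requested retention period up to the nearest allowed value."""
    return min(d for d in ALLOWED_RETENTION_DAYS if d >= requested_days)

# Applying it with boto3 (requires AWS credentials):
# import boto3
# logs = boto3.client("logs")
# logs.put_retention_policy(
#     logGroupName="/example/app",  # hypothetical log group name
#     retentionInDays=pick_retention(45),
# )
```

Running this across all log groups in an account is usually one of the quickest CloudWatch wins, since log groups created by default retain data indefinitely.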
Query costs are incurred when you query your logs and metrics in CloudWatch; Logs Insights queries, for example, are billed per GB of data scanned.
To save on query costs, narrow the time range of your queries, query only the log groups you actually need, and keep log groups small so that each query scans less data.
If you are seeing high query costs, it is likely that one log group holds a large amount of data and many queries are being run against it. To mitigate this, reduce the retention period of that log group and, if possible, split the large log group into smaller ones to reduce the amount of data scanned, and hence the query costs.
In Cost Explorer, group CloudWatch costs by API operation to identify which factor is contributing most to the costs.
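As a sketch of that analysis, the helper below sums unblended cost per operation from a GetCostAndUsage-shaped response; the boto3 call is commented out since it needs credentials, and the dates and amounts in the example are illustrative:

```python
def cost_by_operation(results_by_time: list) -> dict:
    """Sum unblended cost per API operation from a Cost Explorer
    GetCostAndUsage response grouped by the OPERATION dimension."""
    totals: dict = {}
    for period in results_by_time:
        for group in period.get("Groups", []):
            op = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[op] = totals.get(op, 0.0) + amount
    return totals

# With boto3 (requires AWS credentials):
# import boto3
# ce = boto3.client("ce")
# resp = ce.get_cost_and_usage(
#     TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},
#     Granularity="MONTHLY",
#     Filter={"Dimensions": {"Key": "SERVICE",
#                            "Values": ["AmazonCloudWatch"]}},
#     Metrics=["UnblendedCost"],
#     GroupBy=[{"Type": "DIMENSION", "Key": "OPERATION"}],
# )
# totals = cost_by_operation(resp["ResultsByTime"])

# Illustrative sample in the GetCostAndUsage response shape:
sample = [{"Groups": [
    {"Keys": ["PutLogEvents"],
     "Metrics": {"UnblendedCost": {"Amount": "120.5", "Unit": "USD"}}},
    {"Keys": ["GetMetricData"],
     "Metrics": {"UnblendedCost": {"Amount": "30.0", "Unit": "USD"}}},
]}]
```

A breakdown dominated by PutLogEvents points to ingestion, while heavy GetMetricData or StartQuery usage points to dashboards and queries, which tells you which of the strategies above to apply first.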
Optimizing AWS costs requires a continuous and strategic approach. By understanding your current spending, rightsizing resources, leveraging cost-saving plans, and regularly reviewing your infrastructure, you can significantly reduce your AWS expenses without compromising on performance or scalability.
Effective AWS cost management not only helps in reducing expenses but also in reinvesting savings into other business areas. Stay proactive, use the tools and services AWS provides, and make cost optimization a fundamental part of your cloud strategy. By doing so, you’ll ensure your organization gets the maximum value from its AWS investment.