
The cloud has fundamentally changed infrastructure. Instead of buying servers and waiting weeks for them to arrive, you provision them in seconds via API. Instead of managing physical networks, you define them in code. DevOps engineers need to understand cloud platforms deeply -- not just how to click through a console, but how to architect, automate, and operate cloud infrastructure at scale.
# Compute
EC2 - Virtual machines (instances)
ECS - Container service (Docker at scale)
EKS - Managed Kubernetes
Lambda - Serverless functions
Fargate - Serverless containers (no EC2 management)
# Storage
S3 - Object storage (static files, backups, artifacts)
EBS - Block storage attached to EC2 instances
EFS - Shared file system for multiple instances
ECR - Docker image registry
# Database
RDS - Managed relational databases (PostgreSQL, MySQL)
DynamoDB - NoSQL key-value store
ElastiCache - Managed Redis/Memcached
# Networking
VPC - Virtual private cloud (your isolated network)
ALB/NLB - Load balancers
Route 53 - DNS management
CloudFront - CDN
# DevOps Services
CodePipeline - CI/CD
CodeBuild - Build service
CloudWatch - Monitoring and logging
IAM - Identity and access management
# Never use root account for daily work
# Create IAM users with minimum required permissions
# Use IAM roles for EC2 instances and CI/CD (not access keys)
# Example IAM policy for S3 read-only access
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::my-bucket",
      "arn:aws:s3:::my-bucket/*"
    ]
  }]
}
# AWS CLI -- use named profiles
aws configure --profile production
aws s3 ls --profile production
# Check current identity
aws sts get-caller-identity
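If you manage several buckets, the read-only policy shown earlier can be generated per bucket rather than copied by hand. The following is a minimal sketch, not a definitive implementation: the function name `s3_read_only_policy` is an assumption made for illustration, and the output mirrors the JSON policy from this section.

```python
import json


def s3_read_only_policy(bucket: str) -> dict:
    """Build a read-only S3 policy document for a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # s3:ListBucket is evaluated against the bucket ARN,
                # s3:GetObject against the object ARNs underneath it.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # "my-bucket" is the placeholder bucket name used in the policy above.
    print(json.dumps(s3_read_only_policy("my-bucket"), indent=2))
```

Both resource entries are needed because the two actions apply to different resource types: the bucket itself and the objects inside it.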
# EC2 cost savings
Reserved Instances -- 1-3 year commitment, 30-60% savings
Spot Instances -- 70-90% savings, can be interrupted
Auto Scaling -- Scale down when load is low
# Right-sizing
# Monitor CPU/memory utilisation
# Downsize underutilised instances
# i-1234567890abcdef0 is a placeholder instance ID; replace it with your own
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistics Average \
  --period 3600 \
  --start-time 2026-01-01T00:00:00Z \
  --end-time 2026-01-07T00:00:00Z
# S3 cost savings
# Set lifecycle rules to move old data to Glacier
# Delete incomplete multipart uploads
# Enable S3 Intelligent-Tiering when access patterns are unpredictable
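For the right-sizing step, the hourly CPU averages returned by CloudWatch still need a rule for deciding what counts as underutilised. Here is one possible sketch. The function name and both thresholds (a mean below 10% and a peak below 40%) are assumptions chosen for illustration, not AWS recommendations, so tune them to your workload.

```python
from statistics import mean


def is_underutilised(hourly_cpu_averages, avg_threshold=10.0, peak_threshold=40.0):
    """Return True when both the mean and the peak of the hourly averages are low.

    hourly_cpu_averages: CPU utilisation percentages, one per hour.
    The thresholds are illustrative assumptions.
    """
    if not hourly_cpu_averages:
        return False  # no datapoints: do not recommend a change
    return (
        mean(hourly_cpu_averages) < avg_threshold
        and max(hourly_cpu_averages) < peak_threshold
    )


if __name__ == "__main__":
    quiet_week = [3.0, 5.5, 4.2, 8.0]
    print(is_underutilised(quiet_week))
```

Checking the peak as well as the mean avoids downsizing an instance that is idle most of the day but busy during a short daily spike.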
AWS has the largest market share (~32%) and the most job listings. Learn AWS first. GCP is strong for data/ML workloads. Azure dominates enterprise Microsoft environments. Once you know AWS deeply, GCP and Azure concepts transfer easily -- they solve the same problems with different interfaces.
AWS Certified Solutions Architect - Associate (exam code SAA-C03) is the most recognised entry-level certification and is widely respected by hiring managers, which makes it the standard starting point for cloud engineers. AWS Certified DevOps Engineer - Professional is the certification most specifically relevant to DevOps roles.
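A VPC (Virtual Private Cloud) is your isolated network in AWS. Resources inside a VPC can communicate with each other, subject to security groups and network ACLs, but are isolated from other customers' resources. Use public subnets (those with a route to an internet gateway) for resources that must be reachable from the internet, such as load balancers, and private subnets for databases and internal services. Resources in private subnets can still make outbound connections through a NAT gateway.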
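Before creating a VPC, it helps to plan how its address range will be divided between public and private subnets. The sketch below uses Python's standard `ipaddress` module to carve a VPC CIDR into equal blocks. The `10.0.0.0/16` range, the `/24` subnet size, and the function name `plan_subnets` are assumptions for illustration only.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr, new_prefix, public_count, private_count):
    """Split a VPC CIDR into equal blocks: public subnets first, then private."""
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = public_count + private_count
    # subnets() yields blocks lazily, so take only as many as required.
    blocks = list(islice(vpc.subnets(new_prefix=new_prefix), needed))
    if len(blocks) < needed:
        raise ValueError(f"{vpc_cidr} cannot hold {needed} /{new_prefix} subnets")
    return {
        "public": [str(block) for block in blocks[:public_count]],
        "private": [str(block) for block in blocks[public_count:]],
    }


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16", 24, public_count=2, private_count=2)
    print(plan["public"])
    print(plan["private"])
```

In practice you would place one public and one private subnet in each Availability Zone you use. The plan only allocates address ranges and leaves the routing (internet gateway versus NAT gateway) to your infrastructure code.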
ECS is AWS's proprietary container service -- simpler to operate and cheaper, since it has no per-cluster control plane fee. EKS is managed Kubernetes -- more complex, and billed per cluster, but portable to other Kubernetes environments. Use ECS for simple containerised applications. Use EKS when you need Kubernetes features or portability.
Set AWS Budgets alerts at 80% and 100% of your monthly budget. Enable Cost Anomaly Detection. Tag all resources with environment and project tags so costs can be attributed. Use Cost Explorer to analyse spending. Use IAM policies to restrict who can create expensive resources.
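The 80% and 100% alert levels boil down to a simple comparison of spend against budget. As a rough sketch of that logic (AWS Budgets evaluates this for you; the function name `crossed_thresholds` and the dollar figures are hypothetical):

```python
def crossed_thresholds(spend, budget, thresholds=(0.8, 1.0)):
    """Return the alert thresholds (as fractions of budget) that spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [threshold for threshold in thresholds if ratio >= threshold]


if __name__ == "__main__":
    # Hypothetical figures: 850 USD spent against a 1000 USD monthly budget.
    print(crossed_thresholds(850, 1000))
```

Two thresholds give you an early warning while there is still time to act, plus a hard signal when the budget is actually exhausted.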
In Part 12, we build a complete DevOps project bringing together everything from this series: CI/CD, Docker, Kubernetes, Terraform, and monitoring.
# 1. Use Spot Instances for non-critical workloads
# 70-90% cheaper than On-Demand
# Good for: CI/CD runners, batch processing, dev environments
resource "aws_spot_fleet_request" "ci_runners" {
  allocation_strategy = "diversified"
  spot_price          = "0.05" # maximum price per instance hour, in USD
  iam_fleet_role      = aws_iam_role.fleet.arn
  target_capacity     = 2

  launch_specification {
    instance_type = "t3.medium"
    ami           = data.aws_ami.ubuntu.id
  }
}
# 2. Reserved Instances for stable workloads
# 1-year commitment: 30-40% discount
# 3-year commitment: 50-60% discount
# Use Compute Savings Plans for flexibility
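To see what these discount ranges mean in money, you can apply them to an On-Demand hourly rate. This is a back-of-the-envelope sketch: the 0.10 USD hourly rate is a hypothetical figure, 730 hours is an approximation of one month (8760 / 12), and the discount ranges are the ones quoted in the comments above.

```python
def monthly_cost_range(on_demand_hourly, discount_range, hours=730):
    """Return (lowest, highest) monthly cost for a (min, max) discount range."""
    low_discount, high_discount = discount_range
    highest = on_demand_hourly * (1 - low_discount) * hours
    lowest = on_demand_hourly * (1 - high_discount) * hours
    return lowest, highest


if __name__ == "__main__":
    rate = 0.10  # hypothetical On-Demand price in USD per hour
    options = {
        "on-demand": (0.0, 0.0),
        "reserved 1-year": (0.30, 0.40),
        "reserved 3-year": (0.50, 0.60),
        "spot": (0.70, 0.90),
    }
    for name, discounts in options.items():
        lowest, highest = monthly_cost_range(rate, discounts)
        print(f"{name}: {lowest:.2f} to {highest:.2f} USD per month")
```

Actual prices vary by instance type, Region, and market conditions, so treat the output as an order-of-magnitude comparison only.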
# 3. S3 lifecycle policies for storage cost
resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "archive-and-expire"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket
    filter {}

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }
}
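The lifecycle rule above defines a simple timeline for each log object. The sketch below restates that timeline as a function so you can check where an object of a given age should sit. It assumes objects are uploaded to the STANDARD class, and `"EXPIRED"` is a label invented here for deleted objects. S3 applies lifecycle rules asynchronously, so real transitions can lag behind these boundaries.

```python
def storage_class_for_age(age_days):
    """Map an object's age in days to the class the lifecycle rule targets."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days >= 365:
        return "EXPIRED"  # the expiration action deletes the object
    if age_days >= 90:
        return "GLACIER"
    if age_days >= 30:
        return "STANDARD_IA"
    return "STANDARD"  # assumed upload class


if __name__ == "__main__":
    for age in (0, 30, 90, 365):
        print(age, storage_class_for_age(age))
```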
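# RDS cross-Region read replica (must be promoted manually to become the primary)
# A cross-Region source must be given as an ARN; 123456789012 is a placeholder account ID
aws rds create-db-instance-read-replica \
  --db-instance-identifier mydb-dr \
  --source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:mydb \
  --source-region us-east-1 \
  --region ap-south-1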
# S3 Cross-Region Replication
aws s3api put-bucket-replication --bucket source-bucket --replication-configuration file://replication.json
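The `put-bucket-replication` command reads its rules from `replication.json`, which is not shown here. One way to produce a minimal version of that file is sketched below. The role ARN, account ID, destination bucket name, and rule ID are all placeholders, and the single rule replicates every object (empty prefix) without replicating delete markers. Versioning must be enabled on both buckets before replication can be configured.

```python
import json


def replication_config(role_arn, destination_bucket, rule_id="replicate-all"):
    """Build a one-rule S3 replication configuration."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder role ARN and bucket name; substitute your own.
    config = replication_config(
        "arn:aws:iam::123456789012:role/s3-replication",
        "destination-bucket",
    )
    with open("replication.json", "w") as handle:
        json.dump(config, handle, indent=2)
```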
# Route 53 health check failover
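resource "aws_route53_record" "primary" {
  zone_id         = aws_route53_zone.main.zone_id # placeholder hosted zone
  name            = "app.example.com"             # placeholder record name
  type            = "A"
  ttl             = 60
  records         = [aws_eip.primary.public_ip]   # placeholder primary endpoint
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id
  failover_routing_policy { type = "PRIMARY" }
}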
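Failover routing comes down to one decision per DNS query: answer with the primary record while its health check passes, otherwise answer with the secondary. The sketch below models that decision in simplified form. It assumes a matching `SECONDARY` record exists (none is shown here), and falling back to the primary when both endpoints are unhealthy is this sketch's own choice, not a statement of Route 53's exact behaviour.

```python
def resolve(primary_healthy, secondary_healthy):
    """Pick which failover record answers a query, given health check results."""
    if primary_healthy:
        return "primary"
    if secondary_healthy:
        return "secondary"
    # Both unhealthy: this sketch falls back to the primary (an assumption).
    return "primary"


if __name__ == "__main__":
    print(resolve(primary_healthy=False, secondary_healthy=True))
```

A low TTL on the records matters here, because resolvers keep serving the cached primary answer until it expires.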