A server you cannot see is a server you cannot trust. Monitoring tells you what is happening inside your EC2 instances and across your AWS infrastructure: CPU usage, memory, disk, network traffic, application errors, and more. Amazon CloudWatch is the built-in monitoring service, and most AWS services publish metrics to it. Learning to use it well is what separates reactive operations from proactive ones.
CloudWatch is AWS's monitoring and observability platform. It collects metrics from AWS services automatically, stores logs from your applications and OS, lets you set alarms that notify you or trigger automated actions, and provides dashboards to visualize your infrastructure health. Every EC2 instance automatically sends basic metrics to CloudWatch, including CPU utilization, network in/out, and status checks. The instance-level disk I/O metrics cover instance store volumes only; EBS volumes report under their own AWS/EBS namespace. Memory usage and disk space are not visible from outside the operating system, so for those you need to install the CloudWatch agent.
Without any setup, CloudWatch tracks EC2 metrics such as CPUUtilization, NetworkIn, and NetworkOut at 5-minute intervals (1-minute intervals with detailed monitoring enabled). You can read them back from the CLI:
# Get CPU metrics for the last hour
aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization --dimensions Name=InstanceId,Value=i-0abc123 --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) --period 300 --statistics Average
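The one-liner above is awkward to reuse, and get-metric-statistics does not promise to return datapoints in time order. A minimal sketch of a wrapper follows, assuming GNU date (as on Amazon Linux) and a configured AWS CLI. The function name cpu_average and its arguments are hypothetical, not part of the CLI.

```shell
# cpu_average is a hypothetical helper name, not an AWS CLI command.
# Usage: cpu_average <instance-id> [minutes-back]
cpu_average() {
  local instance_id="$1"
  local minutes="${2:-60}"
  local start end

  # GNU date syntax; BSD/macOS date needs different flags.
  start="$(date -u -d "${minutes} minutes ago" +%Y-%m-%dT%H:%M:%SZ)"
  end="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

  # --query sorts the datapoints by timestamp and keeps two columns.
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --start-time "$start" \
    --end-time "$end" \
    --period 300 \
    --statistics Average \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Average]' \
    --output text
}
```

Calling cpu_average i-0abc123 180 would then list the 5-minute averages for the last three hours, oldest first.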
To get memory usage, disk usage by path, and custom metrics, install the CloudWatch agent:
# Install on Amazon Linux 2023
sudo dnf install amazon-cloudwatch-agent -y
# Create agent configuration
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
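# Load the config the wizard saved (bin/config.json) and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
# Start the agent automatically on boot
sudo systemctl enable amazon-cloudwatch-agent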
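The configuration wizard asks which metrics you want to collect. Choose memory and disk, and optionally Nginx access logs. The agent can only publish data if the instance has an IAM role that permits it; the AWS managed policy CloudWatchAgentServerPolicy covers this.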
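If you would rather not answer the wizard's prompts, a hand-written config can do the same job. The sketch below writes a metrics-only config that collects memory usage and root-filesystem usage. The file name cwagent-metrics.json and the 60-second interval are assumptions chosen for illustration.

```shell
# Write a minimal metrics-only agent config to the current directory.
# The file name is an arbitrary choice for this sketch.
cat > cwagent-metrics.json <<'EOF'
{
  "agent": {
    "metrics_collection_interval": 60
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"]
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["/"]
      }
    }
  }
}
EOF

# To load it on the instance (needs root and a suitable instance role):
#   sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
#     -a fetch-config -m ec2 -s -c "file:$PWD/cwagent-metrics.json"
```

Metrics collected this way appear under the CWAgent namespace by default, not under AWS/EC2.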
Alarms notify you when a metric crosses a threshold. Let us create an alarm for high CPU usage:
# First, create an SNS topic for notifications
aws sns create-topic --name ec2-alerts
# Subscribe your email (replace 123456789012 with your 12-digit account ID)
# SNS sends a confirmation link; nothing is delivered until you click it
aws sns subscribe --topic-arn arn:aws:sns:ap-south-1:123456789012:ec2-alerts --protocol email --notification-endpoint your@email.com
# Create alarm: alert if average CPU > 80% for 2 consecutive 5-min periods
aws cloudwatch put-metric-alarm --alarm-name "High-CPU-EC2" --alarm-description "CPU above 80%" --metric-name CPUUtilization --namespace AWS/EC2 --dimensions Name=InstanceId,Value=i-0abc123 --period 300 --evaluation-periods 2 --threshold 80 --comparison-operator GreaterThanThreshold --statistic Average --alarm-actions arn:aws:sns:ap-south-1:123456789012:ec2-alerts
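Before relying on the alarm, it helps to check that the notification path works without waiting for a real CPU spike. A possible sketch is below: the two aws subcommands are real, while the function names alarm_state and force_alarm are hypothetical helpers.

```shell
# Hypothetical helper names; the aws subcommands themselves are real.

# Print the current state of an alarm: OK, ALARM or INSUFFICIENT_DATA.
alarm_state() {
  aws cloudwatch describe-alarms \
    --alarm-names "$1" \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}

# Temporarily push an alarm into ALARM so its actions (here, SNS) fire.
# CloudWatch resets the state at the next evaluation of the real metric.
force_alarm() {
  aws cloudwatch set-alarm-state \
    --alarm-name "$1" \
    --state-value ALARM \
    --state-reason "Manual test of the notification path"
}
```

Running force_alarm High-CPU-EC2 should produce an email within a minute or so, provided the subscription has been confirmed.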
You can send your application logs to CloudWatch Logs for centralized storage and searching:
# Add to CloudWatch agent config file
# /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/nginx/access.log",
            "log_group_name": "/ec2/nginx/access",
            "log_stream_name": "{instance_id}"
          },
          {
            "file_path": "/var/log/nginx/error.log",
            "log_group_name": "/ec2/nginx/error",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}
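Once the agent is shipping these files, two follow-up steps are worth sketching: checking that events actually arrive, and capping retention, since log groups keep data indefinitely unless told otherwise. The function names below are hypothetical. aws logs tail assumes AWS CLI v2, and the 30-day default is an arbitrary example value.

```shell
# Hypothetical helper names; requires AWS CLI v2 for `aws logs tail`.

# Show recent events from a log group, e.g. recent_logs /ec2/nginx/access 10m
recent_logs() {
  aws logs tail "$1" --since "${2:-10m}"
}

# Expire old events after N days, e.g. set_log_retention /ec2/nginx/access 30
set_log_retention() {
  aws logs put-retention-policy \
    --log-group-name "$1" \
    --retention-in-days "${2:-30}"
}
```

If recent_logs prints nothing, the agent's own log at /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log is usually the first place to look.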
Go to CloudWatch → Dashboards → Create dashboard. Add widgets for your key metrics — CPU, memory, network, and any custom metrics. A good dashboard gives you at-a-glance health visibility for your entire infrastructure without having to click through multiple pages. In production, dashboards are displayed on monitors in operations centers so the team can immediately spot issues.
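The same dashboard can also be described as JSON and pushed from the CLI, which keeps it reproducible across accounts. The sketch below is one possible layout with two widgets. The dashboard name ec2-health, the file name dashboard.json, and the widget sizes are assumptions, and the instance ID and region are the placeholders used earlier in this article.

```shell
# Describe two metric widgets; names and layout are illustrative only.
cat > dashboard.json <<'EOF'
{
  "widgets": [
    {
      "type": "metric",
      "x": 0, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "CPU utilization",
        "region": "ap-south-1",
        "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "i-0abc123"]],
        "stat": "Average",
        "period": 300
      }
    },
    {
      "type": "metric",
      "x": 12, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "Network in/out",
        "region": "ap-south-1",
        "metrics": [
          ["AWS/EC2", "NetworkIn", "InstanceId", "i-0abc123"],
          ["AWS/EC2", "NetworkOut", "InstanceId", "i-0abc123"]
        ],
        "stat": "Sum",
        "period": 300
      }
    }
  ]
}
EOF

# publish_dashboard is a hypothetical helper; put-dashboard creates the
# dashboard or overwrites an existing one with the same name.
publish_dashboard() {
  aws cloudwatch put-dashboard \
    --dashboard-name "${1:-ec2-health}" \
    --dashboard-body file://dashboard.json
}
```

Keeping dashboard.json in version control means a changed or deleted dashboard can be restored with a single publish_dashboard call.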
In Part 12, we bring everything together — deploying a complete application on EC2 end to end.
CloudWatch Logs stores log data from EC2 instances, Lambda functions, containers, and other AWS services. The CloudWatch agent on EC2 instances collects application and system logs and sends them to CloudWatch. Configure the agent by editing /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json to specify which log files to collect and which log groups to send them to (the agent creates a log group if it does not exist yet). CloudWatch Logs Insights provides a query language for analyzing log data at scale, querying across millions of log events in seconds. For example, to find the top error messages in your application logs: fields @timestamp, @message | filter @message like /ERROR/ | stats count(*) as error_count by @message | sort error_count desc | limit 20.
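Logs Insights queries can also be run from the CLI, which is handy in scripts. Queries are asynchronous: start-query returns an ID, and get-query-results is polled until the query finishes. The sketch below assumes GNU date and a one-hour lookback. The function name top_errors and the 30-attempt polling limit are illustrative choices.

```shell
# top_errors is a hypothetical helper name.
# Usage: top_errors <log-group-name>
top_errors() {
  local log_group="$1"
  local start end query_id state attempts

  # start-query expects epoch seconds; GNU date syntax assumed.
  start="$(date -u -d '1 hour ago' +%s)"
  end="$(date -u +%s)"

  query_id="$(aws logs start-query \
    --log-group-name "$log_group" \
    --start-time "$start" \
    --end-time "$end" \
    --query-string 'fields @message | filter @message like /ERROR/ | stats count(*) as error_count by @message | sort error_count desc | limit 20' \
    --query queryId \
    --output text)"

  # Poll until the query completes, giving up after about 30 seconds.
  attempts=0
  while [ "$attempts" -lt 30 ]; do
    state="$(aws logs get-query-results --query-id "$query_id" \
      --query status --output text)"
    [ "$state" = "Complete" ] && break
    sleep 1
    attempts=$((attempts + 1))
  done

  aws logs get-query-results --query-id "$query_id" --query results
}
```

For the log groups defined earlier, top_errors /ec2/nginx/error would be a natural first call.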
CloudWatch alarms monitor metrics and trigger actions when thresholds are breached. Effective alarm design comes down to three choices. First, pick the right metric: CPU utilization is rarely the right thing to alarm on directly, and request error rate or latency usually says more about what users experience. Second, set thresholds that catch sustained problems rather than brief spikes by requiring several consecutive data points (evaluation periods) to breach. Third, match the action to the alarm: SNS notifications for alarms a human must respond to, Auto Scaling actions for capacity management, and Lambda for automated remediation. Avoid alarm fatigue: too many alarms, especially noisy ones that fire frequently without requiring action, train teams to ignore all of them, including the important ones.
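To see why evaluation periods matter, the rule "all of the last N data points must breach" can be modeled in a few lines of shell. This is a simplified sketch, not CloudWatch's actual evaluator: it handles whole numbers only and ignores missing-data handling and "M out of N" settings. The function name would_alarm is hypothetical.

```shell
# would_alarm THRESHOLD PERIODS VALUE...
# Prints ALARM if the last PERIODS values are all above THRESHOLD,
# OK if any of them is not, and INSUFFICIENT_DATA if too few values.
would_alarm() {
  threshold="$1"
  periods="$2"
  shift 2

  if [ "$#" -lt "$periods" ]; then
    echo "INSUFFICIENT_DATA"
    return 0
  fi

  # Drop older datapoints so only the most recent PERIODS remain.
  while [ "$#" -gt "$periods" ]; do
    shift
  done

  for value in "$@"; do
    if [ "$value" -le "$threshold" ]; then
      echo "OK"
      return 0
    fi
  done
  echo "ALARM"
}

would_alarm 80 1 40 45 95   # ALARM: one spike is enough to page someone
would_alarm 80 3 40 45 95   # OK: the spike was not sustained
would_alarm 80 3 85 90 95   # ALARM: three breaching points in a row
```

With a single evaluation period, the brief spike to 95 triggers the alarm; requiring three periods filters it out while still catching the sustained case.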
Set up basic monitoring for your EC2 instance: install and configure the CloudWatch agent to send the instance's system metrics (CPU, memory, disk) and a log file to CloudWatch. Create a CloudWatch dashboard showing these metrics. Create an alarm that triggers when CPU utilization exceeds 80% for five consecutive minutes, sending a notification to an SNS topic. Subscribe your email to the SNS topic and confirm the subscription. Then simulate high CPU with stress --cpu 2 (install the stress tool first, and use at least as many workers as the instance has vCPUs) to trigger the alarm and receive the notification.
Cloud computing is a domain where deep intuition — the ability to make good architectural decisions quickly, to diagnose problems efficiently, and to anticipate how systems will behave under load — develops through accumulated hands-on experience. Every project you build on cloud infrastructure teaches you something that cannot be learned from documentation alone. The cost surprises, the permission errors, the networking debugging sessions, the performance investigations — these are not obstacles to learning, they are the learning. The engineers who have built genuinely deep cloud intuition have usually accumulated it through many projects over several years, not from any single course or certification. Start building things, make mistakes safely in learning environments, and accumulate that experience deliberately.