Day-16 | AWS CLOUD WATCH DEEP DIVE | DEMO - LIVE EC2 CPU ALERTING THROUGH SNS | #aws #devops

Brief Summary

This video by Abhishek Veeramalla covers AWS CloudWatch: what it is, its features, and how to use it for monitoring, alerting, and logging. It includes a demo on setting up alarms for CPU utilization. Key takeaways include understanding CloudWatch as a gatekeeper for your AWS account, the importance of metrics and alarms, and how CloudWatch supports cost optimization and scaling.

  • CloudWatch is a gatekeeper for your AWS account.
  • Metrics and alarms are key to monitoring and alerting.
  • CloudWatch can be used for cost optimization and scaling.

Intro

Abhishek welcomes everyone to Day 16 of the AWS Zero to Hero series, focusing on CloudWatch. The video includes both theory and practical demos, with two demos planned for better understanding. He encourages viewers to perform the demos themselves, because interview questions often revolve around practical scenarios and issues faced while using the service. Many viewers follow the theory but few do the demos, and it is the hands-on practice that prepares them for questions about real-world usage.

Agenda and Concepts

Abhishek outlines the agenda, starting with the fundamentals of CloudWatch: what it is, the problems it solves, and its features. He will cover key concepts like metrics, alarms, and custom metrics. The session includes two demos: one using default AWS metrics to set up CPU utilization alarms and another on custom metrics, which he notes has less content available online.

What is CloudWatch?

CloudWatch is described as a gatekeeper or watchman for AWS that monitors activity in the cloud. It tracks most activities, such as creating EC2 instances or uploading content to S3. Users can query CloudWatch to understand what happened in their AWS account, much like asking a gatekeeper about events on a secure property. It is an AWS service that users access to track these activities.

Features of CloudWatch

CloudWatch's primary functions are monitoring, alerting, reporting, and logging. Monitoring is crucial for DevOps engineers, covering both infrastructure and applications. CloudWatch provides real-time metrics, such as API requests, CPU utilization, and memory consumption, which help in understanding and communicating about AWS service usage. Metrics give teams a simple, shared way to communicate utilization.

Metrics and Alarms

Metrics and alarms are closely linked. Metrics collect data, like CPU utilization, while alarms trigger actions based on metric outcomes. For example, an alarm can send a notification if CPU utilization reaches a certain threshold. This combination allows for proactive responses to potential issues.
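
The relationship between a metric and an alarm can be sketched in a few lines of Python. This is a simplified illustration, not CloudWatch's actual evaluation logic: the function name `alarm_state` and its parameters are hypothetical, and only the three state names (OK, ALARM, INSUFFICIENT_DATA) are taken from CloudWatch itself.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int = 1) -> str:
    """Classify a metric series the way a simple threshold alarm might.

    Returns "ALARM" when each of the most recent `evaluation_periods`
    datapoints is at or above `threshold`, "OK" when at least one is below
    it, and "INSUFFICIENT_DATA" when there are too few datapoints to decide.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value >= threshold for value in recent) else "OK"
```

With a 50% threshold, a series ending in a value of 65 would be classified as ALARM, while a series that has dropped back to 20 would be OK.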

Log Insights and Custom Metrics

CloudWatch offers log insights, giving access to logs of service activity. Some logging happens automatically, and custom metrics extend what CloudWatch can track. For instance, CloudWatch tracks CPU utilization by default but not memory utilization, so monitoring memory usage requires a custom metric.
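
As a rough illustration of what a custom memory metric might involve, the sketch below computes memory utilization from the text of Linux's /proc/meminfo and builds the keyword arguments for boto3's `put_metric_data` call. The video does not show this code: the function names, the `Custom/Demo` namespace, and the `MemoryUtilization` metric name are all assumptions made for the example.

```python
def memory_used_percent(meminfo_text: str) -> float:
    """Compute used memory as a percentage from /proc/meminfo contents."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # first token is the value
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def build_memory_metric(instance_id: str, percent: float,
                        namespace: str = "Custom/Demo") -> dict:
    """Build kwargs for cloudwatch.put_metric_data (namespace is hypothetical)."""
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": "MemoryUtilization",  # assumed name, not an AWS default
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Value": percent,
            "Unit": "Percent",
        }],
    }
```

On an instance with boto3 installed and credentials configured, the result could be published with something like `boto3.client("cloudwatch").put_metric_data(**build_memory_metric(instance_id, percent))`, typically on a schedule.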

Cost Optimization and Scaling

CloudWatch plays a critical role in cost optimization by integrating with services like Lambda functions to identify unused resources. It also aids in scaling by informing autoscaling groups about CPU utilization, triggering scaling actions. While CloudWatch doesn't directly perform these tasks, it integrates with other services to achieve them.

Real-Life Metrics Demo

Abhishek logs into his AWS account and navigates to the CloudWatch service, which AWS describes as a tool to monitor resources and applications. He highlights the top features: logs, metrics, alarms, and dashboards. He starts with log groups, which CloudWatch automatically creates for logs from services like CodeBuild, making it easier to track activities within specific projects.

Exploring Log Groups

CloudWatch automatically creates log groups for each project, such as those in CodeBuild. For example, a project named "sample python flask service" has its activities logged, including build details and errors. Because the logs are kept in CloudWatch, users can retrieve this information even after a project is deleted.
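
The same log groups can be listed programmatically. The sketch below assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) is passed in, and assumes the CodeBuild log groups sit under the `/aws/codebuild/` prefix; the summary does not state the prefix, so treat it as an assumption. The helper name is hypothetical and pagination is ignored for brevity.

```python
def codebuild_log_groups(logs_client, prefix: str = "/aws/codebuild/") -> list:
    """Return the names of log groups whose name starts with `prefix`.

    `logs_client` is expected to behave like boto3.client("logs").
    Only the first page of results is read in this sketch.
    """
    response = logs_client.describe_log_groups(logGroupNamePrefix=prefix)
    return [group["logGroupName"] for group in response.get("logGroups", [])]
```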

Metrics Feature

CloudWatch tracks a vast amount of information using metrics, which can include CPU utilization, disk usage, and network input/output. It has over 1000 default metrics, constantly collecting data from AWS services. Abhishek navigates to the EC2 metrics to demonstrate the available data.

EC2 Instance Metrics

Abhishek explores EC2 instance metrics, such as CPU utilization. He mentions that CloudWatch constantly gathers this information, allowing users to track performance over time. To demonstrate this in real-time, he creates a new EC2 instance.

Creating an EC2 Instance for Demo

Abhishek creates an EC2 instance named "Cloud watch demo" using Ubuntu and the t2.micro instance type. He also points viewers to his GitHub repository, where he has a Python script to simulate CPU spikes for demonstration purposes.

GitHub Repository and CPU Spike Script

Abhishek directs viewers to his GitHub repository (github.com) where he has a folder named "day 16" containing the theory and demos. He highlights the "CPU spike.py" script, which is designed to increase and decrease CPU usage on the EC2 instance for demonstration purposes.

Connecting to the EC2 Instance

Abhishek connects to the newly created EC2 instance via SSH. Initially, the CPU usage is normal because no processes are running. He then prepares to run the CPU spike script.

Tracking Metrics in CloudWatch

Abhishek navigates back to CloudWatch to track the metrics of the EC2 instance. He notes that by default, EC2 instances send metrics every 5 minutes. To get more real-time data, he enables detailed monitoring, which sends metrics every 1 minute.
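
The video enables detailed monitoring through the console. A possible scripted equivalent, assuming a boto3 EC2 client (`boto3.client("ec2")`) is passed in, is sketched below; the helper names are hypothetical. The second function only illustrates the difference in data density between the two modes.

```python
def enable_detailed_monitoring(ec2_client, instance_id: str) -> str:
    """Request 1-minute (detailed) monitoring and return the reported state."""
    response = ec2_client.monitor_instances(InstanceIds=[instance_id])
    return response["InstanceMonitorings"][0]["Monitoring"]["State"]


def datapoints_per_hour(period_seconds: int) -> int:
    """Number of datapoints an hour yields at the given reporting period."""
    return 3600 // period_seconds
```

At the default 5-minute period an hour yields 12 datapoints per metric; at the 1-minute period it yields 60, which is why the CPU spike becomes visible sooner.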

Running the CPU Spike Script

Abhishek copies the "CPU spike.py" script to the EC2 instance and runs it. He cautions that this script can increase CPU usage significantly, potentially causing unresponsiveness, so it should only be used on demo instances.
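
The contents of the script are not reproduced in this summary. As a stand-in, a minimal CPU-burning sketch could look like the following; it is not the author's script, and the function name and duration parameter are made up for illustration. It keeps one core busy for a fixed time, which is enough to saturate an instance with a single vCPU.

```python
import time


def burn_cpu(seconds: float) -> int:
    """Busy-loop on one core for roughly `seconds`; return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1  # pointless work that keeps the core busy
    return iterations
```

Calling `burn_cpu(300)` would hold the core busy for about five minutes. As the video cautions, this kind of load belongs only on a demo instance.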

Monitoring CPU Utilization

After running the script, Abhishek refreshes the CloudWatch metrics and observes the CPU utilization spiking. He explains that CloudWatch takes some time to reflect the changes. He also demonstrates how to view the metrics in different formats, such as pie charts and bar graphs.
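
The graph shown in the console can also be approximated by querying the metric directly. The sketch below builds keyword arguments for boto3's `get_metric_statistics` call for the EC2 `CPUUtilization` metric; the helper name, the 30-minute window, and the 60-second period are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def build_cpu_query(instance_id: str, minutes: int = 30,
                    period_seconds: int = 60, now: datetime = None) -> dict:
    """Build kwargs for cloudwatch.get_metric_statistics on CPUUtilization."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average", "Maximum"],
    }
```

With credentials configured, `boto3.client("cloudwatch").get_metric_statistics(**build_cpu_query(instance_id))` would return the datapoints behind the graph. Recent minutes may be missing at first, which matches the delay noted in the demo.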

Average vs. Maximum Metrics

Abhishek explains the difference between average and maximum metrics. Organizations typically use average metrics to avoid reacting to short-term spikes. He adjusts the metrics to show the maximum CPU utilization over the last 5 minutes.
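
A small worked example shows why the choice of statistic matters. The sample values below are invented for illustration: five one-minute CPU readings with a single short spike.

```python
from statistics import mean


def summarize(samples: list) -> dict:
    """Return the average and maximum of a window of CPU readings."""
    return {"Average": mean(samples), "Maximum": max(samples)}


# Hypothetical readings (percent CPU) over a 5-minute window.
window = [5.0, 6.0, 95.0, 7.0, 7.0]
stats = summarize(window)
```

Here the average is 24.0 while the maximum is 95.0. Against a 50% threshold, an alarm on the average would stay quiet and an alarm on the maximum would fire, which is the trade-off described above.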

Configuring Alarms

Abhishek transitions to configuring alarms based on the metrics. Alarms are used to take action when metrics reach certain thresholds. He explains that alarms can notify engineers of issues, even when they are not actively monitoring the system.

Creating an Alarm for CPU Utilization

Abhishek creates an alarm in CloudWatch that triggers when the CPU utilization of the EC2 instance reaches 50%. He sets the evaluation period to 1 minute and configures the alarm to send a notification via SNS (Simple Notification Service).
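
The alarm in the video is created through the console. A scripted version might pass parameters like these to boto3's `put_metric_alarm`; this is a sketch, and the helper name, alarm name, and the choice of the Average statistic are assumptions, since the summary does not say which statistic the alarm uses.

```python
def build_cpu_alarm(instance_id: str, topic_arn: str,
                    threshold: float = 50.0, statistic: str = "Average") -> dict:
    """Build kwargs for cloudwatch.put_metric_alarm (names are hypothetical)."""
    return {
        "AlarmName": f"cpu-high-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": statistic,          # assumption: not stated in the summary
        "Period": 60,                    # seconds, matching the 1-minute period
        "EvaluationPeriods": 1,
        "Threshold": threshold,          # percent CPU
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],     # SNS topic notified on ALARM
    }
```

With credentials configured, `boto3.client("cloudwatch").put_metric_alarm(**build_cpu_alarm(instance_id, topic_arn))` would create or update the alarm.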

Configuring SNS Topic

Abhishek configures an SNS topic to send email notifications. He creates a new topic named "Cloud watch topic" and provides an email address for the notifications. He also adds a custom message to the notification.
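
For reference, the topic and email subscription could also be created in code. The sketch below assumes a boto3 SNS client (`boto3.client("sns")`) is passed in; the helper name is hypothetical. SNS topic names may contain only letters, digits, hyphens, and underscores, so a name with spaces would need adapting, for example to something like `cloudwatch-demo-topic`.

```python
def create_email_topic(sns_client, topic_name: str, email: str) -> str:
    """Create an SNS topic, subscribe an email address, and return the topic ARN.

    The subscription stays pending until the recipient confirms it from
    the email that SNS sends.
    """
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn
```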

Activating the Alarm

Abhishek explains that the alarm is not yet activated because the email address needs to be confirmed. He refreshes his email and confirms the subscription. After confirming, the alarm status changes to "OK."

Triggering the Alarm

Abhishek reruns the CPU spike script to trigger the alarm. He monitors the metrics and waits for the CPU utilization to reach the 50% threshold.

Verifying SNS Configuration

While waiting for the notification, Abhishek verifies the SNS topic configuration to ensure everything is set up correctly. He checks that the email address is in the confirmed status.
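
The same check can be sketched in code, again assuming a boto3 SNS client is passed in and using a hypothetical helper name. It relies on confirmed subscriptions carrying a real ARN, while unconfirmed ones carry a placeholder value instead.

```python
def confirmed_endpoints(sns_client, topic_arn: str) -> list:
    """Return endpoints (such as email addresses) with a confirmed subscription.

    Only the first page of results is read in this sketch.
    """
    response = sns_client.list_subscriptions_by_topic(TopicArn=topic_arn)
    return [
        sub["Endpoint"]
        for sub in response.get("Subscriptions", [])
        if sub["SubscriptionArn"].startswith("arn:")  # placeholder if pending
    ]
```

An empty result for the topic would suggest the confirmation email has not been acted on yet.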

Receiving the Notification

Abhishek receives the email notification in the promotions tab of his inbox. The email contains the custom message and details about the CPU spike that triggered the alarm.

Recap of the Demo

Abhishek recaps the demo, highlighting the use of default metrics and alarms. He covered log groups, default metrics (CPU metrics), and dashboards. He also mentions that he didn't cover custom metrics due to the video's length but will address it in a future video.

Conclusion

Abhishek concludes the video, encouraging viewers to try the demo themselves. He emphasizes the importance of hands-on experience and suggests experimenting with different metrics. He thanks viewers for watching and promises to cover custom metrics in a future video.
