AWS SAP Notes 05 - Monitoring, Logging and Cost Management
aws sap

Written by Nguyễn Huy Hoàng on 10/10/2021


CloudWatch

  • Provides services to ingest, store and manage metrics
  • It is a public service: it is reached through public endpoints
  • Many services have native management plane integration with CloudWatch, for example EC2. EC2 reports only externally gathered metrics by default; for metrics from inside an instance we can use the CloudWatch agent
  • CloudWatch can be used from on-premises using the agent or the CloudWatch API
  • CloudWatch stores data in a persistent way
  • Data can be viewed from the console, CLI or API; CloudWatch also provides dashboards and anomaly detection
  • CloudWatch Alarms: react to metrics, can be used to notify or to perform actions

CloudWatch - Data


  • Namespace: container for metrics
  • Data point: timestamp, value, unit of measure (optional)
  • Metric: time-ordered set of data points. Examples of built-in metrics: CPUUtilization, NetworkIn, DiskWriteBytes for EC2
  • Every metric has a MetricName and a namespace
  • Dimension: name/value pair. Example: a dimension such as InstanceId lets the CPUUtilization metric of one instance be separated from that of another
  • Dimensions can be used to aggregate data, for example across all instances of an ASG
  • Resolution: standard (60 second granularity) or high (1 second granularity)
  • Metric resolution: the minimum period for which we can get a data point
  • Data retention:
    • sub-60s granularity is retained for 3 hours
    • 60s granularity is retained for 15 days
    • 5 min granularity is retained for 63 days
    • 1 hour granularity is retained for 455 days
  • As data ages, it is aggregated and stored for a longer period at a lower resolution
  • Statistics: get data over a period and aggregate it in a certain way
  • Percentile: relative standing of a value within the dataset
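
The concepts above can be sketched in plain Python. The dictionary built below mirrors the shape of one `MetricData` entry accepted by the CloudWatch `PutMetricData` API, and the retention lookup encodes the tiers listed above. The metric name `QueueDepth`, the instance ID, and the helper names are hypothetical, and no AWS call is made.

```python
from datetime import datetime, timezone

# Retention tiers from the list above: (largest granularity in seconds, days kept).
RETENTION_TIERS = [
    (59, 3 / 24),  # sub-60s (high resolution): 3 hours
    (299, 15),     # 60s granularity: 15 days
    (3599, 63),    # 5 min granularity: 63 days
]
LONGEST_RETENTION_DAYS = 455  # 1 hour granularity


def retention_days(granularity_seconds: int) -> float:
    """Return how long data at this granularity is retained, in days."""
    for max_granularity, days in RETENTION_TIERS:
        if granularity_seconds <= max_granularity:
            return days
    return LONGEST_RETENTION_DAYS


def build_datum(name, value, dimensions, unit="None", high_resolution=False):
    """Build one data point: metric name, dimensions, timestamp, value, unit."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": value,
        "Unit": unit,
        # 1 = high resolution (1 second), 60 = standard resolution
        "StorageResolution": 1 if high_resolution else 60,
    }


# Hypothetical custom metric, separated per instance by the InstanceId dimension.
datum = build_datum(
    "QueueDepth", 42, {"InstanceId": "i-0123456789abcdef0"}, unit="Count"
)
print(datum["Dimensions"])
print(retention_days(60), retention_days(3600))
```

With real credentials, a list of such entries could be passed as `MetricData` together with a `Namespace` to a boto3 CloudWatch client's `put_metric_data` call.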

CloudWatch Alarms

  • Alarm: watches a metric over a period of time
  • States: OK or ALARM, based on the value of a metric against a threshold over time (an alarm reports INSUFFICIENT_DATA when there is not enough data to decide)
  • Alarms can be configured with one or more actions that run on our behalf. Actions can be: send a notification to an SNS topic, trigger an Auto Scaling policy, or use EventBridge to integrate with other services
  • High resolution metrics can have high resolution alarms
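
How an alarm moves between states can be illustrated with a small model. This is a simplified sketch, not the service's exact evaluation logic: it assumes a "greater than threshold" comparison, requires every one of the most recent periods to breach, and treats any missing period as INSUFFICIENT_DATA (the third state CloudWatch reports when it cannot decide). The function name and sample values are hypothetical.

```python
from typing import Optional, Sequence


def alarm_state(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    evaluation_periods: int,
) -> str:
    """Evaluate the most recent periods against a 'greater than' threshold.

    None stands for a period without data.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"


# CPUUtilization samples, one per period; alarm when above 80 for 3 periods.
print(alarm_state([50, 85, 90, 95], threshold=80, evaluation_periods=3))
print(alarm_state([50, 85, 70, 95], threshold=80, evaluation_periods=3))
```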

CloudWatch Logs


  • CloudWatch Logs provides two types of functionality: ingestion and subscription
  • CloudWatch Logs is a public service designed to store, monitor and provide access to logging data
  • Can provide logging ingestion for AWS products natively, but also for on-premises, IoT or any application
  • CloudWatch Agent: used to provide ingestion for custom applications
  • CloudWatch can also ingest log streams from VPC Flow Logs or CloudTrail (account events and AWS API calls)
  • CloudWatch Logs is a regional service; certain global services send their logs to us-east-1
  • Log events consist of 2 parts:
    • Timestamp
    • Raw message
  • Log Events are collected into Log Streams. A Log Stream is a sequence of log events sharing the same source
  • Log Groups: collections of Log Streams. We can set retention, permissions and encryption on the log group. By default log groups store data indefinitely
  • Metric Filter: can be defined on the log group and looks for a pattern in the log events. Essentially it creates a metric from the log streams by counting occurrences of patterns defined by us (example: failed SSH login events)
  • Export logs from CloudWatch:
    • S3 Export: we can create an export task (CreateExportTask API, `create-export-task` in the CLI), which can take up to 12 hours
    • Subscription: deliver logs in real time. We create a subscription filter for one of the following destinations: Kinesis Data Firehose (near real time), Elasticsearch via Lambda, a custom Lambda, or Kinesis Data Streams (any KCL consumer)
  • Subscription filters can be used to create a logging aggregation architecture
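
The relationship between log events, log streams, log groups and metric filters can be modelled in a few lines. This is only an illustration: the stream names and messages are made up, and a plain substring match stands in for the real metric filter pattern syntax.

```python
from collections import Counter

# Hypothetical log group: stream name -> list of (timestamp in ms, raw message).
log_group = {
    "i-0aaa/sshd": [
        (1633856400000, "Failed password for root from 203.0.113.7 port 4242 ssh2"),
        (1633856460000, "Accepted publickey for deploy from 198.51.100.4 port 5151 ssh2"),
    ],
    "i-0bbb/sshd": [
        (1633856405000, "Failed password for invalid user admin from 203.0.113.7 port 4243 ssh2"),
    ],
}


def metric_from_logs(group, pattern, period_ms=60_000):
    """Count events containing `pattern`, bucketed by period start time."""
    counts = Counter()
    for events in group.values():  # every stream in the group is scanned
        for timestamp_ms, message in events:
            if pattern in message:
                counts[timestamp_ms - timestamp_ms % period_ms] += 1
    return dict(sorted(counts.items()))


print(metric_from_logs(log_group, "Failed password"))
# → {1633856400000: 2}
```

The resulting per-period counts are the kind of data points an alarm could then watch.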


CloudTrail

  • A product that logs API actions which affect AWS accounts (example: stop an EC2 instance, delete an S3 bucket)

  • It logs API calls/activities as a CloudTrail Event, actions taken by user, role or service

  • CloudTrail by default keeps the last 90 days of events in Event History. It is enabled by default at no cost

  • Trails: used to customize how CloudTrail records and stores events

  • A trail can log 2 types of events:

    • Management Events: provide information about management operations performed on resources in AWS accounts (control plane operations). Example: create an EC2 instance, terminate an EC2 instance
    • Data Events: contain information about resource operations performed on or in a resource. Example: objects uploaded to S3, a Lambda function being invoked
  • By default a trail logs only management events

  • CloudTrail is a regional service, but when we create a Trail, it can operate in one region or in all regions

  • All-region trail: collects log events from all regions

  • Most services log events in the region where the event occurred

  • A small number of global services (IAM, STS, CloudFront) log their events in us-east-1. For a trail to capture these global service events, it has to be an all-region trail and global service events have to be enabled on the trail

  • The default event history is limited to 90 days; with a trail we can be much more flexible

  • A trail can store events in a chosen S3 bucket indefinitely, where they can also be parsed by other tooling

  • CloudTrail can be integrated with CloudWatch Logs, where we can use Metric Filters for example

  • Organizational Trail: a trail created in the management (master) account that stores events from every account in the organization

  • CloudTrail does not offer real-time logging: events are delivered with a delay
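
Because a trail delivers events as JSON, other tooling can parse them easily. The sketch below assumes the log file layout of a top-level `Records` array and assumes each record carries an `eventCategory` field (newer record versions do); the two sample records are hypothetical and heavily trimmed.

```python
import json

# Hypothetical, trimmed log file content; real records carry many more fields.
log_file = json.dumps({
    "Records": [
        {
            "eventTime": "2021-10-10T08:15:00Z",
            "eventSource": "ec2.amazonaws.com",
            "eventName": "StopInstances",
            "awsRegion": "eu-west-1",
            "eventCategory": "Management",
            "userIdentity": {
                "type": "IAMUser",
                "arn": "arn:aws:iam::111122223333:user/alice",
            },
        },
        {
            "eventTime": "2021-10-10T08:16:30Z",
            "eventSource": "s3.amazonaws.com",
            "eventName": "PutObject",
            "awsRegion": "eu-west-1",
            "eventCategory": "Data",
            "userIdentity": {
                "type": "AssumedRole",
                "arn": "arn:aws:sts::111122223333:assumed-role/app/session",
            },
        },
    ]
})


def summarize(raw: str, category: str):
    """Return (time, principal, action) for events of one category."""
    records = json.loads(raw)["Records"]
    return [
        (r["eventTime"], r["userIdentity"]["arn"], f'{r["eventSource"]}:{r["eventName"]}')
        for r in records
        if r.get("eventCategory") == category
    ]


for entry in summarize(log_file, "Management"):
    print(entry)
```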


X-Ray

  • It is a distributed tracing service. It can track sessions through an application
  • X-Ray takes data from many services (API Gateway, Lambda, DynamoDB) that are part of an application and gives a single overview of the session flow
  • Tracing Header: when a user connects to an application with X-Ray enabled, a trace ID is generated and embedded into a tracing header. This header is used to track the request across all supported services
  • Segments: supported services send data to X-Ray using segments. Segments are data blocks containing host/IP, request, response, work done (times) and issue information
  • Subsegments: segments can contain subsegments for more granularity. They can contain details of calls to other services made by the application component
  • Service Graph: JSON document detailing services and resources which make up the application
  • Service Map: visual representation of a service graph in the X-Ray console
  • In order to provide X-Ray data to the AWS X-Ray service we can do the following:
    • EC2: install the X-Ray daemon (agent)
    • ECS: run the daemon in a task
    • Lambda: enable X-Ray
    • Beanstalk: the daemon is preinstalled
    • API Gateway: per-stage option
    • SNS and SQS: can be enabled
  • Services require IAM permissions in order to send data to the X-Ray service
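
The tracing header described above travels as the `X-Amzn-Trace-Id` HTTP header. The sketch below builds and parses a header in the documented `Root=...;Parent=...;Sampled=...` form, with a trace ID made of a version number, the epoch time in hexadecimal, and 96 random bits. The helper names are hypothetical, and in practice the X-Ray SDK or the instrumented service generates these values for us.

```python
import os
import time


def new_trace_id(now=None):
    """Build a trace ID: version, epoch seconds in hex, 96 random bits in hex."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def build_header(trace_id, parent_id=None, sampled=True):
    """Assemble the header value passed along with each downstream request."""
    parts = [f"Root={trace_id}"]
    if parent_id:
        parts.append(f"Parent={parent_id}")
    parts.append(f"Sampled={1 if sampled else 0}")
    return ";".join(parts)


def parse_header(value):
    """Split 'Root=...;Parent=...;Sampled=1' into a dict."""
    return dict(part.split("=", 1) for part in value.split(";") if part)


header = build_header(new_trace_id(), parent_id=os.urandom(8).hex())
print("X-Amzn-Trace-Id:", header)
print(parse_header(header)["Root"])
```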

Cost Allocation Tags

  • Are tags that we can enable to provide additional information for any billing report in AWS
  • Cost Allocation Tags need to be enabled individually per account, or for an organization from the management (master) account
  • There are 2 different forms of Cost Allocation Tags:
    • AWS generated, for example aws:createdBy or aws:cloudformation:stack-name. These are added automatically by AWS if cost allocation tags are enabled
    • User defined tags: user:something
  • Both types of tags will be visible in AWS cost reports and can be used as a filter
  • Cost Allocation Tags appear only in the Billing Console
  • After enabling Cost Allocation Tags, it can take up to 24 hours to be visible and active
  • Cost Allocation Tags are not added retroactively
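
To illustrate how a cost allocation tag is used as a filter or grouping key in a report, here is a minimal sketch over made-up line items. The tag key `user:team`, the record layout and the amounts are all hypothetical; usage recorded before the tag was activated simply shows up without it.

```python
from collections import defaultdict

# Hypothetical billing line items; the tag key "user:team" is an assumption.
line_items = [
    {"service": "AmazonEC2", "cost": 12.50, "tags": {"user:team": "payments"}},
    {"service": "AmazonS3", "cost": 3.25, "tags": {"user:team": "search"}},
    {"service": "AmazonEC2", "cost": 7.00, "tags": {}},  # no tag recorded
    {"service": "AmazonRDS", "cost": 20.00, "tags": {"user:team": "payments"}},
]


def cost_by_tag(items, tag_key):
    """Sum cost per tag value; items without the tag go to '(untagged)'."""
    totals = defaultdict(float)
    for item in items:
        totals[item["tags"].get(tag_key, "(untagged)")] += item["cost"]
    return dict(totals)


print(cost_by_tag(line_items, "user:team"))
# → {'payments': 32.5, 'search': 3.25, '(untagged)': 7.0}
```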

AWS Trusted Advisor

  • Provides real time guidance to provision resources against AWS best practices
  • It is an account level product, requires no agent to be installed
  • Provides a number of checks and recommendations in 5 major areas:
    • Cost Optimization & Recommendations
    • Performance
    • Security
    • Fault Tolerance
    • Service Limits
  • Trusted Advisor is not a free service, at least if we want to get out the most of it
  • The free version is available if the account has basic or developer support plans
  • The free version provides 7 basic core checks:
    • S3 bucket permissions (open access permissions)
    • Security Groups - specific ports unrestricted
    • IAM use
    • MFA on Root Account
    • EBS Public Snapshots
    • RDS Public Snapshots
    • 50 service limit checks
  • Anything beyond these basic checks requires business or enterprise support plans
  • With business and enterprise support we get a further 115 checks
  • We also get access to the AWS Support API
  • AWS Support API allows programmatic access to AWS support functions:
    • We can get the names and identifiers of the checks AWS offers
    • We can request that a Trusted Advisor check runs against accounts and resources
    • Allows us to get summaries and detailed information programmatically
    • Allows us to request a Trusted Advisor refresh
  • AWS Support API also allows us to open and manage support tickets programmatically
  • With business and enterprise support we get CloudWatch Integration
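
Programmatic access to Trusted Advisor might look like the sketch below. The method names `describe_trusted_advisor_checks` and `describe_trusted_advisor_check_result` follow the boto3 `support` client; the `FakeSupportClient` class, its check IDs and its statuses are hypothetical stand-ins so the example runs without AWS credentials or a paid support plan.

```python
def checks_needing_attention(support_client, language="en"):
    """Return (category, name, status) for checks in warning or error state."""
    flagged = []
    checks = support_client.describe_trusted_advisor_checks(language=language)["checks"]
    for check in checks:
        result = support_client.describe_trusted_advisor_check_result(
            checkId=check["id"], language=language
        )["result"]
        if result["status"] in ("warning", "error"):
            flagged.append((check["category"], check["name"], result["status"]))
    return flagged


class FakeSupportClient:
    """Stand-in so the sketch runs offline; IDs and statuses are made up."""

    _checks = [
        {"id": "check-1", "name": "MFA on Root Account", "category": "security"},
        {"id": "check-2", "name": "Service Limits", "category": "service_limits"},
    ]
    _status = {"check-1": "error", "check-2": "ok"}

    def describe_trusted_advisor_checks(self, language):
        return {"checks": self._checks}

    def describe_trusted_advisor_check_result(self, checkId, language):
        return {"result": {"checkId": checkId, "status": self._status[checkId]}}


print(checks_needing_attention(FakeSupportClient()))
# → [('security', 'MFA on Root Account', 'error')]
```

Against a real account, the same function could be given `boto3.client("support", region_name="us-east-1")`, assuming the account has a business or enterprise support plan.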

AWS Support Plans

  • Basic Support:
    • Is included for AWS customers and it is free
    • For Trusted Advisor with this support plan we get 7 core checks (see them above)
  • Developer:
    • For Trusted Advisor we get the same 7 base core checks (see them above)
  • Business:
    • We get the full set of checks and recommendations
    • We get programmatic access to Trusted Advisor
  • Enterprise:
    • Same as business

Good to Know

  • We can check if an S3 bucket is made public, but we cannot check if objects are public in a bucket. To monitor this we might use CloudWatch Events/S3 Events instead
  • Service Limits:
    • Limits can only be monitored in Trusted Advisor
    • Cases have to be created manually in the AWS Support Center to increase limits

AWS Billing and Cost Management

Cost Explorer

  • Tracks and analyzes our AWS usage. It is free for all accounts
  • Includes a default report that helps visualize the costs and usage associated with our top five cost-accruing AWS services, and gives us a detailed breakdown of all services in the table view
  • We can view data for up to the last 12 months, forecast how much we are likely to spend over the next three months and get recommendations on which Reserved Instances to purchase
  • Cost Explorer must be enabled before it can be used. The owner of the account can enable it
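
The "top five services" view can be reproduced from Cost Explorer data. The sketch below assumes a response shaped like the one returned by the Cost Explorer `GetCostAndUsage` API when grouped by the `SERVICE` dimension; the amounts are hypothetical and the function name is our own.

```python
from collections import defaultdict


def top_services(response, metric="UnblendedCost", n=5):
    """Sum each service's cost across all periods and return the n largest."""
    totals = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            service = group["Keys"][0]
            totals[service] += float(group["Metrics"][metric]["Amount"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def _group(service, amount):
    # Helper to keep the hypothetical sample response readable.
    return {
        "Keys": [service],
        "Metrics": {"UnblendedCost": {"Amount": amount, "Unit": "USD"}},
    }


sample_response = {
    "ResultsByTime": [
        {"Groups": [_group("Amazon Elastic Compute Cloud - Compute", "100.0"),
                    _group("Amazon Simple Storage Service", "30.0"),
                    _group("Amazon Relational Database Service", "80.0")]},
        {"Groups": [_group("Amazon Elastic Compute Cloud - Compute", "120.0"),
                    _group("Amazon Simple Storage Service", "25.0"),
                    _group("Amazon Relational Database Service", "80.0")]},
    ]
}

for service, cost in top_services(sample_response, n=2):
    print(f"{service}: {cost:.2f} USD")
```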

AWS Cost and Usage Reports

  • AWS Cost and Usage report provides information about our usage of AWS resources and estimated costs for that usage
  • The AWS Cost and Usage report is a .csv file or a collection of .csv files that is stored in an S3 bucket. Anyone who has permissions to access the specified S3 bucket can see the billing report files
  • We can use the Cost and Usage report to track our Reserved Instance utilization, charges, and allocations
  • For time granularity, we can choose one of the following:
    • Hourly: if we want our items in the report to be aggregated by the hour
    • Daily: if we want our items in the report to be aggregated by the day
    • Monthly: if we want our items in the report to be aggregated by month
  • The report can be automatically uploaded into Amazon Redshift and/or Amazon QuickSight for analysis
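
Since the report is plain CSV, it can also be analysed with a few lines of code. The sketch below aggregates hourly line items into daily cost for one product, assuming the report's `lineItem/UsageStartDate`, `lineItem/ProductCode` and `lineItem/UnblendedCost` columns; the rows are a hypothetical excerpt, and a real report has many more columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of an hourly report.
report_csv = """lineItem/UsageStartDate,lineItem/ProductCode,lineItem/UnblendedCost
2021-10-01T00:00:00Z,AmazonEC2,0.50
2021-10-01T01:00:00Z,AmazonEC2,0.25
2021-10-01T00:00:00Z,AmazonS3,0.125
2021-10-02T00:00:00Z,AmazonEC2,1.00
"""


def daily_cost(report: str, product_code: str):
    """Aggregate one product's unblended cost by calendar day (UTC)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(report)):
        if row["lineItem/ProductCode"] == product_code:
            day = row["lineItem/UsageStartDate"][:10]  # keep YYYY-MM-DD
            totals[day] += float(row["lineItem/UnblendedCost"])
    return dict(totals)


print(daily_cost(report_csv, "AmazonEC2"))
# → {'2021-10-01': 0.75, '2021-10-02': 1.0}
```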

AWS Budgets

  • Allows us to set custom budgets that will alert us when our costs or usage exceed, or are forecasted to exceed, our budgeted amount
  • With Budgets, we can view the following information:
    • How close our plan is to our budgeted amount or to the free tier limits
    • Our usage to date, including how much we have used of our Reserved Instances and purchased Savings Plans
    • Our current estimated charges from AWS and how much our predicted usage will incur in charges by the end of the month
    • How much of our budget has been used
  • Budget information is updated up to three times a day
  • Types of Budgets:
    • Cost budgets: plan how much we want to spend on a service
    • Usage budgets: plan how much we want to use one or more services
    • RI utilization budgets: define a utilization threshold and receive alerts when our RI usage falls below that threshold
    • RI coverage budgets: define a coverage threshold and receive alerts when the number of our instance hours covered by RIs falls below that threshold
  • Budgets can be tracked at the daily, monthly, quarterly, or yearly level, and we can customize the start and end dates
  • Budget alerts can be sent via email and/or Amazon SNS topic
  • First two budgets created are free of charge
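
The difference between an alert on actual spend and an alert on forecasted spend can be shown with a small model. This is an illustrative sketch only: the function name, the 80% default threshold and the message wording are assumptions, not the service's behaviour.

```python
def budget_alerts(limit, actual, forecast, threshold_pct=80.0):
    """Return alert messages for actual and forecasted spend against a budget."""
    alerts = []
    trigger = limit * threshold_pct / 100
    if actual > trigger:
        alerts.append(
            f"ACTUAL spend {actual:.2f} exceeds {threshold_pct:.0f}% of {limit:.2f}"
        )
    if forecast > limit:
        alerts.append(f"FORECASTED spend {forecast:.2f} exceeds budget {limit:.2f}")
    return alerts


# Monthly cost budget of 100 with 85 spent so far and 120 forecasted.
for message in budget_alerts(limit=100, actual=85, forecast=120):
    print(message)
```

In the real service, such messages would be delivered by email or to an SNS topic.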

