AWS SAP Notes 12 - Disaster Recovery and Business Continuity in AWS
aws sap

Nguyễn Huy Hoàng viết ngày 12/10/2021

DR/BC Architecture

  • Effective DR/BC costs money all of the time
  • We need some type of extra resources which will increase costs
  • Executing disaster recover/business continuity process takes time. How long it takes depends on the type of DR/BC in usage
  • DR/BC is trade-off between the time and costs

Types of Disaster Recovery

alt text
alt text
alt text
alt text

  • Backup and Restore: alt text
    • Data is constantly backup up at the primary site
    • The only costs are backup media and management, no ongoing space infrastructure costs
    • Has little or no upfront costs, but implies a significant time for recovery alt text
  • Pilot Light: alt text
    • Primary site is running at full
    • Pilot Light implies running a secondary environment only having the absolute minimum services running
    • In the event of a disaster the shutdown services can be spined up, no costs are expected to be inquired if there is no need for DR alt text
  • Warm Standby: alt text
    • Primary site is running at full, everything is replicated on the backup site at a smaller scale
    • Ready to be increased in size when failover is required
    • It is faster than pilot light approach and cheaper than active/active approach alt text
  • Active/Active (Multi-site): alt text
    • Primary site is entirely replicated on a secondary site
    • Data is constantly replicated from the primary site to the backup
    • Costs are generally 200%
    • There is no concept of recovery time
    • Additional benefits:
      • Load balancing across environments
      • Improved HA and performance alt text
  • Summary:
    • Backups: cheap and slow
    • Pilot Light: fairly cheap but faster
    • Warm Standby: costly, but quick to recover
    • Active/Active: expensive, 0 recovery time

DR Architecture - Storage

  • Instance Store volumes:
    • Most high risk form of storage available
    • If the host fails, the instance store volumes will also fail
    • Should be viewed as temporary and unreliable storage
  • EBS:
    • Volumes are created an run in a single AZ (failure of AZ means failure of EBS volumes)
    • Snapshots of EBS are stored in S3, will increase reliability
  • S3:
    • Data is replicated across multiple AZs
    • One-Zone: not regionally resilient
  • EFS:
    • EFS file systems are replicated across multiple AZs
    • They are by default regionally resilient - failure of a region means failure of EFS volumes
  • DR Architecture - Storage: alt text

DR Architecture - Compute

  • EC2:
    • If the host fails, EC2 instances fails as well
    • An EC2 instance by itself is not resilient in any way
    • If the failure is limited to one host, the instance can move to another host in the AZ. The EBS volume can be presented to the new instance
    • Auto Scaling Group: can be placed in multiple AZs, if the instance fails in one AZ, the ASG's role is to recreate them in another
  • ECS:
    • Can run it 2 modes: EC2 and Fargate
    • EC2 mode: DR architecture is similar as above
    • Fargate mode: containers are running on a cluster host managed by AWS being injected in VPCs
    • Fargate can provide automatic HA by running things in different AZs
  • Lambda:
    • By default runs in public mode
    • In VPC mode (private) Lambdas are injected in VPCs. If an AZ fails, Lambda can be automatically injected in another subnet in a different AZ
    • It will take the failure if a region in order for Lambda to be impacted
  • DR Architecture - Compute: alt text

DR Architecture - Databases

  • Running databases on EC2 should be done in certain cases only!
  • DynamoDB:
    • Data is replicated between multiple nodes in different AZs
    • Failure can occur only if the entire region fails
  • RDS:
    • Requires creation of a subnet group which specifies which subnet can be used in a VPC for a DB
    • Normal RDS (not Aurora) involves a single instance or primary and standby instance running in different AZs
    • Data is stored in local storage for each instance, data is replicated asynchronously to the standby
    • If the primary instance fails, automatic fallback is done to the standby
    • In case of Aurora, we can have one or more replicas in each AZs
    • Aurora uses a cluster storage architecture, storage is shared between running DB instances
    • Aurora can resist failures up to the entire region failure (not using Aurora Global)
  • Global Databases:
    • DynamoDB Global Table: multi master replication between regional replicas.
    • Aurora Global Databases: read-write cluster in one region, secondary read cluster in other regions. Replication happens at the storage layer, no additional load placed on the DB
    • Cross Region Read Replicas for RDS: asynchronous replication but not done on the storage layer
  • DR Architecture - Databases: alt text

DR Architecture - Networking

  • Networking at local level:
    • VPC are regionally resilient
    • Certain gateway objects like VPC Router and IGW are also regionally resilient
    • Subnets are tied to AZ they are located in, if the AZ fails, the subnet fails as well
    • LB: regional services, nodes are deployed into each AZ we select
    • By using a LB we can route traffic to AZs which are healthy
  • Interface endpoint:
    • Are tied to an AZ
    • Multiple interface endpoints can be deployed into different AZs
  • DR Architecture - Networking: alt text
  • Global Networking: alt text alt text
    • Route53 can route globally to different regions (failover routing)
Bình luận

{{ }}
Bỏ hay Hay
Male avatar
{{ comment_error }}

Hiển thị thử

Chỉnh sửa


Nguyễn Huy Hoàng

17 bài viết.
10 người follow
{{userFollowed ? 'Following' : 'Follow'}}
Cùng một tác giả
11 4
(Ảnh) Tại hội nghị Build 2016 diễn ra từ ngày 30/3 đến hết ngày 1/4 ở San Francisco, Microsoft đã đưa ra 7 thông báo lớn, quan trọng và mang tầm c...
Nguyễn Huy Hoàng viết hơn 4 năm trước
11 4
7 0
Viết code chạy một cách trơn tru ngay lần đầu tiên là một việc rất khó, thậm chí là bất khả thi. Do đó debug là một kỹ năng vô cùng quan trọng đối ...
Nguyễn Huy Hoàng viết hơn 4 năm trước
7 0
1 0
MultiFactor Authentication (MFA) Factor: different piece of evidence which proves the identity Factors: Knowledge: something we as users know: ...
Nguyễn Huy Hoàng viết 4 tháng trước
1 0
Bài viết liên quan
0 0
FSx FSx For Windows File Servers FSx for Windows are fully managed native Windows file servers/file shares Designed for integration with Wind...
Nguyễn Huy Hoàng viết 4 tháng trước
0 0
0 0
CloudFront It is a content deliver network (CDN) Its job is to improve the delivery of content from its original location to the viewers of the...
Nguyễn Huy Hoàng viết 4 tháng trước
0 0
0 0
Kinesis Is a scalable streaming service, designed to ingest lots of data Producers send data into a Kinesis stream Streams can scale from low...
Nguyễn Huy Hoàng viết 4 tháng trước
0 0


{{ comment_count }}

bình luận

{{liked ? "Đã kipalog" : "Kipalog"}}

{{userFollowed ? 'Following' : 'Follow'}}
17 bài viết.
10 người follow

 Đầu mục bài viết

Vẫn còn nữa! x

Kipalog vẫn còn rất nhiều bài viết hay và chủ đề thú vị chờ bạn khám phá!