AWS SAP Notes 09 - Caching, Delivery and Edge

Written by Nguyễn Huy Hoàng on 10/10/2021


  • CloudFront is a content delivery network (CDN)
  • Its job is to improve the delivery of content from its original location to the viewers of the content
  • It accomplishes this by caching and by using an efficient global network

CloudFront Terms and Architecture

  • Origin: the source location of the content, can be S3 or custom origin (publicly routable IPv4 address)
  • Distribution: unit of configuration within CloudFront, which gets deployed out to the CloudFront network. Almost everything is configured within the distribution
  • Edge Location: pieces of global infrastructure where the content is cached. They are smaller than AWS regions, but there are far more of them. Can be used to distribute static data only
  • Regional Edge Cache: larger version of an edge location. Provides another layer of caching
  • [Figure: CloudFront architecture]
  • If we are using S3 origins, the regional edge cache is not used when there is a cache miss at an edge location. Only custom origins can use the regional edge cache!
  • Origin fetch: the content is fetched from the origin in case of a cache miss on the edge location
  • Behavior: a unit of configuration within a distribution. Origins are linked to behaviors, and behaviors belong to a distribution

CloudFront Behaviors

  • Distributions are the units of configuration in CF. Many high-level options are configured at the distribution level:
    • Price class
    • Web Application Firewall attachment
    • Alternate domain names
    • Type of SSL certificate
    • SNI configuration
    • Security policy
    • Supported HTTP versions
    • etc.
  • A single distribution can have one (default behavior) or multiple behaviors
  • Any incoming request is matched against the path pattern of each behavior; the default behavior applies when nothing more specific matches
  • Once a request matches a behavior, it becomes subject to that behavior's configuration, which can include the following:
    • Origin or origin group
    • Viewer protocol policy (redirect HTTP to HTTPS)
    • Allowed HTTP methods
    • Field level encryption
    • Cache directives
    • TTL (min, max, default)
    • Restrict viewer access to a behavior (Trusted Signers)
    • Compress objects automatically
    • Associate Lambda@Edge function
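
The matching of requests to behaviors can be sketched roughly as follows. This is a simplified model, not CloudFront's implementation: the behavior list, the origin names and the `pick_origin` helper are all hypothetical, and Python's `fnmatchcase` only approximates CloudFront path patterns (it also accepts `[seq]` character classes).

```python
from fnmatch import fnmatchcase

# Hypothetical behaviors in precedence order: (path pattern, origin name).
BEHAVIORS = [
    ("/images/*.jpg", "image-bucket"),
    ("/api/*", "api-server"),
]
# Hypothetical origin of the default behavior, used when nothing else matches.
DEFAULT_ORIGIN = "website-bucket"


def pick_origin(path: str) -> str:
    """Return the origin of the first behavior whose pattern matches the path."""
    for pattern, origin in BEHAVIORS:
        # Case-sensitive wildcard match: * is any run of characters, ? is one.
        if fnmatchcase(path, pattern):
            return origin
    return DEFAULT_ORIGIN
```

With these assumed behaviors, `pick_origin("/api/v1/users")` selects `api-server`, while `/index.html` falls through to the default behavior.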

TTL and Invalidations


  • An edge location views an object as not expired while it is within its TTL period
  • More frequent cache hits = lower origin load
  • Default validity period of an object (TTL) is 24 hours. This is defined in the behavior
  • Minimum TTL, maximum TTL: set the lower and upper bounds on the TTL which an individual object can have
  • Object specific TTL values can be set by the origins using different headers:
    • Cache-Control max-age (seconds): TTL value in seconds for an object
    • Cache-Control s-maxage (seconds): same as max-age, but it applies only to shared caches such as CF
    • Expires (Date and Time): expiration date and time
  • If any of these headers specifies a value outside of the minimum/maximum range, the minimum or maximum TTL is used instead
  • Custom headers for S3 origins can be configured in object's metadata
  • Cache invalidations are performed on a distribution and apply to all edge locations (this takes time)
  • A cache invalidation expires every object which matches the invalidation pattern, regardless of its TTL value
  • Applying an invalidation has a cost
  • Instead of invalidation we may consider versioned file names
  • Versioned file names also help to:
    • Avoid using local browser cache in case of a newer file
    • Help improve logging
    • Reduce cost, no need for manual invalidation
  • S3 object versioning and versioned file names should not be confused!
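
How the behavior's TTL settings combine with the origin's headers can be sketched as a simple clamp. This is a simplified model under stated assumptions: the helper names are hypothetical, the default values (24 hours default, one year maximum) are assumptions for illustration, and the `Expires` header is ignored to keep the sketch short.

```python
import re
from typing import Optional


def origin_ttl(cache_control: Optional[str]) -> Optional[int]:
    """Extract a TTL in seconds from a Cache-Control value, preferring s-maxage."""
    if not cache_control:
        return None
    for directive in ("s-maxage", "max-age"):
        pattern = rf"(?:^|[,\s]){directive}\s*=\s*(\d+)"
        match = re.search(pattern, cache_control, re.IGNORECASE)
        if match:
            return int(match.group(1))
    return None


def effective_ttl(cache_control: Optional[str], min_ttl: int = 0,
                  default_ttl: int = 86400, max_ttl: int = 31536000) -> int:
    """Clamp the origin's TTL (or the default TTL) into [min_ttl, max_ttl]."""
    ttl = origin_ttl(cache_control)
    if ttl is None:
        ttl = default_ttl  # no usable header from the origin
    return max(min_ttl, min(ttl, max_ttl))
```

For example, an origin sending `max-age=10` to a behavior with a minimum TTL of 60 ends up with an effective TTL of 60 seconds in this model.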

CloudFront and SSL

  • Each CF distribution receives a default domain name (CNAME)
  • HTTPS works by default for this address (CF supplies a *.cloudfront.net certificate)
  • CF allows alternate domain names (CNAME)
  • To use HTTPS with an alternate domain name, we have to add our own matching certificate to CF
  • Even if we only use HTTP, CF needs to verify that we own the domain name, which is accomplished by also adding an SSL certificate
  • SSL certificates are created or imported using ACM (AWS Certificate Manager). ACM is a regional service, so the certificate for a global service such as CF needs to be in the us-east-1 region
  • Handling HTTP and HTTPS:
    • We can allow both HTTP and HTTPS on a distribution
    • We can redirect HTTP to HTTPS
    • We can restrict to only allow HTTPS (any HTTP will fail)
  • There are two sets of connections when using CF:
    • Viewer => CF (viewer protocol)
    • CF => Origin (origin protocol)
  • Both connections need valid public certificates (self-signed certificates will not work)
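
The three ways of handling HTTP and HTTPS can be sketched as a small decision function. The policy strings follow the CloudFront viewer protocol policy values; the function itself and the use of `200` as a stand-in for "the request proceeds" are assumptions of this sketch.

```python
from typing import Optional, Tuple


def viewer_response(scheme: str, policy: str, host: str,
                    path: str) -> Tuple[int, Optional[str]]:
    """Return (status, redirect location) for a viewer request under a policy."""
    if policy not in ("allow-all", "redirect-to-https", "https-only"):
        raise ValueError(f"unknown viewer protocol policy: {policy}")
    if scheme == "https" or policy == "allow-all":
        return 200, None  # placeholder status: the request is processed normally
    if policy == "redirect-to-https":
        return 301, f"https://{host}{path}"  # HTTP is redirected to HTTPS
    return 403, None  # https-only: plain HTTP requests fail
```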

CloudFront and SNI

  • Historically every SSL enabled site needed its own IP
  • TLS encryption for HTTPS is negotiated at the connection level, before any HTTP data is exchanged
  • The Host header is sent after that, at Layer 7. It specifies which application we want to reach when multiple applications run on the same server
  • So TLS encryption is set up before the server knows which application is being accessed
  • In 2003 an extension was added to TLS: SNI (Server Name Indication), which lets the client specify the host name during the TLS handshake
  • Older browsers do not necessarily support SNI. CF needs to allocate dedicated IP addresses for these users, which costs us extra
  • CF can be used in SNI mode (free) or with dedicated IP addresses ($600 per month)
  • [Figure: CloudFront SSL/SNI architecture]
  • For S3 origins, we don't need to apply certificates for the origin protocol. For ALB/EC2/on-prem origins we have to apply public certificates which need to match the DNS name of the origin
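
To see SNI at work without any network access, the following sketch asks Python's `ssl` module to produce the first handshake message (the ClientHello) in memory and checks that the requested host name travels inside it. The helper name `client_hello_bytes` is hypothetical, and the check assumes the host name is sent unencrypted, which is the case for a standard SNI extension.

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Return the raw bytes a TLS client would send first for this hostname."""
    context = ssl.create_default_context()
    incoming = ssl.MemoryBIO()  # data "from the server" (stays empty here)
    outgoing = ssl.MemoryBIO()  # data the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and now waits
        # for a server reply that never arrives in this sketch.
        pass
    return outgoing.read()


hello = client_hello_bytes("example.com")
# The server can read the host name before choosing a certificate.
sni_present = b"example.com" in hello
```

This is why one IP address can serve many HTTPS sites in SNI mode: the server learns the host name early enough to pick the matching certificate.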

Origin Types and Architecture

  • Origins are the locations from where CF goes to get content
  • If there is a cache miss for a request, then an origin fetch occurs
  • Origin groups allow us to add resiliency. We can group origins together and have the behavior use the origin group
  • Categories of origins:
    • Amazon S3 buckets
    • AWS Elemental MediaPackage channel endpoints
    • AWS Elemental MediaStore container endpoints
    • everything else (web servers) - custom origins
  • If an S3 bucket is configured for static website hosting, CF views it as a custom origin
  • S3 origin configurations:
    • Origin Path: use a path instead of the top level of the bucket
    • Origin Access Identity: gives CF a virtual identity which it uses to access the bucket
    • Origin Custom Headers
    • Viewer protocol policy is also used for the origin protocol
  • Custom origin configurations:
    • Origin Path: point to an origin but use a sub-path
    • Minimum Origin SSL Protocol: best practice is to always select the latest
    • Origin Protocol Policy: HTTP, HTTPS or Match Viewer
    • HTTP/HTTPS Port: we can use an arbitrary port instead of 80 or 443
    • Origin Custom Headers: can be used for security to restrict access only from CF
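
The resiliency that origin groups add can be sketched as a fetch that falls back to the next origin when the current one fails. Everything here is a hypothetical model: the `Origin` callables, the set of failover status codes and the `fetch_from_group` helper are assumptions for illustration, not CloudFront configuration.

```python
from typing import Callable, Sequence, Tuple

Response = Tuple[int, bytes]         # (status code, body)
Origin = Callable[[str], Response]   # hypothetical: fetches a path from one origin

# Assumed failover criteria for this sketch.
FAILOVER_STATUS = {500, 502, 503, 504}


def fetch_from_group(path: str, origins: Sequence[Origin]) -> Response:
    """Try each origin in order and return the first non-failover response."""
    last: Response = (502, b"no origin available")
    for origin in origins:
        try:
            last = origin(path)
        except ConnectionError:
            continue  # origin unreachable, try the next one
        if last[0] not in FAILOVER_STATUS:
            return last
    return last  # every origin failed; surface the last response seen
```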

Caching Performance and Optimization

  • Cache Hit: object is available in the cache in the edge location
  • Cache Miss: object is not available in the cache, origin fetch is required
  • Content retrieval techniques:
    • When we request an object from CF, we usually identify it by its name (path)
    • We can use query string parameters as well, for example index.html?lang=en
    • Cookies
    • Request Headers
  • When using CF, all of this data reaches CloudFront first and can then be forwarded to the origin
  • We can configure CF to cache data based on some or all of these request properties
  • Forward only the headers needed by the application, and cache based only on what can change the object
  • The more request properties are involved in caching, the less efficient the cache becomes
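
Why fewer request properties give a more efficient cache can be sketched with a toy cache key. This is not CloudFront's key format: the `cache_key` helper and its allowlist parameters are hypothetical, cookies are left out for brevity, and sorting the query parameters is a choice of this sketch.

```python
from typing import Iterable, Mapping, Tuple
from urllib.parse import parse_qsl, urlsplit


def cache_key(url: str, headers: Mapping[str, str],
              query_allowlist: Iterable[str] = (),
              header_allowlist: Iterable[str] = ()) -> Tuple:
    """Build a cache key from the path plus only the allowed request properties."""
    parts = urlsplit(url)
    allowed_query = set(query_allowlist)
    query = sorted((k, v) for k, v in parse_qsl(parts.query) if k in allowed_query)
    lowered = {name.lower(): value for name, value in headers.items()}
    selected = sorted(
        (name.lower(), lowered[name.lower()])
        for name in header_allowlist
        if name.lower() in lowered
    )
    return (parts.path, tuple(query), tuple(selected))
```

Two requests for `/index.html?lang=en` that differ only in a tracking parameter or in `User-Agent` map to the same key when only `lang` is allowed, so the second one can be a cache hit.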

CloudFront Security

OAI and Custom Origins


  • S3 origins:
    • OAI (Origin Access Identity): a type of identity which can be associated with CloudFront distributions
    • Essentially the CloudFront distribution "becomes" the OAI, meaning that this identity can be used in S3 bucket policies
    • A common pattern is to lock the S3 bucket so that it is only accessible to CloudFront
    • The edge locations gain the attached OAI identity, meaning they will be able to access the bucket
    • Direct access from the end user to the bucket content can be disabled
  • Custom origins:
    • We cannot use OAI to control access
    • We can utilize custom headers, which are protected in transit by HTTPS. CloudFront is configured to send the custom header and the origin rejects requests that lack it
    • Another way to secure a custom origin is to check the IP range the request is coming from. CloudFront IP ranges are publicly available
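
On the custom origin side, both checks could look roughly like the sketch below. The header name `x-origin-verify`, the helper names and the CIDR ranges passed in are all hypothetical; in practice the ranges would come from the published CloudFront IP list.

```python
import hmac
import ipaddress
from typing import Iterable, Mapping


def has_valid_origin_header(headers: Mapping[str, str], expected_secret: str,
                            header_name: str = "x-origin-verify") -> bool:
    """Check the secret custom header that the distribution is configured to send."""
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get(header_name, "")
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(supplied.encode(), expected_secret.encode())


def ip_in_ranges(client_ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Check whether the caller's IP falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)
```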

Private Distributions


  • CloudFront can run in 2 different modes:
    • Public: can be accessed by any viewer
    • Private: requests to CloudFront need to be made with a signed URL or cookie
  • If the CloudFront distribution has only 1 behavior the whole distribution is considered to be either public or private
  • In case of multiple behaviors: each behavior can be either public or private
  • To enable private distribution of content, the account root user needs to create a CloudFront key. That account is then added as a Trusted Signer
  • Signed URLs provide access to one particular object. They are also used for legacy RTMP distributions which cannot use cookies
  • Signed cookies can provide access to groups of objects or all files of a particular type
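
The unsigned part of a signed URL can be sketched with the standard library alone. The sketch builds a policy statement with an expiry time and applies the URL-safe base64 variant used in CloudFront signed URLs; the helper names are hypothetical, and the actual signing step with the private key is deliberately left out because it needs a cryptography library.

```python
import base64
import json


def custom_policy(resource_url: str, expires_epoch: int) -> str:
    """Build a compact policy document allowing access until the given time."""
    policy = {
        "Statement": [
            {
                "Resource": resource_url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
            }
        ]
    }
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_b64(data: bytes) -> str:
    """Base64-encode and swap the characters that are unsafe in a URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")
```

A wildcard in the resource (for example a hypothetical `https://d111.example/videos/*`) is what lets one policy cover a group of objects rather than a single file.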

CloudFront Geo Restriction

  • Gives a way to restrict content to a particular location
  • There are 2 types of restriction:
    • CloudFront Geo Restriction:
      • Whitelist or blacklist countries
      • Only works with countries!
      • Uses a GeoIP database with 99.8% accuracy
      • Applies to the entire distribution
    • 3rd Party Geolocation:
      • Completely customizable, can be used to filter on lots of other attributes, for example: username, user attributes, etc.
      • Requires an application server in front of CloudFront, which controls whether the customer has access to the content or not
      • The application generates a signed URL/cookie which is returned to the browser. The browser then sends it to CloudFront for authorization
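
The country-level check behind the built-in geo restriction can be sketched in a few lines. The function name and the mode strings are hypothetical, and the country code is assumed to have already been resolved from the viewer's IP address.

```python
from typing import Set


def is_country_allowed(country_code: str, mode: str, countries: Set[str]) -> bool:
    """Apply a whitelist or blacklist of two-letter country codes."""
    code = country_code.upper()
    if mode == "whitelist":
        return code in countries       # only listed countries get through
    if mode == "blacklist":
        return code not in countries   # listed countries are blocked
    raise ValueError(f"unknown restriction mode: {mode}")
```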

Field-Level Encryption

  • Field-Level encryption happens at the edge
  • We can configure encryption using a public key for certain fields from the request
  • Field-Level encryption happens separately from the HTTPS tunnel
  • A private key is needed to decrypt individual fields
  • [Figure: Field-Level encryption architecture]



  • Lambda@Edge allows us to run lightweight Lambda functions at the edge locations
  • These Lambda functions allow us to adjust data between the viewer and the origin
  • They don't have the full Lambda feature set:
    • Currently only NodeJS and Python are supported
    • Functions don't have access to any resources in a VPC, they run in AWS public space
    • Lambda Layers are not supported
  • They have different size and duration limits compared to classic Lambda functions:
    • Viewer side: 128 MB memory / 5 second timeout
    • Origin side: same memory as classic Lambda / 30 second timeout
  • Lambda@Edge use cases:
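
As one illustrative use case, a function attached to the response side could add a security header before the response reaches the viewer. This is a minimal sketch: the choice of header and its value are assumptions, and the event shape follows the CloudFront response event passed to Lambda@Edge.

```python
def lambda_handler(event, context):
    """Add an HSTS header to the response travelling back to the viewer."""
    response = event["Records"][0]["cf"]["response"]
    # Header names are lowercase keys; each maps to a list of key/value pairs.
    response["headers"]["strict-transport-security"] = [
        {
            "key": "Strict-Transport-Security",
            "value": "max-age=63072000; includeSubDomains",
        }
    ]
    return response
```

The function returns the modified response object, which CloudFront then passes on.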


  • ElastiCache is an in-memory database for applications which need high-end performance
  • It is orders of magnitude faster than a classic DB, but it is not persistent
  • ElastiCache provides 2 types of databases: Managed Redis and Memcached as a service
  • ElastiCache can be used for read heavy workloads with low latency requirements
  • Reduces database workloads, thereby reducing the cost of heavy database usage
  • Can be used to store session data, making stateful applications stateless
  • Using ElastiCache requires application code changes!
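
The kind of application change involved can be sketched with the cache-aside pattern: look in the cache first, and only go to the database on a miss. In this sketch a small in-process `TTLCache` class stands in for Redis or Memcached, and `get_user` and `load_from_db` are hypothetical names.

```python
import time


class TTLCache:
    """In-process stand-in for a cache server (assumption of this sketch)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_user(user_id, cache, load_from_db, ttl_seconds=300):
    """Cache-aside read: serve from the cache, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)       # slow path, hits the database
    cache.set(key, value, ttl_seconds)  # later reads skip the database
    return value
```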

Redis vs Memcached

  • Both offer sub-millisecond access to data
  • Memcached supports simple data structures (strings), while Redis supports more advanced types of data: lists, sets, sorted sets, hashes, bit arrays, etc.
  • Redis supports replication of data across multiple AZs. Memcached supports multiple nodes with manual sharding, but it does not support "true" replication across AZs
  • Redis supports backups and restores, Memcached does not support persistence
  • Memcached is multi-threaded by design, so it can offer better performance
  • Redis supports transactions (multiple operations at once)
  • Both of these engines support a range of instance types
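
Manual sharding across Memcached nodes means the client decides which node holds each key. A minimal sketch of that decision, assuming a hypothetical list of node addresses and a simple hash-modulo scheme (real clients often use consistent hashing instead, so that fewer keys move when nodes are added or removed):

```python
import hashlib
from typing import Sequence


def node_for_key(key: str, nodes: Sequence[str]) -> str:
    """Pick the node responsible for a key by hashing the key."""
    if not nodes:
        raise ValueError("at least one node is required")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

The same key always maps to the same node as long as the node list does not change.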

