15. Amazon S3

Overview

  • Amazon Simple Storage Service (Amazon S3) is storage for the Internet. It is designed to make web-scale computing easier.
  • Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web

Buckets

  • A bucket is a container for objects stored in Amazon S3. Every object is contained in a bucket.
  • Buckets serve several purposes:
    • They organize the Amazon S3 namespace at the highest level.
    • They identify the account responsible for storage and data transfer charges.
    • They play a role in access control.
    • They serve as the unit of aggregation for usage reporting.
  • An Amazon S3 bucket name is globally unique, and the namespace is shared by all AWS accounts. 
  • The AWS account that creates a resource owns that resource. 
    •  For example, if you create an IAM user in your AWS account and grant the user permission to create a bucket, the user can create a bucket. But the user does not own the bucket; the AWS account that the user belongs to owns the bucket. 
  • Amazon S3 Transfer Acceleration is a bucket-level feature that enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of the globally distributed edge locations in Amazon CloudFront. As the data arrives at an edge location, the data is routed to Amazon S3 over an optimized network path.
    • To access an acceleration-enabled bucket, use bucketname.s3-accelerate.amazonaws.com.
  • After you create a bucket, you can't change its name or Region.
  • By default, you can create up to 100 buckets in each of your AWS accounts. If you need additional buckets, you can increase your account bucket limit to a maximum of 1,000 buckets by submitting a service limit increase. 
  • There is no limit to the number of objects that you can store in a bucket.
  • There is no hierarchy of subbuckets or subfolders.  
  • However, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does. 
  • If you need server-side encryption for all of the objects that are stored in a bucket, use a bucket policy.
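
As a concrete illustration of the bucket-level controls above, here is a minimal boto3 (Python) sketch that creates a bucket and attaches a bucket policy denying any PutObject request that does not ask for SSE-KMS encryption. The bucket name, Region, and policy wording are hypothetical placeholders.

```python
import json
import boto3

s3 = boto3.client("s3", region_name="us-west-2")

# Create a bucket. Outside us-east-1 you must pass a LocationConstraint.
bucket = "example-notes-bucket"  # hypothetical name; bucket names are globally unique
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)

# Bucket policy that denies PutObject requests that do not ask for SSE-KMS,
# so every object stored in the bucket is encrypted at rest.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {
                "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
            },
        }
    ],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```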

Objects

  • Objects are the fundamental entities stored in Amazon S3.
  • Objects consist of object data and metadata.
  • The data portion is opaque to Amazon S3.
  • The metadata is a set of name-value pairs that describe the object.
  • A key is the unique identifier for an object within a bucket.
  • Every object in a bucket has exactly one key.
  • The combination of a bucket, key, and version ID uniquely identify each object. 
  • Objects stored in a Region never leave the Region unless you explicitly transfer them to another Region.
  • Amazon S3 provides strong read-after-write consistency for PUTs and DELETEs of objects in your Amazon S3 bucket in all AWS Regions.
  • You can use versioning to keep multiple versions of an object in the same bucket.
  • Each object can be up to 5 TB in size.
  •  Object metadata is a set of name-value pairs. After you upload the object, you cannot modify object metadata.
  • There are two kinds of metadata in Amazon S3: system-defined metadata and user-defined metadata
  • With a single PUT API operation, you can upload a single object up to 5 GB in size.
  • Using the multipart upload API, you can upload a single large object, up to 5 TB in size.
  • With the Amazon S3 Console, you can upload a single object up to 160 GB in size
  •  In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.
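
A short boto3 sketch of a simple upload with user-defined metadata. Because object metadata cannot be modified after upload, the second call shows the usual workaround of copying the object over itself with replacement metadata. Bucket and key names are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Upload a small object with user-defined metadata. Keys passed in Metadata
# are stored as x-amz-meta-* headers; system metadata (Content-Type, etc.)
# is set through dedicated parameters.
s3.put_object(
    Bucket="example-notes-bucket",   # hypothetical bucket
    Key="reports/2023/summary.txt",
    Body=b"hello, s3",
    ContentType="text/plain",
    Metadata={"project": "demo", "owner": "data-team"},
)

# Metadata cannot be edited in place; to "change" it you copy the object
# over itself with new metadata (MetadataDirective="REPLACE").
s3.copy_object(
    Bucket="example-notes-bucket",
    Key="reports/2023/summary.txt",
    CopySource={"Bucket": "example-notes-bucket", "Key": "reports/2023/summary.txt"},
    Metadata={"project": "demo", "owner": "platform-team"},
    MetadataDirective="REPLACE",
)
```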

Multipart upload process

  • Multipart upload is a three-step process:
    • You initiate the upload. If you want to provide any metadata describing the object being uploaded, you must provide it in the request to initiate the multipart upload.
    • You upload the object parts.
    • After you have uploaded all the parts, you complete the multipart upload.
    • Upon receiving the complete multipart upload request, Amazon S3 constructs the object from the uploaded parts, and you can then access the object just as you would any other object in your bucket.
  • You can create a copy of your object up to 5 GB in a single atomic operation. However, to copy an object that is greater than 5 GB, you must use the multipart upload API.
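
A hedged sketch of the three steps using the low-level multipart API in boto3; in practice, higher-level helpers such as upload_file perform these steps automatically once objects grow past the multipart threshold. The bucket, key, and file names are placeholders.

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "example-notes-bucket", "backups/archive.bin"  # hypothetical names

# Step 1: initiate. Any object metadata must be supplied here.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key, Metadata={"source": "demo"})

# Step 2: upload the parts (every part except the last must be at least 5 MB).
parts = []
with open("archive.bin", "rb") as f:
    part_number = 1
    while True:
        chunk = f.read(8 * 1024 * 1024)
        if not chunk:
            break
        resp = s3.upload_part(
            Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
            PartNumber=part_number, Body=chunk,
        )
        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
        part_number += 1

# Step 3: complete. S3 assembles the parts into a single object.
s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": parts},
)
```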

Presigned URLs

  • All objects and buckets are private by default. However, you can use a presigned URL to optionally share objects or enable your customers/users to upload objects to buckets without AWS security credentials or permissions.
  • You can use presigned URLs to generate a URL that can be used to access your S3 buckets.
  • When you create a presigned URL, you associate it with a specific action.
  • You can share the URL, and anyone with access to it can perform the action embedded in the URL as if they were the original signing user.
  • The URL will expire and no longer work when it reaches its expiration time. 
  • In essence, a presigned URL is a bearer token that grants access to anyone who possesses it.
  • When you create a presigned URL, you must provide your security credentials and then specify a bucket name, an object key, an HTTP method (GET to download the object, PUT to upload an object), and an expiration date and time.
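
For example, a boto3 sketch that signs one GET URL and one PUT URL; the bucket and keys are placeholders, and the URLs stop working after ExpiresIn seconds.

```python
import boto3

s3 = boto3.client("s3")

# Presigned GET: anyone holding this URL can download the object until it
# expires (here, 1 hour), acting with the signer's permissions.
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-notes-bucket", "Key": "reports/2023/summary.txt"},
    ExpiresIn=3600,
)

# Presigned PUT: lets a client upload to a specific key without AWS credentials.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "example-notes-bucket", "Key": "uploads/incoming.txt"},
    ExpiresIn=900,
)
print(download_url, upload_url, sep="\n")
```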

S3 Object Lambda

  • With S3 Object Lambda, you can add your own code to Amazon S3 GET requests to modify and process data as it is returned to an application.
  • S3 Object Lambda uses AWS Lambda functions to automatically process the output of a standard S3 GET request.

S3 Object Lambda diagram.

  •  To create an Object Lambda access point, you need the following resources:
    • An IAM policy
    • An Amazon S3 bucket
    • A standard S3 access point
    • An AWS Lambda function
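
A minimal sketch of what such a Lambda function might look like in Python. It assumes the standard S3 Object Lambda event shape (getObjectContext with inputS3Url, outputRoute, and outputToken) and simply upper-cases text objects before returning them; the transformation itself is a placeholder.

```python
import urllib.request
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Hypothetical S3 Object Lambda handler that upper-cases text objects
    before they are returned to the caller of the Object Lambda access point."""
    ctx = event["getObjectContext"]

    # S3 hands the function a presigned URL for the original object.
    original = urllib.request.urlopen(ctx["inputS3Url"]).read()

    transformed = original.decode("utf-8").upper().encode("utf-8")

    # Return the transformed bytes to the waiting GET request.
    s3.write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=transformed,
    )
    return {"status_code": 200}
```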

Storage classes

  • Amazon S3 offers a range of storage classes designed for different use cases.
  • Amazon S3 STANDARD for general-purpose storage of frequently accessed data
  • Amazon S3 STANDARD_IA for long-lived, but less frequently accessed data
  • S3 Glacier for long-term archive

Access Control

  • Bucket policies provide centralized access control to buckets and objects based on a variety of conditions, including Amazon S3 operations, requesters, resources, and aspects of the request (for example, IP address).
  • Only the bucket owner is allowed to associate a policy with a bucket
  • You can use AWS Identity and Access Management (IAM) to manage access to your Amazon S3 resources.
  • You can control access to each of your buckets and objects using an access control list (ACL)
    • You can use ACLs to grant basic read/write permissions to other AWS accounts
    •  It defines which AWS accounts or groups are granted access and the type of access.
    • When you create a bucket or an object, Amazon S3 creates a default ACL that grants the resource owner full control over the resource. 
  •  You have the following options for protecting data at rest in Amazon S3:
    • Server-Side Encryption – Request Amazon S3 to encrypt your object before saving it on disks in its data centers and then decrypt it when you download the objects.
      • Amazon S3-managed keys (SSE-S3)
      • AWS KMS keys stored in AWS Key Management Service (AWS KMS) (SSE-KMS)
        • Amazon S3 Bucket Keys reduce the cost of Amazon S3 server-side encryption using AWS Key Management Service (SSE-KMS). 
        •  S3 Bucket Keys decrease the request traffic from Amazon S3 to AWS Key Management Service (AWS KMS) and reduce the cost of SSE-KMS. 
      • Server-Side Encryption with Customer-Provided Keys (SSE-C)
      • Server-side encryption encrypts only the object data, not object metadata.
    • Client-Side Encryption – Encrypt data client-side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools.
      • Use a customer master key (CMK) stored in AWS Key Management Service (AWS KMS).
      • Use a key that you store within your application.
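
A brief boto3 sketch of the server-side encryption options above: one PutObject request that asks for SSE-KMS with an S3 Bucket Key, and a bucket default that applies the same settings to future uploads. The bucket name and KMS key alias are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to encrypt the object with a KMS key (SSE-KMS). BucketKeyEnabled
# turns on an S3 Bucket Key for this object, cutting request traffic to AWS KMS.
s3.put_object(
    Bucket="example-notes-bucket",          # hypothetical bucket
    Key="confidential/plan.txt",
    Body=b"secret plan",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-key",        # hypothetical KMS key alias
    BucketKeyEnabled=True,
)

# Alternatively, make SSE-KMS with Bucket Keys the bucket default.
s3.put_bucket_encryption(
    Bucket="example-notes-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "alias/example-key",
            },
            "BucketKeyEnabled": True,
        }]
    },
)
```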
  • With AWS PrivateLink for Amazon S3, you can provision interface VPC endpoints (interface endpoints) in your virtual private cloud (VPC).
  • These endpoints are directly accessible from applications that are on premises over VPN and AWS Direct Connect, or in a different AWS Region over VPC peering.
  • Interface endpoints are represented by one or more elastic network interfaces (ENIs) that are assigned private IP addresses from subnets in your VPC.
  • Requests that are made to interface endpoints for Amazon S3 are automatically routed to Amazon S3 on the Amazon network.
  • You can use two types of VPC endpoints to access Amazon S3: gateway endpoints and interface endpoints.
    • A gateway endpoint is a gateway that you specify in your route table to access Amazon S3 from your VPC over the AWS network.
    • Interface endpoints extend the functionality of gateway endpoints by using private IP addresses to route requests to Amazon S3 from within your VPC, on premises, or from a VPC in another AWS Region using VPC peering or AWS Transit Gateway. 
  • Interface endpoints are compatible with gateway endpoints. If you have an existing gateway endpoint in the VPC, you can use both types of endpoints in the same VPC.

| Gateway endpoints for Amazon S3 | Interface endpoints for Amazon S3 |
| --- | --- |
| Use Amazon S3 public IP addresses | Use private IP addresses from your VPC to access Amazon S3 |
| Does not allow access from on premises | Allow access from on premises |
| Does not allow access from another AWS Region | Allow access from a VPC in another AWS Region using VPC peering or AWS Transit Gateway |
| Not billed | Billed |

In both cases, your network traffic remains on the AWS network.

  • Using interface endpoints to access Amazon S3 without a gateway endpoint or an internet gateway in the VPC

Data flow diagram shows access from on-premises and in-VPC apps to S3 using an interface endpoint and AWS PrivateLink.

  • Using gateway endpoints and interface endpoints together in the same VPC to access Amazon S3

Data flow diagram shows access to S3 using gateway endpoints and interface endpoints together.

CORS configuration

  • To configure your bucket to allow cross-origin requests, you create a CORS configuration.
  • The CORS configuration is a document with rules that identify the origins that you will allow to access your bucket, the operations (HTTP methods) that you will support for each origin, and other operation-specific information.
  • With a CORS configuration, instead of accessing a website by using an Amazon S3 website endpoint, you can use your own domain, such as example1.com, to serve your content.
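
For instance, a boto3 sketch that applies one CORS rule allowing GET and PUT requests from a single origin; the origin and bucket name are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# One CORS rule: allow GET and PUT from a single origin, expose the ETag
# header to the browser, and cache the preflight response for an hour.
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://example1.com"],
            "AllowedMethods": ["GET", "PUT"],
            "AllowedHeaders": ["*"],
            "ExposeHeaders": ["ETag"],
            "MaxAgeSeconds": 3600,
        }
    ]
}
s3.put_bucket_cors(Bucket="example-notes-bucket", CORSConfiguration=cors_configuration)
```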

Managing your Amazon S3 storage

S3 Versioning

  • Versioning in Amazon S3 is a means of keeping multiple variants of an object in the same bucket.
  • You can use the S3 Versioning feature to preserve, retrieve, and restore every version of every object stored in your buckets.
  • With versioning you can recover more easily from both unintended user actions and application failures.
  • Buckets can be in one of three states:
    • Unversioned (the default)
    • Versioning-enabled
    • Versioning-suspended
  • You enable and suspend versioning at the bucket level.
  • After you version-enable a bucket, it can never return to an unversioned state. But you can suspend versioning on that bucket.
  • You can permanently delete an object by specifying the version you want to delete. Only the owner of an Amazon S3 bucket can permanently delete a version. 
  • You can add more security by configuring a bucket to enable MFA (multi-factor authentication) delete.
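
A small boto3 sketch of the bucket-level versioning switch and of permanently deleting one specific version; the bucket, key, and version ID are placeholders. (Enabling MFA delete additionally requires the bucket owner's MFA device and is omitted here.)

```python
import boto3

s3 = boto3.client("s3")

# Turn versioning on for the bucket. To suspend it later, set Status="Suspended";
# a version-enabled bucket can never return to the unversioned state.
s3.put_bucket_versioning(
    Bucket="example-notes-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Permanently delete one specific version by supplying its version ID.
s3.delete_object(
    Bucket="example-notes-bucket",
    Key="reports/2023/summary.txt",
    VersionId="example-version-id",   # placeholder version ID
)
```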

S3 Object Lock

  • With S3 Object Lock, you can store objects using a write-once-read-many (WORM) model.
  • Object Lock can help prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely.
  • Object Lock provides two ways to manage object retention: retention periods and legal holds.
    • Retention period — Specifies a fixed period of time during which an object remains locked. During this period, your object is WORM-protected and can't be overwritten or deleted. 
    • Legal hold — Provides the same protection as a retention period, but it has no expiration date. Instead, a legal hold remains in place until you explicitly remove it. Legal holds are independent from retention periods.
  • Object Lock works only in versioned buckets, and retention periods and legal holds apply to individual object versions.
  • When you lock an object version, Amazon S3 stores the lock information in the metadata for that object version. Placing a retention period or legal hold on an object protects only the version specified in the request. It doesn't prevent new versions of the object from being created.
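
A boto3 sketch of both retention mechanisms applied to a single object version in a bucket that already has Object Lock enabled; the bucket, key, and retain-until date are placeholders.

```python
from datetime import datetime, timezone
import boto3

s3 = boto3.client("s3")

# Place a retention period on the current object version: it cannot be
# overwritten or deleted until RetainUntilDate (COMPLIANCE mode cannot be shortened).
s3.put_object_retention(
    Bucket="example-locked-bucket",     # bucket must have Object Lock enabled
    Key="audit/ledger-2023.csv",
    Retention={
        "Mode": "COMPLIANCE",
        "RetainUntilDate": datetime(2026, 1, 1, tzinfo=timezone.utc),
    },
)

# A legal hold has no expiration; it stays until explicitly removed (Status="OFF").
s3.put_object_legal_hold(
    Bucket="example-locked-bucket",
    Key="audit/ledger-2023.csv",
    LegalHold={"Status": "ON"},
)
```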

S3 storage classes

  • S3 Standard — The default storage class. If you don't specify the storage class when you upload an object, Amazon S3 assigns the S3 Standard storage class.
  • Reduced Redundancy — The Reduced Redundancy Storage (RRS) storage class is designed for noncritical, reproducible data that can be stored with less redundancy than the S3 Standard storage class.
    • We recommend that you not use this storage class. The S3 Standard storage class is more cost effective.
    • For durability, RRS objects have an average annual expected loss of 0.01 percent of objects.
  • S3 Intelligent-Tiering is an Amazon S3 storage class designed to optimize storage costs by automatically moving data to the most cost-effective access tier, without operational overhead.
    • There are no retrieval fees for S3 Intelligent-Tiering.
    • For a small monthly object monitoring and automation fee, S3 Intelligent-Tiering monitors the access patterns and moves the objects automatically from one tier to another.
    • It works by storing objects in four access tiers: two low latency access tiers optimized for frequent and infrequent access, and two opt-in archive access tiers designed for asynchronous access that are optimized for rare access.
    • S3 Intelligent-Tiering works by monitoring access patterns and then moving the objects that have not been accessed in 30 consecutive days to the Infrequent Access tier. S3 Intelligent-Tiering automatically moves objects that haven’t been accessed for 90 consecutive days to the Archive Access tier, and after 180 consecutive days of no access, to the Deep Archive Access tier.
    • In order to access archived objects later, you first need to restore them
  • The S3 Standard-IA and S3 One Zone-IA storage classes are designed for long-lived and infrequently accessed data
    • S3 Standard-IA and S3 One Zone-IA objects are available for millisecond access (similar to the S3 Standard storage class).
    • S3 Standard-IA — Amazon S3 stores the object data redundantly across multiple geographically separated Availability Zones (similar to the S3 Standard storage class)
    • S3 One Zone-IA — Amazon S3 stores the object data in only one Availability Zone, which makes it less expensive than S3 Standard-IA.
  • The S3 Glacier and S3 Glacier Deep Archive storage classes are designed for low-cost data archiving.
    • These storage classes offer the same durability and resiliency as the S3 Standard storage class.
    • S3 Glacier — Use for archives where portions of the data might need to be retrieved in minutes. Data stored in the S3 Glacier storage class has a minimum storage duration period of 90 days and can be accessed in as little as 1-5 minutes using expedited retrieval. 
    • S3 Glacier Deep Archive — Use for archiving data that rarely needs to be accessed. Data stored in the S3 Glacier Deep Archive storage class has a minimum storage duration period of 180 days and a default retrieval time of 12 hours.
  • S3 on Outposts: With Amazon S3 on Outposts, you can create S3 buckets on your AWS Outposts and store and retrieve objects on premises for applications that require local data access, local data processing, and data residency.
    • Objects stored in the S3 Outposts (OUTPOSTS) storage class are always encrypted using server-side encryption with Amazon S3 managed encryption keys (SSE-S3)
    • You can also explicitly choose to encrypt objects stored in the S3 Outposts storage class using server-side encryption with customer-provided encryption keys (SSE-C)
| Storage class | Designed for | Durability (designed for) | Availability (designed for) | Availability Zones | Min storage duration | Min billable object size | Other considerations |
| --- | --- | --- | --- | --- | --- | --- | --- |
| S3 Standard | Frequently accessed data | 99.999999999% | 99.99% | >= 3 | None | None | None |
| S3 Standard-IA | Long-lived, infrequently accessed data | 99.999999999% | 99.9% | >= 3 | 30 days | 128 KB | Per GB retrieval fees apply. |
| S3 Intelligent-Tiering | Long-lived data with changing or unknown access patterns | 99.999999999% | 99.9% | >= 3 | 30 days | None | Monitoring and automation fees per object apply. No retrieval fees. |
| S3 One Zone-IA | Long-lived, infrequently accessed, non-critical data | 99.999999999% | 99.5% | 1 | 30 days | 128 KB | Per GB retrieval fees apply. Not resilient to the loss of the Availability Zone. |
| S3 Glacier | Long-term data archiving with retrieval times ranging from minutes to hours | 99.999999999% | 99.99% (after you restore objects) | >= 3 | 90 days | 40 KB | Per GB retrieval fees apply. You must first restore archived objects before you can access them. |
| S3 Glacier Deep Archive | Archiving rarely accessed data with a default retrieval time of 12 hours | 99.999999999% | 99.99% (after you restore objects) | >= 3 | 180 days | 40 KB | Per GB retrieval fees apply. You must first restore archived objects before you can access them. |
| RRS (not recommended) | Frequently accessed, non-critical data | 99.99% | 99.99% | >= 3 | None | None | None |

  • Archive retrieval options
    • Expedited - Quickly access your data stored in the S3 Glacier storage class or S3 Intelligent-Tiering Archive Access tier when occasional urgent requests for a subset of archives are required. 
      • Data that is accessed using expedited retrievals is typically made available within 1–5 minutes.
      • Expedited retrievals are not available for objects in the S3 Glacier Deep Archive storage class or the S3 Intelligent-Tiering Deep Archive Access tier.
    • Standard - Access any of your archived objects within several hours. 
      • Standard retrievals typically finish within 3–5 hours for objects stored in the S3 Glacier storage class or S3 Intelligent-Tiering Archive Access tier.
      • They typically finish within 12 hours for objects stored in the S3 Glacier Deep Archive or S3 Intelligent-Tiering Deep Archive Access storage class.
      • Standard retrievals are free for objects stored in S3 Intelligent-Tiering.
    • Bulk - The lowest-cost retrieval option in Amazon S3 Glacier, enabling you to retrieve large amounts, even petabytes, of data inexpensively.
      • Bulk retrievals typically finish within 5–12 hours for objects stored in the S3 Glacier storage class or S3 Intelligent-Tiering Archive Access tier.
      • They typically finish within 48 hours for objects stored in the S3 Glacier Deep Archive storage class or S3 Intelligent-Tiering Deep Archive Access tier.
      • Bulk retrievals are free for objects stored in S3 Intelligent-Tiering.
  • Amazon S3 objects that are stored in the S3 Glacier or S3 Glacier Deep Archive storage classes are not immediately accessible. To access an object in these storage classes, you must restore a temporary copy of it to its S3 bucket for a specified duration (number of days).
  • With the select type of POST Object restore, you can perform filtering operations using simple Structured Query Language (SQL) statements directly on your data that is archived by Amazon S3 to S3 Glacier.
    • Archive objects that are queried by select must be formatted as uncompressed comma-separated values (CSV).
    • The archive must not be encrypted with SSE-C or client-side encryption.
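
A boto3 sketch of initiating a restore for one archived object and checking its progress; the bucket, key, number of days, and retrieval tier are placeholders you would adjust to your needs.

```python
import boto3

s3 = boto3.client("s3")

# Restore a temporary copy of an archived object for 7 days using the
# Standard retrieval tier (Expedited and Bulk are the other options).
s3.restore_object(
    Bucket="example-notes-bucket",
    Key="archives/2019/logs.csv",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)

# head_object reports restore progress and, once finished, the expiry date
# of the temporary copy in the "Restore" response field.
head = s3.head_object(Bucket="example-notes-bucket", Key="archives/2019/logs.csv")
print(head.get("Restore"))
```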

S3 storage lifecycle

  • An S3 Lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects.
  • There are two types of actions:
    • Transition actions—Define when objects transition to another storage class.
    • Expiration actions—Define when objects expire. Amazon S3 deletes expired objects on your behalf.
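
As an illustration, a boto3 sketch of one Lifecycle rule that combines both action types for a hypothetical logs/ prefix.

```python
import boto3

s3 = boto3.client("s3")

# One rule for everything under "logs/": move to Standard-IA after 30 days,
# to Glacier after 90 days, and expire (delete) the objects after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-notes-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```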

Amazon S3 storage class waterfall graphic.

  • If you create an S3 Lifecycle expiration rule that causes objects that have been in S3 Intelligent-Tiering, S3 Standard-IA, or S3 One Zone-IA storage for less than 30 days to expire, you are charged for 30 days. If you create a Lifecycle expiration rule that causes objects that have been in S3 Glacier storage for less than 90 days to expire, you are charged for 90 days. If you create a Lifecycle expiration rule that causes objects that have been in S3 Glacier Deep Archive storage for less than 180 days to expire, you are charged for 180 days. 

S3 inventory

  • Amazon S3 inventory is one of the tools Amazon S3 provides to help manage your storage.
  • You can use it to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs.
  • Amazon S3 inventory provides comma-separated values (CSV), Apache ORC, or Apache Parquet output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix.
  • You can query Amazon S3 inventory using standard SQL 
  • The bucket that the inventory lists the objects for is called the source bucket. The bucket where the inventory list file is stored is called the destination bucket.

Replicating objects

  • Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets. 
  • Buckets that are configured for object replication can be owned by the same AWS account or by different accounts
  • Objects may be replicated to a single destination bucket or multiple destination buckets.
  • Destination buckets can be in different AWS Regions or within the same Region as the source bucket.
  • By default, replication only supports copying new Amazon S3 objects after it is enabled
  • You can use replication to copy existing objects and clone them to a different bucket, but in order to do so, you must contact AWS Support Center.
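
A boto3 sketch of a replication configuration with a single rule; it assumes versioning is already enabled on both buckets, and the role ARN and bucket names are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Both source and destination buckets must have versioning enabled, and the
# IAM role must allow S3 to read from the source and replicate to the destination.
s3.put_bucket_replication(
    Bucket="example-notes-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",  # hypothetical role
        "Rules": [
            {
                "ID": "replicate-everything",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::example-replica-bucket"},
            }
        ],
    },
)
```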

Cost allocation S3 bucket tags

  • To track the storage cost or other criteria for individual projects or groups of projects, label your Amazon S3 buckets using cost allocation tags.
  • A cost allocation tag is a key-value pair that you associate with an S3 bucket.
  • After you activate cost allocation tags, AWS uses the tags to organize your resource costs on your cost allocation report.
  • AWS provides two types of cost allocation tags, an AWS-generated tag and user-defined tags.
  • You must activate both types of tags separately in the Billing and Cost Management console before they can appear in your billing reports.
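
For example, a boto3 sketch that attaches two user-defined tags to a bucket; the tag keys and values are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# User-defined cost allocation tags on a bucket. After you activate them in the
# Billing and Cost Management console, costs can be grouped by these keys.
s3.put_bucket_tagging(
    Bucket="example-notes-bucket",
    Tagging={
        "TagSet": [
            {"Key": "project", "Value": "data-lake"},
            {"Key": "cost-center", "Value": "analytics"},
        ]
    },
)
```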

S3 Select

  • With Amazon S3 Select, you can use simple structured query language (SQL) statements to filter the contents of an Amazon S3 object and retrieve just the subset of data that you need.
  • Amazon S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also works with objects that are compressed with GZIP or BZIP2 (for CSV and JSON objects only), and server-side encrypted objects. 
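
A boto3 sketch of an S3 Select query over a hypothetical CSV object; the bucket, key, and SQL expression are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Filter a CSV object server-side and pull back only the matching rows.
resp = s3.select_object_content(
    Bucket="example-notes-bucket",
    Key="reports/2023/sales.csv",
    ExpressionType="SQL",
    Expression="SELECT s.region, s.amount FROM S3Object s WHERE CAST(s.amount AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; Records events carry the result bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```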

S3 Batch Operations 

  • S3 Batch Operations can perform a single operation on lists of Amazon S3 objects that you specify.
  • A job is the basic unit of work for S3 Batch Operations.
  • A job contains all of the information necessary to run the specified operation on a list of objects. 
  • Operations supported by S3 Batch Operations:
    • Copy objects
    • Invoke AWS Lambda function
      • You must create new Lambda functions specifically for use with S3 Batch Operations. You can't reuse existing Amazon S3 event-based functions with S3 Batch Operations.
      • The Lambda functions that are used with S3 Batch Operations must accept and return messages.
    • Replace all object tags
    • Delete all object tags
    • Replace access control list
    • Restore objects
    • S3 Object Lock retention
  • You can assign each job a numeric priority, which can be any positive integer.
    • S3 Batch Operations prioritize jobs according to the assigned priority.
    • Jobs with a higher priority (or a higher numeric value for the priority parameter) are evaluated first.
  • To help you manage your S3 Batch Operations jobs, you can add job tags.
    • With job tags, you can control access to your Batch Operations jobs and enforce that tags be applied when any job is created.

Monitoring Amazon S3

  • Amazon S3 is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Amazon S3
    • If you create a trail, you can enable continuous delivery of CloudTrail events to an Amazon S3 bucket, including events for Amazon S3
  • Server access logging provides detailed records for the requests that are made to a bucket.
    • You can use server access logs for security and access audits, learn about your customer base, or understand your Amazon S3 bill
    • Both the source and target buckets must be in the same AWS Region and owned by the same account.
    • Amazon S3 periodically collects access log records, consolidates the records in log files, and then uploads log files to your target bucket as log objects.
    • Amazon S3 uses a special log delivery account, called the Log Delivery group, to write access logs. 
    •  The purpose of server logs is to give you an idea of the nature of traffic against your bucket.
    • It is rare to lose log records, but server logging is not meant to be a complete accounting of all requests.
  • AWS CloudTrail logs provide a record of actions taken by a user, role, or an AWS service in Amazon S3, while Amazon S3 server access logs provide detailed records for the requests that are made to an S3 bucket.
| Log properties | AWS CloudTrail | Amazon S3 server logs |
| --- | --- | --- |
| Can be forwarded to other systems (CloudWatch Logs, CloudWatch Events) | Yes | No |
| Deliver logs to more than one destination (for example, send the same logs to two different buckets) | Yes | No |
| Turn on logs for a subset of objects (prefix) | Yes | No |
| Cross-account log delivery (target and source bucket owned by different accounts) | Yes | No |
| Integrity validation of log file using digital signature/hashing | Yes | No |
| Default/choice of encryption for log files | Yes | No |
| Object operations (using Amazon S3 APIs) | Yes | Yes |
| Bucket operations (using Amazon S3 APIs) | Yes | Yes |
| Searchable UI for logs | Yes | No |
| Fields for Object Lock parameters, Amazon S3 Select properties for log records | Yes | No |
| Fields for Object Size, Total Time, Turn-Around Time, and HTTP Referer for log records | No | Yes |
| Lifecycle transitions, expirations, restores | No | Yes |
| Logging of keys in a batch delete operation | No | Yes |
| Authentication failures | No | Yes |
| Accounts where logs get delivered | Bucket owner and requester | Bucket owner only |

| Performance and cost | AWS CloudTrail | Amazon S3 server logs |
| --- | --- | --- |
| Price | Management events (first delivery) are free; data events incur a fee, in addition to storage of logs | No additional cost in addition to storage of logs |
| Speed of log delivery | Data events every 5 mins; management events every 15 mins | Within a few hours |
| Log format | JSON | Log file with space-separated, newline-delimited records |

Using analytics and insights

  • By using Amazon S3 analytics Storage Class Analysis, you can analyze storage access patterns to help you decide when to transition the right data to the right storage class.
    • You can have multiple storage class analysis filters per bucket (up to 1,000), and you will receive a separate analysis for each filter.
    • Storage class analysis observes the access patterns of a filtered object data set for 30 days or longer to gather enough information for the analysis.
  • Amazon S3 Storage Lens aggregates your usage and activity metrics and displays the information in the account snapshot on the Amazon S3 console home (Buckets) page, interactive dashboards, or through a metrics export that you can download in CSV or Parquet format.
    •  You can use the dashboard to visualize insights and trends, flag outliers, and receive recommendations for optimizing storage costs and applying data protection best practices.
    • Amazon S3 Storage Lens provides a single view of usage and activity across your Amazon S3 storage.
    • It has drilldown options to generate insights at the organization, account, bucket, object, or even prefix level.
    • It analyzes storage metrics to deliver contextual recommendations to help you optimize storage costs and apply best practices for protecting your data.

Hosting a static website using Amazon S3

  • When you configure your bucket as a static website, the website is available at the AWS Region-specific website endpoint of the bucket.
  • Website endpoints are different from the endpoints where you send REST API requests.
  • Depending on your Region, your Amazon S3 website endpoint follows one of these two formats.
    • s3-website dash (-) Region ‐ http://bucket-name.s3-website-Region.amazonaws.com
    • s3-website dot (.) Region ‐ http://bucket-name.s3-website.Region.amazonaws.com
  •  For your customers to access content at the website endpoint, you must make all your content publicly readable
  • Amazon S3 website endpoints do not support HTTPS or access points. If you want to use HTTPS, you can use Amazon CloudFront to serve a static website hosted on Amazon S3. 
  • If you have a registered domain, you can add a DNS CNAME entry to point to the Amazon S3 website endpoint.
  • Instead of accessing the website using an Amazon S3 website endpoint, you can use your own domain registered with Amazon Route 53 to serve your content
  • When you configure a bucket as a static website, you must:
    • enable static website hosting
    • configure an index document
    • and set permissions
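
A boto3 sketch of the first and third steps (the index/error documents and the public-read bucket policy); the bucket name is a placeholder, and the bucket's public access block settings must also allow a public policy.

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "example-notes-bucket"  # hypothetical; a site bucket often matches the domain name

# 1. Enable static website hosting with an index document (and an error document).
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# 2. Make the content publicly readable so the website endpoint can serve it.
public_read = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(public_read))
```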

Best practices

  • You can scale read and write operations by reading from and writing to multiple prefixes.
  • You can achieve the best performance by issuing multiple concurrent requests to Amazon S3. Spread these requests over separate connections to maximize the accessible bandwidth from Amazon S3. 
  • Using the Range HTTP header in a GET Object request, you can fetch a byte-range from an object, transferring only the specified portion. 
  • Aggressive timeouts and retries help drive consistent latency.
  • Use Amazon S3 Transfer Acceleration to minimize latency caused by distance. Transfer Acceleration uses the globally distributed edge locations in CloudFront to accelerate data transport over geographical distances.
  • To optimize performance, we recommend that you access the bucket from Amazon EC2 instances in the same AWS Region when possible. This helps reduce network latency and data transfer costs.
  • If a workload is sending repeated GET requests for a common set of objects, you can use a cache such as Amazon CloudFront, Amazon ElastiCache, or AWS Elemental MediaStore to optimize performance.
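
For example, a boto3 sketch of a ranged GET that fetches only the first mebibyte of a large object; the bucket and key are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Fetch only the first 1 MiB of a large object with an HTTP Range header.
# Issuing several such requests concurrently over separate connections is a
# common way to parallelize downloads and increase aggregate throughput.
resp = s3.get_object(
    Bucket="example-notes-bucket",
    Key="backups/archive.bin",
    Range="bytes=0-1048575",
)
first_chunk = resp["Body"].read()
print(len(first_chunk), resp["ContentRange"])
```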

Amazon S3 on Outposts

  • With Amazon S3 on Outposts, you can create S3 buckets on your AWS Outposts and easily store and retrieve objects on premises for applications that require local data access, local data processing, and data residency. 
  •  S3 on Outposts provides a new storage class, OUTPOSTS, which uses the S3 APIs, and is designed to store data durably and redundantly across multiple devices and servers on your AWS Outposts.
  • You communicate with your Outposts bucket using an access point and endpoint connection over a virtual private cloud (VPC)
  • To get started with Amazon S3 on Outposts, you need an Outpost with Amazon S3 capacity deployed at your facility. 
  • In S3 on Outposts, bucket names are unique to an Outpost and require the Outpost ID along with the bucket name to identify them.
    • arn:aws:s3-outposts:<region>:<account>:outpost/<outpost-id>/bucket/<bucket-name>

Amazon S3 Glacier

  • Amazon S3 Glacier is a secure, durable, and extremely low-cost Amazon S3 storage class for data archiving and long-term backup.
  • S3 Glacier is one of the many different storage classes for Amazon S3.

Data Model

  • The Amazon S3 Glacier (S3 Glacier) data model core concepts include vaults and archives.
  • In S3 Glacier, a vault is a container for storing archives.
    • You can store an unlimited number of archives in a vault.
    • Note that vault operations are Region specific
    • An AWS account can create up to 1,000 vaults per AWS Region
  • An archive can be any data such as a photo, video, or document and is a base unit of storage in S3 Glacier.
    •  S3 Glacier assigns the archive an ID, which is unique in the AWS Region in which it is stored.
    • Upload archives in a single operation – In a single operation, you can upload archives from 1 byte to up to 4 GB in size. However, we encourage S3 Glacier customers to use multipart upload to upload archives greater than 100 MB
    • Upload archives in parts – Using the multipart upload API, you can upload large archives, up to about 40,000 GB (10,000 * 4 GB).
    • AWS Snowball accelerates moving large amounts of data into and out of AWS using Amazon-owned devices, bypassing the internet.
  • S3 Glacier jobs can perform a select query on an archive, retrieve an archive, or get an inventory of a vault. 
    • Retrieving an archive and vault inventory (list of archives) are asynchronous operations in S3 Glacier in which you first initiate a job, and then download the job output after S3 Glacier completes the job.
  • Because jobs take time to complete, S3 Glacier supports a notification mechanism to notify you when a job is complete.
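
A boto3 sketch of the vault/archive/job flow described above; the vault name, archive description, and file name are placeholders.

```python
import boto3

glacier = boto3.client("glacier")

# Create a vault and upload one archive; Glacier returns the archive ID,
# which is the handle you keep for retrieving it later.
glacier.create_vault(vaultName="example-vault")
with open("backup.tar.gz", "rb") as f:
    archive = glacier.upload_archive(
        vaultName="example-vault",
        archiveDescription="2023 project backup",
        body=f,
    )

# Retrieval is asynchronous: initiate a job, wait for it to complete
# (hours for Standard retrievals), then download the job output.
job = glacier.initiate_job(
    vaultName="example-vault",
    jobParameters={"Type": "archive-retrieval", "ArchiveId": archive["archiveId"]},
)
status = glacier.describe_job(vaultName="example-vault", jobId=job["jobId"])
print(status["StatusCode"])   # InProgress -> Succeeded, then call get_job_output
```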

 Vault Lock

  • S3 Glacier Vault Lock allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy. You can specify controls such as “write once read many” (WORM) in a vault lock policy and lock the policy from future edits. Once locked, the policy can no longer be changed.

Querying Archives

  • With S3 Glacier Select, you can perform filtering operations using simple Structured Query Language (SQL) statements directly on your data in S3 Glacier.
  • When you provide an SQL query for a S3 Glacier archive object, S3 Glacier Select runs the query in place and writes the output results to Amazon S3.
  • With S3 Glacier Select, you can run queries and custom analytics on your data that is stored in S3 Glacier, without having to restore your data to a hotter tier like Amazon S3.

Retrieval Policies

  • No Retrieval Limit policy: no retrieval quota is set and all valid data retrieval requests are accepted.
  • Free Tier Only policy: you can keep your retrievals within your daily free tier allowance and not incur any data retrieval cost. 
  • Max Retrieval Rate policy: you can set a bytes-per-hour retrieval rate quota. This policy ensures that the peak retrieval rate from all retrieval jobs across your account in an AWS Region does not exceed the bytes-per-hour quota you set.

Reference

https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html

https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
