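Integrating with AWS S3 to Retrieve Files
1. Introduction to Amazon S3
Amazon S3 is an object storage service built to retrieve any amount of data from anywhere.
References: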
- https://aws.amazon.com/cn/s3/?nc=sn&loc=0#
- https://docs.aws.amazon.com/s3/?id=docs_gateway
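2. Business Scenario
An external system uploads interface files to a specified path in an Amazon S3 bucket. SMR reads the files under that path and parses them.
3. Technical Design
Trigger mechanism
Option 1: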
Amazon S3 Event Notifications
Amazon S3 can send event notification messages to the following destinations.
- Amazon Simple Queue Service (Amazon SQS) queues
S3 event notifications deliver messages to an Amazon SQS queue, and the application listens on that queue with @SqsListener.
    import java.io.IOException;

    import org.springframework.cloud.aws.messaging.listener.SqsMessageDeletionPolicy;
    import org.springframework.cloud.aws.messaging.listener.annotation.SqsListener;

    @SqsListener(value = "${cux.sqs.url}", deletionPolicy = SqsMessageDeletionPolicy.ON_SUCCESS)
    public void handle(String message) throws IOException, ClassNotFoundException {
        // do something...
    }
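deletionPolicy: the message deletion policy
SqsMessageDeletionPolicy.ON_SUCCESS: delete the message only after the listener method completes without throwing an exception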
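The listener above receives the notification as a raw String. As a rough illustration of what the handler might do next, the sketch below pulls the object keys out of an S3 event notification body. It assumes the S3 notification JSON arrives unwrapped, with Records[].s3.object.key fields; the class name S3EventKeys and the sample payload are hypothetical, and the regex stands in for a proper JSON library only to keep the example dependency-free. Object keys in S3 notifications are URL-encoded, so they are decoded before use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts object keys from an S3 event notification body. */
class S3EventKeys {

    // Matches the "key" field inside each record's "object" element.
    // Assumption: the "object" element contains no nested braces before "key".
    private static final Pattern OBJECT_KEY =
            Pattern.compile("\"object\"\\s*:\\s*\\{[^}]*?\"key\"\\s*:\\s*\"([^\"]*)\"");

    static List<String> objectKeys(String message) {
        List<String> keys = new ArrayList<>();
        Matcher m = OBJECT_KEY.matcher(message);
        while (m.find()) {
            try {
                // Keys are URL-encoded in the notification (a space arrives as '+').
                keys.add(URLDecoder.decode(m.group(1), "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                // UTF-8 is always available on a JVM, so this should not happen.
                throw new IllegalStateException(e);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up sample payload, trimmed to the fields used here.
        String sample = "{\"Records\":[{\"eventName\":\"ObjectCreated:Put\","
                + "\"s3\":{\"bucket\":{\"name\":\"demo-bucket\"},"
                + "\"object\":{\"key\":\"inbound/order+file.csv\",\"size\":42}}}]}";
        System.out.println(objectKeys(sample)); // [inbound/order file.csv]
    }
}
```

In a real handler a JSON library would be the safer choice, since the regex does not handle JSON escape sequences inside the key.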
Duplicate consumption across multiple pods:
References:
- https://docs.aws.amazon.com/cli/latest/reference/sqs/index.html
View the queue attributes:
    aws sqs get-queue-attributes --queue-url [value] --attribute-names All
    {
        "Attributes": {
            "QueueArn": "queue",
            "ApproximateNumberOfMessages": "0",
            "ApproximateNumberOfMessagesNotVisible": "0",
            "ApproximateNumberOfMessagesDelayed": "0",
            "CreatedTimestamp": "1606356340",
            "LastModifiedTimestamp": "1617323005",
            "VisibilityTimeout": "30",
            "MaximumMessageSize": "2048",
            "MessageRetentionPeriod": "86400",
            "DelaySeconds": "90",
            "Policy": "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"AWS\":[\"1XSXMLONEWUOJ\",\"1MYJ202SN8KCP\"]},\"Action\":\"sqs:*\",\"Resource\":\"queue\"},{\"Effect\":\"Allow\",\"Principal\":\"*\",\"Action\":\"sqs:SendMessage\",\"Resource\":\"queue\",\"Condition\":{\"ArnEquals\":{\"aws:SourceArn\":\"interface\"}}}]}",
            "ReceiveMessageWaitTimeSeconds": "10",
            "SqsManagedSseEnabled": "false"
        }
    }
VisibilityTimeout: the queue's visibility timeout, in seconds
References:
- https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html
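When a consumer receives and processes a message from a queue, the message remains in the queue. Amazon SQS does not delete it automatically. Because Amazon SQS is a distributed system, there is no guarantee that the consumer actually receives the message (for example, because of a connectivity issue or a problem in the consumer application). The consumer must therefore delete the message from the queue after receiving and processing it.
Immediately after a message is received, it is still in the queue. To prevent other consumers from processing it again, Amazon SQS sets a visibility timeout, a period during which other consumers cannot receive or process the message. The default visibility timeout is 30 seconds, the minimum is 0 seconds, and the maximum is 12 hours. With several pods listening on the same queue, a message that one pod has received is therefore hidden from the other pods until the visibility timeout expires (30 seconds for the queue shown above); if it has not been deleted by then, it becomes visible again and may be processed a second time.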
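The visibility timeout behaviour described above can be illustrated with a toy in-memory model. This is only a sketch of the concept, not the AWS SDK or the real SQS implementation: the class VisibilityTimeoutDemo, its methods, and the explicit nowMillis parameter (used in place of a real clock so the run is deterministic) are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of an SQS-style visibility timeout (not the AWS SDK). */
class VisibilityTimeoutDemo {

    static final class Message {
        final String body;
        long invisibleUntilMillis; // hidden from receivers before this instant
        Message(String body) { this.body = body; }
    }

    private final List<Message> messages = new ArrayList<>();
    private final long visibilityTimeoutMillis;

    VisibilityTimeoutDemo(long visibilityTimeoutMillis) {
        this.visibilityTimeoutMillis = visibilityTimeoutMillis;
    }

    void send(String body) { messages.add(new Message(body)); }

    /** Returns the first visible message and hides it for the timeout, or null if none is visible. */
    Message receive(long nowMillis) {
        for (Message m : messages) {
            if (nowMillis >= m.invisibleUntilMillis) {
                m.invisibleUntilMillis = nowMillis + visibilityTimeoutMillis;
                return m;
            }
        }
        return null;
    }

    /** A consumer deletes the message once processing has succeeded. */
    void delete(Message m) { messages.remove(m); }

    public static void main(String[] args) {
        VisibilityTimeoutDemo queue = new VisibilityTimeoutDemo(30_000);
        queue.send("s3-event");

        Message podA = queue.receive(0);        // pod A receives the message at t = 0
        Message podB = queue.receive(10_000);   // pod B polls at t = 10 s: message is hidden
        System.out.println(podB == null);       // true

        Message retry = queue.receive(30_000);  // never deleted, so visible again at t = 30 s
        System.out.println(retry == podA);      // true

        queue.delete(retry);                    // processing succeeded: remove for good
        System.out.println(queue.receive(60_000) == null); // true
    }
}
```

The practical takeaway is that processing plus deletion should finish within the queue's visibility timeout, or the timeout should be raised, if duplicate processing across pods is to be avoided.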
Option 2:
A scheduled job periodically fetches the objects under the specified path in the Amazon S3 bucket.
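The note does not say which scheduler the project uses, so the following is only a minimal sketch of the polling idea using the JDK's ScheduledExecutorService. PollingJob and start are hypothetical names, and the Runnable stands in for whatever method lists and parses the S3 objects.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical polling wrapper; the task would be the S3 fetch-and-parse step. */
class PollingJob {

    /** Runs the task at once, then again `delay` after each run finishes. */
    static ScheduledExecutorService start(Runnable task, long delay, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and carry on.
                System.err.println("poll failed: " + e.getMessage());
            }
        }, 0, delay, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch threeRuns = new CountDownLatch(3);
        ScheduledExecutorService scheduler =
                start(threeRuns::countDown, 50, TimeUnit.MILLISECONDS);
        boolean done = threeRuns.await(5, TimeUnit.SECONDS); // wait for three polls
        scheduler.shutdownNow();
        System.out.println(done);
    }
}
```

A fixed delay (rather than a fixed rate) is used here so that a slow poll never overlaps with the next one. Note that with several pods each running its own scheduler, the same object could be picked up more than once unless the job is coordinated externally.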
Fetching objects
    public void getContent() throws IOException {
        AmazonS3 amazonS3 = AmazonS3Client.builder()
                .withEndpointConfiguration(
                        new AwsClientBuilder.EndpointConfiguration(serviceEndpoint, signingRegion))
                .withClientConfiguration(new ClientConfiguration())
                .disableChunkedEncoding()
                .withPathStyleAccessEnabled(true)
                .build();

        ObjectListing objectListing = amazonS3.listObjects(bucketName, path);
        // exclude empty objects
        objectListing.getObjectSummaries()
                .removeIf(s3ObjectSummary -> s3ObjectSummary.getSize() == 0);
        // exclude the archive path
        objectListing.getObjectSummaries()
                .removeIf(s3ObjectSummary ->
                        s3ObjectSummary.getKey().startsWith(frInterface.path() + ARCHIVE_PATH));

        for (S3ObjectSummary s3ObjectSummary : objectListing.getObjectSummaries()) {
            ObjectMetadata metadata =
                    amazonS3.getObjectMetadata(bucketName, s3ObjectSummary.getKey());
            String kmsId = null;
            String matDesc = metadata.getUserMetadata().get("x-amz-matdesc");
            if (StringUtils.isNotBlank(matDesc)) {
                Map<String, String> map = JSON.parseObject(matDesc, Map.class);
                String kmsCmkId = map.get("kms_cmk_id");
                kmsId = kmsCmkId.split("/")[1];
            }

            S3Object s3Object;
            // if a KMS key ID is present, read the object through the encryption client
            if (StringUtils.isNotBlank(kmsId)) {
                AmazonS3EncryptionClient encryptionClient =
                        (AmazonS3EncryptionClient) AmazonS3EncryptionClientBuilder
                                .standard()
                                .withEncryptionMaterials(new KMSEncryptionMaterialsProvider(kmsId))
                                .withCryptoConfiguration(new CryptoConfiguration()
                                        .withAwsKmsRegion(Region.getRegion(Regions.AP_NORTHEAST_1)))
                                .withRegion(Regions.AP_NORTHEAST_1)
                                .build();
                s3Object = encryptionClient.getObject(bucketName, s3ObjectSummary.getKey());
            } else {
                s3Object = amazonS3.getObject(bucketName, s3ObjectSummary.getKey());
            }

            InputStream objectContent = s3Object.getObjectContent();
            // do something...
        }
    }
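In getContent, the expression kmsCmkId.split("/")[1] throws if kms_cmk_id is missing from the material description or contains no slash. One possible, more defensive way to derive the key ID is sketched below. It assumes the value is either a key ARN ending in key/<id> or a bare key ID (an alias such as alias/name is not handled); KmsKeyIds and keyIdFrom are hypothetical names, and the ARN in the example is made up.

```java
/** Hypothetical helper: derives a KMS key ID from the kms_cmk_id metadata value. */
class KmsKeyIds {

    /**
     * Returns the text after the last '/', the input itself when it has no '/',
     * or null when the input is null or blank.
     */
    static String keyIdFrom(String kmsCmkId) {
        if (kmsCmkId == null || kmsCmkId.trim().isEmpty()) {
            return null;
        }
        int slash = kmsCmkId.lastIndexOf('/');
        return slash < 0 ? kmsCmkId : kmsCmkId.substring(slash + 1);
    }

    public static void main(String[] args) {
        // Made-up ARN in the usual key ARN shape.
        String arn = "arn:aws:kms:ap-northeast-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab";
        System.out.println(keyIdFrom(arn));  // 1234abcd-12ab-34cd-56ef-1234567890ab
        System.out.println(keyIdFrom(null)); // null
    }
}
```

A null result would then route the object through the plain client, matching the existing StringUtils.isNotBlank(kmsId) check.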
After an interface file has been parsed, move it to the processed (archive) path.
    private void moveObject(List<S3ObjectSummary> objectSummaries) {
        // archive files
        for (S3ObjectSummary s3ObjectSummary : objectSummaries) {
            String key = s3ObjectSummary.getKey();
            int lastIndex = key.lastIndexOf("/");
            String archiveKey = key.substring(0, lastIndex) + "/" + ARCHIVE_PATH + "/"
                    + key.substring(lastIndex + 1);
            // copy the object to the archive path
            amazonS3.copyObject(bucketName, key, bucketName, archiveKey);
            // delete the object from the original path
            amazonS3.deleteObject(bucketName, key);
        }
    }
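The archive key built in moveObject can be checked in isolation. The sketch below reproduces the same string arithmetic as a pure function and also covers a key with no "/" (where key.substring(0, lastIndex) would throw, because lastIndexOf returns -1). ArchiveKeys is a hypothetical class, and "archive" is an assumed value for ARCHIVE_PATH, which the source does not show.

```java
/** Hypothetical helper mirroring the archive-key computation in moveObject. */
class ArchiveKeys {

    /** Inserts archiveDir between the key's directory prefix and its file name. */
    static String archiveKey(String key, String archiveDir) {
        int lastSlash = key.lastIndexOf('/');
        // Keep the trailing slash in the prefix; empty when the key has no directory part.
        String prefix = lastSlash < 0 ? "" : key.substring(0, lastSlash + 1);
        String fileName = key.substring(lastSlash + 1);
        return prefix + archiveDir + "/" + fileName;
    }

    public static void main(String[] args) {
        System.out.println(archiveKey("inbound/orders/data.csv", "archive"));
        // inbound/orders/archive/data.csv
        System.out.println(archiveKey("data.csv", "archive"));
        // archive/data.csv
    }
}
```

Because S3 has no rename operation, the move itself remains a copy followed by a delete, as in moveObject.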
4. Common Amazon S3 CLI Commands
Reference: https://docs.aws.amazon.com/cli/latest/reference/s3/
    aws s3 ls s3://bucket      # list
    aws s3 cp s3://bucket .    # copy
    aws s3 mv s3://bucket .    # move
    aws s3 rm s3://bucket      # delete