Working with Amazon S3 Buckets with Boto3


Amazon Simple Storage Service, or S3, offers space to store, protect, and share data with finely-tuned access control. When working with Python, one can easily interact with S3 with the Boto3 package. In this post, I will put together a cheat sheet of Python commands that I use a lot when working with S3. I hope you will find it useful.



Let’s kick off with a few words about the S3 data structures. On your own computer, you store files in folders. On S3, the folders are called buckets. Inside buckets, you can store objects, such as .csv files. You refer to buckets by their name, and to objects by their key. To make the code chunks more tractable, we will use emojis. Here’s the key to the symbols:


🗑 - a bucket’s name, e.g. "mybucket"
🔑 - an object’s key, e.g. "myfile_s3_name.csv"
📄 - a file’s name on your computer, e.g. "myfile_local_name.csv"

Both 🗑 and 🔑 can either denote a name already existing on S3 or a name you want to give a newly created bucket or object. 📄 denotes a file you have or want to have somewhere locally on your machine.


import boto3


🗑 = "mybucket" 
🔑 = "myfile_s3_name.csv"
📄 = "myfile_local_name.csv"

Setting up a client

To access any AWS service with Boto3, we have to connect to it with a client. Here, we create an S3 client. We specify the region in which our data lives. We also have to pass the access key ID and the secret access key, both of which we can generate in the AWS console.


s3 = boto3.client("s3", 
                  region_name='us-east-1', 
                  aws_access_key_id=AWS_KEY_ID, 
                  aws_secret_access_key=AWS_SECRET)

Buckets: listing, creating & deleting

To list the buckets existing on S3, create a new one or delete one, we simply use the list_buckets(), create_bucket() and delete_bucket() methods, respectively.


# List buckets
bucket_response = s3.list_buckets()
buckets = bucket_response["Buckets"]


# Create and delete buckets
bucket = s3.create_bucket(Bucket=🗑)
response = s3.delete_bucket(Bucket=🗑)
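For reference, the "Buckets" entry of a list_buckets() response is a list of dicts, each carrying at least a "Name" (and a "CreationDate"). A minimal sketch of pulling out the bucket names, with hard-coded sample data standing in for a live API call:

```python
# Shape of the "Buckets" entry returned by list_buckets()
# (sample data standing in for a live API call)
bucket_response = {
    "Buckets": [
        {"Name": "mybucket"},
        {"Name": "mylogs"},
    ]
}

# Extract just the bucket names
bucket_names = [bucket["Name"] for bucket in bucket_response["Buckets"]]
print(bucket_names)  # ['mybucket', 'mylogs']
```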

Objects: listing, downloading, uploading & deleting

Within a bucket, there reside objects. We can list them with list_objects(). The MaxKeys argument sets the maximum number of objects listed; it’s like calling head() on the results before printing them. With the Prefix argument, we can list only the objects whose keys (names) start with a specific prefix.

We can use upload_file() to upload a file called 📄 to S3 under the name 🔑. Similarly, download_file() will save a file called 🔑 on S3 locally under the name 📄.

To get some metadata about an object, such as its creation or modification time, permission rights or size, we can call head_object().

Deleting an object works the same way as deleting a bucket: we just need to pass the bucket name and object key to delete_object().


# List objects present in a bucket
response = s3.list_objects(Bucket=🗑,
                           MaxKeys=10, 
                           Prefix="only_files_starting_with_this_string")


# Uploading and downloading files
s3.upload_file(Filename=📄, Bucket=🗑, Key=🔑)
s3.download_file(Filename=📄, Bucket=🗑, Key=🔑)


# Get object's metadata (last modification time, size in bytes etc.)
response = s3.head_object(Bucket=🗑, Key=🔑)


# Delete object
s3.delete_object(Bucket=🗑, Key=🔑)
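The same pattern applies to list_objects(): its "Contents" entry is a list of dicts with, among other fields, a "Key" and a "Size" (in bytes) per object. A small sketch with sample data in place of a live response:

```python
# Shape of the "Contents" entry returned by list_objects()
# (sample data standing in for a live API call)
response = {
    "Contents": [
        {"Key": "sales_2020.csv", "Size": 1024},
        {"Key": "sales_2021.csv", "Size": 2048},
    ]
}

# Map each object's key to its size in bytes
sizes = {obj["Key"]: obj["Size"] for obj in response["Contents"]}
print(sizes)  # {'sales_2020.csv': 1024, 'sales_2021.csv': 2048}
```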

Loading multiple files into a single data frame

Oftentimes, data are spread across several files. For instance, you can have sales data for different stores or regions in different CSV files with matching column names. For analytics or modeling, we might want to have all these data in a single pandas data frame. The following code chunk will do just that: download all data files in 🗑 whose names start with “some_prefix” and put them into a single data frame.


import pandas as pd

df_list = []
response = s3.list_objects(Bucket=🗑, Prefix="some_prefix")
request_files = response["Contents"]
for file in request_files:
    obj = s3.get_object(Bucket=🗑, Key=file["Key"])
    obj_df = pd.read_csv(obj["Body"])
    df_list.append(obj_df)
df = pd.concat(df_list)
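One detail worth noting: pd.concat keeps each file’s original row index, so the combined frame can contain duplicate index values; passing ignore_index=True renumbers the rows. A small sketch with in-memory frames standing in for the downloaded CSVs:

```python
import pandas as pd

# Two frames standing in for two downloaded CSV files
df_a = pd.DataFrame({"store": ["A"], "sales": [100]})
df_b = pd.DataFrame({"store": ["B"], "sales": [200]})

# ignore_index=True gives the combined frame a fresh 0..n-1 index
df = pd.concat([df_a, df_b], ignore_index=True)
print(list(df.index))  # [0, 1]
```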

Making objects public or private with access control lists (ACLs)

One way to manage access rights on S3 is with access control lists, or ACLs. By default, all files are private, which is the best (and safest!) practice. You can specify a file to be "public-read", in which case everyone can access it, or "private", making yourself the only authorized person, among other options; see the S3 documentation for the exhaustive list. You can set a file’s ACL both when it’s already on S3, using put_object_acl(), and upon upload, by passing appropriate ExtraArgs to upload_file().


# Make existing object publicly available
s3.put_object_acl(Bucket=🗑, 
                  Key=🔑, 
                  ACL="public-read")


# Set an object's ACL (here: private) on upload
s3.upload_file(Filename=📄, 
               Bucket=🗑, 
               Key=🔑, 
               ExtraArgs={"ACL": "private"})
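Once an object is "public-read", anyone can fetch it over plain HTTPS. One common URL form (the virtual-hosted-style address, built from the bucket name, region and key) can be sketched as a simple helper; the bucket, region and key below are just the placeholder values from this post:

```python
def public_url(bucket, region, key):
    # Virtual-hosted-style URL for a publicly readable S3 object
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

url = public_url("mybucket", "us-east-1", "myfile_s3_name.csv")
print(url)  # https://mybucket.s3.us-east-1.amazonaws.com/myfile_s3_name.csv
```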

Accessing private files with pre-signed URLs

You can also grant anyone short-term access to a private file by generating a temporary pre-signed URL with the generate_presigned_url() function. This yields a string that can be inserted right into pandas’ read_csv(), for instance, to download the data. You can specify how long this temporary access link stays valid via the ExpiresIn argument (in seconds). Here, we create a link valid for 1 hour (3600 seconds).


share_url = s3.generate_presigned_url(ClientMethod="get_object", 
                                      ExpiresIn=3600,
                                      Params={"Bucket": 🗑, "Key": 🔑})
pd.read_csv(share_url)

Thanks for reading! I hope you have learned something useful that will boost your projects 🚀


If you liked this post, try one of my other articles.


Translated from: https://towardsdatascience.com/working-with-amazon-s3-buckets-with-boto3-785252ea22e0
