Azure and Python: Listing Container Blobs

Connect to Azure using a simple Python script.

Recently, I came across a project requirement where I had to list all the blobs in a Storage Account container and store the blob names in a CSV file.

I would like to share the Python script I wrote for this task, keeping this tutorial as simple as possible.

This tutorial is divided into four parts:

  1. Prerequisites
  2. Connection to Azure Storage Container
  3. Listing container blobs
  4. Writing the blob names to a CSV file

Prerequisites

  • Python (and pip)
  • A code editor (I use VS Code)
  • A Microsoft Azure account (with a storage account created)

One package is required. It can be installed by running the following command in PowerShell, Command Prompt, or a terminal (if on a Linux system):

  1. Azure Blob Storage module

pip install azure-storage-blob

Connection to Azure Storage Container

There are many ways to connect to a container. I will cover connecting with a Connection String, a SAS Token, and a SAS URL. The Managed Identity and Key Vault methods also require some configuration on the Azure side, which is beyond the scope of this tutorial (I will discuss them in another tutorial).

Get Connection String/SAS Token via Azure Portal

  • Connection string

Go to your storage account in the portal. In the left-hand panel, scroll down and click Access keys. On the right-hand side you will find a pair of account keys and their connection strings.

Azure Storage Account overview page.

  • SAS Token/URL

Go to your storage account in the portal. In the left-hand panel, scroll down and click Shared access signature. Generate the tokens by selecting the check boxes that match your requirements. See the screenshots below for reference.

Modify the checkbox list according to your requirements.
After clicking the 'Generate' button, the tokens appear.

Note: The connection string generated here can also be used. The main difference between this string and the one from the section above is that the string generated here (like the token and URL) has an expiry date.
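Because a SAS token carries its own expiry date, it can be useful to check that date before using the token. The following is a minimal sketch, assuming the expiry is stored in the token's `se` field in the `YYYY-MM-DDTHH:MM:SSZ` form. The helper names `sas_expiry` and `sas_is_expired` and the example token are made up for illustration and are not part of the original script.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def sas_expiry(sas_token):
    """Return the expiry time encoded in a SAS token, or None if absent."""
    # A SAS token is a URL query string; it may be pasted with a leading '?'.
    fields = parse_qs(sas_token.strip().lstrip("?"))
    expiry_values = fields.get("se")
    if not expiry_values:
        return None
    # Assumption: the expiry uses the full 'YYYY-MM-DDTHH:MM:SSZ' (UTC) form.
    expiry = datetime.strptime(expiry_values[0], "%Y-%m-%dT%H:%M:%SZ")
    return expiry.replace(tzinfo=timezone.utc)


def sas_is_expired(sas_token, now=None):
    """True if the token's expiry time has passed (False if it has none)."""
    expiry = sas_expiry(sas_token)
    if expiry is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now >= expiry


# Made-up token for illustration only; the signature is not valid.
example_token = "?sv=2020-08-04&ss=b&srt=sco&sp=rl&se=2021-01-01T00:00:00Z&sig=FAKE"
```

Calling `sas_is_expired(token)` before connecting gives a clearer error message than waiting for the first request to be rejected.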

Code

  • Connection String

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient


def azure_connect_conn_string(source_container_connection_string, source_container_name):
    try:
        # Client for the whole storage account, built from the connection string.
        blob_source_service_client = BlobServiceClient.from_connection_string(source_container_connection_string)
        # Client scoped to one container. No network request is made yet, so
        # bad credentials only surface on the first real operation.
        source_container_client = blob_source_service_client.get_container_client(source_container_name)
        print("Connection String -- Connected.")
        return source_container_client
    except Exception as ex:
        print("Error: " + str(ex))


def main():
    try:
        azure_connection_string = input('Please enter Container connection string: ')
        container_name = input('Please enter Container name: ')

        ## Connection String
        connection_instance = azure_connect_conn_string(azure_connection_string, container_name)

        print('Done')
    except Exception as ex:
        print('main | Error: ', ex)


if __name__ == "__main__":
    main()

Description

In main(), the code asks for the connection string and the container name. Both are needed because we want to establish a connection to one particular container.

blob_source_service_client = BlobServiceClient.from_connection_string(source_container_connection_string)

In the above snippet, blob_source_service_client stores the connection instance for the storage account.

source_container_client = blob_source_service_client.get_container_client(source_container_name)

Here, using the storage account's connection instance, we establish a connection to a specific container, store that instance in source_container_client, and return it.
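A connection string is a list of `Key=Value` pairs separated by semicolons, so a pasted value can be sanity-checked before it is handed to the client. This is a small sketch under that assumption. `parse_connection_string` is a hypothetical helper, and the account name and key in the example are placeholders.

```python
def parse_connection_string(connection_string):
    """Split 'Key=Value;Key=Value' into a dict, keeping '=' inside values."""
    parts = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            continue  # ignore empty segments, e.g. after a trailing ';'
        # partition() splits on the first '=' only, so base64 padding survives.
        key, _, value = segment.partition("=")
        parts[key.strip()] = value
    return parts


# Placeholder values for illustration only.
example = (
    "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;"
    "AccountKey=FAKEKEY==;EndpointSuffix=core.windows.net"
)
parts = parse_connection_string(example)
```

Checking that `parts` contains `AccountName` catches the common mistake of pasting only the account key instead of the full connection string.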

  • SAS Token

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient


def azure_connect_sas_token(token, account_url, source_container_name):
    try:
        # The SAS token is passed as the credential for the account URL.
        blob_source_service_client = BlobServiceClient(account_url=account_url, credential=token)
        source_container_client = blob_source_service_client.get_container_client(source_container_name)
        print("SAS Token -- Connected.")
        return source_container_client
    except Exception as ex:
        print("Error: " + str(ex))


def main():
    try:
        azure_sas_token = input('Please enter SAS Token: ')
        azure_acc_url = input('Please enter Account URL: ')
        container_name = input('Please enter Container name: ')

        ## SAS Token
        connection_instance = azure_connect_sas_token(azure_sas_token, azure_acc_url, container_name)

        print('Done')
    except Exception as ex:
        print('main | Error: ', ex)


if __name__ == "__main__":
    main()

Description

Here the requirements are a bit different: along with the SAS Token, the storage account URL is required as well. This changes the function's parameter list and the function call in main(). The significant difference is

blob_source_service_client = BlobServiceClient(account_url = account_url, credential = token)
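The account URL asked for here is the Blob service endpoint of the storage account. As a rough sketch, assuming the public Azure cloud endpoint suffix `core.windows.net`, it can be built from the account name. Both helpers below are hypothetical and not part of the original script. Trimming whitespace and a leading `?` from the pasted token is only a precaution in this sketch.

```python
def build_account_url(account_name, endpoint_suffix="core.windows.net"):
    """Build the Blob service URL for a storage account name."""
    # Assumption: public-cloud endpoint; other clouds use a different suffix.
    return f"https://{account_name}.blob.{endpoint_suffix}"


def normalize_sas_token(token):
    """Trim whitespace and a leading '?' from a pasted SAS token."""
    return token.strip().lstrip("?")
```

The results can then be passed as the `account_url` and `token` arguments of `azure_connect_sas_token` above.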

  • SAS URL

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient


def azure_connect_sas_url(source_container_sas_url, source_container_name):
    try:
        # The SAS URL already contains both the account URL and the token.
        blob_source_service_client = BlobServiceClient(source_container_sas_url)
        source_container_client = blob_source_service_client.get_container_client(source_container_name)
        print("SAS URL -- Connected.")
        return source_container_client
    except Exception as ex:
        print("Error: " + str(ex))


def main():
    try:
        azure_sas_url = input('Please enter SAS URL: ')
        container_name = input('Please enter Container name: ')

        ## SAS URL
        connection_instance = azure_connect_sas_url(azure_sas_url, container_name)

        print('Done')
    except Exception as ex:
        print('main | Error: ', ex)


if __name__ == "__main__":
    main()

Description

blob_source_service_client = BlobServiceClient(source_container_sas_url)

The only major difference here is this call: we pass the SAS URL directly to BlobServiceClient. Everything else is the same as in the Connection String section.
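A SAS URL is just the account URL with the SAS token attached as its query string, which is why it can stand in for both values. The sketch below splits one apart with the standard library. `split_sas_url` is a hypothetical helper, and the URLs in the usage note are made up.

```python
from urllib.parse import urlsplit


def split_sas_url(sas_url):
    """Split a SAS URL into (account_url, container_name, sas_token)."""
    parts = urlsplit(sas_url.strip())
    account_url = f"{parts.scheme}://{parts.netloc}"
    # The first path segment, if any, is the container name.
    container_name = parts.path.strip("/").split("/")[0] or None
    return account_url, container_name, parts.query
```

For an account-level URL such as `https://mystorageaccount.blob.core.windows.net/?sv=...&sig=...`, the container name comes back as `None`, so it still has to be supplied separately, as the script above does.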

Read more at docs.microsoft.com

Listing container blobs

In the above section we saw how to establish and return a connection instance. Let's jump right into the next part.

def container_content_list(connection_instance, blob_path):
    try:
        blob_name_list = []
        # list_blobs() returns a lazy iterator over blobs whose names start with blob_path.
        source_blob_list = connection_instance.list_blobs(name_starts_with=blob_path)
        print(source_blob_list)
        for blob in source_blob_list:
            # Keep only the part after the last '/'; [-1] also works for names without a '/'.
            blob_name = blob.name.rsplit('/', 1)[-1]
            blob_name_list.append(blob_name)
            print(blob_name)

        create_csv(blob_name_list)
    except Exception as ex:
        print("Error: " + str(ex))

Description

This function accepts two arguments: first, the connection instance created earlier, and second, the path under which blobs are to be listed. The function prints the names of the blobs found in the container under that path.

One important thing to note is that source_blob_list is an iterable object. The exact type is <iterator object azure.core.paging.ItemPaged>, and yes, list_blobs() supports pagination as well.
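To make the pagination remark concrete, here is a minimal sketch that walks the results one page at a time instead of one blob at a time. It assumes `container_client` is the object returned by the connection functions above, and it relies on the `results_per_page` keyword of `list_blobs()` and the `by_page()` method of `ItemPaged`. The function name and the page size are illustrative choices.

```python
def list_blob_names_by_page(container_client, prefix="", page_size=100):
    """Yield one list of blob names per page of results."""
    blobs = container_client.list_blobs(
        name_starts_with=prefix or None,  # None means no prefix filter
        results_per_page=page_size,       # upper bound on items per page
    )
    # by_page() turns the flat item iterator into an iterator of pages.
    for page in blobs.by_page():
        yield [blob.name for blob in page]
```

Processing page by page keeps memory use bounded for containers with a very large number of blobs, for example by writing each page to the CSV as it arrives.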

Inside the for loop, I append each blob name to a list. This list is then passed to the create_csv(blob_list) function, discussed in the next section.

If all the blobs in the container are to be listed, blob_path can be set to an empty string (i.e. ''), or the parameter can be dropped altogether and the argument removed from the list_blobs() call.
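The prefix behaviour can be tried out locally without touching Azure. The sketch below mimics it on a plain list of blob names: a name is kept if it starts with the prefix, and an empty prefix keeps everything. Both helpers and the sample names are made up for illustration. Taking `[-1]` after `rsplit` keeps the file-name step safe for blobs at the container root, whose names contain no '/'.

```python
def blob_file_name(blob_name):
    """Return the part of a blob name after the last '/'."""
    return blob_name.rsplit("/", 1)[-1]


def names_under_prefix(blob_names, prefix=""):
    """Local stand-in for prefix listing: filter, then strip the folder part."""
    return [blob_file_name(n) for n in blob_names if n.startswith(prefix)]


# Made-up blob names for illustration.
sample = ["a.csv", "reports/2020/b.csv", "reports/c.csv"]
under_reports = names_under_prefix(sample, "reports/")
everything = names_under_prefix(sample)
```

Note that stripping the folder part can produce duplicate names when two folders contain files with the same name.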

Writing the blob names to a CSV file

Coming to the last part, this should be relatively simple and self-explanatory.

import csv


def create_csv(blob_list):
    try:
        header = ["blob_name"]                  # csv file header row (first row)
        file_name = "blob_list_output.csv"      # csv filename

        # newline='' lets the csv module control line endings itself.
        with open(file_name, 'w', newline='', encoding="utf-8") as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow(header)             # write header
            for blob in blob_list:
                writer.writerow([blob])         # one blob name per row

        print('Created CSV.')
    except Exception as ex:
        print('create_csv | Error: ', ex)

A new CSV file is generated in the directory the script is run from (the current working directory), with the following contents:

I had 3 files in my container location
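To check the result, the file can be read back with the same csv module. This is a small sketch, assuming the `blob_name` header written by create_csv. `read_blob_names` is a hypothetical helper, not part of the original script.

```python
import csv


def read_blob_names(csv_path):
    """Read the blob names back from the CSV written by create_csv."""
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        # DictReader uses the first row ('blob_name') as the column name.
        reader = csv.DictReader(csv_file)
        return [row["blob_name"] for row in reader]
```

Calling `read_blob_names("blob_list_output.csv")` after running the script should return the same names that were printed during listing.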

break;

I have created a script that combines all of the above code. Find the script here.

I hope this tutorial was helpful. Reach out to me or comment below with any suggestions or queries.

Thanks for reading. 😁

Translated from: https://medium.com/analytics-vidhya/azure-python-listing-container-blobs-e78cedb81935
