Moving files between Storage Accounts with Azure Functions and Event Grid

Gisela Torres
Cloud Computing
June 9, 2020

Another task that many developers face when working with Azure Storage is moving files between accounts. Many customers upload files to the cloud, process them, and then archive the originals in cold storage, where keeping them is cheaper. In this article I'll show you how to transfer blobs between accounts with the help of Azure Functions and Event Grid.


Today there is no "move" operation as such, programmatically speaking, so you have to carry out the two actions that a move implies: copy the file to the destination and then delete it at the source.
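
To make the idea concrete, here is a minimal sketch of that copy-then-delete pattern with the azure-storage-blob SDK. The connection string, container and blob names are placeholders, and it assumes both containers live in the same account, so no SAS token is needed yet:

import os

from azure.storage.blob import BlobServiceClient

# Placeholder values: one account, two containers
service = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONNECTION_STRING"])
source = service.get_blob_client(container="processed", blob="example.csv")
destination = service.get_blob_client(container="processed-archived", blob="example.csv")

# 1) Copy: the Azure Storage service performs the copy on its side
destination.start_copy_from_url(source.url)

# 2) Delete the original once the copy has finished
# (the copy is asynchronous, so in practice you may need to poll its status first)
if destination.get_blob_properties().copy.status == "success":
    source.delete_blob()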

There can be many reasons to copy a file, and it can be done in a number of ways. For this example, I'm going to use a Python function with the Event Grid trigger, so that simply dropping a file into a storage account starts the whole process.
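
If you use the classic Python programming model for Azure Functions, the Event Grid trigger is wired up in the function's function.json. A minimal sketch, assuming the function body lives in __init__.py and the binding is named event to match the code below, would be:

{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "eventGridTrigger",
      "direction": "in",
      "name": "event"
    }
  ]
}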


The example code is as follows:



import json
import logging
import os

import azure.functions as func
from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions
from azure.core.exceptions import ResourceExistsError
from datetime import datetime, timedelta

def main(event: func.EventGridEvent):
    result = json.dumps({
        'id': event.id,
        'data': event.get_json(),
        'topic': event.topic,
        'subject': event.subject,
        'event_type': event.event_type,
    })

    logging.info('Python EventGrid trigger processed an event: %s', result)

    blob_service_client = BlobServiceClient.from_connection_string(
        os.environ.get('ARCHIVE_STORAGE_CONNECTION_STRING'))

    # Get the URL and extract the name of the file and container
    # (this parsing assumes the blob sits directly at the container root)
    blob_url = event.get_json().get('url')
    logging.info('blob URL: %s', blob_url)
    blob_name = blob_url.split("/")[-1].split("?")[0]
    container_name = blob_url.split("/")[-2].split("?")[0]
    archived_container_name = container_name + '-' + os.environ.get('AZURE_STORAGE_ARCHIVE_CONTAINER')

    blob_service_client_origin = BlobServiceClient.from_connection_string(os.environ.get('ORIGIN_STORAGE_CONNECTION_STRING'))

    blob_to_copy = blob_service_client_origin.get_blob_client(container=container_name, blob=blob_name)

    sas_token = generate_blob_sas(
        blob_to_copy.account_name,
        blob_to_copy.container_name,
        blob_to_copy.blob_name,
        account_key=blob_service_client_origin.credential.account_key,
        permission=BlobSasPermissions(read=True),
        start=datetime.utcnow() + timedelta(seconds=1),
        expiry=datetime.utcnow() + timedelta(hours=1))

    logging.info('sas token: %s', sas_token)

    archived_container = blob_service_client.get_container_client(archived_container_name)

    # Create new Container
    try:
        archived_container.create_container()
    except ResourceExistsError:
        pass

    copied_blob = blob_service_client.get_blob_client(
        archived_container_name, blob_name)

    blob_to_copy_url = blob_url + '?' + sas_token

    logging.info('blob url: ' + blob_to_copy_url)

    # Start copy
    copied_blob.start_copy_from_url(blob_to_copy_url)


As you can see, once the event has been captured by the function, the first thing it does is log the information that comes as part of the event. I use the azure.storage.blob module to create two clients: one with the connection string of the source account and one with that of the destination. I need the origin client because my blobs live in private containers and I have to generate a SAS token to access them; the destination client is the one that actually performs the copy.
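
For local development, those connection strings and the archive-container suffix are read from environment variables, so a local.settings.json along these lines would work; the values shown are placeholders, and in Azure you would define them as application settings instead:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<function-app-storage-connection-string>",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "ORIGIN_STORAGE_CONNECTION_STRING": "<origin-account-connection-string>",
    "ARCHIVE_STORAGE_CONNECTION_STRING": "<destination-account-connection-string>",
    "AZURE_STORAGE_ARCHIVE_CONTAINER": "archived"
  }
}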


The information about the blob to move is taken from the url property that comes as part of the event, and then I create a container at the destination whose name ends in -archived (the suffix is also an environment variable, in case you want to use another name). Once this is done, I start the copy of the blob, which the Azure Storage service itself carries out asynchronously.
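
Since start_copy_from_url only kicks off that server-side copy, you may want to check its progress before relying on the destination blob. A minimal sketch of such a check, using a hypothetical wait_for_copy helper that is not part of the article's code, could look like this (you would call it with the copied_blob client right after starting the copy):

import time
import logging

def wait_for_copy(blob_client, polling_interval_seconds=2):
    # Poll the destination blob until the server-side copy is no longer pending
    properties = blob_client.get_blob_properties()
    while properties.copy.status == 'pending':
        time.sleep(polling_interval_seconds)
        properties = blob_client.get_blob_properties()
    logging.info('copy finished with status: %s', properties.copy.status)
    return properties.copy.status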

In the file requirements.txt I have added the following modules:



azure-functions
azure-storage
azure-storage-blob
azure-core


These will be used by both the copy function and the delete function, which I will show you later.



Event Grid subscription to the Origin account


Once you deploy your new function to your Azure Functions service, the next thing you need to do is create an Event Grid subscription associated with the storage account you want to monitor. The easiest way to create this subscription is through the portal: select the function you have just deployed and click on the Add Event Grid Subscription link.

Add Event Grid Subscription

In the wizard you must choose the resource type, in this case Storage Accounts, along with the subscription, the resource group and the source storage account. Also, select only the Blob Created event type.

Create Event Subscription

You can trigger the event for any blob created within that account, or you can narrow it down through the Filters section. For example, you could make only the blobs created in the processed container trigger this function by entering /blobServices/default/containers/processed in Subject Begins With.

Filters section
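
If you prefer to script the subscription instead of using the portal, a sketch with the Azure CLI would look like this; the subscription ID, resource group, and account and function names are placeholders you would replace with your own:

az eventgrid event-subscription create \
  --name copy-to-archive \
  --source-resource-id "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<origin-account>" \
  --endpoint-type azurefunction \
  --endpoint "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Web/sites/<function-app>/functions/<function-name>" \
  --included-event-types Microsoft.Storage.BlobCreated \
  --subject-begins-with /blobServices/default/containers/processed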

Once the subscription is created, you can check that uploading a file to the processed container in the source storage account triggers your new function. You can see all the executions in the Monitor section, although remember that it can take up to 5 minutes for an execution to appear there.

Monitor section

Occasionally the function may not execute correctly for different reasons (the file name is not valid for the destination blob, the code has a bug, etc.), which means the captured event is never marked as processed. By default, Event Grid retries delivery up to 30 times, and this can cause a flood of calls for a file or scenario you had not taken into account. For this reason, it is also possible to control the number of retries per subscription. You can configure it in the Additional Features section while creating the subscription, or modify an existing subscription through the storage account, in the Events section, by selecting the subscription you created previously. Then, in the Features section, you can change the number of retries, among other values:

Additional Features section
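
As a sketch, the same retry settings can also be changed with the Azure CLI; the subscription name and resource ID are placeholders, and the event TTL is expressed in minutes:

az eventgrid event-subscription update \
  --name copy-to-archive \
  --source-resource-id "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<origin-account>" \
  --max-delivery-attempts 5 \
  --event-ttl 60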

Deletion of the original file after copy

If, before continuing with the article, you have already tested your function, the uploaded file should have been copied to the storage account you chose as the destination, in a container named source_container_name-archived. Ideally, this account should use a cold access tier, since the idea is to keep the original files while rarely accessing them again.
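
As a sketch, a destination account whose default access tier is Cool could be created with the Azure CLI like this; the account name and resource group are placeholders:

az storage account create \
  --name <archive-account> \
  --resource-group <rg> \
  --kind StorageV2 \
  --sku Standard_LRS \
  --access-tier Cool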

The last thing left to do is delete the original file from the source account. To do this, I have used Event Grid again as the mechanism to detect when a new blob has been created:



import json
import logging
import os

import azure.functions as func
from azure.storage.blob import BlobServiceClient, RetentionPolicy

def main(event: func.EventGridEvent):
    result = json.dumps({
        'id': event.id,
        'data': event.get_json(),
        'topic': event.topic,
        'subject': event.subject,
        'event_type': event.event_type,
    })

    logging.info('Python EventGrid trigger processed an event: %s', result)

    blob_service_client = BlobServiceClient.from_connection_string(os.environ.get('ORIGIN_STORAGE_CONNECTION_STRING'))

    # Create a retention policy to retain deleted blobs
    delete_retention_policy = RetentionPolicy(enabled=True, days=1)

    # Set the retention policy on the service
    blob_service_client.set_service_properties(delete_retention_policy=delete_retention_policy)

    # Blob info to delete: derive the source container by stripping the "-archived"
    # suffix from the destination container name (assumes the source name has no hyphen)
    blob_url = event.get_json().get('url')
    container_name = blob_url.split("/")[-2].split("?")[0].split("-")[0]
    blob_name = blob_url.split("/")[-1].split("?")[0]

    blob_to_delete = blob_service_client.get_blob_client(container=container_name,blob=blob_name)

    blob_to_delete.delete_blob()


In this case, you must follow the same procedure to subscribe to the blob creation event, but on the destination account. In the code, the first thing I do is enable a one-day retention policy on the source account, so that when I delete the blob a few lines later it is a soft delete rather than a permanent one, in case it needs to be recovered within that grace period.
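
If you do need to bring a blob back within that retention window, a minimal sketch with the same SDK could look like this; the container and blob names are placeholders:

import os

from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(os.environ["ORIGIN_STORAGE_CONNECTION_STRING"])
blob = blob_service_client.get_blob_client(container="processed", blob="example.csv")

# Restore a soft-deleted blob while the retention period is still active
blob.undelete_blob()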


The example code is available in my GitHub account.


Cheers!

Gisela Torres

Gisela Torres works at Microsoft as a Cloud Solution Architect, a technical role focused on supporting and advising on cloud solutions and architectures that use Microsoft Azure as their platform. Before that, she worked as a software architect and application developer at several companies, and during those years she received several awards, including Most Valuable Professional in Microsoft Azure. She loves programming and technology in general.
