Connect to S3 with AWS Lambda Python

AWS Lambda supports Python, so I gave it a try. This time I used it to copy files between S3 buckets, and I'd like to share the experience because there were several points of interest.

What I want to do

  1. I want to copy files from one S3 bucket to another
  2. Copying with a single process is slow, so I want to copy with multiple processes in parallel
  3. I want to use AWS Lambda with Python

I tried it mainly for the third reason.

What I did

I created a Lambda function that reads from an S3 bucket and implemented a script that copies objects in parallel.

Creating a Lambda Function

Create a Lambda Function.

- Click Create a Lambda Function

Select blueprint

Select the template you want to use.

- Select hello-world-python

Configure function

Make basic settings for the Lambda function.

- Name: name of the Lambda function
    - Example: s3LogBucketCopy
- Description: description of the Lambda function
    - Example: copy logs between buckets
- Runtime: execution environment
    - Python 2.7

Lambda function code

Provide the program code to be executed.

You can choose from the following three types.

  1. Edit the code on the screen
  2. Upload code from your own machine
  3. Upload code from s3

If you need to import any library other than the Python standard library and boto3, you need to choose method 2 or 3.

The details are summarized in the references at the end, so please refer to them if you are interested.

This time, since only the standard library and boto3 are used, I went with method 1.

The code will be implemented later, so leave it unchanged at first.

Lambda function handler and role

- Handler: name of the handler to execute (module name.function name)
    - Example: lambda_function.s3_log_copy_handler
- Role: execution permissions for the Lambda function (access to resources such as S3)
    - Example: S3 execution role

Advanced settings

Set the available memory and the timeout.

- Memory (MB): available memory
    - Example: 128 MB
- Timeout: timeout value
    - Example: 5 min

Review

Check the settings. If there is no problem, select Create Function.

Script implementation

Implement a script that copies objects using multiple processes.

Below is a simple sample.

#!/usr/local/bin/python
# -*- coding:utf-8 -*-

import boto3
from multiprocessing import Process


def parallel_copy_bucket(s3client, source_bucket, dest_bucket, prefix):
    '''
    Copy objects between s3 buckets in parallel
    '''
    # Copy a single object
    def copy_bucket(s3client, dest_bucket, copy_source, key):
        s3client.copy_object(Bucket=dest_bucket, CopySource=copy_source, Key=key)

    # Note that list_objects returns at most 1000 objects per call
    result = s3client.list_objects(
        Bucket=source_bucket,
        Prefix=prefix
    )
    # Get the list of source keys and copy each one in its own process
    if 'Contents' in result:
        keys = [content['Key'] for content in result['Contents']]
        processes = []
        for key in keys:
            copy_source = '{}/{}'.format(source_bucket, key)
            p = Process(target=copy_bucket, args=(s3client, dest_bucket, copy_source, key))
            p.start()
            processes.append(p)
        # Wait for every copy process, not just the last one
        for p in processes:
            p.join()


# Handler called at runtime
def s3_log_copy_handler(event, context):
    source_bucket = event["source_bucket"]  # Copy source bucket
    dest_bucket = event["dest_bucket"]      # Copy destination bucket
    prefixes = event["prefixes"]            # Copy source key prefixes
    s3client = boto3.client('s3')
    for prefix in prefixes:
        print("Start loading {}".format(prefix))
        parallel_copy_bucket(s3client, source_bucket, dest_bucket, prefix)
    print("Complete loading")
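As the comment in the script notes, list_objects returns at most 1000 keys per call. If a prefix holds more objects than that, Marker-based pagination is needed. Below is a minimal sketch of such a helper; iter_keys is a hypothetical name, not part of the article's code, and the client would come from boto3.client('s3'):

```python
def iter_keys(s3client, bucket, prefix):
    """Yield every key under prefix, following list_objects'
    Marker-based pagination past the 1000-object page limit."""
    marker = ''
    while True:
        result = s3client.list_objects(Bucket=bucket, Prefix=prefix, Marker=marker)
        contents = result.get('Contents', [])
        for content in contents:
            yield content['Key']
        # IsTruncated is set while more pages remain
        if not result.get('IsTruncated') or not contents:
            return
        marker = contents[-1]['Key']
```

The keys yielded here could then be batched and copied by the same per-key Process fan-out used above.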

Test run

Select Configure Sample Event from the `Actions` button.

Set the parameters to pass to the handler.

For example, if the S3 layout is as follows:

- samplelogs.source  #Copy source bucket
    - /key1
        - hogehoge.dat
    - /key2
        - fugafuga.dat
- samplelogs.dest    #Copy destination bucket

Set the JSON as follows.



{
  "source_bucket": "samplelogs.source",
  "dest_bucket": "samplelogs.dest",
  "prefixes" : [
    "key1",
    "key2"
  ]
}

Where I got stuck

Allow the role to operate on the S3 bucket

The default S3 execution role only allows s3:GetObject and s3:PutObject. With that policy, calling s3client.list_objects() fails with the error `A client error (AccessDenied) occurred: Access Denied`. This API cannot be executed with s3:GetObject alone; it requires the separate s3:ListBucket permission, so you need to add s3:ListBucket to your policy.
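For reference, a policy along these lines should cover the calls used here. This is a sketch using the sample bucket names from this article; note that the list permission applies to the bucket ARN itself, while the object permissions apply to the keys under it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::samplelogs.source"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::samplelogs.source/*",
        "arn:aws:s3:::samplelogs.dest/*"
      ]
    }
  ]
}
```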

multiprocessing.Pool

When running in multiple processes, if you use multiprocessing.Pool you will get the error `OSError: [Errno 38] Function not implemented`. This happens because the Lambda execution environment does not provide the shared-memory facilities that Pool's locks require. You need to avoid Pool and use Process directly.
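A minimal sketch of the workaround: fan out with Process and collect results over Pipes instead of using a Pool. run_parallel and _worker are illustrative names (the squaring stands in for the real per-object work, such as a copy_object call):

```python
from multiprocessing import Process, Pipe

def _worker(conn, x):
    # Stand-in for the real per-item work
    conn.send(x * x)
    conn.close()

def run_parallel(xs):
    """Fan out one Process per item and collect results over Pipes,
    avoiding multiprocessing.Pool, whose shared-memory semaphores
    are unavailable in the Lambda environment."""
    procs, parents = [], []
    for x in xs:
        parent, child = Pipe()
        p = Process(target=_worker, args=(child, x))
        p.start()
        procs.append(p)
        parents.append(parent)
    # Receive in start order, then reap the children
    results = [parent.recv() for parent in parents]
    for p in procs:
        p.join()
    return results
```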

Timeout settings

Lambda is forced to time out when the execution time exceeds the configured value. Since the maximum timeout is 300 sec (5 min), anything that takes longer cannot finish in one invocation. So if a bucket holds reasonably large files, you'll need to run the Lambda function several times.
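One way to cope is to check how much execution time remains inside the loop and stop early, then re-invoke with the remaining work. context.get_remaining_time_in_millis() is provided by the Lambda runtime; the 10-second buffer below is an arbitrary assumption:

```python
TIME_BUFFER_MS = 10 * 1000  # stop this far before the hard timeout (assumption)

def should_continue(context):
    """True while enough execution time remains to safely start more work."""
    return context.get_remaining_time_in_millis() > TIME_BUFFER_MS
```

In s3_log_copy_handler, the prefix loop could break as soon as should_continue(context) is False, and a later invocation could be given the prefixes that were not processed.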

Impressions

It comes down to picking the right use case: Lambda seems well suited to light processing such as alerts, push notifications, and small file transfers, and conversely not suited to heavy processing. Also, now that it can be given an API endpoint, it may be a good fit for ultra-lightweight APIs. I'll try that next time.

Here is a summary of the good and unfortunate points of using Lambda functions.

- Good points
    - No need to set up an instance to write a simple batch process
    - Easy batch implementation with minimal settings
    - Easy access to AWS resources, since permissions are managed with IAM roles
    - boto3 is available out of the box
- Unfortunate points
    - No Python 3 support
    - Importing anything other than standard packages and boto3 is troublesome
    - Managing the created code is difficult
    - The maximum timeout is short

reference

https://boto3.readthedocs.org/en/latest/
http://qiita.com/m-sakano/items/c53ba194a8574f44e78a
http://www.perrygeo.com/running-python-with-compiled-code-on-aws-lambda.html
