Cassandra Database Session Reuse In AWS Lambda (Python)

I am trying to reuse a Cassandra cluster session across subsequent AWS Lambda function calls. I've successfully implemented it in Java, but reusing the session in Python gets the

Solution 1:

I think I can state that this issue is caused by Lambda behaving differently in the Python execution environment than in the Java one.

I had time to set up a simple Lambda function, implemented in both Java and Python. The function simply spawns a thread that prints the current time in a while loop. The question was: will the thread in the Java implementation keep printing after the Lambda function has returned, and will the Python thread stop instead? The answer is yes in both cases: the Java thread keeps printing until the configured timeout, while the Python thread stops as soon as the Lambda function returns.

The CloudWatch log for the Java version confirms that:

09:55:21 START RequestId: b70e732b-e476-11e6-b2bb-e11a0dd9b311 Version: $LATEST
09:55:21 Function started: 1485510921351
09:55:21 Pre function call: 1485510921351
09:55:21 Background function: 1485510921352
09:55:21 Background function: 1485510921452
09:55:21 Background function: 1485510921552
09:55:21 Background function: 1485510921652
09:55:21 Background function: 1485510921752
09:55:21 Post function call: 1485510921852
09:55:21 Background function: 1485510921853
09:55:21 END RequestId: b70e732b-e476-11e6-b2bb-e11a0dd9b311
09:55:21 REPORT RequestId: b70e732b-e476-11e6-b2bb-e11a0dd9b311 Duration: 523.74 ms Billed Duration: 600 ms Memory Size: 256 MB Max Memory Used: 31 MB
09:55:21 Background function: 1485510921953
09:55:22 Background function: 1485510922053
...

While in the Python version:

09:01:04 START RequestId: 21ccc71e-e46f-11e6-926b-6b46f85c9c69 Version: $LATEST
09:01:04 Function started: 2017-01-27 09:01:04.189819
09:01:04 Pre function call: 2017-01-27 09:01:04.189849
09:01:04 background_call function: 2017-01-27 09:01:04.194368
09:01:04 background_call function: 2017-01-27 09:01:04.294617
09:01:04 background_call function: 2017-01-27 09:01:04.394843
09:01:04 background_call function: 2017-01-27 09:01:04.495100
09:01:04 background_call function: 2017-01-27 09:01:04.595349
09:01:04 Post function call: 2017-01-27 09:01:04.690483
09:01:04 END RequestId: 21ccc71e-e46f-11e6-926b-6b46f85c9c69
09:01:04 REPORT RequestId: 21ccc71e-e46f-11e6-926b-6b46f85c9c69 Duration: 500.99 ms Billed Duration: 600 ms Memory Size: 128 MB Max Memory Used: 8 MB

Here's the code of the two functions:

Python

import thread  # Python 2 module (renamed _thread in Python 3)
import datetime
import time


def background_call():
    while True:
        print 'background_call function: %s' % (datetime.datetime.now(), )
        time.sleep(0.1)

def lambda_handler(event, context):
    print 'Function started: %s' % (datetime.datetime.now(), )

    print 'Pre function call: %s' % (datetime.datetime.now(), )
    thread.start_new_thread(background_call, ())
    time.sleep(0.5)
    print 'Post function call: %s' % (datetime.datetime.now(), )

    return 'Needs more cowbell!'
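
The `thread` module used above exists only in Python 2; in Python 3 it was renamed `_thread`, and `threading` is the usual interface. A minimal Python 3 port of the same handler might look like the sketch below. The `daemon=True` flag is an addition that is not in the original and is there only so a local run can exit. Whether the background thread still stops when the handler returns on current Lambda runtimes was not part of the test described here.

```python
import datetime
import threading
import time


def background_call():
    # Loop forever, printing a timestamp roughly every 100 ms.
    while True:
        print('background_call function: %s' % (datetime.datetime.now(),))
        time.sleep(0.1)


def lambda_handler(event, context):
    print('Function started: %s' % (datetime.datetime.now(),))

    print('Pre function call: %s' % (datetime.datetime.now(),))
    # Assumption: daemon=True is not in the original; it lets a local
    # process exit once the main thread has finished.
    threading.Thread(target=background_call, daemon=True).start()
    time.sleep(0.5)
    print('Post function call: %s' % (datetime.datetime.now(),))

    return 'Needs more cowbell!'
```

Calling `lambda_handler({}, None)` locally prints a handful of background timestamps during the half-second sleep and then returns the string.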

Java

import com.amazonaws.services.lambda.runtime.*;


public class BackgroundTest implements RequestHandler<RequestClass, ResponseClass> {

    public static void main( String[] args )
    {
        System.out.println( "Hello World!" );
    }

    public ResponseClass handleRequest(RequestClass requestClass, Context context) {
        System.out.println("Function started: "+System.currentTimeMillis());
        System.out.println("Pre function call: "+System.currentTimeMillis());
        Runnable r = new Runnable() {
            public void run() {
                while(true){
                    try {
                        System.out.println("Background function: "+System.currentTimeMillis());
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        };
        Thread t = new Thread(r);
        t.start();
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println("Post function call: "+System.currentTimeMillis());
        return new ResponseClass("Needs more cowbell!");
    }
}

Solution 2:

There's a similar issue in the cassandra-driver FAQs, where WSGI applications won't work with a global connection pool:

Depending on your application process model, it may be forking after driver Session is created. Most IO reactors do not handle this, and problems will manifest as timeouts. [Here][1]

This at least got me on the right track to check the available connection classes: it turns out that cassandra.io.twistedreactor.TwistedConnection works pretty well on AWS Lambda.

All in all the code looks something like this:

from cassandra.cluster import Cluster
from cassandra.io.twistedreactor import TwistedConnection
import time


SESSION = Cluster([...], connection_class=TwistedConnection).connect()


def run(event, context):
    t0 = time.time()
    x = list(SESSION.execute('SELECT * FROM keyspace.table'))  # Ensure query actually evaluated
    print('took', time.time() - t0)

You will need to install Twisted in your venv, though.

I ran this overnight from a crontab entry firing every minute and saw only a few connection errors (at most 2 in an hour), so overall I'm quite happy with the solution.
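
One way to absorb the occasional connection error mentioned above is to rebuild the cached session and retry once. The sketch below is only an illustration under assumptions: `get_session`, `execute_with_retry` and the `connect` callable are hypothetical names, not part of cassandra-driver, and the broad `except Exception` stands in for whichever driver exceptions you actually observe.

```python
_session = None  # cached for as long as the Lambda container stays warm


def get_session(connect, fresh=False):
    """Return the cached session, calling connect() if none exists or fresh is set."""
    global _session
    if _session is None or fresh:
        # A fuller version would also shut down the previous cluster here.
        _session = connect()
    return _session


def execute_with_retry(connect, query, retries=1):
    """Run query on the cached session; on failure, rebuild the session and retry."""
    attempt = 0
    while True:
        session = get_session(connect, fresh=attempt > 0)
        try:
            # list() forces the query to be evaluated inside the try block.
            return list(session.execute(query))
        except Exception:  # hypothetical catch-all; narrow this in real code
            if attempt >= retries:
                raise
            attempt += 1
```

Here `connect` would be a zero-argument callable that builds the session the same way as the module-level `SESSION` above, for example a function wrapping the `Cluster(..., connection_class=TwistedConnection).connect()` call. Passing it in as a parameter keeps the retry logic independent of the driver.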

Also, I haven't tested the eventlet- and gevent-based connections, because I can't have them monkey-patching my applications, and I didn't feel like compiling libev to use on Lambda. Someone else might want to try, though.

[1]: http://datastax.github.io/python-driver/faq.html#why-do-connections-or-io-operations-timeout-in-my-wsgi-application
