Queue tasks in Celery after database commit – Introducing django-transaction-hooks

Engineering

by Antonio Páez on April 18, 2018

At Ebury we use Django, and over time we have upgraded from 1.3 to 1.5 and then to 1.7. Throughout that time, one issue kept causing us trouble. You might be familiar with it.

We use Celery to execute asynchronous tasks, with Django as our framework and PostgreSQL as our database.

The issue occurs when an asynchronous task uses an object that has just been updated or created. The task depends on the database, so when it starts, the object might not have its updated state yet, or might not even exist.

We are now able to utilise the library django-transaction-hooks, which supports Django 1.6 through 1.8 and whose functionality has been merged into Django 1.9+ as “transaction.on_commit”.

What matters about this library is that it adds an “on_commit” hook for timing actions around database transactions. We can use it to decide when tasks are queued for Celery workers. The main advantage appears when we want to queue a task that uses an object created inside an atomic transaction. Consider, for example, an atomic block that creates an object (“instance”), queues a task that uses it, and then carries on with <other actions> before the block ends.

At the moment the task is queued, “instance” has not been committed to the database yet. The more work <other actions> does, the more likely it is that a worker starts the task and fails with “ObjectDoesNotExist”.
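The race can be reproduced without Django or Celery. The plain-Python sketch below is a simplified model, not real database code. It assumes that rows written inside a transaction stay invisible to other connections until the commit, as with PostgreSQL's default READ COMMITTED isolation. It also forces the worst case, where the worker runs immediately after the task is queued. “ToyDatabase”, “worker_fetch”, “naive_flow” and “deferred_flow” are hypothetical names, and the local “ObjectDoesNotExist” class only stands in for Django's exception.

```python
class ObjectDoesNotExist(Exception):
    """Stand-in for django.core.exceptions.ObjectDoesNotExist (hypothetical)."""


class ToyDatabase:
    """Toy model: writes stay pending until commit() makes them visible."""

    def __init__(self):
        self.committed = {}
        self.pending = {}

    def create(self, pk, data):
        # Inside the transaction: only the writer can see this row.
        self.pending[pk] = data

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()


def worker_fetch(db, pk):
    """What a worker on another connection sees: committed rows only."""
    try:
        return db.committed[pk]
    except KeyError:
        raise ObjectDoesNotExist(pk) from None


def naive_flow(db):
    """Queue the task inside the transaction; the worker runs at once."""
    db.create(1, {"status": "new"})
    try:
        result = worker_fetch(db, 1)  # worker starts before the commit
    except ObjectDoesNotExist:
        result = "ObjectDoesNotExist"
    # ... <other actions> ...
    db.commit()
    return result


def deferred_flow(db):
    """Queue the task only after the commit."""
    db.create(1, {"status": "new"})
    # ... <other actions> ...
    db.commit()
    return worker_fetch(db, 1)


print(naive_flow(ToyDatabase()))     # ObjectDoesNotExist
print(deferred_flow(ToyDatabase()))  # {'status': 'new'}
```

In a real system the worker may or may not win the race. The simulation simply makes the losing case deterministic.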

With django-transaction-hooks, the task is not queued until the atomic block has been committed.

Essentially, django-transaction-hooks extends the database connection back-end. Functions registered with the “on_commit” method inside a transaction are kept in a list in memory. Once the transaction commits, the list is popped and each function is run. If the transaction rolls back instead, the functions are discarded.
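A toy version of that mechanism fits in a few lines. This is a hypothetical sketch of the idea, not the library's actual implementation. “ToyConnection” counts nested atomic blocks and keeps “on_commit” callbacks in a list. It runs them when the outermost block exits cleanly and drops them if that block exits with an exception. When no transaction is open, it runs a callback immediately. To keep the sketch short, it treats nested blocks as a single transaction and does not model savepoint rollbacks.

```python
from contextlib import contextmanager


class ToyConnection:
    """Toy connection with on_commit hooks (hypothetical, not the library's code)."""

    def __init__(self):
        self.depth = 0      # how many atomic blocks are currently open
        self.pending = []   # callbacks waiting for the outermost commit

    def on_commit(self, func):
        if self.depth == 0:
            func()          # no transaction open: run right away
        else:
            self.pending.append(func)

    @contextmanager
    def atomic(self):
        self.depth += 1
        try:
            yield
        except BaseException:
            self.depth -= 1
            if self.depth == 0:
                self.pending.clear()   # rollback: drop the callbacks
            raise
        else:
            self.depth -= 1
            if self.depth == 0:
                # commit: pop the whole list out, then run each callback
                callbacks, self.pending = self.pending, []
                for func in callbacks:
                    func()


conn = ToyConnection()
queued = []

with conn.atomic():
    conn.on_commit(lambda: queued.append("task A"))
    print(queued)   # [] : nothing is queued before the commit
print(queued)       # ['task A']

try:
    with conn.atomic():
        conn.on_commit(lambda: queued.append("task B"))
        raise ValueError("something went wrong")
except ValueError:
    pass
print(queued)       # ['task A'] : task B was discarded with the rollback

conn.on_commit(lambda: queued.append("task C"))  # no transaction open
print(queued)       # ['task A', 'task C']
```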

So far, so good: this fits what we want perfectly. However, two things still needed addressing: compatibility with the standard database back-ends, and the rather ugly “connection.on_commit(lambda: ...)” syntax.

As the library's documentation explains, using it only requires changing the database ENGINE setting:

DATABASES = {
    'default': {
        'ENGINE': 'transaction_hooks.backends.postgresql_psycopg2',
        'NAME': 'foo',
    },
}

However, people across our teams run their environments with different settings files, depending on their needs, and some of them use a different back-end. Calling “connection.on_commit” with Django's standard back-end raises an “AttributeError”, so everyone would be forced to switch their database back-end.
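One alternative to forcing everyone onto the hooks back-end is feature detection. Look up the “on_commit” attribute on the connection and, if it is missing, run the function straight away. The sketch below assumes a hypothetical helper, “call_on_commit”, plus two fake connection classes that stand in for the standard and hooks-enabled back-ends. The fallback keeps the old, racy behaviour on environments without hooks. This matches the else branch of the “apply_on_commit” method further down.

```python
def call_on_commit(connection, func):
    """Run func after commit when the back-end supports it, else right away.

    Hypothetical helper: avoids AttributeError on back-ends without
    on_commit, at the cost of keeping the old (racy) immediate behaviour.
    """
    on_commit = getattr(connection, "on_commit", None)
    if on_commit is None:
        return func()
    return on_commit(func)


class StandardConnection:
    """Fake standard back-end: has no on_commit attribute."""


class HooksConnection:
    """Fake hooks-enabled back-end: stores callbacks for later."""

    def __init__(self):
        self.callbacks = []

    def on_commit(self, func):
        self.callbacks.append(func)


calls = []
call_on_commit(StandardConnection(), lambda: calls.append("now"))
hooks = HooksConnection()
call_on_commit(hooks, lambda: calls.append("after commit"))
print(calls)                  # ['now']
for func in hooks.callbacks:  # simulate the commit
    func()
print(calls)                  # ['now', 'after commit']
```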

This brings us to the second point: we don't like that syntax. I personally hate the lambda syntax, so I always try to avoid it. 😉
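If the objection is mainly to the lambda itself, the standard library's “functools.partial” offers an alternative. It binds a function and its arguments into a zero-argument callable, which is exactly what “on_commit” receives. It also avoids a classic lambda pitfall: a lambda reads its variables when it is called, not when it is created, so callbacks created in a loop all see the last value. The sketch uses a hypothetical “FakeTask” whose “apply_async” only records its arguments, so it runs without Celery.

```python
from functools import partial


class FakeTask:
    """Hypothetical stand-in for a Celery task: records what would be queued."""

    def __init__(self):
        self.queued = []

    def apply_async(self, args=None, kwargs=None, **options):
        self.queued.append((tuple(args or ()), kwargs or {}, options))


task = FakeTask()
callbacks = []

for pk in (1, 2, 3):
    # Late binding: each lambda reads `pk` when called, i.e. after the loop.
    callbacks.append(lambda: task.apply_async(args=[pk]))
for func in callbacks:
    func()
print([entry[0] for entry in task.queued])  # [(3,), (3,), (3,)]

task.queued.clear()
callbacks.clear()
for pk in (1, 2, 3):
    # partial binds the current value of pk immediately.
    callbacks.append(partial(task.apply_async, args=[pk]))
for func in callbacks:
    func()
print([entry[0] for entry in task.queued])  # [(1,), (2,), (3,)]
```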

At the moment we only use “on_commit” events for queuing Celery tasks, and our tasks are based on Task classes. So this is the solution we came up with: add a new method that looks like Celery's native API and hides the differences between both engines.

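from celery import Task
from django.conf import settings
from django.db import connection


class BaseTask(Task):
    """
    Base celery task for trades app
    """
    abstract = True

    def apply_on_commit(self, args=None, kwargs=None, task_id=None, producer=None,
                        link=None, link_error=None, **options):
        """Queue the task after the current transaction commits when the
        hooks back-end is configured; otherwise queue it immediately."""
        def queue():
            # Keyword arguments avoid depending on apply_async's positional order.
            return self.apply_async(args=args, kwargs=kwargs, task_id=task_id,
                                    producer=producer, link=link,
                                    link_error=link_error, **options)

        # TRANSACTION_HOOKS_POSTGRE_BACKEND is a project-level setting holding
        # 'transaction_hooks.backends.postgresql_psycopg2'.
        if settings.TRANSACTION_HOOKS_POSTGRE_BACKEND == settings.DATABASES['default']['ENGINE']:
            connection.on_commit(queue)
        else:
            return queue()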

We check the engine value to decide whether to call “apply_async” directly or to defer it with the connection's “on_commit”. Of course, this would need revisiting if we used more than one database, but it fits very cleanly in the code.
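With more than one database, the check would have to look at the alias the transaction runs on, rather than always at 'default'. Below is a minimal sketch, assuming a hypothetical “hooks_enabled” helper and made-up aliases. In a Django project, the dictionary would be “settings.DATABASES”, and the callback would then be registered on “django.db.connections[using]” instead of the default “connection”.

```python
HOOKS_ENGINE = "transaction_hooks.backends.postgresql_psycopg2"

# Hypothetical settings for illustration: one alias uses the hooks back-end,
# the other uses Django's standard PostgreSQL back-end.
DATABASES = {
    "default": {"ENGINE": HOOKS_ENGINE, "NAME": "foo"},
    "reporting": {"ENGINE": "django.db.backends.postgresql_psycopg2", "NAME": "bar"},
}


def hooks_enabled(databases, using="default", hooks_engine=HOOKS_ENGINE):
    """Return True if the database alias `using` runs the hooks back-end."""
    return databases[using]["ENGINE"] == hooks_engine


print(hooks_enabled(DATABASES))               # True
print(hooks_enabled(DATABASES, "reporting"))  # False
```

Passing “using” through “apply_on_commit” would let each call defer on the same connection that wrote the object.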

With this method in place, teams can move to the new approach gradually while we keep compatibility with the legacy behaviour, allowing a nice, controlled adoption.
