
Daemons with Celery II

April 27, 2016 by José Antonio Perdiguero

Continuing from our previous entry on how to daemonize with Celery: we left the solution in a state where our buffer could eventually collapse, because our producer generates tasks faster than our consumer can execute them.

To solve this, I propose another question: “Is the foo_action_postsave task currently being executed?”

If we enable our producer to ask this question, we can avoid queuing a new task until the current one has finished. To know whether a task is being executed, we can use a cache-based lock with the following behavior: when a task is about to be executed, acquire the lock, do the processing and then, just after it finishes, release the lock.

Solution: Second approach

An example of a cache-based lock (this works really well with Django's cache backend):

from django.core.cache import cache


class CacheLock(object):
    """
    A lock implementation.

    A lock manages an internal value that is 0 or nonexistent when the lock is free and 1 when it is held.
    It can be locked by calling acquire() and freed by calling release().
    """
    def __init__(self, cache_key: str, timeout: int = None):
        """
        Create a lock using the Django cache as backend.

        :param cache_key: Key that will be used in the cache to store the lock.
        :param timeout: Time until the lock expires.
        """
        self._cache_key = cache_key
        self._timeout = timeout

    def acquire(self):
        """
        Acquire the lock, blocking subsequent calls.

        :return: True if the lock has been acquired, otherwise False.
        """
        if not self.locked():
            cache.add(self._cache_key, 1, self._timeout)
            result = True
        else:
            result = False
        return result

    def release(self):
        """Release the lock."""
        cache.delete(self._cache_key)

    def locked(self):
        """Check whether the lock is currently held."""
        return cache.get(self._cache_key, 0) == 1

    def __del__(self):
        self.release()
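To see the lock semantics in isolation, here is a minimal, runnable sketch that swaps the Django cache for an in-memory stand-in (the `FakeCache` class below is purely illustrative, implementing just the `add`/`get`/`delete` calls the lock uses):

```python
# In-memory stand-in for django.core.cache.cache, for illustration only.
class FakeCache:
    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout=None):
        # Like Django's cache.add: only sets the key if it doesn't exist yet.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


cache = FakeCache()


class CacheLock:
    def __init__(self, cache_key, timeout=None):
        self._cache_key = cache_key
        self._timeout = timeout

    def acquire(self):
        if not self.locked():
            cache.add(self._cache_key, 1, self._timeout)
            return True
        return False

    def release(self):
        cache.delete(self._cache_key)

    def locked(self):
        return cache.get(self._cache_key, 0) == 1


lock = CacheLock('lock_foo_action_postsave')
print(lock.acquire())  # True: the lock was free
print(lock.acquire())  # False: already held
lock.release()
print(lock.acquire())  # True: free again after release
```

Note that checking `locked()` and then calling `add()` is not atomic; Django's `cache.add` itself returns True only when the key was absent, which is a tighter way to acquire if strict atomicity matters.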

Using this lock implementation, we can define a base class for Celery tasks that behaves as we described:

from celery import Task


class SingleTask(Task):
    TAG = 'to be defined, could be __name__'
    abstract = True
    single_run = True

    def __init__(self, *args, **kwargs):
        super(SingleTask, self).__init__(*args, **kwargs)
        lock_id = 'lock_{}'.format(self.TAG.lower())
        self.lock = CacheLock(cache_key=lock_id)

    def __call__(self, *args, **kwargs):
        if self.lock.acquire():
            result = super(SingleTask, self).__call__(*args, **kwargs)
            self.lock.release()
            return result
        else:
            return False

    def on_success(self, retval, task_id, args, kwargs):
        self.lock.release()
        super(SingleTask, self).on_success(retval, task_id, args, kwargs)

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        self.lock.release()
        super(SingleTask, self).on_failure(exc, task_id, args, kwargs, einfo)

We need to override four methods of the Celery base task:

  • __init__: When Celery creates the task, we assign it a lock using a unique tag.
  • __call__: When the task is called for execution, we try to acquire the lock. If acquired, the task is executed and afterwards the lock is released. If not acquired, the task is ignored.
  • on_success: When the task finishes, release the lock.
  • on_failure: When the task fails, release the lock.

This is all we need to do in order to avoid collapses in our queues.

But it is important to make sure that our workers can run two tasks concurrently and without prefetching. We need a worker that tries to execute a task even if another task is currently being executed, so that tasks are cleared from the queue (it tries to execute, fails to acquire the lock and immediately exits).
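For reference, one way to get that worker behavior is through the Celery settings. This is a sketch using the Celery 3.x setting names that were current when this article was written; `celeryconfig.py` is an assumed module name:

```python
# celeryconfig.py -- sketch, not a complete configuration.

# Two worker processes, so one can pick up (and immediately discard)
# queued tasks while the other is busy executing.
CELERYD_CONCURRENCY = 2

# Reserve at most one task per process instead of prefetching batches,
# so queued tasks are not held back behind a long-running one.
CELERYD_PREFETCH_MULTIPLIER = 1
```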

I agree this isn’t the most elegant approach but, in the third entry of this series, I’ll explain how to avoid having a ‘cleaner worker’ that constantly tries to execute tasks without success.

All the code used in these examples can be found in our repository:
Code repository

Celery docs for prefetching and concurrency behavior can be found in:
Concurrency
Prefetch



2 thoughts on “Daemons with Celery II”

Rubén

Hi José Antonio,
Great article series, thank you for sharing. Just a point and a question:
In the CacheLock code, lines 25 to 29 contain wrongly indented code. In addition, it could be easier to read as something like:

if not self.locked():
    cache.add(self._cache_key, 1, self._timeout)
    return True
return False

And the question is: if foo_action is not called more than once per second, why “cronize” the task? In the previous post you said it was part of a process. Why not hook the task call into that process instead?

    Miguel Barrientos

    Hi Rubén,

    Thanks for pointing out the indentation mistake! It should be fixed now.

    Regarding having the “heavy-process” task as a periodic task, it’s just a matter of separation of concerns. In this case the post_save processing could be called directly from “foo_action”, as you point out, but that might not always be the case: I’m thinking of a “cleaning-like” task, where foo_action shouldn’t know about its existence, and a periodic task is very handy there.