Variant analysis is a technique for finding bugs or problems in a codebase that follow the same pattern or behaviour as a known issue. In this post we explore how to use code instrumentation to achieve dynamic variant analysis.

The most common way to perform variant analysis is to grep over your codebase for other places that use the same code.

This has the obvious limitation that the pattern you are looking for must match the provided string exactly. You could do a little better with regular expressions, but finding complex patterns that way quickly becomes hard and probably not cost effective.

So, can we do better? Yes, we can: by working over abstract syntax trees and applying well-known static analysis techniques like data flow analysis, symbolic execution, etc. The problem with this approach is that these algorithms are not easy to implement and the false positive rate is pretty high.

So, can we do better? We actually can, by taking a dynamic approach and instrumenting some code.

Let’s start simple with an example. Do you know what happens when we call len() on a Django queryset just to get the total count of elements? The whole queryset gets evaluated, so if the queryset has millions of elements, that is not good.
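To make the difference concrete, here is a small sketch (User is a hypothetical model used for illustration throughout this post):

users = User.objects.all()

# len() evaluates the whole queryset: every row is fetched from the database
# and turned into a model instance just so we can count them
total = len(users)

# count() asks the database instead, issuing a single SELECT COUNT(*) query
total = users.count()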

This can happen, for example, in a Django template when using the length template tag. So there it is: the first bug class we are going to target is erroneous usage of the length template tag in Django templates.
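As an illustration, rendering a template like the following forces the whole queryset to be evaluated just to display a count (a minimal sketch, assuming a configured Django project and the hypothetical User model from before):

from django.template import Context, Template

# The |length filter simply calls len() on its argument, so this template
# pulls every User row from the database only to print how many there are
template = Template('We have {{ users|length }} users')
rendered = template.render(Context({'users': User.objects.all()}))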

Let’s say this happened to you and now you want to find other places in the codebase that might suffer from the same problem. The first thing that comes to mind is to:

  1. Look for all the usages of the length template tag in the codebase
  2. For each usage of the template tag:
    1. Grab the variable name
    2. Look up the type of that variable in the view

That is a lot of work! So, how can we instrument the code in order to find all the places in the codebase that use the length template tag over querysets?

We could do something like this:

from django.db.models import QuerySet
import django.template.defaultfilters


def queryset_check_length(value):
    """
    New |length implementation which checks for a queryset value and raises an error.
    """
    if isinstance(value, QuerySet):
        raise Exception('Calling length with a QuerySet')
    else:
        # Otherwise call the default length template filter
        return django.template.defaultfilters.length(value)

Now, if we monkeypatch the length filter registered in django.template.defaultfilters, we will get an exception every time the pattern we are looking for occurs.

As you might have noticed, this is not suitable for a production environment, and we would also need to manually exercise every place that might be vulnerable to see if the exception is raised. If only there were something like automated tests we could use 🙂

If we have good test coverage then we can let our CI system do all the work and find the buggy templates for us.

import pytest


@pytest.fixture(autouse=True)
def template_length_check(monkeypatch):
    # Replace the length template tag implementation with the instrumented one
    monkeypatch.setitem(django.template.defaultfilters.register.filters, 'length', queryset_check_length)

This is what an autouse pytest fixture applying the instrumentation looks like. What is this thing doing? This piece of code runs before each test and replaces the default length template filter with the instrumented one. So if a view renders a template with a queryset in its context, and that template calls the length template tag over that queryset, then the test exercising that view will fail, and we will be able to tell which template is vulnerable by analyzing the test failures instead of reading through the whole codebase.

Now all we have to do is relax and wait until our CI system runs all the tests, and we will find the other places in the codebase with this kind of bug.

As you might have noticed, you may still need to perform some manual analysis if your coverage is not high enough. In that case you can consider instrumenting the code to log instead of raising an exception.
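For instance, a logging variant of the first instrumentation could look like this (a sketch; the name queryset_log_length is made up for illustration):

import logging

from django.db.models import QuerySet
import django.template.defaultfilters

logger = logging.getLogger(__name__)


def queryset_log_length(value):
    # Log a warning with the stack trace instead of raising, so the
    # instrumentation can also run outside of the test suite
    if isinstance(value, QuerySet):
        logger.warning('length template filter called with a QuerySet', stack_info=True)
    return django.template.defaultfilters.length(value)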

Anyway, there is one perfect use for this kind of instrumentation that will not let any bug escape, and that is finding bugs in the tests themselves. Yes, I’m talking about you, flaky tests.

A test can be flaky for many different reasons, so for educational purposes we will keep it simple again.

Some people are not aware that querysets are not ordered by default, and this is a common cause of flaky tests.

So, if your tests do something like:

def test_flaky(self):
    User.objects.create(name='david')
    User.objects.create(name='pepe')

    models = User.objects.all()

    self.assertEqual(models[0].name, 'david')
    self.assertEqual(models[1].name, 'pepe')

Then, if the model does not have a default ordering defined, there is a chance this test will fail from time to time, because models[0] is not guaranteed to be the same element on every run.
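For completeness, the fix itself (not part of the instrumentation) is simply to make the ordering explicit, either on the model or in the query; a sketch:

from django.db import models


class User(models.Model):
    name = models.CharField(max_length=100)

    class Meta:
        # A default ordering makes User.objects.all() return rows deterministically
        ordering = ['name']


# Alternatively, order explicitly at the call site:
users = User.objects.order_by('name')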

So, one pattern we can look for is usages of unordered querysets in which the queryset is accessed using more than one distinct index.

How can we instrument this? There are probably many ways, but we are choosing to instrument the __getitem__ method of the queryset, which is the method called each time you access the queryset by index.
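A quick illustration of what goes through __getitem__, and why the instrumentation below only cares about integer keys (slices are a separate, less interesting case):

users = User.objects.all()

first = users[0]       # QuerySet.__getitem__(users, 0), an integer index
page = users[10:20]    # QuerySet.__getitem__(users, slice(10, 20)), a slice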

Since we are instrumenting that method, let's see one possible instrumentation we can use. The core things we want to check to determine if the test is flaky are:

  1. The queryset is not ordered (meaning it has no order_by call nor a default ordering); luckily for us there is an ordered property on the queryset we can use for this
  2. There are at least two unique index accesses to the queryset
  3. These unique accesses happen within the test code.

Let's see some code:

import os
import traceback

import django.db.models.query


def custom_getitem(self, k):
    # 1) The queryset is not ordered
    if isinstance(k, int) and not self.ordered:
        call_stack = traceback.extract_stack()
        caller_filename = os.path.basename(call_stack[-2].filename)

        # 3) The access happens within the test code
        if caller_filename.startswith('test_'):
            data.add(k)

    # Always delegate to the original __getitem__
    return django.db.models.query.QuerySet.original_getitem(self, k)

@pytest.fixture(autouse=True)
def flaky_finder(monkeypatch):
    global data
    data = set()  # This will hold the unique index accesses to the queryset
    django.db.models.query.QuerySet.original_getitem = django.db.models.query.QuerySet.__getitem__

    monkeypatch.setattr(django.db.models.query.QuerySet, '__getitem__', custom_getitem)
    yield
    # 2) There are at least two unique index accesses to the queryset
    if len(data) >= 2:
        raise Exception('Possible flaky test detected.')

This code does exactly what we described before. It creates the data set before the test execution starts and records in it every index used to access a queryset, but only if the access is done from test code. Finally, after the test completes, if the number of unique index accesses is two or more, it makes the test fail.

This time, if we run our tests on our CI system, we can be sure that every test bug following this pattern will get caught.

Taking this further, we can keep the instrumentation around to prevent new flaky tests from entering the codebase. It is a good idea to create a separate pipeline in which this instrumentation runs; we don't want to mess up our regular tests as a side effect of the instrumentation, so better keep things separated.
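One way to keep things separated is to make the instrumented fixtures opt-in, for example behind an environment variable that only the dedicated pipeline sets (a sketch; the variable name VARIANT_ANALYSIS is made up):

import os

import pytest

# Only the dedicated CI job sets VARIANT_ANALYSIS=1; regular test runs
# leave it unset so the instrumentation stays inactive
INSTRUMENTATION_ENABLED = os.environ.get('VARIANT_ANALYSIS') == '1'


@pytest.fixture(autouse=INSTRUMENTATION_ENABLED)
def flaky_finder(monkeypatch):
    ...  # same body as the flaky_finder fixture shown above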

Well, that’s all. Now go kill some bugs.
