
Dynamic Variant Analysis with Python

July 7, 2020 by davidarch

Variant analysis is a technique for finding bugs or problems in a codebase that follow the same pattern or behaviour as a known one.

The most common way to perform variant analysis is to grep over your codebase for places that use the same code.

This has the obvious limitation that the pattern you are looking for must match the provided string exactly. You could do a little better with regular expressions, but it would be really hard, and probably not cost effective, to find complex patterns that way.

So, can we do better? Yes we can, by using abstract syntax trees and applying complex and well-known static analysis algorithms like data flow analysis, symbolic execution, etc. The problem with this approach is that these algorithms are not easy to implement and the false positive rate is pretty high.

So, can we do better? We actually can, by taking a dynamic approach and instrumenting some code.

Let's start with a simple example. Do you know what happens when we call len() on a queryset just to get the total count of elements? The whole queryset gets evaluated, so if we have a queryset of millions of elements, that is not good.
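For instance, here is a minimal sketch of the difference, assuming a User model like the one used in the tests later in this post:

users = User.objects.all()

# len() evaluates the whole queryset: every row is fetched and materialised
total = len(users)

# .count() asks the database instead, issuing a single SELECT COUNT(*)
total = users.count()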

This can happen, for example, in a Django template when using the length templatetag. So there it is: the first bug class we are going to target is erroneous usage of the length template tag over querysets in Django templates.
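Concretely, the vulnerable pattern in a template is as simple as this (a sketch; users is assumed to be a queryset the view placed in the context):

{# users is a queryset placed in the context by the view #}
Total users: {{ users|length }}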

Let's say this happened to you and now you want to find other places in the codebase that might suffer the same problem. The first thing that comes to mind is to:

  1. Look for all the usages of the length template tag in the codebase
  2. For each usage of the template tag:
    1. Grab the variable name
    2. Look up the type of the variable in the view

That is a lot of work! So, how can we instrument this code to find all the places in the codebase that use the length templatetag over querysets?

We could do something like this:

import django.template.defaultfilters
from django.db.models import QuerySet

def queryset_check_length(value):
    """
    New |length implementation that checks for a QuerySet value and raises an error.
    """
    if isinstance(value, QuerySet):
        raise Exception('Calling length with a QuerySet')
    # Otherwise, call the default length template tag
    return django.template.defaultfilters.length(value)

Now if we monkeypatch the registered length filter with this function, we will get an exception every time the pattern we are looking for occurs.

As you might have noticed, this is not suitable for a production environment, and we would also need to manually exercise every place that might be affected to see if an exception is raised. If only there were something like automated tests we could use 🙂

If we have good test coverage then we can let our CI system do all the work and find the buggy templates for us.

import django.template.defaultfilters
import pytest

@pytest.fixture(autouse=True)
def template_length_check(monkeypatch):
    # Replace the length template tag implementation with the instrumented one
    monkeypatch.setitem(django.template.defaultfilters.register.filters, 'length', queryset_check_length)

This is what an autouse pytest fixture applying the instrumentation looks like. What is it doing? This code runs before each test and replaces the default length filter with the instrumented one. So if a view renders a template with a queryset, and that template calls the length templatetag over that queryset, the test exercising that view will fail, and we can tell which templates are vulnerable by analysing the test failures instead of reading through the whole codebase.

Now, all we have to do is relax and wait until our CI system runs all the tests and finds other places in the codebase with this kind of bug.

As you might have noticed, you may still need to perform some manual analysis if your coverage is not high enough. In that case, consider instrumenting the code to log instead of raising an exception.
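For example, a minimal sketch of such a logging variant (queryset_log_length is a hypothetical name, reusing the QuerySet check from before):

import logging

import django.template.defaultfilters
from django.db.models import QuerySet

logger = logging.getLogger(__name__)

def queryset_log_length(value):
    # Log the offending call (with a stack trace) instead of failing hard,
    # so the instrumentation is safe to leave enabled outside of CI
    if isinstance(value, QuerySet):
        logger.warning('length template tag called with a QuerySet', stack_info=True)
    return django.template.defaultfilters.length(value)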

Anyway, there is one perfect use for this kind of instrumentation that will not let any bug escape, and that is finding bugs in the tests themselves. Yes, I'm talking about you, flaky tests.

A test can be flaky for many different reasons, so, for educational purposes, we will keep it simple again.

Some people are not aware that querysets are unordered by default, and this is a common cause of flaky tests.

So, if your tests do something like:

def test_flaky(self):
    User.objects.create(name='david')
    User.objects.create(name='pepe')

    models = User.objects.all()

    self.assertEqual(models[0].name, 'david')
    self.assertEqual(models[1].name, 'pepe')

Then there is a chance that this test will fail sometimes if the model does not have a default ordering defined, because models[0] is not guaranteed to be the same element every time.
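For reference, a sketch of the fix: make the ordering explicit so the index accesses become deterministic:

def test_not_flaky(self):
    User.objects.create(name='david')
    User.objects.create(name='pepe')

    # An explicit order_by makes models[0] and models[1] deterministic
    models = User.objects.order_by('name')

    self.assertEqual(models[0].name, 'david')
    self.assertEqual(models[1].name, 'pepe')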

So, one pattern we can look for is usage of unordered querysets in which the queryset is accessed using more than one distinct index.

How can we instrument this? There are probably many ways, but we are choosing to instrument the __getitem__ method of the queryset, which is the method called every time you access the queryset by index.

Since we are instrumenting that, let's look at one possible instrumentation. The core things we want to check to determine whether a test is flaky are:

  1. The queryset is not ordered (meaning it has no order_by call and no default ordering); luckily for us there is an ordered property on the queryset we can use for this (see the snippet after this list)
  2. There are two or more unique index accesses to the queryset
  3. These unique accesses happen within the test code
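As a quick illustration of point 1, the ordered property behaves like this (assuming the User model defines no default Meta.ordering):

User.objects.all().ordered              # False: no explicit or default ordering
User.objects.order_by('name').ordered   # True: explicit order_by call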

Let's see some code:

import os
import traceback

import django.db.models.query
import pytest

def custom_getitem(self, k):
    # 1) The queryset is not ordered
    if isinstance(k, int) and not self.ordered:
        call_stack = traceback.extract_stack()
        caller_filename = os.path.basename(call_stack[-2].filename)

        # 3) The access happens within the test code
        if caller_filename.startswith('test_'):
            data.add(k)

    # Call the original __getitem__
    return django.db.models.query.QuerySet.original_getitem(self, k)

@pytest.fixture(autouse=True)
def flaky_finder(monkeypatch):
    global data
    data = set()  # This will hold the unique index accesses to the queryset
    django.db.models.query.QuerySet.original_getitem = django.db.models.query.QuerySet.__getitem__

    monkeypatch.setattr(django.db.models.query.QuerySet, '__getitem__', custom_getitem)
    yield
    # 2) There are two or more unique index accesses to the queryset
    if len(data) >= 2:
        raise Exception('Possible flaky test detected.')

This code does exactly what we described. It initialises the data set before the test execution starts, records every index used to access a queryset, but only when the access is made from test code, and finally, after the test completes, fails the test if the number of unique index accesses is two or more.

This time, if we run our tests on our CI system, we can be sure that all bugs in tests following this pattern will get caught.

Taking this further, we can prevent new flaky tests from entering our codebase by keeping this instrumentation in place. It is a good idea to run it in a separate pipeline: we don't want to mess up our tests as a side effect of the instrumentation, so better to keep things separated.
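One way to keep them separated, as a sketch, is to gate the fixture behind an environment variable that only the dedicated pipeline sets (FLAKY_FINDER is a hypothetical name):

import os

import django.db.models.query
import pytest

@pytest.fixture(autouse=True)
def flaky_finder(monkeypatch):
    global data
    data = set()
    if not os.environ.get('FLAKY_FINDER'):
        # Regular pipeline: run the test without instrumentation
        yield
        return

    # Dedicated pipeline: apply the same instrumentation as before
    django.db.models.query.QuerySet.original_getitem = django.db.models.query.QuerySet.__getitem__
    monkeypatch.setattr(django.db.models.query.QuerySet, '__getitem__', custom_getitem)
    yield
    # Fail the test on two or more unique index accesses
    if len(data) >= 2:
        raise Exception('Possible flaky test detected.')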

Well, that’s all. Now go kill some bugs.


Why Pytest?

April 16, 2020 by Judit Novak

This post targets Python developers who are used to unittest and are asking themselves why they should try pytest. I hope my experience and explanation will help clarify those doubts.


Getting Started Managing a Remote Team

April 7, 2020 by Victor Tuson Palau

Dave Murphy and I decided to record a short video on what we have learned and the best practices we have distilled over the years on how to manage remote teams.

We have a lot of leads at Ebury who are suddenly looking after remote teams; with this video we hope we can help you during this transition.

Hope you enjoy our video!


Metaprogramming

March 20, 2020 by Héctor Alvarez

At Ebury IT we care a lot about our fellow company mates, which is why we like to share knowledge with each other. Sometimes you give, and most of the time you take.

One of the main goals of the Python programming language since its conception by Guido van Rossum has been to be developer-friendly and easy to read, but that comes at a cost.

One of those costs is syntactic sugar. Over time, some of the original syntactic sugar has been removed, like the print statement, but some is here to stay.

Today's post is about how classes are constructed without all the syntactic sugar we are used to, covering how multiple inheritance and super() work, and finally about metaprogramming, a technique used to modify your program at runtime.
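As a small taste of the "without the syntactic sugar" idea: a class statement is essentially a call to type(name, bases, namespace). A minimal sketch:

# The class statement...
class Greeter:
    def greet(self):
        return 'hello'

# ...is sugar for a three-argument call to type()
GreeterNoSugar = type('GreeterNoSugar', (), {'greet': lambda self: 'hello'})

assert Greeter().greet() == GreeterNoSugar().greet() == 'hello'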

In the video below we will learn about some obscure parts of Python and how it behaves behind the scenes, so be prepared for some in-depth knowledge. But we also want to have fun, so I included some Star Wars. Let the fun begin:

https://drive.google.com/a/eburypartners.com/file/d/1EgZwnPCeJmFFmvYbm3zwmiqZXs5qmCd4/view?usp=sharing


Remote Working at Ebury

February 14, 2020 by David Murphy

Following in the footsteps of many other successful companies, the Ebury Tech team introduced our own Remote Working Policy in late 2018. This has allowed us to provide more flexibility for our current employees, and give us access to a larger talent pool for recruiting.

The advantages – and disadvantages – of remote or distributed working have been covered in great detail in any number of posts, so instead I will focus on how we approach distributed working here at Ebury.

Photo by Justin DoCanto on Unsplash

Remote is growing

As of today, dedicated remote workers make up ~23% of our team, with a larger number taking advantage of flexible working arrangements on a regular basis. Over the past year, ~37% of our new hires were remote. Alongside our primary hub in Málaga, Spain, our remote employees are scattered across both Europe and South America.

Remote Equality

Unless a company is 100% distributed from their very first employee, there will always be differences when it comes to remote vs onsite employees. The only way you can mitigate this is to treat employees as equals regardless of their location. This means that if you have any remote employees, then you have to act as though all of your employees are remote.

This is known as being remote first, and is the approach Ebury Tech is taking. We try to act as if we are fully distributed even though three-quarters of our employees are still office based. To achieve this we've adopted the typical distributed toolset: we already used Jira, Confluence and G Suite for company-wide collaboration, and to this we added the ubiquitous Slack for internal discussion, and a mixture of Hangouts Meet (with suitably equipped meeting rooms) and Zoom for video calls.

Our individual teams take a mixture of approaches to distributed working. Some – historically – are entirely co-located, one pair of teams are satellites (two teams co-located in two locations), while others are remote first or fully distributed. One model we actively avoid is adding remote members to co-located teams.

All teams, no matter what their composition, follow the same two-week sprints, which allows for standardisation in reporting and resource planning across teams.

Meetings can't be avoided no matter how much we try – so all our meetings are remote-friendly. Meeting rooms are equipped with good video conferencing kit – we recently invested in a Meeting Owl for our Málaga office – and an increasing number of meetings are remote-first, where all participants join from their own computer, regardless of where they are. Knowledge sharing sessions and town hall meetings are broadcast and recorded for asynchronous viewing.

Our main video conferencing room in Málaga, showing our Meeting Owl in use.

In their own words…

Of course, the best way to describe the remote working opportunities at Ebury Tech is through our remote workers themselves. I asked them, "What do you like about working remotely at Ebury?". Here are some of their answers:

“Opportunity to work in an exciting environment that wouldn’t be available in my home town.”

Gio, DevOps, Brazil.

“People. The company has got to build a spirit that makes it easy to work far from the office. We collaborate quite efficiently regardless the distance barriers. There are tons of things I like but they are not explicitly related with the remote position. Ebury really rocks!”

Jesus, Salesforce, Spain

“Relative flexibility: I have core hours, but I can more or less choose my schedule, which helps when things come up, e.g. at school. Fun on the daily stand-up calls, e.g. one of my teammates uses green screen backdrops, which wouldn’t be possible with in-person meetings.”

James, Engineer, UK

“[Ebury] understands remote work, and trust remote employees. It doesn’t think it is a privilege.”

Rober, Engineer, Spain

“The flexibility I have for balance my time between work and family.”

Ale, DevOps, Spain

If this sounds like something you would like to be part of, take a look at our open positions or get in touch with me on LinkedIn.


Developing Yourself at Ebury

January 23, 2020 by Victor Tuson Palau

One of the key things that motivate me every day to work at Ebury is the company’s passion to invest in its people. This comes directly from Juan and Salva (our founders) and it is lived by all of us in management positions.

But what does it mean for our Tech teams? Originally, when we were 10-20 people in Tech, this happened organically, but with growth it is important that career development also scales up and is consistent across teams.

During the last year and a half, I have been working with the team to define a framework that allows people to plan their development and have meaningful conversations with their leads.

At Ebury Tech, your career can take several paths, but it all starts with mapping the six competencies that we really care about.

6 Competencies and 4 Levels

We have summarised our cultural values into 6 competencies, each with 4 levels. Levels 1 and 2 tend to focus on you as an individual, while the others focus on your impact on your team (3) and the wider tech organisation (4):

  • Domain Mastery: How good are you in your domain and do others agree? 
  • Team Work: Do you help to achieve others’ objectives? 
  • Continuously Improves: Do you improve and help others improve?
  • Problem Solving: Can you resolve real-life issues with a simple solution?
  • Business Impact: Do you deliver value, not just code?
  • Leadership: Do you set an example to others and take responsibility?

Not everyone follows the same path

Although it might sound like something you'd get from a fortune cookie, it really applies to personal development. At Ebury, we have defined 3 loose paths to guide your development in Technology:

People Leadership – This is the path I chose; while still technical, my passion is to help others develop and to work with teams to achieve high performance.

Technical Leadership (Architecture) – Another available path is to focus on your ability to design systems that deliver customer value and to work with others across the organisation to hone the best solutions for our clients.

Technical Leadership (Hands-on) – Finally, there is also a path to lead by example, bringing change and best practices from within a squad.

Putting it all together

Each step in a development path is defined by a set of behavioural and observable expectations. These correspond to a level within each of the 6 competencies.

Note that I said observable. This is important because, in order to move from one step to another, your team lead will put forward a case with examples to a panel of peers, who will support your change based on observable facts. Not everyone will move to the next step on the first try, but you will receive tangible feedback on how to achieve it next time.

Dig deep, let me know what you think

There is no secret sauce to how we invest in our people, hence I am sharing it with you, and I look forward to your feedback. Hit me up on LinkedIn.


CI with Salesforce – part 1: Where we were at and where we should be

April 10, 2019 by Javier Vázquez

When I started to think about this post, it was not easy to find a clear way to explain it. This has been a long and hard project, involving a lot of teams and processes, and explaining what we have now, and why, could be complex without prior knowledge of our flows, methodologies, technologies and general context. That is why I decided to explain how our deployment flow has evolved over time by writing a series of posts, each covering a step we made and the problems we encountered along the way.

In this first article, I will share our starting point: how we were working and why we decided to improve this flow.

In the very early days of our Salesforce team here at Ebury, the team consisted of just a couple of developers and a QA engineer. It was an easier time when every development was agreed by the full team; each member knew what their colleagues were working on, and conflicts were few and quick and easy to resolve.

Every developer had their own sandbox (or several, if we were working on stories related to different projects) and, when the development was finished and the code review approved, the code was moved into the DEV sandbox (the one used for testing) using changesets. At the same time, every story was developed in a separate branch and, after PR approval, merged into the dev branch. As you can see, we were using the VCS only for code review and auditing, not for deployments.

OK, let's stop here and think about why this worked and why it cannot work at scale.

It is not a bad flow: some sandboxes for development and others for testing and staging, as you can see in the next picture:

Let’s enumerate the different problems we faced:

  • Repository is not the source of truth. This is probably the key point behind almost all of our problems, and the most important difference between Salesforce development and development in any other technology. We cannot trust that the code in the repository accurately reflects the orgs, and that is something we cannot allow.
  • Changesets are painful. If you are reading this article, you have surely worked with changesets at some point. You already know how manual and slow it can be: adding components to changesets, uploading them into a different sandbox (sometimes this can take reeeeally long), deploying, noticing something is missing, going back to the source sandbox, cloning it, deploying again, and all kinds of painful stuff. This can work for small changesets, but when you are deploying hundreds of components, it is hopeless.
  • Resolving conflicts. What happens if you are modifying a file which is also modified by another developer? There are two options: conflicts or no conflicts. If we have conflicts, the developer who faced them has to resolve them, compile the files again in the sandbox, clone the changeset and deploy it again. It might not look like much work… but the problem now is that we have code in our sandbox which is not related to our story. If the other story added new fields, objects, and so on, everything also has to be created in your org to be able to continue. And what happens if we have no conflicts? That could look like the happy path, but it is not, which brings us to the next point: code overwriting.

Resolving pull requests...

  • Code overwriting. You just finished your development, the PR is OK and you merged it. Great! Let's deploy it into the dev sandbox, it's QA time… or not. If you modified the same file as another developer, and you don't have their changes in your org (you had no conflicts in the PR, so how could you know?), then deploying your changeset into dev deploys a different version of that file, and the other developer's changes are no longer there. We have several workarounds for this, but communication is the most important point here. If you know what the other developer is doing, mainly because you are reviewing all their changes, then you can anticipate these conflicts. However, in my experience this step was commonly missed.
  • Dependencies. What happens if you pull some dev changes into your code to avoid overwriting, but your story is then ready before the other one? You cannot deploy your changeset! Your changeset now contains changes which are not in the target org, so your story is ready to be deployed into staging/prod, but the deployment is blocked until the other story is also finished. This is frustrating.
  • Automation. Changesets were not designed with CI in mind, and there is no easy way to use them from the command line.
  • File removal. You cannot remove files using changesets, so you have to find alternative ways to do it.

I could probably add more issues to this list, but I have focused on the main ones. Before finishing: if you are thinking about how great changesets are, I don't want to wake you from your dream. I know changesets are great for non-developers or people starting out in Salesforce; they are easy to use and understand, and you can do everything through a UI. However, the moment you start working on more complex projects, involving hundreds of files, dozens of developers, multiple phases and integration with external systems, you need to (must) think about taking a step forward into the CI world.

I hope I've been clear enough conveying the problems we had and the reasons for a change. In the next post, we will talk about the different possibilities we analysed and take a detailed look at the direction we chose (spoiler alert: sfdx).


Terragrunt: Terraform the easy way

March 12, 2019 by Sergio Robles

We use Terraform to manage our Infrastructure as Code in AWS, and we have used it with Terragrunt from day one.

What is Terragrunt? The official definition:

“… is a thin wrapper for Terraform that provides extra tools for keeping your Terraform configurations DRY, working with multiple Terraform modules, and managing remote state.” (read more)

We have found Terragrunt very useful: it allows us to configure the remote state, locking, additional arguments, etc., depending on the configuration in your terraform.tfvars file.

The best features of Terragrunt for us are:



DockerCon EU 18

March 12, 2019 by Sergio Robles

In December 2018, the Ebury team attended DockerCon EU.

DockerCon Europe describes itself as “a 2.5 day technology conference, where customers and community come to learn, share and connect with each other. Attendees are a mix of developers, systems admins, architects, and IT decision makers —from beginner to intermediate, and advanced users—  who are all looking to level up their skills and go home inspired and ready to invest and implement their containerization strategies” and that is exactly what we wanted to get out of it.

