Unit testing execution for legacy code (part 1): Implementation

Quality assurance

by Daniel Gordillo on October 17, 2016

When we talk about unit tests, we generally mean tests for individual units of source code, created as an outcome of Test-Driven Development. But how can we apply this when we’re developing against areas of our codebase that don’t have unit tests to start with, that is, legacy code?

This post describes how we approached this problem in three phases.

The ideal setup

The usual target is to build test coverage in the shape of the pyramid below.
design_article_daniqa_pyramid_2

E2E, or end-to-end, tests are automated GUI tests. We use a combination of Behave + Selenium + Python.
Integration tests, for which we use the pytest library.
Unit tests, for which we also use the pytest library (these evolve from the TDD philosophy).

Generating this type of coverage for greenfield development is fairly well understood, but what happens when parts of the codebase we are working with are legacy? (At Ebury, we think of legacy code as code without unit tests).

Let’s explore how we’re approaching this shape on the legacy part of our codebase, step-by-step:

First phase

We first made a decision not to throw away everything that was legacy. Everything was working well, but the cost of making changes in those legacy areas was increasing, so we were getting slower at delivering changes.

Unfortunately, you can’t just jump in and start adding unit tests, as the code was not structured to be tested. It’s too risky at this stage to change the code just to add unit tests, as you don’t know what you might be breaking.

We didn’t have an E2E framework in place. So initially we worked to increase the coverage through integration tests. Our pyramid was inverted.

design_article_daniqa_pyramid_3

Second phase

Next we included E2E Tests.

Again, a UI needs to be designed with the hooks in place to be tested. While perhaps not an approach we would recommend for all projects, we had an opportunity to kill two birds with one stone: meeting various requirements to update the look and feel of our application while adding E2E coverage as we went.

It’s worth noting that E2E tests can take a long time to execute. We started running them overnight but, as the number increased, we had to parallelise the running of these E2E tests, and indeed other tests, so that our CI process itself did not become a bottleneck.
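One way to picture this kind of parallelisation is as a scheduling problem: given how long each test file took last time, spread the files across a fixed number of workers so that each worker finishes at roughly the same moment. The sketch below is only an illustration of that idea, not the setup described in this post. The function name, the feature file names and the timings are all hypothetical.

```python
import heapq


def split_into_shards(durations, shard_count):
    """Assign test files to shards, longest first, always to the lightest shard.

    `durations` maps a test file name to its last recorded run time in seconds.
    Returns a list of `shard_count` lists of file names.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")

    # Min-heap of (total seconds assigned so far, shard index).
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    shards = [[] for _ in range(shard_count)]

    # Sort by duration descending, then by name so the result is deterministic.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return shards


# Hypothetical timings, in seconds, for four E2E feature files.
recorded = {
    "checkout.feature": 300,
    "login.feature": 120,
    "payments.feature": 240,
    "search.feature": 60,
}
shards = split_into_shards(recorded, shard_count=2)
```

With these made-up timings both workers end up with 360 seconds of tests, instead of one worker running for 720 seconds.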

design_article_daniqa_pyramid_1

Third phase

With good top-end coverage in place, we were able to begin fixing our pyramid, not as a specific project but as changes were needed. When touching the legacy code, we began to add unit tests and to remove or “mock” out dependencies in the parts of the code that needed to change, ensuring that the tests were failing for the right reasons, and then adding what was needed to get the new tests to pass.
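As a rough illustration of mocking out a dependency, here is a minimal sketch using Python’s standard unittest.mock module. The RateClient and Converter classes are hypothetical stand-ins invented for this example, not code from our application. The point is only the pattern: give the legacy code a seam through which a fake dependency can be passed, then assert on the behaviour of the unit itself.

```python
from unittest import mock


class RateClient:
    """Hypothetical external dependency, e.g. a remote rate service."""

    def get_rate(self, currency_pair):
        raise RuntimeError("would call an external service")


class Converter:
    """Hypothetical legacy class, given a seam to inject its dependency."""

    def __init__(self, client=None):
        # Production code keeps the default; tests pass in a fake client.
        self.client = client if client is not None else RateClient()

    def convert(self, amount, currency_pair):
        return round(amount * self.client.get_rate(currency_pair), 2)


def test_convert_uses_rate_from_client():
    # spec=RateClient makes the mock reject methods the real class lacks.
    fake_client = mock.Mock(spec=RateClient)
    fake_client.get_rate.return_value = 1.25

    converter = Converter(client=fake_client)

    assert converter.convert(100, "GBPUSD") == 125.0
    fake_client.get_rate.assert_called_once_with("GBPUSD")
```

Because the test never touches the real client, a failure points at the conversion logic rather than at the network, which is what “failing for the right reasons” means in practice.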

Michael C. Feathers’ ‘Working Effectively with Legacy Code’ is a very useful guide to the approach we have taken here.

What have we got now?

So currently our “pyramid” for the original code feels a bit more like a trapezoid:

design_article_daniqa_pyramid_4

Over time we have seen our unit test coverage increase and, most importantly, a significant drop in defects getting to production. We’re pretty happy with our E2E test coverage and are continuously increasing the integration and unit test coverage.

Importantly we have considerably reduced the test execution time, down from many hours to around thirty minutes. The key changes we have made to achieve this are:

Removing or mocking out the most significant external dependencies
Using multiple cloud servers to execute the process through Docker containers
Avoiding manual execution using Bitbucket + Jenkins + Docker integration
Parallelisation and prioritisation of Jenkins jobs

This is our current setup for executing tests:

design_article_daniqa_graphic_1

What’s next?

We’ll continue to invest in the shape of our pyramid and we have plans to speed everything up further with:

Database execution in memory
Continuing to remove or design-out more dependencies
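To give a feel for what in-memory database execution can look like, here is a small sketch using SQLite’s `:memory:` mode from the Python standard library. This is an assumption made purely for illustration: the post does not say which database engine is involved, and the payments table and helper functions are hypothetical.

```python
import sqlite3


def make_test_db():
    """Create a throwaway database that lives only in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    return conn


def total_paid(conn):
    """Code under test: sum all payment amounts, 0 when the table is empty."""
    row = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM payments").fetchone()
    return row[0]


def test_total_paid():
    conn = make_test_db()
    try:
        assert total_paid(conn) == 0
        conn.executemany(
            "INSERT INTO payments (amount) VALUES (?)", [(10.0,), (15.5,)]
        )
        assert total_paid(conn) == 25.5
    finally:
        # Closing the connection discards the in-memory database.
        conn.close()
```

Each test gets a fresh, empty database with no disk I/O, so tests stay isolated from one another and are cheap to set up and tear down.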

I hope you find this story helpful and can see how it may be applied to your own project.

In the next post, we’ll talk about how we included coverage analysis in this process.

Thanks!
