How (and why) to move performance testing earlier in the development cycle [WOPR22]

I recently attended the WOPR22 conference in Malmö, Sweden, which focussed on how to move performance testing earlier in the development process. This is a big subject with clearly no magic bullet solution, but I thought I’d share some of the key takeaways from the discussions.

Performance testing needs to be thought of as more than just load testing

A traditional approach of “finish development, go through a load testing process and approve/reject for go-live” really doesn’t work in a modern development environment. The feedback loop is just too slow.

We have to provide feedback earlier. Load testing, though, is not ideal for this; it has a lot of drawbacks – scripts have to be maintained, environments kept available, datasets managed, load models understood. These are surmountable, but quicker feedback can be provided by simpler processes that can be added to a standard CI solution or executed manually but regularly.

Examples include:

  • waterfall chart analysis of pages being returned
  • unit test timings
  • parallelisation of unit tests to get some idea of concurrency impacts
  • using perf monitors/APM on CI environments

These will not find the problems that only occur under load, but they will give you early indications that problems may be there.
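As a minimal sketch of the unit test timing idea, a check like the following can run in CI and fail the build when a routine exceeds an agreed budget (the function and the 200ms budget are hypothetical placeholders, not anything discussed at WOPR22):

var assert = require('assert');

// Hypothetical code under test – substitute your own routine.
function buildProductPage() {
    // ... code under test ...
}

var start = Date.now();
buildProductPage();
var elapsed = Date.now() - start;

// Fail the build if the routine blows its (assumed) 200ms budget.
assert(elapsed < 200, 'buildProductPage took ' + elapsed + 'ms, budget is 200ms');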

Performance Testers need tighter integration with the development team

The performance testing team cannot be too distant from the development team – for practical and political reasons. There was a lot of discussion about whether a performance team works best as a distinct entity or as individuals integrated across the teams. There are pros and cons for each argument.

What is clear though is that when issues are identified there must be co-operation between performance testers and developers to share their knowledge to resolve the problem. Performance testers should not be people who just identify problems – they must be people who are part of the team that solves the problem.

Mais Tawfik has the policy of physically pairing the performance tester who has identified the problem with the developer who is working on the fix until a resolution is found.

Performance Testers still need space for analysis

One of the downsides to pushing performance testing earlier is that it often results in additional demand for testing without providing appropriate space for analysis. Performance testing is an area where analysis of the data is important – results are not black and white.

It is often overlooked that data is not information. Human intelligence is required to convert data into information. An important role of the performance tester in improving any process is to ensure that there is not an acceptance of data over information because data can be provided more regularly. We must ensure that there is sufficient quality, not just quantity of performance testing during the development process.

Environmental advances can make this process easier

Cloud and other virtualised environments, as well as automation tools for creating environments (e.g. Chef, Puppet, CloudFormation), have been game changers for earlier and more regular performance testing. Environments can be reliably created on demand. To move testing earlier we must take advantage of these technologies.

Use automation to simplify the process

Automate the capture of metrics during the test to speed up the entire process. Using APM tooling helps in this respect. Automating this reduces the overhead associated with the process of running a test and analysing results.

Attendees of WOPR22 in Malmö, Sweden.

Based on discussion with all WOPR22 attendees:

Fredrik Fristedt, Andy Hohenner, Paul Holland, Martin Hynie, Emil Johansson, Maria Kedemo, John Meza, Eric Proegler, Bob Sklar, Paul Stapleton, Neil Taitt, and Mais Tawfik Ashkar.

Executing Gatling tests from Node.js

So, I’ve been playing with Gatling quite a bit recently. It’s a really neat open source load testing tool.

Anyway, what I wanted was a remote instance on which I could trigger a Gatling test and then get the results back, all over HTTP. Node.js seemed the obvious lightweight solution to do this in.

Now, how to trigger a Gatling test from Node.js? Well, there are a few complexities, but generally it is not too bad.

Executing a Test

Gatling runs as a command line tool, so the first step is to use an NPM package called “sh”. This package allows execution of shell commands and invokes a callback on completion.

By default Gatling runs in an interactive mode, awaiting user input to determine the tests that should be run. Obviously this is not viable when running headless, so we need to add some switches to the base command:

-s class.classname [where class.classname is the class of the test you wish to run]
-sf path/to/script/folder [this can be avoided if scripts are stored in default Gatling directory]
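Putting these together, a full invocation might look like this (the launcher path, folder and class name are placeholders for your own setup):

./gatling.sh -sf /path/to/simulations -s com.example.MySimulation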

Executing the Gatling command with these two switches via the following Node code will successfully execute a test and invoke the callback on completion:

var command = this.config.gatlingRoot + ' -sf ' + this.config.rootfolder + ' -s ' + this.config.testClass;
sh(command).result(function(){test.completeTest()});

Capturing the Results

Executing a test, though, is no use without being able to gather the results, and this was slightly more of a challenge. Gatling creates a new folder for every test execution; by default this will be a folder within its standard results folder. The issue this created was that there was no way, as far as I could tell, of getting the name of that folder from the command line response.

What you can do, however, is define the root folder for the results by adding the -rf switch to the command line.

-rf path/to/results/folder

Gatling, however, will still create a subfolder within that for every test run. This is where Node comes to the rescue with the core “fs” module, which allows monitoring of a folder and raises an event on any change. You can therefore create a folder specifically to hold your test results, then execute a test and be confident that the next event on that folder will be the creation of the results subfolder. The fs callback includes the name of that folder.

function setupResultFolder(test){
    // Create a dedicated results folder for this run...
    var resultfolder = test.config.rootfolder + "results/";
    fs.mkdirSync(resultfolder);
    // ...then watch it: the next change event will be Gatling creating
    // the per-run subfolder, whose name arrives as 'filename'.
    fs.watch(resultfolder, {
        persistent: true
    }, function(event, filename) {
        test.setResultFile(filename);
    });
    return resultfolder;
}

Then, to get the results, you can just access the relevant files within that folder. I was only interested in the raw results, so I was looking at simulation.log.
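If you want to do more than hand back the raw file, a sketch like the following pulls the request records out of simulation.log. The file is tab-separated but its exact column layout varies between Gatling versions, so treat the parsing below as illustrative only:

var fs = require('fs');

// Read simulation.log and keep only the request records.
function parseResults(file) {
    var lines = fs.readFileSync(file, 'utf-8').split('\n');
    return lines.filter(function(line) {
        // The record type is often the first column; adjust this
        // check for your Gatling version.
        return line.split('\t')[0] === 'REQUEST';
    }).map(function(line) {
        // Map the tab-separated columns (request name, timestamps,
        // status etc.) according to your version's layout.
        return line.split('\t');
    });
}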

Notifying of test completion

To finish off, I just added a simple event that is raised when the test is complete:

this.completeTest = function(){
    this.complete = true;
    this.emit("testComplete");
}

Complete code

The complete code comes in at around 50 lines.

var fs = require('fs');
var sh = require('sh');
var events = require('events');

var GattlingTest = function(id, config) {
    events.EventEmitter.call(this);

    this.id = id;
    this.complete = false;
    this.resultfile = "";
    this.resultfolder = "";
    this.config = config;

    // Build the Gatling command line and execute it; -nr suppresses report
    // generation and -rf points the results at our watched folder.
    this.start = function(){
        var test = this;
        this.resultfolder = setupResultFolder(this);
        var command = this.config.gatlingRoot + ' -sf ' + this.config.rootfolder + ' -s ' + this.config.testClass + ' -nr -rf ' + this.resultfolder;
        console.log(command);
        sh(command).result(function(){ test.completeTest(); });
    }

    // Called by the folder watcher once Gatling creates its per-run subfolder.
    this.setResultFile = function(resultfile){
        console.log("resultfile for " + id + " set to " + resultfile);
        this.resultfile = this.resultfolder + resultfile + config.resultFileName;
        console.log(this.resultfile);
    }

    // Raised when the Gatling process exits.
    this.completeTest = function(){
        console.log("complete");
        this.complete = true;
        this.emit("testComplete");
    }

    this.results = function(){
        return fs.readFileSync(this.resultfile, "utf-8");
    }
}

// Inherit from EventEmitter so instances can emit "testComplete".
GattlingTest.prototype.__proto__ = events.EventEmitter.prototype;
module.exports = GattlingTest;

function setupResultFolder(test){
    var resultfolder = test.config.rootfolder + "results/";
    fs.mkdirSync(resultfolder);
    console.log("Tracking ... " + resultfolder);
    fs.watch(resultfolder, {
        persistent: true
    }, function(event, filename) {
        console.log(event + " event occurred on " + filename);
        test.setResultFile(filename);
    });
    return resultfolder;
}
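For completeness, here is a sketch of how a caller might drive this module (the filename, paths and class name are all placeholders for your own setup):

var GattlingTest = require('./gattlingtest'); // wherever you save the module above

var config = {
    gatlingRoot: '/opt/gatling/bin/gatling.sh', // path to the Gatling launcher
    rootfolder: '/tmp/tests/run1/',             // working folder for this run
    testClass: 'com.example.MySimulation',      // simulation class to execute
    resultFileName: '/simulation.log'           // appended to the results subfolder path
};

var test = new GattlingTest(1, config);
test.on('testComplete', function() {
    console.log(test.results()); // raw simulation.log contents
});
test.start();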

It’s not just about being the fastest…

First published on Performance Calendar on 21st December 2013

I have been doing a lot of work this year on creating a performance culture within a company. This is an essential step on the route to good performance in your products; only when you start treating performance as a first class citizen will you get buy-in, both from developers and from the business as a whole, for the time and effort needed to create performant systems.

This is a costly process, requiring an investment of time and effort from the company to implement fully, and the business will expect to see benefits in return for that investment.

I would like to address two common problems that cause the value of this investment to be undermined.

1) Solving the technical challenge not the business problem

When I talk to developers who are getting into performance and struggling to get buy-in from their business, I often hear the same complaint – “why don’t they realise that they want as fast a website as possible?”.

To this I always answer – “because they don’t!”. The business in question does not want a fast website. The business wants to make as much money as possible. Only if having a fast website is a vehicle for them doing that do they want a fast website.

The key point here is that, as a techie, it is easy to get excited by the challenge of setting arbitrary targets and putting time and effort into continually bettering them, when more business benefit could be gained from solving other performance problems.

To address this we need to take a step back and identify exactly which performance problems are being seen and how they are impacting the business.

These may be slow page loads when not under load. Equally likely, the system suffers slowdowns under adverse conditions, suffers intermittent slowdowns under normal load, or uses excessive resources on the server, necessitating an excessively large platform – among many other potential problems.

All of these examples can be boiled down to a direct financial impact on the business.

As an example, one company we worked with determined that their intermittent slowdowns cost them 350 sales on average, which worked out to £3.36m per year. This gives you instant business buy-in to solve the problem, a direct problem for developers to work on, and a trackable KPI to measure achievement and know when you are done – after which you can move on to the next performance problem.
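To make the arithmetic concrete (the order value and slowdown frequency here are illustrative assumptions, not figures from that engagement): at an average order value of £80, 350 lost sales per slowdown is £28,000, and roughly 120 such slowdowns a year comes to £3.36m.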

Another company I worked with had a system that performed perfectly adequately but was very memory hungry. Their business objective was to release some of the memory being used by the servers to be used on alternative projects (i.e. reduce the hosting cost for the application). Again a direct business case, a problem developers can get their teeth into and a trackable KPI.

To sum up – start your performance optimisation with a business impact, put it into financial terms and provide the means to justify the value of the development efforts to the business.

2) Over optimisation results in technical debt

The second issue I would like to address is the idea that we should always build the most ultra performant system.

No – we should always build an APPROPRIATELY performant system.

Over-optimising a system can be just as negative as under-optimising it. Building an ultra-performant, scalable web application demands many things, such as…

Time

Building highly performant systems just takes longer.

Complexity

Highly optimised systems tend to have a lot more moving parts. Elements such as caching layers, NoSQL databases, sharded databases, cloned data stores, message queues, remote components, and multiple languages, technologies and platforms may be introduced to ensure that your system can scale and remain performant. All these things take management, testing, development expertise and hardware.

Expertise

Building ultra-performant websites is hard; it takes clever people to devise intelligent solutions, often operating at the limits of the technologies being used. These kinds of solutions can leave areas of your system unmaintainable by the rest of the team. Some of the worst maintenance situations I have seen were caused by the introduction of an unnecessarily complicated piece of coding designed to solve a potential performance problem that never materialised.

Investment

These systems require financial support in terms of hardware, software, and development and testing time and effort to build and support.

Compromises

Solving performance issues is often done at the expense of good practice or functionality elsewhere. This may be as simple as compromising on the timeliness of data by introducing caching, but it can also mean accepting architectural compromises, or even compromises to good coding practice, to achieve performance.

Summary

The warning I want to give here is: understand your performance landscape. Set your KPIs, define your performance non-functional requirements, set performance acceptance targets – or use whatever method suits you to determine how your application is expected to perform.

This is essential to allow developers to make reasonable assessments of the levels of optimisation that are appropriate for the system they are developing.

Progvember – a month of coding in November

A year ago I came across National Novel Writing Month, which is basically a challenge to aspiring authors to dedicate themselves to completing a novel within a month. This seemed like a good challenge; after all, we all have a novel within us somewhere. However, what I like doing better than writing is computer programming, so my novel kept getting pushed back by bits of development I was doing.

At the same time I was writing an iPhone app for my son for his Christmas present. This was good for two reasons: 1) it was a present for him, and 2) it was a chance to finally try out some different things I had been wanting to do in Objective-C. So, I had the project, I had the motivation and, most of all, I had a fixed deadline – Christmas was not going to be pushed back because of an incomplete piece of software.

Like most developers I have a list of new technologies, languages and patterns that I want to try out and just like the great novel that is inside us all there are always more important things to be done. There is no defined project and no defined deadline so nothing really gets done.

What I liked about the “nanowrimo” concept was that it created a focus, it created a deadline to work towards and it created a sense of community of other people all working towards the same goal. So why not create something like that for development?

Hence, I came up with the concept of Progvember (programming in November – get it?!). This would be a month where developers set themselves a challenge to define a project and complete it – to take some of the ideas that have been around for a while and set a deadline to get them done. November seemed a good time of year for it – long, dark nights, cold, wet weather (apart from for our southern hemisphere Progvemberers), and all most people are doing is sitting in, or maybe growing a moustache for Movember.

On the back of this I also wanted to create a sense of community among developers setting themselves the same challenge, and a forum where people can ask for, or offer, help on other people’s projects.

And so I have created Progvember.com. Anyone is welcome to come and sign up.

Making Slow Websites Faster, Quickly

Intechnica recently hosted an event called Faster Websites, aimed at discussing with retailers some of the means and methods that can be adopted to improve the performance of their online presence.

As part of the preparation for this event we evaluated the websites of the potential attendees as well as the top 50 leading retail sites in the UK.

As would be expected there was a wide range of results from the very fast to the quite slow.

I had a look into the performance of some of the slower sites to see if there were any quick wins that I could propose to improve their speed. I did a very limited investigation using WebPageTest under normal traffic conditions (as far as I know) and came up with the following observations.

Most follow general good practice

With very few exceptions, most were doing the obvious things (minifying JavaScript, compressing content, using a CDN etc.). This illustrated that there was unlikely to be a simple, config-based solution to the slowness.

Slowness was caused by client side, not server side, issues

None of the sites spent more than 0.5 seconds waiting for a server response, indicating that the server is not struggling to return content. This is as would be expected for a site homepage that is not under load.

Very large page weights – especially JavaScript

A large amount of the slowness was caused by simple page weight issues. All of these sites were requesting well over 100 elements, with some requesting over 200.

The largest chunk of this was images, as was to be expected. As these are retail sites, there is an argument that high-quality imagery is essential for business. However, one site was requesting close to 70 images, totalling 3.5MB of data. It would certainly be worth investigating whether these images could be compressed, loaded asynchronously or even just removed.

Of more concern to me across all these slow-loading sites was the general size and number of JavaScript files being requested. Sites were requesting over 40 distinct JavaScript files, and total sizes of 300KB+ were common, with one site topping 600KB of JavaScript content. In most cases this JavaScript had already been minified and compressed. In all these cases the use of JavaScript should be fully investigated and rationalised.

CSS and even HTML files were similarly large (50KB+) and could equally be rationalised.

Complex DOMs

Most of the slower sites had more complex DOMs, often topping 2,000 elements. This does not necessarily cause an issue, but when combined with complex JavaScript and content manipulation it can easily lead to slowdown.

In the examples I tested this was illustrated by how long the startup event took on some pages – in one example, over 1.5 seconds. That indicates a page that is far too complex and needs rationalising.

3rd parties causing slowdowns

There were a couple of sites whose load times were slowed by waiting for responses from 3rd parties for content (e.g. from Facebook). In one case this was causing a 12 second delay.

As a site owner you really can’t let your performance be in the hands of 3rd party content. Aim to load all such content asynchronously, after page load if possible.
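A minimal sketch of the standard asynchronous injection pattern follows (the widget URL is a hypothetical placeholder):

// Inject the 3rd party script asynchronously so a slow response
// cannot block rendering of the rest of the page.
(function() {
    var s = document.createElement('script');
    s.src = 'https://thirdparty.example.com/widget.js'; // placeholder URL
    s.async = true;
    var first = document.getElementsByTagName('script')[0];
    first.parentNode.insertBefore(s, first);
})();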

Any performance assessment should include the impact of poorly performing 3rd party elements.

Overall, the impression I got was that effort on these systems was still mainly focussed on server side performance, with the client side generally ignored beyond following standard good practice. A more considered approach could easily (days, not months, of effort) speed these pages up dramatically.

Treat Performance as a First Class Citizen

Steve Souders wrote a very interesting blog post recently (http://www.stevesouders.com/blog/2013/08/27/web-performance-for-the-future/) about treating “performance as a discipline”.

The premise of this article was that performance is such a fundamental issue that a separate team should be created to focus purely on performance.

Seeing this view put in writing by one of the leaders in the performance arena was very refreshing. At Intechnica we have been pushing this message for a number of years – indeed, we formed the company to deliver this capability into other companies – and it is nice to finally see it gaining some traction within the industry.

For me, the day-to-day battle we face is getting people to treat performance as a First Class Citizen within the development industry. There still often seems to be a sense that good performance is just something developers should be able to achieve with more time or more kit.

The reality, of course, is that this is true up to a point. If you are developing an average-complexity website with moderate usage and moderate data levels, then you should be able to develop something that performs to an acceptable level. As soon as these factors start to ramp up, performance will suffer and will require expertise to solve the problems. This does not reflect on the competency of the developer; it is just a reflection that a specialised skill is required.

The analogy I would make is with the security of a website. For a standard brochureware or low-usage site, a competent developer should be able to deliver sufficient security. However, when the site ramps up to a banking site, you would no longer expect the developer to implement the security; there would be an expectation that security specialists would be involved, looking beyond the code to the system as a whole. This is no negative reflection on the developer; the nature of security is so important to the system, and so complex, that only a specialist can fully understand the solution required. This is acceptable because security is regarded as a First Class Citizen in the development world.

Performance issues often require such a breadth of knowledge beyond simply looking at the code (APM tooling, load generation tools, network setup, system interaction, concurrency effects, threading, database optimisation etc.) that specialists are required to be able to solve them.

These specialists are not better than developers; they just have different skills.

At Intechnica we have run projects with a dedicated performance scrum team: developers deliver functionality following usual performance best practice but with no specifically defined KPIs; acceptance KPIs are applied afterwards, and any failures are passed onto the performance scrum team’s backlog.

We have also run projects with performance engineers within each scrum team, applying KPIs both to each feature as it was developed and to the system as a whole.

Both are valid approaches. There are other valid approaches. As long as performance is treated as a First Class Citizen then you will be on the right track to performance success.

The Joy of Performance Improvements

I think that most developers get into development for one of two reasons – they like solving problems, or they like building things. For me, when I got into it, it was the latter. I loved the fact that I would build things that people would then use (admittedly, I have often also built things that people didn’t use).

However, as my career has progressed, I have realised that a far more enjoyable and satisfying element of development is working on performance improvements.

Since I moved into management I don’t get to work on development as much as I would like (which would be all the time), but recently I have built a system from scratch through to delivery while also working on performance improvements to an area of a site that had been declared “not fit for purpose”.

Comparing the two experiences, there was just no contest. I’m talking about the experience of taking a failing system: investigating, manipulating, testing, making amendments, re-testing; assessing the impact of tiny changes and the likely impact of large ones; finding out why elements that perform independently do not perform in combination; re-testing again, making another small improvement, re-testing, re-investigating and repeating until performance is acceptable – then doing a few more rounds until performance is awesome. This was soooo much more enjoyable and rewarding than just building a system from scratch.

I’d recommend that any developer try to get involved in this side of development. That’s why I set up a performance management and improvement company!

By the way, if anyone is interested, the outcome of the performance improvements was taking a process that took 45 minutes to run down to 3 minutes. The target was 10 minutes.
