Perspectives on the National Rail Web Site Outage

24th February 2017 by Dr. Manzoor Mohammed
Prepare for Peak

The Daily Express reported that the National Rail website went down at around 5pm yesterday evening, as Storm Doris caused wide-scale disruption across the UK’s train network.

It seems that the disruption was not limited to the rail network itself but also affected National Rail’s IT infrastructure. Many of the millions of train users who experienced disruption would have looked to the National Rail website for essential information on when and how they could get home.

Users expressed their frustration on Twitter. Capacitas staff based in London experienced the frustration first hand when they saw the page below.

[Image: screenshot of the National Rail website error page]

National Rail was not the only rail information site to experience difficulties. South Eastern Railways also experienced issues last night:

[Image: screenshot of the South Eastern Railways site issue]

The National Rail platform has three primary engines: Darwin, KnowledgeBase and OJP. The National Rail website says that, across all channels including the website, it handles 2.5 million enquiries per week. It’s difficult to know from the outside which of these engines had problems during Storm Doris.

Event-driven demand can generate peaks of anywhere from four to twenty times the normal demand profile, depending on the nature of the event. Of course, it is very difficult to plan for these events because of the scale involved. In National Rail’s case, we estimate that 800,000 journeys were taking place during the busiest hour. Also, with the ubiquitous use of mobile technology, there is a very large event-driven user base.
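To give a rough sense of the scales involved, the sketch below applies the four-to-twenty-times multiplier to the 2.5 million enquiries per week quoted above. It assumes, purely for illustration, that normal demand is spread evenly across all 168 hours of the week. Real traffic is concentrated in commuter hours, so the true normal-hour baseline (and therefore the peak) is likely to be higher than these figures suggest. The function names are our own and are not part of any National Rail system.

```python
# Figures quoted in the article.
ENQUIRIES_PER_WEEK = 2_500_000
PEAK_MULTIPLIERS = (4, 20)  # event-driven peaks: 4x to 20x normal demand

HOURS_PER_WEEK = 7 * 24


def hourly_baseline(enquiries_per_week: int) -> float:
    """Average enquiries per hour.

    ASSUMPTION: demand is spread evenly over the week. This is a
    simplification; real demand clusters around commuter hours.
    """
    return enquiries_per_week / HOURS_PER_WEEK


def peak_range(baseline: float, multipliers: tuple[int, int]) -> tuple[float, float]:
    """Lower and upper bound on hourly demand during an event-driven peak."""
    low, high = multipliers
    return baseline * low, baseline * high


baseline = hourly_baseline(ENQUIRIES_PER_WEEK)
low, high = peak_range(baseline, PEAK_MULTIPLIERS)

print(f"Average hour: {baseline:,.0f} enquiries")
print(f"Peak hour:    {low:,.0f} to {high:,.0f} enquiries")
```

Even on this conservative baseline, the upper bound is close to 300,000 enquiries in a single hour, which illustrates why provisioning for the worst case up front is so hard to justify.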


While National Rail Enquiries had successfully weathered the storm of the Southern Rail strikes and many other high-traffic events, the widespread network disruption caused by Storm Doris was too much for it.

How can you prepare for unplanned EVENT-DRIVEN peaks?

There are four activities you can undertake to make sure that your system can still cope with these unexpected peaks:

  1. Understand the bottlenecks in your system that could prevent you using all the available capacity. This is not as easy as it seems.

    A level of expertise is required to identify risks in the platform and to know which of them will act as a bottleneck in your system. Sometimes risks in the platform are not picked up because the organisation lacks the expertise to know what “good” looks like. In a recent conversation with a client running an ecommerce platform, we highlighted that the product search was not behaving as expected. The client felt that this was not a risk because “it had always behaved like this and hadn’t caused us problems in the past”.

  2. Make sure your frequently called business processes are as efficient and scalable as possible.

    The more efficient a process is, the more scalable it is likely to be. So if the most frequently called business process on your website is searching, and its volume is likely to increase substantially during an unplanned event-driven peak, you want to make sure that it is as efficient as possible. The question then is what counts as efficient, i.e. what does good look like? Again, a level of expertise is required to judge whether the process is efficient and scalable. This type of analysis can be based on live data and on measurements made in a test environment.

  3. Understand the heavy hitters and have an agreed process to turn these off during times of exceptional demand.

    There will inevitably be some business processes on your website that are less efficient than others. These are sometimes lower-volume processes whose capacity usage is nonetheless high, e.g. calls to third parties. These can be turned off during periods of high demand to ensure that the key business processes can keep running.

  4. Keep and analyse all the data from previous crashes!

    The final activity is to learn from previous unplanned event-driven peaks. There will be a wealth of data available from your website. Make sure all this data is stored and analysed to understand (1) – (3) above. This will allow you to invest your time and money wisely to make sure you are prepared for the next unplanned peak.
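As an illustration of activity 3, here is a minimal sketch of how an agreed "turn off the heavy hitters" rule might be expressed. Everything in it is hypothetical: the process names, call volumes, per-call costs and the 80% threshold are invented for the example, and in practice the inputs would come from the measurements described in activities 1, 2 and 4. The rule switches off non-essential processes, most expensive per call first, until projected usage fits within the chosen share of capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessProcess:
    name: str
    calls_per_hour: int
    cost_per_call: float  # hypothetical capacity units, e.g. CPU-seconds
    essential: bool       # essential processes are never switched off

    @property
    def capacity_used(self) -> float:
        return self.calls_per_hour * self.cost_per_call


def processes_to_disable(processes, capacity, threshold=0.8):
    """Names of non-essential processes to switch off, in order.

    Candidates are taken heaviest-per-call first, and the loop stops as
    soon as projected usage is at or below threshold * capacity.
    """
    budget = capacity * threshold
    usage = sum(p.capacity_used for p in processes)
    candidates = sorted(
        (p for p in processes if not p.essential),
        key=lambda p: p.cost_per_call,
        reverse=True,
    )
    disabled = []
    for process in candidates:
        if usage <= budget:
            break
        usage -= process.capacity_used
        disabled.append(process.name)
    return disabled


# Hypothetical workload during a peak (all numbers invented).
PROCESSES = [
    BusinessProcess("search", 200_000, 0.01, essential=True),
    BusinessProcess("live_status", 100_000, 0.005, essential=True),
    BusinessProcess("third_party_lookup", 5_000, 0.2, essential=False),
    BusinessProcess("recommendations", 20_000, 0.02, essential=False),
]

print(processes_to_disable(PROCESSES, capacity=4_000))
```

Note that `third_party_lookup` has the lowest call volume in this made-up workload but one of the largest shares of capacity, which is the pattern described in activity 3. The important part is less the code than the agreement behind it: the list of what is essential, and the threshold, need to be decided before the peak arrives.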

CONCLUSION

These crashes are high profile and cause great frustration for users; however, planning for and preventing them is non-trivial. On the plus side, there should be plenty of data available from the current crash to learn lessons from and to improve the system's scalability and efficiency.
