Building reliable, cloud-friendly software

One of my main interests is architecting and building resilient enterprise apps. Having worked on critical financial services applications spanning payments, risk, settlement and trading, where availability, correctness and resilience are paramount, I have put a lot of emphasis on the “trustable” aspect of software systems: the kind of trust we place in a car when we sit in it and start driving.

Over the years we have moved from traditional client/server architectures, to N-tier (typically three-tier) web-based architectures, and then to cloud-scale, cloud-friendly software products. With the move to more distributed application environments, traditional client/server and monolithic architectures have been questioned and their assumptions compromised. More focus has to go into balancing the consistency of data and applications, the availability of applications (especially when serving global demand), and how a partition or disconnect between components affects both.

The well-known CAP theorem, formulated by computer scientist Eric Brewer, neatly explains this “trilemma” between Consistency, Availability and Partition tolerance: a distributed system cannot guarantee all three at once. In particular, when connectivity between components of the system breaks (which is a reality for most of the large-scale applications we build today), we can't have both consistency and availability. We have to choose between:

  1. Choosing consistency over availability, by accepting reduced availability of the system, or part of it, while the partition lasts

  2. Choosing availability over consistency, by accepting that data may become inconsistent (which weakens transactional guarantees)
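To make the trade-off concrete, here is a minimal Python sketch of two replicas holding a single value during a network partition. The `Replica` class, its `mode` flag and the synchronous replication are hypothetical simplifications for illustration, not how any particular database behaves: in “CP” mode a replica rejects a write it cannot replicate, while in “AP” mode it accepts the write locally and the replicas diverge.

```python
from dataclasses import dataclass, field
from typing import Optional


class Unavailable(Exception):
    """Raised when a CP replica refuses a request during a partition."""


@dataclass
class Replica:
    name: str
    mode: str  # hypothetical setting for this sketch: "CP" or "AP"
    value: int = 0
    # Excluded from repr/eq to avoid infinite recursion (A -> B -> A).
    peer: Optional["Replica"] = field(default=None, repr=False, compare=False)
    partitioned: bool = False

    def write(self, value: int) -> None:
        if self.partitioned:
            if self.mode == "CP":
                # Consistency over availability: refuse rather than diverge.
                raise Unavailable(f"{self.name}: peer unreachable, write rejected")
            # Availability over consistency: accept locally; the peer is now stale.
            self.value = value
            return
        # Healthy network: apply the write to both replicas synchronously.
        self.value = value
        if self.peer is not None:
            self.peer.value = value


def demo(mode: str) -> tuple[str, int, int]:
    a, b = Replica("A", mode), Replica("B", mode)
    a.peer, b.peer = b, a
    a.write(1)                            # replicated to both replicas
    a.partitioned = b.partitioned = True  # simulate a network split
    try:
        a.write(2)
        outcome = "accepted"
    except Unavailable:
        outcome = "rejected"
    return outcome, a.value, b.value


print(demo("CP"))  # ('rejected', 1, 1): consistent, but the write was refused
print(demo("AP"))  # ('accepted', 2, 1): available, but replicas disagree
```

Real systems refine this choice with quorums, per-operation consistency levels and reconciliation once the partition heals, but the underlying trade-off is the same.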

With the advent of the public cloud, apps became more distributed. Exponential growth in internet usage and user bases demanded highly scalable, globally present (24x7) applications. To support scalable applications, architects and developers have to focus on many aspects: 1) availability, 2) time to market, 3) maintainability/supportability, 4) reliability and 5) cost efficiency.

Early cloud providers like Heroku published principles such as the 12-factor app (with additional factors proposed more recently) that address many of the dimensions key to “trustable” applications.
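As one small illustration, the 12-factor “Config” factor recommends keeping deployment-specific configuration in environment variables rather than in code. The sketch below is a minimal Python example under that assumption; the variable names (`PAYMENTS_DB_URL`, `PAYMENTS_MAX_WORKERS`, `PAYMENTS_FX_NETTING`) and the `load_config` helper are hypothetical. It fails fast when a required setting is missing, so a misconfigured deployment does not start silently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    db_url: str
    max_workers: int
    fx_netting_enabled: bool


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build config from environment variables (hypothetical names)."""
    try:
        db_url = env["PAYMENTS_DB_URL"]  # required: fail fast if missing
    except KeyError:
        raise RuntimeError("PAYMENTS_DB_URL must be set") from None
    # Optional settings fall back to safe defaults.
    max_workers = int(env.get("PAYMENTS_MAX_WORKERS", "4"))
    flag = env.get("PAYMENTS_FX_NETTING", "false").strip().lower()
    return AppConfig(db_url, max_workers, flag in {"1", "true", "yes"})


# Usage: pass an explicit mapping in tests; production reads os.environ.
cfg = load_config({"PAYMENTS_DB_URL": "postgresql://db.internal/payments",
                   "PAYMENTS_FX_NETTING": "true"})
print(cfg.max_workers, cfg.fx_netting_enabled)  # 4 True
```

Injecting the mapping keeps the same build artefact promotable across environments while staying easy to test.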

I like to think there are 12+ factors required to make a modern application resilient, secure, trustworthy, available, and easy and cheap to manage. These affect two types of quality:

  • The quality of the production environment, covering stability, scalability, security, performance and correctness

  • The agility, time to market and efficiency of the overall engineering/delivery process

The summary below shows, at a very high level, how these factors affect the two dimensions. In subsequent posts I intend to go into each of these in detail, citing some examples as well.

These factors are not mutually exclusive. They have dependencies between them, and a deficiency in one can impact another. For example, unless security is treated as a “first-class citizen” and baked into the fundamental software architecture, addressing entitlements, authorisation and authentication as an “afterthought” will be a complete disaster! I have seen many software projects where the initial focus was on quickly delivering “functionality” without baking in non-functional architecture and security considerations, and they failed miserably in the later delivery iterations.

I am a massive fan (and practitioner) of “agile at scale” and “evolutionary architecture” and have been practising both for some time (I will write more about both in a future post). Factors like backing services, concurrent processing and stateless services clearly depend on a good technical and solution architecture, and getting the detailed front-to-back solution architecture right the first time is usually difficult. We would start with 80-90% confidence and evolve the architecture as requirements and business processes become clearer.

My subsequent posts will cover these topics. I like to break the taxonomy down into a few categories:

  1. Building reliable processes/services

  2. Reliable inter-process communication

  3. Domain-driven architecture and domain decomposition

  4. Development and delivery agility, covering 1) config externalisation, 2) code management, 3) dependency management and 4) performance engineering/telemetry
