Architecture · cloud-architecture.io

Introduction

“Architecture” is often hard to define as the term is used to describe many facets of structure, behavior, design, activity and documentation. The concept of architecture is also relevant at different levels of an IT system as the components of one system become a reusable platform for the next system. Ultimately the emergent behavior caused by layers of architectural dependencies and reuse can be difficult to predict.

Architecture is a high level view of a system in the context of its environment, dependencies and technology. Architecture describes structure, behavior, integration and aesthetics.

Architecture must be solid, useful and beautiful.

Architecture is concerned with explaining the structure, behavior, integration and aesthetics of a system (or system of systems). It explains common ways of doing things (patterns and mechanisms), how Non-Functional and Functional requirements are met, technology choices, how systems are put together and provides a common technical direction for teams working with them. Architecture provides a shape, and look and feel, to the internals of a system that are the foundation for the ultimate external behavior.

The idea of an “Architect” with responsibility across such a broad focus is analogous to that of a structural architect concerned with buildings. Indeed, in 25 BCE Vitruvius (a well known Roman writer and architect who inspired Leonardo Da Vinci, hence the Vitruvian Man) described an architect as:

The ideal architect should be a [person] of letters, a mathematician, familiar with historical studies, a diligent student of philosophy, acquainted with music, not ignorant of medicine, learned in the responses of jurisconsults, familiar with astronomy and astronomical calculations.

Vitruvius is famous for stating that architecture must be solid, useful, beautiful (De architectura). The same three qualities relate to software architecture. Despite architecture being a fine balance between a subtle science and an exact art we must realize it is only useful if it is aligned to requirements and becomes executable.

Good architecture speeds you up in building some software. Bad architecture is lots of diagrams and documents.

Architecture, not documentation

Modern software development values working software in the form of quality releases from short development cycles over comprehensive documentation, business analysis and enterprise architecture documentation. Choosing how much architectural work is necessary up-front, and how much architectural documentation is necessary, is difficult. The amount of documentation you need to solve a problem is often less than you need to explain it to others.

Traditional document-focused development methods promoted large up-front effort to detail the architecture and system design prior to development, usually in document or model form. As well as the inherent late risk mitigation issues in waterfall processes this could often cause “analysis paralysis” where architectural work was seen as an endless diagram drawing exercise.

Agile and iterative methods have focused on working software over documents and designs, however this doesn’t mean that no documents or designs should be produced. It is almost always useful to have some description of an architecture before and after development, even if it’s just a statement that the team’s usual architecture is used.

Good architecture addresses how we’ll resolve the major technical risks, communicates between the team the overall structure of the solution and works out how our solution will meet the functional and non-functional requirements. Knowing when we’ve worked out enough architecture so that we’ve reduced the complexity of the problem into manageable sizes for a team is a difficult challenge.

Doing too much architectural analysis or elaboration, either through abstract design and modelling, or practical spiking (writing small amounts of throwaway code to demonstrate the feasibility of a technical approach or architectural idea) will slow down a development project and increase the risk of wasted work. Value is only achieved once working systems are in the hands of the customers/users.

Alternatively, not doing enough architectural analysis can lead to significant architectural refactoring during a project’s lifetime, as key requirements can’t be met based on the current architecture, again leading to wasted work and slowing down the project.

Architectural work, and corresponding documentation can be more implicit when complexity is low, rapid feedback is working and there are strong, high trust, relationships. Architectural analysis and documentation need to be more explicit when work is more complicated, where risk is higher and where there are more cross-team or distributed communication concerns.

High complexity architecture, in contrast, is often subject to a lot of change. Up front work on structure and behavior is prone to extensive rework, and although we do not recommend completely emergent architecture, in these situations managing for emergent structure and behavior by architecting for change becomes the best way to succeed.

When doing up front architecture, especially in the form of documents and models, we need to be careful to consider architectural work in the context of the team’s definition of done. Typical levels of done don’t normally include “analysed”, “architected” or “designed”. Although these terms might be meaningful to describe internal team development states they do not constitute working software and are only a step on the way to creating value.

Architectural models and documentation do not constitute progress. Good architecture is working software, and that’s progress.

Levels of Architecture?

Architecture can be thought of in three levels:

Enterprise Architecture applies architecture principles and practices to guide organizations through the business, information, process, and technology changes necessary to execute their strategies
Solution Architecture applies architecture principles and practices, addressing structure, behavior and aesthetics, to a related set of products (or system-of-systems) that collectively generate business value through their end-to-end use. Solution architecture focuses on integration and behaviour.
System Architecture applies architecture principles and practices to a particular software system focusing on structure and behavior to address functional and non-functional requirements (usability, reliability, performance and scalability etc.).

How does Cloud change Architecture?

Traditional architecture practices are there to reduce risk. Often that risk is simply the cost of being wrong. That cost was often in infrastructure, servers and software. Because Cloud Computing makes all of these things temporary, the cost of being wrong, and of trying something new, is greatly reduced. When it can take just a few hours to create a large set of infrastructure to try an idea, it’s more expensive to have a half-day meeting about a problem than to simply try it and throw it away afterwards, only paying for the temporary resources used to prove something.

Cloud Computing changes how we approach architecture because the cost when we get things wrong is reduced. That means that architecture practices with cloud are more focussed on experimentation, empiricism and spiking. Because structure and scaling can be changed dynamically, Cloud architectures are typically focussed on the flow of data, rather than the structure and deployment of a solution. As a result cloud architecture diagrams are typically structured in a left to right format describing data moving between high-order services rather than the traditional top-down structural component or class diagrams.

Principles

Having a common understanding of architecture, regardless of the format of that understanding (documents, models, sketches, whiteboards, implicit team knowledge), is described as “Intentional Architecture”.

Good Architecture and design have the following characteristics:

Intentional structure and behavior
Highly modular: consisting of separate services with well-designed APIs
- Services are highly cohesive
- Services are loosely coupled
- Services have low algorithmic complexity
Avoids duplication
Well described Services: modular elements have simple expressive meaningful names enabling readers to quickly understand their purpose
Runs and passes all defined tests or acceptance criteria
Lightweight documentation
Use Infrastructure-as-Code

Some additional Cloud Architecture Principles that are often useful:

Use Serverless event-driven architectures first, drop to PaaS if required, and IaaS if strictly necessary
Separate Compute and Storage
Express Services as APIs using ubiquitous standards (HTTPS REST + JSON)
Use Managed Services where possible rather than inventing your own
Use Immutable Infrastructure - rebuild the infrastructure when you deploy, instead of editing what’s running on existing infrastructure
Don’t SSH or exec in, build new
Prove, don’t guess
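
The immutable infrastructure principle above can be illustrated with a minimal Python sketch. The `Environment` type and `deploy` function are hypothetical names invented for this illustration and do not correspond to any cloud provider's API. The only point is that a deployment yields a new environment rather than editing the one that is running.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of a deployed environment (illustrative only)."""
    name: str
    app_version: str
    generation: int = 1


def deploy(current: Environment, new_version: str) -> Environment:
    """Build a new environment for the new version; never edit the old one."""
    return replace(
        current,
        app_version=new_version,
        generation=current.generation + 1,
    )


blue = Environment(name="orders", app_version="1.4.0")
green = deploy(blue, "1.5.0")

# The original environment is untouched, so rolling back is a matter of
# routing traffic to it again rather than undoing edits in place.
print(blue)
print(green)
```

Because the dataclass is frozen, an attempt to change `blue` in place raises an error, which mirrors the "build new" rule in a very small way.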

Complexity

Any problem or piece of work can be considered in terms of its complexity. Problems can be simple, complicated or complex.

Different types of problems require different types of solutions, and not every approach is suitable for every type of problem. Complexity is one of the most significant factors. Many organizations will have work at multiple levels of complexity as part of their portfolio, and often treat all of them in the same way, incorrectly applying the same processes without considering how they may be inappropriate.

Simple work is that which is well understood. Risk is low because we understand the problem and the solution is obvious to everyone involved. In these cases, traditional forward planning and management with a focus on reducing variability to improve efficiency can work well, as long as there is rapid enough feedback to deal with disruptive change. Continuous flow models and/or service management are well suited to this work.

Complicated work requires analysis to understand the connections between cause and effect, or required inputs and outputs. Investigation and some creative thinking is necessary to understand a problem and then propose a solution. Agile/Iterative models are well suited to this work. Complicated endeavors are built upon specialist knowledge and aren’t just obvious. They might take a lot of time and skill to do but are ultimately predictable processes. For example, a recursive sorting algorithm in computer programming is complicated: it involves variables, loops and recursion. It’s not easy for everyone to understand, but it’s very predictable. Building a car engine is complicated.
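
As an illustration of the recursive sorting example mentioned above, here is a minimal merge sort in Python. Merge sort is our choice for the sketch; the text does not name a specific algorithm. It uses variables, loops and recursion, yet the same input always produces the same output.

```python
def merge_sort(items):
    """Recursively sort a list and return a new sorted list."""
    if len(items) <= 1:
        return list(items)

    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])

    # Merge the two sorted halves by repeatedly taking the smaller head.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))
```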

Complex work involves many inter-dependent integrating parts that interact in a number of ways resulting in unpredictable, emergent outcomes. The relationship between cause and effect isn’t predictable, undermining planning, estimation as well as analysis and design practices. Organic networks, outcome orientated teams and high complexity architecture practices are used for this kind of work. Complex endeavors are those which have many influencing parts and events whose interactions are not predictable. Driving a car is complex.

Complicated is easy if you can get the right skills lined up. Complex is always hard.

High-Complexity Architecture

High complexity architecture involves applying architecture practices and principles to business and technical problems involving many interdependent integrating parts that interact in a number of ways resulting in unpredictable, emergent outcomes.

Most organizations work with complicated problems, not high complexity issues. However, to innovate or invent ahead of competitors or adversaries, organizations often have to work in higher complexity areas. When working on very new ideas, or new problem spaces, there can be a lack of knowledge and tried and tested techniques; this widens the cone of uncertainty significantly. Where there are a large number of interacting parts and a large number of interactions complexity can be very high.

Very high complexity architectures are those in which a number, or all, of the following conditions are true:

There is extensive integration between many independent components, technologies
The work is highly speculative and unpredictable, we expect to fail fast and often
There isn’t a large amount of domain knowledge or experience in the field
One or more of the dimensions of an architectural profile are “Very high”
Estimates are extremely uncertain
Risks are very high
Extremely large scale (of data, change and distribution)
There is a complex logical relationship between inputs and emergent properties or behaviors

Part of the job of architecture is to reduce complexity. When working in high complexity systems, the intent is often more important than the detail. Work is often highly speculative as people are inventing new techniques that may or may not work. Empirical feedback becomes more useful than specification, and due to the experimental “sensing” nature of high complexity work details will change significantly. Failure is always likely in high-complexity work and so management practices that assume success will fail to cope with complexity. Complexity cannot be controlled, it can only be responded to. The following practices fail or are extremely dysfunctional for high-complexity problems:

forward planning - We don’t yet know how we’re going to solve the problem, if we even can. Plans will change quicker than they can be written down. “Probing and sensing” is more appropriate.
detailing requirements - The intent is important, not the detail of how. Our technical solutions are likely to change significantly meaning that details will change. A sub-optimal solution might be the only cost-effective option significantly affecting scope.
analysis and design - Decomposing problems into manageable chunks isn’t the right answer when the complexity is in the number of “chunks” and their interactions. Instead we need to manage emergent properties and create architectural experiments (spikes) to prove or disprove our ideas.
user-centric design - Users are unlikely to have resolved the high complexity issues and may not even understand the problem fully. Giving the users what they want, often a good idea, is often the opposite of a high complexity solution. Users seek simplicity, and although this is a good idea in terms of interaction with high complexity architecture, designing interaction doesn’t help solve the problem. Since scope and technology are likely to change, UX detail is best left until the solution is more stable.
estimation - By the nature of the complexity estimations will all be extremely uncertain. Numbers may be significant orders of magnitude out. Instead organizations are better off funding experimentation in a number of timeboxes to see how much complexity can be reduced in those timeboxes through spiking and experiments.

When working with high complexity architecture, standard Enterprise Architecture (if it exists) may need to be abandoned to deal with emergent behavior. However, the cost of not using Enterprise Architecture should not be considered as an inhibitor to high complexity work if the business opportunity is significant enough.

In Solution (multi-system) Architecture for high complexity architectures, a common dysfunction is to try and connect all of the architectural information and system architectures. We’ve seen organizations create massive over-complicated models that try and resolve integration complexity by documenting it all. Well-meaning, but ineffective and wasteful. This approach is counter-productive because the complexity comes not from the number of items, but the dynamism of their interactions and unpredictable aggregate behavior. These large models are, at best, a snapshot view of complexity at a single point in time but they don’t help anyone solve the problems.

Instead, when working with high complexity Solution and System Architectures we recommend architecting for change, not solution requirements. By that we mean that since requirements, scope and emergent behavior are so likely to change the only thing we really know the architecture needs to support is change. Therefore, focus on the principle, technology and mechanisms that enable change such as:

High cohesion, low coupling principles
Integration architecture (message passing, queueing, elastic deployment, discovery, data formats)
The, relatively simpler, complicated parts
Possible reduction in complexity through low algorithmic complexity, refactoring out components, heuristic approaches
Use of serverless/event-driven architectures to facilitate changing approach
Focus on data flow, not structure of services
Use cloud managed services to reduce complexity and infrastructure management wherever possible to allow more effort to be spent on the business problem
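
A minimal sketch of how message passing supports low coupling and a focus on data flow is shown below. The `MessageBus` class is a hypothetical in-memory stand-in for a managed queue or event service, and the topic names and order fields are invented for the illustration. Neither handler knows the other exists; each only knows the shape of the message it receives.

```python
from collections import defaultdict, deque


class MessageBus:
    """Tiny in-memory stand-in for a managed queue (hypothetical, illustrative)."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        self._queue.append((topic, payload))

    def drain(self):
        """Deliver queued messages in order, including ones handlers publish."""
        while self._queue:
            topic, payload = self._queue.popleft()
            for handler in self._handlers[topic]:
                handler(payload)


bus = MessageBus()
invoices = []


def price_order(order):
    # Reads one message and emits another; no direct call to invoicing.
    total = order["qty"] * order["unit_price"]
    bus.publish("order.priced", {**order, "total": total})


# Data flows left to right: order.placed -> order.priced -> invoices.
bus.subscribe("order.placed", price_order)
bus.subscribe("order.priced", invoices.append)

bus.publish("order.placed", {"id": 1, "qty": 3, "unit_price": 5})
bus.drain()
print(invoices)
```

Replacing or adding a step means subscribing a different handler to a topic, which leaves the other steps unchanged.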

Since the behavior and properties of high complexity systems are emergent we recommend creating a “pull” towards Business Value through rapid, preferably continuous, integration and deployment. Create measures that reflect desired outcomes (not intermediary stages or logical decomposition) and test every change in terms of moving towards or away from “good” emergent behavior.
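
One possible way to test a change against outcome measures is sketched below. The measure names, weights and numbers are all hypothetical; a real team would choose measures that reflect its own desired outcomes.

```python
def outcome_score(measures, weights):
    """Combine outcome measures into a single number; higher is better."""
    return sum(weights[name] * value for name, value in measures.items())


def moves_towards_good(before, after, weights):
    """True if the change improved the combined outcome score."""
    return outcome_score(after, weights) > outcome_score(before, weights)


# Hypothetical outcome measures: failed checkouts count against the outcome.
WEIGHTS = {"orders_completed_per_hour": 1.0, "failed_checkouts_per_hour": -5.0}

baseline = {"orders_completed_per_hour": 120, "failed_checkouts_per_hour": 4}
candidate = {"orders_completed_per_hour": 126, "failed_checkouts_per_hour": 6}

print(outcome_score(baseline, WEIGHTS))
print(outcome_score(candidate, WEIGHTS))
print(moves_towards_good(baseline, candidate, WEIGHTS))
```

In this made-up case the candidate completes more orders but also fails more checkouts, so the combined score drops and the change is judged to move away from "good". Watching throughput alone would have hidden that.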

At the extreme end of this high complexity scale, using machine learning and evolutionary techniques to generate possible solutions and test for emergent properties can be useful to speed up iteration cycles.
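
As a very small sketch of the evolutionary idea, the following Python generates variations of a candidate and keeps whichever measures best. Everything here is an assumption made for illustration: the candidate is just a tuple of three tuning parameters, and `fitness` is a stand-in for measuring the emergent behavior of a real deployed candidate.

```python
import random


def fitness(candidate):
    """Hypothetical stand-in for a measured outcome; higher is better.

    It simply rewards candidates close to an arbitrary target.
    """
    target = (8, 3, 5)
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def mutate(candidate, rng):
    """Return a copy with one parameter nudged up or down by one."""
    index = rng.randrange(len(candidate))
    changed = list(candidate)
    changed[index] += rng.choice((-1, 1))
    return tuple(changed)


def evolve(start, generations=200, offspring=4, seed=0):
    """Repeatedly generate variations and keep the best one seen so far."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    best = start
    for _ in range(generations):
        children = [mutate(best, rng) for _ in range(offspring)]
        # The parent is listed first, so it survives unless a child is
        # strictly better.
        best = max([best, *children], key=fitness)
    return best


start = (1, 1, 1)
best = evolve(start)
print(best, fitness(best))
```

In practice the expensive part is the measurement itself, which is where cheap, disposable cloud environments help shorten each iteration.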

Architectural Synthesis

Complexity can occur in different areas of a problem, and requires different architectural approaches to simple problems.

Architectural Synthesis is the creative problem solving activity that turns a set of requirements or direction into an early candidate architecture identifying a possible solution.

The “magic sauce” in software design, architectural synthesis is the activity that shapes initial candidate architecture. Based on user needs, requirements (which will have little detail but hopefully represent scope), and investigation into non-functional needs an Architect or team will come up with a number of options for how to meet the needs, or solve the problem. Architectural synthesis is dependent on the complexity of the problem being addressed:

For simple pieces of work, architectural synthesis is implicit as the answer is already obvious to everyone.
For complicated pieces of work, investigation into where areas of complication are (using an Architectural Profile) backed up with experience and experimentation/spiking tend to lead pretty quickly to a candidate architecture.
For complex work we recommend a series of experiments/spikes to investigate areas of complexity or try ideas that might work towards delivering business value. If possible, measurement of emergent outcomes will allow multiple candidate architectures to be compared against each other.

Architectural synthesis is a creative process, especially when dealing with anything other than simple work. It is often the point where the level of complexity will be recognized. Investigations and spiking will often change understanding of complexity and risks, either uncovering them or addressing them.

We do not recommend following a standard process for synthesis since it’s essentially creative idea generation, and it should not be rushed. In our experience critical thinking and logical analysis can help architectural synthesis.

Cloud is a great enabler for dealing with complicated and complex architectural synthesis since it allows different ideas to be tried quickly and monitored for cost, performance and stability. Being able to rapidly change architectures through Infrastructure-as-Code reduces the cost of architectural experimentation, or going down a few dead ends while exploring a problem.

Architectural Profiling

Architectural Profiling is a technique used to understand the relative complexity of different areas of concern for an architecture.

Architecture addresses a number of concerns within a system and so a useful early approach is to consider the profile of the architecture in terms of the relative complexity of each of these areas. This helps to give a feel for the shape of the requirements, architecture and overall solution. We can also identify possible areas in which we can reuse existing components, frameworks or maybe standard packages of requirements, components and tests (e.g. serverless applications or cloud templates). Architectural profiles may be useful at any level of architecture (Enterprise, Solution and System) and are useful to understand areas of complexity in the requirements space, especially for understanding the relative complexity of non-functionals.

We consider a number of “dimensions” representing the various concerns of architecture. Starting with the FURPS scale (Functionality, Usability, Reliability, Performance and Supportability), but also considering other important dimensions such as security or cost-optimisation, we extend the model in whatever meaningful way is necessary. The dimensions used aren’t set in stone; we use the dimensions that are meaningful for the organization and type of project.

Here’s an example profile of an application that does some significant data processing, needs to do it reasonably quickly but not excessively so and has got to do some pretty visualizations of the data. Other than that it’s fairly straight forward. Initially we’ll discuss the non-functional aspects.

The x-axis here is close to the standard FURPS scale with a couple of extra dimensions - as mentioned above we might customize the dimensions to the context of the project or organization.

The y-axis ranges from simple to high complexity but it is deliberately not labeled (not something we’d normally recommend!) so we can focus on the relative complexity of these dimensions of the requirements, quality, architecture and therefore solution. Adding false accuracy of scale is not worthwhile.

The height of one of these bars helps us shape the architectural approach that the work needs, including how implicit or explicit the architecture and documentation needs to be.
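
A profile like this can also be captured as a small data structure. The sketch below is illustrative only: the dimension names follow the text, but the bar heights are made-up values for an application with significant data processing and visualization needs, and the four ordinal levels express relative height rather than a precise scale.

```python
from enum import IntEnum


class Complexity(IntEnum):
    """Relative bar heights only; not a measured scale."""
    SIMPLE = 1
    COMPLICATED = 2
    COMPLEX = 3
    VERY_HIGH = 4


# Illustrative values, not taken from a real project.
profile = {
    "Differentiating Functionality": Complexity.COMPLICATED,
    "Usability": Complexity.SIMPLE,
    "Reliability": Complexity.SIMPLE,
    "Performance": Complexity.COMPLICATED,
    "Supportability": Complexity.SIMPLE,
    "Security": Complexity.COMPLICATED,
    "Data Processing": Complexity.COMPLEX,
    "Reporting and Visualization": Complexity.COMPLEX,
}


def render(profile):
    """Draw the profile as a sideways text bar chart."""
    width = max(len(name) for name in profile)
    return "\n".join(
        f"{name.ljust(width)} {'#' * level}" for name, level in profile.items()
    )


def needs_explicit_architecture(profile, threshold=Complexity.COMPLEX):
    """Dimensions tall enough to deserve explicit analysis or early spikes."""
    return [name for name, level in profile.items() if level >= threshold]


print(render(profile))
print(needs_explicit_architecture(profile))
```

The threshold is a judgment call for the team; the useful part is that the tall bars are made visible and can be discussed.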

For example, let’s take the security dimension and consider it from simple, complicated and complex.

If it’s empty/simple we know that we don’t need users, audit logs etc. Maybe we just need a user session for personalization but we could pick that up from the browser or OS.

If it’s medium/complicated, like it is in this example, we know that we’re probably going to need some user stories around:

logging in
changing password
managing users
managing permissions
checking permissions

In most organizations, and for most developers, these are very common requirements which have been implemented many times. There is unlikely to be a need to do detailed requirements documentation, design or even significant testing as the quality risk is likely to be low. Hopefully we will be able to simply use a Cloud managed service or common corporate authentication and authorization mechanisms, so we may not have to implement any of this stuff directly. Of course just because quality risks are likely to be low doesn’t mean we can assume there are no bugs, a minimal level of testing might still be required.

If the bar for security is higher up (complex) then we might need to consider elements such as fine grained security, overlapping groups, encryption, auditing, digital signing, federated user data stores, legal compliance, information assurance, biometric identification, multi-factor authentication, etc. Cloud Services can often play into many of these spaces but few cloud services can currently support all of these requirements directly.

If the bar for security is at the very top, then we are in a high complexity security context. In this case we may be dealing with an unstable cyber security situation such as needing to operate securely in a hostile environment where adversaries are actively trying to compromise our software or operational effectiveness. These situations are not resolved through just up-front design, but through architecting for change, experimentation and learning.

Other Dimensions

As well as the standard non-functional “URPS+” dimensions a common aspect is “Data Processing” which covers the volume and shape of data a system might need to deal with. Simple entity management is typically fairly low, whereas running significant algorithms across that data will push the bar up. Large datasets start to bring in some elasticity and cost concerns, “Big Data” and massively parallelized processing will push the bar up further.

For lower levels of complexity, dealing with simple entity creation, editing and deletion, we do not recommend elaborating textual requirements or CRUD stories/simplistic scenarios. Similarly we do not recommend creating a lot of design diagrams that describe CRUD operations for each entity. However, a simple persistency mechanism and data model are frequently worthwhile.

Another dimension we frequently use for user facing systems is “Reporting and Visualization”, which means graphical rendering of data or processing. At its simplest level this dimension can be simple GUI feedback but it can range to interactive touch displays, augmented reality, VR etc. As the bar increases, the requirement for User Experience (UX) practices increases.

The Functional Dimension

The first dimension we title “Differentiating Functionality” which represents the functional requirements which make the product unique. If this bar is low then the product is likely to be a commodity offering rather than a market differentiating product, in which case there should be a strong strategic reason to buy not build. A significant non-functional difference between the proposed product and existing alternatives may be a perfectly valid reason, but we recommend making that explicitly visible.

If, during product evolution, this bar lowers significantly then the product should be considered for retirement or replacement with a commodity solution. Sometimes market disruption will out-move a business in which case cutting losses, and redirecting onto more differentiating business value, is a sensible business decision.

Other than these cases, the functional dimension is typically the source of business value and so is often prioritized by customers and users above the other dimensions. Using an architectural profile is a useful way of balancing the requirements and development work to consider the whole problem. We tend to use a different color for the functional bar to help draw out this distinction and ensure a realistic balance of concerns.

Technical risks, and likely quality risks, will be hiding in any dimension with a high complexity and so will be fertile ground for finding fringe cases. Complex areas are excellent candidates for early iterative development as their implementation can help to de-risk projects, programmes and portfolios.

The use of managed Cloud services in high complexity areas can help to reduce complexity, however often at the expense of trading configurability or other non-functional aspects. This can be beneficial or negative depending on your specific context.

Architectural Spiking

Architectural Spiking is running a small technical experiment, building working software to prove or disprove feasibility or a specific hypothesis. Spikes are throw-away code: they are not integrated into released Products, they are used to prove or disprove a theory.

Spiking is intended to investigate an area, reduce complexity, mitigate a risk or otherwise prove/disprove a theory. By diving deep into the solution space (a vertical “spike”) we can understand whether our early architectural ideas are likely to work. Spikes are particularly useful during Architectural Synthesis.

The purpose of a spike is to test a theory, not to create part of a working product. So although they are built as working software, they are throw-away code, usually ignoring all architectural, style and even good coding guidelines in favor of expediency. Spiking is an excellent form of risk mitigation, and a great way to reduce Complexity through learning.

We recommend that Spikes are formed by articulating a theory and a simple test or two. Although we describe Spikes as “throw-away code” we don’t actually throw them away. We don’t integrate them into our real code but they are useful to keep, alongside their specification/tests for people to refer to later or find out how something was done.
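
A spike expressed as a theory plus a couple of simple tests might look like the sketch below. The `Spike` class and the test functions are hypothetical, and Python's built-in SQLite module is used purely as a stand-in so the example runs anywhere; a real spike would exercise the actual technology in question.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Spike:
    """A theory and the simple tests that prove or disprove it (hypothetical)."""
    theory: str
    tests: dict[str, Callable[[], bool]] = field(default_factory=dict)

    def run(self):
        """Run every test; an exception counts as a failed test."""
        results = {}
        for description, test in self.tests.items():
            try:
                results[description] = bool(test())
            except Exception:
                results[description] = False
        return results


def can_connect():
    connection = sqlite3.connect(":memory:")
    try:
        return connection.execute("SELECT 1").fetchone() == (1,)
    finally:
        connection.close()


def can_round_trip_a_row():
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE notes (body TEXT)")
        connection.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
        rows = connection.execute("SELECT body FROM notes").fetchall()
        return rows == [("hello",)]
    finally:
        connection.close()


spike = Spike(
    theory="We can reach a SQL database directly from our application code",
    tests={
        "opens a connection and runs a query": can_connect,
        "writes a row and reads it back": can_round_trip_a_row,
    },
)

for description, passed in spike.run().items():
    print("[x]" if passed else "[ ]", description)
```

Keeping the theory and tests next to the spike code means someone can later see what was proved, not just how.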

Here’s a simplistic example of a real spike for a system that was looking to access a SQL Server database using JavaScript. If this spike failed the team had other ideas, but this was a feasibility test they needed to answer.

This was the entire documentation for a Spike. The spike involved doing some web searching, running some command lines and writing around 5 lines of code for each theory, which proved both cases. The team simply ticked the tests and saved the following produced assets:

Spike documentation
Links to online guidance
Install scripts
Produced code

The team were then able to reduce a risk related to technical integration (although they had to create some new risks around the security implications of accessing a database using local JavaScript – but that’s a different story).
