When it comes to cloud infrastructure and application development environments, Amazon Web Services (AWS) is one of the leading providers, alongside Microsoft Azure and Alibaba Cloud. To give modern architects and developers better guidance, AWS has published its own architecture framework, the AWS Well-Architected Framework, whose latest release dates from November 2017. Compared with TOGAF, this framework is less holistic and provides less concrete guidance, but it is much better suited to the era of cloud, connectivity, and integration platforms. It achieves this by setting out general design principles and by linking architectural decisions to concrete business impacts. It also defines five pillars of a well-architected system: operational excellence, security, reliability, performance efficiency, and cost optimization.
The General Design Principles are as follows:
1. Stop guessing your capacity needs: Avoid committing to fixed capacities up front; in the cloud, capacity can follow actual demand and be scaled up or down as needed.
2. Test systems at production scale: In the cloud, a production-scale test environment can be created on demand and removed afterwards, which is much cheaper than testing at that scale on premises.
3. Automate to make architectural experimentation easier: Automating wherever possible saves cost and time, makes changes and their impacts traceable, and makes it easy to revert to previous versions.
4. Allow for evolutionary architectures: An architecture that evolves over time can be driven by business demand, so the business can react to innovations, changed requirements, or new market situations.
5. Drive architectures using data: Cloud environments produce data everywhere and make it easy to collect, so it should be used, e.g., for better decision making or communication.
6. Improve through game days: Regularly schedule simulated events to test your production environment and better understand how to improve it.
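To illustrate what "stop guessing your capacity needs" means in practice, here is a minimal Python sketch of demand-driven capacity: it derives an instance count from observed utilization instead of from a fixed up-front estimate. The function name, the 50% target utilization, and the min/max bounds are illustrative assumptions; the calculation only loosely resembles the idea behind target-tracking auto scaling and is not an AWS API.

```python
import math


def desired_capacity(current_instances: int, avg_utilization: float,
                     target_utilization: float = 0.5,
                     min_instances: int = 1, max_instances: int = 20) -> int:
    """Return the instance count that brings average utilization close to the target.

    Hypothetical helper: total load is approximated as
    current_instances * avg_utilization, and dividing it by the target
    utilization gives the number of instances needed to carry that load.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    load = current_instances * avg_utilization
    needed = math.ceil(load / target_utilization)
    # Clamp to the configured bounds so a spike cannot scale without limit
    # and an idle period cannot scale to zero.
    return max(min_instances, min(max_instances, needed))


print(desired_capacity(4, 0.9))  # 8: scale out under heavy load
print(desired_capacity(4, 0.2))  # 2: scale in when demand drops
```

Capacity is thus recalculated from measured demand whenever new utilization data arrives, rather than being fixed once before deployment.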
In the following, I provide an overview of the five pillars of the Well-Architected Framework and the design principles that correspond to each pillar:
Operational excellence:
- Perform operations as code: Defining operations as code limits human error and enables consistent responses to events.
- Annotate documentation: In a cloud environment, documentation can be created automatically, and this benefit should be exploited.
- Make frequent, small, reversible changes: Changes should be small enough to be reversed easily and to have little impact if they fail.
- Refine operations procedures frequently: This can be done on “Game Days”, as described in the general design principles section.
- Anticipate failure: Identify potential sources of failure, the scenarios they could cause, and how to react to them.
- Learn from all operational failures: Share lessons learnt from failures across the organization so that everybody benefits.
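The principle of frequent, small, reversible changes can be sketched as follows: each change is applied on its own, checked, and reverted if the check fails. This is a minimal illustration, assuming a configuration held as a plain dictionary and a caller-supplied health check; `apply_in_small_steps` and `is_healthy` are hypothetical names, not part of any AWS tool.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]


def apply_in_small_steps(config: Config,
                         changes: List[Tuple[str, str]],
                         is_healthy: Callable[[Config], bool]) -> Tuple[Config, List[str]]:
    """Apply (key, value) changes one by one; roll back and stop at the first unhealthy step.

    Hypothetical helper: returns the resulting config and a log of what happened.
    """
    current = dict(config)  # work on a copy so the caller's config stays untouched
    log: List[str] = []
    for key, value in changes:
        # Remember the previous state so this single step can be reversed.
        had_key = key in current
        previous = current.get(key)
        current[key] = value
        if is_healthy(current):
            log.append(f"applied {key}={value}")
            continue
        # Revert only the failing step; earlier, healthy steps are kept.
        if had_key:
            current[key] = previous
        else:
            del current[key]
        log.append(f"rolled back {key}={value}")
        break
    return current, log


base = {"instance_type": "t3.small"}
changes = [("instance_type", "t3.medium"), ("max_connections", "0")]
result, log = apply_in_small_steps(base, changes, lambda c: c.get("max_connections") != "0")
print(result)  # {'instance_type': 't3.medium'}
print(log)     # ['applied instance_type=t3.medium', 'rolled back max_connections=0']
```

Because only the failing step is reverted, the healthy change to `instance_type` survives while the faulty `max_connections` change is rolled back, which keeps the impact of a failed change small.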
Security:
- Implement a strong identity foundation: Grant the least privilege possible and centralize privilege management.
- Enable traceability: Monitor, alert on, and audit actions and changes in real time and, where possible, react automatically.
- Apply security at all layers: Secure every instance and every layer instead of relying only on an outer layer of security.
- Automate security best practices: Implement software-based security mechanisms and controls as code.
- Protect data in transit and at rest: Classify data, encrypt it, add time stamps, and minimize direct human access to data to reduce human error.
- Prepare for security events: Align with the organization's incident management process and run regular simulations, using automation to speed up detection, investigation, and recovery.
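For the least-privilege principle, the following sketch models a simplified permission check with default deny, where an explicit deny always overrides an allow. This mirrors the basic evaluation order of AWS IAM policies, but it is a deliberately reduced model: the `Statement` class and `is_allowed` function are hypothetical, wildcards are matched case-sensitively with Python's `fnmatchcase`, and conditions, principals, and other IAM features are ignored.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass
class Statement:
    effect: str    # "Allow" or "Deny"
    action: str    # e.g. "s3:GetObject" or "s3:*"
    resource: str  # e.g. "arn:aws:s3:::reports/*"


def is_allowed(statements: List[Statement], action: str, resource: str) -> bool:
    """Simplified check: default deny, and an explicit Deny always wins over any Allow."""
    matching = [s for s in statements
                if fnmatchcase(action, s.action) and fnmatchcase(resource, s.resource)]
    if any(s.effect == "Deny" for s in matching):
        return False
    # Without a matching Allow, access is implicitly denied (least privilege by default).
    return any(s.effect == "Allow" for s in matching)


policy = [
    Statement("Allow", "s3:GetObject", "arn:aws:s3:::reports/*"),
    Statement("Deny", "s3:*", "arn:aws:s3:::reports/confidential/*"),
]
print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::reports/2017/q4.csv"))  # True
print(is_allowed(policy, "s3:PutObject", "arn:aws:s3:::reports/2017/q4.csv"))  # False
print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::reports/confidential/salaries.csv"))  # False
```

Starting from "deny everything" and granting only narrowly scoped actions keeps privileges minimal, and the explicit deny protects sensitive data even if a broader allow is added later.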
Reliability:
- Test recovery procedures: The cloud makes it possible to recreate exactly the scenarios that led to a failure, so that different recovery strategies can be tested.
- Automatically recover from failure: Monitor KPIs and trigger automated recovery processes wherever possible.
- Scale horizontally to increase aggregate system availability: Replace one large resource with several smaller, independent ones to reduce the impact of a single resource failing.
- Stop guessing capacity: As in the general design principles, let capacity follow actual demand instead of fixing it up front.
- Manage change in automation: Infrastructure changes should be made through automation, so that the only changes that need to be managed are changes to the automation itself.
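The availability argument behind scaling horizontally can be made concrete with a short calculation. Assuming that instances fail independently and that any single instance can carry the load, a system of n instances, each available with probability a, is unavailable only when all of them are down, so its availability is 1 - (1 - a)^n. The function below is an illustrative sketch of that formula under these assumptions, not an AWS metric.

```python
def parallel_availability(instance_availability: float, instances: int) -> float:
    """Availability of a service that stays up while at least one of `instances`
    independent, identical instances is up: 1 - (1 - a) ** n.

    Assumes independent failures and that one instance can carry the full load.
    """
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # The system is down only if every instance is down at the same time.
    return 1.0 - (1.0 - instance_availability) ** instances


print(round(parallel_availability(0.99, 1), 6))  # 0.99
print(round(parallel_availability(0.99, 2), 6))  # 0.9999
```

Under these assumptions, moving from one instance at 99% availability to two such instances raises the aggregate figure to about 99.99%. Correlated failures, for example instances that share the same hardware or location, would weaken this effect, which is why the independence assumption matters.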
Performance efficiency:
- Democratize advanced technologies: Consume advanced technologies that you lack the skills for as services from third-party providers.
- Go global in minutes: Deploy your systems across multiple regions to provide lower latency to customers.
- Use serverless architectures: Minimize the use of servers you operate yourself, so you do not have to manage them.
- Experiment more often: Try different instance types, storage options, configurations, etc. to optimize them.
- Mechanical sympathy: Use the technology approach that best aligns with your goals.
Cost optimization:
- Adopt a consumption model: Adapt your resources to business demand and requirements, paying only for what you consume.
- Measure overall efficiency: Understand what the architecture delivers to the business relative to its cost.
- Stop spending money on data center operations: Let cloud service providers handle data center operations, so you can save resources and focus on projects and the business.
- Analyze and attribute expenditure: Attribute IT costs to the business owners of individual components to improve ROI management.
- Use managed services to reduce cost of ownership: Use managed services instead of your own servers for operational tasks such as sending emails or managing databases, as service providers can run them at higher scale and lower marginal cost.
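Attributing expenditure presupposes that every cost item can be traced to an owner. A minimal sketch, assuming billing line items are available as dictionaries with a `cost` field and an optional `owner` tag (hypothetical field names, not the schema of any AWS billing report), could aggregate costs per owner like this:

```python
from collections import defaultdict
from typing import Dict, Iterable


def cost_by_owner(line_items: Iterable[dict], tag_key: str = "owner") -> Dict[str, float]:
    """Sum the 'cost' of each line item per value of its owner tag.

    Hypothetical helper: items without the tag are grouped under
    'unattributed' so that gaps in tagging stay visible.
    """
    totals: Dict[str, float] = defaultdict(float)
    for item in line_items:
        tags = item.get("tags") or {}
        owner = tags.get(tag_key, "unattributed")
        totals[owner] += float(item["cost"])
    return dict(totals)


# Illustrative line items, not real billing data.
items = [
    {"service": "EC2", "cost": 120.0, "tags": {"owner": "marketing"}},
    {"service": "S3", "cost": 30.0, "tags": {"owner": "finance"}},
    {"service": "RDS", "cost": 50.0, "tags": {"owner": "marketing"}},
    {"service": "Lambda", "cost": 5.0, "tags": {}},
]
print(cost_by_owner(items))  # {'marketing': 170.0, 'finance': 30.0, 'unattributed': 5.0}
```

Keeping an explicit "unattributed" bucket is a design choice: it shows how much spending still lacks an owner, which is exactly what blocks an ROI view per business owner.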
The Review Process
The review process describes, in high-level terms, how the principles should be assessed. According to AWS, this should be a lightweight process that takes hours rather than days, and it should be repeated multiple times across the architecture lifecycle. AWS stresses that the review should be a conversation, not an audit, and that it is important to involve the right people. The outcome of these conversations should be a list of issues that can be prioritized based on the business context and turned into a set of actions that improve the overall customer experience of the architecture.
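The outcome of a review, a list of issues prioritized by business context, could be captured in a simple structure like the one below. The `Issue` fields and the 1-to-5 scoring for business impact and effort are hypothetical; AWS does not prescribe a scoring scheme, and the ordering rule (highest impact first, cheapest fix first among ties) is just one plausible choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    pillar: str
    description: str
    business_impact: int  # 1 (low) .. 5 (high), judged in the business context
    effort: int           # 1 (low) .. 5 (high), estimated effort to fix


def prioritize(issues: List[Issue]) -> List[Issue]:
    """Order issues by highest business impact first; among equal impact, cheapest fix first."""
    return sorted(issues, key=lambda i: (-i.business_impact, i.effort))


# Illustrative review findings.
findings = [
    Issue("Reliability", "No automated recovery for the order service", 5, 3),
    Issue("Cost optimization", "Test environments run around the clock", 3, 1),
    Issue("Security", "Shared administrator credentials", 5, 2),
]

# Turn the prioritized issues into a numbered list of actions.
for rank, issue in enumerate(prioritize(findings), start=1):
    print(f"{rank}. [{issue.pillar}] {issue.description}")
```

The scores themselves come out of the review conversation; the code only makes the agreed priorities explicit and repeatable when the review is run again later in the lifecycle.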
Having analyzed the AWS white paper on its Well-Architected Framework, I realized that there are no guidelines on how to carry out an assessment against the framework. The chapter “Review Process” stays very high-level and does not list any activities that should be performed or deliverables that should be produced. Nor are there overviews that help to understand the bigger context of the framework, how to get started, or how to combine it with other frameworks. Overall, the guidance remains rough and provides only a broad direction.
That said, such detailed guidance is not AWS's goal. Instead, AWS succeeds in providing an overview of good principles, as a starting point, for building cloud environments. The framework can also be used to amend and enrich other architecture frameworks whose focus is not purely on the cloud.
A full version of the AWS Well-Architected Framework is available here.