Before we on-board a customer to Presidio Managed Cloud Services we use the AWS Well-Architected Framework to perform a Well-Architected Review (WAR) of customer workloads. This helps us identify best practice gaps that need remediation as part of managed services on-boarding. It also helps our customers discover foundational issues or architectural debt that built up as their systems evolved and that they may not have been aware of.
In the course of leading these WAR sessions with customers, we have identified best practices for leading a Well-Architected Review. These best practices can help you assess your architectures using the AWS Well-Architected Framework during the initial design phase and at key product lifecycle milestones.
Before we get started, let’s quickly review the Well-Architected Framework. It is made up of five pillars: Operational Excellence, Security, Reliability, Performance Efficiency, and Cost Optimization. It contains general design principles, design principles for each of the five pillars, and questions that identify alignment or misalignment with best practices for each pillar. If you need a Well-Architected Framework refresher, this AWS APN blog post covers the five pillars of the framework and their best practices at a high level. If the framework is entirely new to you, take some time to read through the AWS Well-Architected whitepaper. Finally, if you’re not familiar with the AWS Well-Architected Tool in the AWS Console, follow this Walkthrough of the Well-Architected Tool.
We have divided these best practices for Well-Architected Reviews into the following categories: Preparation, Interaction, and Evolution.
Define Your Goals
Consider your organizational objectives and write them down before the review session. While certain critical findings from the WAR will always be a higher priority than others (is Security ever not the number one priority?), it’s important that all involved understand the goals and motivations at work in the organization. This business context should be available prior to the WAR.
Define Your Workload
What is the scope of the review? It’s important to review one workload at a time. In Well-Architected terminology, a workload is “usually the level of detail that business and technology leaders communicate about.” It can be overwhelming to try to review everything in an AWS account or a multitude of accounts. Clearly define the scope and communicate it to those who will be attending. This will put you on the path to an efficient session and to building a repeatable review process for other workloads.
Do not perform Well-Architected reviews where only certain pillars are covered. While one area may be a higher priority, without the full picture of the workload you will not have a valid assessment of any single pillar.
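One lightweight way to guard against a partial review is to check, before closing the session, that every pillar has at least one answered question. The sketch below is illustrative only: `missing_pillars` and the `session` tally are hypothetical names of our own, not part of the AWS Well-Architected Tool.

```python
# The five pillars named in the framework overview above.
PILLARS = (
    "Operational Excellence",
    "Security",
    "Reliability",
    "Performance Efficiency",
    "Cost Optimization",
)


def missing_pillars(answered_by_pillar):
    """Return the pillars with no answered questions, in framework order.

    `answered_by_pillar` is a hypothetical mapping of pillar name to the
    number of questions answered so far in the session.
    """
    return [p for p in PILLARS if answered_by_pillar.get(p, 0) == 0]


# Example tally for a session that skipped some pillars (made-up numbers).
session = {"Security": 10, "Reliability": 9, "Cost Optimization": 0}
print(missing_pillars(session))
# ['Operational Excellence', 'Performance Efficiency', 'Cost Optimization']
```

An empty list means every pillar was touched; anything else is a signal to schedule more time before treating the review as complete.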
Build the Team
Without the right people in the room (or video conference) to answer the WAR questions, your session will fall flat. Ensure key stakeholders and subject matter experts receive the WAR meeting invitation and follow up to guarantee their attendance. Don’t be shy! Exposing key business and finance leaders to the challenges and opportunities of building systems on the cloud will pay dividends for your organization. Roles that are needed, but which are sometimes covered by people wearing multiple hats:
- Cybersecurity – someone familiar with your organizational policies and procedures
- Development – an expert in the particular workload being reviewed (if development teams are separate, include individuals from each team)
- Operations – this role is usually the person responsible for AWS infrastructure setup
- Technical Management – a leader who has the workload in their scope of responsibility and can speak to business acceptance of different risks
- Budget holder or Finance personnel – this is essential to have a deep conversation on goals and cost optimization.
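Because one person often covers several of these roles, it can help to check the invitation list against the roles rather than against headcount. The following is a minimal sketch under our own assumptions: the role labels, the `uncovered_roles` helper, and the invitee names are all hypothetical.

```python
# Role labels mirror the list above; the exact strings are our own choice.
REQUIRED_ROLES = {
    "cybersecurity",
    "development",
    "operations",
    "technical management",
    "finance",
}


def uncovered_roles(invitees):
    """Return the required roles that no invitee covers, sorted by name.

    `invitees` maps a person's name to the set of roles they can speak to,
    so someone wearing multiple hats simply lists more than one role.
    """
    covered = set()
    for roles in invitees.values():
        covered |= set(roles)
    return sorted(REQUIRED_ROLES - covered)


# Hypothetical invitation list: Alex wears two hats, nobody covers finance.
invitees = {
    "Alex": {"operations", "cybersecurity"},
    "Sam": {"development"},
    "Jordan": {"technical management"},
}
print(uncovered_roles(invitees))  # ['finance']
```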
Review Current Issues
Look at the pain points you are dealing with on that workload and review major issues that have occurred. Is your workload having issues with monitoring, component reliability, or costs higher than what was estimated? Ensure these are documented and circulated before the review.
Gather Workload Documentation, Policies and Procedures
Diagrams of your workload are essential. Having a visual focal point during complex technical conversations promotes shared understanding. Your diagrams do not need to be detailed to the point of showing commercial or technical proprietary information. While detailed policy and procedure documentation does not need to be gathered for the WAR, you should have a basic inventory of the policies and procedures in use across the organization. If you have an information security policy, but no one knows what’s in it and it does not affect the way you design and operate your workloads, do you really have such a policy?
Promote Well-Architected Awareness
Make WAR invitees aware of the WAR questions that will be asked. Prior to the WAR, give subject matter experts in your cybersecurity, development and operations teams access to the AWS Well-Architected Tool, with an example workload defined for their perusal. This will make your Well-Architected Review proceed much more smoothly and quickly.
Remain objective during the review. The person leading the review (that’s probably you!) should ask each question as written, leave room for a response, and then, if needed, expand on the meaning of the question and the best practices it speaks to. Do not assume or check off boxes on the questions without verifying you meet the best practice. For example, on Operational Excellence question 1, “How do you determine what your priorities are?”, dig into the specifics:
- Have you truly evaluated your external customer needs?
- Have you evaluated your internal customer needs?
- It may seem simple, but were all stakeholders involved?
This includes business, development, and operations determining where to focus the efforts for either an internal or external customer.
A Well-Architected review should never be an interrogation. Sometimes conversation will flow from one of the questions and will result in answers to several questions. Sometimes, hopefully less often, no one in the room will be able to answer one of the questions. You’ll then need to add an item to an action list for follow-up after the session. Your action list will contain any questions that require some research or time to get an answer. That’s ok! The Well-Architected Framework has many great points, but it is important to ask the right follow-up questions to dig deeper. Similarly, not every question is going to apply to every team involved in a workload. Remember to record any questions you asked, and the answers given, that are not captured in the Well-Architected Tool (see Be Consistent).
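The action list described above does not need special tooling; a shared document works fine. If you prefer something structured, here is a minimal sketch of one possible shape. The `ActionItem` and `ActionList` classes and their fields are hypothetical and are not part of the Well-Architected Tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActionItem:
    pillar: str                  # pillar the question belongs to
    question: str                # the question as it was asked
    note: str                    # what needs research or follow-up
    owner: Optional[str] = None  # assigned after the session, if unknown
    resolved: bool = False


@dataclass
class ActionList:
    items: List[ActionItem] = field(default_factory=list)

    def add(self, pillar, question, note, owner=None):
        """Record a question nobody in the room could answer."""
        item = ActionItem(pillar, question, note, owner)
        self.items.append(item)
        return item

    def open_items(self):
        """Return the items that still need an answer."""
        return [item for item in self.items if not item.resolved]


# Example entries; the notes are made up for illustration.
actions = ActionList()
first = actions.add(
    "Operational Excellence",
    "How do you determine what your priorities are?",
    "Confirm whether internal customer needs were evaluated",
)
actions.add("Security", "Follow-up question raised in discussion", "Locate the policy owner")
first.resolved = True
print(len(actions.open_items()))  # 1
```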
Be Consistent
Try to keep the review process consistent each time you go through a review. The process will become easier and take less time as you perform reviews for key milestones in the product lifecycle, major re-architectures in the workload, or time-based refreshes. Answer every question, take notes (have someone else do this for you or record the session and write them down after), and add to the action list as needed.
Keep the review lightweight and question-answer focused. If you find yourself or others starting to architect or solve the gaps that are being discovered, pull everyone back on track. Once you get the hang of it, a WAR should take two to three hours, at most. This can be one long meeting or broken up over a couple of meetings. If you plan on diving deep into a part of the workload it may be best to make that a separate meeting.
Keep it Blameless
It is important to remember this is a review and should follow a blameless approach. You’re attempting to find ways to improve your workload, not performing an audit or looking for the root cause of a specific failure. Root cause analyses (RCAs) can point you to areas that need improvement, but those should have been reviewed prior to the WAR.
Write it Down
The Well-Architected Review will likely show you where documentation is lacking. Use the conversations that occurred in the WAR to build out or update your documentation, runbooks, and playbooks. The action list may contain these items, or you may find them in the results of the Well-Architected Tool. Don’t fix your documentation during the review!
Technical and Business Contexts Change
Be mindful of the tradeoffs and goals before and after each WAR. As workloads evolve, priorities will change. You may have started with cost as your main concern when developing the workload, but that can easily shift to security, reliability, or any combination of pillars.
Prioritize and Execute
When you are finished with the review, you will have a list of items to be resolved. If you’re using the Well-Architected Tool, you can produce a report of those items for circulation among the stakeholders. Review this list with the WAR attendees in a short follow-up session. Prioritize the list based on your needs and your organizational goals. Assign ownership of the action items and get started becoming more Well-Architected.
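As a rough illustration of prioritizing by both risk and organizational goals, the sketch below orders findings first by risk level and then by the pillar ranking your organization agreed on. It assumes each finding is a plain dictionary with `risk` and `pillar` keys; the `prioritize` function, the `"HIGH"`/`"MEDIUM"` labels, and the sample findings are our own hypothetical choices, so adapt them to however you export your results.

```python
# Lower rank sorts first; any other risk label sorts after these.
RISK_ORDER = {"HIGH": 0, "MEDIUM": 1}


def prioritize(items, pillar_priority):
    """Sort findings by risk level, then by the organization's pillar ranking.

    `pillar_priority` lists pillars from most to least important for this
    workload; pillars not listed sort last within their risk level.
    """
    pillar_rank = {p: i for i, p in enumerate(pillar_priority)}

    def sort_key(item):
        return (
            RISK_ORDER.get(item["risk"], len(RISK_ORDER)),
            pillar_rank.get(item["pillar"], len(pillar_rank)),
        )

    # sorted() is stable, so ties keep the order in which they were recorded.
    return sorted(items, key=sort_key)


# Made-up findings for illustration only.
findings = [
    {"title": "No budget alerts", "pillar": "Cost Optimization", "risk": "MEDIUM"},
    {"title": "Single-AZ database", "pillar": "Reliability", "risk": "HIGH"},
    {"title": "Root account without MFA", "pillar": "Security", "risk": "HIGH"},
]
ranking = ["Security", "Reliability", "Cost Optimization"]

for item in prioritize(findings, ranking):
    print(item["risk"], "-", item["title"])
# HIGH - Root account without MFA
# HIGH - Single-AZ database
# MEDIUM - No budget alerts
```

Keeping the pillar ranking as an explicit input makes it easy to re-sort the same list when business priorities change between reviews.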
After you work through a couple of reviews, you may begin to see areas where your teams are struggling with regard to the five pillars and the associated design principles. Each team should read and review the Well-Architected Framework whitepaper as they make changes or work on fixing issues related to a WAR. This will help connect the improvement item to the best practices and principles in the whitepaper. It could also point to potential training needs, like an Immersion Day or GameDay.
Enjoy the Journey
Building and operating Well-Architected systems in the cloud can be challenging. If you’re looking for a partner or managed services provider to accelerate your cloud journey, need someone to lead an Immersion Day to help your team become more effective, or would like assistance leading a Well-Architected Review, we’d love to hear from you.