
Running workloads in the cloud provides a number of advantages to businesses. Because the business does not own the hardware the workloads run on, those workloads can be managed differently than workloads running in a traditional data center.
Businesses no longer have to worry about capital infrastructure costs; cloud costs are typically operational and can be adjusted as needed. Because those costs are ongoing, cost optimization should be performed on a periodic basis to keep them down.
With cloud infrastructure, a business pays for what it uses. There’s no need to pay upfront for hardware and risk over- or under-provisioning for an anticipated workload. Infrastructure can be turned off when not in use or resized as the workload increases or decreases. Workloads usually follow a predictable pattern: some are more active during weekday work hours, and some are more active during the holiday shopping season. However, optimizing these workloads should not be a guessing game; decisions should be made using data and metrics from the running infrastructure.
Cloud services have monitoring turned on by default for certain metrics. For example, AWS sends EC2 instance CPU and network usage to CloudWatch automatically. Memory and disk usage statistics, however, require an additional agent (the CloudWatch agent) running on the instance, or an additional container running alongside the workload in a container orchestration service. These are lightweight services that provide extremely valuable insight into a workload. With this information logged, a cost-conscious user can chart infrastructure usage over time and determine how a workload operates in the cloud.
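As a rough sketch of what pulling that default data can look like, the Python snippet below uses boto3 to retrieve two weeks of hourly CPU utilization for a single EC2 instance from CloudWatch; the region and instance ID are placeholder assumptions.

```python
# Sketch: pull two weeks of hourly CPU utilization for one EC2 instance
# from CloudWatch's default metrics. Region and instance ID are placeholders.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=now - timedelta(days=14),
    EndTime=now,
    Period=3600,                       # one data point per hour
    Statistics=["Average", "Maximum"],
)

# Each data point carries the hour's average and peak CPU percentage.
datapoints = sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
for point in datapoints:
    print(point["Timestamp"], round(point["Average"], 1), round(point["Maximum"], 1))
```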
It is important to capture each workload’s average and maximum usage for these metrics to determine how to proceed with cost optimization. What are the high- and low-usage times for these workloads? Are there times when a server seems completely inactive? Do the servers reach their maximum capacity on any of these metrics? If so, when does that occur, and how often? This information alone isn’t enough to make a decision; it must be taken back to the teams responsible for the workloads and reviewed.
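Building on the snippet above, answering those questions can start with a simple summary of the retrieved data points. The idle and saturation thresholds below are illustrative assumptions, not recommendations.

```python
# Sketch: summarize hourly CloudWatch datapoints (shaped like those returned in
# the previous snippet) and flag the hours worth reviewing with the owning team.
# The 5% idle and 90% saturation thresholds are illustrative assumptions.
IDLE_CPU_PERCENT = 5.0
SATURATED_CPU_PERCENT = 90.0


def summarize_usage(datapoints):
    """Report average, peak, idle hours, and near-capacity hours."""
    averages = [p["Average"] for p in datapoints]
    overall_average = sum(averages) / len(averages)
    overall_peak = max(p["Maximum"] for p in datapoints)
    idle = [p for p in datapoints if p["Average"] < IDLE_CPU_PERCENT]
    saturated = [p for p in datapoints if p["Maximum"] > SATURATED_CPU_PERCENT]
    print(f"Average CPU: {overall_average:.1f}%   Peak CPU: {overall_peak:.1f}%")
    print(f"Idle hours: {len(idle)}/{len(datapoints)}   "
          f"Near-capacity hours: {len(saturated)}/{len(datapoints)}")
    return idle, saturated


# Example with the datapoints retrieved in the previous snippet:
# summarize_usage(datapoints)
```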
If servers are running but sit idle on a predictable, periodic basis, companies should consider turning them off so the business isn’t paying for them. If workloads are not using the full potential of their hardware and usage only peaks periodically, businesses should consider setting up an autoscaling group, keeping the number of active servers low until demand requires more, or larger, servers to be spun up. Infrastructure-as-code solutions are well integrated with cloud services and should be used to control when and how cloud infrastructure is deployed.
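As one way of putting such a decision into practice, the sketch below uses boto3 to attach scheduled scaling actions to a hypothetical Auto Scaling group so capacity drops overnight and returns for the workday. The group name, sizes, and cron schedules are placeholders, and in practice the same schedule would usually be captured in an infrastructure-as-code template rather than an ad hoc script.

```python
# Sketch: scale an Auto Scaling group down overnight and back up for the workday.
# Group name, sizes, and schedules are hypothetical placeholders.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Shrink to a single instance at 8 PM UTC every weekday.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-app-asg",
    ScheduledActionName="scale-down-overnight",
    Recurrence="0 20 * * 1-5",
    MinSize=1,
    MaxSize=2,
    DesiredCapacity=1,
)

# Restore working-hours capacity at 7 AM UTC every weekday.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-app-asg",
    ScheduledActionName="scale-up-for-workday",
    Recurrence="0 7 * * 1-5",
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=4,
)
```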
When an organization has a small number of assets to manage in the cloud, using the cloud provider’s default monitoring tools to optimize cost may be feasible. However, as its cloud footprint grows, an organization may have thousands of servers running, and checking on each workload may become too tedious and may not be cost-effective. All workloads can be optimized, but some are better candidates for optimization than others. A savvy user could recognize this and home in on the larger, more resource-intensive workloads, but there are other tools that can be utilized.
Third-party cost-optimization tools play a big role here. These tools take the default metrics from the cloud, use the cloud provider’s additional agents, or deploy their own custom agents to pull relevant data from an organization’s infrastructure. They then apply custom algorithms to highlight the biggest cost-saving opportunities, presenting them on a user-friendly dashboard, sending notifications, and offering API access for custom scripting.
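The exact interfaces vary by tool, but as a stand-in for that kind of API-driven scripting, the sketch below uses the AWS Cost Explorer API to rank last month’s spend by service; it illustrates the general idea rather than any particular vendor’s product.

```python
# Sketch: rank last month's AWS spend by service with the Cost Explorer API,
# a stand-in for the custom-scripting access a cost-optimization tool might expose.
from datetime import date, timedelta

import boto3

ce = boto3.client("ce", region_name="us-east-1")

end = date.today().replace(day=1)                 # first day of this month
start = (end - timedelta(days=1)).replace(day=1)  # first day of last month

response = ce.get_cost_and_usage(
    TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

groups = response["ResultsByTime"][0]["Groups"]
groups.sort(key=lambda g: float(g["Metrics"]["UnblendedCost"]["Amount"]), reverse=True)

# Print the five services contributing the most to last month's bill.
for group in groups[:5]:
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{group['Keys'][0]}: ${amount:,.2f}")
```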
With any of these options available, users can begin to review their cloud infrastructure and see where they could be saving money in the cloud without reducing their workloads’ performance.
Learn more about Presidio Cloud Solutions.