Making the Cloud Decision
Recently, Unicon has been engaging with a number of clients on cloud migrations. In some cases, those engagements have been focused on executing migrations to private or public clouds. In other cases, Unicon starts with the question of whether a service should be migrated to the cloud and, if so, how far the migration should go: just the infrastructure, or the management of the environment and services as well? These are all complex conversations - in some cases driven by external pressures on the IT decision makers to reduce costs or improve service quality. In other cases, existing staff is simply overloaded, there is a desire to shed work not in line with core competencies, or there is recognition that there are gaps in the ability to deliver necessary service levels. More often than not, factoring in the various elements that inform a decision results in no clear or obvious answer.
The goal of this article is to help IT decision makers separate and address strategic elements to create a plan when approaching cloud and/or managed services transitions. It is broken down into segments covering each element: strategic, financial, architecture, security, process, and people.
Note: For the purposes of this discussion we will interpret the term "cloud" broadly, to include Infrastructure as a Service (IaaS), as well as Platform and Software as a Service (PaaS and SaaS) and associated vendor-provided service management.
The Strategic Element
Our first question usually considers the overall goals of the institution or business and the IT strategy that supports those goals. Hopefully, the organization has a well-articulated technology strategy that translates the business goals into technology guidelines. These often include goals and positions such as: technology stack goals (e.g. homogeneous environment or best-of-breed/heterogeneous), service level goals, build/buy/partner positions, security and privacy risk position, open source position, and default position on time-to-delivery versus cost versus quality trade-offs. To cite some examples of different strategy profiles, a non-profit organization may choose to leverage open source to the maximum extent possible, build any integrations or solutions internally with limited staff, and seek to minimize costs through a uniform tech environment (e.g. a LAMP variant) even if it means going slower on rolling out new capabilities. By contrast, an organization with a strong capital funding position and emerging growth opportunities and/or competitive threats might choose to spend its way past scale and growth issues. It may not invest time removing scale bottlenecks because time-to-market is crucial to meeting the larger organization's growth goals. At the outset of a decision-making process, it helps to use the organization's strategic positions (both IT and business/institutional) to state a concise set of guiding principles. Used as litmus tests or touchstones throughout the process, these principles will help keep those involved on point. "Business and IT Alignment" surely has a strong presence in the management vernacular today - by incorporating the strategic elements above, one can help fulfill that alignment and check the buzzword compliance box too.
It is important to note, however, that not all organizations have a formal or even semi-formal technology strategy. In those cases, a cloud decision process can help set the stage for a better strategic position. This affords an opportunity to discuss financial and service level themes with other stakeholders in the organization and could begin to form the foundation for a set of guiding principles for broader IT decision making. Examining other recent technology decisions and the associated stakeholder conversations can also help tease out the starting points of a strategy.
The Financial Element
Often enough we find that the discussion about cloud is driven by financial pressures or even "the CFO/CBO is making me." While one could conduct a purely financially driven analysis based on TCO and/or ROI, actually quantifying service and security related risks in financial terms is difficult at best. Nevertheless, the economics of the solution are still very important. One should build a cost model to gain a clear picture of the costs and compare it to the present solution or other alternatives.
A factor unique to cloud financial modeling is the ability to factor in variations in demand that result in highly variable needs for compute infrastructure. Education often sees substantial swings in demand across the academic year and cloud solutions that ramp resources up and down based on demand can result in cost savings. On the other hand, we have seen large deployments with generally even, predictable loads for which the financial model for a three year capital acquisition actually made more sense - the supporting infrastructure already existed (data center and related infrastructure, systems administration staff) and had adequate capacity.
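The trade-off between variable and steady demand can be illustrated with a very small cost model. The sketch below is only an illustration: the demand profiles, hourly rate, purchase price, and operating cost are all hypothetical figures, and the function names are invented for this example. It also ignores factors such as data center space and staff that already exist, which were decisive in the steady-load case described above.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def elastic_annual_cost(servers_per_month, hourly_rate):
    """Pay-as-you-go: each month is billed only for the servers running that month."""
    return sum(n * hourly_rate * HOURS_PER_MONTH for n in servers_per_month)


def fixed_annual_cost(servers_per_month, purchase_price, annual_opex, years=3):
    """Owned capacity: sized for the busiest month, purchase price spread over `years`."""
    servers = max(servers_per_month)
    return servers * (purchase_price / years + annual_opex)


# Hypothetical demand profiles: servers needed in each month of one year.
seasonal = [4, 4, 10, 10, 6, 6, 10, 10, 6, 2, 2, 8]  # swings across an academic year
steady = [8] * 12                                     # even, predictable load

for name, profile in (("seasonal", seasonal), ("steady", steady)):
    # Hypothetical prices: 0.50 per server-hour; 6000 per server plus 1000/year to run it.
    cloud = elastic_annual_cost(profile, hourly_rate=0.50)
    owned = fixed_annual_cost(profile, purchase_price=6000, annual_opex=1000)
    print(f"{name}: elastic={cloud:,.0f} fixed={owned:,.0f}")
```

With these made-up numbers the elastic option is cheaper for the seasonal profile and the fixed option is cheaper for the steady one. Real inputs will shift the crossover point, so the value of the exercise is in the comparison rather than in the specific totals.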
For mission-critical applications, disaster recovery (DR)/business continuity plan (BCP) requirements can substantially impact costs. Many cloud solutions have strong offerings from which DR/BCP solutions can be built. Related to the financial benefits of dynamic capacity mentioned above, DR solutions where few to no instances are actually provisioned until needed can substantially reduce the cost of maintaining DR capacity. Note, however, that the capabilities of cloud providers vary widely with respect to the building blocks for a DR plan. In the architecture section, we will touch briefly on some of these issues.
Lastly, there are a number of purely financial considerations that will be unique to each business or institution. Will de-commissioning existing infrastructure result in a write-down on existing assets or can those be re-purposed? Also, for organizations that rely heavily on capital investments and depreciation, moving to a cost structure where IT costs are expenses against the operating budget can have a substantial impact on cash flow. Be sure to engage finance and budget managers on these topics.
The Architecture Element
Architecture addresses the non-functional requirements of systems. Important non-functional requirements include reliability, availability, and scalability, often referred to as RAS. It is important to understand the RAS requirements of the service(s) under consideration, how those needs are met today, and how the underlying cloud infrastructure supports those constructs.
Questions to consider:
- Can the app scale out horizontally, and if so, which tiers?
- If there is a scale-up only tier, what is the limit of that component (CPU, memory, I/O, bandwidth, etc.) and what are its limits in the proposed cloud infrastructure?
- If the service requirements are high availability (HA) 24x7, is that 99.9% or higher availability?
- What cloud and application architecture approaches to achieve HA exist and are they compatible?
Today, few cloud services include Service Level Agreement (SLA) clauses, although the design and operations of the offerings are consistent with four or five nines of availability (Note: Expect competitive pressures to change the SLA story over time). The track record of the provider, if available, can give some indication of its ability to deliver on availability requirements, but some organizations will still need committed SLAs from their cloud provider.
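To make the availability percentages above concrete, it can help to translate them into permitted downtime. The following is a minimal sketch; the function name is invented for this example, and it assumes a 365-day year with no exclusions for scheduled maintenance, which a real SLA may treat differently.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year


def allowed_downtime_hours(availability_percent):
    """Hours of downtime per year that still meet the given availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(target)
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Three nines allows a little under nine hours of downtime per year, while five nines allows only a few minutes, which is why the number of nines matters so much when comparing a provider's commitments to the service requirement.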
Disaster recovery for mission-critical applications is an important RAS consideration. Cloud can provide substantially reduced investment levels to achieve DR, even using traditional "stand by" types of approaches. Native cloud capabilities, however, can provide simplified DR foundations, handling data replication, name resolution, and routing if the app can take advantage of the cloud platform that offers such capabilities. As with any DR plan, the means for regular testing need to be considered.
There are several other dimensions to architectural aspects when considering cloud. For the applications and services under consideration, do the application architectures lend themselves to the cloud environment under consideration? The tech stack across all tiers of the app needs to be considered along with how well the cloud environment supports these. As examples, if a clustered caching tier is present that relies on IP multicast, can the cloud environment support the necessary networking and clustering or will other means need to be used for scaling and performance? Sometimes the consideration is as mundane as database or storage size. Many cloud vendors have a limit on the sizes of a storage volume or database size for various persistence solutions. If you have a large app with a 2TB database and the cloud provider limit is 1TB, can the application's db be sharded or federated to fit into these constraints or not? High transaction volume applications can provide specific challenges where horizontally scaled database tiers rely on high speed node inter-connects and high speed access to shared storage. Alternatively, if new applications are under consideration or development, make sure they are architected to take advantage of cloud environments. Applications should have small, easily deployed and scaled services, data persistence technologies that fit cloud deployments, and packaging and deployment automation to simplify provisioning.
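The storage-limit question above (a 2TB database against a 1TB provider limit) comes down to simple arithmetic once growth is taken into account. The sketch below is an illustration only: the function name and the headroom parameter are assumptions made for this example, and it presumes the data can be split evenly across shards, which depends entirely on the application.

```python
from math import ceil


def shards_needed(database_tb, volume_limit_tb, headroom=0.0):
    """Minimum number of shards so each one fits under the provider's volume limit.

    `headroom` is the fraction of each volume kept free for growth (hypothetical knob).
    Assumes data can be distributed evenly across shards.
    """
    usable_tb = volume_limit_tb * (1 - headroom)
    return ceil(database_tb / usable_tb)


print(shards_needed(2.0, 1.0))                # exactly at the limit, no room to grow
print(shards_needed(2.0, 1.0, headroom=0.2))  # keep 20% of each volume free
```

Reserving headroom increases the shard count, so the sharding or federation effort should be sized against expected growth and not only against the current database size.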
Additionally, the overall operating environment needs to be considered: are candidate applications fully isolated, are they integrated with other systems, or do they play a role in a complex workflow or dataflow? In some cases, the risks associated with delivering required service levels where integrated components straddle cloud and private/dedicated environments are too large and the "as is" deployment continues to make sense. On the other hand, well-isolated systems and applications may be attractive targets for cloud migration. Even in complex integration scenarios, however, the advantages of cloud may outweigh the complexities. Referring back to the seasonal demand issues in educational apps, migrating a learning management system to the cloud may have such great scale up/down advantages (performance and cost) that the cloud remains very attractive despite the complexities of integrations with identity stores and campus information systems.
Hybrid cloud solutions which mix fixed, dedicated resources (physical or virtual) with dynamically deployable resources carry some similar characteristics with highly integrated environments. The deployment and management complexity can be higher, requiring investment in automated provisioning/de-provisioning (as well as understanding the metrics that should trigger scale up/down) to make the approach effective. For highly variable workloads or rapidly increasing demands where there is already fixed capacity, hybrid approaches might be effective, but engineering and management complexity is generally high.
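The scale up/down triggers mentioned above can be sketched as a simple threshold rule. This is a hypothetical illustration, not any provider's autoscaling API: the function name, the choice of CPU utilization as the metric, and all threshold values are assumptions. The fixed, dedicated capacity acts as the floor, and dynamically provisioned instances are added on top of it.

```python
def desired_instances(current, cpu_utilization, *, fixed_capacity=2,
                      max_instances=10, scale_up_at=0.75, scale_down_at=0.30):
    """Return the instance count to run next, given average CPU utilization (0.0-1.0).

    The gap between the two thresholds avoids flapping between scale-up and scale-down.
    """
    if cpu_utilization > scale_up_at:
        return min(current + 1, max_instances)   # add one burst instance, up to the cap
    if cpu_utilization < scale_down_at:
        return max(current - 1, fixed_capacity)  # never drop below the dedicated tier
    return current                               # inside the band: leave capacity alone


print(desired_instances(4, 0.90))  # busy: grow
print(desired_instances(4, 0.10))  # idle: shrink
print(desired_instances(2, 0.10))  # idle, but already at the fixed floor
```

Choosing the metric and thresholds is the hard part in practice. The rule itself is small, but it only works if the provisioning and de-provisioning steps it triggers are fully automated.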
The Security Element
As with any environment, security is a complex landscape and most often a question of acceptable risk profile or tolerance. The considerations are similar to those for any hosting environment - physical security of the provider's facilities, network and compute infrastructure security, data security, application security, and security operations and processes. While there are examples of deploying applications to public clouds with stringent physical, network, and data security requirements (e.g. to meet HIPAA or PCI DSS compliance), it can be challenging to determine if the cloud provider's implementations meet risk tolerances for secure computing requirements. While SAS 70 Type II audits are good, they are still limited to an inspection of the IT controls that the provider claims to practice. One needs to do the homework on the claimed controls to determine if they meet an organization's particular needs.
It is also necessary to delve into the particulars of the applications and data that are under consideration for cloud migration. Ideally, the organization's IT security policy, including data classifications and security requirements, can be used as a guideline to inform the cloud migration decision. Applications serving or handling only publicly available data (e.g. public web presence, course catalog) may be more amenable to early cloud migration than those containing sensitive (data loss/compromise will cause limited damage) and/or restricted (data loss/compromise will cause substantial damage) classes of data. The characteristics of the application should also be recognized.
Questions to consider:
- Is this a legacy application with vulnerabilities that cannot be patched and therefore require unique protections?
- Are security vulnerabilities discovered frequently for the application?
- Are any of the applications or the organization/institution the targets of cyber-attacks such as DoS/DDoS, targeted or brute force attacks?
Repelling DDoS is an expensive proposition for anyone, but the "shared investment" of a large public cloud infrastructure can be an additional tool in the battles against cyber-attacks. Other application security characteristics may make public cloud deployment difficult. In some cases, even though a case can be made that a public cloud environment can be made secure enough, cloud just does not fit the risk tolerance of the organization, while traditional operations, private cloud, or managed services with full transparency into the environment and processes do.
The Process Element
The cloud decision process should include an examination of the internal IT processes used to deliver services today along with the anticipated changes that will result from a migration. Typical processes that should be considered are change management, service management (including incident and problem management and service metrics), service monitoring and alerting, capacity planning, and budget planning/tracking. Understanding the provider's processes early on is important – do the provider's standard and emergency maintenance processes fit with existing service commitments or will these need to be adjusted? What transparency into change management, problem management, post-incident review and root cause analysis is available from the provider? If the existing stakeholders are accustomed to high levels of operational transparency, consider whether the cloud environment under consideration supports those levels of transparency or not. If not, early dialog with the service stakeholders will help reduce the level of surprise when something does inevitably go wrong.
The People Element
Of course, a substantial consideration for any kind of change is the people that are involved and affected. Some technical staff members will embrace cloud as an opportunity to stay current while others will perceive a threat to their future and career. It is important to consider the people-related aspects when considering cloud – what will the people change management look like? The executive sponsor needs to carry the message and provide the context for "why" and "why now," identify advocates and nay-sayers and effectively manage them, and appropriately engage people in the decision-making process. In other words, all the leadership skills of the organization will likely be called upon as part of a cloud initiative, especially if it becomes broad in scope.
There are, however, some aspects of the people side that may be a bit unique. Selecting and managing a cloud provider may stretch the vendor management skills of the team. Other new skills may need to be developed or extended, including deployment automation tooling; new approaches to data persistence; alternate scale-out approaches, which might include content delivery networks or other app off-loading; and cloud-based caching. There very well could be difficult decisions regarding the existing levels of staffing for a number of skill sets, and the issues of retention of key resources, re-training, and/or staff reductions must be considered early on.
Conclusions and Recommendations
A variety of challenges and pressures are making cloud migration a central discussion in organizations regarding the delivery of technology services. These conversations are not limited to IT discussions but transcend business, finance, and customer/end user groups. In order to make well-informed decisions regarding cloud and other "as a Service" opportunities, a number of decision-making elements must be considered. The main elements in such a decision have been presented along with a sample of the issues, considerations, and conversations related to each of the elements. Key among those is examining the overall strategic goals of the organization or institution, and understanding how these decisions fit and align with the IT strategy. The overall IT landscape and architecture will inform decision making as will service requirements (RAS), financial factors, security, processes, and people.
There have been enough high-profile service interruptions (and resulting losses of data, revenue interruption, etc.) to put a damper on the "just throw it up in the cloud, everyone's doing it, it can't be that hard" mindsets and external pressures. Thankfully, this has helped erode some of the naiveté regarding cloud (depending of course on whether one was an observer or a participant in those incidents). Unicon has been lucky enough to participate in some of those high-profile incidents and bring some of the lessons learned to clients and partners. By considering the elements described above and engaging business/finance and service consumer/stakeholders around these, an organization can make a more fully informed decision regarding the infrastructure and operational management of its technology assets. Considering the elements described above may even highlight low-stakes opportunities to gain knowledge and traction, some of which might even be high pay-off projects. The conversations outlined also lay the groundwork for more extensive buy-in to the decisions that are made and help chart a course for executing on the resulting decisions.