Why is architecting an application for the Cloud different from
on-prem?
QoS
Looking at the various quality-of-service (QoS) attributes of an application, the
traits of a Cloud app do not seem different from those of an on-prem app. Here is
the list of QoS attributes:
- Availability, Resiliency, and Fault Tolerance
- Testability and Manageability
- Performance
- Scalability
- Security (user data and app logs; infra and app)
- Flexibility and Extensibility
- Maintainability and Readability
- Usability and Accessibility
- Functionality and Correctness
However, some of these QoS characteristics, discussed below,
need special attention in the Cloud.
Fault Tolerance:
This needs to be applied wherever logic in one layer of the application connects
to an external system or to another layer, e.g. the app tier connecting to the
DB tier. This is also called transient fault handling. The connection might
fail on the first try but work on the next. The reason is that even though the
app tier pulls a connection from the pool, there may be a window of time in
which a specific DB connection, one that routes traffic to a specific node of
the DB cluster, has not been revalidated before it is handed to the app. When
the app then invokes the DB call, it might fail because that node is being
rebooted*. The DB cluster survives and remains fully functional, but that
specific call might fail.
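As an illustration of transient fault handling, here is a minimal Python sketch of a retry with exponential backoff. The names `TransientError`, `call_with_retry`, and `flaky_query` are hypothetical and do not come from any specific driver or SDK; a real application would catch whatever exception its DB client raises for connectivity failures.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a failure that may succeed on a later try."""


def call_with_retry(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run operation(); on TransientError, wait and try again.

    The wait doubles with each attempt (exponential backoff) and gets random
    jitter so that many callers do not retry at the same instant.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


# Stand-in for a DB call that fails once (node rebooting), then succeeds.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientError("node is rebooting")
    return "row"


# The sleep function is replaced with a no-op so the example runs instantly.
result = call_with_retry(flaky_query, sleep=lambda seconds: None)
```

Only failures known to be transient should be retried; retrying a genuine error (bad credentials, malformed query) just delays the inevitable.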
Resiliency: Each
layer of an application must plan for resiliency. For instance, each
process that needs a compute instance must run on at least two VMs (for
IaaS). Please note that PaaS already provides resiliency in its SLA, which is
another reason why apps should gravitate towards PaaS. Thus an on-prem
application that didn’t need to be clustered (e.g. a utility process
running on only one node) now needs to be. This brings a whole slew of
challenges. What if the process has to be a singleton? What if the process is legacy
and (to make the situation worse) there is no source code available for it? The
application has to make the singleton process cluster aware. For instance, the
legacy process could be wrapped in a script or program; a wrapper that performs
a health check and externalizes some data points could provide a solution. Each
layer has to have its own health check.
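The wrapper idea can be sketched in Python as follows. This is only a rough illustration under stated assumptions: `ProcessWrapper` is a hypothetical name, "healthy" here simply means "the child process is still running", and the coordination needed to keep a singleton running on exactly one node of a cluster (for example a shared lock or lease) is not shown.

```python
import subprocess
import sys


class ProcessWrapper:
    """Wraps an opaque (e.g. legacy) command and exposes a health check."""

    def __init__(self, command):
        self.command = command
        self.process = None
        self.restarts = 0

    def start(self):
        self.process = subprocess.Popen(self.command)

    def is_healthy(self):
        # poll() returns None while the child process is still running.
        return self.process is not None and self.process.poll() is None

    def ensure_running(self):
        """Start or restart the child if needed; return True if it was (re)started."""
        if self.is_healthy():
            return False
        if self.process is not None:
            self.restarts += 1  # a previous child existed and has died
        self.start()
        return True

    def status(self):
        # Data points externalized for a monitor or load balancer probe.
        return {"healthy": self.is_healthy(), "restarts": self.restarts}

    def stop(self):
        if self.process is None:
            return
        if self.process.poll() is None:
            self.process.terminate()
        self.process.wait()


# Stand-in for the legacy binary: a child Python process that just sleeps.
wrapper = ProcessWrapper([sys.executable, "-c", "import time; time.sleep(30)"])
wrapper.ensure_running()
snapshot = wrapper.status()
wrapper.stop()
```

A scheduler or monitoring agent would call `ensure_running()` periodically and publish `status()` wherever the other layers look for health information.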
High Availability:
The application must handle business continuity and disaster recovery (BCDR).
The application has to be datacenter (or region) agnostic or, even better,
Cloud provider agnostic. Thus a responsive copy of the application has to be
deployed in at least one other region. Many elements need to be thought
through, such as the recovery point objective (RPO) and the recovery time
objective (RTO). The key is data synchronization: user data, application data
(such as state and session), code, etc. A strategy also has to be determined:
Active/Active, Active/Passive, Active/Passive (only a maintenance page),
Active/Active (the second region running at reduced capacity and autoscaling
when traffic flows to it), etc. Notably, if the application replicates data
synchronously between geo regions, performance is going to be impacted.
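To see why synchronous geo-replication costs performance, consider a deliberately simplified model in Python. The function names and all the millisecond figures are made-up illustrations, not measurements; the model assumes the local commit, the network round trip, and the remote commit happen one after another.

```python
def sync_write_latency_ms(local_commit_ms, inter_region_rtt_ms, remote_commit_ms):
    """Synchronous replication: the caller waits until the remote region
    has acknowledged the write, so the round trip is on the critical path."""
    return local_commit_ms + inter_region_rtt_ms + remote_commit_ms


def async_write_latency_ms(local_commit_ms):
    """Asynchronous replication: the caller returns after the local commit.
    The price is a non-zero RPO, because writes still in flight can be
    lost if the primary region fails."""
    return local_commit_ms


# Hypothetical figures: 5 ms commits, 70 ms round trip between regions.
sync_ms = sync_write_latency_ms(5, 70, 5)
async_ms = async_write_latency_ms(5)
```

In this model every synchronous write pays the inter-region round trip, which is why the choice of replication mode is really a trade-off between write latency and RPO.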
Testability: Out
of the various forms of testing (detailed in http://theitjourney.blogspot.com/2015/01/phase-one-testing-strategy-to-migrate.html),
the infrastructure endurance test stands out.
Manageability:
The DevOps team’s infrastructure as code, in particular, needs to factor in
transient faults, especially around connectivity.
Performance: Repeated
testing and tuning should assist in determining the correct capacity of each
layer (which includes size of VMs) of the application.
Scalability: The
application has to utilize the autoscale features of the Cloud provider. You could
scale up or down (vertical) or in or out (horizontal), but in the Cloud scaling
in/out works best.
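A horizontal scaling decision can be sketched as a simple proportional rule. This is a hypothetical illustration in Python, not the API or algorithm of any Cloud provider's autoscale service; `desired_instances`, the load figures, and the limits are all assumptions. The default minimum of two instances reflects the resiliency point made earlier.

```python
import math


def desired_instances(current, observed_load, target_load, minimum=2, maximum=10):
    """Return how many instances to run so that the per-instance load
    moves back towards target_load, clamped to [minimum, maximum]."""
    if current < 1 or target_load <= 0:
        raise ValueError("current must be >= 1 and target_load must be > 0")
    wanted = math.ceil(current * observed_load / target_load)
    return max(minimum, min(maximum, wanted))


# Hypothetical loads, e.g. average CPU percentage per instance.
scale_out = desired_instances(current=4, observed_load=90, target_load=60)
scale_in = desired_instances(current=4, observed_load=15, target_load=60)
```

With these inputs the rule asks for more instances when the observed load is above the target and fewer when it is below, but never drops under the configured minimum.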
Security:
Security is a shared responsibility between the Cloud provider and the
customer. There are several layers of security (http://theitjourney.blogspot.com/2015/01/phase-one-security-strategy-to-migrate.html).
Access to the Cloud environment must be scrutinized, and appropriate policies
and governance have to be institutionalized.
* There are “planned” and “unplanned” updates occurring at the VM level (specific
to Azure). Please note that this problem is already handled by the managed PaaS
services.
Architecture for Cloud
- Not the same as on-prem
- SLA encompasses many individual SLAs
- Shared security
- Account for unplanned downtime of software components
- Scale Units
- Entails a different way of thinking!
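On the point that the application's SLA encompasses many individual SLAs: if a request needs every component in a chain to be up, and the components fail independently, the combined availability is the product of the individual ones. The Python sketch below illustrates this; the function names and the SLA figures are hypothetical, not quotes from any provider.

```python
import math


def composite_sla(availabilities):
    """Availability of a chain in which every component must be up,
    assuming the components fail independently of each other."""
    return math.prod(availabilities)


def downtime_minutes_per_30_days(availability):
    """Downtime allowed by an availability figure over a 30-day month."""
    return (1 - availability) * 30 * 24 * 60


# Hypothetical figures: a web tier at 99.95% and a database at 99.99%.
combined = composite_sla([0.9995, 0.9999])
allowed_downtime = downtime_minutes_per_30_days(combined)
```

The combined figure is always lower than the weakest individual SLA in the chain, which is why each added dependency has to be weighed against the availability target of the whole application.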
Scale Unit: The application capacity should be expressed in
terms of scale units. As an illustration, “DB Unit” is a scale unit used for
Azure SQL DB, “Stream units” for Azure Media Services, etc.