Preparing for the Unexpected Outages for Your Mission-Critical Cloud Infrastructure with Confidence

© 2024 by IJCTT Journal
Volume 72, Issue 8
Year of Publication: 2024
Authors: Prasad Gandham, Ramakanth Damodaram, Sunit Randhawa
DOI: 10.14445/22312803/IJCTT-V72I8P120

How to Cite?

Prasad Gandham, Ramakanth Damodaram, Sunit Randhawa, "Preparing for the Unexpected Outages for Your Mission-Critical Cloud Infrastructure with Confidence," International Journal of Computer Trends and Technology, vol. 72, no. 8, pp. 134-141, 2024. Crossref, https://doi.org/10.14445/22312803/IJCTT-V72I8P120

Abstract
Cloud computing is like a virtual computer city, providing the infrastructure to host your mission-critical applications without the burden of managing operational overhead. The reliability of any workload in a cloud environment depends on its architectural design, which raises a foundational question: “Are you designing your workloads to be resilient?” Resilience is a cornerstone of workload stability and of safeguarding access to sensitive data. The resilience of infrastructure components is pivotal, especially when they are entrusted with ensuring uninterrupted service. According to 2022 research by the Uptime Institute, the Global Digital Infrastructure Authority, one in five organizations across the globe has experienced a ‘severe’ or ‘serious’ outage, and over 60% of outages have cost the affected organizations at least US$100,000. This paper explains the importance of designing and implementing fault-tolerant architectures by exploring scenarios ranging from network connectivity failures to data center outages and regional outages in cross-region environments. It aims to prepare readers for incidents by offering methods to reduce the blast radius, protect the environment, and stay informed throughout the incident lifecycle. It provides a design process flow for applications, databases, and traffic management, which together form the nexus of user interaction. Additionally, it shares remediation strategies for resilient failover and offers guidance on defining service level agreements (SLAs) for mission-critical applications. Testing the system’s resilience against failures, whether through actual disruptions or simulated scenarios, is paramount. Planning for failure with best practices, architectural patterns, and orchestration techniques plays a pivotal role in fortifying the system against adversity.
This research provides a conceptual design that guides practitioners in safeguarding the integrity and continuity of mission-critical cloud infrastructure ecosystems against unplanned outages and hardware failures.
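The abstract refers to fault-tolerant design and to defining SLAs for mission-critical applications. As an illustration only, and not a method taken from the paper, the standard availability arithmetic behind such decisions can be sketched in Python. Under an assumed 30-day month, a 99.9% target leaves about 43.2 minutes of downtime. Chained dependencies multiply their availabilities, while independent redundant replicas (for example, deployments in separate zones) fail only when all of them fail. The function names and sample figures are hypothetical, and the redundancy formula assumes independent failures, which a regional outage can violate.

```python
"""Illustrative availability arithmetic for SLA planning (hypothetical sketch, not from the paper)."""

# Assumption: the SLA measurement period is a 30-day month (43,200 minutes).
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(availability: float,
                             period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Downtime budget implied by an availability target over the period."""
    return (1.0 - availability) * period_minutes


def serial(*availabilities: float) -> float:
    """Composite availability when every component must be up (chained dependencies)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def redundant(*availabilities: float) -> float:
    """Composite availability when any one replica suffices.

    Assumes replica failures are independent; correlated events such as a
    regional outage break this assumption and make the result optimistic.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical figures chosen for illustration only.
    app_tier = redundant(0.999, 0.999)   # application tier deployed in two zones
    stack = serial(app_tier, 0.9995)     # application tier depends on a single database
    print(f"application tier availability: {app_tier:.6f}")
    print(f"end-to-end availability: {stack:.6f}")
    print(f"monthly downtime budget: {allowed_downtime_minutes(stack):.1f} min")
```

The example shows why redundancy alone does not guarantee a high composite figure: the end-to-end availability can never exceed that of its least available serial dependency, so the database tier would need its own failover design as well.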

Keywords
Cloud computing, Resilience, Azure, Amazon Web Services, Infrastructure, Monitoring, Zones, Regions, Failover, Recovery.
