Big Data Engineering Using Hadoop and Cloud (GCP/AZURE) Technologies

Shrikaa Jadiga

doi:10.14445/22312803/IJCTT-V72I8P109

Research Article | Open Access

Volume 72 | Issue 8 | Year 2024 | Article Id. IJCTT-V72I8P109 | DOI : https://doi.org/10.14445/22312803/IJCTT-V72I8P109

Received	Revised	Accepted	Published
12 Jun 2024	17 Jul 2024	07 Aug 2024	29 Aug 2024

Citation :

Shrikaa Jadiga, "Big Data Engineering Using Hadoop and Cloud (GCP/AZURE) Technologies," International Journal of Computer Trends and Technology (IJCTT), vol. 72, no. 8, pp. 60-69, 2024. Crossref, https://doi.org/10.14445/22312803/IJCTT-V72I8P109

Abstract

Big Data Engineering is crucial in today’s data-driven society, where managing vast amounts of data is key to business success. This paper explores the integration of Hadoop with cloud technologies, specifically Google Cloud Platform (GCP) and Microsoft Azure, to address Big Data challenges. Through its core components, HDFS, MapReduce, and YARN, Hadoop provides a robust framework for the distributed storage and processing of large datasets. Cloud platforms such as GCP and Azure offer scalability, cost-effectiveness, and flexibility, making them well suited to Big Data applications. They support a range of Big Data tools and provide secure, compliant environments for data processing. By leveraging these technologies, organizations can enhance their data processing capabilities, manage resources more effectively, and gain valuable insights from their data. This integration not only optimizes performance but also ensures efficient handling of Big Data, paving the way for innovative solutions and competitive advantages.

Keywords

Big data, Hadoop ecosystem, Cloud technologies, Scalability and flexibility, BigQuery.
