Big Data Infrastructure Design Optimization Using Hadoop Technologies Based on Application Performance Analysis

Shafiyah Shafiyah, Ahmad Syauqi Ahsan, Rengga Asmara

Abstract


Big data infrastructure is the technology that provides the ability to store, process, analyze, and visualize large volumes of data. Choosing the tools and applications is one of the challenges in building such an infrastructure. In this study, we proposed a new strategy for optimizing big data infrastructure design, an essential part of big data processing, by analyzing the performance of the applications used at each stage of processing. The process started with collecting data from online news sources by web crawling with Scrapy and Apache Nutch. Next, Hadoop technologies were implemented to distribute big data storage and computation. The NoSQL databases MongoDB and HBase made the data easier to query, and search engines were then built with Elasticsearch and Apache Solr. The stored data was subsequently analyzed using Apache Hive, Pig, and Spark. The analyzed data was displayed on a website using Zeppelin, Metabase, Kibana, and Tableau. The test scenarios varied the number of servers and the number of files used. The test parameters included processing speed, memory usage, CPU usage, and throughput, among others. The performance test results of the applications were compared and analyzed to identify the strengths and weaknesses of each, as a reference for building an optimal infrastructure design that meets users' needs. This research produced two alternative big data infrastructure designs. The proposed infrastructure has been implemented on computer nodes at Big Data PENS to process big data from online media and has been shown to run well.
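As a rough illustration of the kind of measurement described above, the following Python sketch records wall-clock time, CPU time, peak memory, and throughput for two interchangeable implementations of the same task. It is not the test harness used in the study: the `measure` helper, the `Metrics` record, and the two word-counting tasks are hypothetical stand-ins, and only the Python standard library is assumed.

```python
import time
import tracemalloc
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Metrics:
    """One row of a comparison table (hypothetical structure)."""
    name: str
    records: int
    wall_seconds: float
    cpu_seconds: float
    peak_memory_bytes: int

    @property
    def throughput(self) -> float:
        """Records processed per wall-clock second."""
        if self.wall_seconds <= 0:
            return float("inf")
        return self.records / self.wall_seconds


def measure(name: str,
            task: Callable[[List[str]], object],
            records: List[str]) -> Metrics:
    """Run `task` once and collect speed, CPU, and memory figures."""
    tracemalloc.start()                      # track Python allocations
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task(records)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return Metrics(name, len(records), wall_seconds, cpu_seconds, peak)


# Two stand-in "applications" that perform the same job differently.
def count_words_loop(docs: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for doc in docs:
        for word in doc.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def count_words_counter(docs: List[str]) -> Dict[str, int]:
    return dict(Counter(word for doc in docs for word in doc.split()))


if __name__ == "__main__":
    # Synthetic input; the study used crawled online news instead.
    docs = ["big data hadoop spark hive pig"] * 10_000
    for name, task in [("loop", count_words_loop),
                       ("counter", count_words_counter)]:
        m = measure(name, task, docs)
        print(f"{m.name}: {m.wall_seconds:.4f} s wall, "
              f"{m.cpu_seconds:.4f} s CPU, "
              f"{m.peak_memory_bytes} B peak, "
              f"{m.throughput:.0f} records/s")
```

In a real comparison, the same idea would be applied per stage (crawler, database, search engine, analysis engine) and repeated across the server and file counts of each test scenario, with system-level monitoring replacing the in-process counters used here.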


DOI: https://doi.org/10.32520/stmsi.v11i1.1510


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.