Classification of Toraja, Batak and Ambon Languages using Decision Tree and Gradient Boost methods

Bileam Mangalla, Suharyadi Suharyadi

Abstract


With its rich diversity of ethnicities, cultures, races, and religions, Indonesia has one of the largest numbers of regional languages in the world. This linguistic diversity often creates communication challenges, particularly when conveying information or conversing in text. This study aims to identify and classify the Toraja, Batak, and Ambon languages using machine learning. Two algorithms, Decision Tree and Gradient Boost, are applied, and the accuracy of each model is evaluated. Both models are effective for language identification, with accuracy above 77%. Based on confusion matrix analysis, Gradient Boost performs better, reaching an accuracy of 81.06% compared with 78.39% for Decision Tree. These findings suggest that Gradient Boost is the better choice for classifying these regional languages.
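As a rough illustration of how an accuracy figure can be read off a confusion matrix for the three languages, the following minimal Python sketch may help. The matrix counts, the `LABELS` list, and the helper names `accuracy` and `per_class_recall` are all assumptions made for this example; they are not the study's data or code.

```python
# Hypothetical 3x3 confusion matrix: rows are true languages, columns are predictions.
# The counts below are invented for illustration and are NOT the paper's results.
LABELS = ["Toraja", "Batak", "Ambon"]


def accuracy(matrix):
    """Return overall accuracy: diagonal (correct) counts over all counts."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return correct / total


def per_class_recall(matrix, labels):
    """Return {label: recall}, where recall is the diagonal count over the row total."""
    recalls = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recalls[label] = matrix[i][i] / row_total if row_total else 0.0
    return recalls


toy_matrix = [
    [40, 6, 4],   # true Toraja
    [5, 42, 3],   # true Batak
    [7, 5, 38],   # true Ambon
]

print(f"accuracy = {accuracy(toy_matrix):.4f}")
for label, recall in per_class_recall(toy_matrix, LABELS).items():
    print(f"{label}: recall = {recall:.2f}")
```

Per-class recall is included because overall accuracy alone can hide which language pairs a model confuses most often.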

Keywords


Classification; Data Mining; Language; Decision Tree; Gradient Boost



DOI: https://doi.org/10.32520/stmsi.v14i3.5100


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.