Implementation of Principal Component Analysis (PCA) and Gap Statistic for Breast Cancer Clustering with the K-Means Algorithm

Ridha Afifa, Muhammad Itqan Mazdadi, Triando Hamonangan Saragih, Fatma Indriani, Muliadi Muliadi

Abstract


Breast cancer is one of the most common causes of death worldwide. Data mining, which extracts information from data to provide useful insights, can be used to help detect it. Clustering breast cancer data helps medical professionals group the characteristics of each cancer type. However, multicollinearity in breast cancer data can distort clustering results. To address this, the study applies dimensionality reduction through Principal Component Analysis (PCA), which handles multicollinearity and improves computational efficiency. K-Means has a further limitation: it does not determine the optimal number of clusters on its own. The Gap Statistic is therefore used to find the optimal K for the breast cancer data. This study compares the evaluation results of three clustering models: K-Means alone, the combined PCA-KMeans model, and the combined PCA-GapStatistic-KMeans model. The findings indicate that K-Means with PCA dimensionality reduction and the optimal K from the Gap Statistic evaluates better than K-Means without dimensionality reduction. The Gap Statistic suggests 2 clusters as the optimal number, with an evaluation result of 1.195513.
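As an illustration of how a Gap Statistic can select K for K-Means, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation. The uniform bounding-box reference distribution, the selection rule (smallest k with Gap(k) >= Gap(k+1) - s(k+1)), the helper names (`kmeans_wss`, `gap_statistic`, `demo_points`), and the synthetic two-blob data are all assumptions made for this example. The PCA step is omitted; in the pipeline described above, the input points would be the PCA-reduced features.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wss(points, k, rng, n_init=5, max_iter=50):
    """Run Lloyd's K-Means n_init times; return the lowest within-cluster sum of squares."""
    best = math.inf
    for _ in range(n_init):
        centers = rng.sample(points, k)  # random distinct points as initial centers
        for _ in range(max_iter):
            # Assignment step: group each point with its nearest center.
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
                groups[nearest].append(p)
            # Update step: move each center to the mean of its group
            # (an empty group keeps its previous center).
            updated = [
                tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                for j, g in enumerate(groups)
            ]
            if updated == centers:
                break
            centers = updated
        wss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, wss)
    return best


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (best_k, gaps, errors), with gaps and errors listed for k = 1..k_max."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    tiny = 1e-12  # guards log(0) if a clustering fits the data exactly
    gaps, errors = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_wss(points, k, rng) + tiny)
        # Reference sets: uniform samples over the bounding box of the data (assumption).
        ref_logs = []
        for _ in range(n_refs):
            ref = [
                tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                for _ in points
            ]
            ref_logs.append(math.log(kmeans_wss(ref, k, rng) + tiny))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        errors.append(sd * math.sqrt(1 + 1 / n_refs))
    # Selection rule: smallest k with Gap(k) >= Gap(k+1) - s(k+1); fall back to k_max.
    for i in range(k_max - 1):
        if gaps[i] >= gaps[i + 1] - errors[i + 1]:
            return i + 1, gaps, errors
    return k_max, gaps, errors


def demo_points(seed=42, n_per=20, sd=0.5):
    """Hypothetical toy data: two well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 10.0)]
    return [
        tuple(rng.gauss(c, sd) for c in center)
        for center in blob_centers
        for _ in range(n_per)
    ]


if __name__ == "__main__":
    best_k, gaps, _ = gap_statistic(demo_points(), k_max=4)
    print("suggested K:", best_k)
    print("gap values:", [round(g, 3) for g in gaps])
```

The gap for each k is the mean log within-cluster dispersion of the reference sets minus that of the real data, so a large gap means the data cluster much more tightly than structureless data would. On real data, the number of reference sets and K-Means restarts would normally be larger than the small defaults used here.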






DOI: https://doi.org/10.32520/stmsi.v13i5.4015




Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.