Information Retrieval Method for the Qur’an based on FastText and Latent Semantic Indexing

Aziz Ramadhan, Fandy Setyo Utomo

Abstract


Retrieving contextually relevant verses from a dataset of Al-Qur'an translations is difficult because of the linguistic richness and semantic variation of the text. This study aims to improve the accuracy and relevance of information retrieval on such a dataset by combining Latent Semantic Indexing (LSI) with FastText word embeddings. The proposed method consists of the following steps: text preprocessing (lowercasing, punctuation removal, stopword elimination, and stemming), tokenization and vocabulary creation, Bag-of-Words (BoW) representation, construction of the LSI model, conversion of the documents to FastText vectors, and combination of the LSI and FastText vectors. A similarity index is then built from the combined vectors, user queries are processed in the same way, and documents are ranked by cosine similarity to the query. The method was tested on a dataset of 6236 translated verses from 114 surahs, with promising results. The combined approach captures both broader semantic structure (through LSI) and detailed word meaning (through FastText), and returns more accurate and contextually relevant search results. Key findings: 90% of retrieved verses were highly relevant to the user query, accuracy improved to 85%, and handling of synonyms and morphological variations reached 88%. Recommended further development includes parameter optimization, advanced preprocessing techniques, real-time search optimization, integration of contextual embeddings, and multilingual support to improve search performance and accuracy.
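The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions: the four documents are placeholder sentences rather than verses from the dataset, stemming is omitted, LSI is computed with a plain truncated SVD in NumPy, and `subword_vector` is a hypothetical stand-in that hashes character n-grams instead of loading pretrained FastText vectors. The abstract does not say how the two vectors are combined, so concatenating the L2-normalized parts is also an assumption.

```python
import re
import zlib
import numpy as np

# Tiny illustrative stopword list (assumption; the study's list is not given).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "who", "those", "with"}

def preprocess(text):
    """Lowercase, strip punctuation, drop stopwords (stemming omitted here)."""
    tokens = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [t for t in tokens if t not in STOPWORDS]

def bow_vector(tokens, vocab):
    """Bag-of-Words count vector over a fixed vocabulary."""
    vec = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            vec[vocab[t]] += 1.0
    return vec

def subword_vector(word, dim=16, n=3):
    """Stand-in for a FastText vector: mean of hashed character n-gram vectors."""
    padded = f"<{word}>"
    grams = [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]
    vecs = [np.random.default_rng(zlib.crc32(g.encode())).standard_normal(dim)
            for g in grams]
    return np.mean(vecs, axis=0)

def embed(tokens, dim=16):
    """Document embedding: mean of the word vectors."""
    if not tokens:
        return np.zeros(dim)
    return np.mean([subword_vector(t, dim) for t in tokens], axis=0)

def unit(v):
    """L2-normalize, leaving an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

class HybridIndex:
    def __init__(self, docs, num_topics=2, dim=16):
        self.dim = dim
        tokenised = [preprocess(d) for d in docs]
        words = sorted({t for doc in tokenised for t in doc})
        self.vocab = {w: i for i, w in enumerate(words)}
        # Term-document matrix (terms x documents), then truncated SVD for LSI.
        A = np.stack([bow_vector(t, self.vocab) for t in tokenised], axis=1)
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        self.U_k = U[:, :num_topics]
        self.doc_vecs = np.stack([self._combine(t) for t in tokenised])

    def _combine(self, tokens):
        # Project the BoW vector into LSI space, then concatenate with the embedding.
        lsi = self.U_k.T @ bow_vector(tokens, self.vocab)
        ft = embed(tokens, self.dim)
        return unit(np.concatenate([unit(lsi), unit(ft)]))

    def search(self, query, top_n=3):
        """Rank documents by cosine similarity to the query."""
        q = self._combine(preprocess(query))
        scores = self.doc_vecs @ q  # unit vectors, so dot product = cosine
        order = np.argsort(-scores)[:top_n]
        return [(int(i), float(scores[i])) for i in order]

# Placeholder sentences, not text from the dataset.
docs = [
    "Guidance and mercy for those who believe",
    "Give charity to the poor and the needy",
    "Be patient and pray with patience",
    "Mercy is given to the patient believers",
]
index = HybridIndex(docs)
print(index.search("mercy for believers"))
```

Normalizing each part before concatenation gives LSI and the embedding equal weight in the cosine score. A weighted concatenation would be a natural variation if one signal should dominate.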

Keywords


information retrieval; latent semantic indexing; word embedding; fasttext; al-qur'an






DOI: https://doi.org/10.32520/stmsi.v14i3.4446



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.