Mapping Machine Learning Trends in Chemistry Research using LLM with Multi-Turn Prompting

Andreo Yudertha, Riski Dwimalida Putri

Abstract


A review of research in the field of chemistry that incorporates machine learning is essential to identify recent developments and explore its potential applications. Published research articles provide an opportunity to analyze emerging research trends. The use of natural language processing (NLP) technology not only accelerates text data analysis but also enhances accuracy in understanding the content and context of scientific articles. Previously, trend analysis in ophthalmology research had been conducted using Zero-Shot Learning. In this study, an analysis of chemistry-related articles focusing on machine learning was carried out using a multi-turn prompting technique. The process began with data collection through web scraping of abstracts containing the keywords "machine learning" and "chemistry." The retrieved data was then tabulated and analyzed using a Large Language Model (LLM) with a Multi-Turn Prompting approach, where general prompts were initially used, followed by deeper exploration based on previous responses. Additionally, statistical descriptive analysis was performed using targeted prompts. Analysis of 200 article abstracts identified seven key terms related to the use of machine learning in chemistry: chemical (138 articles), protein (119 articles), drug (107 articles), structure (100 articles), molecular (96 articles), chemistry (91 articles), and quantum (84 articles). Furthermore, three dominant research topics were found in the intersection of chemistry and machine learning: protein and molecular structure, quantum chemistry, and drug discovery. The number of articles on machine learning in chemistry began to rise in 2012 and saw a significant increase in 2019. The findings suggest that there are still many opportunities for developing machine learning applications in chemistry, particularly in quantum chemistry. This field only began to gain attention in 2013, and the number of published articles remains relatively low each year, indicating that it is still in the early stages of exploration.

Keywords


Machine Learning, Tren, Chemistry, LLM

Full Text:

PDF

References


A. Alanazi, “Using Machine Learning for Healthcare Challenges and Opportunities,” Inform Med Unlocked, vol. 30, p. 100924, Jan. 2022, doi: 10.1016/J.IMU.2022.100924.

P. Vats and K. Samdani, “Study on Machine Learning Techniques in Financial Markets,” 2019 IEEE International Conference on System, Computation, Automation and Networking, ICSCAN 2019, Mar. 2019, doi: 10.1109/ICSCAN.2019.8878741.

A. Soni, D. Dharmacharya, A. Pal, V. Kumar Srivastava, R. N. Shaw, and A. Ghosh, “Design of a Machine Learning-based Self-Driving Car,” Studies in Computational Intelligence, vol. 960, pp. 139–151, 2021, doi: 10.1007/978-981-16-0598-7_11.

X. Wan et al., “Machine Learning Paves the Way for High Entropy Compounds Exploration: Challenges, Progress, and Outlook,” Advanced Materials, p. 2305192, 2023, doi: 10.1002/ADMA.202305192.

S. Dara, S. Dhamercherla, S. S. Jadav, C. M. Babu, and M. J. Ahsan, “Machine Learning in Drug Discovery: A Review,” Artificial Intelligence Review 2021 55:3, vol. 55, no. 3, pp. 1947–1999, Aug. 2021, doi: 10.1007/S10462-021-10058-4.

N. Fred and I. O. Temkin, “A Systematic Literature Review on the use of Machine Learning in Software Engineering,” Jun. 2024, Accessed: Jan. 28, 2025. [Online]. Available: https://arxiv.org/abs/2406.13877v1

K. Kolasa, B. Admassu, M. Hołownia-Voloskova, K. J. Kędzior, J. E. Poirrier, and S. Perni, “Systematic Reviews of Machine Learning in Healthcare: A Literature Review,” Expert Rev Pharmacoecon Outcomes Res, vol. 24, no. 1, pp. 63–115, Jan. 2024, doi: 10.1080/14737167.2023.2279107.

Y. F. Shi et al., “Machine Learning for Chemistry: Basics and Applications,” Engineering, vol. 27, pp. 70–83, Aug. 2023, doi: 10.1016/J.ENG.2023.04.013.

J. Sawicki, M. Ganzha, and M. Paprzycki, “The State of the Art of Natural Language Processing—A Systematic Automated Review of Nlp Literature using Nlp Techniques,” Data Intell, vol. 5, no. 3, pp. 707–749, Aug. 2023, doi: 10.1162/DINT_A_00213.

T. Labruna, J. A. Campos, and G. Azkune, “When to Retrieve: Teaching Llms to Utilize Information Retrieval Effectively,” Apr. 2024, Accessed: Nov. 07, 2024. [Online]. Available: https://arxiv.org/abs/2404.19705v2

N. R. Ruchitaa Raj, S. Nandhakumar Raj, and M. Vijayalakshmi, “Web Scrapping Tools and Techniques: A Brief Survey,” 2023 International Conference on Innovative Trends in Information Technology, ICITIIT 2023, 2023, doi: 10.1109/ICITIIT57246.2023.10068666.

H. Raja et al., “Using Large Language Models to Automate Category and Trend Analysis of Scientific Articles: an Application in Ophthalmology,” Aug. 2023, Accessed: Jan. 09, 2025. [Online]. Available: https://arxiv.org/abs/2308.16688v1

Y. (YJ) Chae and T. Davidson, “Large Language Models for Text Classification: from Zero-Shot Learning to Instruction-Tuning,” Aug. 2023, doi: 10.31235/OSF.IO/STHWK.

Z. Yi, J. Ouyang, Y. Liu, T. Liao, Z. Xu, and Y. Shen, “A Survey on Recent Advances in Llm-based Multi-Turn Dialogue Systems,” Feb. 2024, Accessed: Jan. 09, 2025. [Online]. Available: https://arxiv.org/abs/2402.18013v1

B. R. Kowalski and C. F. Bender, “Pattern Recognition.1 a Powerful Approach to Interpreting Chemical Data,” J Am Chem Soc, vol. 94, no. 16, pp. 5632–5639, Aug. 1972, doi: 10.1021/JA00771A016/ASSET/JA00771A016.FP.PNG_V03.

J. E. Saal, S. Kirklin, M. Aykol, B. Meredig, and C. Wolverton, “Materials Design and Discovery with High-Throughput Density Functional Theory: The Open Quantum Materials Database (OQMD),” JOM, vol. 65, no. 11, pp. 1501–1509, Nov. 2013, doi: 10.1007/S11837-013-0755-4/METRICS.

G. Montavon et al., “Machine Learning of Molecular Electronic Properties in Chemical Compound Space,” New J Phys, vol. 15, May 2013, doi: 10.1088/1367-2630/15/9/095003.

W. P. Walters and M. A. Murcko, “Prediction of ‘Drug-Likeness,’” Adv Drug Deliv Rev, vol. 54, no. 3, pp. 255–271, Mar. 2002, doi: 10.1016/S0169-409X(02)00003-0.

J. Jumper et al., “Highly Accurate Protein Structure Prediction with Alphafold,” Nature 2021 596:7873, vol. 596, no. 7873, pp. 583–589, Jul. 2021, doi: 10.1038/s41586-021-03819-2.




DOI: https://doi.org/10.32520/stmsi.v14i2.4961

Article Metrics

Abstract view : 178 times
PDF - 40 times

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.