Source
ICLR
Date of publication
04/01/2024
Authors
Andrey Savchenko, Elena Tutubalina, Veronika Ganeeva, Kuzma Khrabrov, Artur Kadurin

Chemical Language Models Have Problems with Chemistry: A Case Study on Molecule Captioning Task

Abstract

The recent fusion of molecular sciences and natural language processing has greatly enhanced drug discovery and driven significant advances in both fields. Because molecule representation plays a crucial role in how these models capture chemistry, we introduce novel probing tests that evaluate the knowledge of molecular structure in state-of-the-art language models (LMs), specifically MolT5 and Chem+Text T5. We run these probing tests on a molecule captioning task to gather evidence about the models' comprehension of chemical information. When we apply rules that transform a molecule's SMILES string into equivalent variants, the natural language descriptions the LMs generate for the same molecule differ significantly depending on the exact transformation used.
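
To make the idea of "equivalent variants" concrete, here is a minimal sketch of one such rewrite: relabeling ring-closure digits in a SMILES string. The abstract does not list the transformation rules, so this particular rule is chosen for illustration only. The function name `renumber_rings` and its `shift` parameter are hypothetical. The sketch uses only the Python standard library and assumes simple SMILES input; it leaves bracket atoms and two-digit `%nn` closures untouched.

```python
def renumber_rings(smiles: str, shift: int = 1) -> str:
    """Relabel single-digit ring closures (1-9) by a cyclic shift.

    The mapping is a bijection on the labels 1-9, so every ring bond
    still opens and closes with a matching label and the string keeps
    describing the same molecule. Illustrative assumption only; not
    taken from the paper.
    """
    out = []
    in_bracket = False
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        if not in_bracket and ch == "%":
            # Two-digit closure such as %12: copy it unchanged.
            out.append(smiles[i:i + 3])
            i += 3
            continue
        if not in_bracket and ch in "123456789":
            # Digits inside brackets are isotopes, H counts, or charges,
            # so only digits outside brackets are ring labels.
            ch = str((int(ch) - 1 + shift) % 9 + 1)
        out.append(ch)
        i += 1
    return "".join(out)


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
variant = renumber_rings(aspirin)
```

A probing test in this spirit would caption both `aspirin` and `variant` with the same model and compare the two outputs. Since the strings encode one molecule, a model with robust structural knowledge should describe them the same way.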
