Article GPepT: A Foundation Language Model for Peptidomimetics Incorporating Noncanonical Amino Acids

Yuna Oikawa; Takanori Uzawa; Francois Berenger; Noriko Minagawa; Akiko Yumoto; Hideaki Takaku; Ryo Tamura; Yoshihiro Ito; Koji Tsuda

Citation
Yuna Oikawa, Takanori Uzawa, Francois Berenger, Noriko Minagawa, Akiko Yumoto, Hideaki Takaku, Ryo Tamura, Yoshihiro Ito, Koji Tsuda. GPepT: A Foundation Language Model for Peptidomimetics Incorporating Noncanonical Amino Acids. ACS Medicinal Chemistry Letters. 2025, 16 (8), acsmedchemlett.5c00375. https://doi.org/10.1021/acsmedchemlett.5c00375

Description:

(abstract)

Language models have been increasingly popular in therapeutic peptide generation, but molecular diversity remains limited due to reliance on the 20 canonical amino acids. We propose a language model that generates peptidomimetics incorporating noncanonical elements like noncanonical amino acids and terminal modifications. To accomplish this, we created a vocabulary of over 17,000 noncanonical elements by extracting them from chemical formulas stored in the ChEMBL database. Our pretrained language model, GPepT, showed improved diversity in molecular structures and chemical properties. To demonstrate its real-world application, we fine-tuned the model for antimicrobial peptides. Experimental validation revealed that one of the generated peptidomimetics exhibited effective antimicrobial activity, marking a successful case of AI-driven peptide development. GPepT is fully accessible on HuggingFace: https://huggingface.co/Playingyoyo/GPepT.

Keyword: Language model, amino acid

Date published: 2025-07-22

Publisher: American Chemical Society (ACS)

Journal:

  • ACS Medicinal Chemistry Letters (ISSN: 1948-5875), vol. 16, issue 8, acsmedchemlett.5c00375

Funding:

  • Core Research for Evolutional Science and Technology JPMJCR21O2
  • Exploratory Research for Advanced Technology JPMJER1903
  • Agency for Cultural Affairs, Government of Japan JPMXP1122712807

Manuscript type: Publisher's version (Version of record)

First published URL: https://doi.org/10.1021/acsmedchemlett.5c00375

Updated at: 2025-08-25 12:30:37 +0900

Published on MDR: 2025-08-25 12:19:24 +0900

Files:

  • oikawa-et-al-2025-gpept-a-foundation-language-model-for-peptidomimetics-incorporating-noncanonical-amino-acids.pdf (application/pdf, 3 MB)
  • ml5c00375_si_002.pdf (application/pdf, 841 KB)