Authors: Yuna Oikawa; Takanori Uzawa; Francois Berenger; Noriko Minagawa; Akiko Yumoto; Hideaki Takaku; Ryo Tamura; Yoshihiro Ito; Koji Tsuda
Description (abstract): Language models have become increasingly popular in therapeutic peptide generation, but molecular diversity remains limited due to reliance on the 20 canonical amino acids. We propose a language model that generates peptidomimetics incorporating noncanonical elements such as noncanonical amino acids and terminal modifications. To accomplish this, we created a vocabulary of over 17,000 noncanonical elements by extracting them from chemical formulas stored in the ChEMBL database. Our pretrained language model, GPepT, showed improved diversity in molecular structures and chemical properties. To demonstrate its real-world application, we fine-tuned the model for antimicrobial peptides. Experimental validation revealed that one of the generated peptidomimetics exhibited effective antimicrobial activity, marking a successful case of AI-driven peptide development. GPepT is fully accessible on HuggingFace: https://huggingface.co/Playingyoyo/GPepT.
Keywords: language model, amino acid
Date published: 2025-07-22
Publisher: American Chemical Society (ACS)
Manuscript type: Publisher's version (Version of record)
First published URL: https://doi.org/10.1021/acsmedchemlett.5c00375
Updated at: 2025-08-25 12:30:37 +0900
Published on MDR: 2025-08-25 12:19:24 +0900
| Filename | Type | Size |
|---|---|---|
| oikawa-et-al-2025-gpept-a-foundation-language-model-for-peptidomimetics-incorporating-noncanonical-amino-acids.pdf | application/pdf | 3 MB |
| ml5c00375_si_002.pdf | application/pdf | 841 KB |