Paper
GeneTuring tests GPT models in genomics
Published 2023 · Wenpin Hou, Zhicheng Ji
30
Citations
2
Influential Citations
Abstract
Generative Pre-trained Transformers (GPT) are powerful language models that have great potential to transform biomedical research. However, they are known to suffer from artificial hallucinations and provide false answers that are seemingly correct in some situations. We developed GeneTuring, a comprehensive QA database with 600 questions in genomics, and manually scored 10,800 answers returned by six GPT models, including GPT-3, ChatGPT, and New Bing. New Bing has the best overall performance and significantly reduces the level of AI hallucination compared to other models, thanks to its ability to recognize its incapacity in answering questions. We argue that improving incapacity awareness is equally important as improving model accuracy to address AI hallucination.
Sign up to use Study Snapshot
Consensus is limited without an account. Create an account or sign in to get more searches and use the Study Snapshot.
Full text analysis coming soon...