Medical NLP - HanjieChen/Reading-List GitHub Wiki

DOCLENS: Multi-aspect Fine-grained Evaluation for Medical Text Generation
Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians
Use of GPT-4 to Diagnose Complex Clinical Cases
A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models
CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering
AlpaCare: Instruction-tuned Large Language Models for Medical Application
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model
Hidden Flaws Behind Expert-Level Accuracy of GPT-4 Vision in Medicine
Towards Conversational Diagnostic AI
MEDEVAL: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation
Diagnostic Accuracy of a Large Language Model in Pediatric Case Studies
Clinicians could be fooled by biased AI, despite explanations
Designing Guiding Principles for NLP for Healthcare: A Case Study of Maternal Health
Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study
Publicly Available Clinical BERT Embeddings
Identification of Semantically Similar Sentences in Clinical Notes: Iterative Intermediate Training Using Multi-Task Learning
Use of GPT-4 to Analyze Medical Records of Patients With Extensive Investigations and Delayed Diagnosis
Chatbot vs Medical Student Performance on Free-Response Clinical Reasoning Examinations
Accuracy of a Generative Artificial Intelligence Model in a Complex Diagnostic Challenge
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Foundation Metrics: Quantifying Effectiveness of Healthcare Conversations powered by Generative AI
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
A study of generative large language model for medical research and healthcare
LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks
Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries
Large language models propagate race-based medicine
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
HuatuoGPT, towards Taming Language Model to Be a Doctor
The Promise and Pitfalls of AI in the Complex World of Diagnosis, Treatment, and Disease Management
Large Language Models Answer Medical Questions Accurately, but Can’t Match Clinicians’ Knowledge
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Large language models in medicine
Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering
How Chatbots and Large Language Model Artificial Intelligence Systems Will Reshape Modern Medicine: Fountain of Creativity or Pandora’s Box?
ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations
Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow
The shaky foundations of large language models and foundation models for electronic health records
MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data
DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing
Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning
PMC-LLaMA: Further Finetuning LLaMA on Medical Papers
Faithful AI in Medicine: A Systematic Review with Large Language Models and Beyond
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health
ClinicalT5: A Generative Language Model for Clinical Text
Do We Still Need Clinical Language Models?
Can LLMs like GPT-4 outperform traditional AI tools in dementia diagnosis? Maybe, but not today
Faithful AI in Healthcare and Medicine
Can large language models reason about medical questions?
How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment
Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models
ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation
Clinical Camel: An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding
Guidelines and Evaluation of Clinical Explainable AI in Medical Image Analysis
Health system-scale language models are all-purpose prediction engines
What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams
Learning to Ask Like a Physician
Large Language Models Encode Clinical Knowledge
Towards Expert-Level Medical Question Answering with Large Language Models
Capabilities of GPT-4 on Medical Challenge Problems