Bibliography for Next Work - cdm-depaul/dietbot GitHub Wiki

I've collected the following so far. Please read ALL of them.

* Zero/One/Few-shot In-Context Learning (ICL)/Fine-Tuning Learning

"WangLab at MEDIQA-Chat 2023: Clinical Note Generation from Doctor-Patient Conversations using Large Language Models" (2023). [Published in the ClinicalNLP @ ACL 2023 Workshop](https://arxiv.org/pdf/2305.02220).
(*) This paper may be the closest to our work!!!
Abstract snippet: "We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM). Both achieve high performance as measured by automatic metrics (e.g. ROUGE, BERTScore) and ranked second and first, respectively, of all submissions to the shared task. Expert human scrutiny indicates that notes generated via the ICL-based approach with GPT-4 are preferred about as often as human-written notes, making it a promising path toward automated note generation from doctor-patient conversations."

"Affect Recognition in Conversations Using Large Language Models" (2024). [Published in SIGDIAL 2024](https://arxiv.org/abs/2309.12881).
Abstract snippet: "This study investigates the capacity of large language models (LLMs) to recognise human affect in conversations, with a focus on both open-domain chit-chat dialogues and task-oriented dialogues. Leveraging three diverse datasets, namely IEMOCAP (Busso et al., 2008), EmoWOZ (Feng et al., 2022), and DAIC-WOZ (Gratch et al., 2014), covering a spectrum of dialogues from casual conversations to clinical interviews, we evaluate and compare LLMs' performance in affect recognition. Our investigation explores the zero-shot and few-shot capabilities of LLMs through in-context learning as well as their model capacities through task-specific fine-tuning."

"XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare" (July 2025, arXiv only so far, arXiv:2405.06270, https://arxiv.org/pdf/2405.06270).
Clinical Decision Support System. Probably not so relevant, but here is an abstract snippet: "In this study, we introduce a knowledge-guided in-context learning (ICL) framework designed to enable large language models (LLMs) to effectively process structured clinical data. Our approach integrates domain-specific feature groupings, carefully balanced few-shot examples, and task-specific prompting strategies."

Other papers published in the ACL 2023 ClinicalNLP workshop (2023): see the workshop program. Several interesting papers!!!

* Evaluation of LLMs responses in clinical settings

"An evaluation framework for clinical use of large language models in patient interaction tasks" (2024): full paper (view-only), OpenReview of the full paper (very interesting!), 2-page poster abstract, GitHub dataset site (from which we'll use the datasets under 'data': training/validation/test sets), ACL 2023 ClinicalNLP workshop.
Abstract snippet: "This paper introduces the Conversational Reasoning Assessment Framework for Testing in Medicine (CRAFT-MD), a novel approach for evaluating clinical LLMs. Unlike traditional methods that rely on structured medical exams, CRAFT-MD focuses on natural dialogues, using simulated AI agents to interact with LLMs in a controlled, ethical environment. We applied CRAFT-MD to assess the diagnostic capabilities of GPT-4 and GPT-3.5 in the context of skin diseases."

"Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances" (2018), full journal paper -- general but seminal paper on clinical NLP, "Here we provide a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that is to be used for clinical outcomes research, and vice versa." Good to cite as a reference (particularly since it's written by NIH).

"Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions" (May 2024, arXiv only so far, arXiv:2405.16402, https://arxiv.org/pdf/2405.16402).
Abstract snippet: "This study investigates an intriguing question: Can ChatGPT respond with a greater degree of empathy than those typically offered by physicians? To answer this question, we collect a de-identified dataset of patient messages and physician responses from Mayo Clinic and generate alternative replies using ChatGPT." -- but assessing/evaluation only.

"Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian" (July 2025, arXiv only so far, arXiv:2507.11299, https://arxiv.org/pdf/2507.11299).
Abstract snippet: "we introduce Dr.Copilot, a multi-agent large language model (LLM) system that supports Romanian-speaking doctors by evaluating and enhancing the presentation quality of their written responses. Rather than assessing medical correctness, Dr.Copilot provides feedback along 17 interpretable axes." -- but assessing/evaluation only again.
