Home - ua-datalab/AI-for-Professionals GitHub Wiki

AI/ML & Data in Public Health: A Non-Coder's Toolkit!

AI/ML Enhanced Tools

mindmap
  ((**AI/ML Toolkit**))
    id(**Code Development**)
      Visual Studio Code
      Jupyter Notebooks
      Marimo
      Quarto
    id(**Data Analysis Platforms**)
      KNIME
      OpenRefine
      Orange
    id(**Machine Learning <br/> Deep Learning**)
      Scikit-Learn
      PyTorch
      TensorFlow
    id(**Natural Language Processing <br/> NLP**)
      SpaCy
      NLTK
    id(**Geospatial Analysis**)
      QGIS
      Felt
    id(**Databases**)
      DuckDB
    id(**Data Visualization**)
      Data-to-Viz
      Voyager
      Flourish
      datawrapper
      Google Looker Studio 
      PowerBI
      Plotly
      Shiny
      Tableau
    id(**Generating Ideas**)
      ChatGPT
      Claude
      Open Source LLMs
         UArizona AI Verde
      Gemini
      Google Notebook
      Perplexity AI
    id(**Collaborative Research <br/> #38; Information Gathering**)
      Elicit
      Research Rabbit
      SciSpace
      Scite
      Semantic Scholar
    id(**Project Documentation**)
      GitHub Pages
      Google Docs
      Notion
    id(**Brainstorming <br/> #38; Mind Mapping**)
      NotebookLM
      Miro
      MindMeister

Data-Driven Decisions, Healthier Communities: Your Non-Coding Journey into Public Health AI.

What You'll Discover:

  • Understand fundamental data concepts without the jargon.
  • Explore free, user-friendly AI and data tools.
  • Apply these concepts to real-world public health challenges through hands-on activities.
  • Navigate the ethical landscape of AI in healthcare.

🔖 Please see: Presentation Slides

How to Use This Site: Guide on navigating modules, completing activities, and using resources.

"Start Your Journey Here": Begin with the Learning Objectives, or jump to the Overview on this page.


Learning Objectives

Upon completion of this session and engagement with this resource, you will be able to:

  • Define fundamental data concepts (e.g., dataset, variable, data types, big data) in the context of public health.
  • Identify key sources of public health data and recognize indicators of data quality.
  • Explain the basic principles of data visualization and its importance in communicating public health information.
  • Describe the core concepts of Artificial Intelligence (AI), Machine Learning (ML), and Large Language Models (LLMs) without technical jargon.
  • Utilize prompt engineering techniques to effectively interact with LLMs (like ChatGPT/Gemini) for tasks relevant to public health.
  • Recognize common open-source software and AI tools accessible to non-coders for basic data exploration and AI interaction.
  • Discuss ethical considerations and potential biases associated with using data and AI in public health.
  • Apply these concepts through practical, non-coding exercises simulating real-world public health scenarios.

Overview: Decoding Data & AI/ML for Better Public Health


Open-Source Software & AI/ML Tools (Non-Coder's Gateway)

Introduction: You don't need a PhD in computer science or a big budget to start using powerful tools!

A. Large Language Models (LLMs) & Conversational AI:

  • Tools: ChatGPT (OpenAI) / Gemini (Google) / Claude (Anthropic)
    • Main Features: Natural language understanding and generation, summarization, brainstorming, drafting text, answering questions.
    • Practical Healthcare Applications:
      • Drafting patient education materials (e.g., "Explain type 2 diabetes in simple terms for a brochure").
      • Summarizing research papers or public health reports for quick insights.
      • Brainstorming public health campaign slogans or outreach strategies.
      • Generating FAQs for common health concerns.
    • Prompt Engineering Focus:
      • Concept: How to ask the right questions to get the best results. Structure each prompt around four elements: Role, Task, Format, and Constraints.
      • Example:
        • Weak Prompt: "Tell me about vaccination."
        • Strong Prompt: "Act as a public health advisor. Create a list of 5 key benefits of childhood vaccination for parents of newborns, written in clear, empathetic language, suitable for a pamphlet. Each benefit should be one sentence."
  • Tools: PubMed Central / Perplexity AI / Semantic Scholar
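    • Main Features: Search tools that ground answers in sources. Perplexity AI is a conversational search engine that answers with citations; PubMed Central and Semantic Scholar search the scholarly literature directly. Excellent for research and fact-finding.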
    • Practical Healthcare Applications:
      • Quickly finding evidence-based information on specific health conditions or interventions.
      • Identifying recent research papers on a public health topic.
      • Checking the source of health claims.
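
The Role, Task, Format, Constraints pattern described in the prompt engineering notes above can be sketched as a small template. This is only an illustration, not part of any vendor's API: the function name `build_prompt` and its parameters are hypothetical, and the resulting text would simply be pasted into a chat tool such as ChatGPT, Gemini, or Claude.

```python
def build_prompt(role: str, task: str, output_format: str, constraints: list[str]) -> str:
    """Assemble a prompt from the four elements: Role, Task, Format, Constraints."""
    lines = [
        f"Act as {role}.",
        f"Task: {task}",
        f"Format: {output_format}",
    ]
    if constraints:
        lines.append("Constraints:")
        lines.extend(f"- {c}" for c in constraints)
    return "\n".join(lines)


# The strong prompt from the example above, split into its four elements.
prompt = build_prompt(
    role="a public health advisor",
    task="Create a list of 5 key benefits of childhood vaccination for parents of newborns.",
    output_format="A list suitable for a pamphlet.",
    constraints=[
        "Use clear, empathetic language.",
        "Each benefit should be one sentence.",
    ],
)
print(prompt)
```

Writing the four elements out separately makes it easy to see which one is missing from a weak prompt such as "Tell me about vaccination."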

B. Data Collection & Surveys:

  • Tools: Google Forms / Microsoft Forms
    • Main Features: Easy-to-create surveys, quizzes, and feedback forms. Collects responses in a spreadsheet.
    • Practical Healthcare Applications: Community health needs assessments, patient satisfaction surveys, collecting sign-ups for health workshops, post-event feedback.
  • Tool: KoboToolbox
    • Main Features: Free, open-source suite for field data collection, often used in humanitarian and development contexts. Works offline.
    • Practical Healthcare Applications: Epidemiological surveys in remote areas, health facility assessments, monitoring public health interventions.
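
For readers curious about what happens after responses land in a spreadsheet, the following sketch tallies the answers to one survey question. It assumes the responses have been exported as CSV text; the column names and sample rows are invented for illustration and do not come from any real survey.

```python
import csv
import io
from collections import Counter

# Hypothetical export: in practice this text would come from a downloaded CSV file.
SAMPLE_CSV = """respondent_id,satisfaction,attended_workshop
1,Satisfied,Yes
2,Very satisfied,No
3,Satisfied,Yes
4,Dissatisfied,No
5,Satisfied,Yes
"""


def tally(csv_text: str, column: str) -> Counter:
    """Count how many times each answer appears in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


satisfaction_counts = tally(SAMPLE_CSV, "satisfaction")
for answer, count in satisfaction_counts.most_common():
    print(f"{answer}: {count}")
```

The same tally can be produced without code using a pivot table in a spreadsheet; the sketch only shows the underlying idea.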

C. Basic Data Handling & Visualization (Spreadsheet Software):

  • Tools: OpenRefine / Google Sheets / Microsoft Excel (Online/Free versions)
    • Main Features: Organizing data in tables, basic calculations (sum, average), creating simple charts (bar, pie, line).
    • Practical Healthcare Applications: Tracking patient appointments, managing small project budgets, creating simple dashboards for clinic metrics, and visualizing immunization rates over time.
  • Tools: DataVoyager / Flourish / Datawrapper
    • Main Features: Web-based tools for creating interactive and embeddable charts, maps, and tables with no coding. Generous free tiers.
    • Practical Healthcare Applications: Creating compelling visuals for public health reports, presentations, or websites (e.g., mapping disease prevalence, showing trends in health behaviors).
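
The basic calculations mentioned above (sum, average) and the immunization-rate use case can be illustrated with a few lines of Python. The clinic figures below are made up, and the same arithmetic could be done with spreadsheet formulas.

```python
from statistics import mean

# Hypothetical clinic records: (year, children vaccinated, children eligible).
records = [
    (2021, 412, 500),
    (2022, 450, 520),
    (2023, 486, 540),
]

# Immunization rate per year as a percentage, rounded to one decimal place.
rates = {
    year: round(100 * vaccinated / eligible, 1)
    for year, vaccinated, eligible in records
}

# Sum and average, the two basic calculations named above.
total_vaccinated = sum(vaccinated for _, vaccinated, _ in records)
average_rate = round(mean(rates.values()), 1)

for year, rate in rates.items():
    print(f"{year}: {rate}%")
print(f"Total vaccinated: {total_vaccinated}, average rate: {average_rate}%")
```

The yearly rates are the numbers that would go on a line chart of immunization coverage over time.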

Key Topics & Experiential Learning Use Cases

🔖 Learning Modules

Module 1: What is Data? (Data Literacy Basics)

  • Content: Defining data, datasets, variables, data types (quantitative vs. qualitative). Importance of context. What is "Big Data" in simple terms?
  • Experiential Learning Use Cases (Non-Coding):
    1. Scenario Analysis: Given a brief public health scenario (e.g., a local flu outbreak), identify 5 types of data that would be useful to collect (e.g., number of cases, age of patients, vaccination status, onset date, symptoms). Classify each as quantitative or qualitative.
    2. "Spot the Data": Look at a simplified public health infographic (provided). Identify 3 key data points and what they represent.
    3. Data Scavenger Hunt: Find a public health statistic from a reputable source (e.g., WHO, CDC website). Describe what it measures and its unit of measurement.
    4. Dataset Exploration (Conceptual): Review a very small, clean sample dataset (e.g., 10 rows, 5 columns in a table about patient demographics and a health outcome). Identify the variables and their likely data types.
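
The quantitative versus qualitative distinction can be sketched in code as a simple rule of thumb: a variable whose values are all numbers is treated as quantitative, and anything else as qualitative. The sample rows and the function name `classify_variables` are hypothetical. The rule is deliberately naive, since numeric codes such as ZIP codes are qualitative even though they look like numbers.

```python
# Hypothetical sample: a few rows about patient demographics and a health outcome.
rows = [
    {"age": 34, "sex": "F", "vaccinated": "yes", "days_ill": 5, "symptom": "fever"},
    {"age": 61, "sex": "M", "vaccinated": "no", "days_ill": 9, "symptom": "cough"},
    {"age": 8, "sex": "F", "vaccinated": "yes", "days_ill": 3, "symptom": "fever"},
]


def classify_variables(rows: list[dict]) -> dict[str, str]:
    """Label each column quantitative if every value is a number, else qualitative."""
    labels = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        labels[column] = "quantitative" if numeric else "qualitative"
    return labels


variable_types = classify_variables(rows)
for name, kind in variable_types.items():
    print(f"{name}: {kind}")
```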

Module 2: Finding Good Data (Data Sources & Quality)

  • Content: Common public health data sources (surveys, surveillance systems, electronic health records, census data). Characteristics of good quality data (accuracy, completeness, timeliness, reliability, relevance).
  • Experiential Learning Use Cases (Non-Coding):
    1. Source Evaluation: Given two fictional data sources for child malnutrition rates (one from a well-known NGO, one from an anonymous blog), discuss which is likely more reliable and why.
    2. "Is This Data Healthy?": Review a small, sample dataset with obvious errors or missing values (e.g., age = 150, city missing for half the entries). Identify 3 quality issues.
    3. Survey Critique: Review a short sample survey (provided) with leading questions or biased options. Identify 2-3 problematic questions and suggest improvements.
    4. Brainstorming Data Gaps: For a specific public health goal (e.g., reducing smoking in teens), brainstorm what data is needed and where it might be found or how it could be collected ethically.
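
The kinds of problems in the "Is This Data Healthy?" exercise (implausible ages, missing cities) can also be caught automatically. The sketch below is one possible set of checks, not a standard method: the records are invented, and the 120-year age ceiling is an assumption chosen for illustration.

```python
# Hypothetical records containing the kinds of errors described in the exercise.
records = [
    {"id": 1, "age": 42, "city": "Tucson"},
    {"id": 2, "age": 150, "city": ""},
    {"id": 3, "age": None, "city": "Phoenix"},
    {"id": 3, "age": 29, "city": None},
]


def find_quality_issues(records, max_age=120):
    """Return a list of (record id, issue description) pairs."""
    issues = []
    seen_ids = set()
    for rec in records:
        # Uniqueness: the same id should not appear twice.
        if rec["id"] in seen_ids:
            issues.append((rec["id"], "duplicate id"))
        seen_ids.add(rec["id"])
        # Completeness and accuracy of the age field.
        if rec["age"] is None:
            issues.append((rec["id"], "missing age"))
        elif not 0 <= rec["age"] <= max_age:
            issues.append((rec["id"], "implausible age"))
        # Completeness of the city field (empty string or None).
        if not rec["city"]:
            issues.append((rec["id"], "missing city"))
    return issues


issues = find_quality_issues(records)
for record_id, problem in issues:
    print(f"Record {record_id}: {problem}")
```

Each check corresponds to a quality characteristic from the module content: missing values relate to completeness, and out-of-range values relate to accuracy.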

Module 3: Making Data Speak (Introduction to Data Visualization)

  • Content: Why visualize data? Common chart types (bar, line, pie, map) and when to use them. Principles of clear and honest visualization (avoiding misleading charts).
  • Experiential Learning Use Cases (Non-Coding):
    1. Chart Match-Up: Given 3 simple datasets (e.g., disease cases over time, comparison of risk factors by percentage, geographical distribution of clinics) and 3 chart types (line, pie, map), match the best chart to each dataset.
    2. "Bad Chart" Detective: Analyze a misleading chart (provided) and explain why it's problematic (e.g., truncated y-axis, confusing colors).
    3. Sketch-a-Visual: Given a public health message (e.g., "Cases of X disease have increased by 50% in the last year among young adults"), sketch a simple chart idea to communicate this effectively.
    4. Interactive Exploration: Use a link to a pre-made interactive chart on Flourish or Datawrapper (e.g., showing global health indicators). Explore the chart and write down two insights you gained.
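
The truncated y-axis problem from the "Bad Chart" Detective exercise comes down to simple arithmetic: a bar is drawn from the start of the axis, so its visible height is the value minus the axis start. The sketch below works through one invented example; the function name `apparent_ratio` and the district figures are hypothetical.

```python
def apparent_ratio(value_a: float, value_b: float, axis_start: float = 0.0) -> float:
    """Ratio of drawn bar heights (b to a) when the y-axis begins at axis_start."""
    height_a = value_a - axis_start
    height_b = value_b - axis_start
    if height_a <= 0:
        raise ValueError("axis_start must be below value_a")
    return height_b / height_a


# Hypothetical vaccination coverage (%) in two districts.
district_a, district_b = 50.0, 55.0

honest = apparent_ratio(district_a, district_b)           # y-axis starts at 0
truncated = apparent_ratio(district_a, district_b, 48.0)  # y-axis starts at 48

print(f"Axis from 0:  bar B looks {honest:.1f}x as tall as bar A")
print(f"Axis from 48: bar B looks {truncated:.1f}x as tall as bar A")
```

With the axis starting at 0, the second bar is 1.1 times as tall as the first, matching the real 10% difference. With the axis starting at 48, the drawn heights are 2 and 7, so the same data looks like a 3.5-fold difference.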

Module 4: AI Unveiled: Your Smart Assistant (Intro to AI, ML, & LLMs)

  • Content: Simple definitions of AI, Machine Learning (learning from data), and Large Language Models (understanding and generating text). Focus on what LLMs can do for public health professionals.
  • Experiential Learning Use Cases (Non-Coding):
    1. LLM Task Brainstorm: List 3 routine tasks in your public health role (e.g., drafting emails, summarizing meeting notes, finding information) where an LLM like ChatGPT could assist.
    2. Prompt Practice - Summarization: Take a short paragraph from a public health news article (provided). Use ChatGPT/Gemini with the prompt: "Summarize this text in one sentence for a busy public health official." Compare the output to the original.
    3. Prompt Practice - Idea Generation: Use ChatGPT/Gemini with the prompt: "Act as a health communication specialist. Brainstorm 5 catchy slogans for a campaign encouraging handwashing in primary schools."
    4. AI Output Critique: Given a short AI-generated text about a health topic, identify one strength and one potential area for improvement or fact-checking. (e.g., Is it too generic? Does it cite sources? Is the tone appropriate?).

Module 5: Using Data & AI Wisely (Ethics, Bias, & Privacy)

  • Content: Importance of data privacy (anonymization, confidentiality). Potential for bias in data and AI algorithms. Fairness, accountability, and transparency in AI.
  • Experiential Learning Use Cases (Non-Coding):
    1. Case Study Discussion: Read a short, fictional case study about an AI tool used for disease prediction that shows biased results against a certain demographic. Discuss 2-3 ethical concerns.
    2. Privacy Brainstorm: A local clinic wants to share patient data for a research study on diabetes trends. What are 3 key privacy considerations or steps they must take before sharing? (e.g., de-identification, informed consent).
    3. Bias Identification: "An AI model is trained primarily on health data from one ethnic group to predict heart disease risk." Discuss potential bias if this model is then applied to a diverse population.
    4. "Ethical AI" Checklist Creation: Brainstorm a 3-5 point checklist of questions a public health professional should ask before adopting a new AI tool in their work (e.g., Where did the data come from? How is privacy protected? Has it been tested for fairness?).
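
One concrete way to act on the checklist question "Has it been tested for fairness?" is to compare how well a model performs for each demographic group. The sketch below computes sensitivity (the share of true cases the model catches) per group. It is a minimal illustration under stated assumptions: the evaluation records are invented, the group labels are placeholders, and sensitivity is only one of several fairness measures that could be used.

```python
from collections import defaultdict

# Hypothetical evaluation records:
# (group, person truly has the disease, model flagged them as high risk).
results = [
    ("group_a", True, True), ("group_a", True, True), ("group_a", True, False),
    ("group_a", False, False),
    ("group_b", True, True), ("group_b", True, False), ("group_b", True, False),
    ("group_b", False, False),
]


def sensitivity_by_group(results):
    """Share of true cases the model catches, computed separately for each group."""
    caught = defaultdict(int)
    cases = defaultdict(int)
    for group, has_disease, flagged in results:
        if has_disease:
            cases[group] += 1
            caught[group] += int(flagged)
    return {group: caught[group] / cases[group] for group in cases}


rates = sensitivity_by_group(results)
gap = max(rates.values()) - min(rates.values())

for group, rate in rates.items():
    print(f"{group}: sensitivity {rate:.2f}")
print(f"Gap between groups: {gap:.2f}")
```

In this invented example the model catches two of three true cases in one group but only one of three in the other, the kind of disparity the Bias Identification exercise asks participants to anticipate.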

โœ๏ธ ๐Ÿ“ ๐Ÿ“‹ โœ‚๏ธ ๐Ÿ“Ž AI/ML Toolkit

Collaborative Research & Information Gathering

  • Connected Papers. Connected Papers is a web-based AI tool that helps researchers explore and discover academic papers in their field of interest.
  • Elicit. Elicit is an AI research tool with a free tier, originally developed by Ought, that helps researchers with various aspects of the research process, particularly literature reviews.
  • Perplexity. Perplexity AI is an artificial intelligence-powered search engine that aims to provide users with comprehensive and accurate answers to their questions.
  • PubMed Central. PubMed Central (PMC) is a free digital archive of full-text biomedical and life sciences journal articles. It's a repository maintained by the U.S. National Library of Medicine (NLM), and many articles are made available for free access due to NIH funding policies.
  • Research Rabbit. ResearchRabbit is a free, AI-powered online platform that helps researchers map and explore the literature in their field.
  • SciSpace. SciSpace is an AI-powered research platform designed to help academics efficiently navigate scholarly literature.
  • Scite. Scite is an AI-powered research platform that helps users understand and evaluate research articles by providing context and classification for citations.
  • Semantic Scholar. Semantic Scholar is a free, AI-powered search engine and research tool that helps scientists and researchers discover and understand scientific literature.

Generating ideas

  • OpenAI ChatGPT. ChatGPT is a large language model chatbot created by OpenAI that can engage in human-like conversations and generate text based on various prompts. OpenAI models also power Microsoft Copilot.
  • Gemini. Google Gemini is a large language model (LLM) and multimodal AI assistant that can be accessed through a chatbot interface.
  • Google NotebookLM. NotebookLM (Google NotebookLM) is a research and note-taking online tool developed by Google Labs that uses artificial intelligence (AI), specifically Google Gemini, to assist users in interacting with their documents.
  • Google AI Studio. Google AI Studio is a free, browser-based Integrated Development Environment (IDE) that allows users to experiment with and prototype applications using Google's Gemini family of generative AI models.
  • Claude. Claude AI is a large language model (LLM) and AI chatbot developed by Anthropic that excels at natural language processing (NLP).
  • U of Arizona AI Verde. Local LLMs.
  • Chatbox. Chatbox is a desktop client application that provides a chat interface to LLMs from several providers. Requires an API key.

General References:


Specific applications

Code development

  • Visual Studio Code (VS Code). Visual Studio Code is a free, cross-platform code editor developed by Microsoft.
  • Jupyter Notebooks. A Jupyter Notebook is a web-based interactive computing environment that allows users to create and share documents containing live code, equations, visualizations, and narrative text.
  • Marimo. marimo is an open-source reactive notebook for Python: reproducible, git-friendly, SQL built-in, executable as a script, and shareable as an app.
  • Quarto. Quarto provides a unified authoring framework for data science, combining your code, its results, and your prose. Quarto documents are fully reproducible.

Data analysis platforms

  • KNIME. KNIME (Konstanz Information Miner) is a free and open-source data analytics platform that allows users to build data science workflows without extensive coding skills.
  • OpenRefine. OpenRefine is a free, open-source software tool that cleans, transforms, and enriches data, especially when dealing with messy or incomplete datasets.
  • Orange Data Mining. Orange is a visual programming toolkit that facilitates data visualization, machine learning, and data analysis.

Databases

Datasets

Healthcare Datasets

Data Visualization

  • Data-to-Viz.com. From Data to Viz leads you to the most appropriate graph for your data. It links to the code to build it and lists common caveats you should avoid.
  • dataVoyager. Data Voyager is a data visualization tool that helps users explore and analyze data by combining manual and automated chart specification techniques.
  • Datawrapper. Datawrapper is a user-friendly web-based tool for creating and sharing data visualizations like charts, maps, and tables.
  • Exploratory. Exploratory's Simple UI experience makes it possible for anyone to use Data Science to explore data quickly, discover deeper insights, and communicate effectively (Downloadable. Installs R).
  • Google Looker Studio. Looker Studio is a free, web-based data visualization and reporting tool from Google Cloud that allows users to create interactive dashboards and reports from various data sources.
  • RAWGraphs. RAWGraphs is a free, open-source web-based tool designed for creating data visualizations, particularly for designers and those who want to create custom visualizations without extensive coding.
  • ObservableHQ. ObservableHQ is a platform and ecosystem for building interactive web-based data visualizations and dashboards.
  • Tableau. Tableau is a visual analytics platform and business intelligence (BI) software that helps users visualize, analyze, and share data.
  • PowerBI. Power BI is a suite of business analytics services and software from Microsoft designed to help users visualize and analyze data to gain insights and make informed decisions.
  • Shiny. Shiny is a framework for building interactive web applications from R or Python code. It is distributed as a package rather than as part of either language.
  • plotly. Plotly provides online graphing, analytics, and statistics tools for individuals and collaboration, as well as scientific graphing libraries for Python, R, MATLAB, Julia, and others.

Web development oriented

  • Gradio. Gradio is a Python library that simplifies building interactive web applications, particularly for machine learning demos and applications.
  • Streamlit. Streamlit is an open-source Python library that makes it easy to build and share interactive, data-rich web apps.

Geospatial applications

  • QGIS. QGIS (formerly Quantum GIS) is a free and open-source Geographic Information System (GIS) software that allows users to create, analyze, and manage spatial data.
  • Felt. Felt is a software platform that enables users to easily create, visualize, analyze, and share maps online.

Machine Learning / Deep Learning

  • Scikit-Learn. Scikit-learn is a free and open-source machine learning library for the Python programming language.
  • PyTorch. PyTorch is an open-source machine learning framework based on the Torch library, originally developed by Meta AI and now governed by the PyTorch Foundation. It is used for applications such as computer vision and natural language processing.
  • TensorFlow. TensorFlow is a software library for machine learning and artificial intelligence. It can be used across a range of tasks, but is used mainly for training and inference of neural networks. It is one of the most popular deep learning frameworks, alongside others such as PyTorch.

Natural Language Processing


References & Further Learning


Created: 04/29/2025 (C. Lizárraga)

Updated: 06/10/2025 (C. Lizárraga)

📚 📑 UArizona DataLab Learning Resources

UArizona DataLab, Data Science Institute, University of Arizona.
