Visual Studio Code - ua-datalab/AI-for-Professionals GitHub Wiki
Visual Studio Code (VS Code)
Visual Studio Code, commonly referred to as VS Code, is an integrated development environment developed by Microsoft for Windows, Linux, macOS, and web browsers. Features include support for debugging, syntax highlighting, intelligent code completion, snippets, code refactoring, and embedded version control with Git.
VS Code Pointers
Running VS Code options
There are at least two options to run VS Code:
- Online: GitHub Codespaces | Getting started with GitHub Codespaces
- Local: VS Code Download
Introduction
VSCode overview and installation
Visual Studio Code (VSCode) is a popular, free source-code editor developed by Microsoft. Here's an overview and installation guide:
Overview
- Lightweight yet powerful code editor with built-in debugging support, Git integration, and syntax highlighting
- Extensive marketplace with thousands of extensions to add functionality
- Supports multiple programming languages and frameworks
- Features intelligent code completion, linting, and integrated terminal
Installation Steps
- Visit code.visualstudio.com
- Download the appropriate version for your operating system (Windows, macOS, or Linux)
- Run the installer and follow the installation wizard
- Launch VSCode after installation
After installation, you can customize VSCode by:
- Installing extensions for your preferred programming languages
- Configuring settings to match your coding preferences
- Setting up keyboard shortcuts
Healthcare-specific extensions (Python, Jupyter, CSV viewers)
Here are some useful VSCode extensions for healthcare and biomedical data analysis:
Data Analysis Extensions
- Python: Essential for data processing, data analysis, and data visualization in healthcare.
- R: For statistical computing and data visualization.
- Jupyter Notebooks: For interactive data analysis and visualization
- Rainbow CSV: For opening comma-separated (.csv), tab-separated (.tsv), semicolon-separated, and pipe-separated values files.
- Excel Viewer: For handling patient and clinical data spreadsheets.
- YAML: For configuration files.
- Git Graph: For visualizing Git history.
- GitLens: For enhanced Git capabilities.
- SQLite Viewer: For managing healthcare databases.
Bioinformatics Extensions
- Nextflow: For managing bioinformatics workflows and pipelines.
- Snakemake: For automating data analysis workflows.
- Markdown/GitHub Wiki Support: For documentation and collaboration.
Visualization Extensions
- Matplotlib and Seaborn Integration: For creating statistical plots and visualizations.
- ggplot2: Data visualization package for the R statistical programming language.
- Plotly: For interactive medical data visualization.
Basic Setup
Creating a healthcare project workspace
A well-organized workspace improves collaboration and reproducibility in clinical research:
- Create a structured project folder:
healthcare-project/
├── data/
│   ├── raw/            # Original unmodified data
│   ├── processed/      # Cleaned and preprocessed data
│   └── external/       # External reference data
├── notebooks/          # Jupyter notebooks for exploration and analysis
├── src/                # Source code for use in this project
│   ├── __init__.py
│   ├── data/           # Scripts for data processing
│   ├── features/       # Scripts for feature engineering
│   ├── models/         # Scripts for training models
│   └── visualization/  # Scripts for creating visualizations
├── models/             # Trained models
├── reports/            # Generated analysis reports
│   └── figures/        # Generated graphics and figures
├── environment.yml     # Environment definition
├── requirements.txt    # Package dependencies
├── README.md           # Project documentation
└── .gitignore          # Files to ignore in version control
- Open the workspace in VSCode:
- Launch VSCode
- Select File > Open Folder
- Navigate to your healthcare-project folder and click "Open"
- Configure workspace settings for healthcare data:
- Create a .vscode folder in your project root
- Add a settings.json file with healthcare-specific settings:
{
    "python.linting.enabled": true,
    "python.linting.pylintEnabled": true,
    "python.formatting.provider": "black",
    "editor.formatOnSave": true,
    "files.exclude": {
        "**/__pycache__": true,
        "**/.pytest_cache": true
    },
    "python.testing.pytestEnabled": true,
    "terminal.integrated.env.windows": {
        "PYTHONPATH": "${workspaceFolder}"
    },
    "[csv]": {
        "editor.maxTokenizationLineLength": 0
    },
    "csv-preview.formatValues": true
}
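The folder layout above can also be created with a short script instead of by hand. The following is a minimal sketch that assumes the directory and file names shown in the tree; the function name `create_project` is hypothetical and chosen only for illustration. Existing files are left untouched.

```python
from pathlib import Path

# Directories taken from the project tree above, plus the .vscode folder
# from the workspace settings step.
DIRECTORIES = [
    "data/raw",
    "data/processed",
    "data/external",
    "notebooks",
    "src/data",
    "src/features",
    "src/models",
    "src/visualization",
    "models",
    "reports/figures",
    ".vscode",
]

# Files from the tree; they are created empty so they can be filled in later.
FILES = [
    "src/__init__.py",
    "environment.yml",
    "requirements.txt",
    "README.md",
    ".gitignore",
]


def create_project(root: str = "healthcare-project") -> Path:
    """Create the project skeleton under `root` (hypothetical helper)."""
    root_path = Path(root)
    for directory in DIRECTORIES:
        # parents=True builds intermediate folders; exist_ok avoids errors on re-runs.
        (root_path / directory).mkdir(parents=True, exist_ok=True)
    for file_name in FILES:
        # touch() creates the file if missing and does not erase existing content.
        (root_path / file_name).touch(exist_ok=True)
    return root_path
```

Calling `create_project()` from the parent directory creates `healthcare-project/`, which can then be opened with File > Open Folder.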
Setting up Python environment with healthcare libraries
Healthcare data science requires specific libraries for handling medical data formats, analysis, and visualization:
- Create a Virtual Environment
Option 1: Using venv (built-in):
# In the VSCode terminal (Ctrl+`)
python -m venv venv
source venv/bin/activate    # macOS/Linux
venv\Scripts\activate       # Windows
Option 2: Using Conda (recommended for more complex dependencies):
conda create -n healthcare-project python=3.10
conda activate healthcare-project
- Install Essential Healthcare Libraries
Create a requirements.txt file in your project root with these healthcare-specific packages:
# Core data science
numpy==1.24.3
pandas==2.0.2
scipy==1.10.1
matplotlib==3.7.1
seaborn==0.12.2
jupyter==1.0.0
# Healthcare-specific
pydicom==2.3.1 # For medical imaging
nibabel==5.1.0 # For neuroimaging
biopython==1.81 # For genomic sequences
pyedflib==0.1.31 # For EEG/ECG data
clinicalnlp==1.0.0 # For clinical text processing
# Machine learning
scikit-learn==1.2.2
tensorflow==2.12.0 # For deep learning on medical images
torch==2.0.1 # Alternative for deep learning
xgboost==1.7.5 # For gradient boosting
# Healthcare ML-specific
lifelines==0.27.4 # For survival analysis
statsmodels==0.14.0 # For epidemiological models
missingno==0.5.2 # For visualizing missing values
# Visualization
plotly==5.15.0 # For interactive medical plots
dash==2.10.2 # For building clinical dashboards
# Data validation and privacy
great-expectations==0.16.13 # Data quality
presidio-analyzer==2.2.32 # PHI detection
faker==18.11.2 # Generate synthetic data
# Project management
python-dotenv==1.0.0 # For environment variables
Next, install these packages:
pip install -r requirements.txt
- Environment Configuration
Create an environment.yml file for Conda users:
name: healthcare-project
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - numpy=1.24.3
  - pandas=2.0.2
  - matplotlib=3.7.1
  - seaborn=0.12.2
  - scikit-learn=1.2.2
  - jupyter=1.0.0
  - pip=23.1.2
  - pip:
    - pydicom==2.3.1
    - nibabel==5.1.0
    - biopython==1.81
    - lifelines==0.27.4
    - plotly==5.15.0
    - presidio-analyzer==2.2.32
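After the environment is set up, it can help to confirm that VSCode is using it and that the installed packages match the pinned versions. The sketch below is one possible check using only the Python standard library. The helper names (`active_environment`, `parse_pins`, `check_pins`) are hypothetical, and the parser only understands simple `name==version` lines like those in the requirements.txt above.

```python
import os
import sys
from importlib.metadata import PackageNotFoundError, version


def active_environment() -> str:
    """Describe the environment the current interpreter is running in."""
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    if sys.prefix != sys.base_prefix:
        return f"venv at {sys.prefix}"
    # Conda sets CONDA_DEFAULT_ENV when an environment is activated.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda environment '{conda_env}'"
    return "no virtual environment detected"


def parse_pins(requirements_text: str) -> dict:
    """Map package name -> pinned version for each `name==version` line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins: dict) -> dict:
    """Return 'ok', 'missing', or a mismatch description for each package."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if installed == pinned:
            report[name] = "ok"
        else:
            report[name] = f"installed {installed}, pinned {pinned}"
    return report


# Example usage, run from the project root:
#   from pathlib import Path
#   print(active_environment())
#   pins = parse_pins(Path("requirements.txt").read_text())
#   for name, status in check_pins(pins).items():
#       print(f"{name}: {status}")
```

A result of "no virtual environment detected" usually means the terminal or the selected interpreter is not the one created for the project.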
Configuring version control for collaborative clinical research
Hands-on Exercise: Patient Data Analysis
Import anonymized patient dataset
Write code to filter patients by diagnosis categories
Calculate basic statistics on patient demographics
Visualize treatment outcomes
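One possible starting point for the four exercise steps is sketched below. It is not a reference solution. The records are fully synthetic and made up for illustration, the column names (`patient_id`, `age`, `sex`, `diagnosis_category`, `outcome`) are assumptions, and the function names are hypothetical. To stay dependency-free it uses the standard library and a text bar chart; in the project environment, pandas and Matplotlib or Plotly would be the natural replacements.

```python
import csv
import io
from collections import Counter, defaultdict
from statistics import mean, median

# Synthetic records for illustration only; not real patient data.
SYNTHETIC_CSV = """patient_id,age,sex,diagnosis_category,outcome
P001,34,F,cardiovascular,improved
P002,58,M,respiratory,stable
P003,71,F,cardiovascular,improved
P004,45,M,endocrine,worsened
P005,63,F,cardiovascular,stable
P006,29,M,respiratory,improved
"""


def load_patients(source) -> list:
    """Step 1: read rows from a file-like object and convert age to int."""
    rows = list(csv.DictReader(source))
    for row in rows:
        row["age"] = int(row["age"])
    return rows


def filter_by_diagnosis(patients, categories) -> list:
    """Step 2: keep patients whose diagnosis_category is in `categories`."""
    wanted = set(categories)
    return [p for p in patients if p["diagnosis_category"] in wanted]


def age_statistics(patients) -> dict:
    """Step 3: basic demographic statistics (count, mean age, median age)."""
    ages = [p["age"] for p in patients]
    return {"n": len(ages), "mean_age": mean(ages), "median_age": median(ages)}


def outcomes_by_diagnosis(patients) -> dict:
    """Step 4 (data): count outcomes within each diagnosis category."""
    counts = defaultdict(Counter)
    for p in patients:
        counts[p["diagnosis_category"]][p["outcome"]] += 1
    return counts


patients = load_patients(io.StringIO(SYNTHETIC_CSV))
cardio = filter_by_diagnosis(patients, ["cardiovascular"])
print(age_statistics(cardio))

# Step 4 (display): text bar chart, one '#' per patient with that outcome.
for category, outcomes in outcomes_by_diagnosis(patients).items():
    for outcome, count in sorted(outcomes.items()):
        print(f"{category:15s} {outcome:10s} {'#' * count}")
```

To work with a file in data/raw/ instead, pass an open file handle to `load_patients` in place of the `io.StringIO` object.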
Created: 04/29/2025 (C. Lizárraga)
Updated: 04/29/2025 (C. Lizárraga)
DataLab, Data Science Institute, University of Arizona.