Interpretability and Explainability - HanjieChen/Reading-List GitHub Wiki

Mechanistic Interpretability

Interpretable Model

Human Perspective

Evaluation

Concept-based

Interpretability, Explainability, Robustness

Information Bottleneck

Feature Interactions

Influence Functions

Find supporting training examples as explanations

Interpretation for analyzing tasks

Explainability Evaluation

Interpretation Methods

Meta-learning for Few-Shot Text Classification

Shapley Values for Interpretation

Improving Transparency

Rationales

Variational Information Bottleneck based Methods

Learn masks for interpretation

Learning with rationales

Improve interpretability and accuracy

e-SNLI