Page Index - Paper-Reading-Study/2025 GitHub Wiki

107 page(s) in this GitHub Wiki:

Home
2025
July 2025
Jun 2025
May 2025
Apr 2025
Mar 2025
Feb 2025
Jan 2025
[25.01.13] Attention is All You Need
[25.01.16] Training Large Language Models to Reason in a Continuous Latent Space
[25.01.20] BERT: Pre‐training of Deep Bidirectional Transformers for Language Understanding
[25.01.23] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)
[25.02.03] Learning Transferable Visual Models From Natural Language Supervision (CLIP)
[25.02.06] Robust Speech Recognition via Large‐Scale Weak Supervision (Whisper)
[25.02.10] Mamba: Linear‐Time Sequence Modeling with Selective State Spaces
[25.02.13] RoFormer: Enhanced Transformer with Rotary Position Embedding
[25.02.17] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
[25.02.22] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
[25.02.24] Auto‐Encoding Variational Bayes
[25.02.27] Generative Adversarial Nets
[25.03.04] Deep Dive into LLMs like ChatGPT by Andrej Karpathy
[25.03.06] Neural Discrete Representation Learning (VQ‐VAE)
[25.03.10] Denoising Diffusion Probabilistic Models
[25.03.18] The Llama 3 Herd of Models
[25.03.26] The Llama 3 Herd of Models(2)
[25.04.03] KAN: Kolmogorov‐Arnold Networks
[25.04.07] On the Biology of a Large Language Model
[25.04.10] Retrieval‐Augmented Generation for Knowledge‐Intensive NLP Tasks
[25.04.12] Inference‐Time Scaling for Generalist Reward Modeling
[25.04.14] Welcome to the Era of Experience
[25.04.17] Solving Olympiad Geometry Without Human Demonstrations (AlphaGeometry)
[25.04.26] Neural Machine Translation by Jointly Learning to Align and Translate
[25.04.28] Dropout: A Simple Way to Prevent Neural Networks from Overfitting
[25.05.03] Trust Region Policy Optimization
[25.05.08] Layers at Similar Depths Generate Similar Activations Across LLM Architectures
[25.05.08] Proximal Policy Optimization
[25.05.12] All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine‐Tuning
[25.05.15] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
[25.05.17] Mathematical discoveries from program search with large language models
[25.05.19] AlphaEvolve: A coding agent for scientific and algorithmic discovery
[25.05.26] Continuous Thought Machines
[25.05.29] Group Normalization
[25.05.31] Adam: A Method for Stochastic Optimization
[25.06.02] LORA: LOW‐RANK ADAPTATION OF LARGE LANGUAGE MODELS
[25.06.07] End‐to‐End Object Detection with Transformers
[25.06.09] A Mathematical Theory of Communication
[25.06.12] A Mathematical Theory of Communication
[25.06.16] A Mathematical Theory of Communication Part III
[25.06.19] A Mathematical Theory of Communication Part IV ~ V
[25.06.21] Kullback‐Leibler Divergence
[25.06.26] Denoising Diffusion Probabilistic Models
[25.06.28] Non‐Cooperative Games
[25.06.30] Bitcoin: A Peer‐to‐Peer Electronic Cash System
[25.06.30] Denoising Diffusion Implicit Models (DDIM)
[25.07.03] Deep Double Descent: Where Bigger Models and More Data Hurt
[25.07.03] Score‐Based Generative Modeling through Stochastic Differential Equations
Prompt Template