Page Index - Paper-Reading-Study/2025 GitHub Wiki
107 page(s) in this GitHub Wiki:
- Home
- 2025
- July 2025
- Jun 2025
- May 2025
- Apr 2025
- Mar 2025
- Feb 2025
- Jan 2025
- [25.01.13] Attention is All You Need
- [25.01.16] Training Large Language Models to Reason in a Continuous Latent Space
- [25.01.20] BERT: Pre‐training of Deep Bidirectional Transformers for Language Understanding
- [25.01.23] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)
- [25.02.03] Learning Transferable Visual Models From Natural Language Supervision (CLIP)
- [25.02.06] Robust Speech Recognition via Large‐Scale Weak Supervision (Whisper)
- [25.02.10] Mamba: Linear‐Time Sequence Modeling with Selective State Spaces
- [25.02.13] RoFormer: Enhanced Transformer with Rotary Position Embedding
- [25.02.17] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- [25.02.22] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
- [25.02.24] Auto‐Encoding Variational Bayes
- [25.02.27] Generative Adversarial Nets
- [25.03.04] Deep Dive into LLMs like ChatGPT by Andrej Karpathy
- [25.03.06] Neural Discrete Representation Learning (VQ‐VAE)
- [25.03.10] Denoising Diffusion Probabilistic Models
- [25.03.18] The Llama 3 Herd of Models
- [25.03.26] The Llama 3 Herd of Models(2)
- [25.04.03] KAN: Kolmogorov‐Arnold Networks
- [25.04.07] On the Biology of a Large Language Model
- [25.04.10] Retrieval‐Augmented Generation for Knowledge‐Intensive NLP Tasks
- [25.04.12] Inference‐Time Scaling for Generalist Reward Modeling
- [25.04.14] Welcome to the Era of Experience
- [25.04.17] Solving Olympiad Geometry Without Human Demonstrations (AlphaGeometry)
- [25.04.26] Neural Machine Translation by Jointly Learning to Align and Translate
- [25.04.28] Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- [25.05.03] Trust Region Policy Optimization
- [25.05.08] Layers at Similar Depths Generate Similar Activations Across LLM Architectures
- [25.05.08] Proximal Policy Optimization
- [25.05.12] All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine‐Tuning
- [25.05.15] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- [25.05.17] Mathematical discoveries from program search with large language models
- [25.05.19] AlphaEvolve: A coding agent for scientific and algorithmic discovery
- [25.05.26] Continuous Thought Machines
- [25.05.29] Group Normalization
- [25.05.31] Adam: A Method for Stochastic Optimization
- [25.06.02] LORA: LOW‐RANK ADAPTATION OF LARGE LANGUAGE MODELS
- [25.06.07] End‐to‐End Object Detection with Transformers
- [25.06.09] A Mathematical Theory of Communication
- [25.06.12] A Mathematical Theory of Communication
- [25.06.16] A Mathematical Theory of Communication Part III
- [25.06.19] A Mathematical Theory of Communication Part IV ~ V
- [25.06.21] Kullback‐Leibler Divergence
- [25.06.26] Denoising Diffusion Probabilistic Models
- [25.06.28] Non‐Cooperative Games
- [25.06.30] Bitcoin: A Peer‐to‐Peer Electronic Cash System
- [25.06.30] Denoising Diffusion Implicit Models (DDIM)
- [25.07.03] Deep Double Descent: Where Bigger Models and More Data Hurt
- [25.07.03] Score‐Based Generative Modeling through Stochastic Differential Equations
- Prompt Template