RAG Knowledge Base - IgorGanapolsky/trading GitHub Wiki
RAG Knowledge Base
Last Updated: 2025-12-01 08:23 AM ET
Auto-Updated: Daily via GitHub Actions
Knowledge Base Overview
| Source | Records | Status | Last Update |
|---|---|---|---|
| Sentiment RAG | 10 tickers | Active | 2025-11-09 |
| Berkshire Letters | 14 PDFs (4.15 MB) | Downloaded | 2010-2023 |
| Bogleheads Forum | 0 insights | Pending data collection | Daily |
| YouTube Transcripts | 5 videos (100 KB) | Active | Daily |
| Reddit Sentiment | 3 files | Active | Daily |
| News Sentiment | 2 files | Active | Daily |
Sentiment by Ticker
| Ticker | Sentiment | Signal | Regime | Confidence |
|---|---|---|---|---|
| AMZN | +64.0 | BULLISH | neutral | medium |
| NVDA | +60.0 | BULLISH | neutral | high |
| QQQ | +41.0 | BULLISH | neutral | medium |
| SPY | +35.0 | BULLISH | neutral | high |
| AAPL | +35.0 | BULLISH | neutral | medium |
| GME | +28.0 | BULLISH | neutral | low |
| AMD | +23.0 | BULLISH | neutral | low |
| TSLA | +5.0 | NEUTRAL | neutral | medium |
| GOOGL | -30.0 | BEARISH | neutral | medium |
| PLTR | -34.0 | BEARISH | neutral | medium |
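The Signal column in the table above appears to follow from the sentiment score. A minimal sketch of such a mapping is shown here, assuming symmetric cutoffs of ±20. That value is hypothetical, chosen only because it is consistent with the rows shown; the actual thresholds are not stated on this page, and `classify_signal` is an illustrative name rather than a function from the repository.

```python
def classify_signal(score: float, threshold: float = 20.0) -> str:
    """Map a sentiment score to a signal label.

    The +/-20 cutoff is an assumption, not the project's documented rule.
    """
    if score >= threshold:
        return "BULLISH"
    if score <= -threshold:
        return "BEARISH"
    return "NEUTRAL"


# Scores copied from a few rows of the table above.
scores = {"AMZN": 64.0, "AMD": 23.0, "TSLA": 5.0, "GOOGL": -30.0}
signals = {ticker: classify_signal(s) for ticker, s in scores.items()}
```

Keeping the threshold as a parameter makes it easy to swap in whatever cutoffs the trading system actually uses.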
Warren Buffett's Wisdom (Berkshire Letters)
Years Available: 2010-2023
Total Letters: 14 PDFs
Total Size: 4.15 MB
Recent Letters
- 2023 Annual Letter
- 2022 Annual Letter
- 2021 Annual Letter
- 2020 Annual Letter
- 2019 Annual Letter
How to Query Buffett's Wisdom
from src.rag.collectors.berkshire_collector import BerkshireLettersCollector
collector = BerkshireLettersCollector()
# Search for investment advice
results = collector.search("index funds vs stock picking")
# Get stock mentions
apple_wisdom = collector.get_stock_mentions("AAPL")
Bogleheads Forum Insights
Status: Pending data collection
Total Insights: 0
Data Files: 0
Forums Monitored
- Personal Investments
- Investing - Theory, News & General
Topics Tracked
- Market timing, rebalancing, risk
- Diversification, asset allocation
- Index funds, ETFs (SPY, QQQ, VOO)
YouTube Financial Analysis
Transcripts Cached: 5
Videos Processed: 0
Total Size: 100 KB
Channels Monitored
- Parkev Tatevosian, CFA
- Joseph Carlson
- Let's Talk Money! with Joseph Hogue
- Financial Education
- Everything Money
Data Collectors Status
| Collector | Source | Status |
|---|---|---|
| Reddit | r/wallstreetbets, r/stocks, r/investing | Installed |
| Yahoo Finance | Yahoo Finance API | Installed |
| Alpha Vantage | Alpha Vantage News API | Installed |
| Seeking Alpha | Seeking Alpha RSS | Installed |
| LinkedIn | LinkedIn Posts API | Installed |
| TikTok | TikTok Research API | Installed |
| Berkshire Letters | berkshirehathaway.com | Installed |
Data Storage Structure
data/
├── rag/
│   ├── sentiment_rag.db          # SQLite: Ticker sentiment embeddings
│   ├── sentiment.db              # SQLite: Sentiment cache
│   ├── berkshire_letters/
│   │   ├── raw/                  # Original PDF files
│   │   └── parsed/               # Extracted text
│   ├── bogleheads/               # Forum insights JSON
│   ├── chroma_db/                # ChromaDB vector store
│   └── vector_store/             # FAISS indices
├── sentiment/
│   ├── reddit_*.json             # Daily Reddit sentiment
│   └── news_*.json               # Daily news sentiment
└── youtube_cache/
    └── *_transcript.txt          # Video transcripts
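As an illustration of working with this layout, the sketch below locates the most recent daily sentiment files under `data/sentiment/`. The glob patterns come from the tree above. The helper names (`latest_file`, `latest_sentiment_files`) are hypothetical, and the code assumes each file name embeds a sortable date such as `reddit_2025-11-09.json`, which this page does not confirm.

```python
from pathlib import Path
from typing import Dict, Optional


def latest_file(directory: Path, pattern: str) -> Optional[Path]:
    """Return the lexicographically last file matching pattern, or None.

    Assumes file names sort chronologically (e.g. an ISO date suffix).
    """
    matches = sorted(directory.glob(pattern))
    return matches[-1] if matches else None


def latest_sentiment_files(data_dir: Path = Path("data")) -> Dict[str, Optional[Path]]:
    """Find the newest Reddit and news sentiment files under data_dir/sentiment."""
    sentiment_dir = data_dir / "sentiment"
    return {
        "reddit": latest_file(sentiment_dir, "reddit_*.json"),
        "news": latest_file(sentiment_dir, "news_*.json"),
    }
```

If the real file names use a different date format, sorting by modification time (`Path.stat().st_mtime`) would be a safer choice than sorting by name.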
Data Flow
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Data Sources   │────▶│   Collectors    │────▶│    RAG Store    │
│                 │     │                 │     │                 │
│ • Reddit        │     │ • Parse         │     │ • Embeddings    │
│ • YouTube       │     │ • Extract       │     │ • Vector Index  │
│ • Seeking Alpha │     │ • Normalize     │     │ • SQLite Cache  │
│ • LinkedIn      │     │ • Score         │     │                 │
│ • TikTok        │     │                 │     │                 │
│ • Bogleheads    │     │                 │     │                 │
│ • Berkshire     │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │ Trading System  │
                                                │                 │
                                                │ • Unified       │
                                                │   Sentiment     │
                                                │ • Trade         │
                                                │   Decisions     │
                                                └─────────────────┘
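The "Unified Sentiment" step in the diagram implies that scores from several sources are combined into one value per ticker. One way this could work is a weighted mean, sketched below. The `SourceScore` type, the weights, and the assumed -100..+100 score scale are all illustrative assumptions; this page does not describe how the trading system actually combines sources.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceScore:
    source: str    # e.g. "reddit" or "news"
    score: float   # sentiment score, assumed to share one scale across sources
    weight: float  # relative trust in the source (hypothetical)


def unified_sentiment(scores: List[SourceScore]) -> float:
    """Weighted mean of per-source scores; 0.0 when the total weight is zero."""
    total_weight = sum(s.weight for s in scores)
    if total_weight == 0:
        return 0.0
    return sum(s.score * s.weight for s in scores) / total_weight


# Made-up inputs for illustration only.
example = [
    SourceScore("reddit", 40.0, 1.0),
    SourceScore("news", 70.0, 2.0),
]
combined = unified_sentiment(example)  # (40*1 + 70*2) / 3 = 60.0
```

A weighted mean keeps the combined value on the same scale as its inputs, so any score-to-signal rule can be applied to it unchanged.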
This page is automatically updated daily by GitHub Actions.