Stock News Sentiment Analyzer - margaret-oberc/StockNewsSentimentAnalyzer GitHub Wiki
This Python project analyzes stock news articles to provide sentiment scores and categorization of financial reports. It automates the process of fetching news for a list of stock tickers, running sentiment analysis using a fine-tuned OpenAI model, and storing the results in a MySQL database for future analysis.
The approach is similar to the one described in Sentiment-Enhanced Stock Price Prediction: A Novel Ensemble Model Approach paper, however different data sources and different LLM model are used. In addition, regression models are used to verify if sentiment scores computed in this method do have impact on stock prices.
Features
- Stock News Fetching: The program fetch news summary from Yahoo RSS feed for a predefined set of stock tickers. In our case stocks we are interested in are traded on TSX.
The limitation of Yahoo RSS feed is that it returns only up to 200 latest news, and it contains only news summary. For the full news text, one would have to follow the link attached to the news. - Fine-Tuned Sentiment Analysis: Utilizes a fine-tuned OpenAI GPT model to analyze the sentiment of each news article and classify it as positive, neutral, or negative. The model has been fine-tuned on the FinancialPhraseBank-v1.0 dataset to specialize in financial sentiment analysis.
- Financial Statement Detection: Distinguishes whether an article is a general news story or a financial statement (e.g., quarterly or annual reports).
- Database Storage: Automatically inserts the sentiment analysis results, including sentiment score, article type, and comments, into a MySQL database.
- Efficient Duplicate Checking: Ensures no duplicate articles are processed by checking if the article already exists in the database by its unique identifier (UUID).
Fine-Tuning with FinancialPhraseBank
Dataset: FinancialPhraseBank-v1.0
This dataset has been used to train FinBERT. Fine tuned model was more accurate than standard OpenAI model – it’s covered in the analysis section.
The sentiment analysis model in this project has been fine-tuned using the FinancialPhraseBank-v1.0 dataset, a collection of financial phrases labeled with sentiment. The dataset provides a curated set of financial texts, making it an excellent resource for training models that handle sentiment analysis in the financial domain.
- Dataset Size: 4,845 phrases from financial news articles.
- Sentiment Labels: Positive, Neutral, Negative
- gpt-4o-mini was fine-tuned using supervised learning, with the FinancialPhraseBank-v1.0 dataset acting as the primary labeled data.
Stock News Sentiment Analyzer Test
This Python script integrates stock price data with news sentiment analysis to explore the relationship between sentiment from news articles and stock price movements. It connects to a MySQL database to:
- Fetch stock price data for specific stock symbols. Prerequisite – load stock prices to the stock_price table. Utility load_stock_prices.py. Here we were interested in Canadian stocks so the prices refer to the Toronto Stock Exchange.
- Retrieve sentiment scores from news articles related to those stocks. Sentiment scores are aggregated by trading day. Two features are used: min_score and max_score. If there was a negative news ona given day, min_score = -1. For a positive news max_score = 1. News of type ‘financial statement’ and news with neutral rating are filtered out. LLM was not very effective evaluating financial reports for stock price impact. Our news source usually contains financial report analysis which is easier for LLM to evaluate.
- Calculate price changes over a given time period by analyzing adjusted closing prices.
Price change was calculated as percentage change over the previous closing price:
The same way market return was computed using S&P/TSX Composite index (^GSPTSE). 4. Compare stock price movements with the sentiment scores to identify correlations or trends. Ordinary Least Squared regression mode was fitted:
Each stock was analyzed separately. Overall regression fit was not very good, but the objective is not to predict priced but evaluate if sentiment scores contribute to price movements.
Here the news for the last year were used, usually about 100 records per stock. Example of regression analysis for BMO:
Please note for BMO stock min_score is statistically significant (p-value is less than 0.05). For each day with a negative news, the price is expected to decrease by 1.1%, holding other variables constant. It’s interesting that negative news tends to have more significance on stock prices. The same was observed for other stocks. These tests should be rerun once more historical news is collected in our repository.