Sacred Sentiments - minalee-research/cs257-students GitHub Wiki

#Application #Method #Analysis

Cayde Bruce

Code: https://github.com/caydebruce/bible_analysis

Abstract

Bible translations differ in wording, which can subtly shape sentiment and tone, but there hasn’t been much research quantifying these differences. This project uses sentiment analysis to compare emotional trends across translations, looking at how shifts in language impact the way meaning is perceived. By combining lexicon-based and deep-learning methods, I analyze sentiment polarity, distribution changes, and key theological terms. Early results suggest that modern translations tend to tone down emotionally intense language, shifting sentiment distributions compared to older versions like the KJV. This project aims to shed light on how the language of religious texts evolves, even when they’re often seen as unchanging, and to explore the challenges of applying NLP to texts with cultural and theological significance. It culminates in training a model to predict which translation a given block of text comes from, showing that each translation is distinctive enough to be classified.

Project Motivation

What this project is about

Religious texts shape how people think about culture and theology, but the way they’re translated can subtly change their meaning—sometimes just through word choice and tone. This project looks at how sentiment shifts across different Bible translations, measuring the linguistic and emotional differences between older versions like the King James Version (KJV) and newer ones like the World English Bible (WEB). These shifts matter because translation choices can affect how readers interpret the text, what stands out emotionally, and even which theological ideas get emphasized.

The goal here is to use sentiment analysis to see how the overall emotional tone, whether positive, negative, or neutral, varies across translations. I’ll explore whether modern versions tend to soften emotionally intense language and track how key theological terms like “wrath,” “grace,” “sin,” and “salvation” change in sentiment over time. This study focuses on broad trends across multiple English Bible translations rather than debating theological accuracy.

To do this, I’ll use both lexicon-based sentiment models (like VADER and SentiWordNet) and deep learning approaches (DistilBERT and fine-tuned DistilBERT). The dataset comes from publicly available Bible texts via Project Gutenberg, Open Bible APIs, and other sources. The analysis involves comparing sentiment scores for aligned verses, looking at how sentiment distributions shift across translations, and measuring sentiment polarity agreement (SPA). I’ll also do some deep dives into key theological passages to get a richer understanding of the trends. Additionally, I will build a model that tries to predict which translation a given verse comes from.

Early results suggest that modern translations tend to soften negative sentiment in certain passages, reflecting broader linguistic and cultural shifts. By blending quantitative and qualitative analysis, this project sheds light on how language in religious texts evolves and highlights the challenges of applying NLP to texts with deep cultural and theological weight.

Progress made so far

Unexpected Challenges

  • As it turns out, most translations of the Bible are copyrighted, making gathering their data difficult and/or unethical.
  • The Bible, while large, actually isn't a ton of data (at least in the realm of NLP), leading to potential under-training.
  • Few-shot sentiment analysis: I could not get my OpenAI API key to work, so I opted to use the Together AI APIs.

Experiment Progress

  • Data procurement: I gathered a great dataset from Kaggle with 6 different translations of the Bible that are all freely licensed.
  • Sentiment Exploration with Lexicon-Based Methods: I have successfully utilized multiple lexicon-based methods to explore the dataset and show general trends throughout the years in the sentiment of various parts of the Bible. I focused on three distinct lists of books within each translation—Old Testament, New Testament, and theologically relevant books excluding the gospels ("Genesis", "Psalms", "Isaiah", "John", "Romans", "Revelation")—as these make up the majority of theologically relevant teachings throughout various faiths that rely on the Bible.
  • Sentiment Analysis with Pre-Trained Models: Using pretrained word2vec embeddings, I was able to track the semantic drift of a few key biblical terms across the years of translation. Interestingly, the 'path' that the words take seems to be nearly identical regardless of the word in question. This is expected because all translations share a common theological and linguistic structure, meaning that word relationships are adjusted in a globally consistent manner rather than in isolated shifts. Since t-SNE preserves local structures, it captures this uniform transformation, reinforcing that translations maintain a cohesive semantic framework rather than diverging unpredictably. It shows a consistent translation methodology which, while unsurprising, I find extremely interesting.
  • Few-Shot Sentiment Analysis: The differences between zero-shot and few-shot learning became increasingly clear. Initially, it seemed that zero-shot classification might be sufficient for sentiment analysis, but as more ambiguous verses were tested, its limitations—such as incomplete responses and misinterpretations—became evident. With the introduction of few-shot examples, the model's accuracy and consistency improved significantly. This led to a full analysis of the sentiment of the key theological books.
  • Translation Classification: To gain additional insights, I developed an LSTM classifier to distinguish between different Bible translations. Through experimentation, I discovered that t_kjv accounted for the majority of classification errors, primarily being misclassified as t_wbt or t_asv. Removing t_kjv improved the classifier’s performance by 17%, revealing an unexpected but intriguing insight: these three translations follow strikingly similar linguistic patterns. Further refinement (removing t_wbt) led to another 9% improvement, reinforcing the idea that these translations have undergone minimal change over time. While this wasn’t my original goal, it’s fascinating that these experiments still uncovered strong linguistic patterns across translations.

Approach

This project applies sentiment analysis and LLMs to quantify emotional tone differences across Bible translations. I use lexicon-based models, fine-tuned transformer models, and foundation LLMs to analyze sentiment polarity (positive, negative, neutral) and emotional intensity. The approach consists of the following steps:

Data Collection & Preprocessing

  • I compile multiple Bible translations from Project Gutenberg and a Kaggle dataset: https://www.kaggle.com/datasets/oswinrh/bible.
  • Verses are aligned across translations to ensure direct comparisons.
  • The text is tokenized, lemmatized, and preprocessed to remove punctuation and special characters (replacing archaic language in some cases); a minimal sketch of this pipeline follows.
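A minimal sketch of the preprocessing, assuming NLTK; the archaic-word map shown here is a small hypothetical subset of the actual replacement list:

```python
import re
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # data needed by the lemmatizer

# Hypothetical subset of the archaic-to-modern replacement list; the real set is larger.
ARCHAIC_MAP = {"thou": "you", "thee": "you", "thy": "your", "hath": "has", "saith": "says"}

lemmatizer = WordNetLemmatizer()

def preprocess_verse(text: str) -> list:
    """Lowercase, strip punctuation/special characters, swap archaic words, lemmatize."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = [ARCHAIC_MAP.get(tok, tok) for tok in text.split()]
    return [lemmatizer.lemmatize(tok) for tok in tokens]

print(preprocess_verse("Thou hath loved mercies."))  # ['you', 'has', 'loved', 'mercy']
```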

Sentiment Analysis Methods

  • Lexicon-Based Models: I use VADER and SentiWordNet, which assign sentiment scores based on predefined dictionaries.
  • Pretrained Models: I apply DistilBERT, pretrained for sentiment, to the religious texts to produce sentiment scores.
  • Trained Models: I use few-shot learning with Mistral-7B-Instruct-v0.2 to extract sentiment scores and analyze how the model handles verses with theological nuances.
  • Sentiment scores are computed at the verse level and aggregated across chapters and books (see the sketch below).
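A minimal sketch of the lexicon-based, verse-level scoring and aggregation, assuming NLTK's VADER and a pandas DataFrame with hypothetical column names (`translation`, `book`, `chapter`, `text`):

```python
import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

def polarity_label(text: str) -> str:
    """Map VADER's compound score onto the usual Positive/Neutral/Negative cutoffs."""
    c = sia.polarity_scores(text)["compound"]
    return "Positive" if c >= 0.05 else "Negative" if c <= -0.05 else "Neutral"

# df: one row per verse; the column names are assumptions, not the actual schema
# df["compound"] = df["text"].map(lambda t: sia.polarity_scores(t)["compound"])
# book_means = df.groupby(["translation", "book"])["compound"].mean()
```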

Evaluation Metrics

  • Sentiment Polarity Agreement (SPA): Measures how often different translations assign the same sentiment polarity (a short sketch follows this list).
  • Precision (more on this later).
  • F1-Score (more on this later).
  • Accuracy (more on this later).
  • Qualitative analysis for zero-shot vs. few-shot.
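A minimal sketch of the SPA computation, assuming each translation's verses have already been labeled Positive/Neutral/Negative and aligned one-to-one:

```python
def sentiment_polarity_agreement(labels_a, labels_b):
    """SPA: fraction of aligned verses where two translations get the same polarity label."""
    assert len(labels_a) == len(labels_b), "verses must be aligned one-to-one"
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

# Example: SPA between two translations' labels for the same three verses
print(sentiment_polarity_agreement(["Positive", "Neutral", "Negative"],
                                   ["Positive", "Positive", "Negative"]))  # 0.666...
```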

Baselines

  • Standard sentiment analysis models (VADER, SentiWordNet) as baselines for lexicon-based methods.
  • Unsupervised embeddings (word2vec) to track semantic drift in theological terms.
  • Results from the "Sermon on the Mount sentiment analysis" paper.

Novelty

While sentiment analysis has been applied to the Bible in previous studies (specifically the Sermon on the Mount passages), I want to focus on the general semantic drift of certain theologically relevant words to see if there is any trend in sentiment over time.

Experiments

Lexicon Based Experiments

  • Experiments with lexicon-based analysis indicate a trend toward more positive sentiment with each new translation, with some exceptions. These baselines will provide good data for the rest of the analysis. To reduce error from archaic language, I created a set of words that get replaced with their modern counterparts before preprocessing. I used two sentiment frameworks (VADER and SentiWordNet) in order to get better baseline understandings.
*(figures: VADER (left) and SentiWordNet (right) sentiment trends, shown side by side)*

Using our favorite library (Word2Vec), I also trained a Word2Vec model on each translation to get the closest words in the vector space.
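A minimal sketch of the training and lookup, assuming gensim and tokenized verses from the preprocessing step; the toy corpus and hyperparameters below are illustrative, not the experiment's settings:

```python
from gensim.models import Word2Vec

# corpus: list of tokenized verses from one translation (see the preprocessing sketch);
# the two toy verses below just keep the snippet self-contained
corpus = [["mercy", "truth", "bless"], ["grace", "mercy", "salvation"]]

model = Word2Vec(sentences=corpus, vector_size=50, window=5, min_count=1, workers=1)

for word, score in model.wv.most_similar("mercy", topn=2):
    print(f"{word}: {score:.3f}")
```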

Semantic Similarity of Biblical Keywords Across Translations

This table presents the top five most semantically similar words to specific keywords in different Bible translations. Each translation is labeled, followed by a table containing the related words and their similarity scores.


King James Version (KJV)

| Keyword | Similar Word 1 | Score | Similar Word 2 | Score | Similar Word 3 | Score | Similar Word 4 | Score | Similar Word 5 | Score |
|---|---|---|---|---|---|---|---|---|---|---|
| God | Mercy | 0.947 | Liveth | 0.923 | Glory | 0.923 | Bless | 0.918 | Truth | 0.918 |
| Sin | Trespass | 0.915 | Atonement | 0.905 | Peace | 0.893 | Accept | 0.892 | Sacrifice | 0.886 |
| Mercy | Grace | 0.965 | Salvation | 0.964 | Bless | 0.962 | Righteousness | 0.957 | Truth | 0.956 |
| Grace | Truth | 0.982 | Sinned | 0.982 | Faith | 0.981 | Liveth | 0.973 | Witness | 0.972 |
| Salvation | Blessing | 0.980 | Prayer | 0.975 | Bless | 0.975 | Grace | 0.972 | Truth | 0.969 |
| Jesus | Balak | 0.929 | Samuel | 0.928 | Prophet | 0.928 | Cried | 0.919 | Rabshakeh | 0.915 |
| Wrath | Fury | 0.958 | Shine | 0.953 | Lighten | 0.952 | Dread | 0.951 | Destroy | 0.948 |
| Hope | Seeing | 0.990 | Honour | 0.988 | Pleasure | 0.987 | Pardon | 0.987 | Forget | 0.985 |
| Anger | Kindled | 0.953 | Wrath | 0.946 | Power | 0.942 | Hands | 0.941 | Provoke | 0.941 |
| Heaven | Heavens | 0.895 | Earth | 0.879 | Footstool | 0.810 | Anger | 0.806 | Face | 0.804 |
| Death | Cause | 0.933 | Hold | 0.924 | Trust | 0.915 | Wicked | 0.913 | Lest | 0.912 |

American Standard Version (ASV)

| Keyword | Similar Word 1 | Score | Similar Word 2 | Score | Similar Word 3 | Score | Similar Word 4 | Score | Similar Word 5 | Score |
|---|---|---|---|---|---|---|---|---|---|---|
| God | Glory | 0.945 | Lovingkindness | 0.936 | Salvation | 0.932 | Bless | 0.930 | Righteousness | 0.922 |
| Sin | Iniquity | 0.951 | Forgive | 0.948 | Wisdom | 0.946 | Righteous | 0.945 | Righteousness | 0.943 |
| Mercy | Salvation | 0.974 | Glad | 0.972 | Gracious | 0.970 | Favor | 0.968 | Truth | 0.967 |
| Grace | Loved | 0.991 | Dealt | 0.986 | Oath | 0.985 | Hardened | 0.985 | Faithful | 0.985 |
| Salvation | Mercy | 0.974 | Swear | 0.973 | Giveth | 0.972 | Righteousness | 0.969 | Establish | 0.969 |
| Jesus | Answered | 0.952 | Balak | 0.936 | Told | 0.934 | Asked | 0.927 | Prophet | 0.926 |
| Wrath | Anger | 0.954 | Darkness | 0.934 | Fierceness | 0.932 | Destroy | 0.928 | Dread | 0.926 |
| Hope | Reward | 0.994 | Seeing | 0.992 | Almighty | 0.991 | Delight | 0.990 | Perfect | 0.989 |
| Anger | Provoke | 0.972 | Kindled | 0.970 | Wrath | 0.954 | Power | 0.949 | Fierceness | 0.942 |
| Heaven | Heavens | 0.898 | Earth | 0.883 | Face | 0.858 | Created | 0.826 | Power | 0.824 |
| Death | Shame | 0.972 | Hold | 0.938 | Turn | 0.922 | Surely | 0.909 | Wicked | 0.907 |

Bible in Basic English (BBE)

| Keyword | Similar Word 1 | Score | Similar Word 2 | Score | Similar Word 3 | Score | Similar Word 4 | Score | Similar Word 5 | Score |
|---|---|---|---|---|---|---|---|---|---|---|
| God | Grace | 0.922 | Saviour | 0.901 | Salvation | 0.895 | Asaphgt | 0.887 | Everything | 0.887 |
| Sin | Sins | 0.937 | Wrongdoing | 0.934 | Forgiveness | 0.920 | Evildoing | 0.879 | Pleasure | 0.858 |
| Mercy | Salvation | 0.971 | Hope | 0.952 | Grace | 0.943 | Righteousness | 0.942 | Faith | 0.937 |
| Grace | Saviour | 0.964 | Salvation | 0.947 | Mercy | 0.943 | Hope | 0.940 | Davidgt | 0.934 |
| Salvation | Mercy | 0.971 | Hope | 0.959 | Grace | 0.947 | Saviour | 0.945 | Today | 0.937 |
| Jesus | Christ | 0.915 | Samuel | 0.886 | Servant | 0.885 | Isaiah | 0.876 | David | 0.865 |
| Wrath | Passion | 0.882 | Haters | 0.854 | Moving | 0.854 | Heat | 0.845 | Moved | 0.842 |
| Hope | World | 0.972 | Salvation | 0.959 | Saviour | 0.959 | Righteousness | 0.956 | Mercy | 0.952 |
| Anger | Debtor | 0.471 | Made | 0.464 | Sin | 0.462 | Breathingspace | 0.461 | Chaldaea | 0.453 |
| Heaven | Heavens | 0.924 | Earth | 0.859 | Glory | 0.854 | Lifted | 0.836 | Light | 0.835 |
| Death | Sword | 0.875 | Cross | 0.768 | Dead | 0.745 | End | 0.724 | Body | 0.717 |

Young's Literal Translation (YLT)

| Keyword | Similar Word 1 | Score | Similar Word 2 | Score | Similar Word 3 | Score | Similar Word 4 | Score | Similar Word 5 | Score |
|---|---|---|---|---|---|---|---|---|---|---|
| God | Kindness | 0.951 | Age | 0.943 | Salvation | 0.940 | Blessed | 0.935 | Truth | 0.931 |
| Sin | Judgment | 0.975 | Righteousness | 0.961 | Walk | 0.954 | Judge | 0.952 | Iniquity | 0.947 |
| Mercy | Laying | 0.990 | Restrained | 0.988 | Planting | 0.988 | Setting | 0.987 | Delivereth | 0.986 |
| Grace | Ear | 0.965 | Fear | 0.958 | Righteousness | 0.957 | Understanding | 0.956 | Truth | 0.954 |
| Salvation | Truth | 0.969 | Confess | 0.968 | Thank | 0.965 | Honour | 0.964 | Harden | 0.964 |
| Jesus | Answering | 0.959 | Answered | 0.950 | Therefore | 0.949 | Answer | 0.943 | Lord | 0.936 |
| Wrath | Fury | 0.984 | Fierceness | 0.975 | Burned | 0.972 | Provoking | 0.971 | Arm | 0.969 |
| Hope | Habitually | 0.994 | Manifested | 0.991 | Profit | 0.991 | Happy | 0.991 | Always | 0.990 |
| Anger | Fury | 0.961 | Wrath | 0.950 | Fierceness | 0.930 | Adversaries | 0.928 | Retained | 0.915 |
| Heaven | Boweth | 0.938 | Noise | 0.933 | Dagon | 0.918 | Refreshed | 0.916 | Trance | 0.915 |
| Death | Shame | 0.835 | Require | 0.804 | Ranges | 0.800 | Certainly | 0.790 | Riddle | 0.783 |

Webster's Bible Translation (WBT)

| Keyword | Similar Word 1 | Score | Similar Word 2 | Score | Similar Word 3 | Score | Similar Word 4 | Score | Similar Word 5 | Score |
|---|---|---|---|---|---|---|---|---|---|---|
| God | Mercy | 0.974 | Liveth | 0.945 | Bless | 0.940 | Truth | 0.936 | Salvation | 0.934 |
| Sin | Iniquity | 0.977 | Forgive | 0.961 | Righteous | 0.955 | Ways | 0.951 | Whatever | 0.951 |
| Mercy | God | 0.974 | Liveth | 0.974 | Truth | 0.971 | Bless | 0.966 | Grace | 0.963 |
| Grace | Faith | 0.989 | Shown | 0.987 | Sinned | 0.987 | Promised | 0.984 | Honor | 0.983 |
| Salvation | Prayer | 0.983 | Swear | 0.982 | Help | 0.980 | Curse | 0.976 | Judgment | 0.976 |
| Jesus | Strictly | 0.938 | Prophet | 0.930 | Isaiah | 0.929 | Furthermore | 0.928 | Spoken | 0.928 |
| Wrath | Destroy | 0.975 | Shine | 0.971 | Lift | 0.970 | Strong | 0.969 | Fury | 0.968 |
| Hope | Wickedly | 0.991 | Seeing | 0.991 | Lovingkindness | 0.990 | Delight | 0.990 | Preserve | 0.989 |
| Anger | Provoke | 0.982 | Execute | 0.967 | Established | 0.963 | Kindled | 0.961 | Forsaken | 0.959 |
| Heaven | Heavens | 0.908 | Earth | 0.901 | Face | 0.890 | Bush | 0.845 | Power | 0.805 |
| Death | Hold | 0.937 | Enemies | 0.928 | Trust | 0.918 | Wicked | 0.918 | Wrath | 0.917 |

World English Bible (WEB)

| Keyword | Similar Word 1 | Score | Similar Word 2 | Score | Similar Word 3 | Score | Similar Word 4 | Score | Similar Word 5 | Score |
|---|---|---|---|---|---|---|---|---|---|---|
| God | GT | 0.949 | Praise | 0.948 | Jealous | 0.947 | Kindness | 0.943 | Righteousness | 0.943 |
| Sin | Accept | 0.958 | Peace | 0.935 | Forgiven | 0.926 | Sacrifice | 0.920 | Trespass | 0.920 |
| Mercy | Grace | 0.995 | Established | 0.992 | Faith | 0.992 | Oath | 0.992 | Gracious | 0.991 |
| Grace | Mercy | 0.995 | Faith | 0.992 | Oath | 0.991 | Promised | 0.990 | True | 0.989 |
| Salvation | Justice | 0.989 | Bless | 0.989 | Delight | 0.985 | Forsake | 0.985 | Declare | 0.984 |
| Jesus | Balak | 0.971 | Isaiah | 0.958 | Prophet | 0.957 | Balaam | 0.954 | Answered | 0.953 |
| Wrath | Kindled | 0.960 | Light | 0.958 | Fierce | 0.958 | Darkness | 0.956 | Anger | 0.952 |
| Hope | Keeps | 0.995 | Witness | 0.994 | Merciful | 0.994 | Diligently | 0.993 | Triumph | 0.993 |
| Anger | Kindled | 0.991 | Provoke | 0.979 | Provoked | 0.955 | Worship | 0.953 | Wrath | 0.952 |
| Heaven | Prayer | 0.939 | Confession | 0.939 | Loving | 0.935 | Glory | 0.933 | GT | 0.933 |
| Death | Enemies | 0.933 | Hold | 0.933 | Wicked | 0.922 | Avenger | 0.918 | Kill | 0.912 |

Observations

Looking at different Bible translations, you can see how theological emphasis shifts over time. Take "God," for example—in the King James Version (KJV) and Webster’s Bible Translation (WBT), He’s all about "Mercy" and "Truth," a kind and benevolent figure. But in the World English Bible (WEB), "Jealous" shows up, making God seem more possessive and demanding of exclusive worship. Similarly, "Sin" changes depending on the translation. KJV and the American Standard Version (ASV) focus on redemption, linking sin to "Trespass" and "Forgive," while Young’s Literal Translation (YLT) leans into a stricter, more legalistic view, tying sin to "Judgment" and "Righteousness."

Other words show just as much variation. "Jesus" in BBE is associated with "Christ," "Servant," and "David," potentially emphasizing his perceived role as a messianic king. But in YLT, he’s more of a teacher or authority figure, with words like "Answering" and "Lord." "Wrath" is another interesting one. The KJV and YLT go all in on divine destruction, with words like "Fury" and "Destroy," while BBE makes it sound more emotional, associating it with "Passion" and "Heat." Even "Hope" has varying results. In ASV, it’s tied to "Reward" and "Almighty," making it seem like a divine promise, while BBE links it to "World" and "Saviour," making it feel more immediate and practical. One of the weirdest shifts is "Anger," which in BBE is oddly connected to "Debtor" and "Made," almost like it's part of a transaction rather than just an emotional state.

Pre-trained Model-Based Experiment

  • Sentiment Results with DistilBERT

Key Theological Books

*(figure: DistilBERT sentiment results, key theological books)*

Old Testament

*(figure: DistilBERT sentiment results, Old Testament)*

New Testament

*(figure: DistilBERT sentiment results, New Testament)*

We consistently see a positive sentiment trend across the Old Testament, New Testament, and Key Theological Books.

Trained Model Experiment

  • We train a model on each translation and show the semantic drift among related words, utilizing t-SNE to visualize the trends. This gives us a really cool visualization of how certain words are viewed within each translation.
*(figures: t-SNE visualizations of keyword embeddings across translations)*
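A rough sketch of how such a plot could be produced, assuming one trained Word2Vec model per translation stored in a hypothetical `models` dict keyed by translation id and ordered by year; since the vectors come from separately trained models, the projected 'path' should be read qualitatively:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_drift(models: dict, word: str):
    """Project one keyword's vector from each translation's model into 2D."""
    names = [n for n in models if word in models[n].wv]
    vecs = np.array([models[n].wv[word] for n in names])
    # perplexity must stay below the number of points (one per translation)
    coords = TSNE(n_components=2, perplexity=min(5, len(vecs) - 1),
                  random_state=0).fit_transform(vecs)
    plt.plot(coords[:, 0], coords[:, 1], marker="o")  # the 'path' across translations
    for (x, y), name in zip(coords, names):
        plt.annotate(name, (x, y))
    plt.title(f"Semantic drift of '{word}'")
    plt.show()
```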

Takeaways

Words like grace, God, Jesus, sin, and salvation shift in their associations depending on the translation, reflecting theological, linguistic, and cultural changes. For instance, grace in older translations like Young’s Literal Translation (YLT 1862) aligns more with sanctification and atonement, whereas in modern translations like the Bible in Basic English (BBE 1965), it leans toward kindness and blessing—suggesting a shift from a theological to a more personal, moral interpretation.

Similarly, God appears with distinct associations across translations. The King James Version (KJV 1611) maintains a strong connection to Jehovah, Elohim, and Yahweh, emphasizing proper names and covenantal aspects of God. Meanwhile, YLT 1862 places God near abstract virtues like love, peace, and faith, indicating a more philosophical framing. This pattern suggests that earlier translations leaned heavily on biblical names and titles, while later versions moved toward broader, universal descriptions (although WEB moves back [More on that at the end]). The same phenomenon appears with Jesus, who in older translations is tightly linked to Savior and Messiah, but in modern translations is associated with miracles and divine perfection, focusing more on his actions than doctrinal identity.

Words with moral and doctrinal weight, like sin and salvation, also show interesting shifts. In KJV 1611, sin clusters around righteousness and wickedness, showing a moralistic duality, while BBE 1965 emphasizes guilt, punishment, and shame, making it more of a personal, psychological burden. Salvation follows a similar path. YLT 1862 places it among baptism, sanctification, and atonement, suggesting a structured theological process, while KJV 1611 frames it around faith and conversion, indicating an individual spiritual experience. In contrast, BBE 1965 links salvation with resurrection and life, making it more about ultimate destiny than immediate faith.

These shifts suggest that translations not only update language but also reflect changes in religious thought, cultural attitudes, and doctrinal emphasis. Older translations tend to maintain strict theological structures, often emphasizing law, judgment, and covenantal frameworks. In contrast, modern translations move toward interpretations that are more personal, experiential, and emotionally resonant. That said, the World English Bible (WEB 1994) seems to be an odd outlier in many of these observations, often positioning itself between the old and new translations, which is interesting because it is the newest translation.

Zero-shot and Few-Shot analysis

For this experiment, I use Mistral-7B-Instruct-v0.2 with two prompts seen here.

Zero-Shot `Classify the sentiment of the following text. Limit your response to just Positive, Neutral, or Negative: {text}`

Few-Shot `Classify the sentiment of the following text. Limit your response to just Positive, Neutral, or Negative

Examples:

1 "The Lord is my shepherd, I shall not want." → Positive
2 "Blessed are the meek, for they shall inherit the earth." → Positive
3 "Fear not, for I am with you; be not dismayed, for I am your God." → Positive
4 "The Lord is my light and my salvation—whom shall I fear?" → Positive
5 "For God so loved the world, that he gave his only begotten Son." → Positive
6 "The joy of the Lord is your strength." → Positive
7 "I can do all things through Christ who strengthens me." → Positive
8 "Be strong and courageous; do not be afraid, for the Lord your God goes with you." → Positive
9 "My God, my God, why have you forsaken me?" → Negative
10 "Woe to the rebellious children, saith the Lord, that take counsel, but not of me." → Negative
11 "The wages of sin is death, but the gift of God is eternal life through Jesus Christ our Lord." → Nuetral
12 "And Judas went and hanged himself." → Negative
13 "For many are called, but few are chosen." → Negative
14 "You are of your father the devil, and your will is to do your father’s desires." → Negative
15 "Cursed is the man who trusts in man and makes flesh his strength." → Negative
16 "The heart is deceitful above all things, and desperately wicked: who can know it?" → Negative
17 "Now these are the names of the sons of Israel, who came into Egypt with Jacob." → Neutral
18 "This is the book of the generations of Adam." → Neutral
19 "In the beginning, God created the heavens and the earth." → Neutral
20 "And Noah was six hundred years old when the flood of waters was upon the earth." → Neutral
21 "Jesus went up to Jerusalem for the Feast of the Passover." → Neutral
22 "Paul, an apostle of Jesus Christ, by the will of God, to the saints in Ephesus." → Neutral
23 "And the Lord spoke to Moses, saying, 'Command the Israelites to bring you clear oil for the lampstand.'" → Neutral

Now analyze this text:
"{text}" → `

These were qualitative manual classifications. I tried my best to get a good spread of negative vs. positive and interesting verses. Below are the results from 11 test verses. (TODO: Find a better way to test this)

Verse: "And whatever you ask in my name, I will do, so that the Father may be glorified in the Son. If you ask anything of me in my name, I will do it"

  • Zero-Shot Prediction: .
  • Few-Shot Prediction: Positive

Verse: "The Lord gave, and the Lord has taken away; blessed be the name of the Lord."

  • Zero-Shot Prediction: Answer: Positive Ex
  • Few-Shot Prediction: Positive

Verse: "To everything there is a season... A time to be born, and a time to die."

  • Zero-Shot Prediction: A time to plant, and a time to re
  • Few-Shot Prediction: Neutral

Verse: "Do not think that I have come to bring peace to the earth. I have not come to bring peace, but a sword."

  • Zero-Shot Prediction: For I have come to turn a man against his
  • Few-Shot Prediction: Negative

Verse: "Faithful are the wounds of a friend, but deceitful are the kisses of an enemy."

  • Zero-Shot Prediction: Negative ### 10
  • Few-Shot Prediction: Positive

Verse: "For when I am weak, then I am strong."

  • Zero-Shot Prediction: Positive
  • Few-Shot Prediction: Positive

Verse: "The greatest among you shall be your servant. Whoever exalts himself will be humbled, and whoever humbles himself will be exalted."

  • Zero-Shot Prediction: Answer: Neutral
  • Few-Shot Prediction: Positive

Verse: "Blessed shall he be who takes your little ones and dashes them against the rock!"

  • Zero-Shot Prediction: Negative
  • Few-Shot Prediction: Negative

Verse: "And we know that all things work together for good to those who love God."

  • Zero-Shot Prediction: Positive
  • Few-Shot Prediction: Positive

Verse: "If anyone comes to me and does not hate his father and mother, wife and children, brothers and sisters—yes, even their own life—such a person cannot be my disciple."

  • Zero-Shot Prediction: Negative
  • Few-Shot Prediction: Negative

Verse: "Consider it pure joy, my brothers and sisters, whenever you face trials of many kinds."

  • Zero-Shot Prediction: James 1:2 Neutral
  • Few-Shot Prediction: Positive

Insights

This experiment highlighted some key differences between zero-shot and few-shot sentiment analysis, particularly when dealing with nuanced and ambiguous biblical texts. One major takeaway was that few-shot learning provided much more consistent and accurate classifications. By giving the model some examples before asking it to classify a verse, it was able to focus on the task rather than attempting to complete or reinterpret the text. This was evident in cases where zero-shot responses were cut off or included unrelated content, like when it attempted to continue the verse instead of simply labeling it. For example, in "To everything there is a season... A time to be born, and a time to die," the zero-shot model trailed off, whereas the few-shot approach correctly labeled it as neutral.

Ambiguous verses were particularly challenging, with some classifications varying between zero-shot and few-shot. Take Proverbs 27:6—"Faithful are the wounds of a friend, but deceitful are the kisses of an enemy." The zero-shot model classified it as negative, likely focusing on the words “wounds” and “deceitful,” whereas the few-shot model recognized the deeper meaning and labeled it as positive. This suggests that without examples to guide interpretation, the model struggles to capture underlying sentiment in paradoxical or context-heavy statements. However, when sentiment was more extreme, such as in "Blessed shall he be who takes your little ones and dashes them against the rock!", both approaches accurately classified it as negative.

Ultimately, few-shot learning proved far more reliable, particularly for complex or nuanced religious texts. The additional context helped the model avoid misinterpretations and focus on classification rather than text completion. While zero-shot learning still worked for clearer statements, it often stumbled on verses that required deeper contextual understanding. A logical next step would be testing across different Bible translations or even fine-tuning a model specifically for religious texts to see if accuracy can be further improved.

Sentiment analysis using Few-Shot

Due to time/compute constraints, I limited the analysis to 200 random verses per book (keeping the indices the same across translations), calculated their mean sentiment, and then plotted the means by translation year on the x-axis. Here are the results:

*(figure: mean few-shot sentiment per translation, plotted by year)*

Although t_ylt seems to be more negative than the other older translations, the overall trend is still positive. This aligns with the rest of the results so far.

Predictive Model Experiment: Part 1

I wanted to see if it was possible to predict which translation a particular verse comes from using a trained neural network. Given the semantic similarity between versions, this is a difficult task; however, I think it is still worth pursuing.

I chose TensorFlow's Keras for creating a model to classify Bible verses by translation due to its ease of use and performance on CPUs (I didn't want to pay for Google Colab compute credits). It provides built-in tools for text tokenization, word embeddings, and deep learning architectures, making it well-suited for text classification.
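A minimal sketch of a comparable Keras setup, with illustrative (not actual) layer sizes and hypothetical `train_texts`/`train_labels` inputs:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, MAX_LEN, NUM_CLASSES = 20000, 40, 6  # illustrative, not the actual values

# Raw verse strings go straight in; TextVectorization handles tokenization.
vectorize = layers.TextVectorization(max_tokens=VOCAB_SIZE, output_sequence_length=MAX_LEN)
# vectorize.adapt(train_texts)  # train_texts: hypothetical list of raw verse strings

model = tf.keras.Sequential([
    vectorize,
    layers.Embedding(VOCAB_SIZE, 128),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one class per translation
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_texts, train_labels, validation_split=0.1, epochs=5)
```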

Results of best performer (simplest model)

Consistent with the lexicon-based analysis, t_bbe (Bible in Basic English) is the easiest to predict and t_kjv (King James Version) is the hardest for the model to predict. Given the archaic language in t_kjv, I would have anticipated its accuracy to be a lot higher. I think this warrants further exploration.

1. Precision

Precision measures how many predicted instances for a class were actually correct.

$$ \text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}} $$

Example: If the model predicts a verse as t_bbe, precision tells us how often that prediction is correct.

2. Recall (Sensitivity)

Recall measures how many actual instances of a class were correctly identified.

$$ \text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}} $$

Example: If there are many t_ylt verses, recall tells us how many were correctly classified.

3. F1-Score

The F1-score is the harmonic mean of precision and recall, balancing both metrics.

$$ F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$

Example: If a translation has high precision but low recall, or vice versa, the F1-score reflects this balance.

4. Support

Support refers to the number of actual instances in each class.

Example: Each translation has around 6,220 examples, meaning we have a balanced dataset.
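The per-translation table below is the kind of output scikit-learn's `classification_report` produces; a short sketch, with toy stand-ins for the held-out labels and predictions:

```python
from sklearn.metrics import classification_report

# toy stand-ins; in practice these are the held-out labels and the model's predictions
y_true = ["t_kjv", "t_bbe", "t_web", "t_kjv"]
y_pred = ["t_asv", "t_bbe", "t_web", "t_wbt"]
print(classification_report(y_true, y_pred, digits=2, zero_division=0))
```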

Model Performance on Bible Translations

| Translation | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| t_wbt | 0.51 | 0.51 | 0.51 | 6220 |
| t_asv | 0.50 | 0.43 | 0.46 | 6221 |
| t_web | 0.76 | 0.81 | 0.79 | 6220 |
| t_ylt | 0.81 | 0.83 | 0.82 | 6221 |
| t_bbe | 0.94 | 0.95 | 0.95 | 6221 |
| t_kjv | 0.44 | 0.46 | 0.45 | 6221 |

Takeaways

  • t_bbe (Bible in Basic English) has the best performance with an F1-score of 0.95, meaning it is highly accurate in classification.
  • t_ylt (Young’s Literal Translation) and t_web (World English Bible) also perform well, with F1-scores of 0.82 and 0.79.
  • t_kjv (King James Version) and t_asv (American Standard Version) perform the worst, with low precision, recall, and F1-scores (~0.45-0.46). This suggests the model struggles to distinguish these translations.
  • Overall Accuracy is 66%, meaning the model classifies about two-thirds of verses correctly. Again, underwhelming, but not totally surprising. This failure to correctly distinguish Bible verses from certain translations could be seen as evidence that they are good at keeping with the original meaning of the text and resisting semantic drift.

Predictive Model Experiment: Part 2

Taking what I had learned from part 1, I wanted to see how high the accuracy could get between different Bible translations. Thinking that there might have been problems I was unaware of with Keras, I went ahead and defined an LSTMClassifier similar to the ones we saw in the homework. I also thought about the way I was handling the data. Since the Bible is made up of many small verses, the odds of some verses being identical or nearly identical are pretty high. This would result in the model simply guessing between translations when there are multiple correct answers. To mitigate this, I added two steps to preprocessing (the merge step is sketched after the list below). I also decided to focus on confusion matrices in order to better understand where the model might be making mistakes.

    1. Remove perfect duplicates of verses if they are shared between two or more translations.
    2. Create a function that takes an $n$ and, if the number of words in a verse is $< n$, appends the next verse to it until the merged verse is $> n$ words long.
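A minimal sketch of the merge step, assuming verses arrive in canonical order within one translation:

```python
def merge_short_verses(verses: list, n: int = 20) -> list:
    """Greedily append following verses until each merged block exceeds n words."""
    merged, buffer = [], ""
    for verse in verses:
        buffer = f"{buffer} {verse}".strip()
        if len(buffer.split()) > n:
            merged.append(buffer)
            buffer = ""
    if buffer:  # keep any trailing short remainder
        merged.append(buffer)
    return merged

verses = ["And God said,", "Let there be light:", "and there was light."]
print(merge_short_verses(verses, n=5))
# ['And God said, Let there be light:', 'and there was light.']
```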

For the next 3 experiments I used an LSTM with the following hyperparameters:

| Hyperparameter | Description | Value |
|---|---|---|
| vocab_size | Size of the vocabulary (number of unique tokens) | 55163 |
| num_classes | Number of output classes (translations) | 6 |
| padding_idx | Index used for padding in the embedding layer | 0 |
| input_size | Dimensionality of input features to the LSTM | 128 |
| hidden_size | Number of features in the hidden state of the LSTM | 256 |
| num_layers | Number of LSTM layers | 1 (default) |
| batch_first | Whether batch is the first dimension in the input tensor | True |
| enforce_sorted | Whether sequences need to be sorted by length | False |
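A hedged reconstruction of such an LSTM classifier in PyTorch, matching the table's hyperparameters; the training loop is omitted and details may differ from the original:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class LSTMClassifier(nn.Module):
    """Sketch of the verse-to-translation classifier described above."""
    def __init__(self, vocab_size=55163, input_size=128, hidden_size=256,
                 num_classes=6, padding_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, input_size, padding_idx=padding_idx)
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids, lengths):
        embedded = self.embedding(token_ids)
        # pack so padding doesn't influence the final hidden state
        packed = pack_padded_sequence(embedded, lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        _, (hidden, _) = self.lstm(packed)
        return self.fc(hidden[-1])  # logits over the 6 translations
```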

5 training epochs - no extra data cleanup:

*(figure: confusion matrix, all six translations)*

10 epochs - removing duplicate verses - merging short verses until token count $> 20$:

*(figure: confusion matrix after deduplication and merging)*

Test loss: 0.8480, Test accuracy: 0.6438

After analyzing the results, I noticed a clear pattern: the t_kjv translation was causing significant issues for the classifier, accounting for most of the errors. The model frequently misclassified t_kjv as either t_wbt or t_asv nearly two-thirds of the time. Given this, I decided to remove t_kjv from the experiment to better evaluate the classifier's performance under optimized conditions. This decision was made to push the model's capabilities and observe its potential without the confounding factor of t_kjv.

Removing KJV with 10 epochs - removing duplicate verses - merging short verses until token count $> 20$:

*(figure: confusion matrix without t_kjv)*

Test loss: 0.8480, Test accuracy: 0.8194

This led to an impressive 17% improvement! As shown in the confusion matrix, t_wbt was responsible for the next largest set of errors, primarily due to misclassification as t_asv. To maintain a consistent methodology, I proceeded with the next experiment by removing t_wbt as well.

Removing KJV and WBT with 10 epochs - removing duplicate verses - merging short verses until token count $> 20$:

*(figure: confusion matrix without t_kjv and t_wbt)*

Test loss: 0.5215, Test accuracy: 0.9002

Another 9% improvement! What’s particularly interesting is that, across this entire analysis, t_kjv, t_wbt, and t_asv consistently follow similar trends, regardless of the method used. This, combined with the classifier experiment, suggests that these translations are the most similar to each other and have undergone the least change over the centuries. And while this wasn’t my original goal, it’s fascinating that these experiments still revealed such strong patterns in the data.

Final Takeaways

This project set out to quantify sentiment differences across Bible translations but ended up uncovering deeper patterns in linguistic consistency and semantic drift.

  1. Sentiment Trends Reflect Linguistic and Cultural Shifts: Modern translations tend to soften emotionally intense language, showing a shift toward more positive sentiment. This suggests that translation choices reflect broader cultural and theological trends while maintaining core messages.

  2. Semantic Drift is Structured, Not Random: Key theological terms shift in meaning across translations, but they do so in a systematic way. Instead of independent changes, translations maintain a globally consistent structure, reinforcing linguistic continuity even as language evolves.

  3. Few-Shot Learning Improves Sentiment Classification: Zero-shot sentiment classification struggled with theological nuance, often producing incomplete or ambiguous results. Few-shot learning significantly improved accuracy, suggesting that structured guidance is crucial for NLP models in religious contexts.

  4. Some Translations Are Nearly Indistinguishable: The LSTM classifier revealed that t_kjv, t_wbt, and t_asv followed strikingly similar trends, making them difficult to differentiate. Removing t_kjv and then t_wbt improved classification accuracy by 17% and 9% respectively, suggesting minimal linguistic change over time.

  5. NLP Faces Challenges in Religious Texts: While sentiment analysis and classification models are powerful, they struggle with theological nuance. Some translations are more semantically similar than expected, highlighting the difficulty of defining linguistic "difference" in sacred texts.

Conclusion: While not the original goal, this study revealed strong patterns in translation stability and linguistic evolution. The findings suggest that even sacred texts, often seen as unchanging, subtly adapt over time. Future work could explore fine-tuning models for religious texts, analyzing additional translations, or expanding to multilingual comparisons.