Define research questions - halfpintutopia/DA-deutsch-englisch-einfluss GitHub Wiki
What is the key research question?
Option 1.
Datasets
- Google Books Ngram Viewer (German dataset)
- Tracks word usage frequency in published books over centuries
- German Wikipedia Corpus
- Provides a dataset of commonly used words in modern German
- Europarl Corpus (EU Parliament Speeches in German)
- Offers a formal context to see if English influence exists in official speech
Approach
- Compare how the frequency of certain words has changed over time (e.g. "Handy" for "mobile phone" vs. original equivalents)
- Do younger generations use more English than older ones?
- What topics (tech, business, pop culture) bring in the most English influence?
Option 2.
Datasets
- DeReKo (German Reference Corpus)
- Provides historical and contemporary text samples to analyse the frequency of English loanwords
- DWDS (Digitales Wörterbuch der deutschen Sprache)
- Contains linguistic time-series data to track how certain words have entered and evolved in German
- Twitter/X (Scraped German Tweets)
- Allows a real-time, informal analysis of English usage in everyday German conversations
Approach
- Identify trends in the frequency of English words in German over time
- Compare formal (news articles) vs. informal (tweets/social media) usage
- Sentiment analysis: Are people reacting positively or negatively to the rise of Denglisch?
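The first step of this approach, identifying a frequency trend over time, could be sketched roughly as follows. The yearly counts are invented toy values rather than figures from DeReKo, DWDS, or Twitter/X, and the helper names `relative_frequency_per_million` and `least_squares_slope` are hypothetical. Normalising by corpus size matters because the number of tokens available per year usually differs.

```python
# Toy data: (year, English-loanword tokens, total tokens).
# All values are invented for illustration only.
counts = [
    (1990, 12, 10_000),
    (2000, 30, 12_000),
    (2010, 66, 11_000),
    (2020, 117, 13_000),
]


def relative_frequency_per_million(hits, total):
    """Normalise a raw count to occurrences per million tokens."""
    return hits / total * 1_000_000


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


years = [year for year, _, _ in counts]
freqs = [relative_frequency_per_million(hits, total) for _, hits, total in counts]
slope = least_squares_slope(years, freqs)

# A positive slope means the normalised loanword frequency rises over time.
print(f"Change per year: {slope:.1f} occurrences per million tokens")
```

A straight-line fit is only a first look. Loanword adoption is often not linear, so plotting the series before interpreting the slope is advisable.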
Application of Framework
- Business Understanding and Data Exploration
- Define the research question
- How much has English influenced German across different sectors or periods?
- Explore datasets, noting which variables will be useful.
- Business Understanding (From Problem to Approach)
- Explain relevance (e.g. cultural preservation vs. globalisation)
- Define possible hypotheses (e.g. English is more prevalent in younger demographics)
- Business Understanding (Datasets)
- Justify why the datasets chosen provide the best insights
- Discuss limitations (e.g. social media data may be biased toward younger users)
- Define Key Performance Indicators (KPIs)
- English Word Frequency (%) in various texts
- Ratio of English to German words in specific industries
- Sentiment Score towards Denglisch usage
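As a rough illustration of how the first two KPIs might be computed, the sketch below counts tokens against a small loanword list. The list `ENGLISH_LOANWORDS` and the function names are assumptions made for this example. A real analysis would need a curated lexicon and a decision on how to treat established loanwords and pseudo-anglicisms such as "Handy". The sentiment KPI is not covered here.

```python
import re

# Hypothetical seed list; a real study would use a curated loanword lexicon.
ENGLISH_LOANWORDS = {"meeting", "download", "cool", "team", "update"}

# Matches runs of letters (including umlauts and ß), excluding digits and underscores.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


def english_word_frequency(text, loanwords=ENGLISH_LOANWORDS):
    """Share of tokens found in the loanword list, in percent."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in loanwords)
    return 100.0 * hits / len(tokens)


def english_to_german_ratio(text, loanwords=ENGLISH_LOANWORDS):
    """Ratio of loanword tokens to all remaining tokens."""
    tokens = tokenize(text)
    english = sum(1 for token in tokens if token in loanwords)
    german = len(tokens) - english
    return english / german if german else float("inf")


sample = "Das Meeting war cool, aber das Update kommt erst morgen."
print(f"English word frequency: {english_word_frequency(sample):.1f}%")
print(f"English-to-German ratio: {english_to_german_ratio(sample):.2f}")
```

Treating every token outside the list as German is a simplification: names, numbers written as words, and other foreign words all end up in the "German" count.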
- Data Preparation, Modelling and Understanding
- Clean and preprocess text data (stopword removal, tokenisation, lemmatisation)
- Ensure a balanced dataset across different sources
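A minimal sketch of these two preparation steps, using only the Python standard library, might look like this. The stopword list is a tiny illustrative subset, and `preprocess` and `balance_by_source` are hypothetical names. Lemmatisation is left out because it needs a language-specific tool (for example an NLP library with a German model). Balancing is done here by downsampling every source to the size of the smallest one, which is one possible strategy among several.

```python
import random
import re

# Tiny illustrative stopword list (assumption); real projects use a fuller one.
GERMAN_STOPWORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "in", "zu"}


def preprocess(text, stopwords=GERMAN_STOPWORDS):
    """Lowercase, tokenise into alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [token for token in tokens if token not in stopwords]


def balance_by_source(docs_by_source, seed=0):
    """Downsample each source to the size of the smallest source."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    target = min(len(docs) for docs in docs_by_source.values())
    return {
        source: rng.sample(docs, target)
        for source, docs in docs_by_source.items()
    }


cleaned = preprocess("Das Team ist in der Stadt und plant ein Meeting.")
print(cleaned)

# Placeholder documents standing in for real corpus entries.
corpus = {
    "news": ["doc1", "doc2", "doc3", "doc4"],
    "tweets": ["tweet1", "tweet2"],
}
balanced = balance_by_source(corpus)
print({source: len(docs) for source, docs in balanced.items()})
```

Downsampling discards data from the larger sources. If that loss is a concern, weighting the sources during analysis is an alternative worth considering.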
- Modelling
- Use NLP models to analyse text (TF-IDF, word embeddings)
- Train a sentiment analysis model on Denglisch-related discussions
- Build a classification model to distinguish formal vs. informal English influence
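To make the TF-IDF step more concrete, here is a hand-rolled sketch of the plain variant, where a term's weight is its term frequency multiplied by log(N / document frequency). The function name `tf_idf` and the two toy documents are assumptions for illustration. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalisation by default, so their numbers will differ from these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute plain TF-IDF weights.

    documents: list of token lists.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy documents, already tokenised and lowercased.
docs = [
    ["meeting", "team", "update"],
    ["besprechung", "team", "aktualisierung"],
]
weights = tf_idf(docs)

# "team" occurs in both documents, so its weight is zero.
for term, weight in sorted(weights[0].items()):
    print(f"{term}: {weight:.3f}")
```

Terms shared by every document get a weight of zero, which is why TF-IDF tends to surface the vocabulary that distinguishes one source from another, such as English loanwords that appear in tweets but not in news text.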
- Storytelling and Final Presentation
- SPSN Framework (Simple, Powerful, Structured, Narrative)
- Craft a compelling story with a mix of visualisations and real-world examples
- Use Tableau or Power BI to show word trends over time
- Build an interactive dashboard comparing different industries