Define research questions - halfpintutopia/DA-deutsch-englisch-einfluss GitHub Wiki

What is the key research question?

Option 1.

Datasets

  • Google Books Ngram Viewer (German dataset)
    • Tracks word usage frequency in published books over centuries
  • German Wikipedia Corpus
    • Provides a dataset of commonly used words in modern German
  • Europarl Corpus (EU Parliament Speeches in German)
    • Offers a formal context to see if English influence exists in official speech

Approach

  • Compare how the usage frequency of certain German words has changed over time (e.g. "Handy" for "mobile phone" vs. original equivalents)
  • Do younger generations use more English than older ones?
  • Which topics (tech, business, pop culture) bring in the most English influence?
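The first comparison above could be sketched as follows. This is a minimal example, assuming yearly counts for a word and yearly corpus totals have already been extracted (e.g. from an Ngram export). The function name `relative_frequency` and all numbers are invented for illustration and are not real corpus values.

```python
def relative_frequency(counts, totals):
    """Return {year: occurrences per million tokens} for one word.

    counts: {year: occurrences of the word in that year}
    totals: {year: total number of tokens in the corpus for that year}
    """
    return {
        year: counts.get(year, 0) * 1_000_000 / totals[year]
        for year in sorted(totals)
        if totals[year] > 0  # skip years with no corpus data
    }


# Invented toy numbers, NOT taken from any real dataset.
handy_counts = {1990: 5, 2000: 120, 2010: 300}
corpus_totals = {1990: 1_000_000, 2000: 2_000_000, 2010: 2_500_000}

per_million = relative_frequency(handy_counts, corpus_totals)
for year, value in per_million.items():
    print(year, value)
```

Normalising by the yearly total matters because the number of published books differs a lot between years, so raw counts alone would mostly reflect corpus size.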

Option 2.

Datasets

  • DeReKo (German Reference Corpus)
    • Provides historical and contemporary text samples to analyse the frequency of English loanwords
  • DWDS (Digitales Wörterbuch der deutschen Sprache)
    • Contains linguistic time-series data to track how certain words have entered and evolved in German
  • Twitter/X (Scraped German Tweets)
    • Allows a real-time, informal analysis of English usage in everyday German conversations

Approach

  • Identify trends in the frequency of English words in German over time
  • Compare formal (news articles) vs. informal (tweets/social media) usage
  • Sentiment analysis: Are people reacting positively or negatively to the rise of Denglisch?
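The formal vs. informal comparison above might start from something like the sketch below. It assumes a hand-curated set of English loanwords. `ENGLISH_LOANWORDS`, `english_share` and the example sentences are hypothetical placeholders, not part of any of the listed corpora.

```python
import re

# Hypothetical seed list; a real study would need a curated loanword lexicon.
ENGLISH_LOANWORDS = {"meeting", "team", "update", "cool", "job"}

# Lowercase letters including German umlauts and ß.
TOKEN_RE = re.compile(r"[a-zäöüß]+")


def english_share(texts):
    """Percentage of tokens in `texts` that appear in ENGLISH_LOANWORDS."""
    tokens = [t for text in texts for t in TOKEN_RE.findall(text.lower())]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in ENGLISH_LOANWORDS)
    return 100.0 * hits / len(tokens)


# Invented example sentences standing in for news text and tweets.
formal_texts = ["Die Sitzung beginnt um neun Uhr."]
informal_texts = ["Das Meeting war echt cool."]

print("formal:", english_share(formal_texts))
print("informal:", english_share(informal_texts))
```

A plain word list will miss inflected forms and will miscount words that exist in both languages, so the result is only a rough first indicator.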

Application of Framework

  1. Business Understanding and Data Exploration
    • Define the research question
      • How much has English influenced German across different sectors or periods?
    • Explore datasets, noting which variables will be useful.
  2. Business Understanding (From Problem to Approach)
    • Explain relevance (e.g. cultural preservation vs. globalisation)
    • Define possible hypotheses (e.g. English is more prevalent in younger demographics)
  3. Business Understanding (Datasets)
    • Justify why the datasets chosen provide the best insights
    • Discuss limitations (e.g. social media data may be biased toward younger users)
  4. Define Key Performance Indicators (KPIs)
    • English Word Frequency (%) in various texts
    • Ratio of English to German words in specific industries
    • Sentiment Score towards Denglisch usage
  5. (-7) Data Preparation, Modelling and Understanding
    • Clean and preprocess text data (stopword removal, tokenisation, lemmatisation)
    • Ensure a balanced dataset across different sources
  6. Modelling
    • Use NLP models to analyse text (TF-IDF, word embeddings)
    • Train a sentiment analysis model on Denglisch-related discussions
    • Build a classification model to distinguish formal vs. informal English influence
  7. (-11) Storytelling and Final Presentation
    • SPSN Framework (Simple, Powerful, Structured, Narrative)
      • Craft a compelling story with a mix of visualisations and real-world examples
    • Use Tableau or Power BI to show word trends over time
    • Build an interactive dashboard comparing different industries
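
The data preparation and TF-IDF steps listed above can be illustrated with a small standard-library sketch. The stopword list is a tiny invented sample, lemmatisation is left out, and `tf_idf` uses the plain textbook weighting (term frequency times log of inverse document frequency), which differs from the smoothed variants found in some NLP libraries.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real project would use a fuller German list.
STOPWORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "im"}


def preprocess(text):
    """Lowercase, tokenise on letters, and drop stopwords (no lemmatisation)."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Number of documents each term occurs in.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented example sentences.
docs = [preprocess("Das Meeting ist im Büro"), preprocess("Das Büro ist neu")]
weights = tf_idf(docs)
print(docs)
print(weights)
```

Terms that occur in every document (here "büro") get a weight of zero, which is why TF-IDF tends to highlight words that are distinctive for a single source.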