Learning Notes for ShopEasy CBA - Govarthan-Boopalan/Customer_Behaviour_Analysis GitHub Wiki

This wiki provides an in-depth look at the design and implementation of the ShopEasy project. It covers the overall building blocks, concepts, and best practices that are applied throughout the scripts for data ingestion, transformation, analysis, and reporting.


Table of Contents


Overview

The ShopEasy project consists of several Python scripts that cover the entire ETL (Extract, Transform, Load) process and analysis workflow. These scripts:

  • Ingest CSV data, clean and pre-process it.
  • Create and populate a MySQL database with multiple tables.
  • Run complex SQL queries to analyze customer behavior, marketing effectiveness, and customer experience.
  • Generate professional PDF reports using ReportLab.
  • Incorporate sentiment analysis on customer reviews via TextBlob.
  • Provide dynamic business recommendations based on live data.

Building Blocks

Data Ingestion & Cleaning

  • CSV Loading:
    The scripts use Pandas to read CSV files into DataFrames (e.g., pd.read_csv("path/to/file.csv")).

  • Data Cleaning:
    Common operations include handling missing values (using .fillna()), converting text (e.g., .str.lower()), removing duplicates (.drop_duplicates()), and splitting columns into separate fields (e.g., .str.split()).

  • Example:

    customer_journeyDf = pd.read_csv("../data/customer_journey.csv")
    customer_journeyDf['Duration'] = customer_journeyDf['Duration'].fillna(0)
    customer_journeyDf['Stage'] = customer_journeyDf['Stage'].str.lower()
    customer_journeyDf = customer_journeyDf.drop_duplicates()
    

Database Connectivity & SQL Operations

  • Database Connection:
    The project uses both SQLAlchemy and mysql.connector to connect to a MySQL database.

  • Creating & Populating Tables:
    SQL commands are executed to create tables if they don’t exist, and Pandas’ to_sql() is used for bulk data insertion.

  • Example:

    engine = create_engine("mysql+mysqlconnector://root:456123@localhost")
    with engine.connect() as conn:
        conn.execute(text("CREATE DATABASE IF NOT EXISTS ShopEasy"))
    engine = create_engine("mysql+mysqlconnector://root:456123@localhost/ShopEasy")
    

Data Analysis & Transformation

  • SQL Queries for Analytics:
    Scripts execute advanced SQL queries using JOINs, aggregate functions (e.g., COUNT, AVG), and conditional logic (CASE statements) to analyze customer trends, segment performance, and purchase drivers.

  • Data Transformation with Pandas:
    The queries are executed using pd.read_sql() to convert SQL results directly into DataFrames for further manipulation or visualization.

  • Example:

    product_query = """
    SELECT p.ProductName, COUNT(cj.JourneyID) AS Total_Interactions
    FROM customer_journey cj
    JOIN products p ON cj.ProductID = p.ProductID
    GROUP BY p.ProductName
    ORDER BY Total_Interactions DESC
    LIMIT 5;
    """
    top_products = pd.read_sql(product_query, engine)
    

Report Generation & Visualization

  • PDF Report Creation:
    The project uses ReportLab to generate PDF reports that summarize analysis and display tables.

  • Styling Tables and Text:
    Custom styles are created using ParagraphStyle and TableStyle to control fonts, colors, alignment, and layout.

  • Example:

    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Spacer
    from reportlab.lib.pagesizes import letter
    doc = SimpleDocTemplate("Report.pdf", pagesize=letter)
    story = []
    story.append(Paragraph("ShopEasy Report", style))
    doc.build(story)
    

Core Concepts

Modular Programming & Functions

  • Separation of Concerns:
    Each script is divided into functions that perform a single task (e.g., data ingestion, analysis, report generation). This improves readability, maintainability, and testability.

  • Reusable Functions:
    Functions like insert_data(), analyze_customer_trends(), and df_to_table() are designed to be reused across scripts.

Exception Handling & Logging

  • Try-Except Blocks:
    Many scripts incorporate try-except blocks to handle errors gracefully, ensuring that the process can log errors without crashing.

  • Error Reporting:
    In some scripts, errors are printed to the console for debugging. In production, these might be logged to a file.

  • Example:

    try:
        # Critical code block
        insert_data(df, table_name, engine)
    except Exception as e:
        print(f"Error inserting data into {table_name}: {e}")
    

Query Building & Data Aggregation

  • Dynamic SQL Queries:
    The scripts build complex queries that aggregate data, perform conditional calculations, and group data to extract insights.

  • CTE (Common Table Expressions) and Subqueries:
    Some queries use CTEs (WITH clauses) to organize logic and make the SQL more readable.


Best Practices

  • Code Readability:
    Use meaningful variable and function names, and add comments to explain non-obvious logic.

  • Modular Structure:
    Break the code into separate functions or modules. This enhances reusability and eases maintenance.

  • Error Handling:
    Use try-except blocks to catch errors and provide useful debug messages.

  • Data Validation:
    Validate data after loading and before inserting into the database (e.g., using .clip(), .fillna(), .drop_duplicates()).

  • Security:
    When constructing SQL queries, especially with user inputs, consider parameterized queries to prevent SQL injection (not extensively covered here but important in production).

  • Styling and Formatting:
    Consistently style PDF reports using ReportLab styles to maintain a professional look.

  • Documentation:
    Each function includes docstrings to explain its purpose, parameters, and return values.


Appendix: Syntax & Function Explanations

Pandas DataFrame Methods

  • pd.read_csv(filepath):
    Loads CSV data into a DataFrame.

    • Example:
      df = pd.read_csv("data.csv")
      
  • .fillna(value):
    Replaces missing values with the specified value.

    • Example:
      df['column'] = df['column'].fillna(0)
      
  • .drop_duplicates():
    Removes duplicate rows.

    • Example:
      df = df.drop_duplicates()
      
  • .str.lower():
    Converts string values in a Series to lowercase.

    • Example:
      df['text'] = df['text'].str.lower()
      

SQLAlchemy & mysql.connector

  • create_engine(connection_string):
    Creates a connection engine to the database.

    • Example:
      engine = create_engine("mysql+mysqlconnector://user:password@localhost/dbname")
      
  • conn.execute(text(query)):
    Executes an SQL query.

    • Example:
      with engine.connect() as conn:
          conn.execute(text("CREATE DATABASE IF NOT EXISTS ShopEasy"))
      
  • pd.read_sql(query, engine):
    Executes a query and returns a DataFrame.

    • Example:
      df = pd.read_sql("SELECT * FROM customers", engine)
      

ReportLab Components

  • SimpleDocTemplate(output_path, pagesize=letter):
    Initializes a PDF document.

    • Example:
      doc = SimpleDocTemplate("report.pdf", pagesize=letter)
      
  • Paragraph(text, style):
    Creates a styled paragraph.

    • Example:
      from reportlab.lib.styles import getSampleStyleSheet
      styles = getSampleStyleSheet()
      p = Paragraph("Hello World", styles["Normal"])
      
  • Table(data, colWidths):
    Creates a table with the given data.

    • Example:
      table = Table(["Header1", "Header2"], ["Row1Col1", "Row1Col2"](/Govarthan-Boopalan/Customer_Behaviour_Analysis/wiki/"Header1",-"Header2"],-["Row1Col1",-"Row1Col2"))
      
  • TableStyle([...]):
    Applies style to a table.

    • Example:
      style = TableStyle([('BACKGROUND', (0, 0), (-1, 0), colors.grey)])
      table.setStyle(style)
      

TextBlob for Sentiment Analysis

  • TextBlob(text).sentiment:
    Analyzes sentiment of the given text.
    • Example:
      from textblob import TextBlob
      sentiment = TextBlob("This product is great!").sentiment
      

General Python Techniques

  • try-except Blocks:
    Used to handle exceptions.

    • Example:
      try:
          # code that may raise an exception
          pass
      except Exception as e:
          print(f"An error occurred: {e}")
      
  • Modular Functions with Docstrings:
    Helps document code functionality.

    • Example:
      def insert_data(df, table_name, engine):
          """
          Insert a DataFrame into a specified table in the database.
          
          Parameters:
            df (DataFrame): The data to insert.
            table_name (str): The target table name.
            engine: SQLAlchemy engine object.
          """
          df.to_sql(table_name, con=engine, if_exists="replace", index=False)
      

Conclusion

This comprehensive GitHub Wiki note has covered the essential building blocks, concepts, and best practices used throughout the ShopEasy scripts. By using modular code, robust error handling, well-crafted SQL queries, and professional report generation techniques, the project exemplifies modern data engineering practices. The appendix provides a handy reference to syntax and function usage for further learning.

Feel free to refer back to this wiki whenever you need to understand or extend the project!


This concludes our comprehensive learning notes. Happy coding and analyzing!