Evaluating Large Language Model Responses to Spelling Errors - amosproj/amos2025ss04-ai-driven-testing GitHub Wiki

Objective

This experiment investigates how spelling errors in prompts affect the quality and interpretability of responses generated by various large language models. The primary goal is to determine whether preprocessing modules for spell correction are essential when interacting with these models.

Methodology

Test Prompts

Three categories of prompts were created:

Severely corrupted syntax:
"wt tst stratgy 4 rest stuff bad inpt wrks?"
Moderately misspelled query:
"Can yu rite a JUnit test four a methd that calcualtes the factoral of a nmber?"
Correct syntax and spelling:
"I'm getting a NullPointerException when trying to access a property of a user object. How can I fix this?"

Language Models Used

Three language models were tested:

mistral:7b-instruct-v0.3-q3_K_M
qwen2.5-coder:3b-instruct-q8_0
gemma3:4b-it-q4_K_M
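The runs form a simple prompt-by-model matrix (3 x 3 = 9 generations). The sketch below is a hypothetical harness, not the project's actual test code. It assumes the models were served by a local Ollama instance, which the model tag format suggests but this page does not state, and uses Ollama's `/api/generate` endpoint on the default port 11434. `PromptMatrix` and its helpers are illustrative names. The sketch only builds the nine requests and does not send them.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

public class PromptMatrix {

    // The three test prompts listed above.
    static final List<String> PROMPTS = List.of(
        "wt tst stratgy 4 rest stuff bad inpt wrks?",
        "Can yu rite a JUnit test four a methd that calcualtes the factoral of a nmber?",
        "I'm getting a NullPointerException when trying to access a property of a user object. How can I fix this?"
    );

    // The three model tags listed above.
    static final List<String> MODELS = List.of(
        "mistral:7b-instruct-v0.3-q3_K_M",
        "qwen2.5-coder:3b-instruct-q8_0",
        "gemma3:4b-it-q4_K_M"
    );

    // Assumption: a local Ollama server on its default port.
    static final URI GENERATE_URI = URI.create("http://localhost:11434/api/generate");

    // Minimal JSON string escaping (backslashes and quotes only; enough for these prompts).
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Non-streaming request body for one (model, prompt) pair.
    static String requestBody(String model, String prompt) {
        return "{\"model\":" + jsonString(model)
            + ",\"prompt\":" + jsonString(prompt)
            + ",\"stream\":false}";
    }

    // One POST request per model/prompt combination.
    static List<HttpRequest> buildRequests() {
        List<HttpRequest> requests = new ArrayList<>();
        for (String model : MODELS) {
            for (String prompt : PROMPTS) {
                requests.add(HttpRequest.newBuilder(GENERATE_URI)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(requestBody(model, prompt)))
                    .build());
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        System.out.println(buildRequests().size() + " requests prepared");
    }
}
```

To run the matrix, each request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the generated text read from the `response` field of the returned JSON.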

Observations

General Error Tolerance

All three models correctly interpreted prompts with both light and heavy spelling errors, as long as the context and intent of the question were discernible. For the most severely corrupted prompt, the models either restated their interpretation before answering (for example, "It seems like you're asking about a strategy for handling bad input...") or asked for more details. This shows robustness against noise in the input.

Notable Model Behavior

The gemma3:4b-it-q4_K_M model performed particularly well on coding tasks, even when the prompts contained spelling errors. Its code responses also covered edge cases such as zero and negative inputs.
However, the same model showed limitations on non-programming topics. For instance, when presented with the malformed prompt:
"Pls prvv a smree of czz & fx of klim8 chng."
(intended meaning: "Please provide a summary of the causes and effects of climate change.")
it responded with:
"I am sorry, I can not help with that request."

Conclusion

Spelling errors had a surprisingly small negative impact on the performance of the tested models, particularly for technical content such as programming questions. However, edge cases, such as abstract or vague queries outside a model's fine-tuning domain, can reduce performance. Based on these results, a spell correction module is not strictly necessary, but it may improve reliability in general-purpose applications.
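If a spell correction module were added anyway, one of the simplest options is dictionary lookup plus edit distance. The sketch below is a hypothetical `SpellCorrector`, not part of the project. Each alphabetic token that is not in a small, hand-picked dictionary is replaced with the closest dictionary word by Levenshtein distance, up to a configurable maximum. Tokens with no close match, as well as punctuation, are left unchanged.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpellCorrector {

    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    private final List<String> dictionary; // lowercase words (hypothetical, task-specific)
    private final int maxDistance;         // largest edit distance accepted as a correction

    public SpellCorrector(List<String> dictionary, int maxDistance) {
        this.dictionary = dictionary;
        this.maxDistance = maxDistance;
    }

    // Dynamic-programming Levenshtein distance (insertions, deletions, substitutions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // previous row for the next iteration
        }
        return prev[b.length()];
    }

    // Known words keep their original casing; unknown words map to the nearest dictionary entry.
    String correctWord(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        if (dictionary.contains(lower)) return word;
        String best = word;
        int bestDistance = maxDistance + 1;
        for (String candidate : dictionary) {
            int d = levenshtein(lower, candidate);
            if (d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return best;
    }

    // Corrects every alphabetic token; spaces and punctuation are copied through unchanged.
    public String correct(String prompt) {
        Matcher m = WORD.matcher(prompt);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(correctWord(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        SpellCorrector corrector = new SpellCorrector(
            List.of("can", "you", "write", "a", "junit", "test", "for", "method",
                    "that", "calculates", "the", "factorial", "of", "number"), 2);
        System.out.println(corrector.correct(
            "Can yu rite a JUnit test four a methd that calcualtes the factoral of a nmber?"));
    }
}
```

With this dictionary and a maximum distance of 2, Prompt 2 becomes "Can you write a JUnit test for a method that calculates the factorial of a number?". Heavily abbreviated tokens such as "klim8" are more than two edits away from "climate", so this approach would not rescue prompts like the climate-change example. That kind of input would need a more capable rewriter.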

Prompts and Model Responses

Prompt 1: Severely Corrupted Syntax

Input:
wt tst stratgy 4 rest stuff bad inpt wrks?

Responses:

mistral:7b-instruct-v0.3-q3_K_M
It seems like you're asking about a strategy for handling bad input in a program or system. Here's a simple strategy that might help:

Input Validation: Before processing the input, check if it meets certain criteria. This could include checking the format, length, range, or type of the input.
Error Messages: If the input does not meet the criteria, return an error message to the user explaining what is wrong and how they can correct it.
Default Values: For optional inputs, provide default values that will keep your system functional if no input is provided.
Robust Handling: Implement robust handling for unexpected situations. This could include catching exceptions in programming languages or using error-handling middleware in web development.
Gradual Improvement: Continuously improve your error handling and validation as you identify more edge cases or potential issues.

qwen2.5-coder:3b-instruct-q8_0
Yes, it seems like you might be referring to a specific strategy or framework for RESTful web services, but the input "bad inpt wrks?" is somewhat vague and doesn't provide enough context to give a precise answer. However, I can offer some general guidance on common issues related to RESTful API design:

Bad Input Handling: This often means improper validation of incoming requests. If your REST service does not handle invalid input gracefully, it could lead to errors or unexpected behavior.
- Example: A service that accepts JSON data might require certain fields but incorrectly processes missing or incorrect values.
Web Work Doesn't Work as Expected: If you're experiencing issues with the actual functionality of the web services after implementing your strategy, there are several possibilities:
- Network Issues: Ensure there are no network problems between clients and the server.
- Service Degradation: Check if the underlying services or resources that the REST service depends on are functioning correctly.
- Configuration Mismatch: Make sure that all configurations (like API keys, endpoints, etc.) match those expected by the client applications.

To address these issues, consider the following steps:

Input Validation: Implement robust validation of incoming requests to ensure they meet the specified requirements.
Error Handling: Develop clear and consistent error responses to handle any exceptions or invalid inputs gracefully.
Network Monitoring: Use tools like Wireshark or network monitoring software to trace and identify any issues with network communication.
Resource Health Checks: Regularly monitor the health of your underlying resources to ensure they are functioning as expected.

If you can provide more specific details about what is going wrong, I might be able to offer more targeted advice.

gemma3:4b-it-q4_K_M
Okay, let's break down strategies for handling bad input in a REST API, and specifically whether a "wait and try" strategy is a good fit.

The Problem: Bad Input in REST APIs

Bad input is a huge problem in RESTful systems. It can manifest in many forms:

Invalid Data Types: Sending a string where a number is expected.
Missing Required Fields: A JSON payload omitting a mandatory field.
Invalid Values: Sending a negative number when a positive is required, or a string that doesn't match a defined pattern.
Malformed Data: JSON that's not properly formatted.
Unexpected Data: Sending data that the API doesn't handle (e.g., a field that's no longer supported).

Strategies for Handling Bad Input

Here's a breakdown of common approaches, ranked generally from most robust to least:

Input Validation (Client-Side & Server-Side):
- Client-Side Validation: Use JavaScript (or similar) to perform basic checks before sending the request to the server. This provides immediate feedback to the user and reduces unnecessary server load. However, client-side validation is easily bypassed, so it's never a substitute for server-side validation.
- Server-Side Validation: This is crucial. Validate the data every time the API receives a request. Use a validation library or framework in your language (e.g., Joi, Volley, Pydantic, etc.) to enforce constraints on your data. This is your primary defense.
HTTP Status Codes: Use appropriate HTTP status codes to signal the outcome of the request:
- 400 Bad Request: The most common for invalid data. This clearly indicates to the client that the request couldn't be processed due to incorrect data.
- 400 OK: If validation is successful but the operation couldn't complete (e.g., resource not found).
- 422 Unprocessable Entity: More specific than 400; often used when validation fails due to semantic errors (e.g., a constraint violation).
- 404 Not Found: If the resource doesn't exist.
Error Responses (JSON Format): When returning an error, provide a JSON response body with detailed information about the error. This is essential for debugging and user feedback. Example:
```
{
  "error": "Invalid email address",
  "field": "email",
  "message": "The email address must be a valid format."
}
```
Rate Limiting: Protect your API from abuse by implementing rate limiting. This can prevent attackers from flooding your API with malformed requests.
"Wait and Try" Strategy (Retry Logic): This is a nuanced approach and should be used cautiously.

Does "Wait and Try" Work?

The idea is: If you receive a 400 or 422 error, the client retries the request a limited number of times, automatically.

When it can be useful:
- Transient Network Errors: If the error is occasionally due to a temporary network glitch, a retry might succeed.
- Simple Errors: If the error is consistently the same (e.g., "Invalid email format"), a retry might fix it if the client can automatically correct the data.
Why it's often a bad idea:
- Masks Underlying Problems: A 400 or 422 usually indicates a fundamental problem with the data. Retrying doesn't fix the problem; it just delays the inevitable.
- Creates a Loop: If the error is truly persistent (e.g., an invalid field name), retrying will just keep generating the error.
- Can Overload the API: A malicious user could abuse retry logic to bombard your API with requests.
- Difficult to Implement Correctly: Properly implementing retry logic with exponential backoff and jitter is complex.

Recommended Approach

Strong Server-Side Validation: This is always your first line of defense.
Informative Error Responses: Provide clear, actionable error messages.
Controlled Retries (with Caution): If you choose to implement retries, use them only for transient network errors. Implement exponential backoff and jitter to avoid overwhelming the API. Don't rely on retries to fix fundamentally invalid data.
Rate Limiting: Protect your API.

Example (Conceptual - Node.js/Express)

```javascript
app.post('/users', async (req, res) => {
  try {
    const newUser = await UserSchema.create(req.body); // Assuming a schema
    res.status(201).json(newUser);
  } catch (error) {
    // Server-side validation failed
    const errorMessage = error.message;
    console.error("Validation Error:", errorMessage);
    res.status(400).json({ error: errorMessage });
  }
});
```

In short, while "wait and try" can be part of a more sophisticated strategy, it's not a fundamental solution to bad input. Focus on robust validation and informative error handling. Let the client know why the request failed.

Prompt 2: Moderately Misspelled Query

Input:
Can yu rite a JUnit test four a methd that calcualtes the factoral of a nmber?

Responses:

mistral:7b-instruct-v0.3-q3_K_M
"Certainly! Here's a JUnit test for a method that calculates the factorial of a number..."
qwen2.5-coder:3b-instruct-q8_0
"Sure, I can help with that. Below is an example JUnit test for a factorial calculation method..."
gemma3:4b-it-q4_K_M
"Here is a JUnit test example for a method calculating the factorial of a number, including edge cases like 0 and negative inputs..."

Prompt 3: Correct Syntax and Spelling

Input:
I'm getting a NullPointerException when trying to access a property of a user object. How can I fix this?

Responses:

mistral:7b-instruct-v0.3-q3_K_M
To help you better, let me first provide some general guidance on handling NullPointerException in Java, then I'll give suggestions for solving your issue specifically.

Check for null values: Before accessing an object property, always check if the object is not null.

```java
if (user != null) {
    String userProperty = user.getProperty(); // Safe to call
} else {
    System.err.println("User object is null");
}
```

Initialization: Make sure you initialize your objects before accessing their properties.

```java
User user = new User();
user.setProperty(/* Some value */); // Now it's safe to call getProperty() on the user object
```

Regarding your issue, without knowing the specific context or code you are working with, I can only give a general suggestion:

Check if user is initialized properly and not null when the NullPointerException occurs.
If the user object is part of a collection (e.g., List, Map), ensure that you're using an index or key that exists within the collection before accessing its properties.
If user is being retrieved from a database, make sure the user exists in the database and the query is correctly returning the user object.
Review your code for any other potential issues that might be causing a null reference, such as unexpected return values or method calls.

qwen2.5-coder:3b-instruct-q8_0
To resolve the NullPointerException you're encountering when trying to access a property of a user object, follow these steps:

Check Nullity: Ensure that the user object is not null before accessing any of its properties. You can do this by adding a null check at the beginning of your code where the exception occurs:

```java
if (user != null) {
    // Safe to access user properties
    String userName = user.getName();
} else {
    System.out.println("User is null");
    // Handle the case where user is null
}
```

Debugging: Use debugging tools to step through your code and identify exactly where the user object is being set or initialized. This will help you determine why it might be null.
Initialization: Ensure that the user object is properly instantiated before it is used. If the user object is part of a method parameter, check if it is passed correctly from the caller.
Review Code Logic: Review the logic that initializes or sets the user object to ensure there are no issues in how it is handled throughout your codebase.
Null-Safe Property Access: If you're using libraries like Lombok for automatic getters and setters, make sure to use null-safe property access methods if available. For example:
```
String userName = user.getName();
```
Alternatively, you can use the Optional class to handle null values gracefully:
```
Optional<String> userName = Optional.ofNullable(user).map(User::getName);
```

Logging: Add logging statements before accessing properties to help trace where the null value is coming from:

```java
if (user != null) {
    System.out.println("User name: " + user.getName());
} else {
    System.out.println("User is null");
}
```

Refactor Code: If the NullPointerException persists, consider refactoring your code to avoid passing null values where possible or to handle them more robustly.

By following these steps, you should be able to identify and fix the cause of the NullPointerException.

gemma3:4b-it-q4_K_M
Okay, let's troubleshoot this NullPointerException when accessing a property of a user object. This is a very common problem, and here’s a breakdown of how to diagnose and fix it, along with best practices.

Understanding the Problem

A NullPointerException occurs when you try to access a member (like a property or method) of an object reference that is null. In your case, it means the user object you're working with is null at the point where you're trying to access its properties.

Steps to Diagnose and Fix

Identify the Exact Line:
- The stack trace generated by the exception is crucial. It tells you the precise line of code where the error occurred. Make sure you're looking at the root cause, not just the immediate line of code.
- The stack trace will show you the sequence of method calls that led to the error.
Check Where the user Object is Initialized/Populated:
- Is it Properly Assigned? The most frequent reason is that the user object wasn't properly assigned a value before you tried to access its properties. Trace back where the user object is created or retrieved.
- Data Source Issues: If the user object is populated from a database, API, or other external source, verify that the data retrieval was successful and that the user object was correctly populated from the result. Check for errors like missing fields, incorrect data types, or empty responses.
- Initialization Logic: Review the code that creates the user object. Are there any conditions under which the object might not be initialized?

Null Checks:

Add Null Checks: Immediately before accessing any property of the user object, insert a null check:

```java
if (user != null) {
    String userName = user.getName(); // Access the property
    // ... other operations using userName ...
} else {
    // Handle the null case.  This is *critical*.
    System.err.println("User object is null.  Cannot access properties.");
    // Optionally: log the error, provide a default value, or take other appropriate action.
}
```

Defensive Programming:
- Default Values: Instead of just printing an error message, consider providing a default value if user is null. This can prevent unexpected NullPointerExceptions further down the line. However, only do this if the default value makes sense in the context of your application.
- Logging: Log the fact that the user object was null at the point of the error. This helps you understand why it's null and can be invaluable for debugging.
Debugging Techniques
- Debugger: Use a debugger to step through your code. Set a breakpoint right before the line that's throwing the NullPointerException. Inspect the value of the user object at that point. This allows you to see exactly when it becomes null.
- Print Statements: As a temporary measure, add print statements to check the value of user just before the line that causes the error.

Example Scenario

Let's say your code looks like this:

```java
public class User {
    private String name;

    public User(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

public class Example {
    public static void main(String[] args) {
        User user = null; //Potentially problematic initialization

        // Some code that might (or might not) populate user
        if (true) { // Simulate a scenario where user is not initialized
            user = new User("Alice");
        }

        String userName = user.getName(); // This might throw a NullPointerException
        System.out.println(userName);
    }
}
```

Fix:

Add a null check:

```java
public class User {
    private String name;

    public User(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

public class Example {
    public static void main(String[] args) {
        User user = null;

        if (user != null) {
            String userName = user.getName();
            System.out.println(userName);
        } else {
            System.out.println("User object is null.");
        }
    }
}
```

Important Considerations

Defensive Design: Aim to write code that is resilient to null values. Null checks are a fundamental part of this.
Testing: Write unit tests that specifically test cases where the user object might be null.

To give you more tailored advice, could you provide the following:

The relevant snippet of code where you're accessing the user property.
The complete stack trace of the NullPointerException.
How the user object is initialized (where is it created and what values are assigned to it?)

Prompt 2 (Full Responses): Moderately Misspelled Query

Input:
Can yu rite a JUnit test four a methd that calcualtes the factoral of a nmber?

Responses:

mistral:7b-instruct-v0.3-q3_K_M
Sure! Here's an example of how you might write a JUnit test for a method that calculates the factorial of a number using Java:

```java
import org.junit.Test;
import static org.junit.Assert.*;

public class FactorialTest {

    private Factorial factorial = new Factorial();

    @Test
    public void testFactorial() {
        assertEquals(1, factorial.factorial(0));
        assertEquals(1, factorial.factorial(1));
        assertEquals(2, factorial.factorial(2));
        assertEquals(6, factorial.factorial(3));
        assertEquals(24, factorial.factorial(4));
        assertEquals(120, factorial.factorial(5));
        // You can add more test cases for larger numbers as well
    }
}

class Factorial {
    public int factorial(int n) {
        if (n <= 1) {
            return 1;
        }
        return n * factorial(n - 1);
    }
}
```

In this example, we have a FactorialTest class that tests the factorial() method in the Factorial class. We include a series of test cases to check that the method works correctly for small inputs (0, 1, 2, 3, 4, and 5). The assertEquals() method from JUnit is used to verify that the output of the factorial() method matches the expected result.

The Factorial class itself contains a simple recursive implementation of the factorial function.

qwen2.5-coder:3b-instruct-q8_0
Certainly! Below is an example of a JUnit 5 test for a method that calculates the factorial of a number. This example includes both a simple test and a more comprehensive test that covers various edge cases.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

public class FactorialCalculatorTest {

    // Simple test case
    @Test
    public void testFactorialOfZero() {
        int number = 0;
        int expectedResult = 1; // Factorial of 0 is defined as 1
        assertEquals(expectedResult, calculateFactorial(number));
    }

    // Test case for a small positive number
    @Test
    public void testFactorialOfFive() {
        int number = 5;
        int expectedResult = 120; // 5! = 5 * 4 * 3 * 2 * 1 = 120
        assertEquals(expectedResult, calculateFactorial(number));
    }

    // Test case for a larger positive number
    @Test
    public void testFactorialOfTen() {
        int number = 10;
        int expectedResult = 3628800; // 10! = 10 * 9 * 8 * ... * 1 = 3628800
        assertEquals(expectedResult, calculateFactorial(number));
    }

    // Test case for a negative number (should throw an exception)
    @Test(expected = IllegalArgumentException.class)
    public void testFactorialOfNegativeNumber() {
        int number = -5;
        calculateFactorial(number); // This should throw an exception
    }

    // Helper method to calculate factorial using recursion
    private static long calculateFactorial(int number) {
        if (number < 0) {
            throw new IllegalArgumentException("Factorial is not defined for negative numbers.");
        }
        return number == 0 ? 1 : number * calculateFactorial(number - 1);
    }
}
```

Explanation:

Test Cases: The test cases cover different scenarios, including the factorial of zero, a small positive number, a larger positive number, and an invalid (negative) input.
Helper Method: A helper method calculateFactorial is used to compute the factorial. It includes a check for negative numbers and throws an exception if called with such a value.
JUnit 5 Annotations:
- @Test: Indicates that the annotated method is a test case.
- @ExpectedException: Specifies the expected exception type for a test method.

This setup should give you a robust set of tests to verify the correctness of your factorial calculation function.

gemma3:4b-it-q4_K_M

```java
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.*;

class FactorialCalculatorTest {

    @Test
    void factorialOfZeroIsOne() {
        FactorialCalculator calculator = new FactorialCalculator();
        assertEquals(1, calculator.calculateFactorial(0));
    }

    @Test
    void factorialOfPositiveNumber() {
        FactorialCalculator calculator = new FactorialCalculator();
        assertEquals(120, calculator.calculateFactorial(5));
    }

    @Test
    void factorialOfNegativeNumberThrowsException() {
        FactorialCalculator calculator = new FactorialCalculator();
        assertThrows(IllegalArgumentException.class, () -> calculator.calculateFactorial(-1));
    }

    @Test
    void factorialOfOneIsOne() {
        FactorialCalculator calculator = new FactorialCalculator();
        assertEquals(1, calculator.calculateFactorial(1));
    }
}
```

Explanation:

Import Statements:
- org.junit.jupiter.api.Test: This imports the @Test annotation, which marks a method as a test case.
- org.junit.jupiter.api.Assertions.*: This imports various assertion methods like assertEquals and assertThrows.
Test Class:
- FactorialCalculatorTest: This is the test class that contains the test methods. It's common to name test classes something descriptive related to the class being tested.
Test Methods:
- factorialOfZeroIsOne():
  - Tests the factorial of 0, which should be 1.
  - Uses assertEquals(1, calculator.calculateFactorial(0)); to assert that the result of calculateFactorial(0) is equal to 1.
- factorialOfPositiveNumber():
  - Tests the factorial of a positive number (5 in this case).
  - Uses assertEquals(120, calculator.calculateFactorial(5)); to verify that the result is 120.
- factorialOfNegativeNumberThrowsException():
  - Tests that a negative input throws an IllegalArgumentException. This is crucial for error handling.
  - Uses assertThrows() to check if the specified exception type is thrown when calling calculateFactorial(-1).
- factorialOfOneIsOne():
  - Tests the factorial of 1, which should be 1.
  - Uses assertEquals(1, calculator.calculateFactorial(1));

Assumptions:

I've assumed you have a FactorialCalculator class with a calculateFactorial method. Here's a possible implementation of FactorialCalculator:

```java
class FactorialCalculator {
    public int calculateFactorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("Factorial is not defined for negative numbers.");
        }
        if (n == 0) {
            return 1;
        }

        int factorial = 1;
        for (int i = 1; i <= n; i++) {
            factorial *= i;
        }
        return factorial;
    }
}
```

How to use this test:

Compile both the FactorialCalculator class and the test class (FactorialCalculatorTest).
Run the JUnit test runner, which will execute the test methods. The output will indicate whether the tests passed or failed.

Related Work: Impact of Typos on LLM Task Performance

The llm-typos project presents a collection of experiments studying how typographical errors affect large language models' task performance. It investigates common misspellings, homophones, and other errors arising from speech-to-text transcription, learning disabilities, or lack of proofreading.

Experiment Setup

Generate sample text containing a correctly spelled target word (baseline).
Create a variant replacing the target word with a common misspelling.
Perform a retrieval task: ask the model to return the preceding and following words around the target (correct or misspelled) word.
Measure retrieval accuracy to evaluate tokenization robustness and contextual understanding.
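The first three setup steps can be sketched as follows. This is an illustrative Java reconstruction, not code from the llm-typos project. `buildCase` and `query` are hypothetical names, the text is tokenized naively on whitespace (so punctuation stays attached to words), and the question wording is an assumption. The sample sentence is made up and reuses the beautiful → beatiful pair listed under Scenarios Tested. Records require Java 16 or later.

```java
import java.util.Arrays;

public class TypoRetrievalCase {

    // One retrieval case: the text shown to the model, the word asked about,
    // and the neighbouring words a correct answer must contain.
    record Case(String text, String askedWord, String expectedBefore, String expectedAfter) {}

    // Steps 1 and 2: start from sample text containing the target word and swap in a
    // replacement (the target itself for the baseline, a misspelling for the variant).
    static Case buildCase(String sampleText, String target, String replacement) {
        String[] words = sampleText.trim().split("\\s+"); // naive whitespace tokenization
        int idx = Arrays.asList(words).indexOf(target);   // first occurrence of the target
        if (idx <= 0 || idx >= words.length - 1) {
            throw new IllegalArgumentException("target needs a neighbouring word on both sides");
        }
        String expectedBefore = words[idx - 1];
        String expectedAfter = words[idx + 1];
        words[idx] = replacement;
        return new Case(String.join(" ", words), replacement, expectedBefore, expectedAfter);
    }

    // Step 3: the retrieval question posed to the model (wording is an assumption).
    static String query(Case c) {
        return "Which word comes directly before and which directly after \""
            + c.askedWord() + "\" in the following text? Answer as before=<word>, after=<word>.\n\n"
            + c.text();
    }

    public static void main(String[] args) {
        String sample = "The garden looked beautiful after the spring rain";
        Case baseline = buildCase(sample, "beautiful", "beautiful");
        Case variant = buildCase(sample, "beautiful", "beatiful");
        System.out.println(query(baseline));
        System.out.println(query(variant));
    }
}
```

The baseline and the variant share the same expected neighbours, so any drop in accuracy between them can be attributed to the misspelling alone.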

Key Findings

Typos significantly degrade performance in naive retrieval tasks.
More capable models handle typos better and produce more accurate results.
For generic typos that do not alter meaning, some models struggle more when the misspelling is in the sample text than in the query.
Retrieval accuracy for the preceding word is generally higher than for the following word.
When misspellings change word meaning, models sometimes perform better retrieving the following word.
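Because these findings compare accuracy for the preceding word with accuracy for the following word, it helps to score the two positions separately. Below is a minimal sketch using a hypothetical `Trial` record that holds the expected and answered neighbours. Matching is case-insensitive and exact after trimming, which is an assumption; the actual llm-typos scoring rule may differ.

```java
import java.util.List;

public class RetrievalAccuracy {

    // Expected neighbours versus the model's answers for one trial.
    record Trial(String expectedBefore, String expectedAfter,
                 String answeredBefore, String answeredAfter) {}

    // Case-insensitive exact match after trimming; a missing answer counts as wrong.
    static boolean matches(String expected, String answered) {
        return answered != null && expected.equalsIgnoreCase(answered.trim());
    }

    // Fraction of trials where the preceding (preceding = true) or following word was correct.
    static double accuracy(List<Trial> trials, boolean preceding) {
        if (trials.isEmpty()) return 0.0;
        long hits = trials.stream()
            .filter(t -> preceding
                ? matches(t.expectedBefore(), t.answeredBefore())
                : matches(t.expectedAfter(), t.answeredAfter()))
            .count();
        return (double) hits / trials.size();
    }

    public static void main(String[] args) {
        // Illustrative trials only; these are not results from the llm-typos experiments.
        List<Trial> trials = List.of(
            new Trial("looked", "after", "looked", "after"),
            new Trial("looked", "after", "looked", "the"));
        System.out.printf("preceding: %.2f, following: %.2f%n",
            accuracy(trials, true), accuracy(trials, false));
    }
}
```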

Scenarios Tested

Common misspellings with no meaning change
Examples: beautiful → beatiful, occurrence → occurrance
Misspellings with distinct meaning
Examples: affect → effect, then → than
Homophones (same pronunciation, different spelling)

Models Evaluated

The experiments primarily used the Claude 3 model family, with Claude 3 Opus achieving the highest retrieval accuracy.

This research complements my experiment by highlighting the nuanced ways that different typo types and positions influence model understanding and output quality.
