Riffing on splitting - kevshouse/exam_quest GitHub Wiki

ft_split Explained

This document explains how the ft_split C function splits a string into an array of words, using whitespace as the delimiter.

High-Level Goal

The primary objective of ft_split is to take a string (e.g., " hello world ") and produce a NULL-terminated array of strings (e.g., ["hello", "world", NULL]). This requires careful memory management to allocate space for both the array of pointers and each individual word.
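
As a rough illustration of the NULL-terminated layout described above, the sketch below counts the entries of such an array by walking it until it reaches the NULL sentinel. The helper name split_len is hypothetical and is not part of the original code.

```c
#include <stddef.h>

/* Hypothetical helper (not in the original source): counts the strings
 * in a NULL-terminated array such as the one ft_split returns. */
static size_t split_len(char **arr)
{
    size_t n;

    if (!arr)
        return (0);
    n = 0;
    while (arr[n] != NULL)
        n++;
    return (n);
}
```

For an array holding "hello", "world", NULL this returns 2. The sentinel is what lets callers iterate without a separate length value.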

To comply with the 42 School Norm, the logic is broken down into several small, static helper functions.


Core Functions

1. main

The main function serves as the entry point for testing. It initializes a sample string, calls ft_split to perform the split, and then iterates through the resulting array to print each word. Finally, it demonstrates proper memory management by freeing each word and then the array itself.

#include <stdio.h>
#include <stdlib.h>

/* Prototype, needed only if main appears before ft_split in the file. */
char **ft_split(char *str);

int main(void)
{
    char *test_string = "  See you   later, alligator!  ";
    char **result;
    int i = 0;

    printf("Original string: \"%s\"\n", test_string);
    result = ft_split(test_string);
    if (result)
    {
        printf("Split result:\n");
        while (result[i])
        {
            printf("  Word %d: \"%s\"\n", i, result[i]);
            free(result[i]);
            i++;
        }
        free(result);
    }
    return (0);
}

2. ft_split

This is the main public function that orchestrates the entire splitting process. Its logic is as follows:

  1. Count Words: It first calls count_words to determine how many words are in the input string. This is crucial for allocating the correct amount of memory for the main array.
  2. Allocate Array: It allocates memory for the array of char pointers (char **). The size is word_count + 1 to accommodate a NULL terminator at the end.
  3. Extract and Fill: It enters a loop that runs word_count times. In each iteration, it calls get_next_word to extract the next word from the string.
  4. Error Handling: If get_next_word fails (returns NULL), ft_split frees every word stored so far, then the array itself, and returns NULL.
  5. Null-Termination: After the loop, it sets the final element of the array to NULL.

char **ft_split(char *str)
{
    char    **arr;
    int     word_count;
    int     i;

    if (!str)
        return (NULL);
    word_count = count_words(str);
    arr = (char **)malloc(sizeof(char *) * (word_count + 1));
    if (!arr)
        return (NULL);
    i = 0;
    while (i < word_count)
    {
        arr[i] = get_next_word(&str);
        if (!arr[i])
        {
            while (i > 0)
                free(arr[--i]);
            free(arr);
            return (NULL);
        }
        i++;
    }
    arr[i] = NULL;
    return (arr);
}
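
The cleanup branch in step 4 could be factored out into its own function, which is one way to keep ft_split short. A minimal sketch, assuming a hypothetical helper named free_partial that is not in the original code. It takes an explicit count because the array is not yet NULL-terminated when a failure occurs.

```c
#include <stdlib.h>

/* Hypothetical helper (not in the original source): frees the first
 * `filled` words of a partially built array, then the array itself.
 * Returns NULL so a caller can write: return (free_partial(arr, i)); */
static char **free_partial(char **arr, int filled)
{
    while (filled > 0)
        free(arr[--filled]);
    free(arr);
    return (NULL);
}
```

With such a helper, the failure branch inside the loop would shrink to a single return statement.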

3. get_next_word

This function is the core of the word extraction logic. It is designed to be called repeatedly to get one word at a time.

  • Pointer to Pointer: It takes a char **str_ptr as an argument. This allows it to modify the str pointer in the ft_split function directly, so the next call to get_next_word starts from where the last one left off.
  • Skip Whitespace: It first advances the pointer past any leading whitespace.
  • Find Word End: It then finds the end of the current word by scanning for the next whitespace character.
  • Allocate and Copy: It allocates the exact amount of memory needed for the word and copies it over, adding a \0 at the end.
  • Update Pointer: Finally, it updates the caller's str pointer (via *str_ptr) to point just past the extracted word, so the next call resumes from there.

static char *get_next_word(char **str_ptr)
{
    char    *str;
    char    *word_start;
    int     len;
    char    *word;
    int     i;

    str = *str_ptr;
    while (*str && is_whitespace(*str))
        str++;
    word_start = str;
    len = 0;
    while (str[len] && !is_whitespace(str[len]))
        len++;
    word = (char *)malloc(sizeof(char) * (len + 1));
    if (!word)
        return (NULL);
    i = -1;
    while (++i < len)
        word[i] = word_start[i];
    word[i] = '\0';
    *str_ptr = str + len;
    return (word);
}
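
To see the pointer-to-pointer mechanism in isolation, here is a minimal sketch that only locates each word and does not allocate anything. The name next_span and the macro WS_CHARS are hypothetical, and the standard strspn and strcspn calls stand in for the two hand-written scanning loops.

```c
#include <stddef.h>
#include <string.h>

#define WS_CHARS " \t\n"

/* Hypothetical, allocation-free variant of get_next_word (not in the
 * original source). Returns a pointer to the next word inside the
 * caller's string, stores its length in *len, and advances *cursor
 * past it. *len is 0 when no word remains. */
static const char *next_span(const char **cursor, size_t *len)
{
    const char *start;

    start = *cursor + strspn(*cursor, WS_CHARS); /* skip whitespace */
    *len = strcspn(start, WS_CHARS);             /* measure the word */
    *cursor = start + *len;                      /* where the next call resumes */
    return (start);
}
```

Called repeatedly on "  ab  cde ", it yields "ab" (length 2), then "cde" (length 3), then a length of 0. The original get_next_word behaves similarly at the end of the input: it returns an allocated empty string rather than NULL, so ft_split relies on count_words to know when to stop.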

4. count_words

This helper function provides a preliminary scan of the string to count how many words it contains. It works by iterating through the string and identifying sequences of non-whitespace characters.

static int count_words(char *str)
{
    int count;

    count = 0;
    while (*str)
    {
        while (*str && is_whitespace(*str))
            str++;
        if (*str && !is_whitespace(*str))
        {
            count++;
            while (*str && !is_whitespace(*str))
                str++;
        }
    }
    return (count);
}
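
An alternative way to count is to make a single pass and count each transition from whitespace to non-whitespace. The sketch below is a hypothetical variant (count_words_flag is not in the original) that should return the same totals as count_words for the same three whitespace characters.

```c
/* Hypothetical single-pass variant of count_words (not in the original
 * source): a word is counted each time a non-whitespace character
 * follows whitespace or the start of the string. */
static int count_words_flag(const char *str)
{
    int count;
    int in_word;
    int ws;

    count = 0;
    in_word = 0;
    while (*str)
    {
        ws = (*str == ' ' || *str == '\t' || *str == '\n');
        if (!ws && !in_word)
            count++;
        in_word = !ws;
        str++;
    }
    return (count);
}
```

For the test string used in main, "  See you   later, alligator!  ", this gives 4, since punctuation attached to a word is part of that word.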

5. is_whitespace

A simple utility that returns 1 if a character is a space, tab, or newline, and 0 otherwise. This keeps the main logic clean and readable.

static int is_whitespace(char c)
{
    return (c == ' ' || c == '\t' || c == '\n');
}
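
is_whitespace recognises only three characters. If a broader definition is acceptable, the standard isspace from <ctype.h> also accepts carriage return, vertical tab, and form feed in the default "C" locale. The sketch below works under that assumption. The name is_whitespace_std is hypothetical, and whether library functions are allowed depends on the exercise's rules, which this page does not state.

```c
#include <ctype.h>

/* Hypothetical variant (not in the original source) built on the
 * standard isspace. The cast to unsigned char avoids undefined
 * behaviour when char is signed and c holds a negative value. */
static int is_whitespace_std(char c)
{
    return (isspace((unsigned char)c) != 0);
}
```

Swapping this in would change how strings containing "\r\n" line endings are split, so the two versions are not interchangeable.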

Memory Management

Proper memory handling is critical:

  • The ft_split function allocates memory for the main array (char **).
  • The get_next_word function allocates memory for each individual word (char *).
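  • If a word allocation fails, ft_split frees every word stored so far and then the array before returning NULL, which prevents leaks. If the array allocation itself fails, nothing else has been allocated yet, so it simply returns NULL.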
  • The main function shows the correct way to free the memory after use: first, free each word in the array, and then free the array itself.
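
The freeing order shown in main can be wrapped in a small function so that callers do not repeat the loop. This is a sketch only: free_split is a hypothetical name, not part of the original code, and it assumes the array is NULL-terminated as ft_split guarantees on success.

```c
#include <stdlib.h>

/* Hypothetical helper (not in the original source): frees each word
 * of a NULL-terminated array, then the array itself. Safe on NULL. */
static void free_split(char **arr)
{
    int i;

    if (!arr)
        return ;
    i = 0;
    while (arr[i])
        free(arr[i++]);
    free(arr);
}
```

A caller would then write result = ft_split(str); followed later by free_split(result); without checking for NULL first.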