File Names (Unicode Normalization Forms) - macfuse/macfuse GitHub Wiki

Unicode Equivalence

The Unicode standard allows different sequences of code points to represent the same character. Such sequences are called canonically equivalent and should always have the same visual appearance and behavior. However, their byte string representations are not identical and might not even have the same length. Two Unicode strings must therefore be normalized before they can be compared. See https://unicode.org/reports/tr15/ for details and examples.
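To make this concrete, here is a minimal Python sketch using the standard library's `unicodedata` module. Python is used only for brevity and is not part of macFUSE. It writes "é" both as one precomposed code point and as a base letter followed by a combining accent, shows that the raw strings and their UTF-8 bytes differ, and compares them after normalization. The `equivalent` helper is a hypothetical name chosen for illustration.

```python
import unicodedata

# "é" written two ways: one precomposed code point vs. base letter + combining accent.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# The raw strings differ, and so do their UTF-8 byte strings (2 vs. 3 bytes).
print(precomposed == decomposed)          # False
print(precomposed.encode("utf-8"))        # b'\xc3\xa9'
print(decomposed.encode("utf-8"))         # b'e\xcc\x81'

def equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(equivalent(precomposed, decomposed))  # True
```

Normalizing both sides to NFD works equally well; what matters is that both strings are brought into the same form before comparing.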

We are going to focus on the following two Unicode normalization forms:

  • Normalization Form D (NFD)
  • Normalization Form C (NFC)

Many characters are composites, also known as precomposed characters. In Normalization Form D, these characters are decomposed into a sequence of simpler code points, such as a base letter followed by combining marks. In Normalization Form C, they are usually precomposed. As a result, the byte string representation of the D form is usually longer than that of the C form.
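The length difference can be measured directly. The following Python sketch uses `unicodedata` again; the helper `utf8_lengths` is a hypothetical name. It reports the UTF-8 byte length of a string in the C and D forms. Korean Hangul shows the largest growth in this example: each syllable used here takes 3 bytes precomposed but decomposes into three conjoining jamo of 3 bytes each.

```python
import unicodedata

def utf8_lengths(name: str) -> tuple[int, int]:
    """Return the UTF-8 byte lengths of (NFC form, NFD form) of name."""
    nfc = unicodedata.normalize("NFC", name)
    nfd = unicodedata.normalize("NFD", name)
    return len(nfc.encode("utf-8")), len(nfd.encode("utf-8"))

print(utf8_lengths("caf\u00e9"))           # "café"      -> (5, 6)
print(utf8_lengths("\u00c5ngstr\u00f6m"))  # "Ångström"  -> (10, 12)
print(utf8_lengths("\ud55c\uae00"))        # "한글"      -> (6, 18)
```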

What does this mean for macFUSE?

Like APFS, macFUSE supports file names that require no more than 255 bytes in the C form.

Finder, however, expects file names in the (usually longer) D form. When your file system implementation passes file names to macFUSE, for example from the readdir callback, make sure they are in the D form. Passing the C form can result in unexpected behavior, such as file names not being displayed in Finder under certain conditions.
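A minimal sketch of that conversion in Python, assuming a hypothetical backing store that keeps names in the C form. `to_finder_form` is an illustrative name, not a macFUSE API. Each name is decomposed to NFD and encoded as UTF-8 before it would be handed to macFUSE.

```python
import unicodedata

def to_finder_form(name: str) -> bytes:
    """Hypothetical helper: decompose a file name to NFD and encode it as UTF-8
    before passing it to macFUSE (for example from a readdir implementation)."""
    return unicodedata.normalize("NFD", name).encode("utf-8")

# Assumed backing store that keeps names precomposed (NFC) ...
stored_names = ["caf\u00e9.txt", "\ud55c\uae00.md", "plain.txt"]
# ... converted entry by entry before the names are returned.
entries = [to_finder_form(n) for n in stored_names]
print(entries)
```

In a C or Objective-C file system, one option for the same step is Core Foundation's `CFStringNormalize` with `kCFStringNormalizationFormD`; in Swift, `decomposedStringWithCanonicalMapping` performs canonical decomposition.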

macFUSE supports returning file names of up to 1024 bytes from the readdir callback. This ensures that you can return the longer D form of non-Latin file names, as required by Finder. However, the C form of every file name must still fit in 255 bytes.
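Both limits can be checked together before an entry is returned. The Python sketch below is illustrative only: `check_name` and the two constants are hypothetical, and it assumes both limits count UTF-8 bytes of the name itself, excluding any terminating NUL. A name made of 85 copies of the syllable 한 sits exactly at the 255-byte C-form limit yet needs 765 bytes in the D form, which is why readdir must accept names longer than 255 bytes.

```python
import unicodedata

# Limits as described above (assumed to exclude any terminating NUL).
MAX_NFC_BYTES = 255       # C form must fit in 255 bytes
MAX_READDIR_BYTES = 1024  # readdir may return up to 1024 bytes (D form)

def check_name(name: str) -> bytes:
    """Hypothetical helper: return the NFD UTF-8 bytes to return from readdir,
    or raise ValueError if the name exceeds either limit."""
    nfc = unicodedata.normalize("NFC", name).encode("utf-8")
    nfd = unicodedata.normalize("NFD", name).encode("utf-8")
    if len(nfc) > MAX_NFC_BYTES:
        raise ValueError(f"C form is {len(nfc)} bytes (limit {MAX_NFC_BYTES})")
    if len(nfd) > MAX_READDIR_BYTES:
        raise ValueError(f"D form is {len(nfd)} bytes (limit {MAX_READDIR_BYTES})")
    return nfd

long_name = "\ud55c" * 85         # 85 x 3 = 255 bytes in the C form
print(len(check_name(long_name)))  # 765 bytes in the D form (85 x 9)
```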