Replacing Forbidden File System Characters with Unicode Alternatives

Most operating systems forbid certain characters in file and folder names, but the rules differ. Linux and macOS reject only / and the NUL byte, while Windows also rejects " * : < > ? \ and |. Names that have to survive on all three, and in the scripts that handle them, are safest without characters like:

  • " (double quote)
  • ' (single quote; legal on these file systems, but awkward in shell scripts)
  • * (asterisk)
  • / (forward slash)
  • : (colon)
  • <, >, ?, \ (and others)

Attempting to use these characters in file or folder names can lead to errors such as Windows' “The file name, directory name, or volume label syntax is incorrect.”
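Before replacing anything, it can help to detect whether a name is affected at all. The sketch below is a minimal illustration only: the class name `ForbiddenCharCheck`, the method `firstForbidden`, and the exact character set are assumptions made for this example, and real file systems have further rules (reserved names, length limits) that it does not cover.

```java
class ForbiddenCharCheck {
    // Assumption: the characters this post replaces; adjust for your target systems.
    private static final String FORBIDDEN = "\"'*/:<>?\\";

    /** Returns the index of the first forbidden character in name, or -1 if there is none. */
    static int firstForbidden(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (FORBIDDEN.indexOf(name.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstForbidden("notes.txt"));      // -1: nothing to replace
        System.out.println(firstForbidden("draft: v2.txt"));  // 5: the colon
    }
}
```

Returning the index rather than a boolean makes it easy to point the user at the offending character in an error message.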

To overcome this limitation, one creative solution is to replace forbidden characters with visually similar Unicode characters. This retains the readability of the name while ensuring compatibility across systems.


📚 Unicode to the Rescue

The Unicode Names List catalogs a vast set of characters, many of which look similar to Latin letters, punctuation marks, or symbols but are different code points under the hood. This allows us to replace a character such as : with something like ： (U+FF1A, Fullwidth Colon), which looks almost identical but is allowed in file names.

This technique is especially useful in cross-platform tools, file-based data serialization, and log file naming, where human-readability and OS compatibility must coexist.
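To see that two look-alikes really are different characters, you can ask the JDK for their code points and official names. This is a small sketch, not part of the original code; `LookAlikeInspector` and `describe` are names made up for the example, while `Character.getName` is the standard library call.

```java
class LookAlikeInspector {
    /** Formats a code point as "U+XXXX NAME" using the JDK's built-in Unicode data. */
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(':'));     // U+003A COLON
        System.out.println(describe(0xFF1A));  // U+FF1A FULLWIDTH COLON
        System.out.println(describe(0x2236));  // U+2236 RATIO
    }
}
```

Printing candidates this way is also a handy check when you add new look-alikes to a mapping table, since similar glyphs are hard to tell apart in an editor.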


🧪 Example: Java Implementation

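Below is a Java example that maps forbidden characters to Unicode look-alikes using a Map<Integer, String>. Each key is the code point of a forbidden character, and the first character of the mapped string is used as its substitute. The remaining characters in the string are further look-alikes that the inverse method maps back to the original.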

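// Requires: import java.util.HashMap; import java.util.Map;
// Key: the forbidden character's code point. Value: its look-alikes, of which the
// first is used as the substitute. Fullwidth forms are written as Unicode escapes
// so they cannot be silently folded back to ASCII.
private static final Map<Integer, String> INVALID_CHARACTERS = new HashMap<>();

static {
    INVALID_CHARACTERS.put(0x0022, "\uFF02“”″״ʺ˝ˮ〃"); // "
    INVALID_CHARACTERS.put(0x0027, "’‘‛′\uFF07ʹʼˈ׳ꞌ"); // '
    INVALID_CHARACTERS.put(0x002A, "⁎✲✱\uFF0A﹡٭※⁂⁑∗⚹꙳\uD83D\uDFB6"); // *
    INVALID_CHARACTERS.put(0x002F, "⁄∕⟋⧸"); // /
    INVALID_CHARACTERS.put(0x003A, "∶\uFF1A﹕ː˸։፡፥⁚⁝꞉︰"); // :
    INVALID_CHARACTERS.put(0x003C, "‹\uFF1C﹤〈⟨˂"); // <
    INVALID_CHARACTERS.put(0x003E, "›\uFF1E﹥〉⟩˃"); // >
    INVALID_CHARACTERS.put(0x003F, "\uFF1F︖﹖¿؟‽❓⯑⸮�"); // ?
    INVALID_CHARACTERS.put(0x005C, "∖⟍⧹"); // \
}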

public static String replaceInvalidCharacters(String input) {
    if (input == null) return null;

    StringBuilder result = new StringBuilder();
    for (char c : input.toCharArray()) {
        String replacement = INVALID_CHARACTERS.get((int) c);
        result.append(replacement != null ? replacement.charAt(0) : c);
    }
    return result.toString();
}
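Here is what a call looks like in practice. To keep the example self-contained, it uses a trimmed-down, hypothetical copy of the table with only three entries and one substitute each; the class name `ReplaceDemo` and the sample title are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ReplaceDemo {
    // Trimmed-down copy of the table: three entries, one substitute each.
    private static final Map<Integer, String> INVALID_CHARACTERS = new HashMap<>();

    static {
        INVALID_CHARACTERS.put(0x002F, "\u2044"); // slash -> FRACTION SLASH
        INVALID_CHARACTERS.put(0x003A, "\u2236"); // colon -> RATIO
        INVALID_CHARACTERS.put(0x003F, "\uFF1F"); // question mark -> FULLWIDTH QUESTION MARK
    }

    static String replaceInvalidCharacters(String input) {
        if (input == null) return null;
        StringBuilder result = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            String replacement = INVALID_CHARACTERS.get((int) c);
            result.append(replacement != null ? replacement.charAt(0) : c);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String title = "Q1/Q2: what changed?";
        String safe = replaceInvalidCharacters(title);
        System.out.println(safe);
        // Same length as before, but the ASCII colon is gone.
        System.out.println(safe.length() == title.length()); // true
        System.out.println(safe.contains(":"));              // false
    }
}
```

Because every substitute here is a single UTF-16 unit, the name keeps its length in Java chars. Its length in bytes on disk will usually grow, since the look-alikes take more than one byte in UTF-8.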

🔁 You can also reverse the process with the inverse method:

public static String replaceInvalidCharactersInverse(String input) {
    if (input == null) return null;

    StringBuilder result = new StringBuilder(input.length());
    // Walk by code point, not by char: the * entry contains a surrogate pair, and
    // matching its halves separately would corrupt unrelated emoji in the input.
    for (int i = 0; i < input.length(); ) {
        int codePoint = input.codePointAt(i);
        i += Character.charCount(codePoint);

        int original = codePoint;
        for (Map.Entry<Integer, String> entry : INVALID_CHARACTERS.entrySet()) {
            if (entry.getValue().indexOf(codePoint) >= 0) {
                original = entry.getKey();
                break;
            }
        }
        result.appendCodePoint(original);
    }
    return result.toString();
}
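The inverse method searches every mapping string for each input character. As an alternative sketch, the table can be inverted once into a second map, keyed by code point so that entries outside the Basic Multilingual Plane are matched as a whole. The class `ReverseLookupDemo`, the two-entry table, and the method name `restore` are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

class ReverseLookupDemo {
    // Hypothetical two-entry table in the same shape as INVALID_CHARACTERS.
    private static final Map<Integer, String> LOOK_ALIKES = new HashMap<>();
    // Built once: look-alike code point -> original ASCII code point.
    private static final Map<Integer, Integer> REVERSE = new HashMap<>();

    static {
        LOOK_ALIKES.put(0x002A, "\u204E\uD83D\uDFB6"); // asterisk: LOW ASTERISK, U+1F7B6 (a surrogate pair)
        LOOK_ALIKES.put(0x003A, "\u2236\uFF1A");       // colon: RATIO, FULLWIDTH COLON
        for (Map.Entry<Integer, String> entry : LOOK_ALIKES.entrySet()) {
            entry.getValue().codePoints().forEach(cp -> REVERSE.put(cp, entry.getKey()));
        }
    }

    static String restore(String input) {
        if (input == null) return null;
        StringBuilder result = new StringBuilder(input.length());
        // Each code point is either mapped back or copied through unchanged.
        input.codePoints().forEach(cp -> result.appendCodePoint(REVERSE.getOrDefault(cp, cp)));
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(restore("12\u223630 \uD83D\uDFB6")); // 12:30 *
        // An unrelated emoji that shares the same high surrogate passes through intact.
        System.out.println(restore("smile \uD83D\uDE00").equals("smile \uD83D\uDE00")); // true
    }
}
```

The trade-off is a little extra memory for the second map in exchange for a constant-time lookup per code point.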

🔧 Usage Ideas

Here are a few scenarios where this can be handy:

  • 🗃 Saving dynamic filenames based on user input.
  • 📁 Syncing files across OS boundaries, such as Windows-to-Linux scripts.
  • 📝 Rendering slugs or titles that are more human-readable than a hex-encoded version.
  • 🔍 Displaying paths or file names in a UI where clarity and character fidelity matter.

🧩 Limitations & Considerations

  • 🛠 Unicode replacements are not the same as the original characters — avoid relying on them for strict parsing.
  • 🧪 Testing across operating systems is essential, as different file systems behave differently (e.g., NTFS vs. ext4).
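  • 🔐 Don't treat this as input sanitization: look-alike characters can make two different names appear identical, and the inverse method also converts look-alikes that were already in the original name, so a round trip is not guaranteed to be lossless.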
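One concrete thing worth testing is Unicode normalization. Compatibility normalization (NFKC), which some text-processing steps apply, folds fullwidth forms back to their ASCII originals and would reintroduce the forbidden character. The sketch below uses the standard `java.text.Normalizer`; the class name `NormalizationCheck` and the helper `nfkc` are assumptions for illustration, and whether anything in your own pipeline normalizes names is something you would need to verify.

```java
import java.text.Normalizer;

class NormalizationCheck {
    /** Applies NFKC compatibility normalization to the given text. */
    static String nfkc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // FULLWIDTH COLON is folded back to an ASCII colon.
        System.out.println(nfkc("10\uFF1A30").equals("10:30"));         // true
        // RATIO has no compatibility mapping and survives unchanged.
        System.out.println(nfkc("10\u223630").equals("10\u223630"));    // true
    }
}
```

If names may pass through such a step, a substitute without a compatibility mapping is the more robust first choice in the table.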

🧠 Final Thoughts

Replacing forbidden file system characters with Unicode counterparts is a clever and practical trick for many development scenarios, particularly in file-heavy applications.

It avoids the ugliness of raw hex encoding while keeping file names meaningful and user-friendly. Just remember to document the behavior well, because not everyone expects a ： in place of a :.

Have you tried this technique before, or do you have your own go-to workaround? Drop a comment or fork the code to expand it with more mappings!

This post is licensed under CC BY 4.0 by the author.