603
blahaj (lemmy.zip)
submitted 2 months ago by Maven@lemmy.zip to c/programmerhumor@lemmy.ml
[-] NeatNit@discuss.tchncs.de 3 points 2 months ago* (last edited 2 months ago)

I'm assuming Unicode anyway, and UTF-8 is by far the most natural encoding because most files will be in ASCII. A "normal form" (see link above), which you can think of as a canonical form, gives you a way to check whether two strings are equivalent even if they encode the text differently: normalize both strings and compare the results. Like the example mentioned on Wikipedia:

For example, the distinct Unicode strings "U+212B" (the angstrom sign "Å") and "U+00C5" (the Swedish letter "Å") are both expanded by NFD (or NFKD) into the sequence "U+0041 U+030A" (Latin letter "A" and combining ring above "°") which is then reduced by NFC (or NFKC) to "U+00C5" (the Swedish letter "Å").
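
To make that concrete, here's a rough sketch in Python using the standard library's `unicodedata.normalize`. The helper name `same_text` is just something I made up for illustration, and comparing under NFC is one reasonable choice, not the only one:

```python
import unicodedata

angstrom = "\u212b"  # U+212B, the angstrom sign
a_ring = "\u00c5"    # U+00C5, the Swedish letter

# The raw strings are different code points, so a plain comparison fails.
raw_equal = angstrom == a_ring

# NFD expands both to U+0041 (A) followed by U+030A (combining ring above).
nfd_angstrom = unicodedata.normalize("NFD", angstrom)
nfd_a_ring = unicodedata.normalize("NFD", a_ring)

# NFC then recomposes that sequence to U+00C5 in both cases.
nfc_angstrom = unicodedata.normalize("NFC", angstrom)


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(raw_equal)                     # False
print(nfd_angstrom == "A\u030a")     # True
print(nfc_angstrom == a_ring)        # True
print(same_text(angstrom, a_ring))   # True
```

So the two strings compare unequal as-is, but equal once both are put through the same normal form.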

this post was submitted on 16 Jul 2024
603 points (97.5% liked)
