Unicode

Computers are good at processing numbers, but much of the data we process consists of strings of characters. By numbering each possible character, we can use base b interpretation and base b representation to convert strings of characters to natural numbers and back.
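
Here is a minimal sketch of these two operations in Python, working on lists of digits in an arbitrary base b; the names interpret and represent are ours, chosen to match the wiki's terminology.

```python
def interpret(digits, b):
    """Base b interpretation: convert a list of digits (most significant
    first, each in the range 0..b-1) to a natural number."""
    n = 0
    for d in digits:
        n = n * b + d
    return n

def represent(n, b):
    """Base b representation: convert a natural number to its list of
    digits, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % b)
        n //= b
    return digits[::-1]

# representation followed by interpretation gives back the original number
assert interpret(represent(12345, 7), 7) == 12345
```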

ASCII and Unicode are two ways to assign a digit to each character. ASCII stands for "American Standard Code for Information Interchange"; it is just a table mapping 128 characters (e.g. 'A', 'B', 'a', 'b', '!', '0', ')', etc.) to the 128 base-128 digits, i.e. the numbers 0 through 127.
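
Python exposes this table through the built-in ord and chr functions, which convert a character to its number and back:

```python
# ord gives a character's code (its digit); chr inverts it
print(ord('A'), ord('B'), ord('a'), ord('!'), ord('0'))  # 65 66 97 33 48
print(chr(65), chr(48))                                  # A 0
```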

Unicode is the same idea, except that there are 1,114,112 Unicode characters (code points) in the table (including things like '∧', '∃', '喂', '😀', '🐱', and '💩'). You can think of Unicode as a way to write down numbers in base 1,114,112.
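
The same ord and chr functions cover the full Unicode table, so even the characters above have digits:

```python
print(ord('∧'))     # 8743
print(ord('😀'))    # 128512
print(chr(128169))  # 💩
```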

Because base b representation is a bijection (every number has a base b representation, that representation is unique, and base b interpretation recovers the number from it), we will simply treat strings as numbers and vice versa without paying much more attention to it.
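
Combining the two ideas gives a round trip between strings and natural numbers. The sketch below (the names string_to_number and number_to_string are ours, not the wiki's) treats each character's Unicode code point as a digit in base 1,114,112:

```python
B = 1_114_112  # number of Unicode code points

def string_to_number(s):
    """Base B interpretation of a string: each character's Unicode
    code point is one digit."""
    n = 0
    for c in s:
        n = n * B + ord(c)
    return n

def number_to_string(n):
    """Base B representation of a number, read back as a string."""
    chars = []
    while n > 0:
        chars.append(chr(n % B))
        n //= B
    return ''.join(reversed(chars))

s = "hello 😀"
assert number_to_string(string_to_number(s)) == s
```

As with ordinary numerals, the usual leading-zero caveat applies: a string beginning with the "zero digit" chr(0) would lose that character in the round trip.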