Emoji Encoding: A new style of binary encoding for the web


Computers think in binary, so you would think that sending binary data around would be pretty easy. But that turns out to be a completely non-trivial task. The problem is those pesky humans and the need to interface with them.

For example, if I need to send some binary data over email, I can either send it as an attachment, with a high probability of at least a few people never getting it, or I can encode it somehow. Typical choices are Base64 encoding for the low-tech, and barcodes, QR codes and the like. The fancy among us can go with Base85 and other such things. That is pretty standard, but it has a lot of limitations. Base64 increases the size of the data by about a third (4 characters for every 3 bytes), and it is case sensitive, so it is hard to get right if you need to actually look at it rather than just copy/paste it. It is also limited to plain old ASCII, for compatibility reasons that don’t make a lot of sense in today’s world.
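To put numbers on that overhead, here is a short sketch using Python’s standard `base64` module (Python is my choice for illustration; the helper name `encoded_lengths` is hypothetical). Base64 emits 4 characters for every 3 input bytes and pads the last group with `=`, while Base85 emits 5 characters for every 4 bytes, so a 32-byte payload grows to 44 and 40 characters respectively.

```python
import base64
import os


def encoded_lengths(n: int) -> tuple[int, int]:
    """Return (Base64 length, Base85 length) for n random bytes."""
    data = os.urandom(n)
    b64 = base64.b64encode(data)  # 4 output chars per 3 input bytes, '=' padded
    b85 = base64.b85encode(data)  # 5 output chars per 4 input bytes, unpadded by default
    return len(b64), len(b85)


for n in (3, 32, 300):
    b64_len, b85_len = encoded_lengths(n)
    # Overhead relative to the raw byte count, e.g. 4/3 - 1 for Base64.
    print(f"{n:>4} bytes -> Base64: {b64_len:>4} chars (+{b64_len / n - 1:.0%}), "
          f"Base85: {b85_len:>4} chars (+{b85_len / n - 1:.0%})")
```

For large inputs the Base64 overhead settles at 33%, and Base85 at 25%.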

I have been thinking about this for a long time, because we need to send binary data (license information) as text, and we also need it to look good and be nicely formatted.

After a lot of thought and experimentation, I’m proud to announce a new form of encoding: the Emoji Encoder, currently available for .NET and soon to be available for Ruby, Python, Go, Node.js, Ember.js, React.js and maybe jQuery.

The idea for this innovation came to me because of the following observations:

  • Emojis are becoming much more important in any textual conversation (to the point where people will say an emoji out loud). That means we can rely on them for the long term, which is very important for a storage technology.
  • Trying to read meaning into the emojis being sent is clearly impossible, as anyone who has peeked at a text conversation between two teenage girls can attest. (Although there does appear to be a hidden meaning: if she sent the red heel and not the blue heel emoji, that apparently means something.)
  • Because emojis are so relevant, they can be sent anywhere normal text would go, including email, social media, print, etc.
  • There are a lot of emojis, which lets us overcome the bloat of Base64 and its friends by dedicating a single emoji to each byte in a 1:1 mapping.
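The one-emoji-per-byte mapping can be sketched in a few lines of Python. This is not the Emoji Encoder’s actual code: the alphabet (the 256 consecutive code points starting at U+1F400, inside Unicode’s Miscellaneous Symbols and Pictographs block) and the names `emoji_encode` and `emoji_decode` are hypothetical choices for illustration. Any 256 distinct emoji would work, as long as both sides agree on the table.

```python
"""Sketch of a byte-to-emoji codec: one emoji code point per byte (1:1).

ASSUMPTION: the table below (code points U+1F400..U+1F4FF) is a hypothetical
choice for illustration, not the Emoji Encoder's real table.
"""
import os

# Hypothetical alphabet: byte value b maps to the code point U+1F400 + b.
EMOJI_BASE = 0x1F400
ALPHABET = [chr(EMOJI_BASE + b) for b in range(256)]
# Reverse lookup used by the decoder.
REVERSE = {ch: b for b, ch in enumerate(ALPHABET)}


def emoji_encode(data: bytes) -> str:
    """Replace every byte with its emoji; output length equals input length."""
    return "".join(ALPHABET[b] for b in data)


def emoji_decode(text: str) -> bytes:
    """Map each emoji back to its byte; raise ValueError on unknown characters."""
    try:
        return bytes(REVERSE[ch] for ch in text)
    except KeyError as exc:
        raise ValueError(f"not an encoder emoji: {exc.args[0]!r}") from None


if __name__ == "__main__":
    key = os.urandom(32)  # a 256-bit key, like the one in the example below
    encoded = emoji_encode(key)
    assert emoji_decode(encoded) == key  # round trip restores the original bytes
    print(len(encoded), "emoji for", len(key), "bytes")
```

Because each byte becomes exactly one code point, a 32-byte key encodes to 32 emoji as counted by Python’s `len()`. That count is in code points; in UTF-8 each of these emoji occupies four bytes.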

That means that, in terms of characters, Emoji Encoding is a net win. Consider the following two encodings of the same data:

  • I5xy4dT9Qyjp7DKwuVI6y95EwlDeO/NBeiuc3GJ5Mjo= <—44 characters
  • ℹ⤴⚫✔⭕㊗◀☔➖✂♥⛵✖♍❤⛵✅✏ℹ⛲✂ <—33 characters

That is quite important when dealing with constrained textual formats, such as Twitter, where the above will be rendered as:

There are other advantages. This data is actually a 256-bit key for use in encryption, and you can show it to a user with a reasonably good chance that they will be able to tell it apart from a different key. It relies on the ability of humans to recognize shapes, yet it will be very hard for them to actually tell someone else your key. There has been a lot of research around such things, and while it isn’t a primary motivation for us, it is a very nice perk.

I mentioned that a key interest for us is using this for license codes. Here is an example of how a license email will now look:

I think that in addition to being pretty, it is also going to bring a smile to people’s faces, so the Emoji Encoder is a win all around.