Abolishing guids

This seems like a minor thing, but it was raised during the design phase of SilverQueues/AgQueues/LuminQ (can someone spot the jokes?).

How does a client identify itself to the server? Consider the fact that there are likely to be clients popping up all the time, so we can’t pre-assign them meaningful names. The suggestion was brought up to use GUIDs to identify the client. That has the benefit of simplicity. It is easy to write, easy to implement and easy to understand at a conceptual level.

It got shot down quickly because, while it is easy, I have never met a GUID that I could honestly say I had seen before. GUIDs are so opaque that recognizing clients by them is going to make the system much harder to work with.

Instead of doing that, we will probably go with “clients/329392” or something similar, because that one is human readable. In the end, if you can make it easier to work with, it pays off, big time.
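To make the idea concrete, here is a minimal sketch of how such ids might be handed out. It assumes a hi/lo scheme (the approach mentioned in the comments below); the class name `ClientIdGenerator`, the `reserveNextHi` callback and the range size are all hypothetical and not part of the actual design.

```csharp
using System;

// Hypothetical sketch: hands out ids such as "clients/329392" using a hi/lo scheme.
// Shared storage is only touched to reserve a whole range ("hi") at a time;
// ids inside the range are handed out locally.
public class ClientIdGenerator
{
    // Assumed: atomically returns 0, 1, 2, ... from shared storage.
    private readonly Func<long> reserveNextHi;
    private readonly int rangeSize;
    private readonly object sync = new object();
    private long currentHi;
    private int nextLo;

    public ClientIdGenerator(Func<long> reserveNextHi, int rangeSize)
    {
        this.reserveNextHi = reserveNextHi;
        this.rangeSize = rangeSize;
        this.nextLo = rangeSize; // forces a reservation on first use
    }

    public string NextId()
    {
        lock (sync)
        {
            if (nextLo >= rangeSize)
            {
                // Current range is exhausted: reserve the next one.
                currentHi = reserveNextHi();
                nextLo = 0;
            }
            long value = currentHi * rangeSize + nextLo;
            nextLo++;
            return "clients/" + value;
        }
    }
}
```

The resulting ids are short, roughly sequential numbers that a person can read out, compare and remember, which is the whole point of avoiding GUIDs here.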

Tags:

Design

Comments

16 Aug 2010
09:48 AM

Jason

The human-readable approach has a security risk: clients can now claim to be someone else. Non-predictable GUIDs have the advantage that one client cannot really get at the data intended for another client. Perhaps using both would be the best scenario. The int would be used to identify the client, the GUID would ensure they are who they say they are.

16 Aug 2010
09:51 AM

Ayende Rahien

Jason,

Guids as a security measure is another application of security through obscurity.

It is pretty easy to sniff them on the network, after all.

16 Aug 2010
10:21 AM

Jason

Agreed; SSL/TLS mitigates this to some extent.

16 Aug 2010
10:59 AM

Richard Dingwall

Difficulty in reading/recognizing GUIDs is one of the reasons I like using the hilo ID generator in NHibernate.

16 Aug 2010
11:38 AM

Harry M

I keep thinking it would be fun to make a natural language key generator by mixing up adjectives, verbs, nouns, tenses and stuff. Obviously it would only work for small sets, or have really long names.

e.g. angrybluepanther500, oddnaturalsoap123

16 Aug 2010
11:46 AM

James Arendt

Have you considered taking a GUID, converting it to bytes and then base-32 encoding the bytes as a string? Base-32 would result in a shorter string than the hex-based GUID format while at the same time only including characters that are human-readable. Another note about Base-32, at least in variants such as Crockford's, is that it omits characters that could be confused with other characters when reading, for example I, which looks like 1. It also excludes characters (e.g. U) that could likely create obscene, albeit English, words.
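A minimal sketch of that idea, assuming Crockford's Base-32 alphabet (which leaves out I, L, O and U). The class name `GuidBase32` is made up for illustration, and the zero-padding of the final group is one possible choice, not a standard one. It works on small groups of bits, so no big-number arithmetic is needed and the output is deterministic for a given GUID.

```csharp
using System;
using System.Text;

public static class GuidBase32
{
    // Crockford's Base-32 alphabet: digits plus letters, without I, L, O and U.
    private const string Alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    public static string Encode(Guid guid)
    {
        byte[] bytes = guid.ToByteArray(); // 16 bytes = 128 bits
        StringBuilder result = new StringBuilder(26);
        int buffer = 0;   // bits read but not yet emitted
        int bitCount = 0; // number of valid bits in buffer

        foreach (byte b in bytes)
        {
            buffer = (buffer << 8) | b;
            bitCount += 8;
            while (bitCount >= 5)
            {
                // Emit the top 5 pending bits as one character.
                bitCount -= 5;
                result.Append(Alphabet[(buffer >> bitCount) & 31]);
            }
            buffer &= (1 << bitCount) - 1; // keep only the leftover bits
        }

        if (bitCount > 0)
        {
            // Pad the final group with zero bits on the right.
            result.Append(Alphabet[(buffer << (5 - bitCount)) & 31]);
        }
        return result.ToString();
    }
}
```

A 128-bit GUID needs 26 characters this way, compared with 36 for the usual hyphenated hex form.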

16 Aug 2010
12:34 PM

Chris Marisic

@James are you aware of a clean guid -> base 32 encoding algorithm? I haven't ever been able to find one.

I found an encoding-conversion project on CodeProject that lets you specify basically any type of encoding, but it suffers from arithmetic overflows with GUIDs. I tried manually splitting the numbers, but then the results of the project seemed to be non-deterministic, which is obviously not acceptable for identifiers.

@Harry that's an interesting approach, and it should be fairly straightforward to program, since you would be implementing exactly the hilo algorithm except that for the hi key you concat words together. It might even be better to split it into 3 keys, word - hi - lo, so you don't lose a set of words on each hi creation. Instead, you keep a table of all current word combinations and their hi values, and any time you generate a new unique set of words, its hi value starts at 0 or 1.

16 Aug 2010
12:54 PM

Mike

Generate a GUID and then generate a message digest (hash) from the GUID.

16 Aug 2010
13:04 PM

Fero

Try this one, can generate nice ids. Found on codeproject.

using System.Security.Cryptography;
using System.Text;

private string RNGCharacterMask()
{
    const int size = 8;
    char[] chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".ToCharArray();

    // Fill the buffer with cryptographically strong random bytes (values 1-255).
    byte[] data = new byte[size];
    RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
    crypto.GetNonZeroBytes(data);

    StringBuilder result = new StringBuilder(size);
    foreach (byte b in data)
    {
        // Use the full alphabet length so the last character ('0') can appear too.
        // 255 byte values do not divide evenly by 62, so the result is slightly uneven.
        result.Append(chars[b % chars.Length]);
    }
    return result.ToString();
}

16 Aug 2010
14:04 PM

Paul Hatcher

One thing to be careful of is that the generator can produce embarrassing/obscene words, e.g. your nice new corporate client gets assigned an id of fart123 or worse :-)

Depends on how many you are generating (client was producing >10m values), but you can get around this by dropping vowels and a few more characters.

16 Aug 2010
14:19 PM

tobi

"Jason,

Guids as a security measure is another application of security through obscurity.

It is pretty easy to sniff them on the network, after all."

This goes for many password transmission systems. Whether a network sniffing attack is feasible is the important question (and most of the time it isn't).

SSL does not prevent clients from faking their id. You could go with 6-8 random base64 chars instead of a guid. Those would have 30-40 bits of security.

16 Aug 2010
15:30 PM

Jeremy

@Chris - check out the tpz-base-32 project on github.

I haven't had a chance to really package it up nicely so that others could use it easily but it is there and basically BSD licensed. It uses the z-base-32 encoding which uses a really nice set of characters but which is somewhat under-specified, so check my readme to see how my implementations interpret the spec.

There are two implementations in the project at present, one I call the reference implementation that can handle all sorts of data types, even full-on never-ending streams of bytes. The downside is that it isn't the fastest version. The fastest version handles just 32 bit unsigned integers (unless I have forgotten my own code :) but is ripe for extension with other types since it is super easy to automate tests against the reference implementation. Both implementations have good test coverage.

The integer implementation has also been ported to JavaScript, and that is on github too. This week, I'll be adding a Ruby port.

16 Aug 2010
16:35 PM

tobi

Fero, in your implementation some characters are more likely to appear than others.

I miscalculated the amount of entropy: for every base64 char there are 6 bits of entropy so it is 36-48 for 6 to 8 chars. Humans can remember 5-9 chars in one go for a short time without any training.
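The uneven distribution is easy to check by counting. The sketch below is only an illustration (the class name `ModuloBias` is made up): it counts how many of the byte values 1 to 255, which is what `GetNonZeroBytes` can return, land on each alphabet index when reduced with `%`.

```csharp
using System;
using System.Linq;

public static class ModuloBias
{
    // Counts how many of the byte values 1..255 map to each alphabet index
    // when reduced modulo the alphabet size.
    public static int[] CountPerIndex(int alphabetSize)
    {
        int[] counts = new int[alphabetSize];
        for (int b = 1; b <= 255; b++)
        {
            counts[b % alphabetSize]++;
        }
        return counts;
    }

    // Bits of entropy per character if every character were equally likely.
    public static double BitsPerChar(int alphabetSize)
    {
        return Math.Log(alphabetSize, 2);
    }
}
```

For a 62-character alphabet, `CountPerIndex(62)` shows that indexes 1 to 7 are each hit by 5 byte values while every other index is hit by 4, so those seven characters come up about 25% more often. `BitsPerChar(64)` gives the 6 bits per base64 character used above; a 62-character alphabet gives slightly less.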

16 Aug 2010
17:16 PM

Ryan Heath

@ spot the joke

Do you mean Ag equals Silver? I do not know what Lumin means.

// Ryan

16 Aug 2010
17:22 PM

Ayende Rahien

Ryan,

You got one :-)

16 Aug 2010
17:53 PM

Tuna Toksoz

The other one seems like Lumin=>luminescence which means light :)

17 Aug 2010
06:10 AM

idursun

@Paul, exactly!

thedailywtf.com/.../...omated-Curse-Generator.aspx

17 Aug 2010
18:30 PM

Jeff

Why not take the JMS route and make the client specify the name? JMS uses client name + consumer name to identify clients. Who cares what it is as long as the combination is unique.

I don't really understand the security concerns mentioned above if your goal is simply to uniquely identify clients....security is another concern entirely.

17 Aug 2010
22:37 PM

Steve Py

It depends on the intended use. It seems that in a document system the client ID would be a meaningful key, in that it's something that might be presented on screen or printed on a form and used to pull up an individual. Guids are definitely a bad fit for that.

In a distributed environment you're not going to get a 100% reliable unique identifier out of a central store unless you lock the store, query+increment an ID and unlock it. The question is, if you want something like a 6-digit number, how to reliably generate it without blocking the server?

Oren Eini

CEO of RavenDB