JSON Packing, Text Based Formats and other stuff that come to mind at 5 AM

time to read 92 min | 18267 words

This post was written at 5:30AM, I run into this while doing research for another post, and I couldn’t really let it go.

XML as a text base format is really wasteful in space. But that wasn’t what really made it lose its shine. That was when it became so complex that it stopped being human readable. For example, I give you:

   1: <?xml version="1.0" encoding="UTF-8" ?>
   2:  <SOAP-ENV:Envelope
   3:   xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
   4:   xmlns:xsd="http://www.w3.org/1999/XMLSchema"
   5:   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
   6:    <SOAP-ENV:Body>
   7:        <ns1:getEmployeeDetailsResponse
   8:         xmlns:ns1="urn:MySoapServices"
   9:         SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  10:            <return xsi:type="ns1:EmployeeContactDetail">
  11:                <employeeName xsi:type="xsd:string">Bill Posters</employeeName>
  12:                <phoneNumber xsi:type="xsd:string">+1-212-7370194</phoneNumber>
  13:                <tempPhoneNumber
  14:                 xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/"
  15:                 xsi:type="ns2:Array"
  16:                 ns2:arrayType="ns1:TemporaryPhoneNumber[3]">
  17:                    <item xsi:type="ns1:TemporaryPhoneNumber">
  18:                        <startDate xsi:type="xsd:int">37060</startDate>
  19:                        <endDate xsi:type="xsd:int">37064</endDate>
  20:                        <phoneNumber xsi:type="xsd:string">+1-515-2887505</phoneNumber>
  21:                    </item>
  22:                    <item xsi:type="ns1:TemporaryPhoneNumber">
  23:                        <startDate xsi:type="xsd:int">37074</startDate>
  24:                        <endDate xsi:type="xsd:int">37078</endDate>
  25:                        <phoneNumber xsi:type="xsd:string">+1-516-2890033</phoneNumber>
  26:                    </item>
  27:                    <item xsi:type="ns1:TemporaryPhoneNumber">
  28:                        <startDate xsi:type="xsd:int">37088</startDate>
  29:                        <endDate xsi:type="xsd:int">37092</endDate>
  30:                        <phoneNumber xsi:type="xsd:string">+1-212-7376609</phoneNumber>
  31:                    </item>
  32:                </tempPhoneNumber>
  33:            </return>
  34:        </ns1:getEmployeeDetailsResponse>
  35:    </SOAP-ENV:Body>
  36: /SOAP-ENV:Envelope>

After XML was thrown out of the company of respectable folks, we had JSON show up and entertain us. It is smaller and more concise than XML, and so far has resisted the efforts to make it into some sort of a uber complex enterprisiey tool.

But today I run into quite a few effort to do strange things to JSON. I am talking about things like JSON DB (a compressed json format, not actual json database), JSONH, json.hpack, and friends. All of those attempt to reduce the size of JSON documents.

Let us take an example. the following is a JSON document representing one of RavenDB builds:

   1: {
   2:   "BuildName": "RavenDB Unstable v2.5",
   3:   "IsUnstable": true,
   4:   "Version": "2509-Unstable",
   5:   "PublishedAt": "2013-02-26T12:06:12.0000000",
   6:   "DownloadsIds": [],
   7:   "Changes": [
   8:     {
   9:       "Commiter": {
  10:         "Email": "david@davidwalker.org",
  11:         "Name": "David Walker"
  12:       },
  13:       "Version": "17c661cb158d5e3c528fe2c02a3346305f0234a3",
  14:       "Href": "/app/rest/changes/id:21039",
  15:       "TeamCityId": 21039,
  16:       "Username": "david walker",
  17:       "Comment": "Do not save Has-Api-Key header to metadata\n",
  18:       "Date": "2013-02-20T23:22:43.0000000",
  19:       "Files": [
  20:         "Raven.Abstractions/Extensions/MetadataExtensions.cs"
  21:       ]
  22:     },
  23:     {
  24:       "Commiter": {
  25:         "Email": "david@davidwalker.org",
  26:         "Name": "David Walker"
  27:       },
  28:       "Version": "5ffb4d61ad9102696948f6678bbecac88e1dc039",
  29:       "Href": "/app/rest/changes/id:21040",
  30:       "TeamCityId": 21040,
  31:       "Username": "david walker",
  32:       "Comment": "Do not save IIS Application Request Routing headers to metadata\n",
  33:       "Date": "2013-02-20T23:23:59.0000000",
  34:       "Files": [
  35:         "Raven.Abstractions/Extensions/MetadataExtensions.cs"
  36:       ]
  37:     },
  38:     {
  39:       "Commiter": {
  40:         "Email": "ayende@ayende.com",
  41:         "Name": "Ayende Rahien"
  42:       },
  43:       "Version": "5919521286735f50f963824a12bf121cd1df4367",
  44:       "Href": "/app/rest/changes/id:21035",
  45:       "TeamCityId": 21035,
  46:       "Username": "ayende rahien",
  47:       "Comment": "Better disposal\n",
  48:       "Date": "2013-02-26T10:16:45.0000000",
  49:       "Files": [
  50:         "Raven.Client.WinRT/MissingFromWinRT/ThreadSleep.cs"
  51:       ]
  52:     },
  53:     {
  54:       "Commiter": {
  55:         "Email": "ayende@ayende.com",
  56:         "Name": "Ayende Rahien"
  57:       },
  58:       "Version": "c93264e2a94e2aa326e7308ab3909aa4077bc3bb",
  59:       "Href": "/app/rest/changes/id:21036",
  60:       "TeamCityId": 21036,
  61:       "Username": "ayende rahien",
  62:       "Comment": "Will ensure that the value is always positive or zero (never negative).\nWhen using numeric calc, will div by 1,024 to get more concentration into buckets.\n",
  63:       "Date": "2013-02-26T10:17:23.0000000",
  64:       "Files": [
  65:         "Raven.Database/Indexing/IndexingUtil.cs"
  66:       ]
  67:     },
  68:     {
  69:       "Commiter": {
  70:         "Email": "ayende@ayende.com",
  71:         "Name": "Ayende Rahien"
  72:       },
  73:       "Version": "7bf51345d39c3993fed5a82eacad6e74b9201601",
  74:       "Href": "/app/rest/changes/id:21037",
  75:       "TeamCityId": 21037,
  76:       "Username": "ayende rahien",
  77:       "Comment": "Fixing a bug where we wouldn't decrement reduce stats for an index when multiple values from the same bucket are removed\n",
  78:       "Date": "2013-02-26T10:53:01.0000000",
  79:       "Files": [
  80:         "Raven.Database/Indexing/MapReduceIndex.cs",
  81:         "Raven.Database/Storage/Esent/StorageActions/MappedResults.cs",
  82:         "Raven.Database/Storage/IMappedResultsStorageAction.cs",
  83:         "Raven.Database/Storage/Managed/MappedResultsStorageAction.cs",
  84:         "Raven.Tests/Issues/RavenDB_784.cs",
  85:         "Raven.Tests/Storage/MappedResults.cs",
  86:         "Raven.Tests/Views/ViewStorage.cs"
  87:       ]
  88:     },
  89:     {
  90:       "Commiter": {
  91:         "Email": "ayende@ayende.com",
  92:         "Name": "Ayende Rahien"
  93:       },
  94:       "Version": "ff2c5b43eba2a8a2206152658b5e76706e12945c",
  95:       "Href": "/app/rest/changes/id:21038",
  96:       "TeamCityId": 21038,
  97:       "Username": "ayende rahien",
  98:       "Comment": "No need for so many repeats\n",
  99:       "Date": "2013-02-26T11:27:49.0000000",
 100:       "Files": [
 101:         "Raven.Tests/Bugs/MultiOutputReduce.cs"
 102:       ]
 103:     },
 104:     {
 105:       "Commiter": {
 106:         "Email": "ayende@ayende.com",
 107:         "Name": "Ayende Rahien"
 108:       },
 109:       "Version": "0620c74e51839972554fab3fa9898d7633cfea6e",
 110:       "Href": "/app/rest/changes/id:21041",
 111:       "TeamCityId": 21041,
 112:       "Username": "ayende rahien",
 113:       "Comment": "Merge branch 'master' of https://github.com/cloudbirdnet/ravendb into 2.1\n",
 114:       "Date": "2013-02-26T11:41:39.0000000",
 115:       "Files": [
 116:         "Raven.Abstractions/Extensions/MetadataExtensions.cs"
 117:       ]
 118:     }
 119:   ],
 120:   "ResolvedIssues": [],
 121:   "Contributors": [
 122:     {
 123:       "FullName": "Ayende Rahien",
 124:       "Email": "ayende@ayende.com",
 125:       "EmailHash": "730a9f9186e14b8da5a4e453aca2adfe"
 126:     },
 127:     {
 128:       "FullName": "David Walker",
 129:       "Email": "david@davidwalker.org",
 130:       "EmailHash": "4e5293ab04bc1a4fdd62bd06e2f32871"
 131:     }
 132:   ],
 133:   "BuildTypeId": "bt8",
 134:   "Href": "/app/rest/builds/id:588",
 135:   "ProjectName": "RavenDB",
 136:   "TeamCityId": 588,
 137:   "ProjectId": "project3",
 138:   "Number": 2509
 139: }

This document is 4.52KB in size. Running this through JSONH gives us the following:

   1: [
   2:     14,
   3:     "BuildName",
   4:     "IsUnstable",
   5:     "Version",
   6:     "PublishedAt",
   7:     "DownloadsIds",
   8:     "Changes",
   9:     "ResolvedIssues",
  10:     "Contributors",
  11:     "BuildTypeId",
  12:     "Href",
  13:     "ProjectName",
  14:     "TeamCityId",
  15:     "ProjectId",
  16:     "Number",
  17:     "RavenDB Unstable v2.5",
  18:     true,
  19:     "2509-Unstable",
  20:     "2013-02-26T12:06:12.0000000",
  21:     [
  22:     ],
  23:     [
  24:         {
  25:             "Commiter": {
  26:                 "Email": "david@davidwalker.org",
  27:                 "Name": "David Walker"
  28:             },
  29:             "Version": "17c661cb158d5e3c528fe2c02a3346305f0234a3",
  30:             "Href": "/app/rest/changes/id:21039",
  31:             "TeamCityId": 21039,
  32:             "Username": "david walker",
  33:             "Comment": "Do not save Has-Api-Key header to metadata\n",
  34:             "Date": "2013-02-20T23:22:43.0000000",
  35:             "Files": [
  36:                 "Raven.Abstractions/Extensions/MetadataExtensions.cs"
  37:             ]
  38:         },
  39:         {
  40:             "Commiter": {
  41:                 "Email": "david@davidwalker.org",
  42:                 "Name": "David Walker"
  43:             },
  44:             "Version": "5ffb4d61ad9102696948f6678bbecac88e1dc039",
  45:             "Href": "/app/rest/changes/id:21040",
  46:             "TeamCityId": 21040,
  47:             "Username": "david walker",
  48:             "Comment": "Do not save IIS Application Request Routing headers to metadata\n",
  49:             "Date": "2013-02-20T23:23:59.0000000",
  50:             "Files": [
  51:                 "Raven.Abstractions/Extensions/MetadataExtensions.cs"
  52:             ]
  53:         },
  54:         {
  55:             "Commiter": {
  56:                 "Email": "ayende@ayende.com",
  57:                 "Name": "Ayende Rahien"
  58:             },
  59:             "Version": "5919521286735f50f963824a12bf121cd1df4367",
  60:             "Href": "/app/rest/changes/id:21035",
  61:             "TeamCityId": 21035,
  62:             "Username": "ayende rahien",
  63:             "Comment": "Better disposal\n",
  64:             "Date": "2013-02-26T10:16:45.0000000",
  65:             "Files": [
  66:                 "Raven.Client.WinRT/MissingFromWinRT/ThreadSleep.cs"
  67:             ]
  68:         },
  69:         {
  70:             "Commiter": {
  71:                 "Email": "ayende@ayende.com",
  72:                 "Name": "Ayende Rahien"
  73:             },
  74:             "Version": "c93264e2a94e2aa326e7308ab3909aa4077bc3bb",
  75:             "Href": "/app/rest/changes/id:21036",
  76:             "TeamCityId": "...bug where we wouldn't decrement reduce stats for an index when multiple values from the same bucket are removed\n",
  77:             "Date": "2013-02-26T10:53:01.0000000",
  78:             "Files": [
  79:                 "Raven.Database/Indexing/MapReduceIndex.cs",
  80:                 "Raven.Database/Storage/Esent/StorageActions/MappedResults.cs",
  81:                 "Raven.Database/Storage/IMappedResultsStorageAction.cs",
  82:                 "Raven.Database/Storage/Managed/MappedResultsStorageAction.cs",
  83:                 "Raven.Tests/Issues/RavenDB_784.cs",
  84:                 "Raven.Tests/Storage/MappedResults.cs",
  85:                 "Raven.Tests/Views/ViewStorage.cs"
  86:             ]
  87:         },
  88:         {
  89:             "Commiter": {
  90:                 "Email": "ayende@ayende.com",
  91:                 "Name": "Ayende Rahien"
  92:             },
  93:             "Version": "ff2c5b43eba2a8a2206152658b5e76706e12945c",
  94:             "Href": "/app/rest/changes/id:21038",
  95:             "TeamCityId": 21038,
  96:             "Username": "ayende rahien",
  97:             "Comment": "No need for so many repeats\n",
  98:             "Date": "2013-02-26T11:27:49.0000000",
  99:             "Files": [
 100:                 "Raven.Tests/Bugs/MultiOutputReduce.cs"
 101:             ]
 102:         },
 103:         {
 104:             "Commiter": {
 105:                 "Email": "ayende@ayende.com",
 106:                 "Name": "Ayende Rahien"
 107:             },
 108:             "Version": "0620c74e51839972554fab3fa9898d7633cfea6e",
 109:             "Href": "/app/rest/changes/id:21041",
 110:             "TeamCityId": 21041,
 111:             "Username": "ayende rahien",
 112:             "Comment": "Merge branch 'master' of https://github.com/cloudbirdnet/ravendb into 2.1\n",
 113:             "Date": "2013-02-26T11:41:39.0000000",
 114:             "Files": [
 115:                 "Raven.Abstractions/Extensions/MetadataExtensions.cs"
 116:             ]
 117:         }
 118:     ],
 119:     [
 120:     ],
 121:     [
 122:         {
 123:             "FullName": "Ayende Rahien",
 124:             "Email": "ayende@ayende.com",
 125:             "EmailHash": "730a9f9186e14b8da5a4e453aca2adfe"
 126:         },
 127:         {
 128:             "FullName": "David Walker",
 129:             "Email": "david@davidwalker.org",
 130:             "EmailHash": "4e5293ab04bc1a4fdd62bd06e2f32871"
 131:         }
 132:     ],
 133:     "bt8",
 134:     "/app/rest/builds/id:588",
 135:     "RavenDB",
 136:     588,
 137:     "project3",
 138:     2509
 139: ]

It reduced the document size to 2.93KB! Awesome, nearly half of the size was gone. Except: This is actually generating utterly unreadable mess. I mean, can you look at this and figure out what the hell is going on.

I thought not. At this point, we might as well use a binary format. I happen to have a zip tool at my disposal, so I checked what would happen if I threw this through that. The end result was a file that was 1.42KB. And I had no more loss of readability than I have with the JSONH stuff.

To be frank, I just don’t get efforts like this. JSON is a text base human readable format. If you lose the human readable portion of the format, you might as well drop directly to binary. It is likely to be more efficient and you don’t lose anything by it.

And if you want to compress your data, it is probably better to use something like a compression tool. HTTP Compression, for example, is practically free, since all servers and clients should be able to consume it now. And any tool that you use should be able to inspect through it. And it is likely to generate much better results on your JSON documents than if you will try a clever format like this.