JSON Compression

Author:  Ma, Guolai

Story

JSON data comprises a large majority of content sent around the internet, especially for social networking sites and HTML5 games. One day, someone wants to search something (such as "search for #museum" on Twitter). Where there are large collections of dictionary objects, each one listing the property name and value. It makes the size of return value much larger. Looking at most of these types of data, it`s apparent there`s a structure layout that we can take advantage of compression.

JSON

What`s JSON

At the beginning of the 21st century, Douglas Crockford tried to find a simple data exchange format between client and server. XML is the popular data exchange language at that time, but Douglas thought it is too complex to generate and parse XML. At last, he proposed a new one - JSON.

JSON is really simple, it has a self-documenting format, it is much shorter because there is no data configuration overhead. That is why JSON is considered a fat-free alternative to XML.

JSON is built on two structures:

·         A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.

·         An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

Object:

An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).

Array:

An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).

Advantage

JSON used to have an advantage because it could be directly parsed by a JavaScript engine, but even that advantage is gone because of security and interoperability concerns. About the only thing JSON going for it is that it is usually more compact than the alternative, XML, and it is well supported by many web programming languages.

Problem

Though it is one of the most used data interchanged format, there is still room for improvement.

For one, it converts everything to text. The value 3.141592653589793 takes only 8 bytes of memory, but JSON.stringify() expands it to 17.

A second problem is its excessive use of quotes, which add two bytes to every string.

Thirdly, it has no standard format for using a schema. When multiple objects are serialized in the same message, the key names for each property must be repeated, even though they are the same for each object.

In today`s technologically inclined world sharing of data via internet is increasing day by day and need to share more compressed and reliable form of data is the need so that we can share data at much faster rate and also use low bandwidth.

Compression of JSON data is useful when large data structures must be transmitted from the web browser to the server. Certainly, compression will cost extra effort. However, looking at the current trend where the processors are getting faster and faster and with the surge of mobile devices where bandwidth is the limitation, CJSON should prove to be better technique than JSON.

In any case, if your application makes heavy use of JSON, then you should consider what it`s costing you, and figure out how to minimize that cost.

Compression Algorithms

         Here you'll find an analysis of several JSON compression algorithms and a conclusion whether JSON compression is useful and when it should be used.

CJSON

CSJON compress the JSON with automatic type extraction. It tackles the most pressing problem: the need to constantly repeat key names over and over. Using this compression algorithm, the following JSON:

Can be compressed as:

More details about CJSON algorithm can be found here.

HPACK

HPack is a lossless, cross language, performances focused, data set compressor. It is able to reduce up to 70% number of characters used to represent a generic homogeneous collection. This algorithm provides several level of compression (from 0 to 4). The level 0 compression performs the most basic compression by removing keys (property names) from the structure creating a header on index 0 with each property name. Next levels make it possible to reduce even more the size of the JSON by assuming that there are duplicated entries.

For the following JSON:

Can be compressed as:

More details about hpack algorithm can be found here.

JSONPACK

For the following JSON:

Can be compressed as:

More details about jsonpack algorithm can be found here.

Experiment

Experiment environment:

·         CPU - intel core i5

·         OS - Windows 7(64bit)

·         Memory - 4G

Experiment algorithms:

·         CJSON

·         Hpack

·         JSONPack

·         Gzipped

·         Gzipped and CJSON

·         Gzipped and HPack

·         Gzipped and JSONPack

Experiment data:

·         5 files export from json-generator varying from 3K to 5MB

·         1 file export from a real webapp

Experiment result:

·         compression ratio

·         time cost

Table 1 compression ratio

 Original JSON size

3k

50k

1M

2M

5M

webapp

CJSON

2517

45865

1106816

2018653

4760795

142224

HPack

2176

41403

1028647

1877062

3912005

failed

JSONPack

2517

47091

1129400

2060855

5037766

142113

Gzipped

1336

15459

290125

527030

1486170

51145

Gzipped and CJSON

1333

15116

286833

521021

1430745

51025

Gzipped and HPack

1132

12412

208088

376125

632593

failed

Gzipped and JSONPack

1450

17957

321494

586758

1765638

52451

Table 2 Time Cost (pack + unpack)(ms)

Original JSON size

3k

50k

1M

2M

5M

webapp

CJSON

5

15

82

181

767

29

HPack

4

2

7

11

24

failed

JSONPack

18

77

1535

4251

89220

34

Gzipped

24

54

1023

1840

4905

352

Gzipped and CJSON

35

100

1081

1860

5257

339

Gzipped and HPack

45

50

839

2106

4826

failed

Gzipped and JSONPack

76

151

2527

6535

94362

350

Analysis and Conclusion

JSON Compression algorithms considerably reduce json file size. HPack seems to be much more efficient than CJSON and JSONPack and also significantly faster. However, HPack only works if all dictionary objects in the array have exactly the same set of property names. It is restricted by the reason that json structure on internet is always complex.

Beyond doubt, gzip is the default choice of most developer, and it doesn`t disappoint us. When Gzip of content is allowed, it has a better efficiency than any other compression algorithm. The one and only disadvantage is cost of time. The reason is that Gzip adds extra processing on sorting key which will increases the burden for both server and client.

CJSON compression algorithm looks nice, however when you gzip the minified and the non-minified version, the file size only differ by about 5%. This is because the Huffman-Encoding in Gzip can deal quite well with repeated text structures. However, CJSON compression before Gzip could reduce the cost of sorting key and time.

Objectively speaking, it doesn`t exist the best compression algorithm for all. Only the most appropriate on specific domain. When all dictionary objects in the json array have exactly the same set of property names (I think it occupy most requirements), HPack is the best choice. When structure is complex but the CPU is powerful, “Gzipped + CJSON” is better than the others.

I`ve realized my json compression code based on cjson algorithm. Here is thesource code.

* 本文版权和/或知识产权归eBay Inc所有。如需引述,请和联系我们DL-eBay-CCOE-Tech@ebay.com。本文旨在进行学术探讨交流,如您认为某些信息侵犯您的合法权益,请联系我们DL-eBay-CCOE-Tech@ebay.com,并在通知中列明国家法律法规要求的必要信息,我们在收到您的通知后将根据国家法律法规尽快采取措施。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值