Author: Ma, Guolai
Story
JSON data comprises a large majority of the content sent around the internet, especially for social networking sites and HTML5 games. Suppose someone runs a search (such as "search for #museum" on Twitter): the response contains a large collection of dictionary objects, each one listing the same property names alongside its values, which makes the response much larger than necessary. Looking at most data of this kind, it's apparent there is a structural regularity that we can take advantage of for compression.
JSON
What's JSON
At the beginning of the 21st century, Douglas Crockford was looking for a simple data exchange format between client and server. XML was the popular data exchange language at the time, but Douglas thought it was too complex to generate and parse. In the end, he proposed a new one: JSON.
JSON is really simple: it has a self-documenting format and is much shorter because there is no markup overhead. That is why JSON is considered a fat-free alternative to XML.
JSON is built on two structures:
· A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
· An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
Object:
An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).
Array:
An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).
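For example, the two structures nest freely. A minimal sketch using the standard `JSON` API available in browsers and Node.js:

```javascript
// An object ({...}) whose "members" property is an ordered array ([...])
// of further objects — the two JSON structures composed together.
const text =
  '{"club":"chess","members":[{"name":"Ann","age":31},{"name":"Bob","age":27}]}';

const doc = JSON.parse(text);     // text -> in-memory structure
console.log(doc.members[1].name); // → "Bob"
```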
Advantage
JSON used to have an advantage in that it could be parsed directly by a JavaScript engine, but even that advantage is gone because of security and interoperability concerns. About the only things JSON has going for it are that it is usually more compact than the alternative, XML, and that it is well supported by many web programming languages.
Problem
Though it is one of the most used data interchange formats, there is still room for improvement.
For one, it converts everything to text. The value 3.141592653589793 takes only 8 bytes of memory, but JSON.stringify() expands it to 17.
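This is easy to verify in any JavaScript runtime:

```javascript
// A double occupies 8 bytes in memory, but serializing it as text
// expands it to 17 characters (17 bytes in UTF-8).
const s = JSON.stringify(3.141592653589793);
console.log(s);        // "3.141592653589793"
console.log(s.length); // 17
```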
A second problem is its excessive use of quotes, which add two bytes to every string.
Thirdly, it has no standard way to use a schema. When multiple objects are serialized in the same message, the key names for each property must be repeated, even though they are the same for every object.
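The repetition is easy to see when serializing an array of similar objects:

```javascript
// Serializing an array of same-shaped objects repeats every key name
// once per object — pure overhead when the schema is fixed.
const rows = [
  { id: 1, name: "a" },
  { id: 2, name: "b" },
  { id: 3, name: "c" },
];
const json = JSON.stringify(rows);
// "id" and "name" each appear three times in the output:
console.log((json.match(/"id"/g) || []).length); // 3
```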
In today's technologically inclined world, the amount of data shared via the internet grows day by day, and more compact, reliable encodings are needed so that we can transfer data faster while using less bandwidth.
Compression of JSON data is useful when large data structures must be transmitted from the web browser to the server. Certainly, compression costs extra effort. However, given the current trend of ever-faster processors and the surge of mobile devices where bandwidth is the limiting factor, CJSON should prove to be a better technique than plain JSON.
In any case, if your application makes heavy use of JSON, then you should consider what it's costing you, and figure out how to minimize that cost.
Compression Algorithms
Here you'll find an analysis of several JSON compression algorithms, and a conclusion about whether JSON compression is useful and when it should be used.
CJSON
CJSON compresses JSON with automatic type extraction. It tackles the most pressing problem: the need to constantly repeat key names over and over.
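For instance, a homogeneous array can be reduced to a shared template of key names plus rows of values. The sketch below illustrates that idea with a hypothetical `packHomogeneous` helper; CJSON's actual envelope format differs in its details.

```javascript
// Illustrative sketch of CJSON-style template extraction (not the exact
// CJSON wire format): key names are stored once in a template, and each
// object collapses to a row of values.
const input = [
  { x: 100, y: 100, width: 200, height: 150 },
  { x: 0,   y: 0,   width: 10,  height: 10  },
];

function packHomogeneous(objects) {
  const template = Object.keys(objects[0]);              // shared key set
  const values = objects.map(o => template.map(k => o[k]));
  return { t: template, v: values };                     // hypothetical envelope
}

const packed = packHomogeneous(input);
// { t: ["x","y","width","height"], v: [[100,100,200,150],[0,0,10,10]] }
console.log(JSON.stringify(packed).length < JSON.stringify(input).length); // true
```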
More details about the CJSON algorithm can be found here.
HPACK
HPack is a lossless, cross-language, performance-focused data set compressor. It can reduce the number of characters used to represent a generic homogeneous collection by up to 70%. The algorithm provides several levels of compression (from 0 to 4). Level 0 performs the most basic compression, removing the keys (property names) from the structure and placing them in a header at index 0. Higher levels reduce the size of the JSON even further by assuming that there are duplicated entries.
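For instance, applying the level-0 idea to a small homogeneous collection (an illustrative sketch of the header-at-index-0 layout described above; the real json.hpack output may differ in detail):

```javascript
// Level-0 sketch: header of property names at index 0, then one row of
// values per object.
const input = [
  { name: "Andrea", age: 31 },
  { name: "Eva",    age: 27 },
];

const header = Object.keys(input[0]);
const packed = [header, ...input.map(o => header.map(k => o[k]))];
// packed: [["name","age"],["Andrea",31],["Eva",27]]

// Unpacking restores the original objects:
const [keys, ...valueRows] = packed;
const restored = valueRows.map(r =>
  Object.fromEntries(r.map((v, i) => [keys[i], v]))
);
console.log(JSON.stringify(restored) === JSON.stringify(input)); // true
```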
More details about the hpack algorithm can be found here.
JSONPACK
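As far as I know, jsonpack works by building a dictionary of the distinct keys and values and encoding the structure as references into it. The sketch below illustrates that dictionary idea with hypothetical helpers (`ref`, `body`); jsonpack's actual output is a compact string format, not this object.

```javascript
// Dictionary-encoding sketch: each distinct key or value is stored once
// in `dict`, and the structure is rewritten as index references.
const input = [
  { city: "Paris", country: "France" },
  { city: "Lyon",  country: "France" },
];

const dict = [];
const ref = (v) => {                 // return index of v, adding it if new
  let i = dict.indexOf(v);
  if (i === -1) { i = dict.length; dict.push(v); }
  return i;
};

const body = input.map(o =>
  Object.entries(o).flatMap(([k, v]) => [ref(k), ref(v)])
);
console.log(dict); // ["city","Paris","country","France","Lyon"]
console.log(body); // [[0,1,2,3],[0,4,2,3]]
```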
More details about the jsonpack algorithm can be found here.
Experiment
Experiment environment:
· CPU - Intel Core i5
· OS - Windows 7 (64-bit)
· Memory - 4GB
Experiment algorithms:
· CJSON
· Hpack
· JSONPack
· Gzipped
· Gzipped and CJSON
· Gzipped and HPack
· Gzipped and JSONPack
Experiment data:
· 5 files exported from json-generator, varying from 3K to 5MB
· 1 file exported from a real webapp
Experiment result:
· compression ratio
· time cost
Table 1: Compressed size in bytes (smaller is better)

| Original JSON size | 3K | 50K | 1M | 2M | 5M | webapp |
|---|---|---|---|---|---|---|
| CJSON | 2517 | 45865 | 1106816 | 2018653 | 4760795 | 142224 |
| HPack | 2176 | 41403 | 1028647 | 1877062 | 3912005 | failed |
| JSONPack | 2517 | 47091 | 1129400 | 2060855 | 5037766 | 142113 |
| Gzipped | 1336 | 15459 | 290125 | 527030 | 1486170 | 51145 |
| Gzipped and CJSON | 1333 | 15116 | 286833 | 521021 | 1430745 | 51025 |
| Gzipped and HPack | 1132 | 12412 | 208088 | 376125 | 632593 | failed |
| Gzipped and JSONPack | 1450 | 17957 | 321494 | 586758 | 1765638 | 52451 |
Table 2: Time cost, pack + unpack (ms)

| Original JSON size | 3K | 50K | 1M | 2M | 5M | webapp |
|---|---|---|---|---|---|---|
| CJSON | 5 | 15 | 82 | 181 | 767 | 29 |
| HPack | 4 | 2 | 7 | 11 | 24 | failed |
| JSONPack | 18 | 77 | 1535 | 4251 | 89220 | 34 |
| Gzipped | 24 | 54 | 1023 | 1840 | 4905 | 352 |
| Gzipped and CJSON | 35 | 100 | 1081 | 1860 | 5257 | 339 |
| Gzipped and HPack | 45 | 50 | 839 | 2106 | 4826 | failed |
| Gzipped and JSONPack | 76 | 151 | 2527 | 6535 | 94362 | 350 |
Analysis and Conclusion
JSON compression algorithms considerably reduce JSON file size. HPack appears much more efficient than CJSON and JSONPack, and is also significantly faster. However, HPack only works when all dictionary objects in the array have exactly the same set of property names, which limits its use because JSON structures on the internet are often more complex than that.
Beyond doubt, gzip is the default choice of most developers, and it doesn't disappoint: when gzipping the content is allowed, it is more efficient than any of the other compression algorithms alone. Its one disadvantage is time cost, since gzip's extra processing increases the burden on both server and client.
The CJSON compression algorithm looks nice; however, when you gzip the minified and the non-minified versions, the file sizes differ by only about 5%. This is because the Huffman encoding in gzip deals quite well with repeated text structures. Still, applying CJSON before gzip can reduce gzip's workload and overall time.
Objectively speaking, there is no single best compression algorithm for all cases, only the most appropriate one for a specific domain. When all dictionary objects in the JSON array have exactly the same set of property names (which, I think, covers most requirements), HPack is the best choice. When the structure is complex but the CPU is powerful, "Gzipped + CJSON" does better than the others.
I've implemented my own JSON compression code based on the CJSON algorithm. Here is the source code.
* The copyright and/or intellectual property of this article belongs to eBay Inc. To quote it, please contact us at DL-eBay-CCOE-Tech@ebay.com. This article is intended for academic discussion and exchange. If you believe any information in it infringes your legal rights, please contact us at DL-eBay-CCOE-Tech@ebay.com, listing in your notice the information required by national laws and regulations, and we will take measures as soon as possible in accordance with the law after receiving your notice.