by harshulj on 4/6/15, 4:20 AM with 71 comments
by jmoiron on 4/6/15, 5:43 AM
However, after swapping a fairly large and json-intensive production spider over to ujson, we noticed a large increase in memory use.
When I investigated, I discovered that simplejson reused allocated string objects, so when parsing/loading you basically got string compression for repeated string keys.
The effects were pretty large for our dataset, which was all API results from various popular websites and featured lots of lists of things with repeating keys; on a lot of large documents, the loaded in-memory object was sometimes 100M for ujson and 50M for simplejson. We ended up switching back because of this.
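The key-sharing effect described above can be approximated with the standard library alone. This is a minimal sketch, assuming Python 3 and the stdlib `json` module rather than simplejson or ujson; `intern_keys` and `loads_interned` are hypothetical helper names, and any real memory savings depend on the data.

```python
import json
import sys


def intern_keys(pairs):
    """object_pairs_hook that interns every key so equal keys share one str object."""
    return {sys.intern(key): value for key, value in pairs}


def loads_interned(text):
    """Parse JSON text, reusing a single string object per distinct key."""
    return json.loads(text, object_pairs_hook=intern_keys)


# Two separate documents with the same keys, parsed independently.
first = loads_interned('{"id": 1, "name": "a"}')
second = loads_interned('{"id": 2, "name": "b"}')

# Map each key to the actual key object held by each dict.
first_keys = {k: k for k in first}
second_keys = {k: k for k in second}

# True when both documents hold the very same key objects, not just equal ones.
shared = all(first_keys[k] is second_keys[k] for k in first_keys)
```

With many documents that repeat the same keys, each distinct key is stored once instead of once per document, which is the kind of saving the comment attributes to simplejson.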
by borman on 4/6/15, 12:58 PM
cjson's way of handling unicode is just plain wrong: it uses utf-8 bytes as unicode code points. ujson cannot handle large numbers (somewhat larger than 2^63; I've seen a service that encodes unsigned 64-bit hash values in JSON this way, and ujson fails to parse its payloads). With simplejson (when using the speedups module), a string's type depends on its value, i.e. it decodes strings as 'str' type if their characters are ascii-only, but as 'unicode' otherwise; strangely enough, it always decodes strings as unicode (like the standard json module) when speedups are disabled.
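A quick way to probe a decoder for the two problems above is to feed it an unsigned 64-bit maximum and a non-ASCII string. This is a sketch only, assuming Python 3; `check_decoder` is a hypothetical helper, and it is run here against the stdlib `json.loads`, not against cjson or ujson.

```python
import json

UINT64_MAX = 2**64 - 1


def check_decoder(loads):
    """Return pass/fail results for a JSON `loads` callable."""
    results = {}
    # Integers above 2**63 - 1 do not fit in a signed 64-bit value.
    try:
        results["uint64"] = loads(str(UINT64_MAX)) == UINT64_MAX
    except (ValueError, OverflowError):
        results["uint64"] = False
    # A non-ASCII string should come back as the same code points.
    try:
        results["unicode"] = loads('"\u00e9"') == "\u00e9"
    except ValueError:
        results["unicode"] = False
    return results


stdlib_results = check_decoder(json.loads)

# What "UTF-8 bytes as code points" looks like: each byte becomes one character,
# so the single character U+00E9 turns into two characters.
mojibake = "\u00e9".encode("utf-8").decode("latin-1")
```

Passing another library's `loads` function to `check_decoder` gives a like-for-like comparison on the same two inputs.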
by Drdrdrq on 4/6/15, 6:32 AM
by jbergstroem on 4/6/15, 9:19 AM
Its syntax is nginx-like but can also parse strict json. It's pretty fast too.
More info here: https://github.com/vstakhov/libucl
by wodenokoto on 4/6/15, 6:01 AM
by chojeen on 4/6/15, 2:02 PM
by michaelmior on 4/6/15, 11:47 AM
So I can't serialize things with ultrajson that aren't serializable? I must be missing something in this statement.
> The verdict is pretty clear. Use simplejson instead of stock json in any case...
The verdict seems clear (based solely on the data in the post) that ultrajson is the winner.
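On the "serializable" question above, here is what the term usually means in practice. This is a minimal sketch with the stdlib `json` module on Python 3, not ujson; `encode_extra` is a hypothetical fallback function, and whether other libraries accept an equivalent hook is not something this example shows.

```python
import json

record = {"tags": {"python"}}  # a set is not a JSON type

# Without help, the stdlib encoder rejects the set.
try:
    json.dumps(record)
    raised = False
except TypeError:
    raised = True


def encode_extra(obj):
    """Fallback for types json does not know: sets become sorted lists."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    raise TypeError("not JSON serializable: %r" % type(obj))


# The `default` hook is called only for objects the encoder cannot handle itself.
encoded = json.dumps(record, default=encode_extra)
```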
by jroseattle on 4/6/15, 5:12 AM
Well-defined collections? As in, serializable? Well sure, that's requisite for the native json package as well as simplejson (as far as I can recall -- haven't used simplejson in some time.)
But does "texts" refer to strings? As in, only one data type? The source code certainly supports other types, so I wonder what this statement refers to.
by foota on 4/6/15, 5:08 AM
by jkire on 4/6/15, 8:53 AM
What about larger dictionaries? With such a small one I would be worried that a significant proportion of the time would be simple overhead.
[Warning: Anecdote] When we were testing out the various JSON libraries we found simplejson much faster than json for dumps. We used large dictionaries.
Was the simplejson package using its optimized C library?
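Both questions above (payload size and C acceleration) are easy to check locally. This is a rough sketch, assuming Python 3 and only the stdlib `json` module; `make_payload` and `time_dumps` are hypothetical helpers, the payload shape is made up, and the timings are machine-dependent.

```python
import json
import timeit

# True when CPython's C accelerator for the stdlib encoder is available.
HAS_C_ENCODER = json.encoder.c_make_encoder is not None


def make_payload(n_keys):
    """Build a dict with n_keys string keys and small nested values."""
    return {"key_%d" % i: {"id": i, "tags": ["a", "b"], "score": i * 0.5}
            for i in range(n_keys)}


def time_dumps(dumps, payload, repeat=3, number=20):
    """Best-of-`repeat` seconds per call of dumps(payload)."""
    timer = timeit.Timer(lambda: dumps(payload))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Per-call cost on a tiny dict is dominated by fixed overhead;
# the larger dict gives a better picture of encoding throughput.
small = time_dumps(json.dumps, make_payload(5))
large = time_dumps(json.dumps, make_payload(5000))
```

To compare libraries, pass another `dumps` callable (for example `simplejson.dumps`, if installed) to `time_dumps` with the same payloads.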
by ktzar on 4/6/15, 4:54 AM
by stared on 4/6/15, 9:46 AM
(BTW: I got tempted to try ujson exactly for the original blog post, i.e. http://blog.dataweave.in/post/87589606893/json-vs-simplejson...)
Plus, AFAIK, at least in Python 3 json IS simplejson (but a few versions older). So every comparison of these libraries is going to give different results over time (likely, with the difference getting smaller). Of course, simplejson is the newer version of the same thing, so it's likely to be better.
by willvarfar on 4/6/15, 8:45 AM
I leave this here in case it helps others.
We had other priorities, such as working well for both Python and Java.
At the time we went with msgpack. As msgpack is doing much the same work as json, it just shows that the magic is in the code, not the format.
by apu on 4/6/15, 7:11 AM
by dbenhur on 4/6/15, 5:34 AM
JSON is a data representation, not a data model.
by js2 on 4/6/15, 3:55 PM
by velox_io on 4/6/15, 12:12 PM
The speed difference between working with binary streams and parsing text is night and day.
by akoumjian on 4/6/15, 4:18 PM
It was a big disappointment after seeing these kinds of performance improvements.
by MagicWishMonkey on 4/6/15, 2:12 PM
by bpicolo on 4/6/15, 7:01 PM
by fijal on 4/6/15, 12:05 PM
by UUMMUU on 4/6/15, 12:37 PM
by aaronem on 4/6/15, 5:46 AM