nieksand 6 years ago

The fastest Go JSON parser I know of is https://github.com/buger/jsonparser. I've used that one in production quite successfully.
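
For anyone unfamiliar, a rough sketch of the usage style (the payload here is made up; GetString/GetInt are the library's actual helpers):

    package main

    import (
        "fmt"

        "github.com/buger/jsonparser"
    )

    func main() {
        data := []byte(`{"user": {"name": "alice", "age": 30}}`)

        // Pull values straight out of the raw bytes by key path;
        // no struct, no map, no reflection.
        name, err := jsonparser.GetString(data, "user", "name")
        if err != nil {
            panic(err)
        }
        age, _ := jsonparser.GetInt(data, "user", "age")
        fmt.Println(name, age)
    }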

I don't see that in the comparative benchmarks for Gojay.

  • latch 6 years ago

    > I don't see that in the comparative benchmarks for Gojay.

    Look again, JsonParser is included in some, but not all, of the benchmarks. JsonParser [apparently] allocates no memory, which, to me, seems pretty compelling.

    • nieksand 6 years ago

      You are right. I totally missed it in the first few tables.

      The lack of allocs for JsonParser was definitely a big win for my project. When processing at high rps (40k msg/sec), GC is near the top of all my performance profiles. Driving allocs down with Buger's parser was very helpful for getting the throughput up.

      • Twirrim 6 years ago

        There's an important distinction between the decoding speed of documents and overall throughput.

        Wonder what can be done to realistically generate something approximating a real world benchmark to cover the throughput scenario.

      • zerr 6 years ago

        What's the project? Seems like a language with a GC is not a good choice, is it?

        • nieksand 6 years ago

          Essentially a glorified batcher / data transformer consuming from Kafka.

          Even with the GC overhead, I was still able to hit my performance goal of 40k msg/sec per AWS c3.4xl.

          In a vacuum I would have picked Rust over Go for this. But Go is widely used at my work, whereas I'm the only Rust guy.

          • woolvalley 6 years ago

            How much of a difference do you think Rust would have made? Have you made something similar?

            • nieksand 6 years ago

              I've written some high req/sec projects in Rust, but nothing that would be an apples-to-apples comparison with that particular Go project.

              The memory access patterns are straightforward enough that the remaining GC overhead could definitely go away. If I had to make a wild, from-the-hip, unsubstantiated guess: maybe a 2x to 5x performance multiple. That guess assumes a comparable amount of time spent optimizing and would depend greatly on the quality of the Kafka crates for Rust (which I've never used).

  • mintcrate 6 years ago

    That project is not an encoder/decoder - it just returns values of keys. They are not comparable as the use cases are entirely different.

    • nieksand 6 years ago

      When it comes to parsing JSON, they are indeed not doing the same thing. Buger's parser is not unmarshalling into an annotated struct.

      But often that struct isn't your end goal in life, such as when your data is being transformed, dispatched on, or moved into yet another data structure. In that case, unmarshalling to an intermediate structure, while convenient, is also extra computational work.

      Buger's parser isn't the first thing I'd reach for, but if JSON parsing performance is an issue, ObjectEach() can be used in many places where you'd otherwise invoke json.Unmarshal.
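
      A minimal sketch of that kind of swap (the message fields are invented, but ObjectEach is the real API):

          package main

          import (
              "fmt"

              "github.com/buger/jsonparser"
          )

          func main() {
              data := []byte(`{"id": 7, "payload": "some event"}`)

              var id int64
              var payload string

              // Visit each key/value of the object once, transforming as we
              // go, with no intermediate struct or map allocated.
              err := jsonparser.ObjectEach(data, func(key, value []byte, _ jsonparser.ValueType, _ int) error {
                  switch string(key) {
                  case "id":
                      v, err := jsonparser.ParseInt(value)
                      if err != nil {
                          return err
                      }
                      id = v
                  case "payload":
                      payload = string(value)
                  }
                  return nil
              })
              if err != nil {
                  panic(err)
              }
              fmt.Println(id, payload)
          }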

      • alex_dev 6 years ago

        Hence the above poster saying the use cases are different and that comparing between them is ridiculous. Using ObjectEach() for every field to reach parity with json.Unmarshal is unconscionable.

tptacek 6 years ago

Performance is a real issue for Go's standard JSON, but this is a lot of extra boilerplate code to have to write (I'd probably codegen most of this if I had to), so I'd assume the reasonable strategy would be to implement with encoding/json, profile, and then just GoJay the hotspots.

  • laumars 6 years ago

    It looks like this has a Marshal and Unmarshal function just like its core library counterpart. So I'd guess you might be able to use this as a drop-in replacement. However, I've yet to prove that theory.

    • tptacek 6 years ago

      encoding/json's Marshal and Unmarshal use reflect and struct tags. This library doesn't: you have to define a function to make your struct satisfy an interface.
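
      Roughly like this, going from the project README (treat the exact method names as approximate):

          package main

          import (
              "fmt"

              "github.com/francoispqr/gojay"
          )

          type user struct {
              ID   int
              Name string
          }

          // Instead of struct tags plus reflect, the type satisfies an
          // interface: a per-key decode callback and a key-count hint.
          func (u *user) UnmarshalJSONObject(dec *gojay.Decoder, key string) error {
              switch key {
              case "id":
                  return dec.Int(&u.ID)
              case "name":
                  return dec.String(&u.Name)
              }
              return nil
          }

          // As I read the docs, returning 0 here means "decode every key".
          func (u *user) NKeys() int { return 2 }

          func main() {
              var u user
              if err := gojay.UnmarshalJSONObject([]byte(`{"id":1,"name":"ada"}`), &u); err != nil {
                  panic(err)
              }
              fmt.Println(u)
          }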

      • laumars 6 years ago

        You don't have to use struct tags (or even structs) to use encoding/json. In fact I often don't, as a lot of my usage is with maps rather than structs.
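
        e.g. plain stdlib, no tags in sight:

            package main

            import (
                "encoding/json"
                "fmt"
            )

            func main() {
                data := []byte(`{"name": "ada", "tags": ["x", "y"]}`)

                // Decode into a generic map rather than a tagged struct.
                var doc map[string]interface{}
                if err := json.Unmarshal(data, &doc); err != nil {
                    panic(err)
                }
                doc["name"] = "grace"

                // ...and straight back out again.
                out, _ := json.Marshal(doc)
                fmt.Println(string(out))
            }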

        I did some testing with gojay and it did work as a drop-in replacement for most of my usage. However, the performance improvements I saw in my benchmarks were not nearly as favourable as those published in the project's git repository. I'm sure I could get better results if I played around with their APIs a little more rather than just using the marshallers, but frankly the utility I'm using it in favours the flexibility of encoding/json a little more anyway, even if that does cost me a little in performance (and to be clear, it really wasn't much: I had to run my utility a thousand times just to get any meaningful difference between the two libraries, so we're not talking real-world usage).

        That said, if you're building high-performance servers then I'm sure gojay would really stand out. On reflection (no pun intended), my requirements weren't really the target use case for this package.

        • tptacek 6 years ago

          That's fine, but this library isn't a drop-in replacement for encoding/json; to be that, it would have to work for people who expect tagged structs to round-trip through it, which won't happen here.

          • laumars 6 years ago

            That's fine, but I never claimed "drop-in for all use cases", a point I made very clear in my second post. For some use cases it can be, as I had already explained.

            Which use case applies should be abundantly clear to anyone who's spent more than five minutes in Go (or any programming language that supports reflection) and has read the first line of the package's readme (i.e. that it doesn't use reflection).

            It's definitely worth remembering that one doesn't have to use structs and tags to write or read JSON in Go, before people start bitching about boilerplate code and a lack of macros.

      • phamilton 6 years ago

        Macros would solve this problem nicely if they were available. I sure wish first class macros were on the golang roadmap.

        • tptacek 6 years ago

          If you want to code in a language with macros, pick a language with macros. They tend not to look like Go (or C, or C++), because part of what makes macros effective is that coding in those languages is pretty close to working with their ASTs. Writing a Lisp-like macro for Go sounds like it would actually be more annoying than simply writing a codegen for it.

          • bpicolo 6 years ago

            I don't think there's anything precluding Go from making it possible, other than the maintainers' desire to not have macros. Rust gives you the power here, and as a result has some tremendous serialization libs:

            https://github.com/serde-rs/serde

            Macros don't need to be as simple to use as in Lisps to be useful. The rare library, like serialization, that can really benefit from them can be worth the language feature.

            • tptacek 6 years ago

              Rust is a good counterexample, yep.

          • phamilton 6 years ago

            The end result with macros is far more seamless than with codegen, the point being that a consumer of the library doesn't even need to know about the macros. The code just works like the reflection approach, but without the runtime penalty. Codegen on the other hand makes an implementation detail a maintenance cost for the user of the library.

            • stcredzero 6 years ago

              > The end result with macros is far more seamless than with codegen, the point being that a consumer of the library doesn't even need to know about the macros.

              A double-edged sword, that! Sometimes the end result with things like macros is so seamless that there is precious little left to figure out what went wrong. Often, if your nifty new facility seems like magic, its bugs are going to seem doubly magic. Not too long ago, I had to debug an exception for which no source code existed; it existed only as the confluence of 3 C++ templates.

              > Codegen on the other hand makes an implementation detail a maintenance cost for the user of the library.

              I've heard veteran programmers say that there should be a separation of labor when it comes to libraries. Only certain people make good library writers, in the same way that only certain people make excellent musical accompanists. To be excellent, an accompanist should have some sense of what it's like to be the lead (and so should be able to take the lead themselves). The accompanist should have the perspective not to let their ego get in the way. The accompanist should give the lead what she wants, but modulated through their own expertise and sense of taste (as opposed to mindlessly giving everything asked for).

              If there is a separation of responsibility, why would codegen be such a burden? If the library has to change so often, perhaps the responsibilities aren't distributed optimally?

              • pcwalton 6 years ago

                Macros usually make debugging easier than code generators do. That's because quasiquote and similar operators let you debug code, not code that generates code.

                I'm the first to admit that macros have downsides compared to not having them in the language, but ease of debugging generated code isn't one of those downsides.

                • stcredzero 6 years ago

                  I've generally found that C/C++ macros make debugging harder; debuggers don't always handle them in the most straightforward way.

                  > That's because quasiquote and similar operators let you debug code, not code that generates code.

                  What I've found is that you can debug the generated code to debug the code generator. Code generators shouldn't be used to do something terribly complicated. I'd agree with you that code generators doing complicated things is a code smell. Leave those to native syntax, provided it's not as badly thought out as templates.

                  • phamilton 6 years ago

                    > C/C++ macros

                    I'd argue that those aren't first class macros. C macros are just source manipulators, and C++ templates (I'm not super experienced with them, so forgive me if I'm way off) don't really manipulate the AST; they're more a type system with compile-time resolution.

              • phamilton 6 years ago

                > A double edged sword, that! Sometimes the end result with things like macros, is that it so seamless, that there is precious little to figure out what went wrong.

                First class macros generally contain the same logic as a reflection based approach, but they execute at compile-time and memoize the result. If you can debug compilation (which languages with first class macros generally support), then debugging is roughly the same. There are definitely cases where that's not true, but in this domain (serialization) a macro based solution is well understood. See https://github.com/devinus/poison/blob/62e98f19552289f3f7139... for a macro based example in Elixir.

          • pcwalton 6 years ago

            S-expressions aren't actually necessary for macros to work. You just have to be able to separate "READ" from "EXPAND" (to use Lisp-ish terminology).

            I believe, but am not sure, that this is possible for Go. (Of course, whether this is a good idea is another thing entirely. I probably wouldn't have included macros if I had been put in charge of Go 1.0, given Go's goals of simplicity.)

            More on this from my former colleague Dave Herman, who did his dissertation on macros: http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-...

          • vorg 6 years ago

            An alternative to both Lisp-like macros and code generation is to trigger macros via annotations in the syntax, and for the macro to manipulate the AST directly using the "go/ast" package. Apache Groovy, which has the same curly-braces style syntax as Golang, does it like this.

  • logicallee 6 years ago

    You mention codegening, which is common in Go for a variety of reasons (as I understand it, the language cannot express so much as a generic red-black tree - or any other container that isn't part of the syntax - without codegening).

    An additional reason for codegening is the extreme syntactic simplicity of the language: there is little syntactic sugar and it has a tiny, simple, explicit specification.

    On the other hand, the syntax is also meant to be clear, unambiguous, and human-readable. I was thinking that what is good for humans to read might not be the easiest thing for programs to output.

    Question: since you seem to know about codegening, do you find Go's grammar, syntax, etc. to be a good target for codegening?

    I don't have a more specific question, but you could talk about any other feelings you have about codegening against Go (or any other language).

    • Vendan 6 years ago

      Having written a data serialization library based on codegen, I would have to say that Go is relatively simple to codegen. One big annoyance is the import system's rule of "if you don't need it, it's an error to import it," which, while simple enough for manual code, means you have to do extra work while generating code to keep track of what you actually use (i.e. if none of the fields is a time.Time, you must not import "time"). The other approach is to use constructs like "var _ = time.Now", which do make the codegen easier, but which I personally feel are far dirtier.

      One very nice feature of Go is that the formatter is available as a library (https://golang.org/pkg/go/format/#Source), so you can gen "raw" code and then throw it through the formatter to get nice pretty output.
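
      A tiny sketch tying those two points together (the generated source is invented; format.Source is the real stdlib call):

          package main

          import (
              "fmt"
              "go/format"
          )

          func main() {
              // Sloppily "generated" source, including the var _ = time.Now
              // keepalive hack for an import that may go unused.
              raw := []byte("package gen\nimport \"time\"\nvar _=time.Now\nfunc  Hello( )string{ return \"hi\" }\n")

              pretty, err := format.Source(raw)
              if err != nil {
                  panic(err) // raw was not syntactically valid Go
              }
              fmt.Println(string(pretty))
          }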

      • dullgiulio 6 years ago

        Unused imports are definitely not an unsolved problem: as a last step after dumbly generating code, run goimports.

  • francoisllm 6 years ago

    The next milestone is actually to write a code generator for structs, maps and slices.

    • Rapzid 6 years ago

      Out of curiosity, is there any consideration of run-time generation and compilation followed by dynamic loading? There would obviously be a UX-to-startup-time tradeoff involved.

      I'm not sure how most languages tackle high-performance JSON encoding/decoding, but I have seen a few .NET implementations that take advantage of the JIT to gain performance while providing a more seamless library-consumer experience.

      • Vendan 6 years ago

        Dynamic loading for Go is not well supported, and that kind of on-the-fly compilation, JIT or otherwise, is not supported at all...

  • segmondy 6 years ago

    That's exactly what go-codec does

krylon 6 years ago

I have been using ffjson[1] in a couple of private projects; it generates code to encode/decode data, which is supposedly faster than the json package in Go's standard library.

Does anybody have any first-hand experience with how this compares to the competition?

(I could, of course, do my own benchmarks, but so far I have not had any performance issues on this end, so it has not been a pressing need. The only problem I have run into with ffjson is that various linters will bark at the generated code, but again, that has not been a big problem.)

[1] https://github.com/pquerna/ffjson

  • pquerna 6 years ago

    (ffjson author here)

    The main feature that ffjson has that most of the non-stdlib JSON libraries lack is stdlib compatibility. E.g., the same struct tags and interfaces used in stdlib are used by ffjson. It's just trying to move most of the reflection / allocations / etc. to a `go generate` step vs runtime.
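
    i.e. you keep writing the ordinary stdlib-style struct and add a generate step, roughly like this (file and package names are illustrative):

        // model.go: nothing ffjson-specific except the generate directive.
        //go:generate ffjson $GOFILE
        package model

        type User struct {
            ID   int    `json:"id"`
            Name string `json:"name"`
        }

    Running `go generate` then emits a companion file (something like model_ffjson.go) with MarshalJSON/UnmarshalJSON methods, and existing encoding/json callers pick those up automatically, since those are the interfaces stdlib already honors.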

    If you abandon trying to stay consistent with the stdlib JSON, and make new APIs/interfaces for propagating the encoder or decoder state, as Gojay has, it will undoubtedly be faster than ffjson or stdlib.

    • krylon 6 years ago

      Thank you very much!

BillinghamJ 6 years ago

Would be handy if it included a utility to generate the (un)marshaling functions, so that can just be done as a build step.

So far we’ve used https://github.com/mailru/easyjson for that, but the code generation step is unbelievably slow - multiple minutes for about 100 structs.

  • francoisllm 6 years ago

    It's actually the next milestone: a generator for structs, maps and slices. Until now the goal was to make it ready for some high-traffic services in production. That traffic includes JSON with Chinese and Vietnamese characters, so we needed to make sure it worked well first (boilerplate was not an issue).

    • anothergoogler 6 years ago

      Go is pretty good about Unicode; did you run into issues with Chinese and Vietnamese in the standard encoding/json package, or in other third-party Go JSON libraries?

      • francoispqr 6 years ago

        Nope, it's quite easy to integrate Unicode parsing :) You just need to handle "\u1234" escape strings in JSON, as they are valid, and also check for UTF-16 surrogates. The standard package does this already; we just had to implement it in Gojay.
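
        For the curious, the surrogate part boils down to something like this standalone stdlib sketch (not Gojay's actual code):

            package main

            import (
                "fmt"
                "strconv"
                "unicode/utf16"
            )

            func main() {
                // JSON allows "\ud83d\ude00": a UTF-16 surrogate pair
                // encoding the single code point U+1F600.
                hi64, _ := strconv.ParseUint("d83d", 16, 32)
                lo64, _ := strconv.ParseUint("de00", 16, 32)
                hi, lo := rune(hi64), rune(lo64)

                if utf16.IsSurrogate(hi) {
                    // Combine the pair into one rune.
                    fmt.Printf("%c\n", utf16.DecodeRune(hi, lo))
                }
            }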

dtolnay 6 years ago

To see how this fares against native code, I ported their benchmark to Rust's JSON library https://serde.rs/. Disclaimer: I am a maintainer of Serde.

Numbers and graphs: https://github.com/serde-rs/json-benchmark/tree/gojay#serde

Rust source code: https://github.com/serde-rs/json-benchmark/blob/gojay/src/li...

TL;DR GoJay ranges from 20% slower to 2.7x slower depending on workload.

  • IshKebab 6 years ago

    Go is "native code" but anyway nice work!

    • Groxx 6 years ago

      As always, it's a range.

      Go has a runtime you cannot control, which schedules your code when it feels like it, and a garbage collector. Rust has neither.

      Sure, Go isn't parsing and interpreting its files at runtime. But neither does Python, so I'm not sure that's a meaningful line to draw.

      • abiox 6 years ago

        > Sure, Go isn't parsing and interpreting its files at runtime

        afaik, referring to something as "native code" just indicates it isn't compiled to/executing as bytecode. not anything to do with gc, runtime, etc.

        • Groxx 6 years ago

          Depends on who you ask. As demonstrated by literally every commenter in this thread so far.

          Besides, define bytecode. Once it has run through a JIT and diverged from the on-disk representation, is it now native? How is it distinguishable from a binary that detects your CPU architecture and executes different branches of code? What about other forms of self-modifying code?

          There are cases where an easy argument can be made (e.g. Java, which has a separately-supplied VM to run your bytecode... but then where's the line with DLLs?), but there isn't an unambiguous line in the sand here. At any line you can produce a new system that straddles it (e.g. a JAR which ships with its own VM. the VM is native code, is the binary now native or not?), and often there are already widely-used examples.

laumars 6 years ago

This looks perfect for a project I'm working on which is a UNIX / Linux $SHELL that makes heavy use of JSON pipelining.

  • agumonkey 6 years ago

    structured shell, go on

    • zbentley 6 years ago

      • laumars 6 years ago

        Powershell is object-oriented and, in my opinion at least, an absolute pig for quick one-liners (overly verbose syntax, and pipelines bork on type mismatches even when the data being passed is still essentially just textual). I wanted something that was still in the realm of typical UNIX shells (even with Bash compatibility where sensible) but with an awareness of complex data formats.

        However, everyone has their own opinions and preferences. I wrote my shell to scratch a personal itch; if others like or use it, that's a bonus.

    • laumars 6 years ago

      https://github.com/lmorg/murex

      It's not without its bugs, but I now use it every day as my primary shell. Documentation is about 30% there, but that's something I'm actively working on at the moment.

      Happy to receive any feedback, positive or negative :)

edhelas 6 years ago

How does it compare with encoding the same kind of data in an XML stream?

  • laumars 6 years ago

    That would depend on the XML interface you used. I don't really rate the XML parser in the Go core library much, but I think much of that is my own personal bias against XML.

    • dullgiulio 6 years ago

      Encoding XML to a structure makes little sense, as there is no direct mapping between tags, attributes, and struct fields. Something like a DOM makes a lot more sense, but it is also much more verbose. This is the big reason to prefer JSON over XML.
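
      For example, with the stdlib encoding/xml you have to decide per field whether it maps to an attribute, a child element, or character data, a choice JSON never forces on you:

          package main

          import (
              "encoding/xml"
              "fmt"
          )

          // One element can feed a struct three different ways.
          type Item struct {
              ID   string `xml:"id,attr"`   // attribute: <Item id="...">
              Name string `xml:"name"`      // child element: <name>...</name>
              Body string `xml:",chardata"` // text directly inside <Item>
          }

          func main() {
              data := []byte(`<Item id="7"><name>widget</name>spare text</Item>`)
              var it Item
              if err := xml.Unmarshal(data, &it); err != nil {
                  panic(err)
              }
              fmt.Printf("%+v\n", it)
          }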

      • laumars 6 years ago

        Yeah, that was the point I was hinting at :)

throwme_1980 6 years ago

Yet another boilerplate library; the native lib works fine for 99.99% of use cases. For edge cases I wouldn't be using Go anyway if speed were my thing.