lp:~jameinel/+junk/gozjson

Created by John A Meinel and last modified
Get this branch:
bzr branch lp:~jameinel/+junk/gozjson

Branch information

Owner:
John A Meinel
Status:
Development

Recent revisions

12. By John A Meinel

Pulling out UnmarshalJSON for now.

Next thing to try is just a sub-string matcher.
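A sub-string matcher along these lines might look like the sketch below. It is only an illustration: the key name `address`, the sample line, and the helper `extractIntField` are assumptions, not taken from the branch, and it assumes the key appears once with an unsigned integer value.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// extractIntField looks for `"key": <digits>` in line without parsing the
// JSON. Hypothetical helper: it assumes the key occurs once, at the top
// level, and that its value is an unsigned decimal integer.
func extractIntField(line, key string) (uint64, bool) {
	marker := `"` + key + `":`
	i := strings.Index(line, marker)
	if i < 0 {
		return 0, false
	}
	// Skip optional spaces between the colon and the value.
	rest := strings.TrimLeft(line[i+len(marker):], " ")
	end := 0
	for end < len(rest) && rest[end] >= '0' && rest[end] <= '9' {
		end++
	}
	if end == 0 {
		return 0, false
	}
	v, err := strconv.ParseUint(rest[:end], 10, 64)
	if err != nil {
		return 0, false
	}
	return v, true
}

func main() {
	// Made-up sample line; the real dump format may differ.
	line := `{"address": 140734, "type": "dict", "size": 280}`
	addr, ok := extractIntField(line, "address")
	fmt.Println(addr, ok)
}
```

This avoids building any intermediate structure, at the cost of being wrong whenever the same text appears inside a string value.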

11. By John A Meinel

Using a complex regex already makes things slower.
I'm told that regexp isn't a particularly high-performance regex library,
at least according to the 'performance notes' that have been mentioned.
The fact that I'm already at 13s, without even handling 'value'
or pulling out refs, makes it clear that at least the json
parser is faster than the regex module.
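For reference, a regex-based extraction with Go's standard `regexp` package could be sketched as follows. The pattern, the field order, and the helper `matchLine` are hypothetical; the actual expression used in this revision is not shown on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRE is a hypothetical pattern: it assumes "address" is immediately
// followed by "type" with exactly this spacing.
var lineRE = regexp.MustCompile(`"address": (\d+), "type": "([^"]*)"`)

// matchLine returns the captured address and type as strings.
func matchLine(line string) (address, typ string, ok bool) {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	// Made-up sample line; the real dump format may differ.
	addr, typ, ok := matchLine(`{"address": 140734, "type": "dict", "size": 280}`)
	fmt.Println(addr, typ, ok)
}
```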

10. By John A Meinel

I didn't have success implementing UnmarshalJSON such that
it was actually faster, partly because I still unmarshalled into
the generic interface and then loaded that into my struct.
Note that using a custom struct with fewer fields *is* a lot
faster, because it doesn't parse those extra fields or map them
to the right types. I get down to 5.978s if I only have
Address and Type exposed.
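A reduced struct of this kind might look like the sketch below. The struct name, the JSON key names, and the field types are assumptions; only the field names Address and Type come from the note above. `encoding/json` ignores keys that have no matching struct field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// smallObject exposes only two fields; every other key in the input is
// ignored by encoding/json. The tags and types are assumptions.
type smallObject struct {
	Address uint64 `json:"address"`
	Type    string `json:"type"`
}

// parseSmall decodes one JSON line into the reduced struct.
func parseSmall(line []byte) (smallObject, error) {
	var obj smallObject
	err := json.Unmarshal(line, &obj)
	return obj, err
}

func main() {
	// Made-up sample line with extra keys that are skipped.
	obj, err := parseSmall([]byte(`{"address": 140734, "type": "dict", "size": 280, "refs": [1, 2]}`))
	fmt.Println(obj.Address, obj.Type, err)
}
```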

9. By John A Meinel

Using a real struct is actually slower.
I'm guessing it requires more runtime type interfacing,
because now we don't just put the fields into a generic map;
instead we have to look at the struct and see which field
this named value maps into, etc.
We could implement the Unmarshaler interface, which I'll try next.

8. By John A Meinel

Some buffering helps, but gccgo is still the slowest, and GOMAXPROCS still slows things down.

 7.244s 6l
 7.627s 6l GOMAXPROCS=2
 9.548s gccgo dynamic
 8.755s gccgo -static

7. By John A Meinel

Stub out some bits that aren't available in gccgo.
Unfortunately, while gccgo is clearly better at CPU computations
(MurmurHash3 got as fast as the C++ code), it seems to be
*slower* at goroutines, etc. Specifically, parsing the minitest.json.gz:
$ ./read_zjson ../minitest.json.gz
Read 100000 lines in 7.370s
Peak Mem: 134.1MiB

$ ./read_zjson_gccgo ../minitest.json.gz
Read 100000 lines in 11.937s
Peak Mem: 103.3MiB

$ ./read_zjson_gccgo_static ../minitest.json.gz
Read 100000 lines in 10.990s
Peak Mem: 104.7MiB

static seems to help a little bit, but both are still slower
than the 6g version. Also notice this:
$ GOMAXPROCS=2 ./read_zjson ../minitest.json.gz
Read 100000 lines in 7.965s
Peak Mem: 91.4MiB

In theory we have 2 CPU bound actions (the decompression and
the json parsing). I might try a bit more buffering in
the channels, and see if that changes anything.

6. By John A Meinel

Tried a bunch of things to track OOM, nothing seems to be working.

So I just print out the count every 10k, and then die at the end.
I don't seem to be able to make sure deferred calls get run, or anything.
It is possible that the OOM is hard, and the interpreter is just lost.

Go hits 2.5GB of memory around 1.81M lines, python hits it at 2.35M lines.
I really didn't think python would be more memory efficient.

5. By John A Meinel

Leave the grep call in, for cases where we're able to call it.

4. By John A Meinel

Trying to spawn from python is giving us an OOM failure.

Which is strange, because we intentionally delete everything before
spawning. But not even gc.collect() is enough to let the os.fork()
succeed.
I didn't try running go to memory limits.

3. By John A Meinel

Implement parsing json for the python extraction and the go extraction.
Also set rlimits, because my machine gets *really* unhappy if you
get into swap. (Initial results just crashed the machine.)

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)