lp:~jameinel/+junk/gozjson
- Get this branch:
- bzr branch lp:~jameinel/+junk/gozjson
Branch information
- Owner:
- John A Meinel
- Status:
- Development
Recent revisions
- 12. By John A Meinel
-
Pulling out UnmarshalJSON for now.
Next thing to try is just a sub-string matcher.
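A rough sketch of what such a sub-string matcher could look like in Go. This is only an illustration, not the code from this branch: the function name `extractString`, the key name `type`, and the exact `"key": "value"` spacing are all assumptions, and escaped quotes inside values are not handled.

```go
package main

import (
	"bytes"
	"fmt"
)

// extractString returns the string value that follows `"key": "` in line.
// Assumptions for this sketch: the key is followed by exactly `": "`,
// the value holds no escaped quotes, and the marker text does not also
// appear inside some other value.
func extractString(line []byte, key string) (string, bool) {
	marker := []byte(`"` + key + `": "`)
	start := bytes.Index(line, marker)
	if start < 0 {
		return "", false
	}
	start += len(marker)
	// The value ends at the next double quote after the marker.
	end := bytes.IndexByte(line[start:], '"')
	if end < 0 {
		return "", false
	}
	return string(line[start : start+end]), true
}

func main() {
	// Hypothetical input line; the real file's layout may differ.
	line := []byte(`{"address": 1234, "type": "str", "size": 40}`)
	if v, ok := extractString(line, "type"); ok {
		fmt.Println(v)
	}
}
```

The trade-off is that this skips tokenizing the whole line, but it is only safe when the input is known to be regular.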
- 11. By John A Meinel
-
Using a complex regex already makes things slower.
I'm told that regexp isn't a particularly high-performance regex library,
at least according to the 'performance notes' that have been mentioned.
I'm already at 13s without even handling 'value' or pulling out refs,
which makes it clear that at least the json parser is faster than
the regexp module.
- 10. By John A Meinel
-
I didn't have success implementing UnmarshalJSON such that
it was actually faster. Partly because I still went to unmarshal
the generic interface, and then load that into my struct.
Note that using a custom struct with fewer fields *is* a lot
faster, because it doesn't parse those extra fields or map them
to the right types. I get down to 5.978s if I only have
Address and Type exposed.
- 9. By John A Meinel
-
Using a real struct is actually slower.
I'm guessing it requires more runtime type interfacing.
Because now we don't just put the fields into a generic map,
instead we have to look at the struct and see what field
this named value maps into, etc.
We could implement the Unmarshaler interface, which I'll try next.
- 8. By John A Meinel
-
Some buffering helps, but still gccgo is the slowest, and GOMAXPROCS slows things down.
7.244s 6l
7.627s 6l GOMAXPROCS=2
9.548s gccgo dynamic
8.755s gccgo -static
- 7. By John A Meinel
-
stub out some bits that aren't available in gccgo
Unfortunately, while gccgo is clearly better at CPU computations
(MurmurHash3 got as fast as the C++ code), it seems to be
*slower* at goroutines, etc. Specifically, parsing the minitest.json.gz:
$ ./read_zjson ../minitest.json.gz
Read 100000 lines in 7.370s
Peak Mem: 134.1MiB
$ ./read_zjson_gccgo ../minitest.json.gz
Read 100000 lines in 11.937s
Peak Mem: 103.3MiB
$ ./read_zjson_gccgo_static ../minitest.json.gz
Read 100000 lines in 10.990s
Peak Mem: 104.7MiB
static seems to help a little bit, but both are still slower
than the 6g version. Also notice this:
$ GOMAXPROCS=2 ./read_zjson ../minitest.json.gz
Read 100000 lines in 7.965s
Peak Mem: 91.4MiB
In theory we have 2 CPU bound actions (the decompression and
the json parsing). I might try a bit more buffering in
the channels, and see if that changes anything.
- 6. By John A Meinel
-
Tried a bunch of things to track OOM, nothing seems to be working.
So I just printed out the count every 10k, and then die at the end.
I don't seem to be able to make sure defer() gets called, or anything.
It is possible that the OOM is hard, and the interpreter is just lost.
Go hits 2.5GB of memory around 1.81M lines, python hits it at 2.35M lines.
I really didn't think python would be more memory efficient.
- 4. By John A Meinel
-
Trying to spawn from python is giving us an OOM failure.
Which is strange, because we intentionally delete everything before
spawning. But not even gc.collect() is enough to let the os.fork()
succeed.
I didn't try running go to memory limits.
- 3. By John A Meinel
-
Implement parsing json for the python extraction and the go extraction.
Also give rlimits, because my machine gets *really* unhappy if you
get into swap. (Initial results just crashed the machine.)
Branch metadata
- Branch format:
- Branch format 7
- Repository format:
- Bazaar repository format 2a (needs bzr 1.16 or later)