lp:~jameinel/+junk/godirstate
- Get this branch:
- bzr branch lp:~jameinel/+junk/godirstate
Branch information
- Owner:
- John A Meinel
- Status:
- Development
Recent revisions
- 24. By John A Meinel
-
Remove the error returns in favor of panic().
The immediate feeling is of less clear state, because we no longer add
extra state info to the error message. Probably a worthy trade for getting
a real traceback that has *even more* error info. (At the expense that
Parse() suppresses getting a traceback.)
Much clearer code without all the 'err' checking statements. Possibly
marginally faster (183ms): probably fewer if checks in trade for panic,
which probably has to be runtime checked anyway.
- 22. By John A Meinel
-
Setting it to 50MB should allow us to read the whole file in one go, but it actually makes it slower: 200ms.
- 21. By John A Meinel
-
Increasing the buffer size to 1MB helps a tiny bit, but not particularly dramatically.
Down to about 182ms from 187ms, so about 2-3%.
10 loops 70133 entries in 184.621ms
185 samples (avg 1 threads)
13.51% bytes.IndexByte
 8.11% runtime.mcpy
 5.41% MHeap_AllocLocked
 5.41% runtime.mallocgc
 5.41% scanblock
 4.86% dirstate.*entryParser·getDetails
- 20. By John A Meinel
-
Play with removing the 'extra' field. But it doesn't seem very expensive
for cases where we don't use it. 187ms to 185ms. Not tracking the extra
content is more apparent, at 255ms vs 289ms. But parsing-and-not-keeping
data is cheating, so it doesn't really count.
With just a working+basis, we're at 187ms vs 131ms, which is pretty good.
6prof gives this layout:
10 loops 70133 entries in 186.195ms
186 samples (avg 1 threads)
19.35% bytes.IndexByte
 8.06% runtime.mcpy
 7.53% syscall.Syscall
 4.84% runtime.memclr
 4.84% scanblock
 4.84% sweep
 4.30% bytes.Equal
 3.76% dirstate.*entryParser·getNext
So we still spend most of the time finding null bytes, followed by time
copying bytes into new buffers. I guess the 8% syscall time is for
reading the content, since we don't read everything up front like we
do in the python code.
- 19. By John A Meinel
-
It appears the Go authors are no strangers to assembly optimizations.
Changing from bytes.Index() to a custom 'memchr' function dropped the time from
478ms down to ~324ms. However, bytes.IndexByte() is assembly code that even uses
SSE instructions to chunk through memory quickly, and drops it all the way down
to 283ms.
Still 283ms vs Pyrex 164ms, but I didn't expect to shave 195ms off the go time
by a simple switch of bytes.Index to bytes.IndexByte.
- 18. By John A Meinel
-
In a gcc tree with a merge (2 parents), still consistent.
go: 10 loops 70134 entries in 482.877ms
py: 10 loops, best of 3: 163 msec per loop
The ratio is 2.96:1 in favor of Pyrex.
- 17. By John A Meinel
-
Relative pattern holds for 'gcc' sized tree:
go: 10 loops 70134 entries in 336.742ms
pyrex: 10 loops, best of 3: 133 msec per loop
Which is 2.5:1 in favor of Pyrex; for bzr.dev it was 2.8:1.
- 16. By John A Meinel
-
For posterity, not sharing the dirname string was:
go: 1000 loops 1415 entries in 5.708ms
pyrex: 1000 loops, best of 3: 1.84 msec per loop
So it seems to help (5.7ms became 5.5ms), but not a tremendous win.
- 15. By John A Meinel
-
Try to share the dirname entries.
Current timing is:
go: 1000 loops 1415 entries in 5.588ms
pyrex: 1000 loops, best of 3: 1.98 msec per loop
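A minimal sketch of what sharing dirname entries can look like. The map-based `intern` helper is an assumption for illustration, not the branch's actual code; the idea is that consecutive entries in a sorted dirstate file repeat the same directory, so reusing one string per dirname avoids a fresh allocation per entry:

```go
package main

import "fmt"

// intern returns a shared string for b, allocating only the first time
// a given dirname is seen. (Hypothetical helper, not the branch's API.)
func intern(seen map[string]string, b []byte) string {
	// Go optimizes map lookups keyed by string(b) to avoid allocating.
	if s, ok := seen[string(b)]; ok {
		return s
	}
	s := string(b) // single allocation per distinct dirname
	seen[s] = s
	return s
}

func main() {
	seen := make(map[string]string)
	a := intern(seen, []byte("src/util"))
	b := intern(seen, []byte("src/util"))
	fmt.Println(a == b, len(seen)) // true 1
}
```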
Branch metadata
- Branch format:
- Branch format 7
- Repository format:
- Bazaar repository format 2a (needs bzr 1.16 or later)