-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 8/30/2010 7:44 AM, Martin [gz] wrote:
> Okay, I'm confused now. Every microbenchmark I could cook up the bzrlib version of the _escape_cdata is actually slower than the original. So, I tried profiling bundle, and sure enough there's an xml escaping function high up in the list, but it's not the etree one:
>
> 502 0 205.1471 16.3189 <C:\Python24\Lib\site-packages\bzrlib\xml8.py>:217(write_inventory)
> +2610268 0 13.1339 8.5832 +<C:\Python24\Lib\site-packages\bzrlib\xml8.py>:94(_encode_and_escape)
>
> So, what command can I run instead to measure how much ripping this code out hurts performance?
I would guess a major factor would be "which version of ElementTree" :)
since it isn't bundled with earlier Pythons.
As near as I can tell, the main change is to switch:
    text = replace(text, "&", "&amp;")
    text = replace(text, "'", "&apos;") # FIXME: overkill
    text = replace(text, "\"", "&quot;")
    text = replace(text, "<", "&lt;")
    text = replace(text, ">", "&gt;")
to using:
    escape_re = re.compile("[&'\"<>]")
    escape_map = {
        "&":'&amp;',
        "'":"&apos;", # FIXME: overkill
        "\"":"&quot;",
        "<":"&lt;",
        ">":"&gt;",
        }
    def _escape_replace(match, map=escape_map):
        return map[match.group()]
    ...
    text = escape_re.sub(_escape_replace, text)
As such, I think a valid benchmark would be:
a) Grab a large inventory content from a pre-2a format (1.9-rich-root,
for example). This can be a single revision.
b) Time the difference between a single re.sub() versus 5 calls to
'string.replace'.
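
For what it's worth, here is a rough sketch of step (b), assuming a modern Python and a synthetic stand-in for the inventory text. The names escape_with_replace, escape_with_re, make_sample and run_benchmark are made up for illustration and are not bzrlib or ElementTree APIs; a real measurement should feed in actual pre-2a inventory XML from step (a) instead of make_sample().

```python
import re
import timeit

# Same five characters and entities as in the two code variants above.
_ESCAPE_MAP = {
    "&": "&amp;",
    "'": "&apos;",
    '"': "&quot;",
    "<": "&lt;",
    ">": "&gt;",
}
_ESCAPE_RE = re.compile("[&'\"<>]")


def escape_with_replace(text):
    # Five passes over the string; "&" goes first so the entities
    # produced by the later passes are not escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("'", "&apos;")
    text = text.replace('"', "&quot;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


def _escape_replace(match, map=_ESCAPE_MAP):
    return map[match.group()]


def escape_with_re(text):
    # One pass, with a Python-level callback for every matched character.
    return _ESCAPE_RE.sub(_escape_replace, text)


def make_sample(n_entries=2000):
    # Hypothetical stand-in data, NOT a real inventory.
    line = 'name="it\'s <a> & b" file_id="x-%d"\n'
    return "".join(line % i for i in range(n_entries))


def run_benchmark(text, number=20):
    # Sanity check: both variants must produce identical output.
    assert escape_with_replace(text) == escape_with_re(text)
    t_replace = timeit.timeit(lambda: escape_with_replace(text), number=number)
    t_re = timeit.timeit(lambda: escape_with_re(text), number=number)
    return t_replace, t_re


if __name__ == "__main__":
    t_replace, t_re = run_benchmark(make_sample())
    print("5x replace: %.4fs  re.sub: %.4fs" % (t_replace, t_re))
```

Which one wins will likely depend on how many characters actually need escaping and on the Python version, so treat any numbers from the synthetic sample as indicative only.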
Anyway, as mentioned, this isn't a large perf issue for current formats,
so we probably can just revert it.
And your profiling shows... we worked around the ElementTree code
entirely in later revisions, so again, applying your patch is unlikely
to degrade performance.
merge: approve
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkx747MACgkQJdeBCYSNAAPIwwCgykHabjXO7EuELNwHqurUY+Pc
vjUAoJ/5kqD10G1hjvTC0SRz418Kpi1I
=Sm4E
-----END PGP SIGNATURE-----