Bazaar

Code review comment for lp:~gz/bzr/remove_monkey_patched_elementtree_escaping_614522

remove_monkey_patched_elementtree_escaping_614522
Merge into bzr.dev

Revision history for this message

John A Meinel (jameinel) wrote on 2010-08-30:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 8/30/2010 7:44 AM, Martin [gz] wrote:
> Okay, I'm confused now. Every microbenchmark I could cook up the bzrlib version of the _escape_cdata is actually slower than the original. So, I tried profiling bundle, and sure enough there's an xml escaping function high up in the list, but it's not the etree one:
>
> 502 0 205.1471 16.3189 <C:\Python24\Lib\site-packages\bzrlib\xml8.py>:217(write_inventory)
> +2610268 0 13.1339 8.5832 +<C:\Python24\Lib\site-packages\bzrlib\xml8.py>:94(_encode_and_escape)
>
> So, what command can I run instead to measure how much ripping this code out hurts performance?

I would guess a major factor would be "which version of Elementtree" :)
since it isn't bundled with earlier pythons.

As near as I can tell, the main change is to switch:

            text = replace(text, "&", "&")
            text = replace(text, "'", "'") # FIXME: overkill
            text = replace(text, "\"", """)
            text = replace(text, "<", "<")
            text = replace(text, ">", ">")

to using:
escape_re = re.compile("[&'\"<>]")
escape_map = {
    "&":'&',
    "'":"'", # FIXME: overkill
    "\"":""",
    "<":"<",
    ">":">",
    }
def _escape_replace(match, map=escape_map):
    return map[match.group()]
...
  text = escape_re.sub(_escape_replace, text)

As such, I think a valid benchmark would be:

a) grab a large inventory content from pre-2a format (1.9-rich-root,
for example). This can be a single revision
b) Time the different between a single re.sub() versus 5 calls to
'string.replace'.

Anyway, as mentioned, this isn't a large perf issue for current formats,
so we probably can just revert it.

And your profiling shows... we worked around the ElementTree code
entirely in later revisions, so again, it is likely to not degrade
performance by applying your patch.

merge: approve

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkx747MACgkQJdeBCYSNAAPIwwCgykHabjXO7EuELNwHqurUY+Pc
vjUAoJ/5kqD10G1hjvTC0SRz418Kpi1I
=Sm4E
-----END PGP SIGNATURE-----

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 8/30/2010 7:44 AM, Martin [gz] wrote:
> Okay, I'm confused now. Every microbenchmark I could cook up the bzrlib version of the _escape_cdata is actually slower than the original. So, I tried profiling bundle, and sure enough there's an xml escaping function high up in the list, but it's not the etree one:
> 
>          502            0    205.1471     16.3189   <C:\Python24\Lib\site-packages\bzrlib\xml8.py>:217(write_inventory)
>     +2610268            0     13.1339      8.5832   +<C:\Python24\Lib\site-packages\bzrlib\xml8.py>:94(_encode_and_escape)
> 
> So, what command can I run instead to measure how much ripping this code out hurts performance?

I would guess a major factor would be "which version of Elementtree" :)
since it isn't bundled with earlier pythons.

As near as I can tell, the main change is to switch:

text = replace(text, "&", "&amp;")
            text = replace(text, "'", "&apos;") # FIXME: overkill
            text = replace(text, "\"", "&quot;")
            text = replace(text, "<", "&lt;")
            text = replace(text, ">", "&gt;")

to using:
escape_re = re.compile("[&'\"<>]")
escape_map = {
    "&":'&amp;',
    "'":"&apos;", # FIXME: overkill
    "\"":"&quot;",
    "<":"&lt;",
    ">":"&gt;",
    }
def _escape_replace(match, map=escape_map):
    return map[match.group()]
...
  text = escape_re.sub(_escape_replace, text)

As such, I think a valid benchmark would be:

a) grab a large inventory content from pre-2a format (1.9-rich-root,
    for example). This can be a single revision
 b) Time the different between a single re.sub() versus 5 calls to
    'string.replace'.

Anyway, as mentioned, this isn't a large perf issue for current formats,
so we probably can just revert it.

And your profiling shows... we worked around the ElementTree code
entirely in later revisions, so again, it is likely to not degrade
performance by applying your patch.

merge: approve

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkx747MACgkQJdeBCYSNAAPIwwCgykHabjXO7EuELNwHqurUY+Pc
vjUAoJ/5kqD10G1hjvTC0SRz418Kpi1I
=Sm4E
-----END PGP SIGNATURE-----

review: Approve

« Back to merge proposal