Launchpad itself

Code review comment for lp:~stub/launchpad/memcache

memcache
Merge into devel

Revision history for this message

Stuart Bishop (stub) wrote on 2010-03-02:

On Sat, Feb 27, 2010 at 5:47 AM, Gary Poster <email address hidden> wrote:

> Might as well clean up the ``#level debug`` comments in launchpad.conf.

Done.

> I question including the "anonymous" visibility. You bring up a good reason for not including it. Why don't we just exclude it this time around? It feels like something that might be interesting for upstream but not for right now--and like something that people will use unnecessarily. (On the other hand, if you want to push back, in the interest of saving you from waiting for me to wake up on the other side of the globe, you may merely imagine me acquiescing, and keep it, if you like.)

I'd like to keep it because it completes the visibility model and I'd rather not have to re implement it later if we push this upstream or if we have real use cases where we do need it. Also, my comments are just my guess - is our squid configured to cache pages with query strings? I don't really know.

> Line 383 of the diff incorrectly describes the syntax, AIUI: <div tal:content="cache:1h public">. Please correct it.

Fixed.

> I would be tempted to try to provide a more helpful error message when someone includes too many commas (re "self.visibility, max_age = (s.strip() for s in expr.split(','))" from line 392 of the diff. Line 403 ("value, unit = max_age.split(' ')") is similar. If you can't be bothered, I'll look the other way.

Fixed.

> You say that "units is one of 'seconds', 'minutes', 'hours' or 'days'." However, you accept s*, m*, h* and d*. You are much stricter about the visibility strings. I'm inclined to favor enforcing readability, and enforce the full strings. This is negotiable (that is, under the circumstances, you may hear that as "ignorable" if you wish) but I feel more strongly about this one than some others I've brought up with similar deference.

Ok. I did that because it made it simpler to accept plural forms. Fixed.

> You generate _valid_key_charactersbut then you never use it. Maybe delete it?

Yes - that was cruft from earlier work.

> I am curious why you are not using SPACE (32) as your delineator, as opposed to the colon, which forces your logic to have to be a bit trickier here and there. Maybe I'll see why later...

Space is not a valid character in memcache keys, but colon is. I'm also using a mixture of colon and comma because I think I have seen memcache reporting tools using this to summarize memcache utilization, but it doesn't really matter provided it isn't a number or one of the magic tokens like 'p' or 'a'. These tools might be a figment of an overactive imagination.

> The way you are handling repeats is very interesting. It's an interesting problem. I first thought that your approach of adding one to the counter_key in the request annotations would not work very well in the case of nested loops, because if something changed, then it would do a very odd cascade that might put an old sub-item from one top-level section into another top-level section entirely. However, if a collection changes significantly from one repeat to the next--an item is inserted somewhere, rather than appended, in particular--you are kind of hosed anyway. I guess I'm fine with what you have, though I'd like it if you added a warning in the doctest that cacheing things in a repeat is perhaps a less appealing prospect than some others.

I can't see how the approach for loops would cause any weirdness or fail that isn't already an issue outside of loops. If you retrieve a list of bugs from the live database but used cached renderings, the two might not match. This is similar to any other case where you are mixing live information with cached information. If that isn't what you meant, I do not follow your reasoning.

> OK, I asked why you are not using SPACE (32) as your delineator, and now I see those colons in "pt:%s:%s,%s:%s:%d,%d:%d,%s". I see commas too, though. Why can't we use spaces for everything? Contrariwise, if we are separating with colons and commas, why are you not excluding commas from

I'm thinking we can use colon to report on memcached utilization. I'm using colon where it seemed sensible to create a subdivision, and comma where it wasn't. This was just a guess though, as these reporting tools are still science fiction.

> your valid characters? Contrariwise to both of those, or perhaps perpendicularly to them, if the url is at the end, we know that any colon or comma after the ones used in our string formatting pattern belong to the url, so why do we escape any of the delimiters? And then, to add to the excitement, why do we not escape any of the other values in that key--how sure are you that the user id will not have a colon or comma, for instance?

I want to escape colons in the URL to avoid confusing the fictional reporting tools. Nothing else in the key can generate a character that needs sanitizing - the user id is an integer for example.

> To state the obvious, getKey needs to be very fast or else it will be difficult for this to be a performance win. It looks pretty reasonable to me, though. ISTR that calculating an md5 hash is pretty darn fast. I'll not worry about it.

Its probably faster than any suitable hash we come up with inhouse anyway. I chose md5 over sha1 due to my assumption that md5 would be faster. I never timed it though.

> You have a nice __repr__ on the MemcacheMiss, while you have what I interpret to be a vestigial __unicode__ (before you monkeypatched evaluateText) on MemcacheHit. Would it be reasonable to change MemcacheHit to drop the __unicode__ and add a __repr__? If so, please do.

I've dropped the __unicode__. I didn't add __repr__ as there isn't enough suitable information in the class to do a better job than the default __repr__ (the cached text itself isn't suitable as it might be huge). It was cruft from an earlier attempt to keep changes to TALInterpreter minimal.

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

On Sat, Feb 27, 2010 at 5:47 AM, Gary Poster <gary.poster@canonical.com> wrote:

> Might as well clean up the ``#level debug`` comments in launchpad.conf.

Done.

> I question including the "anonymous" visibility.  You bring up a good reason for not including it.  Why don't we just exclude it this time around?  It feels like something that might be interesting for upstream but not for right now--and like something that people will use unnecessarily.  (On the other hand, if you want to push back, in the interest of saving you from waiting for me to wake up on the other side of the globe, you may merely imagine me acquiescing, and keep it, if you like.)

> Line 383 of the diff incorrectly describes the syntax, AIUI: <div tal:content="cache:1h public">.  Please correct it.

Fixed.

> I would be tempted to try to provide a more helpful error message when someone includes too many commas (re "self.visibility, max_age = (s.strip() for s in expr.split(','))" from line 392 of the diff.  Line 403 ("value, unit = max_age.split(' ')") is similar.  If you can't be bothered, I'll look the other way.

Fixed.

> You say that "units is one of 'seconds', 'minutes', 'hours' or 'days'."  However, you accept s*, m*, h* and d*.  You are much stricter about the visibility strings.  I'm inclined to favor enforcing readability, and enforce the full strings.  This is negotiable (that is, under the circumstances, you may hear that as "ignorable" if you wish) but I feel more strongly about this one than some others I've brought up with similar deference.

Ok. I did that because it made it simpler to accept plural forms. Fixed.

> You generate _valid_key_charactersbut then you never use it.  Maybe delete it?

Yes - that was cruft from earlier work.

> I am curious why you are not using SPACE (32) as your delineator, as opposed to the colon, which forces your logic to have to be a bit trickier here and there.  Maybe I'll see why later...

> The way you are handling repeats is very interesting.  It's an interesting problem.  I first thought that your approach of adding one to the counter_key in the request annotations would not work very well in the case of nested loops, because if something changed, then it would do a very odd cascade that might put an old sub-item from one top-level section into another top-level section entirely.  However, if a collection changes significantly from one repeat to the next--an item is inserted somewhere, rather than appended, in particular--you are kind of hosed anyway.  I guess I'm fine with what you have, though I'd like it if you added a warning in the doctest that cacheing things in a repeat is perhaps a less appealing prospect than some others.

> OK, I asked why you are not using SPACE (32) as your delineator, and now I see those colons in "pt:%s:%s,%s:%s:%d,%d:%d,%s".  I see commas too, though.  Why can't we use spaces for everything?  Contrariwise, if we are separating with colons and commas, why are you not excluding commas from

> your valid characters? Contrariwise to both of those, or perhaps perpendicularly to them, if the url is at the end, we know that any colon or comma after the ones used in our string formatting pattern belong to the url, so why do we escape any of the delimiters?  And then, to add to the excitement, why do we not escape any of the other values in that key--how sure are you that the user id will not have a colon or comma, for instance?

I want to escape colons in the URL to avoid confusing the fictional reporting tools. Nothing else in the key can generate a character that needs sanitizing - the user id is an integer for example.

> To state the obvious, getKey needs to be very fast or else it will be difficult for this to be a performance win. It looks pretty reasonable to me, though.  ISTR that calculating an md5 hash is pretty darn fast.  I'll not worry about it.

Its probably faster than any suitable hash we come up with inhouse anyway. I chose md5 over sha1 due to my assumption that md5 would be faster. I never timed it though.

> You have a nice __repr__ on the MemcacheMiss, while you have what I interpret to be a vestigial __unicode__ (before you monkeypatched evaluateText) on MemcacheHit.  Would it be reasonable to change MemcacheHit to drop the __unicode__ and add a __repr__?  If so, please do.

-- 
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

« Back to merge proposal