Code review comment for lp:~leonardr/launchpad/test-representation-cache

Revision history for this message
Leonard Richardson (leonardr) wrote :

> 1) Raw SQL, at psql command line or executed by appservers or scripts.
> 2) Stored procedures
> 3) Store.remove()
> 4) Using Storm for bulk updates, inserts or deletions.
> 5) Scripts or appservers running with non-identical [memcache] config section.
> 6) Scripts or appservers unable to contact memcached servers
> 7) Network glitches
>
> We should probably ignore the last three. Are the first four a problem
> though for the webservice? If it doesn't really matter, fine.
> Otherwise we might need some ON DELETE and ON UPDATE database triggers
> to keep things in sync. Hopefully this is unnecessary, as it could
> well kill our performance gain.

Certainly the first four are a problem. So far we have yet to come to
terms with performance improvements that involve serving stale data from
the web service (or allowing users to use the possibly stale data they
already have).

Can we put the invalidation code in the ON DELETE/ON UPDATE trigger,
rather than in the Storm trigger? Would it hurt performance to just move
the invalidation down a layer?
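
One possible shape for a database-level hook, purely as a hedged sketch: this assumes the pgmemcache extension (which exposes memcache_delete() inside PostgreSQL) is available, which is not something this branch sets up, and the key format here just mirrors the key_for() doctest below:

```sql
-- Hypothetical sketch only: invalidate a cached representation from an
-- ON UPDATE / ON DELETE trigger, assuming the pgmemcache extension.
CREATE OR REPLACE FUNCTION invalidate_person_representation()
RETURNS trigger AS $$
BEGIN
    -- Key format mirrors key_for(): table name + primary key tuple,
    -- then media type and web service version.
    PERFORM memcache_delete(
        'Person(' || OLD.id || ',),media/type,web-service-version');
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER person_cache_invalidation
AFTER UPDATE OR DELETE ON Person
FOR EACH ROW EXECUTE PROCEDURE invalidate_person_representation();
```

This would catch raw SQL, stored procedures, Store.remove() and Storm bulk operations alike, since they all ultimately hit the table, but it couples every cached table to a trigger and to memcached availability.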

> + >>> ignore = webservice.get("/~salgado", api_version="devel").jsonBody()
> +
> + >>> cache.get_by_key(key)
> + '{...}'
>
> Is webservice.get making a real connection to the webservice here? I
> am curious if the cache is populated when the object is retrieved,
> rather than when the transaction that retrieved the object commits.

An HTTP request is being constructed and through Python machinations it
is invoking the web service code. But nothing's going over the network.

The cache is populated by the web service code during the GET handling.
Cache population shouldn't have anything to do with the database.

I hope this answers your question.
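
A minimal sketch of that flow, with an in-memory stand-in for memcached and hypothetical names (RepresentationCache, handle_get) that only illustrate the layering, not the real lazr.restful code:

```python
import json

class RepresentationCache:
    """In-memory stand-in for the memcached-backed representation cache."""
    def __init__(self):
        self._store = {}

    def get_by_key(self, key, default=None):
        return self._store.get(key, default)

    def set_by_key(self, key, value):
        self._store[key] = value

def handle_get(cache, key, build_representation):
    """Serve a cached JSON representation, populating the cache on a miss.

    Population happens here, inside GET handling -- no database commit
    is involved.
    """
    cached = cache.get_by_key(key)
    if cached is not None:
        return cached
    representation = json.dumps(build_representation())
    cache.set_by_key(key, representation)
    return representation

cache = RepresentationCache()
key = 'Person(29,),application/json,devel'
body = handle_get(cache, key, lambda: {'name': 'salgado'})
print(body)                          # {"name": "salgado"}
print(cache.get_by_key(key) == body)  # True
```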

> > + >>> cache_key = cache.key_for(
> > + ... person, 'media/type', 'web-service-version')
> > + >>> print person.id, cache_key
> > + 29 Person(29,),media/type,web-service-version
>
> Do we need the LPCONFIG here too? Or is it ok for edge & production to mix?
>
> Hmm... perhaps IMemcacheClient should grow a real API and
> automatically prepend the LPCONFIG - not sure if have a use case for
> edge and production sharing data, but it is convenient to have them
> share the same physical Memcached servers.

Aaaah, it's absolutely not OK for edge and production to mix data. The
JSON representations include http: links to one server or the other. I
can fix this in the branch by tacking the instance name on to the cache
key.

> Should we document the cases where this doesn't happen, or is this
> irrelevant to the webservice?
>
> store.execute(Update({Person.addressline1: 'Newer Address'},
> Person.name==person.name)) should make the cache stale.

I'd say that's relevant, and another argument for putting the cache
invalidation in a database hook rather than an ORM hook.

>
> > === added file 'lib/lp/services/memcache/restful.py'
>
> > +class MemcachedStormRepresentationCache(BaseRepresentationCache):
> > + """Caches lazr.restful representations of Storm objects in memcached."""
> > +
> > + def __init__(self):
> > + self.client = memcache_client_factory()
>
> You should be using getUtility(IMemcacheClient) here - otherwise we
> end up with two clients and twice as many sockets opened to each
> memcached. Or we need to refactor the factory to return a singleton
> (is that still a factory?)

I didn't know about getUtility(IMemcacheClient); I was copying code
from somewhere else. I'll fix it.
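
To illustrate the point with a toy stand-in for the Zope component registry (the real call is zope.component's getUtility(IMemcacheClient); the names below are fakes): every caller gets the same registered client, so only one set of sockets is opened per memcached server, whereas calling the factory again opens a second set.

```python
# Toy component registry: one registered utility, shared by all callers.
_utilities = {}

class FakeMemcacheClient:
    """Stand-in client that counts how many times it is constructed."""
    connections_opened = 0

    def __init__(self):
        FakeMemcacheClient.connections_opened += 1

def provide_utility(iface, component):
    _utilities[iface] = component

def get_utility(iface):
    return _utilities[iface]

provide_utility('IMemcacheClient', FakeMemcacheClient())

a = get_utility('IMemcacheClient')
b = get_utility('IMemcacheClient')
print(a is b)                                  # True
print(FakeMemcacheClient.connections_opened)   # 1

# By contrast, calling the factory twice yields two distinct clients:
c, d = FakeMemcacheClient(), FakeMemcacheClient()
print(FakeMemcacheClient.connections_opened)   # 3
```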

>
> > + def key_for(self, obj, media_type, version):
> > + storm_info = storm.info.get_obj_info(obj)
> > + table_name = storm_info.cls_info.table
> > + primary_key = tuple(var.get() for var in storm_info.primary_vars)
> > +
> > + key = (table_name + repr(primary_key)
> > + + ',' + media_type + ',' + str(version))
> > + return key
> > +
> > + def get_by_key(self, key, default=None):
> > + return self.client.get(key) or default
> > +
> > + def set_by_key(self, key, value):
> > + self.client.set(key, value)
> > +
> > + def delete_by_key(self, key):
> > + self.client.delete(key)
>
>
> We will likely need to calculate the key using the Storm class and the
> primary key value rather than an actual object instance. No need to
> add it now, but it might affect your API design.

Can you explain why? Is this related to the database hook? When I
_populate_ the cache, I have an object instance; I suppose I can use
storm_info to get the Storm class and the primary key value from it.
Since lazr.restful never calls the delete code itself, we can
implement it however we want.
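
A sketch of what that variant might look like, with a hypothetical key_for_class() that produces the same key shown in the doctest above (Person(29,),media/type,web-service-version) but from the table name and primary key value instead of a live instance, which is all a database-level invalidation hook would have:

```python
def key_for_class(table_name, primary_key, media_type, version):
    """Build the same cache key as key_for(), without an object instance.

    A database-level hook (e.g. an ON DELETE trigger path) has no Storm
    instance to hand, only the table and the primary key value.
    """
    return (table_name + repr(tuple(primary_key))
            + ',' + media_type + ',' + str(version))

key = key_for_class('Person', (29,), 'media/type', 'web-service-version')
print(key)  # Person(29,),media/type,web-service-version
```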

Leonard
