loggerhead

Merge lp:~mkanat/loggerhead/synchronize-lru_cache into lp:loggerhead

synchronize-lru_cache
Merge into trunk-rich

Proposed by Max Kanat-Alexander on 2010-04-23

Status:

Superseded

Proposed branch:

lp:~mkanat/loggerhead/synchronize-lru_cache

Merge into:

lp:loggerhead

Diff against target:

39 lines (+13/-2)

1 file modified

loggerhead/history.py (+13/-2)

To merge this branch:

bzr merge lp:~mkanat/loggerhead/synchronize-lru_cache

Critical

Fix Released

Link a bug report

Reviewer	Date Requested	Status
Max Kanat-Alexander (community)		Needs Fixing on 2010-05-04
Matt Nordhoff	2010-04-23	Approve on 2010-04-26
Review via email: mp+23977@code.launchpad.net

This proposal has been superseded by a proposal from 2010-05-04.

Description of the change

Fixes Bug #420738 by synchronizing all our access to lru_cache, which is not thread-safe.

In testing, I don't see any significant performance degradation under access from many different users accessing several different branches.

Revision history for this message

Ian Clatworthy (ian-clatworthy) wrote on 2010-04-23:

Max,

Well done on tracking down the problem and thanks for the patch. Aren't threading problems fun? :-)

My main concern with the proposed change is that I'd prefer the Cache object to have the intelligence to Do The Right Thing wrt locking, rather than it being left up to the client to guard access to it. As it stands, we're deciding to use an LRU cache in apps/branch.py and paying the consequences through increased code complexity several layers deeper. What do you think about having a decorator class that provides the synchronized access? E.g. changing the cache creation line to be something like ...

graph_cache = ThreadSafeCache(bzrlib.lru_cache.LRUCache(10))

and introducing ThreadSafeCache() as the layer with the RLock?

OTOH, this bug is critical and perhaps we ought to just land it as is, particularly as John is looking to rework the caching architecture Real Soon Now anyhow.

Any second opinions?

Revision history for this message

Max Kanat-Alexander (mkanat) wrote on 2010-04-23:

> Well done on tracking down the problem and thanks for the patch. Aren't
> threading problems fun? :-)

Fun indeed. :-)

> What do you think
> about having a decorator class that provides the synchronized access? E.g.
> changing the cache creation line to be something like ...
>
> graph_cache = ThreadSafeCache(bzrlib.lru_cache.LRUCache(10))
>
> and introducing ThreadSafeCache() as the layer with the RLock?

I thought about that, and the problem is that it would depend to some degree on implementation details of lru_cache--if the client started using it in different ways, or if the lru_cache implementation changed in some ways, we'd have to start adding locks in different places.

John has already said that he's not going to make the LRUCache itself thread-safe (see the bug), and the only accesses we make into to it are via this interface, so I think the simplest and most forward-compatible solution is to put the locking in our client code.

lp:~mkanat/loggerhead/synchronize-lru_cache updated on 2010-04-24

410. By Matt Nordhoff on 2010-04-24

Fix infinite recursion when unpickling Containers.

Unpickle tries to access a few attributes (__getstate__, etc.) before filling the __dict__, so __getattr__ went into a loop trying to find self._properties.

It's no longer possible to use properties that start with an underscore, but that's a bit icky anyway.

411. By Matt Nordhoff on 2010-04-24

Don't let properties that start with _ be created in the first place.

Revision history for this message

Matt Nordhoff (mnordhoff) wrote on 2010-04-26:

Sounds fine to me, and I've been running it for a few days without problems.

review: Approve

lp:~mkanat/loggerhead/synchronize-lru_cache updated on 2010-04-27

412. By Michael Hudson-Doyle on 2010-04-26: merge ~mkanat/loggerhead/synchronize-lru_cache
413. By John A Meinel on 2010-04-27: Cache the Branch reverse-tag dict, since Branch does not.

Revision history for this message

John A Meinel (jameinel) wrote on 2010-05-01:

I should comment here a little bit. I'm a bit surprised that this actually handles what we want it to.

Specifically, RevInfoMemoryCache is created for each History object created, and a new one is created for each HTTP request (if I sorted out my info correctly.)

As such, *each* HTTP thread ends up with its own lock around the cache. And you don't end up doing much of any cross-thread protection, which was the whole point.

I could probably be convinced to add a potential locking primitive to LRUCache. I wanted to avoid the overhead, but we can just default to None, and allow callers to set a Lock object that we'll respect. Something that says "if self._lock is not None: self._lock.acquire()".

I think that __getitem__ can be protected after the "if node is mru:" check. If two threads are accessing different items concurrently, there is no clear-cut MRU, and at worst the one that thinks it is already 'most recent' will just be 1 step behind.

If this *does* seem to fix the problem, my guess is that it only does so by playing tricks with threads and the GIL, and causing sync points. (aka, by accident)

Revision history for this message

John A Meinel (jameinel) wrote on 2010-05-01:

Check out: lp:///~jameinel/bzr/lru_locking

It still needs tests, and possibly 'live-fire' testing, hammering the cache structure with lots of threads. However, I'm fairly sure that I've gotten a minimal locked section (we lock as little as possible), while still maintaining correctness.

LRUCache.add() is not specifically thread-safe if you have cleanups and are adding different values for the same key. We might want to consider putting a big lock around all of add() for that case, but callers could also do that...

The main reason I poked at it was because you can't get 'fine-grained' locking around the LRU double-linked-list unless it is implemented in the cache itself.

lp:~mkanat/loggerhead/synchronize-lru_cache updated on 2010-05-01

414. By Matt Nordhoff on 2010-05-01: Set Cache-Control and Expires headers on static pages. (John Arbash Meinel)

Revision history for this message

Max Kanat-Alexander (mkanat) wrote on 2010-05-04:

Okay, John is right--the bug is not fixed. I just ran into it on bzr.mozilla.org again.

For loggerhead, a quick workaround is to make the lock global instead of a per-instance item.

We do have cleanups, and I suppose it's possible for us to be updating an existing branch with a new cache, right? So I think we'll need this lock in loggerhead anyway.

review: Needs Fixing

lp:~mkanat/loggerhead/synchronize-lru_cache updated on 2010-05-04

415. By Max Kanat-Alexander on 2010-05-04: jam rightly pointed out that the RevInfoMemoryCache lock needs to be global

Unmerged revisions

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk

Subscribers

People subscribed via source and target branches

to all changes:

Colin Watson

James Barlow

John A Meinel

Matt Nordhoff

Max Kanat-Alexander

Michael Hudson-Doyle

Paul Hummer

jay ham

1	=== modified file 'loggerhead/history.py'
2	--- loggerhead/history.py 2010-04-14 17:54:32 +0000
3	+++ loggerhead/history.py 2010-04-23 02:53:12 +0000
4	@@ -168,6 +168,9 @@
5
6	def __init__(self, cache):
7	self._cache = cache
8	+ # lru_cache is not thread-safe, so we need to lock all accesses.
9	+ # It is even modified when doing a get() on it.
10	+ self._lock = threading.RLock()
11
12	def get(self, key, revid):
13	"""Return the data associated with `key`, subject to a revid check.
14	@@ -175,7 +178,11 @@
15	If a value was stored under `key`, with the same revid, return it.
16	Otherwise return None.
17	"""
18	- cached = self._cache.get(key)
19	+ self._lock.acquire()
20	+ try:
21	+ cached = self._cache.get(key)
22	+ finally:
23	+ self._lock.release()
24	if cached is None:
25	return None
26	stored_revid, data = cached
27	@@ -187,7 +194,11 @@
28	def set(self, key, revid, data):
29	"""Store `data` under `key`, to be checked against `revid` on get().
30	"""
31	- self._cache[key] = (revid, data)
32	+ self._lock.acquire()
33	+ try:
34	+ self._cache[key] = (revid, data)
35	+ finally:
36	+ self._lock.release()
37
38	# Used to store locks that prevent multiple threads from building a
39	# revision graph for the same branch at the same time, because that can

loggerhead

Merge lp:~mkanat/loggerhead/synchronize-lru_cache into lp:loggerhead

Commit message

Description of the change

Unmerged revisions

Preview Diff

Subscribers