Comment 7 for bug 495000

John A Meinel (jameinel) wrote : Re: [Bug 495000] Re: Autopack fails with NoSuchFile error when committing concurrently

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gareth White wrote:
> I set up a new (treeless) shared repository and created a new branch. I
> then created a lightweight checkout and imported approx. 600 files,
> including some large ones so that the (uncompressed) tree size is about
> 250 MB. Note that this is not strictly necessary to reproduce the bug,
> but makes it easier as the autopack takes more time.
>
> I then created a second branch from this first one and made another
> lightweight checkout. I committed 8 times in the first checkout (so that
> the total number of revisions in the repo is 9) and then committed in
> both checkouts simultaneously. The first one succeeded, but the second
> one failed with the NoSuchFile error.
>
> The attached log is from the one that failed. The traceback is similar
> to the one I posted earlier using the smart server.
>
> ** Attachment added: "bzr.log"
> http://launchpadlibrarian.net/36812834/bzr.log
>

So your traceback seems a bit surprising. Namely, it looks like the
failure is occurring while renaming the pack files to the obsolete
directory. I have an idea what is going on, and why we only run into
this on Windows.

Specifically, the first autopack succeeds and finishes its operation. At
that point, though, the second autopack is running and is reading from
the files.

We try to rename the obsoleted pack files into another directory, but
the rename fails on Windows because the file is still open. (This
doesn't happen on POSIX systems because the inode abstraction means you
are allowed to do all sorts of things to open files, rename, delete,
etc, without bothering the process that has the file open.)

The code in question already handles the case where a file it expected
has been renamed out of the way, but it doesn't handle the case where
its own rename fails. (So process 2 is perfectly fine if the file it was
reading disappears, but process 1 doesn't know how to handle not being
able to move the file.)

So we need to sort out what we want to do. We could

a) Leave the file alone. It has already become unreferenced garbage (we
don't try to rename the file until after we've removed the reference to
it.) It won't break anything to have it there, but it does waste disk space.

b) Sleep briefly and try again. This won't fix all cases, and
eventually we should time out and stop trying. Actually, we probably
want to move on to renaming all the other files we can, and then come
back to this one. That way the other process can't go on to hold open
another pack file; in fact, seeing its files disappear will probably
make it stop what it is doing completely, freeing up this one.

We could do one loop, or 2, or 5, or X and then stop. I don't think we
want to try indefinitely.

It certainly can be identified after the fact that this is an
unreferenced file, though currently we don't have that action
specifically exposed.

c) Start referencing the file again. Which would help it get cleaned out
in a later autopack/pack cycle. I don't really like this, as the second
process is probably thinking about getting rid of it, and we are
preventing that process from realizing that it should stop trying.
(someone else already did the autopacking work.)
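
To make (a) and (b) a bit more concrete, here is a rough sketch of a
bounded retry that falls back to leaving the file alone. It is not the
actual bzrlib code: the function name, its parameters, and the defaults
(3 passes, 0.1s sleep) are all made up for illustration. It tries every
file once, defers the ones that fail, and only then comes back to them.

```python
import os
import time


def move_obsolete_packs(paths, obsolete_dir, max_passes=3, delay=0.1,
                        rename=os.rename, sleep=time.sleep):
    """Hypothetical helper: move each path into obsolete_dir.

    Files whose rename fails (e.g. still open in another process on
    Windows) are deferred until every other file has been tried, then
    retried for at most max_passes passes in total.

    Returns the paths that could not be moved. Those are left in place
    as unreferenced garbage, which is option (a).
    """
    pending = list(paths)
    for attempt in range(max_passes):
        failed = []
        for path in pending:
            target = os.path.join(obsolete_dir, os.path.basename(path))
            try:
                rename(path, target)
            except OSError:
                # Don't block on this file; move on to the others first.
                failed.append(path)
        pending = failed
        if not pending:
            break
        if attempt + 1 < max_passes:
            # Give the other process a moment to notice and let go.
            sleep(delay)
    return pending
```

The rename and sleep arguments are only there so the loop can be
exercised without a real filesystem. Whatever comes back in the returned
list would still need to be cleaned up by some later action.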

I should also note that the rate at which you'll encounter this should
be quite low. Autopack triggers with "exponential backoff". So when
you've done 10 commits, we autopack those 10 pack files together. After
another 10 commits, we only autopack the most recent 10 pack files
(leaving you with two 10-revision packs). We won't touch that really big
first import again until you've gotten to 100 commits, and after that we
won't touch it again until you've gotten to 1000, etc.
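
One way to picture the backoff is that the pack sizes we aim for follow
the decimal digits of the revision count. The sketch below is only an
illustration of that arithmetic (the function name is invented, and it
is not meant as the real implementation):

```python
def target_pack_sizes(total_revisions):
    """Illustrative only: pack sizes implied by the revision count.

    Each decimal digit d at position 10**k contributes d packs holding
    10**k revisions each, largest packs first.
    """
    sizes = []
    for exponent, digit in enumerate(reversed(str(total_revisions))):
        sizes.extend([10 ** exponent] * int(digit))
    return sorted(sizes, reverse=True)


# 9 commits: nine single-revision packs, nothing combined yet.
nine = target_pack_sizes(9)
# 10 commits: everything is combined into one 10-revision pack.
ten = target_pack_sizes(10)
# 20 commits: two 10-revision packs; the first one was not rewritten.
twenty = target_pack_sizes(20)
# 100 commits: the first time the original big pack gets touched again.
hundred = target_pack_sizes(100)
```

So under this model the pack holding the initial import is only
rewritten when the count rolls over to the next power of ten.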

So while initial testing looks like it runs all the time, it is really
only the first 10 commits that trigger what you are seeing with any
frequency. We've also talked about changing autopack to take into
account the size of the pack files rather than just the number of revs
(rev 1 is always much bigger than every other commit). It takes a bit to
get right, especially over remote transports, etc.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksnpRkACgkQJdeBCYSNAANrfQCgt1Nbvr/uF8jZR5XckH+5YLVD
LlUAn3Kk4BMUatu/mjSNdacUKeDhs2hV
=Wv7U
-----END PGP SIGNATURE-----