Merge lp:~jameinel/bzr/2.0.4-faster-export-343218 into lp:bzr/2.0

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.0.4-faster-export-343218
Merge into: lp:bzr/2.0
Diff against target: 77 lines (+32/-13)
2 files modified
NEWS (+5/-0)
bzrlib/export/dir_exporter.py (+27/-13)
To merge this branch: bzr merge lp:~jameinel/bzr/2.0.4-faster-export-343218
Reviewer Review Type Date Requested Status
Martin Pool Needs Fixing
Review via email: mp+16210@code.launchpad.net
To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

This is a fairly straightforward fix for bug #343218.

Basically, the dir_exporter code was looping over all entries, and reading the content one-by-one. Instead, this starts by creating all the directories, and then uses 'iter_file_bytes' to request the content for all files and export them.

All the tests still pass, and the performance for 'bzr export' locally can be 2:1 faster, over my local network, I saw 2min drop down to 25s. I didn't try against launchpad.

I'm proposing this against 2.0 because it is a fairly small and localized change. I'm planning a slightly more involved version for the 2.1 series. (refactor into a helper that can be used by the tar exporter, etc.) The main issue there is that we probably *do* care about the order of content for a tarball, espec if we are going to try to use it for the 'pristine tarball' stuff.

Revision history for this message
Martin Pool (mbp) wrote :

+ if executable:
+ os.chmod(fullpath, 0755)

I realize it's not new code but this is actually kind of wrong: if the user's umask is not 0222 executable files will end up with the wrong perms. It would be better to set it to 0777 when opening the file then the umask will have the correct effect, ie

out = open(...., mode=file_perms)

Aside from that it looks good.

For me this is borderline for 2.0: not changing user visible behaviour probably trumps improving performance. If we knew that people had already specifically wanted a fix in 2.0 rather than 2.1 that might tip the balance (do they?)

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote :

So you can't set file perms to a plain 'open()' call. The mode flag is 'rb' versus 'wb', etc.

You can, however use "os.open()" to get a raw file descriptor and then you get more flags. (os.open(, O_CREAT, 0777), etc.)

I'll change that.

However, my patch isn't a user-visible change (as I understand it). Or are you saying the chmod is a 2.0 borderline?

It *does* change the order that files will get created in the export. However, once they are in the filesystem, you can't really tell that. I did *not* change the ordering for tarball or zip exports.

Revision history for this message
Martin Pool (mbp) wrote :

2009/12/18 John A Meinel <email address hidden>:
> So you can't set file perms to a plain 'open()' call. The mode flag is 'rb' versus 'wb', etc.
>
> You can, however use "os.open()" to get a raw file descriptor and then you get more flags. (os.open(, O_CREAT, 0777), etc.)
>
> I'll change that.
>
> However, my patch isn't a user-visible change (as I understand it). Or are you saying the chmod is a 2.0 borderline?
>
> It *does* change the order that files will get created in the export. However, once they are in the filesystem, you can't really tell that. I did *not* change the ordering for tarball or zip exports.

oh, I thought (from some other comments) you might also be changing it
in tar files, which would be slightly user visible. Obviously this is
not, so it's ok.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> 2009/12/18 John A Meinel <email address hidden>:
>> So you can't set file perms to a plain 'open()' call. The mode flag is 'rb' versus 'wb', etc.
>>
>> You can, however use "os.open()" to get a raw file descriptor and then you get more flags. (os.open(, O_CREAT, 0777), etc.)
>>
>> I'll change that.
>>
>> However, my patch isn't a user-visible change (as I understand it). Or are you saying the chmod is a 2.0 borderline?
>>
>> It *does* change the order that files will get created in the export. However, once they are in the filesystem, you can't really tell that. I did *not* change the ordering for tarball or zip exports.
>
> oh, I thought (from some other comments) you might also be changing it
> in tar files, which would be slightly user visible. Obviously this is
> not, so it's ok.
>

I sent a mail to the list for discussing my 2.1 version of the patch,
asking what people thought about possibly changing the ordering.

Of course, nobody has responded to that...

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksq4OsACgkQJdeBCYSNAAOmlwCfRqedgu6YXLBhbQ3d1fxjPRZB
f70AoNQkGKw0bI9f5gZhdtq/p2ZD2clp
=fgyf
-----END PGP SIGNATURE-----

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-12-15 16:56:18 +0000
3+++ NEWS 2009-12-18 05:41:14 +0000
4@@ -20,6 +20,11 @@
5 Bug Fixes
6 *********
7
8+* ``bzr export dir`` now requests all file content as a record stream,
9+ rather than requsting the file content one file-at-a-time. This can make
10+ exporting over the network significantly faster (54min => 9min in one
11+ case). (John Arbash Meinel, #343218)
12+
13 Improvements
14 ************
15
16
17=== modified file 'bzrlib/export/dir_exporter.py'
18--- bzrlib/export/dir_exporter.py 2009-07-29 13:46:55 +0000
19+++ bzrlib/export/dir_exporter.py 2009-12-18 05:41:14 +0000
20@@ -1,4 +1,4 @@
21-# Copyright (C) 2005 Canonical Ltd
22+# Copyright (C) 2005, 2009 Canonical Ltd
23 #
24 # This program is free software; you can redistribute it and/or modify
25 # it under the terms of the GNU General Public License as published by
26@@ -52,21 +52,17 @@
27 raise errors.BzrError("Can't export tree to non-empty directory.")
28 else:
29 raise
30+ # Iterate everything, building up the files we will want to export, and
31+ # creating the directories and symlinks that we need.
32+ # This tracks (file_id, (destination_path, executable))
33+ # This matches the api that tree.iter_files_bytes() wants
34+ # Note in the case of revision trees, this does trigger a double inventory
35+ # lookup, hopefully it isn't too expensive.
36+ to_fetch = []
37 for dp, ie in _export_iter_entries(tree, subdir):
38 fullpath = osutils.pathjoin(dest, dp)
39 if ie.kind == "file":
40- if filtered:
41- chunks = tree.get_file_lines(ie.file_id)
42- filters = tree._content_filter_stack(dp)
43- context = ContentFilterContext(dp, tree, ie)
44- contents = filtered_output_bytes(chunks, filters, context)
45- content = ''.join(contents)
46- fileobj = StringIO.StringIO(content)
47- else:
48- fileobj = tree.get_file(ie.file_id)
49- osutils.pumpfile(fileobj, file(fullpath, 'wb'))
50- if tree.is_executable(ie.file_id):
51- os.chmod(fullpath, 0755)
52+ to_fetch.append((ie.file_id, (dp, tree.is_executable(ie.file_id))))
53 elif ie.kind == "directory":
54 os.mkdir(fullpath)
55 elif ie.kind == "symlink":
56@@ -80,3 +76,21 @@
57 else:
58 raise errors.BzrError("don't know how to export {%s} of kind %r" %
59 (ie.file_id, ie.kind))
60+ # The data returned here can be in any order, but we've already created all
61+ # the directories
62+ flags = os.O_CREAT | os.O_TRUNC | os.O_WRONLY | getattr(os, 'O_BINARY', 0)
63+ for (relpath, executable), chunks in tree.iter_files_bytes(to_fetch):
64+ if filtered:
65+ filters = tree._content_filter_stack(relpath)
66+ context = ContentFilterContext(relpath, tree, ie)
67+ chunks = filtered_output_bytes(chunks, filters, context)
68+ fullpath = osutils.pathjoin(dest, relpath)
69+ # We set the mode and let the umask sort out the file info
70+ mode = 0666
71+ if executable:
72+ mode = 0777
73+ out = os.fdopen(os.open(fullpath, flags, mode), 'wb')
74+ try:
75+ out.writelines(chunks)
76+ finally:
77+ out.close()

Subscribers

People subscribed via source and target branches