Merge lp:~jelmer/bzr/export-use-tree-timestamp into lp:bzr

Proposed by Jelmer Vernooij
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jelmer/bzr/export-use-tree-timestamp
Merge into: lp:bzr
Diff against target: 302 lines (+89/-25)
8 files modified
NEWS (+8/-0)
bzrlib/builtins.py (+6/-2)
bzrlib/export/__init__.py (+10/-5)
bzrlib/export/dir_exporter.py (+7/-3)
bzrlib/export/tar_exporter.py (+17/-10)
bzrlib/export/zip_exporter.py (+9/-4)
bzrlib/tests/blackbox/test_export.py (+10/-0)
bzrlib/tests/test_export.py (+22/-1)
To merge this branch: bzr merge lp:~jelmer/bzr/export-use-tree-timestamp
Reviewer Review Type Date Requested Status
Martin Pool Needs Fixing
Robert Collins (community) code Needs Fixing
Bazaar Developers code Pending
Review via email: mp+20865@code.launchpad.net

Description of the change

This adds an extra argument ``use_tree_timestamp`` to the exporters, to use the commit time of the commit in which a file revision was introduced as the mtime when exporting. This is useful as a means of getting deterministic exports (i.e. the generated tarball from a particular revision is always the same).

use_tree_timestamps is not the default for performance reasons, but the provided infrastructure is useful for bzr-builddeb.

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

 review: needsfixing

perhaps per_file_timestamps - the source is the tree, but its the
lookup-per-rev that will hurt.

Crucially the docstring change should be a little more verbose about
this, so that folk reading it can predict the behaviour/tradeoff.

Should this be exposed in 'bzr export's command line?

Cheers,
Rob

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jelmer Vernooij wrote:
> Jelmer Vernooij has proposed merging lp:~jelmer/bzr/export-use-tree-timestamp into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
>
> This adds an extra argument ``use_tree_timestamp`` to the exporters, to use the commit time of the commit in which a file revision was introduced as the mtime when exporting. This is useful as a means of getting deterministic exports (i.e. the generated tarball from a particular revision is always the same).
>
> use_tree_timestamps is not the default for performance reasons, but the provided infrastructure is useful for bzr-builddeb.
>

Note that using 'tree.get_file_mtime()' for each file is a bit of a
shame. Specifically, the api doesn't really do any caching (and it isn't
very easy to do so).

If we wanted it to perform well, doing 1 pass over the inventory to find
revisions we will care about, doing another request to read all of those
revisions in 1 pass, should actually be pretty decent. (log can display
all revisions in generally a rather decent amount of time, this should
be no slower than that.)

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkuVKG0ACgkQJdeBCYSNAAMtNQCgqkS8W/byYFqC3XpeK9QeQCjV
fVcAmgNaJOolG9yNfUFmqnOwEwuYYLDQ
=b4zc
-----END PGP SIGNATURE-----

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

> Jelmer Vernooij wrote:
> > Jelmer Vernooij has proposed merging lp:~jelmer/bzr/export-use-tree-
> timestamp into lp:bzr.
> >
> > Requested reviews:
> > bzr-core (bzr-core)
> >
> >
> > This adds an extra argument ``use_tree_timestamp`` to the exporters, to use
> the commit time of the commit in which a file revision was introduced as the
> mtime when exporting. This is useful as a means of getting deterministic
> exports (i.e. the generated tarball from a particular revision is always the
> same).
> >
> > use_tree_timestamps is not the default for performance reasons, but the
> provided infrastructure is useful for bzr-builddeb.
> >
>
> Note that using 'tree.get_file_mtime()' for each file is a bit of a
> shame. Specifically, the api doesn't really do any caching (and it isn't
> very easy to do so).
>
> If we wanted it to perform well, doing 1 pass over the inventory to find
> revisions we will care about, doing another request to read all of those
> revisions in 1 pass, should actually be pretty decent. (log can display
> all revisions in generally a rather decent amount of time, this should
> be no slower than that.)
Yeah,I realize that. That seemed like significantly more work though so it's something I'd rather do in a separate patch.

This at least adds the functionality, but doesn't have any performance consequences for the default behaviour. The overhead at the moment doesn't seem too bad. On a Samba tree with a warm cache:

./bzr export x.tar.gz ~/src/samba/trunk 15.58s user 0.28s system 96% cpu 16.374 total
./bzr export --per-file-timestamps x.tar.gz ~/src/samba/trunk 19.06s user 0.43s system 86% cpu 22.629 total
./bzr ls --versioned --recursive ~/src/samba/trunk | wc -l
7921

Cheers,

Jelmer

Revision history for this message
Martin Pool (mbp) wrote :

I see you renamed the option per Robert's suggestion, but you need to also update the news!

I'm fine for you to make this faster in a later patch.

review: Needs Fixing
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

> I see you renamed the option per Robert's suggestion, but you need to also
> update the news!
>
> I'm fine for you to make this faster in a later patch.
I've fixed NEWS.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-03-11 11:07:32 +0000
3+++ NEWS 2010-03-11 18:02:25 +0000
4@@ -37,6 +37,10 @@
5 New Features
6 ************
7
8+* ``bzr export`` now takes an optional argument ``--per-file-timestamps``
9+ to set file mtimes to the last timestamp of the last revision in which
10+ they were changed rather than the current time. (Jelmer Vernooij)
11+
12 * If the Apport crash-reporting tool is available, bzr crashes are now
13 stored into the ``/var/crash`` apport spool directory, and the user is
14 invited to report them to the developers from there, either
15@@ -147,6 +151,10 @@
16 to be done separately.
17 (Martin Pool, #507710)
18
19+* Exporters now support a ``per_file_timestamps`` argument to write out the
20+ timestamp of the commit in which a file revision was introduced.
21+ (Jelmer Vernooij)
22+
23 * New method ``BzrDir.list_branches()`` that returns a sequence of branches
24 present in a control directory. (Jelmer Vernooij)
25
26
27=== modified file 'bzrlib/builtins.py'
28--- bzrlib/builtins.py 2010-03-03 08:25:15 +0000
29+++ bzrlib/builtins.py 2010-03-11 18:02:25 +0000
30@@ -2806,9 +2806,12 @@
31 Option('root',
32 type=str,
33 help="Name of the root directory inside the exported file."),
34+ Option('per-file-timestamps',
35+ help='Set modification time of files to that of the last '
36+ 'revision in which it was changed.'),
37 ]
38 def run(self, dest, branch_or_subdir=None, revision=None, format=None,
39- root=None, filters=False):
40+ root=None, filters=False, per_file_timestamps=False):
41 from bzrlib.export import export
42
43 if branch_or_subdir is None:
44@@ -2821,7 +2824,8 @@
45
46 rev_tree = _get_one_revision_tree('export', revision, branch=b, tree=tree)
47 try:
48- export(rev_tree, dest, format, root, subdir, filtered=filters)
49+ export(rev_tree, dest, format, root, subdir, filtered=filters,
50+ per_file_timestamps=per_file_timestamps)
51 except errors.NoSuchExportFormat, e:
52 raise errors.BzrCommandError('Unsupported export format: %s' % e.format)
53
54
55=== modified file 'bzrlib/export/__init__.py'
56--- bzrlib/export/__init__.py 2010-01-29 12:04:38 +0000
57+++ bzrlib/export/__init__.py 2010-03-11 18:02:25 +0000
58@@ -19,7 +19,6 @@
59 Such as non-controlled directories, tarfiles, zipfiles, etc.
60 """
61
62-from bzrlib.trace import mutter
63 import os
64 import bzrlib.errors as errors
65
66@@ -55,14 +54,16 @@
67
68 When requesting a specific type of export, load the respective path.
69 """
70- def _loader(tree, dest, root, subdir, filtered):
71+ def _loader(tree, dest, root, subdir, filtered, per_file_timestamps):
72 mod = __import__(module, globals(), locals(), [funcname])
73 func = getattr(mod, funcname)
74- return func(tree, dest, root, subdir, filtered=filtered)
75+ return func(tree, dest, root, subdir, filtered=filtered,
76+ per_file_timestamps=per_file_timestamps)
77 register_exporter(scheme, extensions, _loader)
78
79
80-def export(tree, dest, format=None, root=None, subdir=None, filtered=False):
81+def export(tree, dest, format=None, root=None, subdir=None, filtered=False,
82+ per_file_timestamps=False):
83 """Export the given Tree to the specific destination.
84
85 :param tree: A Tree (such as RevisionTree) to export
86@@ -81,6 +82,9 @@
87 a directory to start exporting from.
88 :param filtered: If True, content filtering is applied to the
89 files exported.
90+ :param per_file_timestamps: Whether to use the timestamp stored in the
91+ tree rather than now(). This will do a revision lookup
92+ for every file so will be significantly slower.
93 """
94 global _exporters, _exporter_extensions
95
96@@ -99,7 +103,8 @@
97 raise errors.NoSuchExportFormat(format)
98 tree.lock_read()
99 try:
100- return _exporters[format](tree, dest, root, subdir, filtered=filtered)
101+ return _exporters[format](tree, dest, root, subdir, filtered=filtered,
102+ per_file_timestamps=per_file_timestamps)
103 finally:
104 tree.unlock()
105
106
107=== modified file 'bzrlib/export/dir_exporter.py'
108--- bzrlib/export/dir_exporter.py 2010-02-23 07:43:11 +0000
109+++ bzrlib/export/dir_exporter.py 2010-03-11 18:02:25 +0000
110@@ -18,7 +18,6 @@
111
112 import errno
113 import os
114-import StringIO
115 import time
116
117 from bzrlib import errors, osutils
118@@ -30,7 +29,8 @@
119 from bzrlib.trace import mutter
120
121
122-def dir_exporter(tree, dest, root, subdir, filtered=False):
123+def dir_exporter(tree, dest, root, subdir, filtered=False,
124+ per_file_timestamps=False):
125 """Export this tree to a new directory.
126
127 `dest` should either not exist or should be empty. If it does not exist it
128@@ -92,4 +92,8 @@
129 out.writelines(chunks)
130 finally:
131 out.close()
132- os.utime(fullpath, (now, now))
133+ if per_file_timestamps:
134+ mtime = tree.get_file_mtime(tree.path2id(relpath), relpath)
135+ else:
136+ mtime = now
137+ os.utime(fullpath, (mtime, mtime))
138
139=== modified file 'bzrlib/export/tar_exporter.py'
140--- bzrlib/export/tar_exporter.py 2009-03-23 14:59:43 +0000
141+++ bzrlib/export/tar_exporter.py 2010-03-11 18:02:25 +0000
142@@ -17,13 +17,12 @@
143 """Export a Tree to a non-versioned directory.
144 """
145
146-import os
147 import StringIO
148 import sys
149 import tarfile
150 import time
151
152-from bzrlib import errors, export, osutils
153+from bzrlib import export, osutils
154 from bzrlib.export import _export_iter_entries
155 from bzrlib.filters import (
156 ContentFilterContext,
157@@ -32,7 +31,8 @@
158 from bzrlib.trace import mutter
159
160
161-def tar_exporter(tree, dest, root, subdir, compression=None, filtered=False):
162+def tar_exporter(tree, dest, root, subdir, compression=None, filtered=False,
163+ per_file_timestamps=False):
164 """Export this tree to a new tar file.
165
166 `dest` will be created holding the contents of this tree; if it
167@@ -52,7 +52,10 @@
168 for dp, ie in _export_iter_entries(tree, subdir):
169 filename = osutils.pathjoin(root, dp).encode('utf8')
170 item = tarfile.TarInfo(filename)
171- item.mtime = now
172+ if per_file_timestamps:
173+ item.mtime = tree.get_file_mtime(ie.file_id, dp)
174+ else:
175+ item.mtime = now
176 if ie.kind == "file":
177 item.type = tarfile.REGTYPE
178 if tree.is_executable(ie.file_id):
179@@ -89,9 +92,13 @@
180 ball.close()
181
182
183-def tgz_exporter(tree, dest, root, subdir, filtered=False):
184- tar_exporter(tree, dest, root, subdir, compression='gz', filtered=filtered)
185-
186-
187-def tbz_exporter(tree, dest, root, subdir, filtered=False):
188- tar_exporter(tree, dest, root, subdir, compression='bz2', filtered=filtered)
189+def tgz_exporter(tree, dest, root, subdir, filtered=False,
190+ per_file_timestamps=False):
191+ tar_exporter(tree, dest, root, subdir, compression='gz',
192+ filtered=filtered, per_file_timestamps=per_file_timestamps)
193+
194+
195+def tbz_exporter(tree, dest, root, subdir, filtered=False,
196+ per_file_timestamps=False):
197+ tar_exporter(tree, dest, root, subdir, compression='bz2',
198+ filtered=filtered, per_file_timestamps=per_file_timestamps)
199
200=== modified file 'bzrlib/export/zip_exporter.py'
201--- bzrlib/export/zip_exporter.py 2010-02-17 17:11:16 +0000
202+++ bzrlib/export/zip_exporter.py 2010-03-11 18:02:25 +0000
203@@ -42,7 +42,8 @@
204 _DIR_ATTR = stat.S_IFDIR | ZIP_DIRECTORY_BIT
205
206
207-def zip_exporter(tree, dest, root, subdir, filtered=False):
208+def zip_exporter(tree, dest, root, subdir, filtered=False,
209+ per_file_timestamps=False):
210 """ Export this tree to a new zip file.
211
212 `dest` will be created holding the contents of this tree; if it
213@@ -62,11 +63,15 @@
214
215 # zipfile.ZipFile switches all paths to forward
216 # slashes anyway, so just stick with that.
217+ if per_file_timestamps:
218+ mtime = tree.get_file_mtime(ie.file_id, dp)
219+ else:
220+ mtime = now
221 filename = osutils.pathjoin(root, dp).encode('utf8')
222 if ie.kind == "file":
223 zinfo = zipfile.ZipInfo(
224 filename=filename,
225- date_time=now)
226+ date_time=mtime)
227 zinfo.compress_type = compression
228 zinfo.external_attr = _FILE_ATTR
229 if filtered:
230@@ -84,14 +89,14 @@
231 # not just empty files.
232 zinfo = zipfile.ZipInfo(
233 filename=filename + '/',
234- date_time=now)
235+ date_time=mtime)
236 zinfo.compress_type = compression
237 zinfo.external_attr = _DIR_ATTR
238 zipf.writestr(zinfo,'')
239 elif ie.kind == "symlink":
240 zinfo = zipfile.ZipInfo(
241 filename=(filename + '.lnk'),
242- date_time=now)
243+ date_time=mtime)
244 zinfo.compress_type = compression
245 zinfo.external_attr = _FILE_ATTR
246 zipf.writestr(zinfo, ie.symlink_target)
247
248=== modified file 'bzrlib/tests/blackbox/test_export.py'
249--- bzrlib/tests/blackbox/test_export.py 2010-02-17 17:11:16 +0000
250+++ bzrlib/tests/blackbox/test_export.py 2010-03-11 18:02:25 +0000
251@@ -293,3 +293,13 @@
252 tree.commit('more setup')
253 out, err = self.run_bzr('export exported branch/subdir')
254 self.assertEqual(['foo.txt'], os.listdir('exported'))
255+
256+ def test_dir_export_per_file_timestamps(self):
257+ tree = self.example_branch()
258+ self.build_tree_contents([('branch/har', 'foo')])
259+ tree.add('har')
260+ tree.commit('setup', timestamp=42)
261+ self.run_bzr('export --per-file-timestamps t branch')
262+ har_st = os.stat('t/har')
263+ self.assertEquals(42, har_st.st_mtime)
264+
265
266=== modified file 'bzrlib/tests/test_export.py'
267--- bzrlib/tests/test_export.py 2010-02-04 16:06:36 +0000
268+++ bzrlib/tests/test_export.py 2010-03-11 18:02:25 +0000
269@@ -20,7 +20,6 @@
270 from bzrlib import (
271 errors,
272 export,
273- osutils,
274 tests,
275 )
276
277@@ -99,3 +98,25 @@
278 st_b = t.stat('b')
279 # All files must be given the same mtime.
280 self.assertEqual(st_a.st_mtime, st_b.st_mtime)
281+
282+ def test_dir_export_files_per_file_timestamps(self):
283+ builder = self.make_branch_builder('source')
284+ builder.start_series()
285+ builder.build_snapshot(None, None, [
286+ ('add', ('', 'root-id', 'directory', '')),
287+ ('add', ('a', 'a-id', 'file', 'content\n'))],
288+ timestamp=3423)
289+ builder.build_snapshot(None, None, [
290+ ('add', ('b', 'b-id', 'file', 'content\n'))],
291+ timestamp=42)
292+ builder.finish_series()
293+ b = builder.get_branch()
294+ b.lock_read()
295+ self.addCleanup(b.unlock)
296+ tree = b.basis_tree()
297+ export.export(tree, 'target', format='dir', per_file_timestamps=True)
298+ t = self.get_transport('target')
299+ st_a = t.stat('a')
300+ st_b = t.stat('b')
301+ self.assertEqual(42.0, st_b.st_mtime)
302+ self.assertEqual(3423.0, st_a.st_mtime)