Merge lp:~wgrant/launchpad/gzip-n into lp:launchpad

Proposed by William Grant
Status: Merged
Merged at revision: 18220
Proposed branch: lp:~wgrant/launchpad/gzip-n
Merge into: lp:launchpad
Diff against target: 42 lines (+12/-2)
2 files modified
lib/lp/archivepublisher/tests/test_repositoryindexfile.py (+8/-1)
lib/lp/archivepublisher/utils.py (+4/-1)
To merge this branch: bzr merge lp:~wgrant/launchpad/gzip-n
Reviewer: Colin Watson (community)
Status: Approve
Review via email: mp+307500@code.launchpad.net

Commit message

Fix RepositoryIndexFile to gzip without timestamps.

Description of the change

Fix RepositoryIndexFile to gzip without timestamps.

Avoids polluting by-hash directories with dozens of gzipped files that differ
only in their embedded timestamp when the index content doesn't otherwise
change. Also prevents some needless hash sum mismatch errors from apt.

See e.g. the 30-odd 544-byte gzips in http://ppa.launchpad.net/varlesh-l/test/ubuntu/dists/xenial/main/source/by-hash/SHA256/.
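To illustrate the by-hash problem: a gzip header records a modification time, so compressing the same index at two different moments produces different bytes, and therefore different SHA256 sums, even though the decompressed content is identical. The following is a minimal Python 3 sketch, not Launchpad code. It assumes an in-memory buffer stands in for the temporary index file, and it uses a made-up index snippet and arbitrary timestamps. `gzip_bytes` and `sha256` are hypothetical helpers.

```python
import gzip
import hashlib
import io


def gzip_bytes(data, mtime):
    """Hypothetical helper: gzip data in memory, blank filename, given mtime."""
    buf = io.BytesIO()
    # mode='wb' is required: BytesIO has no .mode for GzipFile to infer.
    with gzip.GzipFile(filename='', mode='wb', fileobj=buf,
                       mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def sha256(blob):
    return hashlib.sha256(blob).hexdigest()


# Made-up index content; the timestamps are arbitrary, one minute apart.
index = b'Package: hello\nVersion: 1.0\n'
earlier = gzip_bytes(index, mtime=1475000000)
later = gzip_bytes(index, mtime=1475000060)

print(gzip.decompress(earlier) == gzip.decompress(later))  # True
print(sha256(earlier) == sha256(later))  # False: two by-hash entries

# With a fixed mtime (0, as with "gzip -n"), recompressing unchanged
# content reproduces the same bytes, so the by-hash name stays put.
print(sha256(gzip_bytes(index, mtime=0)) ==
      sha256(gzip_bytes(index, mtime=0)))  # True
```

The two timestamped outputs differ only in header bytes 4 to 7, which is the MTIME field. Everything after that (deflate data, CRC32, size) is identical.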

bzip2 and xz don't store an mtime, so they aren't affected.
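The fields that "gzip -n" blanks are part of the gzip member header defined in RFC 1952. These are a 4-byte little-endian MTIME at offset 4 and, if the FNAME flag is set, a NUL-terminated original file name. The bzip2 and xz stream headers have no counterpart to either field.

The following sketch reads both fields back from Python 3's own `GzipFile` output. `gzip_bytes` and `gzip_header_fields` are illustrative names, and the file name and timestamp are made up. The parser skips an FEXTRA field if one is present. FCOMMENT and FHCRC come after FNAME, so they don't affect it.

```python
import gzip
import io
import struct

# FLG bits from RFC 1952, section 2.3.1.
FEXTRA = 0x04
FNAME = 0x08


def gzip_bytes(data, filename, mtime):
    """Hypothetical helper: gzip data in memory via GzipFile."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=filename, mode='wb', fileobj=buf,
                       mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def gzip_header_fields(blob):
    """Return (mtime, filename) from the first gzip member header in blob.

    filename is None when the FNAME flag is not set.
    """
    if blob[:2] != b'\x1f\x8b':
        raise ValueError('not a gzip stream')
    flags = blob[3]
    # MTIME: 4 bytes, little-endian, right after ID1 ID2 CM FLG.
    (mtime,) = struct.unpack('<I', blob[4:8])
    pos = 10  # end of the fixed header: ID1 ID2 CM FLG MTIME XFL OS
    if flags & FEXTRA:
        (xlen,) = struct.unpack('<H', blob[pos:pos + 2])
        pos += 2 + xlen
    filename = None
    if flags & FNAME:
        end = blob.index(b'\x00', pos)
        filename = blob[pos:end].decode('latin-1')
    return mtime, filename


# Timestamped output with a stored name (GzipFile drops a trailing ".gz").
print(gzip_header_fields(gzip_bytes(b'hello', 'Sources.gz', 1475000000)))
# (1475000000, 'Sources')

# "gzip -n" style, as in the patch: blank name, zero mtime.
print(gzip_header_fields(gzip_bytes(b'hello', '', 0)))
# (0, None)
```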

Colin Watson (cjwatson) :
review: Approve

Preview Diff

=== modified file 'lib/lp/archivepublisher/tests/test_repositoryindexfile.py'
--- lib/lp/archivepublisher/tests/test_repositoryindexfile.py 2016-02-05 20:28:29 +0000
+++ lib/lp/archivepublisher/tests/test_repositoryindexfile.py 2016-10-04 01:28:49 +0000
@@ -99,7 +99,8 @@
         repo_file.write('hello')
         repo_file.close()
 
-        gzip_content = gzip.open(os.path.join(self.root, 'boing.gz')).read()
+        gzip_file = gzip.open(os.path.join(self.root, 'boing.gz'))
+        gzip_content = gzip_file.read()
         bz2_content = bz2.decompress(
             open(os.path.join(self.root, 'boing.bz2')).read())
         xz_content = lzma.open(os.path.join(self.root, 'boing.xz')).read()
@@ -108,6 +109,12 @@
         self.assertEqual(gzip_content, xz_content)
         self.assertEqual('hello', gzip_content)
 
+        # gzip is compressed as if with "-n", ensuring that the hash
+        # doesn't change just because we're compressing at a different
+        # point in time. The filename is also blank, but Python's gzip
+        # module discards it so it's hard to test.
+        self.assertEqual(0, gzip_file.mtime)
+
     def testCompressors(self):
         """`RepositoryIndexFile` honours the supplied list of compressors."""
         repo_file = self.getRepoFile(

=== modified file 'lib/lp/archivepublisher/utils.py'
--- lib/lp/archivepublisher/utils.py 2016-06-06 15:38:54 +0000
+++ lib/lp/archivepublisher/utils.py 2016-10-04 01:28:49 +0000
@@ -85,7 +85,10 @@
     suffix = '.gz'
 
     def _buildFile(self, fd):
-        return gzip.GzipFile(fileobj=os.fdopen(fd, "wb"))
+        # Blank the filename and mtime as if using "gzip -n" to avoid
+        # needless hash changes.
+        return gzip.GzipFile(
+            fileobj=os.fdopen(fd, "wb"), filename='', mtime=0)
 
 
 class Bzip2TempFile(PlainTempFile):
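The `_buildFile` change and the new test assertion can be exercised together outside Launchpad. The following is a minimal Python 3 sketch that uses a temporary file from `tempfile.mkstemp` in place of `RepositoryIndexFile`'s own temp-file handling. `write_gzip_n` and `check_gzip_n` are hypothetical names. The test comment notes that Python's gzip reader discards the stored filename. To work around this, the sketch checks the FNAME bit in the raw header's FLG byte instead.

```python
import gzip
import os
import tempfile

FNAME = 0x08  # RFC 1952 FLG bit: "original file name present"


def write_gzip_n(fd, data):
    """Hypothetical helper: gzip data into fd as if with "gzip -n".

    Like the patched _buildFile, it blanks the filename and sets mtime=0.
    """
    raw = os.fdopen(fd, 'wb')
    gz = gzip.GzipFile(filename='', mode='wb', fileobj=raw, mtime=0)
    gz.write(data)
    gz.close()   # flushes the deflate stream and trailer into raw,
    raw.close()  # but leaves raw itself open, so close it here


def check_gzip_n(path):
    """Hypothetical helper: return (content, mtime, has_fname) for path."""
    with gzip.open(path, 'rb') as f:
        content = f.read()
        mtime = f.mtime  # only populated once the header has been read
    with open(path, 'rb') as f:
        flags = f.read(4)[3]  # FLG is the fourth header byte
    return content, mtime, bool(flags & FNAME)


fd, path = tempfile.mkstemp(suffix='.gz')
try:
    write_gzip_n(fd, b'hello')
    result = check_gzip_n(path)
finally:
    os.remove(path)
print(result)  # (b'hello', 0, False)
```

`GzipFile.close()` does not close a caller-supplied `fileobj`, so the sketch closes the `os.fdopen` object itself. Passing `mode='wb'` explicitly avoids relying on `GzipFile` to infer write mode from `fileobj`. Recent Python 3 releases warn when write mode is inferred this way.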