Merge lp:~james-w/launchpad/drop-filecmp into lp:launchpad

Proposed by James Westby
Status: Work in progress
Proposed branch: lp:~james-w/launchpad/drop-filecmp
Merge into: lp:launchpad
Diff against target: 98 lines (+14/-28)
2 files modified
lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py (+6/-12)
lib/lp/archivepublisher/tests/test_generate_ppa_htaccess.py (+8/-16)
To merge this branch: bzr merge lp:~james-w/launchpad/drop-filecmp
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Fixing
Review via email: mp+108021@code.launchpad.net

Commit message

Drop the use of filecmp from generate_ppa_htaccess

Description of the change

= Summary =

When looking at generate_ppa_htaccess we noticed a pessimism
for the common case that we suspect will be adding extra
I/O operations to the run of the script, and so not be helping
with germanium's load.

This branch drops that, and moves to just overwriting the current
.htpasswd file in all cases, even if it wouldn't change the content.

== Proposed fix ==

Do that.

== Pre-implementation notes ==

None.

== LOC Rationale ==

-14 lines.

== Implementation details ==

Deletes the code. Also drops the return value and the removal
of the file, because it will now always be renamed.

== Tests ==

Adjusts the test case for the old behaviour to just test
the new behaviour.

== Demo and Q/A ==

A run of generate_ppa_htaccess with a new subscriber for a
private PPA should suffice.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py
  lib/lp/archivepublisher/tests/test_generate_ppa_htaccess.py

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

Do we have data on this? Replacing an unaltered file will cause apache to re-read it, which will increase load. That is, I believe, why the cmp is in there in the first place. Have we got data on how long the cmp is taking?

I ask because this is exactly the sort of small tweak that can bite us rather hard.

Perhaps doing this under a feature flag will let it be phase in under examination and see if it has any impact.

review: Needs Fixing
Revision history for this message
James Westby (james-w) wrote :

On Wed, 30 May 2012 22:35:20 -0000, Robert Collins <email address hidden> wrote:
> Review: Needs Fixing
>
> Do we have data on this? Replacing an unaltered file will cause apache
> to re-read it, which will increase load. That is, I believe, why the
> cmp is in there in the first place. Have we got data on how long the
> cmp is taking?

Good point. We have no data at this point, this was just something we
thought might help, but just based on considering the write side of
things, and not apache reading the files.

> I ask because this is exactly the sort of small tweak that can bite us rather hard.

Yeah, but so can performance optimisations done without data, such as
checking if the file has changed before overwriting it :-)

> Perhaps doing this under a feature flag will let it be phase in under
> examination and see if it has any impact.

If a feature flag is acceptable for this sort of thing then I agree that
it would be preferable to landing it without a trivial way to rollback
if it has an adverse affect.

Other ways to do that would be adding a config option, or a command line
option. A feature flag would definitely be cheaper if it is an
acceptable use for them.

Thanks,

James

Revision history for this message
Robert Collins (lifeless) wrote :

Feature flags are designed for this. I don't know if the cmp was added
based on data or speculation; its speculation that it was added on
speculation :)

Anyhow, lets feature flag this up, and get the load graphs on
germanium working before we test it, and subsequent to that we can
start to see if it is a net win, loss, or noise.

Unmerged revisions

15228. By James Westby

Drop the use of filecmp in generate_ppa_htaccess.

The common case in this script is for the file to have changed, as
it carefully looks for changed PPAs/tokens. Therefore confirming
that with filecmp is just extra I/O in the common case, and that
is certainly undesirable currently on germanium.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py'
2--- lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py 2012-05-10 21:44:21 +0000
3+++ lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py 2012-05-30 17:22:22 +0000
4@@ -9,7 +9,6 @@
5 datetime,
6 timedelta,
7 )
8-import filecmp
9 import os
10 import tempfile
11
12@@ -104,21 +103,17 @@
13 :return: True if the file was replaced.
14 """
15 if self.options.dryrun:
16- return False
17+ os.remove(temp_htpasswd_file)
18+ return
19
20 # The publisher Config object does not have an
21 # interface, so we need to remove the security wrapper.
22 pub_config = getPubConfig(ppa)
23 htpasswd_filename = os.path.join(pub_config.htaccessroot, ".htpasswd")
24
25- if (not os.path.isfile(htpasswd_filename) or
26- not filecmp.cmp(htpasswd_filename, temp_htpasswd_file)):
27- # Atomically replace the old file or create a new file.
28- os.rename(temp_htpasswd_file, htpasswd_filename)
29- self.logger.debug("Replaced htpasswd for %s" % ppa.displayname)
30- return True
31-
32- return False
33+ # Atomically replace the old file or create a new file.
34+ os.rename(temp_htpasswd_file, htpasswd_filename)
35+ self.logger.debug("Replaced htpasswd for %s" % ppa.displayname)
36
37 def sendCancellationEmail(self, token):
38 """Send an email to the person whose subscription was cancelled."""
39@@ -323,8 +318,7 @@
40
41 self.ensureHtaccess(ppa)
42 temp_htpasswd = self.generateHtpasswd(ppa, valid_tokens)
43- if not self.replaceUpdatedHtpasswd(ppa, temp_htpasswd):
44- os.remove(temp_htpasswd)
45+ self.replaceUpdatedHtpasswd(ppa, temp_htpasswd)
46
47 if self.options.no_deactivation or self.options.dryrun:
48 self.logger.info('Dry run, so not committing transaction.')
49
50=== modified file 'lib/lp/archivepublisher/tests/test_generate_ppa_htaccess.py'
51--- lib/lp/archivepublisher/tests/test_generate_ppa_htaccess.py 2012-05-10 21:44:21 +0000
52+++ lib/lp/archivepublisher/tests/test_generate_ppa_htaccess.py 2012-05-30 17:22:22 +0000
53@@ -163,8 +163,7 @@
54 os.remove(filename)
55
56 def testReplaceUpdatedHtpasswd(self):
57- """Test that the htpasswd file is only replaced if it changes."""
58- FILE_CONTENT = "Kneel before Zod!"
59+ """Test that the htpasswd file is replaced."""
60 # The publisher Config object does not have an interface, so we
61 # need to remove the security wrapper.
62 pub_config = getPubConfig(self.ppa)
63@@ -174,28 +173,21 @@
64 if not os.path.exists(pub_config.htaccessroot):
65 os.makedirs(pub_config.htaccessroot)
66 file = open(filename, "w")
67- file.write(FILE_CONTENT)
68+ file.write("Kneel before Zod!")
69 file.close()
70
71 # Write the same contents in a temp file.
72+ expected_content = "Come to me, son of Jor-El!"
73 fd, temp_filename = tempfile.mkstemp(dir=pub_config.htaccessroot)
74 file = os.fdopen(fd, "w")
75- file.write(FILE_CONTENT)
76+ file.write(expected_content)
77 file.close()
78
79- # Replacement should not happen.
80 script = self.getScript()
81- self.assertFalse(
82- script.replaceUpdatedHtpasswd(self.ppa, temp_filename))
83-
84- # Writing a different .htpasswd should see it get replaced.
85- file = open(filename, "w")
86- file.write("Come to me, son of Jor-El!")
87- file.close()
88-
89- self.assertTrue(
90- script.replaceUpdatedHtpasswd(self.ppa, temp_filename))
91-
92+ script.replaceUpdatedHtpasswd(self.ppa, temp_filename)
93+
94+ with open(filename, 'r') as f:
95+ self.assertEqual(expected_content, f.read())
96 os.remove(filename)
97
98 def assertDeactivated(self, token):