Merge lp:~jml/launchpad/better-sourcecode-update into lp:launchpad

Proposed by Jonathan Lange
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jml/launchpad/better-sourcecode-update
Merge into: lp:launchpad
Diff against target: None lines
To merge this branch: bzr merge lp:~jml/launchpad/better-sourcecode-update
Reviewer Review Type Date Requested Status
Michael Hudson-Doyle Approve
Review via email: mp+11580@code.launchpad.net

This proposal supersedes a proposal from 2009-09-10.

To post a comment you must log in.
Revision history for this message
Jonathan Lange (jml) wrote : Posted in a previous version of this proposal

This branch adds a script that updates the branches in sourcecode, using Bazaar itself. Although rocketfuel-get already does this, the facility for doing so is untested and grouped together with a lot of other code.

By extracting the behaviour out into its own script, and indeed its owned tested functions, this patch opens up opportunities for ec2 image generation, and for getting us closer to being able to get started with Launchpad development without running rocketfuel-setup.

The tests are reasonably complete, in terms of logic. However, they begin to fall down at the point of actually doing stuff with Bazaar. This is mostly because I got stuck there, not able to figure out how to test that bit sanely.

For the utility to be useful, it really should have a nice command-line UI, be well-documented and provide good errors. It doesn't do that right now, and I'd appreciate it if you pushed me into making it do so, along with some specific notes.

One interesting thing about this branch is the way it makes a new top-level package called 'devutils'. It adds this package to the bin/test template so we can actually run the tests ("./bin/test -cvv devutils", btw). The script itself is put in utilities/ and deliberately kept as empty as possible.

The script takes about 2m to run when there's nothing to actually do. rsync takes about 12s to do the same. Although I have no way of actually measuring this, I'm confident that the script will be faster than rsync when there actually are updates. I don't really know where that extra 6s per branch is going.

There are two things I can think of to take this work further. One is to have the script be symlink-aware, so that it can be run on the sourcecode/ directory of a branch, but actually make new branches in the _real_ sourcecode directory. The second is to have rocketfuel-get use this script.

That's about all I can think of. Your opinions are greatly valued.

jml

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I think you should probably add a test that the utilities/sourcedeps.conf file can be parsed by your code. Otherwise, I think it's fine, land it :)

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file '.bzrignore'
2--- .bzrignore 2009-08-31 01:16:03 +0000
3+++ .bzrignore 2009-09-10 06:11:21 +0000
4@@ -50,3 +50,4 @@
5 ./_pythonpath.py
6 ./production-configs
7 bzr.dev
8+_trial_temp
9
10=== modified file 'buildout-templates/bin/test.in'
11--- buildout-templates/bin/test.in 2009-08-10 22:08:05 +0000
12+++ buildout-templates/bin/test.in 2009-09-10 00:17:39 +0000
13@@ -140,6 +140,7 @@
14 '--test-path=${buildout:directory}/lib',
15 '--package=canonical',
16 '--package=lp',
17+ '--package=devscripts',
18 '--layer=!MailmanLayer',
19 ]
20
21
22=== added directory 'lib/devscripts'
23=== added file 'lib/devscripts/__init__.py'
24--- lib/devscripts/__init__.py 1970-01-01 00:00:00 +0000
25+++ lib/devscripts/__init__.py 2009-09-09 03:27:47 +0000
26@@ -0,0 +1,4 @@
27+# Copyright 2009 Canonical Ltd. This software is licensed under the
28+# GNU Affero General Public License version 3 (see the file LICENSE).
29+
30+"""Scripts that are used in developing Launchpad."""
31
32=== added file 'lib/devscripts/sourcecode.py'
33--- lib/devscripts/sourcecode.py 1970-01-01 00:00:00 +0000
34+++ lib/devscripts/sourcecode.py 2009-09-11 05:01:06 +0000
35@@ -0,0 +1,199 @@
36+# Copyright 2009 Canonical Ltd. This software is licensed under the
37+# GNU Affero General Public License version 3 (see the file LICENSE).
38+
39+"""Tools for maintaining the Launchpad source code."""
40+
41+__metaclass__ = type
42+__all__ = [
43+ 'interpret_config',
44+ 'parse_config_file',
45+ 'plan_update',
46+ ]
47+
48+import os
49+import shutil
50+import sys
51+
52+from bzrlib.branch import Branch
53+from bzrlib.bzrdir import BzrDir
54+from bzrlib.errors import BzrError
55+from bzrlib.plugin import load_plugins
56+from bzrlib.trace import report_exception
57+from bzrlib.transport import get_transport
58+from bzrlib.workingtree import WorkingTree
59+
60+
61+def parse_config_file(file_handle):
62+ """Parse the source code config file 'file_handle'.
63+
64+ :param file_handle: A file-like object containing sourcecode
65+ configuration.
66+ :return: A sequence of lines of either '[key, value]' or
67+ '[key, value, optional]'.
68+ """
69+ for line in file_handle:
70+ if line.startswith('#'):
71+ continue
72+ yield [token.strip() for token in line.split('=')]
73+
74+
75+def interpret_config_entry(entry):
76+ """Interpret a single parsed line from the config file."""
77+ return (entry[0], (entry[1], len(entry) > 2))
78+
79+
80+def interpret_config(config_entries):
81+ """Interpret a configuration stream, as parsed by 'parse_config_file'.
82+
83+ :param configuration: A sequence of parsed configuration entries.
84+ :return: A dict mapping the names of the sourcecode dependencies to a
85+ 2-tuple of their branches and whether or not they are optional.
86+ """
87+ return dict(map(interpret_config_entry, config_entries))
88+
89+
90+def _subset_dict(d, keys):
91+ """Return a dict that's a subset of 'd', based on the keys in 'keys'."""
92+ return dict((key, d[key]) for key in keys)
93+
94+
95+def plan_update(existing_branches, configuration):
96+ """Plan the update to existing branches based on 'configuration'.
97+
98+ :param existing_branches: A sequence of branches that already exist.
99+ :param configuration: A dictionary of sourcecode configuration, such as is
100+ returned by `interpret_config`.
101+ :return: (new_branches, update_branches, removed_branches), where
102+ 'new_branches' are the branches in the configuration that don't exist
103+ yet, 'update_branches' are the branches in the configuration that do
104+ exist, and 'removed_branches' are the branches that exist locally, but
105+ not in the configuration. 'new_branches' and 'update_branches' are
106+ dicts of the same form as 'configuration', 'removed_branches' is a
107+ set of the same form as 'existing_branches'.
108+ """
109+ existing_branches = set(existing_branches)
110+ config_branches = set(configuration.keys())
111+ new_branches = config_branches - existing_branches
112+ removed_branches = existing_branches - config_branches
113+ update_branches = config_branches.intersection(existing_branches)
114+ return (
115+ _subset_dict(configuration, new_branches),
116+ _subset_dict(configuration, update_branches),
117+ removed_branches)
118+
119+
120+def find_branches(directory):
121+ """List the directory names in 'directory' that are branches."""
122+ transport = get_transport(directory)
123+ return (
124+ os.path.basename(branch.base.rstrip('/'))
125+ for branch in BzrDir.find_branches(transport))
126+
127+
128+def get_branches(sourcecode_directory, new_branches,
129+ possible_transports=None):
130+ """Get the new branches into sourcecode."""
131+ for project, (branch_url, optional) in new_branches.iteritems():
132+ destination = os.path.join(sourcecode_directory, project)
133+ remote_branch = Branch.open(
134+ branch_url, possible_transports=possible_transports)
135+ possible_transports.append(
136+ remote_branch.bzrdir.root_transport)
137+ print 'Getting %s from %s' % (project, branch_url)
138+ # If the 'optional' flag is set, then it's a branch that shares
139+ # history with Launchpad, so we should share repositories. Otherwise,
140+ # we should avoid sharing repositories to avoid format
141+ # incompatibilities.
142+ force_new_repo = not optional
143+ try:
144+ remote_branch.bzrdir.sprout(
145+ destination, create_tree_if_local=True,
146+ source_branch=remote_branch, force_new_repo=force_new_repo,
147+ possible_transports=possible_transports)
148+ except BzrError:
149+ if optional:
150+ report_exception(sys.exc_info(), sys.stderr)
151+ else:
152+ raise
153+
154+
155+def update_branches(sourcecode_directory, update_branches,
156+ possible_transports=None):
157+ """Update the existing branches in sourcecode."""
158+ if possible_transports is None:
159+ possible_transports = []
160+ # XXX: JonathanLange 2009-11-09: Rather than updating one branch after
161+ # another, we could instead try to get them in parallel.
162+ for project, (branch_url, optional) in update_branches.iteritems():
163+ destination = os.path.join(sourcecode_directory, project)
164+ print 'Updating %s' % (project,)
165+ local_tree = WorkingTree.open(destination)
166+ remote_branch = Branch.open(
167+ branch_url, possible_transports=possible_transports)
168+ possible_transports.append(
169+ remote_branch.bzrdir.root_transport)
170+ try:
171+ local_tree.pull(
172+ remote_branch,
173+ possible_transports=possible_transports)
174+ except BzrError:
175+ if optional:
176+ report_exception(sys.exc_info(), sys.stderr)
177+ else:
178+ raise
179+
180+
181+def remove_branches(sourcecode_directory, removed_branches):
182+ """Remove sourcecode that's no longer there."""
183+ for project in removed_branches:
184+ destination = os.path.join(sourcecode_directory, project)
185+ print 'Removing %s' % project
186+ try:
187+ shutil.rmtree(destination)
188+ except OSError:
189+ os.unlink(destination)
190+
191+
192+def update_sourcecode(sourcecode_directory, config_filename):
193+ """Update the sourcecode."""
194+ config_file = open(config_filename)
195+ config = interpret_config(parse_config_file(config_file))
196+ config_file.close()
197+ branches = find_branches(sourcecode_directory)
198+ new, updated, removed = plan_update(branches, config)
199+ possible_transports = []
200+ get_branches(sourcecode_directory, new, possible_transports)
201+ update_branches(sourcecode_directory, updated, possible_transports)
202+ remove_branches(sourcecode_directory, removed)
203+
204+
205+def get_launchpad_root():
206+ return os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
207+
208+
209+# XXX: JonathanLange 2009-09-11: By default, the script will operate on the
210+# current checkout. Most people only have symlinks to sourcecode in their
211+# checkouts. This is fine for updating, but breaks for removing (you can't
212+# shutil.rmtree a symlink) and breaks for adding, since it adds the new branch
213+# to the checkout, rather than to the shared sourcecode area. Ideally, the
214+# script would see that the sourcecode directory is full of symlinks and then
215+# follow these symlinks to find the shared source directory. If the symlinks
216+# differ from each other (because of developers fiddling with things), we can
217+# take a survey of all of them, and choose the most popular.
218+
219+
220+def main(args):
221+ root = get_launchpad_root()
222+ if len(args) > 1:
223+ sourcecode_directory = args[1]
224+ else:
225+ sourcecode_directory = os.path.join(root, 'sourcecode')
226+ if len(args) > 2:
227+ config_filename = args[2]
228+ else:
229+ config_filename = os.path.join(root, 'utilities', 'sourcedeps.conf')
230+ print 'Sourcecode: %s' % (sourcecode_directory,)
231+ print 'Config: %s' % (config_filename,)
232+ load_plugins()
233+ update_sourcecode(sourcecode_directory, config_filename)
234+ return 0
235
236=== added directory 'lib/devscripts/tests'
237=== added file 'lib/devscripts/tests/__init__.py'
238--- lib/devscripts/tests/__init__.py 1970-01-01 00:00:00 +0000
239+++ lib/devscripts/tests/__init__.py 2009-09-09 03:27:47 +0000
240@@ -0,0 +1,4 @@
241+# Copyright 2009 Canonical Ltd. This software is licensed under the
242+# GNU Affero General Public License version 3 (see the file LICENSE).
243+
244+"""Tests for devscripts."""
245
246=== added file 'lib/devscripts/tests/test_sourcecode.py'
247--- lib/devscripts/tests/test_sourcecode.py 1970-01-01 00:00:00 +0000
248+++ lib/devscripts/tests/test_sourcecode.py 2009-09-11 04:57:50 +0000
249@@ -0,0 +1,156 @@
250+# Copyright 2009 Canonical Ltd. This software is licensed under the
251+# GNU Affero General Public License version 3 (see the file LICENSE).
252+
253+"""Module docstring goes here."""
254+
255+__metaclass__ = type
256+
257+import shutil
258+from StringIO import StringIO
259+import tempfile
260+import unittest
261+
262+from bzrlib.bzrdir import BzrDir
263+from bzrlib.tests import TestCase
264+from bzrlib.transport import get_transport
265+
266+from devscripts.sourcecode import (
267+ find_branches, interpret_config, parse_config_file, plan_update)
268+
269+
270+class TestParseConfigFile(unittest.TestCase):
271+ """Tests for the config file parser."""
272+
273+ def makeFile(self, contents):
274+ return StringIO(contents)
275+
276+ def test_empty(self):
277+ # Parsing an empty config file returns an empty sequence.
278+ empty_file = self.makeFile("")
279+ self.assertEqual([], list(parse_config_file(empty_file)))
280+
281+ def test_single_value(self):
282+ # Parsing a file containing a single key=value pair returns a sequence
283+ # containing the (key, value) as a list.
284+ config_file = self.makeFile("key=value")
285+ self.assertEqual(
286+ [['key', 'value']], list(parse_config_file(config_file)))
287+
288+ def test_comment_ignored(self):
289+ # If a line begins with a '#', then its a comment.
290+ comment_only = self.makeFile('# foo')
291+ self.assertEqual([], list(parse_config_file(comment_only)))
292+
293+ def test_optional_value(self):
294+ # Lines in the config file can have a third optional entry.
295+ config_file = self.makeFile('key=value=optional')
296+ self.assertEqual(
297+ [['key', 'value', 'optional']],
298+ list(parse_config_file(config_file)))
299+
300+ def test_whitespace_stripped(self):
301+ # Any whitespace around any of the tokens in the config file are
302+ # stripped out.
303+ config_file = self.makeFile(' key = value = optional ')
304+ self.assertEqual(
305+ [['key', 'value', 'optional']],
306+ list(parse_config_file(config_file)))
307+
308+
309+class TestInterpretConfiguration(unittest.TestCase):
310+ """Tests for the configuration interpreter."""
311+
312+ def test_empty(self):
313+ # An empty configuration stream means no configuration.
314+ config = interpret_config([])
315+ self.assertEqual({}, config)
316+
317+ def test_key_value(self):
318+ # A (key, value) pair without a third optional value is returned in
319+ # the configuration as a dictionary entry under 'key' with '(value,
320+ # False)' as its value.
321+ config = interpret_config([['key', 'value']])
322+ self.assertEqual({'key': ('value', False)}, config)
323+
324+ def test_key_value_optional(self):
325+ # A (key, value, optional) entry is returned in the configuration as a
326+ # dictionary entry under 'key' with '(value, True)' as its value.
327+ config = interpret_config([['key', 'value', 'optional']])
328+ self.assertEqual({'key': ('value', True)}, config)
329+
330+
331+class TestPlanUpdate(unittest.TestCase):
332+ """Tests for how to plan the update."""
333+
334+ def test_trivial(self):
335+ # In the trivial case, there are no existing branches and no
336+ # configured branches, so there are no branches to add, none to
337+ # update, and none to remove.
338+ new, existing, removed = plan_update([], {})
339+ self.assertEqual({}, new)
340+ self.assertEqual({}, existing)
341+ self.assertEqual(set(), removed)
342+
343+ def test_all_new(self):
344+ # If there are no existing branches, then the all of the configured
345+ # branches are new, none are existing and none have been removed.
346+ new, existing, removed = plan_update([], {'a': ('b', False)})
347+ self.assertEqual({'a': ('b', False)}, new)
348+ self.assertEqual({}, existing)
349+ self.assertEqual(set(), removed)
350+
351+ def test_all_old(self):
352+ # If there configuration is now empty, but there are existing
353+ # branches, then that means all the branches have been removed from
354+ # the configuration, none are new and none are updated.
355+ new, existing, removed = plan_update(['a', 'b', 'c'], {})
356+ self.assertEqual({}, new)
357+ self.assertEqual({}, existing)
358+ self.assertEqual(set(['a', 'b', 'c']), removed)
359+
360+ def test_all_same(self):
361+ # If the set of existing branches is the same as the set of
362+ # non-existing branches, then they all need to be updated.
363+ config = {'a': ('b', False), 'c': ('d', True)}
364+ new, existing, removed = plan_update(config.keys(), config)
365+ self.assertEqual({}, new)
366+ self.assertEqual(config, existing)
367+ self.assertEqual(set(), removed)
368+
369+
370+class TestFindBranches(TestCase):
371+ """Tests the way that we find branches."""
372+
373+ def makeBranch(self, path):
374+ transport = get_transport(path)
375+ transport.ensure_base()
376+ BzrDir.create_branch_convenience(
377+ transport.base, possible_transports=[transport])
378+
379+ def makeDirectory(self):
380+ directory = tempfile.mkdtemp()
381+ self.addCleanup(shutil.rmtree, directory)
382+ return directory
383+
384+ def test_empty_directory_has_no_branches(self):
385+ # An empty directory has no branches.
386+ empty = self.makeDirectory()
387+ self.assertEqual([], list(find_branches(empty)))
388+
389+ def test_directory_with_branches(self):
390+ # find_branches finds branches in the directory.
391+ directory = self.makeDirectory()
392+ self.makeBranch('%s/a' % directory)
393+ self.assertEqual(['a'], list(find_branches(directory)))
394+
395+ def test_ignores_files(self):
396+ # find_branches ignores any files in the directory.
397+ directory = self.makeDirectory()
398+ some_file = open('%s/a' % directory, 'w')
399+ some_file.write('hello\n')
400+ some_file.close()
401+ self.assertEqual([], list(find_branches(directory)))
402+
403+
404+def test_suite():
405+ return unittest.TestLoader().loadTestsFromName(__name__)
406
407=== modified file 'utilities/rocketfuel-get'
408--- utilities/rocketfuel-get 2009-07-24 15:37:03 +0000
409+++ utilities/rocketfuel-get 2009-09-11 05:01:06 +0000
410@@ -64,45 +64,9 @@
411 echo " See https://bugs.edge.launchpad.net/bzr/+bug/401617 for why."
412 echo ""
413 echo " Sourcedeps: $LP_SOURCEDEPS_PATH"
414-while IFS="=" read package branch optional
415-do
416- package_path="$LP_SOURCEDEPS_PATH/$package"
417- echo " Checking $package_path"
418- if [ "$optional" ]
419- then
420- # The 'set +e' below is so that we don't exit when the 'bzr log' call
421- # returns an error.
422- set +e
423- bzr log -l 1 -q "$branch" &> /dev/null
424- retval=$?
425- set -e
426- if [ $retval -ne 0 ]
427- then
428- # We can't reach this branch, but it's optional anyway, so we'll
429- # just move on.
430- echo "Couldn't find $branch, skipping it."
431- continue
432- fi
433- if [ -d "$package_path" ]
434- then
435- run-child bzr pull --remember "$branch" --directory "$package_path"
436- else
437- # This is either shipit or the openid-provider, and they share the
438- # history with our main tree, so we'll do this hack to stack them
439- # onto LP's trunk branch.
440- run-child bzr branch --no-tree --stacked "$LP_TRUNK_PATH" "$package_path"
441- run-child bzr pull --overwrite "$branch" -d "$package_path"
442- run-child bzr checkout "$package_path" "$package_path"
443- fi
444- else
445- if [ -d "$package_path" ]
446- then
447- run-child bzr pull --remember "$branch" --directory "$package_path"
448- else
449- run-child bzr branch --standalone "$branch" "$package_path"
450- fi
451- fi
452-done < "$sourcedeps_conf"
453+
454+run-child $LP_TRUNK_PATH/utilities/update-sourcecode $LP_SOURCEDEPS_PATH $sourcedeps_conf
455+
456
457 # Update the current trees in the repo.
458 echo "Updating sourcecode dependencies in current local branches:"
459
460=== added file 'utilities/update-sourcecode'
461--- utilities/update-sourcecode 1970-01-01 00:00:00 +0000
462+++ utilities/update-sourcecode 2009-09-10 04:31:12 +0000
463@@ -0,0 +1,16 @@
464+#!/usr/bin/python2.4
465+#
466+# Copyright 2009 Canonical Ltd. This software is licensed under the
467+# GNU Affero General Public License version 3 (see the file LICENSE).
468+
469+"""Update the sourcecode managed in sourcecode/."""
470+
471+import _pythonpath
472+
473+import sys
474+
475+from devscripts import sourcecode
476+
477+
478+if __name__ == '__main__':
479+ sys.exit(sourcecode.main(sys.argv))