Merge lp:~jml/launchpad/better-sourcecode-update into lp:launchpad

Proposed by Jonathan Lange
Status: Superseded
Proposed branch: lp:~jml/launchpad/better-sourcecode-update
Merge into: lp:launchpad
Diff against target: None lines
To merge this branch: bzr merge lp:~jml/launchpad/better-sourcecode-update
Reviewer Review Type Date Requested Status
Michael Hudson-Doyle Pending
Review via email: mp+11502@code.launchpad.net

This proposal has been superseded by a proposal from 2009-09-11.

To post a comment you must log in.
Revision history for this message
Jonathan Lange (jml) wrote :

This branch adds a script that updates the branches in sourcecode, using Bazaar itself. Although rocketfuel-get already does this, the facility for doing so is untested and grouped together with a lot of other code.

By extracting the behaviour out into its own script, and indeed its owned tested functions, this patch opens up opportunities for ec2 image generation, and for getting us closer to being able to get started with Launchpad development without running rocketfuel-setup.

The tests are reasonably complete, in terms of logic. However, they begin to fall down at the point of actually doing stuff with Bazaar. This is mostly because I got stuck there, not able to figure out how to test that bit sanely.

For the utility to be useful, it really should have a nice command-line UI, be well-documented and provide good errors. It doesn't do that right now, and I'd appreciate it if you pushed me into making it do so, along with some specific notes.

One interesting thing about this branch is the way it makes a new top-level package called 'devutils'. It adds this package to the bin/test template so we can actually run the tests ("./bin/test -cvv devutils", btw). The script itself is put in utilities/ and deliberately kept as empty as possible.

The script takes about 2m to run when there's nothing to actually do. rsync takes about 12s to do the same. Although I have no way of actually measuring this, I'm confident that the script will be faster than rsync when there actually are updates. I don't really know where that extra 6s per branch is going.

There are two things I can think of to take this work further. One is to have the script be symlink-aware, so that it can be run on the sourcecode/ directory of a branch, but actually make new branches in the _real_ sourcecode directory. The second is to have rocketfuel-get use this script.

That's about all I can think of. Your opinions are greatly valued.

jml

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file '.bzrignore'
2--- .bzrignore 2009-08-31 01:16:03 +0000
3+++ .bzrignore 2009-09-10 06:11:21 +0000
4@@ -50,3 +50,4 @@
5 ./_pythonpath.py
6 ./production-configs
7 bzr.dev
8+_trial_temp
9
10=== modified file 'buildout-templates/bin/test.in'
11--- buildout-templates/bin/test.in 2009-08-10 22:08:05 +0000
12+++ buildout-templates/bin/test.in 2009-09-10 00:17:39 +0000
13@@ -140,6 +140,7 @@
14 '--test-path=${buildout:directory}/lib',
15 '--package=canonical',
16 '--package=lp',
17+ '--package=devscripts',
18 '--layer=!MailmanLayer',
19 ]
20
21
22=== added directory 'lib/devscripts'
23=== added file 'lib/devscripts/__init__.py'
24--- lib/devscripts/__init__.py 1970-01-01 00:00:00 +0000
25+++ lib/devscripts/__init__.py 2009-09-09 03:27:47 +0000
26@@ -0,0 +1,4 @@
27+# Copyright 2009 Canonical Ltd. This software is licensed under the
28+# GNU Affero General Public License version 3 (see the file LICENSE).
29+
30+"""Scripts that are used in developing Launchpad."""
31
32=== added file 'lib/devscripts/sourcecode.py'
33--- lib/devscripts/sourcecode.py 1970-01-01 00:00:00 +0000
34+++ lib/devscripts/sourcecode.py 2009-09-10 06:04:42 +0000
35@@ -0,0 +1,163 @@
36+# Copyright 2009 Canonical Ltd. This software is licensed under the
37+# GNU Affero General Public License version 3 (see the file LICENSE).
38+
39+"""Tools for maintaining the Launchpad source code."""
40+
41+__metaclass__ = type
42+__all__ = [
43+ 'interpret_config',
44+ 'parse_config_file',
45+ 'plan_update',
46+ ]
47+
48+import os
49+import shutil
50+
51+from bzrlib.branch import Branch
52+from bzrlib.bzrdir import BzrDir
53+from bzrlib.plugin import load_plugins
54+from bzrlib.transport import get_transport
55+from bzrlib.workingtree import WorkingTree
56+
57+
58+def parse_config_file(file_handle):
59+ """Parse the source code config file 'file_handle'.
60+
61+ :param file_handle: A file-like object containing sourcecode
62+ configuration.
63+ :return: A sequence of lines of either '[key, value]' or
64+ '[key, value, optional]'.
65+ """
66+ for line in file_handle:
67+ if line.startswith('#'):
68+ continue
69+ yield [token.strip() for token in line.split('=')]
70+
71+
72+def interpret_config_entry(entry):
73+ """Interpret a single parsed line from the config file."""
74+ return (entry[0], (entry[1], len(entry) > 2))
75+
76+
77+def interpret_config(config_entries):
78+ """Interpret a configuration stream, as parsed by 'parse_config_file'.
79+
80+ :param configuration: A sequence of parsed configuration entries.
81+ :return: A dict mapping the names of the sourcecode dependencies to a
82+ 2-tuple of their branches and whether or not they are optional.
83+ """
84+ return dict(map(interpret_config_entry, config_entries))
85+
86+
87+def _subset_dict(d, keys):
88+ """Return a dict that's a subset of 'd', based on the keys in 'keys'."""
89+ return dict((key, d[key]) for key in keys)
90+
91+
92+def plan_update(existing_branches, configuration):
93+ """Plan the update to existing branches based on 'configuration'.
94+
95+ :param existing_branches: A sequence of branches that already exist.
96+ :param configuration: A dictionary of sourcecode configuration, such as is
97+ returned by `interpret_config`.
98+ :return: (new_branches, update_branches, removed_branches), where
99+ 'new_branches' are the branches in the configuration that don't exist
100+ yet, 'update_branches' are the branches in the configuration that do
101+ exist, and 'removed_branches' are the branches that exist locally, but
102+ not in the configuration. 'new_branches' and 'update_branches' are
103+ dicts of the same form as 'configuration', 'removed_branches' is a
104+ set of the same form as 'existing_branches'.
105+ """
106+ existing_branches = set(existing_branches)
107+ config_branches = set(configuration.keys())
108+ new_branches = config_branches - existing_branches
109+ removed_branches = existing_branches - config_branches
110+ update_branches = config_branches.intersection(existing_branches)
111+ return (
112+ _subset_dict(configuration, new_branches),
113+ _subset_dict(configuration, update_branches),
114+ removed_branches)
115+
116+
117+def find_branches(directory):
118+ """List the directory names in 'directory' that are branches."""
119+ transport = get_transport(directory)
120+ return (
121+ os.path.basename(branch.base.rstrip('/'))
122+ for branch in BzrDir.find_branches(transport))
123+
124+
125+def get_branches(sourcecode_directory, new_branches,
126+ possible_transports=None):
127+ """Get the new branches into sourcecode."""
128+ for project, (branch_url, optional) in new_branches.iteritems():
129+ destination = os.path.join(sourcecode_directory, project)
130+ remote_branch = Branch.open(
131+ branch_url, possible_transports=possible_transports)
132+ possible_transports.append(
133+ remote_branch.bzrdir.root_transport)
134+ print 'Getting %s from %s' % (project, branch_url)
135+ remote_branch.bzrdir.sprout(
136+ destination, create_tree_if_local=True,
137+ source_branch=remote_branch,
138+ possible_transports=possible_transports)
139+
140+
141+def update_branches(sourcecode_directory, update_branches,
142+ possible_transports=None):
143+ """Update the existing branches in sourcecode."""
144+ if possible_transports is None:
145+ possible_transports = []
146+ for project, (branch_url, optional) in update_branches.iteritems():
147+ destination = os.path.join(sourcecode_directory, project)
148+ print 'Updating %s' % (project,)
149+ local_tree = WorkingTree.open(destination)
150+ remote_branch = Branch.open(
151+ branch_url, possible_transports=possible_transports)
152+ possible_transports.append(
153+ remote_branch.bzrdir.root_transport)
154+ local_tree.pull(
155+ remote_branch,
156+ possible_transports=possible_transports)
157+
158+
159+def remove_branches(sourcecode_directory, removed_branches):
160+ """Remove sourcecode that's no longer there."""
161+ for project in removed_branches:
162+ destination = os.path.join(sourcecode_directory, project)
163+ print 'Removing %s' % project
164+ try:
165+ shutil.rmtree(destination)
166+ except OSError:
167+ os.unlink(destination)
168+
169+
170+def update_sourcecode(sourcecode_directory, config_filename):
171+ """Update the sourcecode."""
172+ config_file = open(config_filename)
173+ config = interpret_config(parse_config_file(config_file))
174+ config_file.close()
175+ branches = find_branches(sourcecode_directory)
176+ new, updated, removed = plan_update(branches, config)
177+ possible_transports = []
178+ get_branches(sourcecode_directory, new, possible_transports)
179+ update_branches(sourcecode_directory, updated, possible_transports)
180+ remove_branches(sourcecode_directory, removed)
181+
182+
183+def get_launchpad_root():
184+ return os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
185+
186+
187+def main(args):
188+ root = get_launchpad_root()
189+ if len(args) > 1:
190+ sourcecode_directory = args[1]
191+ else:
192+ sourcecode_directory = os.path.join(root, 'sourcecode')
193+ config_filename = os.path.join(root, 'utilities', 'sourcedeps.conf')
194+ print 'Sourcecode: %s' % (sourcecode_directory,)
195+ print 'Config: %s' % (config_filename,)
196+ load_plugins()
197+ update_sourcecode(sourcecode_directory, config_filename)
198+ return 0
199
200=== added directory 'lib/devscripts/tests'
201=== added file 'lib/devscripts/tests/__init__.py'
202--- lib/devscripts/tests/__init__.py 1970-01-01 00:00:00 +0000
203+++ lib/devscripts/tests/__init__.py 2009-09-09 03:27:47 +0000
204@@ -0,0 +1,4 @@
205+# Copyright 2009 Canonical Ltd. This software is licensed under the
206+# GNU Affero General Public License version 3 (see the file LICENSE).
207+
208+"""Tests for devscripts."""
209
210=== added file 'lib/devscripts/tests/test_sourcecode.py'
211--- lib/devscripts/tests/test_sourcecode.py 1970-01-01 00:00:00 +0000
212+++ lib/devscripts/tests/test_sourcecode.py 2009-09-10 00:44:21 +0000
213@@ -0,0 +1,187 @@
214+# Copyright 2009 Canonical Ltd. This software is licensed under the
215+# GNU Affero General Public License version 3 (see the file LICENSE).
216+
217+"""Module docstring goes here."""
218+
219+__metaclass__ = type
220+
221+import shutil
222+from StringIO import StringIO
223+import tempfile
224+import unittest
225+
226+from bzrlib.bzrdir import BzrDir
227+from bzrlib.tests import TestCase
228+from bzrlib.transport import get_transport
229+
230+from devscripts.sourcecode import (
231+ find_branches, interpret_config, parse_config_file, plan_update)
232+
233+
234+class TestParseConfigFile(unittest.TestCase):
235+ """Tests for the config file parser."""
236+
237+ def makeFile(self, contents):
238+ return StringIO(contents)
239+
240+ def test_empty(self):
241+ # Parsing an empty config file returns an empty sequence.
242+ empty_file = self.makeFile("")
243+ self.assertEqual([], list(parse_config_file(empty_file)))
244+
245+ def test_single_value(self):
246+ # Parsing a file containing a single key=value pair returns a sequence
247+ # containing the (key, value) as a list.
248+ config_file = self.makeFile("key=value")
249+ self.assertEqual(
250+ [['key', 'value']], list(parse_config_file(config_file)))
251+
252+ def test_comment_ignored(self):
253+ # If a line begins with a '#', then its a comment.
254+ comment_only = self.makeFile('# foo')
255+ self.assertEqual([], list(parse_config_file(comment_only)))
256+
257+ def test_optional_value(self):
258+ # Lines in the config file can have a third optional entry.
259+ config_file = self.makeFile('key=value=optional')
260+ self.assertEqual(
261+ [['key', 'value', 'optional']],
262+ list(parse_config_file(config_file)))
263+
264+ def test_whitespace_stripped(self):
265+ # Any whitespace around any of the tokens in the config file are
266+ # stripped out.
267+ config_file = self.makeFile(' key = value = optional ')
268+ self.assertEqual(
269+ [['key', 'value', 'optional']],
270+ list(parse_config_file(config_file)))
271+
272+
273+class TestInterpretConfiguration(unittest.TestCase):
274+ """Tests for the configuration interpreter."""
275+
276+ def test_empty(self):
277+ # An empty configuration stream means no configuration.
278+ config = interpret_config([])
279+ self.assertEqual({}, config)
280+
281+ def test_key_value(self):
282+ # A (key, value) pair without a third optional value is returned in
283+ # the configuration as a dictionary entry under 'key' with '(value,
284+ # False)' as its value.
285+ config = interpret_config([['key', 'value']])
286+ self.assertEqual({'key': ('value', False)}, config)
287+
288+ def test_key_value_optional(self):
289+ # A (key, value, optional) entry is returned in the configuration as a
290+ # dictionary entry under 'key' with '(value, True)' as its value.
291+ config = interpret_config([['key', 'value', 'optional']])
292+ self.assertEqual({'key': ('value', True)}, config)
293+
294+
295+class TestPlanUpdate(unittest.TestCase):
296+ """Tests for how to plan the update."""
297+
298+ def test_trivial(self):
299+ # In the trivial case, there are no existing branches and no
300+ # configured branches, so there are no branches to add, none to
301+ # update, and none to remove.
302+ new, existing, removed = plan_update([], {})
303+ self.assertEqual({}, new)
304+ self.assertEqual({}, existing)
305+ self.assertEqual(set(), removed)
306+
307+ def test_all_new(self):
308+ # If there are no existing branches, then the all of the configured
309+ # branches are new, none are existing and none have been removed.
310+ new, existing, removed = plan_update([], {'a': ('b', False)})
311+ self.assertEqual({'a': ('b', False)}, new)
312+ self.assertEqual({}, existing)
313+ self.assertEqual(set(), removed)
314+
315+ def test_all_old(self):
316+ # If there configuration is now empty, but there are existing
317+ # branches, then that means all the branches have been removed from
318+ # the configuration, none are new and none are updated.
319+ new, existing, removed = plan_update(['a', 'b', 'c'], {})
320+ self.assertEqual({}, new)
321+ self.assertEqual({}, existing)
322+ self.assertEqual(set(['a', 'b', 'c']), removed)
323+
324+ def test_all_same(self):
325+ # If the set of existing branches is the same as the set of
326+ # non-existing branches, then they all need to be updated.
327+ config = {'a': ('b', False), 'c': ('d', True)}
328+ new, existing, removed = plan_update(config.keys(), config)
329+ self.assertEqual({}, new)
330+ self.assertEqual(config, existing)
331+ self.assertEqual(set(), removed)
332+
333+
334+class TestFindBranches(TestCase):
335+ """Tests the way that we find branches."""
336+
337+ def makeBranch(self, path):
338+ transport = get_transport(path)
339+ transport.ensure_base()
340+ BzrDir.create_branch_convenience(
341+ transport.base, possible_transports=[transport])
342+
343+ def makeDirectory(self):
344+ directory = tempfile.mkdtemp()
345+ self.addCleanup(shutil.rmtree, directory)
346+ return directory
347+
348+ def test_empty_directory_has_no_branches(self):
349+ # An empty directory has no branches.
350+ empty = self.makeDirectory()
351+ self.assertEqual([], list(find_branches(empty)))
352+
353+ def test_directory_with_branches(self):
354+ # find_branches finds branches in the directory.
355+ directory = self.makeDirectory()
356+ self.makeBranch('%s/a' % directory)
357+ self.assertEqual(['a'], list(find_branches(directory)))
358+
359+ def test_ignores_files(self):
360+ # find_branches ignores any files in the directory.
361+ directory = self.makeDirectory()
362+ some_file = open('%s/a' % directory, 'w')
363+ some_file.write('hello\n')
364+ some_file.close()
365+ self.assertEqual([], list(find_branches(directory)))
366+
367+
368+# XXX: Actually remove branches
369+
370+# XXX: Update existing branches
371+
372+# XXX: Branch new branches
373+
374+# XXX: Should we parallelize? If so, how? Can we rely on Twisted being
375+# present.
376+
377+# XXX: Find the sourcecode directory.
378+
379+# XXX: Handle symlinks. Lots of people manage sourcecode via symlinks
380+
381+# XXX: Actual storage location can be inferred from symlinks, since presumably
382+# they link into the actual store. However, some of the symlinks might differ
383+# (because of developers fiddling with things). We can take a survey of all of
384+# them, and choose the most popular.
385+
386+# XXX: How to report errors?
387+
388+# XXX: rocketfuel-get does stacking onto launchpad for some branches. Should
389+# we actually do this? It seems a bit silly if in a shared repo. (Although
390+# does the branch mask the shared repo bit?)
391+
392+# XXX: Can we get rocketfuel-setup to use this?
393+
394+# XXX: (unrelated). Finding the highest common ancestor is easy, but what if
395+# you have two (N?) trees? Is there a way to algorithmically find the two (N?)
396+# interesting HCAs? Can the question even be framed well?
397+
398+
399+def test_suite():
400+ return unittest.TestLoader().loadTestsFromName(__name__)
401
402=== added file 'utilities/update-sourcecode'
403--- utilities/update-sourcecode 1970-01-01 00:00:00 +0000
404+++ utilities/update-sourcecode 2009-09-10 04:31:12 +0000
405@@ -0,0 +1,16 @@
406+#!/usr/bin/python2.4
407+#
408+# Copyright 2009 Canonical Ltd. This software is licensed under the
409+# GNU Affero General Public License version 3 (see the file LICENSE).
410+
411+"""Update the sourcecode managed in sourcecode/."""
412+
413+import _pythonpath
414+
415+import sys
416+
417+from devscripts import sourcecode
418+
419+
420+if __name__ == '__main__':
421+ sys.exit(sourcecode.main(sys.argv))