Merge lp:~spiv/bzr/better-news-merge into lp:bzr

Proposed by Andrew Bennetts
Status: Work in progress
Proposed branch: lp:~spiv/bzr/better-news-merge
Merge into: lp:bzr
Diff against target: 747 lines (+587/-64)
7 files modified
NEWS (+4/-0)
bzrlib/merge3.py (+18/-4)
bzrlib/plugins/news_merge/__init__.py (+22/-2)
bzrlib/plugins/news_merge/news_merge.py (+230/-57)
bzrlib/plugins/news_merge/parser.py (+196/-1)
bzrlib/plugins/news_merge/tests/__init__.py (+1/-0)
bzrlib/plugins/news_merge/tests/test_parser.py (+116/-0)
To merge this branch: bzr merge lp:~spiv/bzr/better-news-merge
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Fixing
Review via email: mp+19247@code.launchpad.net

This proposal supersedes a proposal from 2010-02-12.

Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This patch makes the news_merge plugin more capable. Hopefully the changes to comments and docstrings in the patch explain the changes well enough, but in short it makes it capable of coping with conflicts that span section headings (like 'Bug Fixes'), whereas before it only dealt with conflicts between bullet points within a section. See the comments and code for the details (and limitations and tradeoffs).

With this change I expect news_merge will handle the vast majority of NEWS merges.

A minor orthogonal change adds a couple of trivial mutters to let a reader of ~/.bzr.log know when news_merge has been used.

Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

The patch seems to have conflicts and a lot of noise :(

-Rob

Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

Oh, I wrote the code against 2.1, but proposed against lp:bzr. Hmm.

Andrew Bennetts (spiv) wrote :

Resubmitted in hope of getting a better diff now that lp:bzr/2.1 (which this branch was based on) has been merged to lp:bzr.

Original proposal:

"""
This patch makes the news_merge plugin more capable. Hopefully the changes to comments and docstrings in the patch explain the changes well enough, but in short it makes it capable of coping with conflicts that span section headings (like 'Bug Fixes'), whereas before it only dealt with conflicts between bullet points within a section. See the comments and code for the details (and limitations and tradeoffs).

With this change I expect news_merge will handle the vast majority of NEWS merges.

A minor orthogonal change adds a couple of trivial mutters to let a reader of ~/.bzr.log know when news_merge has been used.
"""

Robert Collins (lifeless) wrote :

On Sat, 2010-02-13 at 01:30 +0000, Andrew Bennetts wrote:
>
>  magic_marker = '|NEWS-MERGE-MAGIC-MARKER|'
>
> +# The order sections are supposed to appear in. See the template at the
> +# bottom of NEWS. None is a placeholder for an unseen section heading.
> +canonical_section_order = [
> +    None, 'Compatibility Breaks', 'New Features', 'Bug Fixes', 'Improvements',
> +    'Documentation', 'API Changes', 'Internals', 'Testing']

This is duplicated with the template; perhaps you could use the template
instead? That would make this usable by other projects.
...
> +                # Are all the conflicting lines bullets or sections? If so, we
> +                # can merge this.
> +                try:
> +                    base_sections = munged_lines_to_section_dict(base)
> +                    a_sections = munged_lines_to_section_dict(a)
> +                    b_sections = munged_lines_to_section_dict(b)
> +                except MergeTooHard:
> +                    # Something else :(
> +                    # Maybe the default merge can cope.
> +                    trace.mutter('news_merge giving up')
> +                    return 'not_applicable', None

In the NEWS entry you aren't entirely clear about the implications of
'using bzr's builtin merge'. From the code it looks like 'if there are
conflicts outside the structured section, none of the NEWS file is
smart-merged'. Perhaps we could merge just the non-section data with
bzr's builtin merge, or make the NEWS entry clearer.
...
>          # Transform the merged elements back into real blocks of lines.
> +        trace.mutter('news_merge giving up')
>          return 'success', list(fakelines_to_blocks(result_lines))

This mutter seems...wrong.

 review: needsfixing

review: Needs Fixing
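
The suggestion above, deriving the section order from the template instead of hard-coding `canonical_section_order`, could look roughly like the following sketch. It is an illustration only: `section_order_from_template` is a hypothetical helper, not part of the plugin, and it assumes a section heading is a line immediately followed by an underline of `*` characters at least as long as the heading, as in bzr's NEWS file.

```python
def section_order_from_template(template_text):
    """Return section names in template order (hypothetical helper).

    Assumes a section heading is a non-empty line followed by an
    underline made only of '*' characters, as in bzr's NEWS file.
    """
    # None is a placeholder for bullets seen before any section heading.
    order = [None]
    lines = template_text.splitlines()
    for title, underline in zip(lines, lines[1:]):
        is_underline = bool(underline) and set(underline) == {'*'}
        if title.strip() and is_underline and len(underline) >= len(title):
            order.append(title)
    return order


template = """\
bzr x.y.z (not released yet)
############################

Compatibility Breaks
********************

New Features
************

Bug Fixes
*********
"""
print(section_order_from_template(template))
```

Because the order comes from a file in the tree, another project could reuse the same logic with its own headings.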
lp:~spiv/bzr/better-news-merge updated
4815. By Andrew Bennetts

Read section order from template file named in config.

4816. By Andrew Bennetts

Add a XXX comment for future improvement.

4817. By Andrew Bennetts

First steps towards a better NEWS file parser.

4818. By Andrew Bennetts

Rename test.

4819. By Andrew Bennetts

Merge lp:bzr.

4820. By Andrew Bennetts

Merge object-3way-merge.

4821. By Andrew Bennetts

Possibly working news_merge built on a richer structure than just lines.

4822. By Andrew Bennetts

Fix some simple bugs.

Unmerged revisions

4822. By Andrew Bennetts

Fix some simple bugs.

4821. By Andrew Bennetts

Possibly working news_merge built on a richer structure than just lines.

4820. By Andrew Bennetts

Merge object-3way-merge.

4819. By Andrew Bennetts

Merge lp:bzr.

4818. By Andrew Bennetts

Rename test.

4817. By Andrew Bennetts

First steps towards a better NEWS file parser.

4816. By Andrew Bennetts

Add a XXX comment for future improvement.

4815. By Andrew Bennetts

Read section order from template file named in config.

4814. By Andrew Bennetts

Teach news_merge to handle conflicts involving section headings as well as bullets.

4813. By Andrew Bennetts

Add some simple mutters so that it's easy to tell if news_merge has been triggered.

Preview Diff

=== modified file 'NEWS'
--- NEWS 2010-04-20 10:30:30 +0000
+++ NEWS 2010-04-20 13:35:41 +0000
@@ -107,6 +107,10 @@
   less.)
   (Martin Pool, #553017)
 
+* The ``news_merge`` plugin is now smarter. It can resolve conflicts
+  involving section headings as well as bullet points.
+  (Andrew Bennetts)
+
 Documentation
 *************
 
=== modified file 'bzrlib/merge3.py'
--- bzrlib/merge3.py 2009-03-23 14:59:43 +0000
+++ bzrlib/merge3.py 2010-04-20 13:35:41 +0000
@@ -66,10 +66,24 @@
     Given BASE, OTHER, THIS, tries to produce a combined text
     incorporating the changes from both BASE->OTHER and BASE->THIS.
     All three will typically be sequences of lines."""
-    def __init__(self, base, a, b, is_cherrypick=False):
-        check_text_lines(base)
-        check_text_lines(a)
-        check_text_lines(b)
+
+    def __init__(self, base, a, b, is_cherrypick=False, allow_objects=False):
+        """Constructor.
+
+        :param base: lines in BASE
+        :param a: lines in A
+        :param b: lines in B
+        :param is_cherrypick: flag indicating if this merge is a cherrypick.
+            When cherrypicking b => a, matches with b and base do not conflict.
+        :param allow_objects: if True, do not require that base, a and b are
+            plain Python strs. Also prevents BinaryFile from being raised.
+            Lines can be any sequence of comparable and hashable Python
+            objects.
+        """
+        if not allow_objects:
+            check_text_lines(base)
+            check_text_lines(a)
+            check_text_lines(b)
         self.base = base
         self.a = a
         self.b = b
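
The `allow_objects` flag relies on the fact that a sequence matcher only needs its elements to be hashable and comparable for equality. The sketch below illustrates that idea without importing bzrlib: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for bzr's own matcher, and `Chunk` is a hypothetical substitute for the plugin's chunk classes.

```python
from collections import namedtuple
from difflib import SequenceMatcher

# Hypothetical stand-in for the plugin's chunk objects: hashable and
# comparable by value, which is all a sequence matcher requires.
Chunk = namedtuple('Chunk', ['kind', 'text'])

base = [Chunk('section', 'Bug Fixes'), Chunk('bullet', '* Fix A')]
this = base + [Chunk('bullet', '* Fix B')]

# autojunk=False so that no element is heuristically ignored.
matcher = SequenceMatcher(None, base, this, autojunk=False)
opcodes = matcher.get_opcodes()
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, base[i1:i2], this[j1:j2])
```

The matcher reports the two shared chunks as equal and the extra bullet as an insertion, with no string conversion involved.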
=== modified file 'bzrlib/plugins/news_merge/__init__.py'
--- bzrlib/plugins/news_merge/__init__.py 2010-01-28 17:27:16 +0000
+++ bzrlib/plugins/news_merge/__init__.py 2010-04-20 13:35:41 +0000
@@ -26,10 +26,30 @@
 The news_merge_files config option takes a list of file paths, separated by
 commas.
 
+The basic approach is that this plugin parses the NEWS file into a simple
+series of versions, with sections of bullets in those versions. Sections
+contain a sorted set of bullets, and sections within a version also have a
+fixed order (see the template at the bottom of NEWS). The plugin merges
+additions and deletions to the set of bullets (and sections of bullets), then
+sorts the contents of these sets and turns them back into a series of lines of
+text.
+
 Limitations:
 
-* if there's a conflict in more than just bullet points, this doesn't yet know
-  how to resolve that, so bzr will fallback to the default line-based merge.
+* invisible whitespace in blank lines is not tracked, so is discarded. (i.e.
+  [newline, space, newline] is collapsed to just [newline, newline])
+
+* empty sections are generally deleted, even if they were present in the
+  originals.
+
+* modified sections will typically be reordered to match the standard order (as
+  shown in the template at the bottom of NEWS).
+
+* if there's a conflict that involves more than simple sections of bullets,
+  this plugin doesn't know how to handle that. e.g. a conflict in preamble
+  text describing a new version, or sufficiently many conflicts that the
+  matcher thinks a conflict spans a version heading. bzr's builtin merge logic
+  will be tried instead.
 """
 
 # Since we are a built-in plugin we share the bzrlib version
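
The "merge additions and deletions to the set of bullets" approach described in the docstring can be sketched as below. This is a simplified variant, not the patch's `merge_bullets`: it also keeps base bullets that neither side removed, whereas the patch only combines new bullets because it is applied to conflict regions. `merge_bullet_sets` and the bullet texts (including the `bzr frob` command) are hypothetical.

```python
def merge_bullet_sets(base, this, other):
    """Three-way merge of bullet sets: keep additions, honour deletions."""
    added = (this - base) | (other - base)
    deleted = (base - this) | (base - other)
    return (base | added) - deleted


def sort_key(text):
    # Same idea as the plugin's sort_key: ignore backquotes and case.
    return text.replace('`', '').lower()


base = {'* Fix crash in ``bzr log``.'}
this = base | {'* ``bzr merge`` is faster.'}   # THIS adds a bullet
other = {'* Add ``bzr frob`` command.'}        # OTHER drops one, adds one

merged = sorted(merge_bullet_sets(base, this, other), key=sort_key)
for bullet in merged:
    print(bullet)
```

Treating bullets as a set is what makes the result independent of the order in which the two sides inserted their entries.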
=== modified file 'bzrlib/plugins/news_merge/news_merge.py'
--- bzrlib/plugins/news_merge/news_merge.py 2010-01-28 18:05:44 +0000
+++ bzrlib/plugins/news_merge/news_merge.py 2010-04-20 13:35:41 +0000
@@ -16,12 +16,21 @@
 
 """Merge logic for news_merge plugin."""
 
-
-from bzrlib.plugins.news_merge.parser import simple_parse
-from bzrlib import merge, merge3
-
-
-magic_marker = '|NEWS-MERGE-MAGIC-MARKER|'
+import copy
+
+from bzrlib.plugins.news_merge.parser import (
+    ContainerChunk,
+    parse_lines_to_structure,
+    simple_parse,
+    )
+from bzrlib import merge, merge3, trace
+
+
+class Infinity(object):
+    """Object that always sorts to the end of a list."""
+
+    def __lt__(self, other):
+        return True
 
 
 class NewsMerger(merge.ConfigurableFileMerger):
@@ -29,6 +38,51 @@
 
     name_prefix = "news"
 
+    def __init__(self, merger):
+        super(NewsMerger, self).__init__(merger)
+        self.canonical_section_order = None
+
+    def get_section_ordering(self):
+        if self.canonical_section_order is None:
+            # None is a placeholder for an unseen section heading.
+            sections = [None]
+            try:
+                # Read file named by ${name_prefix}_template config option, and
+                # extract the preferred section order from that.
+                this_tree = self.merger.this_tree
+                config = this_tree.branch.get_config()
+                config_key = self.name_prefix + '_template'
+                template_path = config.get_user_option(config_key)
+                template_file_id = this_tree.path2id(template_path)
+                template = this_tree.get_file_text(template_file_id)
+                for kind, text in simple_parse(template):
+                    if kind == 'section':
+                        sections.append(text.split('\n', 1)[0])
+            except Exception:
+                trace.mutter('could not read NEWS template')
+                trace.log_exception_quietly()
+            trace.mutter('news merge section order: %r', sections)
+            self.canonical_section_order = sections
+        return self.canonical_section_order
+
+    def sort_sections(self, sections):
+        return sorted(sections, key=self.section_sort_key)
+
+    def sort_section_names(self, section_names):
+        return sorted(section_names, key=self.section_name_sort_key)
+
+    def section_sort_key(self, section):
+        section_name = section.text.split('\n', 1)[0]
+        return self.section_name_sort_key(section_name)
+
+    def section_name_sort_key(self, section):
+        canonical_section_order = self.get_section_ordering()
+        try:
+            return canonical_section_order.index(section)
+        except ValueError:
+            # Put unexpected sections last.
+            return Infinity()
+
     def merge_text(self, params):
         """Perform a simple 3-way merge of a bzr NEWS file.
         
@@ -36,59 +90,178 @@
         points, so we can simply take a set of bullet points, determine which
         bullets to add and which to remove, sort, and reserialize.
         """
-        # Transform the different versions of the NEWS file into a bunch of
-        # text lines where each line matches one part of the overall
-        # structure, e.g. a heading or bullet.
-        def munge(lines):
-            return list(blocks_to_fakelines(simple_parse(''.join(lines))))
-        this_lines = munge(params.this_lines)
-        other_lines = munge(params.other_lines)
-        base_lines = munge(params.base_lines)
-        m3 = merge3.Merge3(base_lines, this_lines, other_lines)
-        result_lines = []
+        trace.mutter('news_merge triggered')
+        this_news_file = canonicalise_news_file(parse_lines_to_structure(params.this_lines), self)
+        other_news_file = canonicalise_news_file(parse_lines_to_structure(params.other_lines), self)
+        base_news_file = canonicalise_news_file(parse_lines_to_structure(params.base_lines), self)
+        m3 = merge3.Merge3(list(base_news_file.flatten()),
+            list(this_news_file.flatten()),
+            list(other_news_file.flatten()), allow_objects=True)
+        result_chunks = []
         for group in m3.merge_groups():
             if group[0] == 'conflict':
                 _, base, a, b = group
-                # Are all the conflicting lines bullets? If so, we can merge
-                # this.
-                for line_set in [base, a, b]:
-                    for line in line_set:
-                        if not line.startswith('bullet'):
-                            # Something else :(
-                            # Maybe the default merge can cope.
-                            return 'not_applicable', None
-                # Calculate additions and deletions.
-                new_in_a = set(a).difference(base)
-                new_in_b = set(b).difference(base)
-                all_new = new_in_a.union(new_in_b)
-                deleted_in_a = set(base).difference(a)
-                deleted_in_b = set(base).difference(b)
-                # Combine into the final set of bullet points.
-                final = all_new.difference(deleted_in_a).difference(
-                    deleted_in_b)
-                # Sort, and emit.
-                final = sorted(final, key=sort_key)
-                result_lines.extend(final)
+                # Are all the conflicting lines bullets or sections? If so, we
+                # can merge this.
+                try:
+                    base_sections = chunks_to_section_dict(base)
+                    a_sections = chunks_to_section_dict(a)
+                    b_sections = chunks_to_section_dict(b)
+                except MergeTooHard, mth:
+                    # Something else :(
+                    # Maybe the default merge can cope.
+                    trace.mutter('news_merge giving up: %s', mth)
+                    return 'not_applicable', None
+
+                # Basically, for every section present in any version, call
+                # merge_bullets (passing an empty set for versions missing
+                # that section), and if the resulting set of bullets is not
+                # empty, emit the section heading and the sorted set of
+                # bullets.
+                all_sections = set(
+                    base_sections.keys() + a_sections.keys() +
+                    b_sections.keys())
+                sections_in_order = self.sort_section_names(all_sections)
+                for section in sections_in_order:
+                    bullets = merge_bullets(
+                        base_sections.get(section, set()),
+                        a_sections.get(section, set()),
+                        b_sections.get(section, set()))
+                    if bullets:
+                        # Emit section heading (if any), then sorted bullets.
+                        if section is not None:
+                            result_chunks.append(
+                                ContainerChunk(
+                                    'section',
+                                    section + '\n' + '*'*len(section)))
+                        final = sorted(bullets, key=sort_key)
+                        result_chunks.extend(final)
             else:
-                result_lines.extend(group[1])
+                result_chunks.extend(group[1])
         # Transform the merged elements back into real blocks of lines.
-        return 'success', list(fakelines_to_blocks(result_lines))
-
-
-def blocks_to_fakelines(blocks):
-    for kind, text in blocks:
-        yield '%s%s%s' % (kind, magic_marker, text)
-
-
-def fakelines_to_blocks(fakelines):
-    fakelines = list(fakelines)
-    # Strip out the magic_marker, and reinstate the \n\n between blocks
-    for fakeline in fakelines[:-1]:
-        yield fakeline.split(magic_marker, 1)[1] + '\n\n'
-    # The final block doesn't have a trailing \n\n.
-    for fakeline in fakelines[-1:]:
-        yield fakeline.split(magic_marker, 1)[1]
-
-
-def sort_key(s):
-    return s.replace('`', '').lower()
+        trace.mutter('news_merge succeeded.')
+        filename = self.merger.this_tree.id2path(params.file_id)
+        trace.note('Merged by news_merge: %s', filename)
+        result_lines = ''.join(chunk.text for chunk in result_chunks)
+        return 'success', result_lines
+
+
+def merge_bullets(base_bullets, a_bullets, b_bullets):
+    # Calculate additions and deletions.
+    new_in_a = a_bullets.difference(base_bullets)
+    new_in_b = b_bullets.difference(base_bullets)
+    all_new = new_in_a.union(new_in_b)
+    deleted_in_a = base_bullets.difference(a_bullets)
+    deleted_in_b = base_bullets.difference(b_bullets)
+    # Combine into the final set of bullet points.
+    final = all_new.difference(deleted_in_a).difference(deleted_in_b)
+    return final
+
+
+class MergeTooHard(Exception):
+    pass
+
+
+def chunks_to_section_dict(chunks):
+    """Takes a sequence of chunks, and returns a dict mapping section to
+    a set of bullets.
+
+    :param chunks: a sequence of chunks
+    :raises MergeTooHard: when chunks contain anything other than sections or
+        bullets
+    :returns: a dict of section name -> set of bullet chunks. Any
+        bullets encounted before a section will have a name of None.
+    """
+    section_name = None
+    section_dict = {}
+    for chunk in chunks:
+        if chunk.kind == 'section':
+            section_name = chunk.text.split('\n', 1)[0]
+        elif chunk.kind == 'bullet':
+            try:
+                bullets = section_dict[section_name]
+            except KeyError:
+                bullets = section_dict[section_name] = set()
+            bullets.add(chunk)
+        else:
+            raise MergeTooHard(chunk)
+    return section_dict
+
+
+def sort_key(chunk):
+    return chunk.text.replace('`', '').lower()
+
+
+def canonicalise_news_file(news_file, merger):
+    new_chunks = []
+    for chunk in news_file.chunks:
+        if chunk.kind == 'release':
+            chunk = canonicalise_release(chunk, merger)
+        new_chunks.append(chunk)
+    news_file = copy.copy(news_file)
+    news_file.chunks = new_chunks
+    return news_file
+
+
+def canonicalise_release(release, merger):
+    preamble = True
+    new_chunks = []
+    sections = []
+    for chunk in release.chunks:
+        if preamble and chunk.kind != 'section':
+            new_chunks.append(chunk)
+            continue
+        elif chunk.kind == 'section':
+            preamble = False
+            section = canonicalise_section(chunk)
+            sections.append(section)
+        else:
+            # not preamble, not section... must be trailing garbage. Blah.
+            # XXX: should probably raise an error or something. For now just
+            # add it to new_chunks, it'll become part of the preamble.
+            new_chunks.append(chunk)
+
+    # Sort the sections by name
+    sections = merger.sort_sections(sections)
+    # Combine duplicated sections (which will be adjacent after the sorting)
+    canonical_sections = []
+    for section in sections:
+        if canonical_sections and canonical_sections[-1].text == section.text:
+            # Identical. Combine them.
+            chunks = canonical_sections[-1].chunks + section.chunks
+            section = copy.copy(section)
+            section.chunks = chunks
+            section = canonicalise_section(section)
+            canonical_sections[-1] = section
+            continue
+        else:
+            canonical_sections.append(section)
+    new_chunks.extend(canonical_sections)
+    release = copy.copy(release)
+    release.chunks = new_chunks
+    return release
+
+
+def canonicalise_section(section):
+    preamble = True
+    new_chunks = []
+    bullets = set()
+    for chunk in section.chunks:
+        if preamble and chunk.kind != 'bullet':
+            new_chunks.append(chunk)
+            continue
+        elif chunk.kind == 'bullet':
+            preamble = False
+            bullets.add(chunk.text)
+        else:
+            # not preamble, not bullet... must be trailing garbage. Blah.
+            # XXX: should probably raise an error or something. For now just
+            # add it to new_chunks, it'll become part of the preamble.
+            new_chunks.append(chunk)
+    new_section = copy.copy(section)
+    new_section.chunks = new_chunks
+    bullets = sorted(bullets, key=sort_key)
+    for bullet in bullets:
+        new_section.add_leaf('bullet', bullet)
+    return new_section
+
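
The "known sections in canonical order, unexpected sections last" rule used by `section_name_sort_key` can be sketched on its own as follows. This is an assumption-laden illustration: it substitutes `float('inf')` for the patch's `Infinity` class, `CANONICAL_ORDER` is a shortened copy of the list quoted in the review, and the `Surprises` section is made up.

```python
# Shortened, illustrative order; None stands for "no section heading yet".
CANONICAL_ORDER = [None, 'Compatibility Breaks', 'New Features', 'Bug Fixes']


def section_name_sort_key(name, order=CANONICAL_ORDER):
    """Return the position of name in order; unknown names sort last."""
    try:
        return order.index(name)
    except ValueError:
        # float('inf') compares greater than every list index.
        return float('inf')


names = {'Bug Fixes', 'Surprises', None, 'New Features'}
ordered = sorted(names, key=section_name_sort_key)
print(ordered)
```

Using a numeric sentinel keeps every key mutually comparable, so the sort does not depend on how a custom class implements its comparison methods.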
=== modified file 'bzrlib/plugins/news_merge/parser.py'
--- bzrlib/plugins/news_merge/parser.py 2010-01-18 07:00:11 +0000
+++ bzrlib/plugins/news_merge/parser.py 2010-04-20 13:35:41 +0000
@@ -24,6 +24,194 @@
 simple_parse's docstring).
 """
 
+# [root]
+#  - Heading
+#    - Text
+#    - Release
+#      - Text
+#      - Section
+#        - Bullet
+#      - Section
+#        - Bullet
+#        - Bullet
+#    - Release
+#      - Text
+#      - Section
+#        - Bullet
+#      - Section
+#        - Text
+#        - Bullet
+#      - Text
+
+class ContainerChunk(object):
+
+    def __init__(self, kind, text):
+        self.chunks = []
+        self.kind = kind
+        self.text = text
+
+    def __repr__(self):
+        if len(self.text) > 20:
+            abbr_text = self.text[:20] + '...'
+        else:
+            abbr_text = self.text
+        return '<%s kind=%s text=%s>' % (
+            self.__class__.__name__, self.kind, repr(abbr_text))
+
+    def __cmp__(self, other):
+        if not isinstance(other, ContainerChunk):
+            return NotImplemented
+        return cmp(
+            (self.kind, self.text, self.chunks),
+            (other.kind, other.text, self.chunks))
+
+    def __hash__(self):
+        return hash((self.kind, self.text, tuple(self.chunks)))
+
+#    def __eq__(self, other):
+#        return (
+#            self.kind == other.kind and
+#            self.text == other.text and
+#            self.chunks == other.chunks)
+#
+#    def __lt__(self, other):
+#        return (
+#            self.kind < other.kind or
+#            self.text < other.text or
+#            self.chunks < other.chunks)
+#
+    def add_container(self, kind, text):
+        container = ContainerChunk(kind, text)
+        self.chunks.append(container)
+        return container
+
+    def add_leaf(self, kind, text):
+        if kind == 'blank':
+            # Attach this blank text to the previous chunk (which might be
+            # self), rather than tracking it as its own leaf.
+            if self.chunks:
+                self.chunks[-1].text += text
+            else:
+                self.text += text
+            return
+        chunk = LeafChunk(kind, text)
+        self.chunks.append(chunk)
+        return chunk
+
+    def flatten(self):
+        yield self
+        for chunk in self.chunks:
+            for elem in chunk.flatten():
+                yield elem
+
+    def as_text_iter(self):
+        yield self.text
+        for chunk in self.chunks:
+            for elem in chunk.as_text_iter():
+                yield elem
+
+    def as_text(self):
+        return ''.join(self.as_text_iter())
+
+
+class NewsFile(ContainerChunk):
+
+    def __init__(self):
+        ContainerChunk.__init__(self, '(root)', '')
+
+
+class LeafChunk(object):
+
+    def __init__(self, kind, text):
+        self.kind = kind
+        self.text = text
+
+    def __repr__(self):
+        if len(self.text) > 20:
+            abbr_text = self.text[:20] + '...'
+        else:
+            abbr_text = self.text
+        return '<%s kind=%s text=%s>' % (
+            self.__class__.__name__, self.kind, repr(abbr_text))
+
+    def __cmp__(self, other):
+        if not isinstance(other, LeafChunk):
+            return NotImplemented
+        return cmp((self.kind, self.text), (other.kind, other.text))
+
+    def __hash__(self):
+        return hash((self.kind, self.text))
+#
+#    def __eq__(self, other):
+#        return (self.kind == other.kind and self.text == other.text)
+#
+#    def __lt__(self, other):
+#        return (self.kind < other.kind or self.text < other.text)
+#
+    def flatten(self):
+        yield self
+
+    def as_text_iter(self):
+        yield self.text
+
+
+import re
+
+
+class ParseState(object):
+    def __init__(self):
+        #self.news_file = NewsFile()
+        self.object_stack = []
+
+
+class BadNewsFile(Exception):
+    """The NEWS file could not be parsed."""
+
+
+def parse_lines_to_structure(lines):
+    """Same as parse_to_structure, but takes an iterable of strs rather than a
+    single str.
+    """
+    return parse_to_structure(''.join(lines))
+
+
+def parse_to_structure(content):
+    news_file = NewsFile()
+    leaf_kinds = ('bullet', 'empty', 'text', 'blank')
+    # There's a strict hierarchy:
+    #   Headings contain releases contain sections
+    #   Releases never contain releases, etc.
+    #   (Any container may contain a leaf, though.)
+    container_hierarchy = ['(root)', 'heading', 'release', 'section']
+
+    stack = [news_file]
+    #import pdb; pdb.set_trace()
+    for kind, text in simple_parse(content):
+        #print kind, repr(text)
+        if kind in leaf_kinds:
+            stack[-1].add_leaf(kind, text)
+        elif kind in container_hierarchy:
+            # Pop the container stack until we find the right level to add this
+            # chunk.
+            new_rank = container_hierarchy.index(kind)
+            while True:
+                old_rank = container_hierarchy.index(stack[-1].kind)
+                if new_rank > old_rank:
+                    break
+                stack.pop()
+            container = stack[-1].add_container(kind, text)
+            stack.append(container)
+        else:
+            raise AssertionError('unexpected chunk kind: %r' % (kind,))
+    return news_file
+
+
+def simple_parse_lines(lines):
+    """Same as simple_parse, but takes an iterable of strs rather than a single
+    str.
+    """
+    return simple_parse(''.join(lines))
+
 
 def simple_parse(content):
     """Returns blocks, where each block is a 2-tuple (kind, text).
@@ -31,8 +219,15 @@
     :kind: one of 'heading', 'release', 'section', 'empty' or 'text'.
     :text: a str, including newlines.
     """
-    blocks = content.split('\n\n')
+    # Split on blank lines.
+    blankline_re = '(\n *\n)'
+    blocks = re.split(blankline_re, content)
     for block in blocks:
+        match = re.match(blankline_re, block)
+        if match is not None and match.groups()[0] == block:
+            # blank line
+            yield 'blank', block
+            continue
         if block.startswith('###'):
             # First line is ###...: Top heading
             yield 'heading', block
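
The move from `content.split('\n\n')` to `re.split` with a capturing group matters because a capturing group makes `re.split` return the separators too, so the exact blank-line text (including stray spaces) can be re-emitted. A minimal standalone sketch, where `split_keeping_blanks` is a hypothetical name and every non-blank block is simply labelled `'other'`:

```python
import re

# Same pattern as the patch: a newline, optional spaces, another newline.
BLANKLINE_RE = '(\n *\n)'


def split_keeping_blanks(content):
    """Yield (kind, text) pairs; 'blank' marks the separators themselves."""
    for block in re.split(BLANKLINE_RE, content):
        if re.fullmatch(BLANKLINE_RE, block):
            yield 'blank', block
        else:
            yield 'other', block


content = "Bug Fixes\n*********\n\n* Fix A\n \n* Fix B\n"
pairs = list(split_keeping_blanks(content))
for kind, text in pairs:
    print(kind, repr(text))

# Joining every piece reproduces the input byte for byte.
roundtrip = ''.join(text for _, text in pairs)
```

Keeping the separators is what allows a round-trip test such as `test_roundtrip` to compare the regenerated text with the original.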
=== modified file 'bzrlib/plugins/news_merge/tests/__init__.py'
--- bzrlib/plugins/news_merge/tests/__init__.py 2010-01-20 16:05:28 +0000
+++ bzrlib/plugins/news_merge/tests/__init__.py 2010-04-20 13:35:41 +0000
@@ -16,6 +16,7 @@
 
 def load_tests(basic_tests, module, loader):
     testmod_names = [
+        'test_parser',
         'test_news_merge',
         ]
     basic_tests.addTest(loader.loadTestsFromModuleNames(
=== added file 'bzrlib/plugins/news_merge/tests/test_parser.py'
--- bzrlib/plugins/news_merge/tests/test_parser.py 1970-01-01 00:00:00 +0000
+++ bzrlib/plugins/news_merge/tests/test_parser.py 2010-04-20 13:35:41 +0000
@@ -0,0 +1,116 @@
+# Copyright (C) 2010 by Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+
+from bzrlib.tests import TestCase
+
+from bzrlib.plugins.news_merge import parser
+
+
+# Define an example NEWS file with the following structure:
+# [root]
+#  - Heading
+#    - Text
+#    - Release
+#      - Text
+#      - Section
+#        - Bullet
+#      - Section
+#        - Bullet
+#        - Bullet
+#    - Release
+#      - Text
+#      - Section
+#        - Bullet
+#      - Section
+#        - Text
+#        - Bullet
+#      - Text
+
+example_file = """\
+####################
+Bazaar Release Notes
+####################
+
+.. contents:: List of Releases
+   :depth: 1
+
+bzr x.y.z (not released yet)
+############################
+
+:Codename: template
+:x.y.z: ???
+
+Compatibility Breaks
+********************
+
+* Bullet
+
+New Features
+************
+
+* Bullet 1
+
+* Bullet 2
+
+Bug Fixes
+*********
+
+bzr x.y.y
+#########
+
+:Codename: previous
+
+Compatibility Breaks
+********************
+
+* Bullet
+
+New Features
+************
+
+Preamble text for section.
+
+* Bullet, not text.
+
+Footnote.
+"""
+
+class TestStructuredParseSmokeTests(TestCase):
+    """Smoke tests parse_to_structure using example_file."""
+
+    def test_parse(self):
+        """example_file can be parsed without an error."""
+        news_file = parser.parse_to_structure(example_file)
+
+    def test_roundtrip(self):
+        """The NewsFile object can regenerate the original bytes."""
+        news_file = parser.parse_to_structure(example_file)
+        self.assertEqualDiff(example_file, news_file.as_text())
+
+    def test_flatten(self):
+        """NewsFile.flatten shows the file has been interpreted as
+        releases/sections/bullets etc.
+        """
+        news_file = parser.parse_to_structure(example_file)
+        expected_kinds = ['(root)', 'heading', 'text', 'release', 'text',
+            'section', 'bullet', 'section', 'bullet', 'bullet', 'section',
+            'release', 'text', 'section', 'bullet', 'section', 'text',
+            'bullet', 'text']
+        kinds = [chunk.kind for chunk in news_file.flatten()
+            if chunk.kind != 'blank']
+        self.assertEqual(expected_kinds, kinds)
+
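
What `test_flatten` exercises is a stack-based tree build followed by a pre-order walk. The sketch below shows that technique in isolation, under stated assumptions: `Node`, `build_tree` and the short texts are hypothetical, and the input is a ready-made list of `(kind, text)` pairs rather than the output of `simple_parse`.

```python
# Container kinds ranked from outermost to innermost, as in the patch.
RANKS = {'(root)': 0, 'heading': 1, 'release': 2, 'section': 3}


class Node(object):
    """Hypothetical minimal chunk: a kind, some text, and child nodes."""

    def __init__(self, kind, text=''):
        self.kind = kind
        self.text = text
        self.children = []

    def flatten(self):
        # Pre-order walk: this node first, then each subtree in order.
        yield self
        for child in self.children:
            for node in child.flatten():
                yield node


def build_tree(blocks):
    root = Node('(root)')
    stack = [root]
    for kind, text in blocks:
        node = Node(kind, text)
        if kind in RANKS:
            # Pop until the top of the stack may contain this kind.
            while RANKS[stack[-1].kind] >= RANKS[kind]:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            # Leaves attach to whichever container is currently open.
            stack[-1].children.append(node)
    return root


blocks = [('heading', 'H'), ('text', 't'), ('release', 'r1'),
          ('section', 's1'), ('bullet', 'b'), ('release', 'r2'),
          ('text', 't2')]
tree = build_tree(blocks)
kinds = [node.kind for node in tree.flatten()]
print(kinds)
```

The second release closes the open section and the first release before attaching to the heading, which is the behaviour the ranked hierarchy is meant to guarantee.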