Merge lp:~henninge/launchpad/format-imports into lp:launchpad

Proposed by Henning Eggers
Status: Merged
Approved by: Henning Eggers
Approved revision: no longer in the source branch.
Merged at revision: 11469
Proposed branch: lp:~henninge/launchpad/format-imports
Merge into: lp:launchpad
Diff against target: 636 lines (+627/-0)
2 files modified
utilities/format-imports (+387/-0)
utilities/python_standard_libs.py (+240/-0)
To merge this branch: bzr merge lp:~henninge/launchpad/format-imports
Reviewer: Brad Crittenden (community)
Review type: code
Status: Approve
Review via email: mp+33926@code.launchpad.net

Commit message

New script utilities/format-imports is now available.

Description of the change

This moves the import-formats script into the Launchpad tree as "utilities/format-imports". The code of the actual script is unchanged; only some documentation and helpful output were added.

The second file is a list of standard Python library names, which I kept in a separate file because it is just a long and boring sequence of strings. To be able to import it, the script prepends its own directory to sys.path, so the two files must be in the same directory.
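
A minimal sketch of that path setup, for illustration only. The helper name `prepend_script_dir`, its `path` parameter and the use of `os.path.abspath` are assumptions made for this example; the script itself simply assigns `[os.path.dirname(__file__)]` to `sys.path[0:0]` at module level.

```python
import os
import sys


def prepend_script_dir(script_file, path=None):
    """Put the directory that contains script_file at the front of a path list.

    `path` defaults to sys.path. It is a parameter only so that this sketch
    can be tried out without modifying the real import path.
    """
    if path is None:
        path = sys.path
    # abspath is an addition for this sketch; the script uses dirname only.
    script_dir = os.path.dirname(os.path.abspath(script_file))
    # Slice assignment inserts at index 0 and keeps the same list object.
    path[0:0] = [script_dir]
    return script_dir
```

Called as `prepend_script_dir(__file__)` before `from python_standard_libs import python_standard_libs`, this makes the sibling module importable no matter which directory the script is started from.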

I felt that tests were not really needed, since the tool changes other files in the Launchpad tree and the real test is whether those changes break the test suite.
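
For orientation before reading the diff, the grouping and wrapping rules the script applies can be sketched roughly as follows. This is a simplified Python 3 illustration, not the code in the branch: `STANDARD_LIBS` is a tiny stand-in for the real `python_standard_libs` list, the function names `classify` and `format_from_import` are made up for this example, the "FIRST" comment handling is left out, and the four-space indentation of wrapped imports is an assumption.

```python
# Stand-in for the real python_standard_libs list (assumption for this sketch).
STANDARD_LIBS = {"os", "re", "sys", "textwrap"}
# Top-level packages that the script treats as local Launchpad code.
LOCAL_PREFIXES = ("canonical", "lp")


def classify(module):
    """Return 'standard', 'local' or 'thirdparty' for a dotted module name."""
    base = module.split(".")[0]
    if base in LOCAL_PREFIXES:
        return "local"
    if base in STANDARD_LIBS:
        return "standard"
    # Anything that fits no other group counts as third-party.
    return "thirdparty"


def format_from_import(module, objects):
    """Render a from-import statement, wrapping it in list style when needed."""
    # Drop duplicates and sort case-insensitively.
    objects = sorted(set(objects), key=str.lower)
    if len(objects) == 1:
        statement = "from %s import %s" % (module, objects[0])
        # One-object imports stay on one line if they fit in 78 characters.
        if len(statement) < 79:
            return statement
    return "from %s import (\n    %s,\n    )" % (
        module, ",\n    ".join(objects))
```

For example, `classify("zope.interface")` falls through to the third-party group, and `format_from_import("sys", ["stdout", "stdin"])` yields the multi-line list form with `stdin` listed before `stdout`.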

Brad Crittenden (bac) wrote :

Henning, thank you again for writing this tool and shepherding the initial reformatting branch. Great job!

typo: s/Tow/Two
typo: s/seperated/separated
typo: s/reformated/reformatted

I think "directly preceded" should be "immediately preceded by". It sounds better to me.

Other than that it looks good.

review: Approve (code)

Preview Diff

=== added file 'utilities/format-imports'
--- utilities/format-imports 1970-01-01 00:00:00 +0000
+++ utilities/format-imports 2010-08-27 20:21:03 +0000
@@ -0,0 +1,387 @@
1#!/usr/bin/python
2#
3# Copyright 2010 Canonical Ltd. This software is licensed under the
4# GNU Affero General Public License version 3 (see the file LICENSE).
5
6""" Format import sections in python files
7
8= Usage =
9
10format-imports <file or directory> ...
11
12= Operation =
13
14The script will process each filename on the command line. If the file is a
15directory it recurses into it and processes all *.py files found in the tree.
16It will output the paths of all the files that have been changed.
17
18For Launchpad it was applied to the "lib/canonical/launchpad" and the "lib/lp"
19subtrees. Running it with those parameters on a freshly branched LP tree
20should not produce any output, meaning that all the files in the tree should
21be formatted correctly.
22
23The script identifies the import section of each file as a block of lines
24that start with "import" or "from" or are indented with at least one space or
25are blank lines. Comment lines are also included if they are followed by an
26import statement. An initial __future__ import and a module docstring are
27explicitly skipped.
28
29The import section is rewritten as four subsections, each separated by a
30blank line. Any of the sections may be empty.
31 1. Standard python library modules
32 2. Import statements explicitly ordered to the top (see below)
33 3. Third-party modules, meaning anything not fitting one of the other
34 subsection criteria
35 4. Local modules that begin with "canonical" or "lp".
36
37Each section is sorted alphabetically by module name. Each module is put
38on its own line, i.e.
39{{{
40 import os, sys
41}}}
42becomes
43{{{
44 import os
45 import sys
46}}}
47Multiple import statements for the same module are conflated into one
48statement, or two if the module was imported alongside an object inside it,
49i.e.
50{{{
51 import sys
52 from sys import stdin
53}}}
54
55Statements that import more than one object are put on multiple lines in
56list style, i.e.
57{{{
58 from sys import (
59 stdin,
60 stdout,
61 )
62}}}
63Objects are sorted alphabetically and case-insensitively. One-object imports
64are only formatted in this manner if the statement exceeds 78 characters in
65length.
66
67Comments stick with the import statement that follows them. Comments at the
68end of one-line statements are moved to be in front of it, i.e.
69{{{
70 from sys import exit # Have a way out
71}}}
72becomes
73{{{
74 # Have a way out
75 from sys import exit
76}}}
77
78= Format control =
79
80Two special comments allow you to control the operation of the formatter.
81
82When an import statement is immediately preceded by a comment that starts
83with the word "FIRST", it is placed into the second subsection (see above).
84
85When the first import statement is immediately preceded by a comment that
86starts with the word "SKIP", the entire file is exempt from formatting.
87
88= Known bugs =
89
90Make sure to always check the result of the re-formatting to see if you have
91been bitten by one of these.
92
93Comments inside multi-line import statements break the formatter. A statement
94like this will be ignored:
95{{{
96 from lp.app.interfaces import (
97 # Don't do this.
98 IMyInterface,
99 IMyOtherInterface, # Don't do this either
100 )
101}}}
102Actually, this will cause the statement and all following ones to be ignored:
103{{{
104 from lp.app.interfaces import (
105 # Breaks indentation rules anyway.
106 IMyInterface,
107 IMyOtherInterface,
108 )
109}}}
110
111If a single-line statement has both a comment in front of it and at the end
112of the line, only the end-line comment will survive. This could probably
113easily be fixed to concatenate the two.
114{{{
115 # I am a goner.
116 from lp.app.interfaces import IMyInterface # I will survive!
117}}}
118
119Line continuation characters are recognized and resolved but
120not re-introduced. This may leave the re-formatted text with a line that
121is over the length limit.
122{{{
123 from lp.app.verylongnames.orverlydeep.modulestructure.leavenoroom \
124 import object
125}}}
126"""
127
128__metaclass__ = type
129
130# SKIP this file when reformatting.
131import os
132import re
133import sys
134from textwrap import dedent
135
136sys.path[0:0] = [os.path.dirname(__file__)]
137from python_standard_libs import python_standard_libs
138
139
140# To search for escaped newline chars.
141escaped_nl_regex = re.compile("\\\\\n", re.M)
142import_regex = re.compile("^import +(?P<module>.+)$", re.M)
143from_import_single_regex = re.compile(
144 "^from (?P<module>.+) +import +"
145 "(?P<objects>[*]|[a-zA-Z0-9_, ]+)"
146 "(?P<comment>#.*)?$", re.M)
147from_import_multi_regex = re.compile(
148 "^from +(?P<module>.+) +import *[(](?P<objects>[a-zA-Z0-9_, \n]+)[)]$", re.M)
149comment_regex = re.compile(
150 "(?P<comment>(^#.+\n)+)(^import|^from) +(?P<module>[a-zA-Z0-9_.]+)", re.M)
151split_regex = re.compile(",\s*")
152
153# Module docstrings are multiline (""") strings that are not indented and are
154# followed at some point by an import.
155module_docstring_regex = re.compile(
156 '(?P<docstring>^["]{3}[^"]+["]{3}\n).*^(import |from .+ import)', re.M | re.S)
157# The imports section starts with an import statement that is not a __future__
158# import and consists of import lines, indented lines, empty lines and
159# comments which are followed by an import line. Sometimes we even find
160# lines that contain a single ")"... :-(
161imports_section_regex = re.compile(
162 "(^#.+\n)*^(import|(from ((?!__future__)\S+) import)).*\n"
163 "(^import .+\n|^from .+\n|^[\t ]+.+\n|(^#.+\n)+((^import|^from) .+\n)|^\n|^[)]\n)*",
164 re.M)
165
166
167def format_import_lines(module, objects):
168 """Generate correct from...import strings."""
169 if len(objects) == 1:
170 statement = "from %s import %s" % (module, objects[0])
171 if len(statement) < 79:
172 return statement
173 return "from %s import (\n %s,\n )" % (
174 module, ",\n ".join(objects))
175
176
177def find_imports_section(content):
178 """Return that part of the file that contains the import statements."""
179 # Skip module docstring.
180 match = module_docstring_regex.search(content)
181 if match is None:
182 startpos = 0
183 else:
184 startpos = match.end('docstring')
185
186 match = imports_section_regex.search(content, startpos)
187 if match is None:
188 return (None, None)
189 startpos = match.start()
190 endpos = match.end()
191 if content[startpos:endpos].startswith('# SKIP'):
192 # Skip files explicitly.
193 return(None, None)
194 return (startpos, endpos)
195
196
197class ImportStatement:
198 """Holds information about an import statement."""
199
200 def __init__(self, objects=None, comment=None):
201 self.import_module = objects is None
202 if objects is None:
203 self.objects = None
204 else:
205 self.objects = sorted(objects, key=str.lower)
206 self.comment = comment
207
208 def addObjects(self, new_objects):
209 """More objects in this statement; eliminate duplicates."""
210 if self.objects is None:
211 # No objects so far.
212 self.objects = new_objects
213 else:
214 # Use set to eliminate double objects.
215 more_objects = set(self.objects + new_objects)
216 self.objects = sorted(list(more_objects), key=str.lower)
217
218 def setComment(self, comment):
219 """Add a comment to the statement."""
220 self.comment = comment
221
222
223def parse_import_statements(import_section):
224 """Split the import section into statements.
225
226 Returns a dictionary with the module as the key and an ImportStatement
227 holding the sorted list of imported objects as the value."""
228 imports = {}
229 # Search for escaped newlines and remove them.
230 searchpos = 0
231 while True:
232 match = escaped_nl_regex.search(import_section, searchpos)
233 if match is None:
234 break
235 start = match.start()
236 end = match.end()
237 import_section = import_section[:start]+import_section[end:]
238 searchpos = start
239 # Search for simple one-line import statements.
240 searchpos = 0
241 while True:
242 match = import_regex.search(import_section, searchpos)
243 if match is None:
244 break
245 # These imports are marked by a "None" value.
246 # Multiple modules in one statement are split up.
247 for module in split_regex.split(match.group('module').strip()):
248 imports[module] = ImportStatement()
249 searchpos = match.end()
250 # Search for "from ... import" statements.
251 for pattern in (from_import_single_regex, from_import_multi_regex):
252 searchpos = 0
253 while True:
254 match = pattern.search(import_section, searchpos)
255 if match is None:
256 break
257 import_objects = split_regex.split(
258 match.group('objects').strip(" \n,"))
259 module = match.group('module').strip()
260 # Only one pattern has a 'comment' group.
261 comment = match.groupdict().get('comment', None)
262 if module in imports:
263 # Catch double import lines.
264 imports[module].addObjects(import_objects)
265 else:
266 imports[module] = ImportStatement(import_objects)
267 if comment is not None:
268 imports[module].setComment(comment)
269 searchpos = match.end()
270 # Search for comments in import section.
271 searchpos = 0
272 while True:
273 match = comment_regex.search(import_section, searchpos)
274 if match is None:
275 break
276 module = match.group('module').strip()
277 comment = match.group('comment').strip()
278 imports[module].setComment(comment)
279 searchpos = match.end()
280
281 return imports
282
283
284def format_imports(imports):
285 """Group and order imports, return the new import statements."""
286 standard_section = {}
287 first_section = {}
288 thirdparty_section = {}
289 local_section = {}
290 # Group modules into sections.
291 for module, statement in imports.iteritems():
292 module_base = module.split('.')[0]
293 comment = statement.comment
294 if comment is not None and comment.startswith("# FIRST"):
295 first_section[module] = statement
296 elif module_base in ('canonical', 'lp'):
297 local_section[module] = statement
298 elif module_base in python_standard_libs:
299 standard_section[module] = statement
300 else:
301 thirdparty_section[module] = statement
302
303 all_import_lines = []
304 # Sort within each section and generate statement strings.
305 sections = (
306 standard_section,
307 first_section,
308 thirdparty_section,
309 local_section,
310 )
311 for section in sections:
312 import_lines = []
313 for module in sorted(section.keys(), key=str.lower):
314 if section[module].comment is not None:
315 import_lines.append(section[module].comment)
316 if section[module].import_module:
317 import_lines.append("import %s" % module)
318 if section[module].objects is not None:
319 import_lines.append(
320 format_import_lines(module, section[module].objects))
321 if len(import_lines) > 0:
322 all_import_lines.append('\n'.join(import_lines))
323 # Sections are separated by one blank line.
324 return '\n\n'.join(all_import_lines)
325
326
327def reformat_importsection(filename):
328 """Replace the given file with a reformatted version of it."""
329 pyfile = file(filename).read()
330 import_start, import_end = find_imports_section(pyfile)
331 if import_start is None:
332 # Skip files with no import section.
333 return False
334 imports_section = pyfile[import_start:import_end]
335 imports = parse_import_statements(imports_section)
336
337 if pyfile[import_end:import_end+1] != '#':
338 # Two newlines before anything but comments.
339 number_of_newlines = 3
340 else:
341 number_of_newlines = 2
342
343 new_imports = format_imports(imports)+"\n"*number_of_newlines
344 if new_imports == imports_section:
345 # No change, no need to write a new file.
346 return False
347
348 new_file = open(filename, "w")
349 new_file.write(pyfile[:import_start])
350 new_file.write(new_imports)
351 new_file.write(pyfile[import_end:])
352
353 return True
354
355
356def process_file(fpath):
357 """Process the file with the given path."""
358 changed = reformat_importsection(fpath)
359 if changed:
360 print fpath
361
362
363def process_tree(dpath):
364 """Walk a directory tree and process all *.py files."""
365 for dirpath, dirnames, filenames in os.walk(dpath):
366 for filename in filenames:
367 if filename.endswith('.py'):
368 process_file(os.path.join(dirpath, filename))
369
370
371if __name__ == "__main__":
372 if len(sys.argv) == 1 or sys.argv[1] in ("-h", "-?", "--help"):
373 sys.stderr.write(dedent("""\
374 usage: format-imports <file or directory> ...
375
376 Type "format-imports --docstring | less" to see the documentation.
377 """))
378 sys.exit(1)
379 if sys.argv[1] == "--docstring":
380 sys.stdout.write(__doc__)
381 sys.exit(2)
382 for filename in sys.argv[1:]:
383 if os.path.isdir(filename):
384 process_tree(filename)
385 else:
386 process_file(filename)
387 sys.exit(0)
=== added file 'utilities/python_standard_libs.py'
--- utilities/python_standard_libs.py 1970-01-01 00:00:00 +0000
+++ utilities/python_standard_libs.py 2010-08-27 20:21:03 +0000
@@ -0,0 +1,240 @@
1# Copyright 2010 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4""" A list of top-level standard python library names.
5
6This list is used by format-imports to determine if a module is in this group
7or not.
8The list is taken from http://docs.python.org/release/2.5.4/lib/modindex.html
9but modules specific to other OSs have been taken out. It may need to be
10updated from time to time.
11"""
12
13python_standard_libs = [
14 'aifc',
15 'anydbm',
16 'array',
17 'asynchat',
18 'asyncore',
19 'atexit',
20 'audioop',
21 'base64',
22 'BaseHTTPServer',
23 'Bastion',
24 'binascii',
25 'binhex',
26 'bisect',
27 'bsddb',
28 'bz2',
29 'calendar',
30 'cgi',
31 'CGIHTTPServer',
32 'cgitb',
33 'chunk',
34 'cmath',
35 'cmd',
36 'code',
37 'codecs',
38 'codeop',
39 'collections',
40 'colorsys',
41 'commands',
42 'compileall',
43 'compiler',
44 'ConfigParser',
45 'contextlib',
46 'Cookie',
47 'cookielib',
48 'copy',
49 'copy_reg',
50 'cPickle',
51 'cProfile',
52 'crypt',
53 'cStringIO',
54 'csv',
55 'ctypes',
56 'curses',
57 'datetime',
58 'dbhash',
59 'dbm',
60 'decimal',
61 'difflib',
62 'dircache',
63 'dis',
64 'distutils',
65 'dl',
66 'doctest',
67 'DocXMLRPCServer',
68 'dumbdbm',
69 'dummy_thread',
70 'dummy_threading',
71 'email',
72 'encodings',
73 'errno',
74 'exceptions',
75 'fcntl',
76 'filecmp',
77 'fileinput',
78 'fnmatch',
79 'formatter',
80 'fpectl',
81 'fpformat',
82 'ftplib',
83 'functools',
84 'gc',
85 'gdbm',
86 'getopt',
87 'getpass',
88 'gettext',
89 'glob',
90 'gopherlib',
91 'grp',
92 'gzip',
93 'hashlib',
94 'heapq',
95 'hmac',
96 'hotshot',
97 'htmlentitydefs',
98 'htmllib',
99 'HTMLParser',
100 'httplib',
101 'imageop',
102 'imaplib',
103 'imghdr',
104 'imp',
105 'inspect',
106 'itertools',
107 'keyword',
108 'linecache',
109 'locale',
110 'logging',
111 'mailbox',
112 'mailcap',
113 'marshal',
114 'math',
115 'md5',
116 'mhlib',
117 'mimetools',
118 'mimetypes',
119 'MimeWriter',
120 'mimify',
121 'mmap',
122 'modulefinder',
123 'multifile',
124 'mutex',
125 'netrc',
126 'new',
127 'nis',
128 'nntplib',
129 'operator',
130 'optparse',
131 'os',
132 'ossaudiodev',
133 'parser',
134 'pdb',
135 'pickle',
136 'pickletools',
137 'pipes',
138 'pkgutil',
139 'platform',
140 'popen2',
141 'poplib',
142 'posix',
143 'posixfile',
144 'pprint',
145 'profile',
146 'pstats',
147 'pty',
148 'pwd',
149 'py_compile',
150 'pyclbr',
151 'pydoc',
152 'Queue',
153 'quopri',
154 'random',
155 're',
156 'readline',
157 'repr',
158 'resource',
159 'rexec',
160 'rfc822',
161 'rgbimg',
162 'rlcompleter',
163 'robotparser',
164 'runpy',
165 'sched',
166 'ScrolledText',
167 'select',
168 'sets',
169 'sgmllib',
170 'sha',
171 'shelve',
172 'shlex',
173 'shutil',
174 'signal',
175 'SimpleHTTPServer',
176 'SimpleXMLRPCServer',
177 'site',
178 'smtpd',
179 'smtplib',
180 'sndhdr',
181 'socket',
182 'SocketServer',
183 'spwd',
184 'sqlite3',
185 'stat',
186 'statvfs',
187 'string',
188 'StringIO',
189 'stringprep',
190 'struct',
191 'subprocess',
192 'sunau',
193 'symbol',
194 'sys',
195 'syslog',
196 'tabnanny',
197 'tarfile',
198 'telnetlib',
199 'tempfile',
200 'termios',
201 'test.test_support',
202 'test',
203 'textwrap',
204 'thread',
205 'threading',
206 'time',
207 'timeit',
208 'Tix',
209 'Tkinter',
210 'token',
211 'tokenize',
212 'trace',
213 'traceback',
214 'tty',
215 'turtle',
216 'types',
217 'unicodedata',
218 'unittest',
219 'urllib2',
220 'urllib',
221 'urlparse',
222 'user',
223 'UserDict',
224 'UserList',
225 'UserString',
226 'uu',
227 'uuid',
228 'warnings',
229 'wave',
230 'weakref',
231 'webbrowser',
232 'whichdb',
233 'wsgiref',
234 'xdrlib',
235 'xml',
236 'xmlrpclib',
237 'zipfile',
238 'zipimport',
239 'zlib',
240 ]