Merge lp:~henninge/launchpad/format-imports into lp:launchpad

Proposed by Henning Eggers
Status: Merged
Approved by: Henning Eggers
Approved revision: no longer in the source branch.
Merged at revision: 11469
Proposed branch: lp:~henninge/launchpad/format-imports
Merge into: lp:launchpad
Diff against target: 636 lines (+627/-0)
2 files modified
utilities/format-imports (+387/-0)
utilities/python_standard_libs.py (+240/-0)
To merge this branch: bzr merge lp:~henninge/launchpad/format-imports
Reviewer: Brad Crittenden (community)
Review type: code
Review status: Approve
Review via email: mp+33926@code.launchpad.net

Commit message

New script utilities/format-imports is now available.

Description of the change

This moves the import-formats script into the launchpad tree as "utilities/format-imports". The code of the actual script was left unchanged; only some documentation and helpful output were added.

The second file is a list of standard Python libraries, which I kept in a separate file because it is just a long and boring sequence of strings. To be able to import it, the script updates sys.path with its own directory, so the two files must be in the same directory.
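The sys.path update can be sketched roughly as follows. This is only an illustration, not the code in the branch: the helper name prepend_script_dir and its optional path argument are made up for this example, and unlike the script it resolves the directory to an absolute path first.

```python
import os
import sys


def prepend_script_dir(script_file, path=None):
    """Put the directory that contains script_file at the front of path.

    Hypothetical helper: `path` defaults to sys.path and is only a
    parameter so the behaviour can be tried on a plain list.
    """
    if path is None:
        path = sys.path
    # Assumption: use an absolute path; the script itself uses
    # os.path.dirname(__file__) directly.
    script_dir = os.path.dirname(os.path.abspath(script_file))
    # Slice assignment inserts in place at position 0, in the same way
    # as the script's "sys.path[0:0] = [...]".
    path[0:0] = [script_dir]
    return script_dir
```

A script would call prepend_script_dir(__file__) before "from python_standard_libs import python_standard_libs"; the import then only succeeds if both files sit in the same directory.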

I felt that tests were not really needed, since the tool changes other files in the Launchpad tree and the real test is whether those changes break the test suite.

Brad Crittenden (bac) wrote :

Henning, thank you again for writing this tool and shepherding the initial reformatting branch. Great job!

typo: s/Tow/Two
typo: s/seperated/separated
typo: s/reformated/reformatted

I think "directly preceded" should be "immediately preceded by". It sounds better to me.

Other than that it looks good.

review: Approve (code)

Preview Diff

1=== added file 'utilities/format-imports'
2--- utilities/format-imports 1970-01-01 00:00:00 +0000
3+++ utilities/format-imports 2010-08-27 20:21:03 +0000
4@@ -0,0 +1,387 @@
5+#!/usr/bin/python
6+#
7+# Copyright 2010 Canonical Ltd. This software is licensed under the
8+# GNU Affero General Public License version 3 (see the file LICENSE).
9+
10+""" Format import sections in python files
11+
12+= Usage =
13+
14+format-imports <file or directory> ...
15+
16+= Operation =
17+
18+The script will process each filename on the command line. If the file is a
19+directory it recurses into it and processes all *.py files found in the tree.
20+It will output the paths of all the files that have been changed.
21+
22+For Launchpad it was applied to the "lib/canonical/launchpad" and the "lib/lp"
23+subtrees. Running it with those parameters on a freshly branched LP tree
24+should not produce any output, meaning that all the files in the tree should
25+be formatted correctly.
26+
27+The script identifies the import section of each file as a block of lines
28+that start with "import" or "from" or are indented with at least one space or
29+are blank lines. Comment lines are also included if they are followed by an
30+import statement. An initial __future__ import and a module docstring are
31+explicitly skipped.
32+
33+The import section is rewritten as four subsections, each separated by a
34+blank line. Any of the sections may be empty.
35+ 1. Standard python library modules
36+ 2. Import statements explicitly ordered to the top (see below)
37+ 3. Third-party modules, meaning anything not fitting one of the other
38+    subsection criteria
39+ 4. Local modules that begin with "canonical" or "lp".
40+
41+Each section is sorted alphabetically by module name. Each module is put
42+on its own line, i.e.
43+{{{
44+ import os, sys
45+}}}
46+becomes
47+{{{
48+ import os
49+ import sys
50+}}}
51+Multiple import statements for the same module are conflated into one
52+statement, or two if the module was imported alongside an object inside it,
53+i.e.
54+{{{
55+ import sys
56+ from sys import stdin
57+}}}
58+
59+Statements that import more than one object are put on multiple lines in
60+list style, i.e.
61+{{{
62+    from sys import (
63+        stdin,
64+        stdout,
65+        )
66+}}}
67+Objects are sorted alphabetically and case-insensitively. One-object imports
68+are only formatted in this manner if the statement exceeds 78 characters in
69+length.
70+
71+Comments stick with the import statement that follows them. Comments at the
72+end of one-line statements are moved to be in front of it, i.e.
73+{{{
74+ from sys import exit # Have a way out
75+}}}
76+becomes
77+{{{
78+ # Have a way out
79+ from sys import exit
80+}}}
81+
82+= Format control =
83+
84+Two special comments allow you to control the operation of the formatter.
85+
86+When an import statement is immediately preceded by a comment that starts
87+with the word "FIRST", it is placed into the second subsection (see above).
88+
89+When the first import statement is directly preceded by a comment that starts
90+with the word "SKIP", the entire file is exempt from formatting.
91+
92+= Known bugs =
93+
94+Make sure to always check the result of the re-formatting to see if you have
95+been bitten by one of these.
96+
97+Comments inside multi-line import statements break the formatter. A statement
98+like this will be ignored:
99+{{{
100+ from lp.app.interfaces import (
101+ # Don't do this.
102+ IMyInterface,
103+ IMyOtherInterface, # Don't do this either
104+ )
105+}}}
106+Actually, this will cause the statement and all following ones to be ignored:
107+{{{
108+ from lp.app.interfaces import (
109+ # Breaks indentation rules anyway.
110+ IMyInterface,
111+ IMyOtherInterface,
112+ )
113+}}}
114+
115+If a single-line statement has both a comment in front of it and at the end
116+of the line, only the end-line comment will survive. This could probably
117+easily be fixed to concatenate the two.
118+{{{
119+ # I am a gonner.
120+ from lp.app.interfaces import IMyInterface # I will survive!
121+}}}
122+
123+Line continuation characters are recognized and resolved but
124+not re-introduced. This may leave the re-formatted text with a line that
125+is over the length limit.
126+{{{
127+ from lp.app.verylongnames.orverlydeep.modulestructure.leavenoroom \
128+ import object
129+}}}
130+"""
131+
132+__metaclass__ = type
133+
134+# SKIP this file when reformatting.
135+import os
136+import re
137+import sys
138+from textwrap import dedent
139+
140+sys.path[0:0] = [os.path.dirname(__file__)]
141+from python_standard_libs import python_standard_libs
142+
143+
144+# To search for escaped newline chars.
145+escaped_nl_regex = re.compile("\\\\\n", re.M)
146+import_regex = re.compile("^import +(?P<module>.+)$", re.M)
147+from_import_single_regex = re.compile(
148+    "^from (?P<module>.+) +import +"
149+    "(?P<objects>[*]|[a-zA-Z0-9_, ]+)"
150+    "(?P<comment>#.*)?$", re.M)
151+from_import_multi_regex = re.compile(
152+    "^from +(?P<module>.+) +import *[(](?P<objects>[a-zA-Z0-9_, \n]+)[)]$", re.M)
153+comment_regex = re.compile(
154+    "(?P<comment>(^#.+\n)+)(^import|^from) +(?P<module>[a-zA-Z0-9_.]+)", re.M)
155+split_regex = re.compile(",\s*")
156+
157+# Module docstrings are multiline (""") strings that are not indented and are
158+# followed at some point by an import.
159+module_docstring_regex = re.compile(
160+    '(?P<docstring>^["]{3}[^"]+["]{3}\n).*^(import |from .+ import)', re.M | re.S)
161+# The imports section starts with an import statement that is not a __future__
162+# import and consists of import lines, indented lines, empty lines and
163+# comments which are followed by an import line. Sometimes we even find
164+# lines that contain a single ")"... :-(
165+imports_section_regex = re.compile(
166+    "(^#.+\n)*^(import|(from ((?!__future__)\S+) import)).*\n"
167+    "(^import .+\n|^from .+\n|^[\t ]+.+\n|(^#.+\n)+((^import|^from) .+\n)|^\n|^[)]\n)*",
168+    re.M)
169+
170+
171+def format_import_lines(module, objects):
172+    """Generate correct from...import strings."""
173+    if len(objects) == 1:
174+        statement = "from %s import %s" % (module, objects[0])
175+        if len(statement) < 79:
176+            return statement
177+    return "from %s import (\n    %s,\n    )" % (
178+        module, ",\n    ".join(objects))
179+
180+
181+def find_imports_section(content):
182+    """Return that part of the file that contains the import statements."""
183+    # Skip module docstring.
184+    match = module_docstring_regex.search(content)
185+    if match is None:
186+        startpos = 0
187+    else:
188+        startpos = match.end('docstring')
189+
190+    match = imports_section_regex.search(content, startpos)
191+    if match is None:
192+        return (None, None)
193+    startpos = match.start()
194+    endpos = match.end()
195+    if content[startpos:endpos].startswith('# SKIP'):
196+        # Skip files explicitly.
197+        return(None, None)
198+    return (startpos, endpos)
199+
200+
201+class ImportStatement:
202+    """Holds information about an import statement."""
203+
204+    def __init__(self, objects=None, comment=None):
205+        self.import_module = objects is None
206+        if objects is None:
207+            self.objects = None
208+        else:
209+            self.objects = sorted(objects, key=str.lower)
210+        self.comment = comment
211+
212+    def addObjects(self, new_objects):
213+        """More objects in this statement; eliminate duplicates."""
214+        if self.objects is None:
215+            # No objects so far.
216+            self.objects = new_objects
217+        else:
218+            # Use set to eliminate double objects.
219+            more_objects = set(self.objects + new_objects)
220+            self.objects = sorted(list(more_objects), key=str.lower)
221+
222+    def setComment(self, comment):
223+        """Add a comment to the statement."""
224+        self.comment = comment
225+
226+
227+def parse_import_statements(import_section):
228+    """Split the import section into statements.
229+
230+    Returns a dictionary with the module as the key and the objects being
231+    imported as a sorted list of strings."""
232+    imports = {}
233+    # Search for escaped newlines and remove them.
234+    searchpos = 0
235+    while True:
236+        match = escaped_nl_regex.search(import_section, searchpos)
237+        if match is None:
238+            break
239+        start = match.start()
240+        end = match.end()
241+        import_section = import_section[:start]+import_section[end:]
242+        searchpos = start
243+    # Search for simple one-line import statements.
244+    searchpos = 0
245+    while True:
246+        match = import_regex.search(import_section, searchpos)
247+        if match is None:
248+            break
249+        # These imports are marked by a "None" value.
250+        # Multiple modules in one statement are split up.
251+        for module in split_regex.split(match.group('module').strip()):
252+            imports[module] = ImportStatement()
253+        searchpos = match.end()
254+    # Search for "from ... import" statements.
255+    for pattern in (from_import_single_regex, from_import_multi_regex):
256+        searchpos = 0
257+        while True:
258+            match = pattern.search(import_section, searchpos)
259+            if match is None:
260+                break
261+            import_objects = split_regex.split(
262+                match.group('objects').strip(" \n,"))
263+            module = match.group('module').strip()
264+            # Only one pattern has a 'comment' group.
265+            comment = match.groupdict().get('comment', None)
266+            if module in imports:
267+                # Catch double import lines.
268+                imports[module].addObjects(import_objects)
269+            else:
270+                imports[module] = ImportStatement(import_objects)
271+            if comment is not None:
272+                imports[module].setComment(comment)
273+            searchpos = match.end()
274+    # Search for comments in import section.
275+    searchpos = 0
276+    while True:
277+        match = comment_regex.search(import_section, searchpos)
278+        if match is None:
279+            break
280+        module = match.group('module').strip()
281+        comment = match.group('comment').strip()
282+        imports[module].setComment(comment)
283+        searchpos = match.end()
284+
285+    return imports
286+
287+
288+def format_imports(imports):
289+    """Group and order imports, return the new import statements."""
290+    standard_section = {}
291+    first_section = {}
292+    thirdparty_section = {}
293+    local_section = {}
294+    # Group modules into sections.
295+    for module, statement in imports.iteritems():
296+        module_base = module.split('.')[0]
297+        comment = statement.comment
298+        if comment is not None and comment.startswith("# FIRST"):
299+            first_section[module] = statement
300+        elif module_base in ('canonical', 'lp'):
301+            local_section[module] = statement
302+        elif module_base in python_standard_libs:
303+            standard_section[module] = statement
304+        else:
305+            thirdparty_section[module] = statement
306+
307+    all_import_lines = []
308+    # Sort within each section and generate statement strings.
309+    sections = (
310+        standard_section,
311+        first_section,
312+        thirdparty_section,
313+        local_section,
314+        )
315+    for section in sections:
316+        import_lines = []
317+        for module in sorted(section.keys(), key=str.lower):
318+            if section[module].comment is not None:
319+                import_lines.append(section[module].comment)
320+            if section[module].import_module:
321+                import_lines.append("import %s" % module)
322+            if section[module].objects is not None:
323+                import_lines.append(
324+                    format_import_lines(module, section[module].objects))
325+        if len(import_lines) > 0:
326+            all_import_lines.append('\n'.join(import_lines))
327+    # Sections are separated by one blank line.
328+    return '\n\n'.join(all_import_lines)
329+
330+
331+def reformat_importsection(filename):
332+    """Replace the given file with a reformatted version of it."""
333+    pyfile = file(filename).read()
334+    import_start, import_end = find_imports_section(pyfile)
335+    if import_start is None:
336+        # Skip files with no import section.
337+        return False
338+    imports_section = pyfile[import_start:import_end]
339+    imports = parse_import_statements(imports_section)
340+
341+    if pyfile[import_end:import_end+1] != '#':
342+        # Two newlines before anything but comments.
343+        number_of_newlines = 3
344+    else:
345+        number_of_newlines = 2
346+
347+    new_imports = format_imports(imports)+"\n"*number_of_newlines
348+    if new_imports == imports_section:
349+        # No change, no need to write a new file.
350+        return False
351+
352+    new_file = open(filename, "w")
353+    new_file.write(pyfile[:import_start])
354+    new_file.write(new_imports)
355+    new_file.write(pyfile[import_end:])
356+
357+    return True
358+
359+
360+def process_file(fpath):
361+    """Process the file with the given path."""
362+    changed = reformat_importsection(fpath)
363+    if changed:
364+        print fpath
365+
366+
367+def process_tree(dpath):
368+    """Walk a directory tree and process all *.py files."""
369+    for dirpath, dirnames, filenames in os.walk(dpath):
370+        for filename in filenames:
371+            if filename.endswith('.py'):
372+                process_file(os.path.join(dirpath, filename))
373+
374+
375+if __name__ == "__main__":
376+    if len(sys.argv) == 1 or sys.argv[1] in ("-h", "-?", "--help"):
377+        sys.stderr.write(dedent("""\
378+        usage: format-imports <file or directory> ...
379+
380+        Type "format-imports --docstring | less" to see the documentation.
381+        """))
382+        sys.exit(1)
383+    if sys.argv[1] == "--docstring":
384+        sys.stdout.write(__doc__)
385+        sys.exit(2)
386+    for filename in sys.argv[1:]:
387+        if os.path.isdir(filename):
388+            process_tree(filename)
389+        else:
390+            process_file(filename)
391+    sys.exit(0)
392
393=== added file 'utilities/python_standard_libs.py'
394--- utilities/python_standard_libs.py 1970-01-01 00:00:00 +0000
395+++ utilities/python_standard_libs.py 2010-08-27 20:21:03 +0000
396@@ -0,0 +1,240 @@
397+# Copyright 2010 Canonical Ltd. This software is licensed under the
398+# GNU Affero General Public License version 3 (see the file LICENSE).
399+
400+""" A list of top-level standard python library names.
401+
402+This list is used by format-imports to determine if a module is in this group
403+or not.
404+The list is taken from http://docs.python.org/release/2.5.4/lib/modindex.html
405+but modules specific to other OSs have been taken out. It may need to be
406+updated from time to time.
407+"""
408+
409+python_standard_libs = [
410+ 'aifc',
411+ 'anydbm',
412+ 'array',
413+ 'asynchat',
414+ 'asyncore',
415+ 'atexit',
416+ 'audioop',
417+ 'base64',
418+ 'BaseHTTPServer',
419+ 'Bastion',
420+ 'binascii',
421+ 'binhex',
422+ 'bisect',
423+ 'bsddb',
424+ 'bz2',
425+ 'calendar',
426+ 'cgi',
427+ 'CGIHTTPServer',
428+ 'cgitb',
429+ 'chunk',
430+ 'cmath',
431+ 'cmd',
432+ 'code',
433+ 'codecs',
434+ 'codeop',
435+ 'collections',
436+ 'colorsys',
437+ 'commands',
438+ 'compileall',
439+ 'compiler',
440+ 'ConfigParser',
441+ 'contextlib',
442+ 'Cookie',
443+ 'cookielib',
444+ 'copy',
445+ 'copy_reg',
446+ 'cPickle',
447+ 'cProfile',
448+ 'crypt',
449+ 'cStringIO',
450+ 'csv',
451+ 'ctypes',
452+ 'curses',
453+ 'datetime',
454+ 'dbhash',
455+ 'dbm',
456+ 'decimal',
457+ 'difflib',
458+ 'dircache',
459+ 'dis',
460+ 'distutils',
461+ 'dl',
462+ 'doctest',
463+ 'DocXMLRPCServer',
464+ 'dumbdbm',
465+ 'dummy_thread',
466+ 'dummy_threading',
467+ 'email',
468+ 'encodings',
469+ 'errno',
470+ 'exceptions',
471+ 'fcntl',
472+ 'filecmp',
473+ 'fileinput',
474+ 'fnmatch',
475+ 'formatter',
476+ 'fpectl',
477+ 'fpformat',
478+ 'ftplib',
479+ 'functools',
480+ 'gc',
481+ 'gdbm',
482+ 'getopt',
483+ 'getpass',
484+ 'gettext',
485+ 'glob',
486+ 'gopherlib',
487+ 'grp',
488+ 'gzip',
489+ 'hashlib',
490+ 'heapq',
491+ 'hmac',
492+ 'hotshot',
493+ 'htmlentitydefs',
494+ 'htmllib',
495+ 'HTMLParser',
496+ 'httplib',
497+ 'imageop',
498+ 'imaplib',
499+ 'imghdr',
500+ 'imp',
501+ 'inspect',
502+ 'itertools',
503+ 'keyword',
504+ 'linecache',
505+ 'locale',
506+ 'logging',
507+ 'mailbox',
508+ 'mailcap',
509+ 'marshal',
510+ 'math',
511+ 'md5',
512+ 'mhlib',
513+ 'mimetools',
514+ 'mimetypes',
515+ 'MimeWriter',
516+ 'mimify',
517+ 'mmap',
518+ 'modulefinder',
519+ 'multifile',
520+ 'mutex',
521+ 'netrc',
522+ 'new',
523+ 'nis',
524+ 'nntplib',
525+ 'operator',
526+ 'optparse',
527+ 'os',
528+ 'ossaudiodev',
529+ 'parser',
530+ 'pdb',
531+ 'pickle',
532+ 'pickletools',
533+ 'pipes',
534+ 'pkgutil',
535+ 'platform',
536+ 'popen2',
537+ 'poplib',
538+ 'posix',
539+ 'posixfile',
540+ 'pprint',
541+ 'profile',
542+ 'pstats',
543+ 'pty',
544+ 'pwd',
545+ 'py_compile',
546+ 'pyclbr',
547+ 'pydoc',
548+ 'Queue',
549+ 'quopri',
550+ 'random',
551+ 're',
552+ 'readline',
553+ 'repr',
554+ 'resource',
555+ 'rexec',
556+ 'rfc822',
557+ 'rgbimg',
558+ 'rlcompleter',
559+ 'robotparser',
560+ 'runpy',
561+ 'sched',
562+ 'ScrolledText',
563+ 'select',
564+ 'sets',
565+ 'sgmllib',
566+ 'sha',
567+ 'shelve',
568+ 'shlex',
569+ 'shutil',
570+ 'signal',
571+ 'SimpleHTTPServer',
572+ 'SimpleXMLRPCServer',
573+ 'site',
574+ 'smtpd',
575+ 'smtplib',
576+ 'sndhdr',
577+ 'socket',
578+ 'SocketServer',
579+ 'spwd',
580+ 'sqlite3',
581+ 'stat',
582+ 'statvfs',
583+ 'string',
584+ 'StringIO',
585+ 'stringprep',
586+ 'struct',
587+ 'subprocess',
588+ 'sunau',
589+ 'symbol',
590+ 'sys',
591+ 'syslog',
592+ 'tabnanny',
593+ 'tarfile',
594+ 'telnetlib',
595+ 'tempfile',
596+ 'termios',
597+ 'test.test_support',
598+ 'test',
599+ 'textwrap',
600+ 'thread',
601+ 'threading',
602+ 'time',
603+ 'timeit',
604+ 'Tix',
605+ 'Tkinter',
606+ 'token',
607+ 'tokenize',
608+ 'trace',
609+ 'traceback',
610+ 'tty',
611+ 'turtle',
612+ 'types',
613+ 'unicodedata',
614+ 'unittest',
615+ 'urllib2',
616+ 'urllib',
617+ 'urlparse',
618+ 'user',
619+ 'UserDict',
620+ 'UserList',
621+ 'UserString',
622+ 'uu',
623+ 'uuid',
624+ 'warnings',
625+ 'wave',
626+ 'weakref',
627+ 'webbrowser',
628+ 'whichdb',
629+ 'wsgiref',
630+ 'xdrlib',
631+ 'xml',
632+ 'xmlrpclib',
633+ 'zipfile',
634+ 'zipimport',
635+ 'zlib',
636+ ]
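
For readers who want to try the grouping and list-style rules from the format-imports docstring in isolation, here is a small sketch. It is not part of the branch: classify_module and format_from_import are names invented for this example, the default standard_libs tuple is a tiny stand-in for python_standard_libs, and the four-space indentation inside the generated statement is an assumption.

```python
def classify_module(module, comment=None, standard_libs=("os", "re", "sys")):
    """Return the name of the subsection a module would be sorted into.

    The order of the checks follows format_imports: a "# FIRST" comment
    wins, then local modules, then the standard library, then the rest.
    """
    module_base = module.split(".")[0]
    if comment is not None and comment.startswith("# FIRST"):
        return "first"
    if module_base in ("canonical", "lp"):
        return "local"
    if module_base in standard_libs:
        return "standard"
    return "thirdparty"


def format_from_import(module, objects, max_length=78):
    """Render one from...import statement in the style described above."""
    # Drop duplicates, then sort case-insensitively.
    objects = sorted(set(objects), key=str.lower)
    if len(objects) == 1:
        statement = "from %s import %s" % (module, objects[0])
        # One-object imports stay on one line unless they are too long.
        if len(statement) <= max_length:
            return statement
    # Assumption: objects are indented by four spaces in list style.
    return "from %s import (\n    %s,\n    )" % (
        module, ",\n    ".join(objects))
```

Because only the part of the module name before the first dot is looked up, an entry such as 'test.test_support' in the list can never match on its own; the 'test' entry already covers it.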