Merge lp:~jameinel/bzr/2.1.0b3-win32-shell-completion into lp:bzr

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.1.0b3-win32-shell-completion
Merge into: lp:bzr
Diff against target: 410 lines
5 files modified
NEWS (+9/-0)
bzrlib/builtins.py (+0/-1)
bzrlib/commands.py (+6/-5)
bzrlib/tests/test_win32utils.py (+92/-2)
bzrlib/win32utils.py (+152/-27)
To merge this branch: bzr merge lp:~jameinel/bzr/2.1.0b3-win32-shell-completion
Reviewer Review Type Date Requested Status
David Roberts (community) Needs Information
Alexander Belchenko Needs Fixing
Martin Packman (community) Abstain
Robert Collins (community) Approve
Review via email: mp+14445@code.launchpad.net
To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

We've had a lot of proposals that sort of piece-meal implement support for wildcards on Windows. However, a while back I realized we could just implement global wildcard expansion, because we have access to the raw Unicode string that was used to start this process.

At the time Alexander was skeptical because we couldn't just use 'shlex.shlex' because it didn't handle unicode, and it always removed '\' as an escape character. Which doesn't work well if you supply "bzr status C:\foo\bar\baz".

However, I'm happy to note that shlex.shlex is actually not that complex if you get rid of all of its extra flags, etc. It is a rather simple state parser. I certainly don't think we need the ability to support injecting extra command lines into the middle of parsing, or anything else crazy.

I think I have reasonable tests for the actual parsing functionality, and I've tested things manually to make sure it works.

This means that all bzr commands will now get filename globbing, akin to 'bash' style globbing. So you can do: "bzr st *.h" or "bzr commit *.c", or "bzr mv *.h foo/"

It does the bash behavior that if "*.h" doesn't match anything it uses a literal '*.h'. We *could* error on those circumstances, though this seemed fine.

It also means that we get "bzr ignore '*.h'" and "bzr commit -m 'single quoted strings are supported.'".

This is a potential point of breakage for Windows users, so I'd like to get more people testing it than me. (I still call 'win32' bzr from a cygwin shell, so I rarely will see the glob support actually activate. And it probably will break 'bzr ignore "*.h"' for me, because bash will strip the "" and then Win32 won't see the "" to skip expanding the glob. However, I rarely do that, so I'll survive.)

Revision history for this message
Robert Collins (lifeless) wrote :

 review: +1

awesome!

Code looks good to me

-Rob

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

I am unconviced that it's a good idea to try and make cmd behvave more like bash. Nix refugees tend to use cygwin or something like John does, and some of these behaviours will be suprising to anyone used to windows.

This particular change isn't so bad, as programs tend to glob inconsistently anyway, but I don't like the addition of single quotes as escape characters.

C:\bzr\bzr\2.1.0b3-win32-shell-completion>echo Be nice; be mean > dos_and_don'ts.txt

C:\bzr\bzr\2.1.0b3-win32-shell-completion>python bzr add dos_and_don'ts.txt
bzr: ERROR: No such file: u'C:/bzr/bzr/2.1.0b3-win32-shell-completion/dos_and_donts.txt'

review: Abstain
Revision history for this message
Alexander Belchenko (bialix) wrote :

I've late here but I want to say my word.

John, I agree with Martin that using single quotes is bad idea. I hope you decided to remove them. All Windows people used to double quotes so single quotes is not Windows way.

Also I think will be nice to have a way to disable this glob expansion via environment variable, e.g. BZR_GLOB=0 or BZR_GLOB=no. Using configs there is wrong idea, but os.environ should be OK and fast.

review: Needs Fixing
Revision history for this message
David Roberts (smartgpx) wrote :

Is it possible for an end-user who runs from a *setup.exe binary installer on a physical MS WinXP system to test this?

(Since I don't run from a branch it would seem that simply merging the changes is not an option? Would it be valid to 'bzr init' the installed directory tree and then switch to John's branch to update or merge the changes?)

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Roberts wrote:
> Review: Needs Information
> Is it possible for an end-user who runs from a *setup.exe binary installer on a physical MS WinXP system to test this?
>
> (Since I don't run from a branch it would seem that simply merging the changes is not an option? Would it be valid to 'bzr init' the installed directory tree and then switch to John's branch to update or merge the changes?)
>

Well, you could use the setup.exe installer to download the latest
source, and then install a few things manually (python being the big one.)

Or you could wait about 1-2 days when 2.1.0b3 will be released :)

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkr4bYAACgkQJdeBCYSNAANx5QCgizrpX2LrYqzP8gEadiXHVvb7
6mUAmgILGGHqEVn+0rCy7CuO4m5btdDW
=UH5U
-----END PGP SIGNATURE-----

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-11-04 09:52:44 +0000
3+++ NEWS 2009-11-04 22:45:22 +0000
4@@ -29,6 +29,15 @@
5 allow those because XML store silently translate it anyway. (The parser
6 auto-translates \r\n => \n in ways that are hard for us to catch.)
7
8+* On Windows, do glob expansion at the command-line level (as is usually
9+ done in bash, etc.) This means that *all* commands get glob expansion
10+ (bzr status, bzr add, bzr mv, etc). It uses a custom command line
11+ parser, which allows us to know if a given section was quoted. It means
12+ you can now do ``bzr ignore "*.py"``. It also means that single-quotes
13+ are now treated as quoted ``bzr ignore '*.py'``.
14+ (John Arbash Meinel, #425510, #426410, #194450)
15+
16+
17 Improvements
18 ************
19
20
21=== modified file 'bzrlib/builtins.py'
22--- bzrlib/builtins.py 2009-11-03 20:24:25 +0000
23+++ bzrlib/builtins.py 2009-11-04 22:45:22 +0000
24@@ -655,7 +655,6 @@
25 if base_tree:
26 base_tree.lock_read()
27 try:
28- file_list = self._maybe_expand_globs(file_list)
29 tree, file_list = tree_files_for_add(file_list)
30 added, ignored = tree.smart_add(file_list, not
31 no_recurse, action=action, save=not dry_run)
32
33=== modified file 'bzrlib/commands.py'
34--- bzrlib/commands.py 2009-10-14 20:02:28 +0000
35+++ bzrlib/commands.py 2009-11-04 22:45:22 +0000
36@@ -56,6 +56,7 @@
37 from bzrlib.symbol_versioning import (
38 deprecated_function,
39 deprecated_in,
40+ deprecated_method,
41 suppress_deprecation_warnings,
42 )
43
44@@ -383,18 +384,18 @@
45 # List of standard options directly supported
46 self.supported_std_options = []
47
48+ @deprecated_method(deprecated_in((2, 1, 0)))
49 def _maybe_expand_globs(self, file_list):
50 """Glob expand file_list if the platform does not do that itself.
51
52+ Not used anymore, now that the bzr command-line parser globs on
53+ Windows.
54+
55 :return: A possibly empty list of unicode paths.
56
57 Introduced in bzrlib 0.18.
58 """
59- if not file_list:
60- file_list = []
61- if sys.platform == 'win32':
62- file_list = win32utils.glob_expand(file_list)
63- return list(file_list)
64+ return file_list
65
66 def _usage(self):
67 """Return single-line grammar for this command.
68
69=== modified file 'bzrlib/tests/test_win32utils.py'
70--- bzrlib/tests/test_win32utils.py 2009-07-03 14:26:34 +0000
71+++ bzrlib/tests/test_win32utils.py 2009-11-04 22:45:22 +0000
72@@ -17,7 +17,11 @@
73 import os
74 import sys
75
76-from bzrlib import osutils
77+from bzrlib import (
78+ osutils,
79+ tests,
80+ win32utils,
81+ )
82 from bzrlib.tests import (
83 Feature,
84 TestCase,
85@@ -26,7 +30,6 @@
86 UnicodeFilenameFeature,
87 )
88 from bzrlib.win32utils import glob_expand, get_app_path
89-from bzrlib import win32utils
90
91
92 # Features
93@@ -261,3 +264,90 @@
94 os.makedirs(u'\u1234\\.bzr')
95 path = osutils.abspath(u'\u1234\\.bzr')
96 win32utils.set_file_attr_hidden(path)
97+
98+
99+
100+class TestUnicodeShlex(tests.TestCase):
101+
102+ def assertAsTokens(self, expected, line):
103+ s = win32utils.UnicodeShlex(line)
104+ self.assertEqual(expected, list(s))
105+
106+ def test_simple(self):
107+ self.assertAsTokens([(False, u'foo'), (False, u'bar'), (False, u'baz')],
108+ u'foo bar baz')
109+
110+ def test_ignore_multiple_spaces(self):
111+ self.assertAsTokens([(False, u'foo'), (False, u'bar')], u'foo bar')
112+
113+ def test_ignore_leading_space(self):
114+ self.assertAsTokens([(False, u'foo'), (False, u'bar')], u' foo bar')
115+
116+ def test_ignore_trailing_space(self):
117+ self.assertAsTokens([(False, u'foo'), (False, u'bar')], u'foo bar ')
118+
119+ def test_posix_quotations(self):
120+ self.assertAsTokens([(True, u'foo bar')], u'"foo bar"')
121+ self.assertAsTokens([(True, u'foo bar')], u"'foo bar'")
122+ self.assertAsTokens([(True, u'foo bar')], u"'fo''o b''ar'")
123+ self.assertAsTokens([(True, u'foo bar')], u'"fo""o b""ar"')
124+ self.assertAsTokens([(True, u'foo bar')], u'"fo"\'o b\'"ar"')
125+
126+ def test_nested_quotations(self):
127+ self.assertAsTokens([(True, u'foo"" bar')], u"'foo\"\" bar'")
128+ self.assertAsTokens([(True, u'foo\'\' bar')], u"\"foo'' bar\"")
129+
130+ def test_empty_result(self):
131+ self.assertAsTokens([], u'')
132+ self.assertAsTokens([], u' ')
133+
134+ def test_quoted_empty(self):
135+ self.assertAsTokens([(True, '')], u'""')
136+ self.assertAsTokens([(True, '')], u"''")
137+
138+ def test_unicode_chars(self):
139+ self.assertAsTokens([(False, u'f\xb5\xee'), (False, u'\u1234\u3456')],
140+ u'f\xb5\xee \u1234\u3456')
141+
142+ def test_newline_in_quoted_section(self):
143+ self.assertAsTokens([(True, u'foo\nbar\nbaz\n')], u'"foo\nbar\nbaz\n"')
144+ self.assertAsTokens([(True, u'foo\nbar\nbaz\n')], u"'foo\nbar\nbaz\n'")
145+
146+ def test_escape_chars(self):
147+ self.assertAsTokens([(False, u'foo\\bar')], u'foo\\bar')
148+
149+ def test_escape_quote(self):
150+ self.assertAsTokens([(True, u'foo"bar')], u'"foo\\"bar"')
151+ self.assertAsTokens([(True, u'foo\\"bar')], u"'foo\\\"bar'")
152+
153+ def test_double_escape(self):
154+ self.assertAsTokens([(True, u'foo\\bar')], u'"foo\\\\bar"')
155+ self.assertAsTokens([(True, u'foo\\\\bar')], u"'foo\\\\bar'")
156+ self.assertAsTokens([(False, u'foo\\\\bar')], u"foo\\\\bar")
157+
158+
159+class Test_CommandLineToArgv(tests.TestCaseInTempDir):
160+
161+ def assertCommandLine(self, expected, line):
162+ self.assertEqual(expected, win32utils._command_line_to_argv(line))
163+
164+ def test_glob_paths(self):
165+ self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h'])
166+ self.assertCommandLine([u'a/b.c', u'a/c.c'], 'a/*.c')
167+ self.build_tree(['b/', 'b/b.c', 'b/d.c', 'b/d.h'])
168+ self.assertCommandLine([u'a/b.c', u'b/b.c'], '*/b.c')
169+ self.assertCommandLine([u'a/b.c', u'a/c.c', u'b/b.c', u'b/d.c'],
170+ '*/*.c')
171+ # Bash style, just pass through the argument if nothing matches
172+ self.assertCommandLine([u'*/*.qqq'], '*/*.qqq')
173+
174+ def test_quoted_globs(self):
175+ self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h'])
176+ self.assertCommandLine([u'a/*.c'], '"a/*.c"')
177+ self.assertCommandLine([u'a/*.c'], "'a/*.c'")
178+
179+ def test_slashes_changed(self):
180+ self.assertCommandLine([u'a/*.c'], '"a\\*.c"')
181+ # Expands the glob, but nothing matches
182+ self.assertCommandLine([u'a/*.c'], 'a\\*.c')
183+ self.assertCommandLine([u'a/foo.c'], 'a\\foo.c')
184
185=== modified file 'bzrlib/win32utils.py'
186--- bzrlib/win32utils.py 2009-07-08 14:37:25 +0000
187+++ bzrlib/win32utils.py 2009-11-04 22:45:22 +0000
188@@ -19,8 +19,12 @@
189 Only one dependency: ctypes should be installed.
190 """
191
192+import glob
193 import os
194+import re
195+import shlex
196 import struct
197+import StringIO
198 import sys
199
200
201@@ -422,6 +426,26 @@
202
203
204
205+def glob_one(possible_glob):
206+ """Same as glob.glob().
207+
208+ work around bugs in glob.glob()
209+ - Python bug #1001604 ("glob doesn't return unicode with ...")
210+ - failing expansion for */* with non-iso-8859-* chars
211+ """
212+ corrected_glob, corrected = _ensure_with_dir(possible_glob)
213+ glob_files = glob.glob(corrected_glob)
214+
215+ if not glob_files:
216+ # special case to let the normal code path handle
217+ # files that do not exist, etc.
218+ glob_files = [possible_glob]
219+ elif corrected:
220+ glob_files = [_undo_ensure_with_dir(elem, corrected)
221+ for elem in glob_files]
222+ return [elem.replace(u'\\', u'/') for elem in glob_files]
223+
224+
225 def glob_expand(file_list):
226 """Replacement for glob expansion by the shell.
227
228@@ -435,25 +459,10 @@
229 """
230 if not file_list:
231 return []
232- import glob
233 expanded_file_list = []
234 for possible_glob in file_list:
235- # work around bugs in glob.glob()
236- # - Python bug #1001604 ("glob doesn't return unicode with ...")
237- # - failing expansion for */* with non-iso-8859-* chars
238- possible_glob, corrected = _ensure_with_dir(possible_glob)
239- glob_files = glob.glob(possible_glob)
240-
241- if glob_files == []:
242- # special case to let the normal code path handle
243- # files that do not exists
244- expanded_file_list.append(
245- _undo_ensure_with_dir(possible_glob, corrected))
246- else:
247- glob_files = [_undo_ensure_with_dir(elem, corrected) for elem in glob_files]
248- expanded_file_list += glob_files
249-
250- return [elem.replace(u'\\', u'/') for elem in expanded_file_list]
251+ expanded_file_list.extend(glob_one(possible_glob))
252+ return expanded_file_list
253
254
255 def get_app_path(appname):
256@@ -511,6 +520,124 @@
257 trace.mutter('Unable to set hidden attribute on %r: %s', path, e)
258
259
260+
261+class UnicodeShlex(object):
262+ """This is a very simplified version of shlex.shlex.
263+
264+ The main change is that it supports non-ascii input streams. The internal
265+ structure is quite simplified relative to shlex.shlex, since we aren't
266+ trying to handle multiple input streams, etc. In fact, we don't use a
267+ file-like api either.
268+ """
269+
270+ def __init__(self, uni_string):
271+ self._input = uni_string
272+ self._input_iter = iter(self._input)
273+ self._whitespace_match = re.compile(u'\s').match
274+ self._word_match = re.compile(u'\S').match
275+ self._quote_chars = u'\'"'
276+ # self._quote_match = re.compile(u'[\'"]').match
277+ self._escape_match = lambda x: None # Never matches
278+ self._escape = '\\'
279+ # State can be
280+ # ' ' - after whitespace, starting a new token
281+ # 'a' - after text, currently working on a token
282+ # '"' - after ", currently in a "-delimited quoted section
283+ # "'" - after ', currently in a '-delimited quotod section
284+ # "\" - after '\', checking the next char
285+ self._state = ' '
286+ self._token = [] # Current token being parsed
287+
288+ def _get_token(self):
289+ # Were there quote chars as part of this token?
290+ quoted = False
291+ quoted_state = None
292+ for nextchar in self._input_iter:
293+ if self._state == ' ':
294+ if self._whitespace_match(nextchar):
295+ # if self._token: return token
296+ continue
297+ elif nextchar in self._quote_chars:
298+ self._state = nextchar # quoted state
299+ elif self._word_match(nextchar):
300+ self._token.append(nextchar)
301+ self._state = 'a'
302+ else:
303+ raise AssertionError('wtttf?')
304+ elif self._state in self._quote_chars:
305+ quoted = True
306+ if nextchar == self._state: # End of quote
307+ self._state = 'a' # posix allows 'foo'bar to translate to
308+ # foobar
309+ elif self._state == '"' and nextchar == self._escape:
310+ quoted_state = self._state
311+ self._state = nextchar
312+ else:
313+ self._token.append(nextchar)
314+ elif self._state == self._escape:
315+ if nextchar == '\\':
316+ self._token.append('\\')
317+ elif nextchar == '"':
318+ self._token.append(nextchar)
319+ else:
320+ self._token.append('\\' + nextchar)
321+ self._state = quoted_state
322+ elif self._state == 'a':
323+ if self._whitespace_match(nextchar):
324+ if self._token:
325+ break # emit this token
326+ else:
327+ continue # no token to emit
328+ elif nextchar in self._quote_chars:
329+ # Start a new quoted section
330+ self._state = nextchar
331+ # escape?
332+ elif (self._word_match(nextchar)
333+ or nextchar in self._quote_chars
334+ # or whitespace_split?
335+ ):
336+ self._token.append(nextchar)
337+ else:
338+ raise AssertionError('state == "a", char: %r'
339+ % (nextchar,))
340+ else:
341+ raise AssertionError('unknown state: %r' % (self._state,))
342+ result = ''.join(self._token)
343+ self._token = []
344+ if not quoted and result == '':
345+ result = None
346+ return quoted, result
347+
348+ def __iter__(self):
349+ return self
350+
351+ def next(self):
352+ quoted, token = self._get_token()
353+ if token is None:
354+ raise StopIteration
355+ return quoted, token
356+
357+
358+def _command_line_to_argv(command_line):
359+ """Convert a Unicode command line into a set of argv arguments.
360+
361+ This does wildcard expansion, etc. It is intended to make wildcards act
362+ closer to how they work in posix shells, versus how they work by default on
363+ Windows.
364+ """
365+ s = UnicodeShlex(command_line)
366+ # Now that we've split the content, expand globs
367+ # TODO: Use 'globbing' instead of 'glob.glob', this gives us stuff like
368+ # '**/' style globs
369+ args = []
370+ for is_quoted, arg in s:
371+ if is_quoted or not glob.has_magic(arg):
372+ args.append(arg.replace(u'\\', u'/'))
373+ else:
374+ args.extend(glob_one(arg))
375+ return args
376+
377+
378 if has_ctypes and winver != 'Windows 98':
379 def get_unicode_argv():
380 LPCWSTR = ctypes.c_wchar_p
381@@ -520,21 +647,19 @@
382 GetCommandLine = prototype(("GetCommandLineW",
383 ctypes.windll.kernel32))
384 prototype = ctypes.WINFUNCTYPE(POINTER(LPCWSTR), LPCWSTR, POINTER(INT))
385- CommandLineToArgv = prototype(("CommandLineToArgvW",
386- ctypes.windll.shell32))
387- c = INT(0)
388- pargv = CommandLineToArgv(GetCommandLine(), ctypes.byref(c))
389+ command_line = GetCommandLine()
390 # Skip the first argument, since we only care about parameters
391- argv = [pargv[i] for i in range(1, c.value)]
392+ argv = _command_line_to_argv(GetCommandLine())[1:]
393 if getattr(sys, 'frozen', None) is None:
394 # Invoked via 'python.exe' which takes the form:
395 # python.exe [PYTHON_OPTIONS] C:\Path\bzr [BZR_OPTIONS]
396 # we need to get only BZR_OPTIONS part,
397- # so let's using sys.argv[1:] as reference to get the tail
398- # of unicode argv
399- tail_len = len(sys.argv[1:])
400- ix = len(argv) - tail_len
401- argv = argv[ix:]
402+ # We already removed 'python.exe' so we remove everything up to and
403+ # including the first non-option ('-') argument.
404+ for idx in xrange(len(argv)):
405+ if argv[idx][:1] != '-':
406+ break
407+ argv = argv[idx+1:]
408 return argv
409 else:
410 get_unicode_argv = None