Merge lp:~jameinel/bzr/2.1.0b3-win32-shell-completion into lp:bzr

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.1.0b3-win32-shell-completion
Merge into: lp:bzr
Diff against target: 410 lines
5 files modified
NEWS (+9/-0)
bzrlib/builtins.py (+0/-1)
bzrlib/commands.py (+6/-5)
bzrlib/tests/test_win32utils.py (+92/-2)
bzrlib/win32utils.py (+152/-27)
To merge this branch: bzr merge lp:~jameinel/bzr/2.1.0b3-win32-shell-completion
Reviewer Review Type Date Requested Status
David Roberts (community) Needs Information
Alexander Belchenko Needs Fixing
Martin Packman (community) Abstain
Robert Collins (community) Approve
Review via email: mp+14445@code.launchpad.net
To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

We've had a lot of proposals that sort of piece-meal implement support for wildcards on Windows. However, a while back I realized we could just implement global wildcard expansion, because we have access to the raw Unicode string that was used to start this process.

At the time Alexander was skeptical because we couldn't just use 'shlex.shlex' because it didn't handle unicode, and it always removed '\' as an escape character. Which doesn't work well if you supply "bzr status C:\foo\bar\baz".

However, I'm happy to note that shlex.shlex is actually not that complex if you get rid of all of its extra flags, etc. It is a rather simple state parser. I certainly don't think we need the ability to support injecting extra command lines into the middle of parsing, or anything else crazy.

I think I have reasonable tests for the actual parsing functionality, and I've tested things manually to make sure it works.

This means that all bzr commands will now get filename globbing, akin to 'bash' style globbing. So you can do: "bzr st *.h" or "bzr commit *.c", or "bzr mv *.h foo/"

It does the bash behavior that if "*.h" doesn't match anything it uses a literal '*.h'. We *could* error on those circumstances, though this seemed fine.

It also means that we get "bzr ignore '*.h'" and "bzr commit -m 'single quoted strings are supported.'".

This is a potential point of breakage for Windows users, so I'd like to get more people testing it than me. (I still call 'win32' bzr from a cygwin shell, so I rarely will see the glob support actually activate. And it probably will break 'bzr ignore "*.h"' for me, because bash will strip the "" and then Win32 won't see the "" to skip expanding the glob. However, I rarely do that, so I'll survive.)

Revision history for this message
Robert Collins (lifeless) wrote :

 review: +1

awesome!

Code looks good to me

-Rob

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

I am unconviced that it's a good idea to try and make cmd behvave more like bash. Nix refugees tend to use cygwin or something like John does, and some of these behaviours will be suprising to anyone used to windows.

This particular change isn't so bad, as programs tend to glob inconsistently anyway, but I don't like the addition of single quotes as escape characters.

C:\bzr\bzr\2.1.0b3-win32-shell-completion>echo Be nice; be mean > dos_and_don'ts.txt

C:\bzr\bzr\2.1.0b3-win32-shell-completion>python bzr add dos_and_don'ts.txt
bzr: ERROR: No such file: u'C:/bzr/bzr/2.1.0b3-win32-shell-completion/dos_and_donts.txt'

review: Abstain
Revision history for this message
Alexander Belchenko (bialix) wrote :

I've late here but I want to say my word.

John, I agree with Martin that using single quotes is bad idea. I hope you decided to remove them. All Windows people used to double quotes so single quotes is not Windows way.

Also I think will be nice to have a way to disable this glob expansion via environment variable, e.g. BZR_GLOB=0 or BZR_GLOB=no. Using configs there is wrong idea, but os.environ should be OK and fast.

review: Needs Fixing
Revision history for this message
David Roberts (smartgpx) wrote :

Is it possible for an end-user who runs from a *setup.exe binary installer on a physical MS WinXP system to test this?

(Since I don't run from a branch it would seem that simply merging the changes is not an option? Would it be valid to 'bzr init' the installed directory tree and then switch to John's branch to update or merge the changes?)

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Roberts wrote:
> Review: Needs Information
> Is it possible for an end-user who runs from a *setup.exe binary installer on a physical MS WinXP system to test this?
>
> (Since I don't run from a branch it would seem that simply merging the changes is not an option? Would it be valid to 'bzr init' the installed directory tree and then switch to John's branch to update or merge the changes?)
>

Well, you could use the setup.exe installer to download the latest
source, and then install a few things manually (python being the big one.)

Or you could wait about 1-2 days when 2.1.0b3 will be released :)

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkr4bYAACgkQJdeBCYSNAANx5QCgizrpX2LrYqzP8gEadiXHVvb7
6mUAmgILGGHqEVn+0rCy7CuO4m5btdDW
=UH5U
-----END PGP SIGNATURE-----

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-11-04 09:52:44 +0000
+++ NEWS 2009-11-04 22:45:22 +0000
@@ -29,6 +29,15 @@
29 allow those because XML store silently translate it anyway. (The parser29 allow those because XML store silently translate it anyway. (The parser
30 auto-translates \r\n => \n in ways that are hard for us to catch.)30 auto-translates \r\n => \n in ways that are hard for us to catch.)
3131
32* On Windows, do glob expansion at the command-line level (as is usually
33 done in bash, etc.) This means that *all* commands get glob expansion
34 (bzr status, bzr add, bzr mv, etc). It uses a custom command line
35 parser, which allows us to know if a given section was quoted. It means
36 you can now do ``bzr ignore "*.py"``. It also means that single-quotes
37 are now treated as quoted ``bzr ignore '*.py'``.
38 (John Arbash Meinel, #425510, #426410, #194450)
39
40
32Improvements41Improvements
33************42************
3443
3544
=== modified file 'bzrlib/builtins.py'
--- bzrlib/builtins.py 2009-11-03 20:24:25 +0000
+++ bzrlib/builtins.py 2009-11-04 22:45:22 +0000
@@ -655,7 +655,6 @@
655 if base_tree:655 if base_tree:
656 base_tree.lock_read()656 base_tree.lock_read()
657 try:657 try:
658 file_list = self._maybe_expand_globs(file_list)
659 tree, file_list = tree_files_for_add(file_list)658 tree, file_list = tree_files_for_add(file_list)
660 added, ignored = tree.smart_add(file_list, not659 added, ignored = tree.smart_add(file_list, not
661 no_recurse, action=action, save=not dry_run)660 no_recurse, action=action, save=not dry_run)
662661
=== modified file 'bzrlib/commands.py'
--- bzrlib/commands.py 2009-10-14 20:02:28 +0000
+++ bzrlib/commands.py 2009-11-04 22:45:22 +0000
@@ -56,6 +56,7 @@
56from bzrlib.symbol_versioning import (56from bzrlib.symbol_versioning import (
57 deprecated_function,57 deprecated_function,
58 deprecated_in,58 deprecated_in,
59 deprecated_method,
59 suppress_deprecation_warnings,60 suppress_deprecation_warnings,
60 )61 )
6162
@@ -383,18 +384,18 @@
383 # List of standard options directly supported384 # List of standard options directly supported
384 self.supported_std_options = []385 self.supported_std_options = []
385386
387 @deprecated_method(deprecated_in((2, 1, 0)))
386 def _maybe_expand_globs(self, file_list):388 def _maybe_expand_globs(self, file_list):
387 """Glob expand file_list if the platform does not do that itself.389 """Glob expand file_list if the platform does not do that itself.
388390
391 Not used anymore, now that the bzr command-line parser globs on
392 Windows.
393
389 :return: A possibly empty list of unicode paths.394 :return: A possibly empty list of unicode paths.
390395
391 Introduced in bzrlib 0.18.396 Introduced in bzrlib 0.18.
392 """397 """
393 if not file_list:398 return file_list
394 file_list = []
395 if sys.platform == 'win32':
396 file_list = win32utils.glob_expand(file_list)
397 return list(file_list)
398399
399 def _usage(self):400 def _usage(self):
400 """Return single-line grammar for this command.401 """Return single-line grammar for this command.
401402
=== modified file 'bzrlib/tests/test_win32utils.py'
--- bzrlib/tests/test_win32utils.py 2009-07-03 14:26:34 +0000
+++ bzrlib/tests/test_win32utils.py 2009-11-04 22:45:22 +0000
@@ -17,7 +17,11 @@
17import os17import os
18import sys18import sys
1919
20from bzrlib import osutils20from bzrlib import (
21 osutils,
22 tests,
23 win32utils,
24 )
21from bzrlib.tests import (25from bzrlib.tests import (
22 Feature,26 Feature,
23 TestCase,27 TestCase,
@@ -26,7 +30,6 @@
26 UnicodeFilenameFeature,30 UnicodeFilenameFeature,
27 )31 )
28from bzrlib.win32utils import glob_expand, get_app_path32from bzrlib.win32utils import glob_expand, get_app_path
29from bzrlib import win32utils
3033
3134
32# Features35# Features
@@ -261,3 +264,90 @@
261 os.makedirs(u'\u1234\\.bzr')264 os.makedirs(u'\u1234\\.bzr')
262 path = osutils.abspath(u'\u1234\\.bzr')265 path = osutils.abspath(u'\u1234\\.bzr')
263 win32utils.set_file_attr_hidden(path)266 win32utils.set_file_attr_hidden(path)
267
268
269
270class TestUnicodeShlex(tests.TestCase):
271
272 def assertAsTokens(self, expected, line):
273 s = win32utils.UnicodeShlex(line)
274 self.assertEqual(expected, list(s))
275
276 def test_simple(self):
277 self.assertAsTokens([(False, u'foo'), (False, u'bar'), (False, u'baz')],
278 u'foo bar baz')
279
280 def test_ignore_multiple_spaces(self):
281 self.assertAsTokens([(False, u'foo'), (False, u'bar')], u'foo bar')
282
283 def test_ignore_leading_space(self):
284 self.assertAsTokens([(False, u'foo'), (False, u'bar')], u' foo bar')
285
286 def test_ignore_trailing_space(self):
287 self.assertAsTokens([(False, u'foo'), (False, u'bar')], u'foo bar ')
288
289 def test_posix_quotations(self):
290 self.assertAsTokens([(True, u'foo bar')], u'"foo bar"')
291 self.assertAsTokens([(True, u'foo bar')], u"'foo bar'")
292 self.assertAsTokens([(True, u'foo bar')], u"'fo''o b''ar'")
293 self.assertAsTokens([(True, u'foo bar')], u'"fo""o b""ar"')
294 self.assertAsTokens([(True, u'foo bar')], u'"fo"\'o b\'"ar"')
295
296 def test_nested_quotations(self):
297 self.assertAsTokens([(True, u'foo"" bar')], u"'foo\"\" bar'")
298 self.assertAsTokens([(True, u'foo\'\' bar')], u"\"foo'' bar\"")
299
300 def test_empty_result(self):
301 self.assertAsTokens([], u'')
302 self.assertAsTokens([], u' ')
303
304 def test_quoted_empty(self):
305 self.assertAsTokens([(True, '')], u'""')
306 self.assertAsTokens([(True, '')], u"''")
307
308 def test_unicode_chars(self):
309 self.assertAsTokens([(False, u'f\xb5\xee'), (False, u'\u1234\u3456')],
310 u'f\xb5\xee \u1234\u3456')
311
312 def test_newline_in_quoted_section(self):
313 self.assertAsTokens([(True, u'foo\nbar\nbaz\n')], u'"foo\nbar\nbaz\n"')
314 self.assertAsTokens([(True, u'foo\nbar\nbaz\n')], u"'foo\nbar\nbaz\n'")
315
316 def test_escape_chars(self):
317 self.assertAsTokens([(False, u'foo\\bar')], u'foo\\bar')
318
319 def test_escape_quote(self):
320 self.assertAsTokens([(True, u'foo"bar')], u'"foo\\"bar"')
321 self.assertAsTokens([(True, u'foo\\"bar')], u"'foo\\\"bar'")
322
323 def test_double_escape(self):
324 self.assertAsTokens([(True, u'foo\\bar')], u'"foo\\\\bar"')
325 self.assertAsTokens([(True, u'foo\\\\bar')], u"'foo\\\\bar'")
326 self.assertAsTokens([(False, u'foo\\\\bar')], u"foo\\\\bar")
327
328
329class Test_CommandLineToArgv(tests.TestCaseInTempDir):
330
331 def assertCommandLine(self, expected, line):
332 self.assertEqual(expected, win32utils._command_line_to_argv(line))
333
334 def test_glob_paths(self):
335 self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h'])
336 self.assertCommandLine([u'a/b.c', u'a/c.c'], 'a/*.c')
337 self.build_tree(['b/', 'b/b.c', 'b/d.c', 'b/d.h'])
338 self.assertCommandLine([u'a/b.c', u'b/b.c'], '*/b.c')
339 self.assertCommandLine([u'a/b.c', u'a/c.c', u'b/b.c', u'b/d.c'],
340 '*/*.c')
341 # Bash style, just pass through the argument if nothing matches
342 self.assertCommandLine([u'*/*.qqq'], '*/*.qqq')
343
344 def test_quoted_globs(self):
345 self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h'])
346 self.assertCommandLine([u'a/*.c'], '"a/*.c"')
347 self.assertCommandLine([u'a/*.c'], "'a/*.c'")
348
349 def test_slashes_changed(self):
350 self.assertCommandLine([u'a/*.c'], '"a\\*.c"')
351 # Expands the glob, but nothing matches
352 self.assertCommandLine([u'a/*.c'], 'a\\*.c')
353 self.assertCommandLine([u'a/foo.c'], 'a\\foo.c')
264354
=== modified file 'bzrlib/win32utils.py'
--- bzrlib/win32utils.py 2009-07-08 14:37:25 +0000
+++ bzrlib/win32utils.py 2009-11-04 22:45:22 +0000
@@ -19,8 +19,12 @@
19Only one dependency: ctypes should be installed.19Only one dependency: ctypes should be installed.
20"""20"""
2121
22import glob
22import os23import os
24import re
25import shlex
23import struct26import struct
27import StringIO
24import sys28import sys
2529
2630
@@ -422,6 +426,26 @@
422426
423427
424428
429def glob_one(possible_glob):
430 """Same as glob.glob().
431
432 work around bugs in glob.glob()
433 - Python bug #1001604 ("glob doesn't return unicode with ...")
434 - failing expansion for */* with non-iso-8859-* chars
435 """
436 corrected_glob, corrected = _ensure_with_dir(possible_glob)
437 glob_files = glob.glob(corrected_glob)
438
439 if not glob_files:
440 # special case to let the normal code path handle
441 # files that do not exist, etc.
442 glob_files = [possible_glob]
443 elif corrected:
444 glob_files = [_undo_ensure_with_dir(elem, corrected)
445 for elem in glob_files]
446 return [elem.replace(u'\\', u'/') for elem in glob_files]
447
448
425def glob_expand(file_list):449def glob_expand(file_list):
426 """Replacement for glob expansion by the shell.450 """Replacement for glob expansion by the shell.
427451
@@ -435,25 +459,10 @@
435 """459 """
436 if not file_list:460 if not file_list:
437 return []461 return []
438 import glob
439 expanded_file_list = []462 expanded_file_list = []
440 for possible_glob in file_list:463 for possible_glob in file_list:
441 # work around bugs in glob.glob()464 expanded_file_list.extend(glob_one(possible_glob))
442 # - Python bug #1001604 ("glob doesn't return unicode with ...")465 return expanded_file_list
443 # - failing expansion for */* with non-iso-8859-* chars
444 possible_glob, corrected = _ensure_with_dir(possible_glob)
445 glob_files = glob.glob(possible_glob)
446
447 if glob_files == []:
448 # special case to let the normal code path handle
449 # files that do not exists
450 expanded_file_list.append(
451 _undo_ensure_with_dir(possible_glob, corrected))
452 else:
453 glob_files = [_undo_ensure_with_dir(elem, corrected) for elem in glob_files]
454 expanded_file_list += glob_files
455
456 return [elem.replace(u'\\', u'/') for elem in expanded_file_list]
457466
458467
459def get_app_path(appname):468def get_app_path(appname):
@@ -511,6 +520,124 @@
511 trace.mutter('Unable to set hidden attribute on %r: %s', path, e)520 trace.mutter('Unable to set hidden attribute on %r: %s', path, e)
512521
513522
523
524class UnicodeShlex(object):
525 """This is a very simplified version of shlex.shlex.
526
527 The main change is that it supports non-ascii input streams. The internal
528 structure is quite simplified relative to shlex.shlex, since we aren't
529 trying to handle multiple input streams, etc. In fact, we don't use a
530 file-like api either.
531 """
532
533 def __init__(self, uni_string):
534 self._input = uni_string
535 self._input_iter = iter(self._input)
536 self._whitespace_match = re.compile(u'\s').match
537 self._word_match = re.compile(u'\S').match
538 self._quote_chars = u'\'"'
539 # self._quote_match = re.compile(u'[\'"]').match
540 self._escape_match = lambda x: None # Never matches
541 self._escape = '\\'
542 # State can be
543 # ' ' - after whitespace, starting a new token
544 # 'a' - after text, currently working on a token
545 # '"' - after ", currently in a "-delimited quoted section
546 # "'" - after ', currently in a '-delimited quotod section
547 # "\" - after '\', checking the next char
548 self._state = ' '
549 self._token = [] # Current token being parsed
550
551 def _get_token(self):
552 # Were there quote chars as part of this token?
553 quoted = False
554 quoted_state = None
555 for nextchar in self._input_iter:
556 if self._state == ' ':
557 if self._whitespace_match(nextchar):
558 # if self._token: return token
559 continue
560 elif nextchar in self._quote_chars:
561 self._state = nextchar # quoted state
562 elif self._word_match(nextchar):
563 self._token.append(nextchar)
564 self._state = 'a'
565 else:
566 raise AssertionError('wtttf?')
567 elif self._state in self._quote_chars:
568 quoted = True
569 if nextchar == self._state: # End of quote
570 self._state = 'a' # posix allows 'foo'bar to translate to
571 # foobar
572 elif self._state == '"' and nextchar == self._escape:
573 quoted_state = self._state
574 self._state = nextchar
575 else:
576 self._token.append(nextchar)
577 elif self._state == self._escape:
578 if nextchar == '\\':
579 self._token.append('\\')
580 elif nextchar == '"':
581 self._token.append(nextchar)
582 else:
583 self._token.append('\\' + nextchar)
584 self._state = quoted_state
585 elif self._state == 'a':
586 if self._whitespace_match(nextchar):
587 if self._token:
588 break # emit this token
589 else:
590 continue # no token to emit
591 elif nextchar in self._quote_chars:
592 # Start a new quoted section
593 self._state = nextchar
594 # escape?
595 elif (self._word_match(nextchar)
596 or nextchar in self._quote_chars
597 # or whitespace_split?
598 ):
599 self._token.append(nextchar)
600 else:
601 raise AssertionError('state == "a", char: %r'
602 % (nextchar,))
603 else:
604 raise AssertionError('unknown state: %r' % (self._state,))
605 result = ''.join(self._token)
606 self._token = []
607 if not quoted and result == '':
608 result = None
609 return quoted, result
610
611 def __iter__(self):
612 return self
613
614 def next(self):
615 quoted, token = self._get_token()
616 if token is None:
617 raise StopIteration
618 return quoted, token
619
620
621def _command_line_to_argv(command_line):
622 """Convert a Unicode command line into a set of argv arguments.
623
624 This does wildcard expansion, etc. It is intended to make wildcards act
625 closer to how they work in posix shells, versus how they work by default on
626 Windows.
627 """
628 s = UnicodeShlex(command_line)
629 # Now that we've split the content, expand globs
630 # TODO: Use 'globbing' instead of 'glob.glob', this gives us stuff like
631 # '**/' style globs
632 args = []
633 for is_quoted, arg in s:
634 if is_quoted or not glob.has_magic(arg):
635 args.append(arg.replace(u'\\', u'/'))
636 else:
637 args.extend(glob_one(arg))
638 return args
639
640
514if has_ctypes and winver != 'Windows 98':641if has_ctypes and winver != 'Windows 98':
515 def get_unicode_argv():642 def get_unicode_argv():
516 LPCWSTR = ctypes.c_wchar_p643 LPCWSTR = ctypes.c_wchar_p
@@ -520,21 +647,19 @@
520 GetCommandLine = prototype(("GetCommandLineW",647 GetCommandLine = prototype(("GetCommandLineW",
521 ctypes.windll.kernel32))648 ctypes.windll.kernel32))
522 prototype = ctypes.WINFUNCTYPE(POINTER(LPCWSTR), LPCWSTR, POINTER(INT))649 prototype = ctypes.WINFUNCTYPE(POINTER(LPCWSTR), LPCWSTR, POINTER(INT))
523 CommandLineToArgv = prototype(("CommandLineToArgvW",650 command_line = GetCommandLine()
524 ctypes.windll.shell32))
525 c = INT(0)
526 pargv = CommandLineToArgv(GetCommandLine(), ctypes.byref(c))
527 # Skip the first argument, since we only care about parameters651 # Skip the first argument, since we only care about parameters
528 argv = [pargv[i] for i in range(1, c.value)]652 argv = _command_line_to_argv(GetCommandLine())[1:]
529 if getattr(sys, 'frozen', None) is None:653 if getattr(sys, 'frozen', None) is None:
530 # Invoked via 'python.exe' which takes the form:654 # Invoked via 'python.exe' which takes the form:
531 # python.exe [PYTHON_OPTIONS] C:\Path\bzr [BZR_OPTIONS]655 # python.exe [PYTHON_OPTIONS] C:\Path\bzr [BZR_OPTIONS]
532 # we need to get only BZR_OPTIONS part,656 # we need to get only BZR_OPTIONS part,
533 # so let's using sys.argv[1:] as reference to get the tail657 # We already removed 'python.exe' so we remove everything up to and
534 # of unicode argv658 # including the first non-option ('-') argument.
535 tail_len = len(sys.argv[1:])659 for idx in xrange(len(argv)):
536 ix = len(argv) - tail_len660 if argv[idx][:1] != '-':
537 argv = argv[ix:]661 break
662 argv = argv[idx+1:]
538 return argv663 return argv
539else:664else:
540 get_unicode_argv = None665 get_unicode_argv = None