Merge lp:~bzr/bzr/faster-ls into lp:~bzr/bzr/trunk-old
- faster-ls
- Merge into trunk-old
Status: | Merged |
---|---|
Approved by: | Martin Pool |
Approved revision: | no longer in the source branch. |
Merged at revision: | not available |
Proposed branch: | lp:~bzr/bzr/faster-ls |
Merge into: | lp:~bzr/bzr/trunk-old |
Diff against target: | 124 lines |
To merge this branch: | bzr merge lp:~bzr/bzr/faster-ls |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Martin Pool | Approve | ||
Review via email: mp+7539@code.launchpad.net |
This proposal supersedes a proposal from 2009-05-18.
Commit message
Description of the change
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal | # |
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal | # |
I would *like* to see 'ls' work on top of "iter_entries_
As near as I can tell, the only thing that list_files gives is "kind character" (though perhaps it also does more with unversioned files?)
I'd like to know if you think that is possible, or if it is too different of an api.
I'm not sure about having a file named "ls.py", maybe it's okay. I don't really have a much better name for it.
+# Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009 Canonical Ltd
I'm pretty sure 'ls.py' is only (C) 2009 :)
+def ls(tree, outf, from_dir=None, recursive=False, kind=None, unknown=False,
+ versioned=False, ignored=False, verbose=False, null=False, show_ids=False,
+ prefix=None, from_root=False):
^- When I see a function with that many parameters, I wonder if it wouldn't be better implemented as some sort of class... Not saying that is true here, but it seems like it might.
I think we should also think about whether it would be possible to add "bzr ls --added" more easily with this change or not. As near as I can tell, it would be ~ the same effort, so at least it isn't worse.
The main difficulty we've had is that "Versioned" is a state of a single tree, but "Added" is a diff between two trees. So you need an api that can give you the difference among 3 trees:
1) basis_tree
2) current working tree (versioned info)
3) filesystem
Since that gives you "Newly Versioned" "Renamed" "Ignored" etc.
As for the blackbox changes... You changed everything to be '--recurse' to '--no-recurse' (since the default changed), but you didn't add back any "--recurse" tests.
Also, is this the correct behavior:
I'm thinking it probably is, but I'll note that it is a change in behavior. 'bzr ls --from-root' used to be equivalent to 'bzr ls ../../' until you got to the root. We now don't have a way from a subdirectory to say "ls everything" without giving the full path to the root.
While doing "bzr ls --from-root ." would be an easy way to specify the opposite.
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal | # |
Sorry, I hit 'Save' before I was done.
=== modified file 'bzrlib/
I really wonder if this should be in "bzrlib/
I noticed you didn't add "bzrlib/
So overall, I think it is worth landing something like this.
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal | # |
Just a couple of comments.
ls.py doesn't seem to know if its UI layer or logic layer.
I'd personally rather have a clear logic layer - something that takes
the needed parameters and provides an iterator over [semi-]structured
data.
then the command can be
for thing in [dols]:
self.
This would make this useful for GUI's as well.
I think a good home for this would be tree.py, rather than a new python
module, as it is very tree specific code (in fact, in refactoring terms,
the first parameter 'tree' stands out like a ForeignMethod).
-Rob
Martin Pool (mbp) wrote : Posted in a previous version of this proposal | # |
> This patch makes ls faster. In particular, ls on a historical revision is
> faster: ls -r-1 on the OpenOffice trunk drops from 62 seconds to 1.3 seconds.
> Plain ls on a subdirectory gets faster as well, dropping from 3-4 seconds to
> around 1 second.
Just a meta-comment: also explaining in the cover letter what you changed to do that will help with reviews. People may have comments only on the approach without reading the patches and also it helps verify the patch does what was intended.
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal | # |
> Just a couple of comments.
>
> ls.py doesn't seem to know if its UI layer or logic layer.
>
> I'd personally rather have a clear logic layer - something that takes
> the needed parameters and provides an iterator over [semi-]structured
> data.
>
> then the command can be
> for thing in [dols]:
> self.output.
>
> This would make this useful for GUI's as well.
>
> I think a good home for this would be tree.py, rather than a new python
> module, as it is very tree specific code (in fact, in refactoring terms,
> the first parameter 'tree' stands out like a ForeignMethod).
All good points, though the use case for a GUI ls client isn't really there. (The functionality tends to be provided by far more powerful GUIs like qbrowse and Olive). I'll resubmit this in a less ambitious form, just focusing on the logic bug and performance issue for now.
Ian Clatworthy (ian-clatworthy) wrote : | # |
This patch does two things:
* it fixes 'ls DIR --from-root' so that only things in DIR are shown, not everything
* it fixes ls performance by leveraging recent enhancements to iter_entries_
On OpenOffice.org, here are the speed-ups:
* ls: 2.4 to 1.1 seconds
* ls -r-1: 54.3 to 1.1 seconds
Unlike the previous submission, this one doesn't bother moving the code in builtins.py out into a separate module/method. Right now, YAGNI. Instead, it explicitly tries to minimise the differences to bzr.dev to ease review. (As an example, there's an 'if True' block currently included so that the diff for reviewers makes it clearer as to exactly what got changed. I plan to remove that on landing, of course.)
In summary, I think this is a step forward and is worth landing. It doesn't go as far as John would like and move to an iter_entries_
Martin Pool (mbp) wrote : | # |
This looks reasonable to me; you might mention the changed (fixed) behaviour of --from-root in the NEWS too. It's a useful step.
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-06-17 10:04:37 +0000 |
3 | +++ NEWS 2009-06-17 20:35:12 +0000 |
4 | @@ -55,6 +55,10 @@ |
5 | Improvements |
6 | ************ |
7 | |
8 | +``bzr ls`` is now faster. On OpenOffice.org, the time drops from 2.4 |
9 | + to 1.1 seconds. The improvement for ``bzr ls -r-1`` is more |
10 | + substantial dropping from 54.3 to 1.1 seconds. (Ian Clatworthy) |
11 | + |
12 | * Resolving a revno to a revision id on a branch accessed via ``bzr://`` |
13 | or ``bzr+ssh://`` is now much faster and involves no VFS operations. |
14 | This speeds up commands like ``bzr pull -r 123``. (Andrew Bennetts) |
15 | |
16 | === modified file 'bzrlib/builtins.py' |
17 | --- bzrlib/builtins.py 2009-06-17 08:08:36 +0000 |
18 | +++ bzrlib/builtins.py 2009-06-17 20:35:13 +0000 |
19 | @@ -2395,19 +2395,22 @@ |
20 | |
21 | if path is None: |
22 | fs_path = '.' |
23 | - prefix = '' |
24 | else: |
25 | if from_root: |
26 | raise errors.BzrCommandError('cannot specify both --from-root' |
27 | ' and PATH') |
28 | fs_path = path |
29 | - prefix = path |
30 | tree, branch, relpath = bzrdir.BzrDir.open_containing_tree_or_branch( |
31 | fs_path) |
32 | + |
33 | + # Calculate the prefix to use |
34 | + prefix = None |
35 | if from_root: |
36 | - relpath = u'' |
37 | - elif relpath: |
38 | - relpath += '/' |
39 | + if relpath: |
40 | + prefix = relpath + '/' |
41 | + elif fs_path != '.': |
42 | + prefix = fs_path + '/' |
43 | + |
44 | if revision is not None or tree is None: |
45 | tree = _get_one_revision_tree('ls', revision, branch=branch) |
46 | |
47 | @@ -2421,21 +2424,27 @@ |
48 | |
49 | tree.lock_read() |
50 | try: |
51 | - for fp, fc, fkind, fid, entry in tree.list_files(include_root=False): |
52 | - if fp.startswith(relpath): |
53 | - rp = fp[len(relpath):] |
54 | - fp = osutils.pathjoin(prefix, rp) |
55 | - if not recursive and '/' in rp: |
56 | - continue |
57 | + for fp, fc, fkind, fid, entry in tree.list_files(include_root=False, |
58 | + from_dir=relpath, recursive=recursive): |
59 | + if True: |
60 | + # Apply additional masking |
61 | if not all and not selection[fc]: |
62 | continue |
63 | if kind is not None and fkind != kind: |
64 | continue |
65 | if apply_view: |
66 | try: |
67 | - views.check_path_in_view(tree, fp) |
68 | + if relpath: |
69 | + fullpath = osutils.pathjoin(relpath, fp) |
70 | + else: |
71 | + fullpath = fp |
72 | + views.check_path_in_view(tree, fullpath) |
73 | except errors.FileOutsideView: |
74 | continue |
75 | + |
76 | + # Output the entry |
77 | + if prefix: |
78 | + fp = osutils.pathjoin(prefix, fp) |
79 | kindch = entry.kind_character() |
80 | outstring = fp + kindch |
81 | ui.ui_factory.clear_term() |
82 | @@ -2452,11 +2461,11 @@ |
83 | self.outf.write('\0') |
84 | self.outf.flush() |
85 | else: |
86 | - if fid is not None: |
87 | - my_id = fid |
88 | - else: |
89 | - my_id = '' |
90 | if show_ids: |
91 | + if fid is not None: |
92 | + my_id = fid |
93 | + else: |
94 | + my_id = '' |
95 | self.outf.write('%-50s %s\n' % (outstring, my_id)) |
96 | else: |
97 | self.outf.write(outstring + '\n') |
98 | |
99 | === modified file 'bzrlib/tests/blackbox/test_ls.py' |
100 | --- bzrlib/tests/blackbox/test_ls.py 2009-05-08 13:39:32 +0000 |
101 | +++ bzrlib/tests/blackbox/test_ls.py 2009-06-17 20:35:13 +0000 |
102 | @@ -129,19 +129,11 @@ |
103 | self.ls_equals('b\n') |
104 | self.ls_equals('b\0' |
105 | , '--null') |
106 | - self.ls_equals('.bzrignore\n' |
107 | - 'a\n' |
108 | - 'subdir/\n' |
109 | - 'subdir/b\n' |
110 | + self.ls_equals('subdir/b\n' |
111 | , '--from-root') |
112 | - self.ls_equals('.bzrignore\0' |
113 | - 'a\0' |
114 | - 'subdir\0' |
115 | - 'subdir/b\0' |
116 | + self.ls_equals('subdir/b\0' |
117 | , '--from-root --null') |
118 | - self.ls_equals('.bzrignore\n' |
119 | - 'a\n' |
120 | - 'subdir/\n' |
121 | + self.ls_equals('subdir/b\n' |
122 | , '--from-root', recursive=False) |
123 | |
124 | def test_ls_path(self): |
This patch makes ls faster. In particular, ls on a historical revision is faster: ls -r-1 on the OpenOffice trunk drops from 62 seconds to 1.3 seconds. Plain ls on a subdirectory gets faster as well, dropping from 3-4 seconds to around 1 second.
As well as improving performance, I've moved most of the ls logic out of builtins.py to ls.py. New unit tests are needed for this new public API. Before I add those, I'd like a reviewer to confirm:
1. the new API looks right
2. my changes to the blackbox tests are correct.
In the latter case, I believe that 'ls dir --from-root' ought to show just the stuff in dir, not everything (as the current test expects).