Merge lp:~bzr/bzr/faster-ls into lp:~bzr/bzr/trunk-old

Proposed by Ian Clatworthy
Status: Merged
Approved by: Martin Pool
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~bzr/bzr/faster-ls
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 124 lines
To merge this branch: bzr merge lp:~bzr/bzr/faster-ls
Reviewer Review Type Date Requested Status
Martin Pool Approve
Review via email: mp+7539@code.launchpad.net

This proposal supersedes a proposal from 2009-05-18.

To post a comment you must log in.
Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal

This patch makes ls faster. In particular, ls on a historical revision is faster: ls -r-1 on the OpenOffice trunk drops from 62 seconds to 1.3 seconds. Plain ls on a subdirectory gets faster as well, dropping from 3-4 seconds to around 1 second.

As well as improving performance, I've moved most of the ls logic out of builtins.py to ls.py. New unit tests are needed for this new public API. Before I add those, I'd like a reviewer to confirm:

1. the new API looks right
2. my changes to the blackbox tests are correct.

In the latter case, I believe that 'ls dir --from-root' ought to show just the stuff in dir, not everything (as the current test expects).

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

I would *like* to see 'ls' work on top of "iter_entries_by_dir()" rather than list_files().

As near as I can tell, the only thing that list_files gives is "kind character" (though perhaps it also does more with unversioned files?)

I'd like to know if you think that is possible, or if it is too different of an api.

I'm not sure about having a file named "ls.py", maybe it's okay. I don't really have a much better name for it.

+# Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009 Canonical Ltd

I'm pretty sure 'ls.py' is only (C) 2009 :)

+def ls(tree, outf, from_dir=None, recursive=False, kind=None, unknown=False,
+ versioned=False, ignored=False, verbose=False, null=False, show_ids=False,
+ prefix=None, from_root=False):

^- When I see a function with that many parameters, I wonder if it wouldn't be better implemented as some sort of class... Not saying that is true here, but it seems like it might.

I think we should also think about whether it would be possible to add "bzr ls --added" more easily with this change or not. As near as I can tell, it would be ~ the same effort, so at least it isn't worse.

The main difficulty we've had is that "Versioned" is a state of a single tree, but "Added" is a diff between two trees. So you need an api that can give you the difference among 3 trees:
1) basis_tree
2) current working tree (versioned info)
3) filesystem

Since that gives you "Newly Versioned" "Renamed" "Ignored" etc.

As for the blackbox changes... You changed everything to be '--recurse' to '--no-recurse' (since the default changed), but you didn't add back any "--recurse" tests.

Also, is this the correct behavior:
        self.ls_equals('subdir/b\n'
                       , '--from-root')

I'm thinking it probably is, but I'll note that it is a change in behavior. 'bzr ls --from-root' used to be equivalent to 'bzr ls ../../' until you got to the root. We now don't have a way from a subdirectory to say "ls everything" without giving the full path to the root.

While doing "bzr ls --from-root ." would be an easy way to specify the opposite.

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

Sorry, I hit 'Save' before I was done.

=== modified file 'bzrlib/transform.py'

I really wonder if this should be in "bzrlib/preview_tree.py". Not that you have to do that.

I noticed you didn't add "bzrlib/tests/test_ls.py". Since the code is now refactored, it would be nice if most of the 'blackbox/tests_ls.py' was actually done as whitebox tests. Again, not required, but if you at least added the test file and some very basic tests, it would be a place that could be obviously expanded later.

So overall, I think it is worth landing something like this.

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

Just a couple of comments.

ls.py doesn't seem to know if its UI layer or logic layer.

I'd personally rather have a clear logic layer - something that takes
the needed parameters and provides an iterator over [semi-]structured
data.

then the command can be
for thing in [dols]:
    self.output.write("%s%s\n" % (thing[0], thing[1]))

This would make this useful for GUI's as well.

I think a good home for this would be tree.py, rather than a new python
module, as it is very tree specific code (in fact, in refactoring terms,
the first parameter 'tree' stands out like a ForeignMethod).

-Rob

Revision history for this message
Martin Pool (mbp) wrote : Posted in a previous version of this proposal

> This patch makes ls faster. In particular, ls on a historical revision is
> faster: ls -r-1 on the OpenOffice trunk drops from 62 seconds to 1.3 seconds.
> Plain ls on a subdirectory gets faster as well, dropping from 3-4 seconds to
> around 1 second.

Just a meta-comment: also explaining in the cover letter what you changed to do that will help with reviews. People may have comments only on the approach without reading the patches and also it helps verify the patch does what was intended.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal

> Just a couple of comments.
>
> ls.py doesn't seem to know if its UI layer or logic layer.
>
> I'd personally rather have a clear logic layer - something that takes
> the needed parameters and provides an iterator over [semi-]structured
> data.
>
> then the command can be
> for thing in [dols]:
> self.output.write("%s%s\n" % (thing[0], thing[1]))
>
> This would make this useful for GUI's as well.
>
> I think a good home for this would be tree.py, rather than a new python
> module, as it is very tree specific code (in fact, in refactoring terms,
> the first parameter 'tree' stands out like a ForeignMethod).

All good points, though the use case for a GUI ls client isn't really there. (The functionality tends to be provided by far more powerful GUIs like qbrowse and Olive). I'll resubmit this in a less ambitious form, just focusing on the logic bug and performance issue for now.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

This patch does two things:

* it fixes 'ls DIR --from-root' so that only things in DIR are shown, not everything

* it fixes ls performance by leveraging recent enhancements to iter_entries_by_dir() and list_files().

On OpenOffice.org, here are the speed-ups:

* ls: 2.4 to 1.1 seconds

* ls -r-1: 54.3 to 1.1 seconds

Unlike the previous submission, this one doesn't bother moving the code in builtins.py out into a separate module/method. Right now, YAGNI. Instead, it explicitly tries to minimise the differences to bzr.dev to ease review. (As an example, there's an 'if True' block currently included so that the diff for reviewers makes it clearer as to exactly what got changed. I plan to remove that on landing, of course.)

In summary, I think this is a step forward and is worth landing. It doesn't go as far as John would like and move to an iter_entries_by_dir() solution instead of the existing list_files() one. But frankly, ls just isn't that important as a command to *me* (vs other priorities), so I don't plan to spend any more time on it beyond the bare minimum above.

Revision history for this message
Martin Pool (mbp) wrote :

This looks reasonable to me; you might mention the changed (fixed) behaviour of --from-root in the NEWS too. It's a useful step.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-06-17 10:04:37 +0000
3+++ NEWS 2009-06-17 20:35:12 +0000
4@@ -55,6 +55,10 @@
5 Improvements
6 ************
7
8+``bzr ls`` is now faster. On OpenOffice.org, the time drops from 2.4
9+ to 1.1 seconds. The improvement for ``bzr ls -r-1`` is more
10+ substantial dropping from 54.3 to 1.1 seconds. (Ian Clatworthy)
11+
12 * Resolving a revno to a revision id on a branch accessed via ``bzr://``
13 or ``bzr+ssh://`` is now much faster and involves no VFS operations.
14 This speeds up commands like ``bzr pull -r 123``. (Andrew Bennetts)
15
16=== modified file 'bzrlib/builtins.py'
17--- bzrlib/builtins.py 2009-06-17 08:08:36 +0000
18+++ bzrlib/builtins.py 2009-06-17 20:35:13 +0000
19@@ -2395,19 +2395,22 @@
20
21 if path is None:
22 fs_path = '.'
23- prefix = ''
24 else:
25 if from_root:
26 raise errors.BzrCommandError('cannot specify both --from-root'
27 ' and PATH')
28 fs_path = path
29- prefix = path
30 tree, branch, relpath = bzrdir.BzrDir.open_containing_tree_or_branch(
31 fs_path)
32+
33+ # Calculate the prefix to use
34+ prefix = None
35 if from_root:
36- relpath = u''
37- elif relpath:
38- relpath += '/'
39+ if relpath:
40+ prefix = relpath + '/'
41+ elif fs_path != '.':
42+ prefix = fs_path + '/'
43+
44 if revision is not None or tree is None:
45 tree = _get_one_revision_tree('ls', revision, branch=branch)
46
47@@ -2421,21 +2424,27 @@
48
49 tree.lock_read()
50 try:
51- for fp, fc, fkind, fid, entry in tree.list_files(include_root=False):
52- if fp.startswith(relpath):
53- rp = fp[len(relpath):]
54- fp = osutils.pathjoin(prefix, rp)
55- if not recursive and '/' in rp:
56- continue
57+ for fp, fc, fkind, fid, entry in tree.list_files(include_root=False,
58+ from_dir=relpath, recursive=recursive):
59+ if True:
60+ # Apply additional masking
61 if not all and not selection[fc]:
62 continue
63 if kind is not None and fkind != kind:
64 continue
65 if apply_view:
66 try:
67- views.check_path_in_view(tree, fp)
68+ if relpath:
69+ fullpath = osutils.pathjoin(relpath, fp)
70+ else:
71+ fullpath = fp
72+ views.check_path_in_view(tree, fullpath)
73 except errors.FileOutsideView:
74 continue
75+
76+ # Output the entry
77+ if prefix:
78+ fp = osutils.pathjoin(prefix, fp)
79 kindch = entry.kind_character()
80 outstring = fp + kindch
81 ui.ui_factory.clear_term()
82@@ -2452,11 +2461,11 @@
83 self.outf.write('\0')
84 self.outf.flush()
85 else:
86- if fid is not None:
87- my_id = fid
88- else:
89- my_id = ''
90 if show_ids:
91+ if fid is not None:
92+ my_id = fid
93+ else:
94+ my_id = ''
95 self.outf.write('%-50s %s\n' % (outstring, my_id))
96 else:
97 self.outf.write(outstring + '\n')
98
99=== modified file 'bzrlib/tests/blackbox/test_ls.py'
100--- bzrlib/tests/blackbox/test_ls.py 2009-05-08 13:39:32 +0000
101+++ bzrlib/tests/blackbox/test_ls.py 2009-06-17 20:35:13 +0000
102@@ -129,19 +129,11 @@
103 self.ls_equals('b\n')
104 self.ls_equals('b\0'
105 , '--null')
106- self.ls_equals('.bzrignore\n'
107- 'a\n'
108- 'subdir/\n'
109- 'subdir/b\n'
110+ self.ls_equals('subdir/b\n'
111 , '--from-root')
112- self.ls_equals('.bzrignore\0'
113- 'a\0'
114- 'subdir\0'
115- 'subdir/b\0'
116+ self.ls_equals('subdir/b\0'
117 , '--from-root --null')
118- self.ls_equals('.bzrignore\n'
119- 'a\n'
120- 'subdir/\n'
121+ self.ls_equals('subdir/b\n'
122 , '--from-root', recursive=False)
123
124 def test_ls_path(self):