Merge lp:~bzr/bzr/faster-ls into lp:~bzr/bzr/trunk-old

Proposed by Ian Clatworthy
Status: Merged
Approved by: Martin Pool
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~bzr/bzr/faster-ls
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 124 lines
To merge this branch: bzr merge lp:~bzr/bzr/faster-ls
Reviewer Review Type Date Requested Status
Martin Pool Approve
Review via email: mp+7539@code.launchpad.net

This proposal supersedes a proposal from 2009-05-18.

To post a comment you must log in.
Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal

This patch makes ls faster. In particular, ls on a historical revision is faster: ls -r-1 on the OpenOffice trunk drops from 62 seconds to 1.3 seconds. Plain ls on a subdirectory gets faster as well, dropping from 3-4 seconds to around 1 second.

As well as improving performance, I've moved most of the ls logic out of builtins.py to ls.py. New unit tests are needed for this new public API. Before I add those, I'd like a reviewer to confirm:

1. the new API looks right
2. my changes to the blackbox tests are correct.

In the latter case, I believe that 'ls dir --from-root' ought to show just the stuff in dir, not everything (as the current test expects).

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

I would *like* to see 'ls' work on top of "iter_entries_by_dir()" rather than list_files().

As near as I can tell, the only thing that list_files gives is "kind character" (though perhaps it also does more with unversioned files?)

I'd like to know if you think that is possible, or if it is too different of an api.

I'm not sure about having a file named "ls.py", maybe it's okay. I don't really have a much better name for it.

+# Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009 Canonical Ltd

I'm pretty sure 'ls.py' is only (C) 2009 :)

+def ls(tree, outf, from_dir=None, recursive=False, kind=None, unknown=False,
+ versioned=False, ignored=False, verbose=False, null=False, show_ids=False,
+ prefix=None, from_root=False):

^- When I see a function with that many parameters, I wonder if it wouldn't be better implemented as some sort of class... Not saying that is true here, but it seems like it might.

I think we should also think about whether it would be possible to add "bzr ls --added" more easily with this change or not. As near as I can tell, it would be ~ the same effort, so at least it isn't worse.

The main difficulty we've had is that "Versioned" is a state of a single tree, but "Added" is a diff between two trees. So you need an api that can give you the difference among 3 trees:
1) basis_tree
2) current working tree (versioned info)
3) filesystem

Since that gives you "Newly Versioned" "Renamed" "Ignored" etc.

As for the blackbox changes... You changed everything to be '--recurse' to '--no-recurse' (since the default changed), but you didn't add back any "--recurse" tests.

Also, is this the correct behavior:
        self.ls_equals('subdir/b\n'
                       , '--from-root')

I'm thinking it probably is, but I'll note that it is a change in behavior. 'bzr ls --from-root' used to be equivalent to 'bzr ls ../../' until you got to the root. We now don't have a way from a subdirectory to say "ls everything" without giving the full path to the root.

While doing "bzr ls --from-root ." would be an easy way to specify the opposite.

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

Sorry, I hit 'Save' before I was done.

=== modified file 'bzrlib/transform.py'

I really wonder if this should be in "bzrlib/preview_tree.py". Not that you have to do that.

I noticed you didn't add "bzrlib/tests/test_ls.py". Since the code is now refactored, it would be nice if most of the 'blackbox/tests_ls.py' was actually done as whitebox tests. Again, not required, but if you at least added the test file and some very basic tests, it would be a place that could be obviously expanded later.

So overall, I think it is worth landing something like this.

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

Just a couple of comments.

ls.py doesn't seem to know if its UI layer or logic layer.

I'd personally rather have a clear logic layer - something that takes
the needed parameters and provides an iterator over [semi-]structured
data.

then the command can be
for thing in [dols]:
    self.output.write("%s%s\n" % (thing[0], thing[1]))

This would make this useful for GUI's as well.

I think a good home for this would be tree.py, rather than a new python
module, as it is very tree specific code (in fact, in refactoring terms,
the first parameter 'tree' stands out like a ForeignMethod).

-Rob

Revision history for this message
Martin Pool (mbp) wrote : Posted in a previous version of this proposal

> This patch makes ls faster. In particular, ls on a historical revision is
> faster: ls -r-1 on the OpenOffice trunk drops from 62 seconds to 1.3 seconds.
> Plain ls on a subdirectory gets faster as well, dropping from 3-4 seconds to
> around 1 second.

Just a meta-comment: also explaining in the cover letter what you changed to do that will help with reviews. People may have comments only on the approach without reading the patches and also it helps verify the patch does what was intended.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal

> Just a couple of comments.
>
> ls.py doesn't seem to know if its UI layer or logic layer.
>
> I'd personally rather have a clear logic layer - something that takes
> the needed parameters and provides an iterator over [semi-]structured
> data.
>
> then the command can be
> for thing in [dols]:
> self.output.write("%s%s\n" % (thing[0], thing[1]))
>
> This would make this useful for GUI's as well.
>
> I think a good home for this would be tree.py, rather than a new python
> module, as it is very tree specific code (in fact, in refactoring terms,
> the first parameter 'tree' stands out like a ForeignMethod).

All good points, though the use case for a GUI ls client isn't really there. (The functionality tends to be provided by far more powerful GUIs like qbrowse and Olive). I'll resubmit this in a less ambitious form, just focusing on the logic bug and performance issue for now.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

This patch does two things:

* it fixes 'ls DIR --from-root' so that only things in DIR are shown, not everything

* it fixes ls performance by leveraging recent enhancements to iter_entries_by_dir() and list_files().

On OpenOffice.org, here are the speed-ups:

* ls: 2.4 to 1.1 seconds

* ls -r-1: 54.3 to 1.1 seconds

Unlike the previous submission, this one doesn't bother moving the code in builtins.py out into a separate module/method. Right now, YAGNI. Instead, it explicitly tries to minimise the differences to bzr.dev to ease review. (As an example, there's an 'if True' block currently included so that the diff for reviewers makes it clearer as to exactly what got changed. I plan to remove that on landing, of course.)

In summary, I think this is a step forward and is worth landing. It doesn't go as far as John would like and move to an iter_entries_by_dir() solution instead of the existing list_files() one. But frankly, ls just isn't that important as a command to *me* (vs other priorities), so I don't plan to spend any more time on it beyond the bare minimum above.

Revision history for this message
Martin Pool (mbp) wrote :

This looks reasonable to me; you might mention the changed (fixed) behaviour of --from-root in the NEWS too. It's a useful step.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-06-17 10:04:37 +0000
+++ NEWS 2009-06-17 20:35:12 +0000
@@ -55,6 +55,10 @@
55Improvements55Improvements
56************56************
5757
58``bzr ls`` is now faster. On OpenOffice.org, the time drops from 2.4
59 to 1.1 seconds. The improvement for ``bzr ls -r-1`` is more
60 substantial dropping from 54.3 to 1.1 seconds. (Ian Clatworthy)
61
58* Resolving a revno to a revision id on a branch accessed via ``bzr://``62* Resolving a revno to a revision id on a branch accessed via ``bzr://``
59 or ``bzr+ssh://`` is now much faster and involves no VFS operations.63 or ``bzr+ssh://`` is now much faster and involves no VFS operations.
60 This speeds up commands like ``bzr pull -r 123``. (Andrew Bennetts)64 This speeds up commands like ``bzr pull -r 123``. (Andrew Bennetts)
6165
=== modified file 'bzrlib/builtins.py'
--- bzrlib/builtins.py 2009-06-17 08:08:36 +0000
+++ bzrlib/builtins.py 2009-06-17 20:35:13 +0000
@@ -2395,19 +2395,22 @@
23952395
2396 if path is None:2396 if path is None:
2397 fs_path = '.'2397 fs_path = '.'
2398 prefix = ''
2399 else:2398 else:
2400 if from_root:2399 if from_root:
2401 raise errors.BzrCommandError('cannot specify both --from-root'2400 raise errors.BzrCommandError('cannot specify both --from-root'
2402 ' and PATH')2401 ' and PATH')
2403 fs_path = path2402 fs_path = path
2404 prefix = path
2405 tree, branch, relpath = bzrdir.BzrDir.open_containing_tree_or_branch(2403 tree, branch, relpath = bzrdir.BzrDir.open_containing_tree_or_branch(
2406 fs_path)2404 fs_path)
2405
2406 # Calculate the prefix to use
2407 prefix = None
2407 if from_root:2408 if from_root:
2408 relpath = u''2409 if relpath:
2409 elif relpath:2410 prefix = relpath + '/'
2410 relpath += '/'2411 elif fs_path != '.':
2412 prefix = fs_path + '/'
2413
2411 if revision is not None or tree is None:2414 if revision is not None or tree is None:
2412 tree = _get_one_revision_tree('ls', revision, branch=branch)2415 tree = _get_one_revision_tree('ls', revision, branch=branch)
24132416
@@ -2421,21 +2424,27 @@
24212424
2422 tree.lock_read()2425 tree.lock_read()
2423 try:2426 try:
2424 for fp, fc, fkind, fid, entry in tree.list_files(include_root=False):2427 for fp, fc, fkind, fid, entry in tree.list_files(include_root=False,
2425 if fp.startswith(relpath):2428 from_dir=relpath, recursive=recursive):
2426 rp = fp[len(relpath):]2429 if True:
2427 fp = osutils.pathjoin(prefix, rp)2430 # Apply additional masking
2428 if not recursive and '/' in rp:
2429 continue
2430 if not all and not selection[fc]:2431 if not all and not selection[fc]:
2431 continue2432 continue
2432 if kind is not None and fkind != kind:2433 if kind is not None and fkind != kind:
2433 continue2434 continue
2434 if apply_view:2435 if apply_view:
2435 try:2436 try:
2436 views.check_path_in_view(tree, fp)2437 if relpath:
2438 fullpath = osutils.pathjoin(relpath, fp)
2439 else:
2440 fullpath = fp
2441 views.check_path_in_view(tree, fullpath)
2437 except errors.FileOutsideView:2442 except errors.FileOutsideView:
2438 continue2443 continue
2444
2445 # Output the entry
2446 if prefix:
2447 fp = osutils.pathjoin(prefix, fp)
2439 kindch = entry.kind_character()2448 kindch = entry.kind_character()
2440 outstring = fp + kindch2449 outstring = fp + kindch
2441 ui.ui_factory.clear_term()2450 ui.ui_factory.clear_term()
@@ -2452,11 +2461,11 @@
2452 self.outf.write('\0')2461 self.outf.write('\0')
2453 self.outf.flush()2462 self.outf.flush()
2454 else:2463 else:
2455 if fid is not None:
2456 my_id = fid
2457 else:
2458 my_id = ''
2459 if show_ids:2464 if show_ids:
2465 if fid is not None:
2466 my_id = fid
2467 else:
2468 my_id = ''
2460 self.outf.write('%-50s %s\n' % (outstring, my_id))2469 self.outf.write('%-50s %s\n' % (outstring, my_id))
2461 else:2470 else:
2462 self.outf.write(outstring + '\n')2471 self.outf.write(outstring + '\n')
24632472
=== modified file 'bzrlib/tests/blackbox/test_ls.py'
--- bzrlib/tests/blackbox/test_ls.py 2009-05-08 13:39:32 +0000
+++ bzrlib/tests/blackbox/test_ls.py 2009-06-17 20:35:13 +0000
@@ -129,19 +129,11 @@
129 self.ls_equals('b\n')129 self.ls_equals('b\n')
130 self.ls_equals('b\0'130 self.ls_equals('b\0'
131 , '--null')131 , '--null')
132 self.ls_equals('.bzrignore\n'132 self.ls_equals('subdir/b\n'
133 'a\n'
134 'subdir/\n'
135 'subdir/b\n'
136 , '--from-root')133 , '--from-root')
137 self.ls_equals('.bzrignore\0'134 self.ls_equals('subdir/b\0'
138 'a\0'
139 'subdir\0'
140 'subdir/b\0'
141 , '--from-root --null')135 , '--from-root --null')
142 self.ls_equals('.bzrignore\n'136 self.ls_equals('subdir/b\n'
143 'a\n'
144 'subdir/\n'
145 , '--from-root', recursive=False)137 , '--from-root', recursive=False)
146138
147 def test_ls_path(self):139 def test_ls_path(self):