Merge lp:~vila/bzr/selftest-fixes into lp:~bzr/bzr/trunk-old

Proposed by Vincent Ladeuil
Status: Work in progress
Proposed branch: lp:~vila/bzr/selftest-fixes
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 170 lines (has conflicts)
Text conflict in NEWS
To merge this branch: bzr merge lp:~vila/bzr/selftest-fixes
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Fixing
bzr-core Pending
Review via email: mp+10364@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

This patch implements a better balancing algorithm for selftest --parallel=fork.

It leads to ~25% smaller elapsed time on 4-core (8 threads) host.
Or said otherwise, running the full test suite goes down from 4 minutes to 3 minutes.

I also added a '-Eslices' that helps understand how the tests are distributed.

Revision history for this message
Robert Collins (lifeless) wrote :

I'm fairly sure that this breaks compatibility with --parallel=ec2.

I think the concept is useful but please consider how to fit in with that plugin, and on windows, where external processes are much more expensive to start.

review: Needs Fixing
Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "robert" == Robert Collins <email address hidden> writes:

    robert> Review: Needs Fixing

    robert> I'm fairly sure that this breaks compatibility with
    robert> --parallel=ec2.

Right, care to give it a try ? You know it's far easier for you
than for me.

I presume your suspicion comes from the fact that the plugin uses
fork_for_tests right ?

    robert> I think the concept is useful but please consider how
    robert> to fit in with that plugin, and on windows, where
    robert> external processes are much more expensive to start.

Using a process for each slice was simpler than starting one
process by processor and then implementing a way to send the
slices via another socket/pipe, so I tried that approach a bit
and punted.

If you played a bit with -Eslices, you noticed that, indeed, many
processes are spawned (one by slice) and avoiding them can give
even better performances.

Bug given that the test suite is *not* fully running so far on
windows, being able to run even part of it, faster is still a win IMHO.

And even if starting new processes is slow, I doubt it can be slower
to run 8 parallel newly started processes than a single started
once process :).

Hmm, I don't doubt it, I know in fact, the overhead of starting
new processes and handling the results via subunit is high, even
on linux, yet --parallel=fork reduces the *elapsed* time
enormously. Unless that can't reproduced on windows, I far prefer
being able to reduce the elapsed time than delay using -parallel=fork there.

Would you be ok if instead of modifying fork_for_tests I create a
new fork_balanced_for_tests until we can reconcile both ?

Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, 2009-08-24 at 06:18 +0000, Vincent Ladeuil wrote:
>
> robert> I think the concept is useful but please consider how
> robert> to fit in with that plugin, and on windows, where
> robert> external processes are much more expensive to start.
>
> Using a process for each slice was simpler than starting one
> process by processor and then implementing a way to send the
> slices via another socket/pipe, so I tried that approach a bit
> and punted.

A chatty protocol has its own issues indeed, including not being
friendly to xmlrpc apis and so on.

> If you played a bit with -Eslices, you noticed that, indeed, many
> processes are spawned (one by slice) and avoiding them can give
> even better performances.

> Bug given that the test suite is *not* fully running so far on
> windows, being able to run even part of it, faster is still a win
> IMHO.

Sure, I'm not debating that. Its only slightly faster though ;). Running
with --randomize should give approximately the same performance boost I
think.

> And even if starting new processes is slow, I doubt it can be slower
> to run 8 parallel newly started processes than a single started
> once process :).

On EC2 bringing up a warm machine takes about 60 seconds. Bringing up a
fresh machine takes about 8-10 minutes. Each machine then has 8 CPU's at
its disposal.

> Hmm, I don't doubt it, I know in fact, the overhead of starting
> new processes and handling the results via subunit is high, even
> on linux, yet --parallel=fork reduces the *elapsed* time
> enormously. Unless that can't reproduced on windows, I far prefer
> being able to reduce the elapsed time than delay using -parallel=fork
> there.
>
> Would you be ok if instead of modifying fork_for_tests I create a
> new fork_balanced_for_tests until we can reconcile both ?

Its not fork_for_tests, its the helper classes.

ec2test is easy to setup, docs are in the developer tree:
install the plugin.
. ~/.aws
install boto [support library]
./bzr push; ./bzr --selftest --parallel=ec2

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...
>
> Bug given that the test suite is *not* fully running so far on
> windows, being able to run even part of it, faster is still a win IMHO.
>
> And even if starting new processes is slow, I doubt it can be slower
> to run 8 parallel newly started processes than a single started
> once process :).
>
> Hmm, I don't doubt it, I know in fact, the overhead of starting
> new processes and handling the results via subunit is high, even
> on linux, yet --parallel=fork reduces the *elapsed* time
> enormously. Unless that can't reproduced on windows, I far prefer
> being able to reduce the elapsed time than delay using -parallel=fork there.
>
> Would you be ok if instead of modifying fork_for_tests I create a
> new fork_balanced_for_tests until we can reconcile both ?

I'll mention that "os.fork()" doesn't exist on Windows, you have to
spawn a subprocess and pass it data.

I don't know if this patch effects those cases or not. But if the
problem is that you're worried about *fork* performance on Windows, it
is, indeed awful :).

If you are spawning a subprocess, and passing it data, then it seems
like most of the work is already done, and you just need that subprocess
to check its input when it is finished, to see if there is more to do....

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqSqnsACgkQJdeBCYSNAAMjsgCfYLMIv7i/G2iPRBL8pnAikHoB
oUYAn2a9sqHXlSVw4oNmat+TWhz19G5y
=Ydab
-----END PGP SIGNATURE-----

Unmerged revisions

4625. By Vincent Ladeuil

NEWS entry and some cleanup for submission.

4624. By Vincent Ladeuil

-Eslices conditions statistics display.

* bzrlib/tests/__init__.py:
(selftest_debug_flags): Add a 'slices' debug flag.
(fork_for_tests.TestInOtherProcess.run): Display some key
statistics related to test suite slicing.

4623. By Vincent Ladeuil

Some tuning and cleanup.

* bzrlib/tests/__init__.py:
(fork_for_tests.TestInOtherProcess.run): Cap slize size at 8 to
avoid spawning too much processes.
(fork_for_tests.TestInOtherProcess.run_slice): Don't leave zombies
around, that hurts performances when tenths or hundreds are
produced.

4622. By Vincent Ladeuil

Implement a balancing scheme to maximize processor utilisation.

* bzrlib/tests/__init__.py:
(fork_for_tests): Change the palce we fork to better control which
tests are run where.

4621. By Vincent Ladeuil

Start hacking on balancing parallel selftest.

* bzrlib/tests/__init__.py:
(fork_for_tests): Start balancing forked selftest. This version
does not work.

4620. By Vincent Ladeuil

Fixed as per John's review.

4619. By Vincent Ladeuil

Make --parallel=fork work again.

* bzrlib/tests/__init__.py:
(run_suite): CoutingDecorator is incompatible with
--parallel=fork, don't use the former when the later is
required (even if we lose the toal number of tests that has to be
run...).

4618. By Canonical.com Patch Queue Manager <email address hidden>

(robertc) Fix test_write_group to not test inappropriate things on
 RemoteRepository. (Robert Collins)

4617. By Canonical.com Patch Queue Manager <email address hidden>

(robertc) Prepare test_foreign for rich roots as the default format.
 (Robert Collins)

4616. By Canonical.com Patch Queue Manager <email address hidden>

(robertc) Change a KnownFailure into a test with two success paths in
 preperation for 2a as default. (Robert Collins)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-08-30 23:51:10 +0000
+++ NEWS 2009-08-31 04:35:37 +0000
@@ -148,6 +148,7 @@
148 --version`` and ``bzr selftest``.148 --version`` and ``bzr selftest``.
149 (Martin Pool, #409137)149 (Martin Pool, #409137)
150150
151<<<<<<< TREE
151* bzr can now (again) capture crash data through the apport library, 152* bzr can now (again) capture crash data through the apport library,
152 so that a single human-readable file can be attached to bug reports.153 so that a single human-readable file can be attached to bug reports.
153 This can be disabled by using ``-Dno_apport`` on the command line, or by154 This can be disabled by using ``-Dno_apport`` on the command line, or by
@@ -222,6 +223,17 @@
222 ``TestCase.start_server`` that registers a cleanup and starts the server223 ``TestCase.start_server`` that registers a cleanup and starts the server
223 should be used. (Robert Collins)224 should be used. (Robert Collins)
224225
226=======
227Testing
228*******
229
230* ``bzr selftest --parallel=fork`` now uses a better balancing algorithm
231 leading to a further ~25% reduction of the overall elapsed time. A new
232 ``-Eslices`` flag is available for selftest to display some statistics about
233 how the tests are distributed between the processes.
234 (Vincent Ladeuil)
235
236>>>>>>> MERGE-SOURCE
225bzr 1.18237bzr 1.18
226########238########
227239
228240
=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py 2009-08-28 21:05:31 +0000
+++ bzrlib/tests/__init__.py 2009-08-31 04:35:38 +0000
@@ -3051,49 +3051,94 @@
3051 """3051 """
3052 concurrency = osutils.local_concurrency()3052 concurrency = osutils.local_concurrency()
3053 result = []3053 result = []
3054
3054 from subunit import TestProtocolClient, ProtocolTestCase3055 from subunit import TestProtocolClient, ProtocolTestCase
3055 try:3056 try:
3056 from subunit.test_results import AutoTimingTestResultDecorator3057 from subunit.test_results import AutoTimingTestResultDecorator
3057 except ImportError:3058 except ImportError:
3058 AutoTimingTestResultDecorator = lambda x:x3059 AutoTimingTestResultDecorator = lambda x:x
3059 class TestInOtherProcess(ProtocolTestCase):3060
3061 all_tests = list(iter_suite_tests(suite))
3062 nb_tests = len(all_tests)
3063 shared_cur_test = [0]
3064
3065 import threading
3066 class TestInOtherProcess(object):
3060 # Should be in subunit, I think. RBC.3067 # Should be in subunit, I think. RBC.
3061 def __init__(self, stream, pid):3068 # Or in testtools given the coupling with ConcurrentTestSuite ? --vila
3062 ProtocolTestCase.__init__(self, stream)3069 # 20090819
3063 self.pid = pid3070
3071 def __init__(self, suite_semaphore, rank):
3072 self.suite_semaphore = suite_semaphore
3073 self.rank = rank
3074 self.nb_slices = 0
3075 self.nb_tests = 0
30643076
3065 def run(self, result):3077 def run(self, result):
3066 try:3078 self.suite_semaphore.acquire()
3067 ProtocolTestCase.run(self, result)3079 cur_test = shared_cur_test[0]
3068 finally:3080 while cur_test < nb_tests:
3069 os.waitpid(self.pid, os.WNOHANG)3081 # The slice size should be a balance between the overhead of
30703082 # processing a slice, and the ability to feed as many children
3071 test_blocks = partition_tests(suite, concurrency)3083 # as possible for the longest possible time (or said otherwise:
3072 for process_tests in test_blocks:3084 # the last standing child should run alone for the shortest
3073 process_suite = TestSuite()3085 # possible time). So we start by saying that each child will
3074 process_suite.addTests(process_tests)3086 # handle as many slices as there are children and reducing the
3075 c2pread, c2pwrite = os.pipe()3087 # slice size from there.
3076 pid = os.fork()3088 slice_size = ((nb_tests - cur_test)
3077 if pid == 0:3089 / (concurrency * concurrency)) + 8
3078 try:3090 if 'slices' in selftest_debug_flags:
3079 os.close(c2pread)3091 note('New slice for %d: %5d', self.rank, slice_size)
3080 # Leave stderr and stdout open so we can see test noise3092 # give a slice to first free child
3081 # Close stdin so that the child goes away if it decides to3093 first, last = cur_test, cur_test + slice_size
3082 # read from stdin (otherwise its a roulette to see what3094 shared_cur_test[0] = last
3083 # child actually gets keystrokes for pdb etc).3095 self.suite_semaphore.release()
3084 sys.stdin.close()3096
3085 sys.stdin = None3097 self.nb_slices += 1
3086 stream = os.fdopen(c2pwrite, 'wb', 1)3098 self.nb_tests += slice_size
3087 subunit_result = AutoTimingTestResultDecorator(3099 self.run_slice(result, all_tests[first:last])
3088 TestProtocolClient(stream))3100 if shared_cur_test[0] > nb_tests:
3089 process_suite.run(subunit_result)3101 break
3090 finally:3102 self.suite_semaphore.acquire()
3091 os._exit(0)3103 cur_test = shared_cur_test[0]
3092 else:3104 self.suite_semaphore.release()
3093 os.close(c2pwrite)3105 if 'slices' in selftest_debug_flags:
3094 stream = os.fdopen(c2pread, 'rb', 1)3106 note('%d ran %5d tests in %5d slices',
3095 test = TestInOtherProcess(stream, pid)3107 self.rank, self.nb_tests, self.nb_slices)
3096 result.append(test)3108
3109 def run_slice(self, result, tests):
3110 (f_read, f_write) = os.pipe()
3111 pid = os.fork()
3112 if pid == 0:
3113 try:
3114 # Leave stderr and stdout open so we can see test noise
3115 # Close stdin so that the child goes away if it decides
3116 # to read from stdin (otherwise its a roulette to see
3117 # what child actually gets keystrokes for pdb etc).
3118 sys.stdin.close()
3119 sys.stdin = None
3120 feedback = os.fdopen(f_write, 'wb', 1)
3121 os.close(f_read)
3122 subunit_result = AutoTimingTestResultDecorator(
3123 TestProtocolClient(feedback))
3124 process_suite = TestSuite()
3125 process_suite.addTests(tests)
3126 process_suite.run(subunit_result)
3127 finally:
3128 os._exit(0)
3129 else:
3130 feedback = os.fdopen(f_read, 'rb', 1)
3131 os.close(f_write)
3132 try:
3133 test = ProtocolTestCase(feedback)
3134 test.run(result)
3135 finally:
3136 os.waitpid(pid, 0)
3137
3138 suite_semaphore = threading.Semaphore(1)
3139 for proc in range(0, concurrency):
3140 test = TestInOtherProcess(suite_semaphore, proc)
3141 result.append(test)
3097 return result3142 return result
30983143
30993144
@@ -3254,6 +3299,8 @@
3254# rather than failing tests. And no longer raise3299# rather than failing tests. And no longer raise
3255# LockContention when fctnl locks are not being used3300# LockContention when fctnl locks are not being used
3256# with proper exclusion rules.3301# with proper exclusion rules.
3302# -Eslices Will output information about how the test suite is
3303# sliced while running with --parallel=fork
3257selftest_debug_flags = set()3304selftest_debug_flags = set()
32583305
32593306