Merge lp:~mars/launchpad/fix-test_on_merge-578886 into lp:launchpad
Status: | Superseded |
---|---|
Proposed branch: | lp:~mars/launchpad/fix-test_on_merge-578886 |
Merge into: | lp:launchpad |
Diff against target: |
321 lines (+175/-64) 2 files modified
buildout-templates/bin/test.in (+0/-9) test_on_merge.py (+175/-55) |
To merge this branch: | bzr merge lp:~mars/launchpad/fix-test_on_merge-578886 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Māris Fogels (community) | Needs Resubmitting | ||
Michael Hudson-Doyle | Approve | ||
Review via email: mp+25981@code.launchpad.net |
This proposal has been superseded by a proposal from 2010-07-28.
Description of the change
Hi,
This branch fixes test_on_merge.py so that it now terminates the test suite properly after 10 minutes. This means EC2 instances will no longer hang when they kill the test suite.
This branch changes how the testrunner watchdog handles the process groups of itself and its children. The diff explains why the process group fiddling takes place. In order for the testrunner watchdog to control the process groups I had to remove process group handling from bin/test.
I changed the code a fair bit by introducing many comments and some new functions to encapsulate the existing code. I found these changes necessary to understand what the code was doing, as I only had basic knowledge of Unix process handling before this change. This code should now be easier for others to understand.
I dropped the testrunner timeout to 10 minutes. No test should run that long without output.
I cleaned up the print statements a bit, personal preference.
I added some detailed output to the testrunner kill procedure. This is failing loudly, with the hope that the script's behaviour will be less opaque when things go wrong in the code that handles things going wrong.
This branch was tested with a hand-crafted windmill testrunner harness. The harness forces an hour-long sleep() in one of the windmill tests. I used this harness to test both a local test suite timeout and an ec2 test suite timeout, and to ensure that they can kill off an entire process group.
I will complete a full "make check" run and a full "ec2 test" run with this branch before landing.
Pre-implementation call with: no one
Test command: see above
Maris
Gary asked me to have a look at this.
I think it broadly looks reasonable -- certainly it terms of readability it's way better than what went before -- and it's great that you've tested that it works. The proof is in the pudding here!
Just to be clear, the reason this stopped working is because there's now a "xvfb-run" process 'between' the test_on_merge.py script and the bin/test script?
I have a little apprehension about simply dying if we're the process leader. For a start, it means you can't just run "./test_ on_merge. py" from the shell! It also might not work on buildbot -- I think it actually will, unless we change our configuration to specify usePTY=True, but that seems rather fragile. Did you consider re-execing the process if we find we're the process group leader? I've pushed an implementation of this idea to lp:~mwhudson/launchpad/fix-test_on_merge-578886 and it seems to work.
If you merge my change in or do something similar, I think this is good to land.