Launchpad itself

Code review comment for lp:~benji/launchpad/bug-622765

bug-622765
Merge into devel

Revision history for this message

Benji York (benji) wrote on 2010-08-30:

On Mon, Aug 30, 2010 at 11:54 AM, Henning Eggers
<email address hidden> wrote:
> Review: Approve code
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Benji,
> thank you for this nice fix. It is basically good to land after you have
> considered a few comments of mine.
>
> review approve code
>
> Cheers,
> Henning
>
> Am 30.08.2010 17:03, schrieb Benji York:
>>
>> === modified file 'lib/canonical/launchpad/scripts/tests/test_librarian_apache_log_parser.py'
>> --- lib/canonical/launchpad/scripts/tests/test_librarian_apache_log_parser.py 2010-08-20 20:31:18 +0000
>> +++ lib/canonical/launchpad/scripts/tests/test_librarian_apache_log_parser.py 2010-08-30 15:03:44 +0000
>> @@ -102,7 +102,7 @@
>> '69.233.136.42 - - [13/Jun/2008:14:55:22 +0100] "GET '
>> '/15018215/ul_logo_64x64.png HTTP/1.1" 200 2261 '
>> '"https://launchpad.net/~ubuntulite/+archive" "Mozilla"')
>> - downloads, parsed_bytes = parse_file(
>> + downloads, parsed_bytes, ignored = parse_file(
>
> I am not sure "ignored" is the best choice here. I wondered at first if it
> means "ignored_bytes" or "ignored_lines". Would it hurt to call it what it is
> ("parsed_lines") and then simply ignore it?

Done.

>> === modified file 'lib/lp/services/apachelogparser/script.py'
>> --- lib/lp/services/apachelogparser/script.py 2010-08-27 18:41:01 +0000
>> +++ lib/lp/services/apachelogparser/script.py 2010-08-30 15:03:44 +0000
>> @@ -8,6 +8,7 @@
>>
>> from zope.component import getUtility
>>
>> +from canonical.config import config
>> from lp.app.errors import NotFoundError
>> from lp.services.apachelogparser.base import (
>> create_or_update_parsedlog_entry,
>> @@ -65,8 +66,15 @@
>>
>> self.setUpUtilities()
>> country_set = getUtility(ICountrySet)
>> + parsed_lines = 0
>> + max_parsed_lines = getattr(
>> + config.launchpad, 'logparser_max_parsed_lines', None)
>> for fd, position in files_to_parse:
>> - downloads, parsed_bytes = parse_file(
>> + # If we've used up our budget of lines to process, stop.
>> + if (max_parsed_lines is not None
>> + and parsed_lines >= max_parsed_lines):
>
> This line must be indented and the "and" should probably go on the previous
> line but multi-line "if" conditions are always bad to read because the
> dangling lines align with the body of the statement. Storing the boolean in a
> variable first and checking that in the statement often works better.

Done.

>> + # If we have already parsed some lines, then the number of lines
>> + # parsed will be passed in (parsed_lines argument) and parse_file will
>> + # take that number into account when determining if the maximum number
>> + # of lines to parse has been reached.
>
> Hm, this is a functional test and not documentation. This description should
> probably be found in the docstring of parse_file, should it not? If found
> there, it need not be explained in such detail here.

My intent was to explain the functioning of the test. I'm sure I would
appreciate a nice explanation of what's going on if I was reading the
test. I've also added some explanation of lines_parsed to parse_file.

>> + parsed_lines = 1
>> + downloads, parsed_bytes, parsed_lines = parse_file(
>> + fd, start_position=0, logger=self.logger,
>> + get_download_key=get_path_download_key, parsed_lines=parsed_lines)
>> +
>> + # Since we told parse_file that we had already parsed 1 line and the
>> + # limit is 2 lines, it only parsed the first line of the file which is
>> + # a 404, so there were no downloads recorded.
>
> The 404 is fairly irrelevant to the test. If you want to check that the parser
> stopped at the right place, you will need a downloaded item right before the
> limit and one right after the limit and check for those.
>
> You could construct the two lines in a string right here in the test and use
> StringIO to feed it to parse_file. That would be more explicit and independent
> of sample data.

Since other tests show that the right number of lines are parsed when
the maximum is reached, I'm not attempting to test that here. I just
want to show that if a non-zero parsed_lines is passed in, the number of
lines parsed will be less than it would otherwise be. I've added a
comment in the test to that effect.

>> + fd.seek(0)
>
> Is that really needed at the end?

Nope, it's a vestige of some copy/paste. I've removed it.
--
Benji York

On Mon, Aug 30, 2010 at 11:54 AM, Henning Eggers
<henning.eggers@canonical.com> wrote:
> Review: Approve code
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Benji,
> thank you for this nice fix. It is basically good to land after you have
> considered a few comments of mine.
>
>  review approve code
>
> Cheers,
> Henning
>
> Am 30.08.2010 17:03, schrieb Benji York:
>>
>> === modified file 'lib/canonical/launchpad/scripts/tests/test_librarian_apache_log_parser.py'
>> --- lib/canonical/launchpad/scripts/tests/test_librarian_apache_log_parser.py 2010-08-20 20:31:18 +0000
>> +++ lib/canonical/launchpad/scripts/tests/test_librarian_apache_log_parser.py 2010-08-30 15:03:44 +0000
>> @@ -102,7 +102,7 @@
>>              '69.233.136.42 - - [13/Jun/2008:14:55:22 +0100] "GET '
>>              '/15018215/ul_logo_64x64.png HTTP/1.1" 200 2261 '
>>              '"https://launchpad.net/~ubuntulite/+archive" "Mozilla"')
>> -        downloads, parsed_bytes = parse_file(
>> +        downloads, parsed_bytes, ignored = parse_file(
>
> I am not sure "ignored" is the best choice here. I wondered at first if it
> means "ignored_bytes" or "ignored_lines". Would it hurt to call it what it is
> ("parsed_lines") and then simply ignore it?

Done.

>> === modified file 'lib/lp/services/apachelogparser/script.py'
>> --- lib/lp/services/apachelogparser/script.py 2010-08-27 18:41:01 +0000
>> +++ lib/lp/services/apachelogparser/script.py 2010-08-30 15:03:44 +0000
>> @@ -8,6 +8,7 @@
>>
>>  from zope.component import getUtility
>>
>> +from canonical.config import config
>>  from lp.app.errors import NotFoundError
>>  from lp.services.apachelogparser.base import (
>>      create_or_update_parsedlog_entry,
>> @@ -65,8 +66,15 @@
>>
>>          self.setUpUtilities()
>>          country_set = getUtility(ICountrySet)
>> +        parsed_lines = 0
>> +        max_parsed_lines = getattr(
>> +            config.launchpad, 'logparser_max_parsed_lines', None)
>>          for fd, position in files_to_parse:
>> -            downloads, parsed_bytes = parse_file(
>> +            # If we've used up our budget of lines to process, stop.
>> +            if (max_parsed_lines is not None
>> +            and parsed_lines >= max_parsed_lines):
>
> This line must be indented and the "and" should probably go on the previous
> line but multi-line "if" conditions are always bad to read because the
> dangling lines align with the body of the statement. Storing the boolean in a
> variable first and checking that in the statement often works better.

Done.

>> +        # If we have already parsed some lines, then the number of lines
>> +        # parsed will be passed in (parsed_lines argument) and parse_file will
>> +        # take that number into account when determining if the maximum number
>> +        # of lines to parse has been reached.
>
> Hm, this is a functional test and not documentation. This description should
> probably be found in the docstring of parse_file, should it not? If found
> there, it need not be explained in such detail here.

My intent was to explain the functioning of the test.  I'm sure I would
appreciate a nice explanation of what's going on if I was reading the
test.  I've also added some explanation of lines_parsed to parse_file.

>> +        parsed_lines = 1
>> +        downloads, parsed_bytes, parsed_lines = parse_file(
>> +            fd, start_position=0, logger=self.logger,
>> +            get_download_key=get_path_download_key, parsed_lines=parsed_lines)
>> +
>> +        # Since we told parse_file that we had already parsed 1 line and the
>> +        # limit is 2 lines, it only parsed the first line of the file which is
>> +        # a 404, so there were no downloads recorded.
>
> The 404 is fairly irrelevant to the test. If you want to check that the parser
> stopped at the right place, you will need a downloaded item right before the
> limit and one right after the limit and check for those.
>
> You could construct the two lines in a string right here in the test and use
> StringIO to feed it to parse_file. That would be more explicit and independent
> of sample data.

Since other tests show that the right number of lines are parsed when
the maximum is reached, I'm not attempting to test that here.  I just
want to show that if a non-zero parsed_lines is passed in, the number of
lines parsed will be less than it would otherwise be.  I've added a
comment in the test to that effect.

>> +        fd.seek(0)
>
> Is that really needed at the end?

Nope, it's a vestige of some copy/paste.  I've removed it.
-- 
Benji York

« Back to merge proposal