Code review comment for lp:~wgrant/launchpad/distroseries-source-format-selection

Revision history for this message
Julian Edwards (julian-edwards) wrote :

> While your suggested solution would not work (at least not without the
> 'nonlocal' keyword), I believe I have found something better. I keep one
> dict mapping SPFTs to counts, and another mapping component names to
> counts. This means there's only one special case (for
> COMPONENT_ORIG_TARBALL), and the code for each format becomes much
> cleaner too.

Ah, this is 100% better than my suggestion, nice one!

> I've split it, and completely changed the verification style. The
> failure messages are much less verbose, but one would have to manually
> edit the .dsc to get any of them anyway. I feel the reduction in code
> makes things much better.

Actually it is not only better, it's easier to test because there are
fewer cases!

> This does make testing much easier.
>
> However, there are now just a few code paths, but a lot of potential
> failure cases. How completely should I test?

We always test 100% code coverage at least and any peculiar corner cases
that are obvious.

I'd be happy here to have 2 tests per format type; one that tests it accepts
correctly and one that tests it fails correctly, since that's pretty much
all the code is doing now.

> > Also, can you please change it to compare against the format enum instead
> > of strings. Then the assertion error below will make more sense.
> >
> > > else:
> > > raise AssertionError("Unknown source format.")
>
> It will now die with a KeyError when looking up the file checker, which
> seems as reasonable as the old explicit AssertionError.

I don't really like that, it's not as explicit as the old error, particularly
because the error will appear in the rejection email (unlikely as it is).

Can you catch the KeyError and re-raise it as AssertionError with some good
text. This will ensure if/when we add more format types, it will be easier to
track down why the failure occurred.

> The two uses are on different types: this is a SourceUploadFile, while
> the other is a SourcePackageReleaseFile. Should I instead create a
> utility function alongside SPFT, perhaps?

I would make a mix-in class that both can inherit from and put the property
on that.

> @@ -375,35 +377,32 @@
> and checksum.
> """
>
> - diff_count = 0
> - debian_tar_count = 0
> - orig_tar_count = 0
> + file_type_counts = {
> + SourcePackageFileType.DIFF: 0,
> + SourcePackageFileType.ORIG_TARBALL: 0,
> + SourcePackageFileType.DEBIAN_TARBALL: 0,
> + SourcePackageFileType.NATIVE_TARBALL: 0,
> + }
> component_orig_tar_counts = {}
> - native_tar_count = 0
> -
> bzip2_count = 0
> -

*Loads* better!

> files_missing = False
> +
> for sub_dsc_file in self.files:
> - filetype = determine_source_file_type(sub_dsc_file.filename)
> -
> - if filetype == SourcePackageFileType.DIFF:
> - diff_count += 1
> - elif filetype == SourcePackageFileType.ORIG_TARBALL:
> - orig_tar_count += 1
> - elif filetype == SourcePackageFileType.COMPONENT_ORIG_TARBALL:
> + file_type = determine_source_file_type(sub_dsc_file.filename)
> +
> + if file_type is None:
> + yield UploadError('Unknown file: ' + sub_dsc_file.filename)
> + continue
> +
> + if file_type == SourcePackageFileType.COMPONENT_ORIG_TARBALL:
> + # Split the count by component name.
> component = re_is_component_orig_tar_ext.match(
> get_source_file_extension(sub_dsc_file.filename)).group(1)
> if component not in component_orig_tar_counts:
> component_orig_tar_counts[component] = 0
> component_orig_tar_counts[component] += 1
> - elif filetype == SourcePackageFileType.DEBIAN_TARBALL:
> - debian_tar_count += 1
> - elif filetype == SourcePackageFileType.NATIVE_TARBALL:
> - native_tar_count += 1
> else:
> - yield UploadError('Unknown file: ' + sub_dsc_file.filename)
> - continue
> + file_type_counts[file_type] += 1

This is one of the best bits of crappy code cleaned up I've seen in years :)

> +def check_format_1_0_files(filename, file_type_counts, component_counts,
> + bzip2_count):
> + """Check that the given counts of each file type suit format 1.0.
> +
> + A 1.0 source must be native (with only one tar.gz), or have an orig.tar.gz
> + and a diff.gz. It cannot use bzip2 compression.
> + """
> + if bzip2_count > 0:
> + yield UploadError(
> + "%s: is format 1.0 but uses bzip2 compression."
> + % filename)
> +
> + if file_type_counts not in ({
> + SourcePackageFileType.NATIVE_TARBALL: 1,
> + SourcePackageFileType.ORIG_TARBALL: 0,
> + SourcePackageFileType.DEBIAN_TARBALL: 0,
> + SourcePackageFileType.DIFF: 0,
> + }, {
> + SourcePackageFileType.ORIG_TARBALL: 1,
> + SourcePackageFileType.DIFF: 1,
> + SourcePackageFileType.NATIVE_TARBALL: 0,
> + SourcePackageFileType.DEBIAN_TARBALL: 0,
> + }) or len(component_counts) > 0:

This is a little ugly. Can you do something like:

    valid_file_type_counts = [
        {
            SourcePackageFileType.NATIVE_TARBALL: 1,
            ...
        },
        {
            ...
        },
    ]

    if file_type_counts not in valid_file_type_counts ...

And the same for the other check funcs as well.

[snip]

Otherwise a great change!

Let me see these few changes and then we'll get this puppy landed.

Cheers
J

review: Needs Fixing (code)

« Back to merge proposal