I think factoring the patterns into dict-indirected code is interesting. I'm a bit concerned about:
for pattern_type, patterns in pattern_lists.iteritems():
I'm pretty sure we explicitly intended the order 'check extensions', then 'check basename', then 'check fullpath', so that we can do checks on just a little bit of the content before we have to check against the whole content. I don't have any explicit proof of this, but replacing the current deterministic ordering with a random one doesn't seem ideal.
Otherwise the change seems fine to me. I may be over-concerned, but I do believe that 'is_ignored()' is a fair amount of overhead in the 'initial add of 30k files' case, and I'd like us to at least be aware of it. I also wonder about changing the internals so that, instead of using '(?:.*/)?(?!.*/)' to filter out everything but the basename for some patterns, we wouldn't be better off knowing that a pattern is basename-only and just supplying the basename to the match.
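To illustrate the basename-only idea, here is a minimal sketch (the `match_basename` helper and the pre-translated regex are hypothetical, not from the patch): rather than anchoring a basename pattern with the `(?:.*/)?(?!.*/)` prefix and running it against the whole path, a pattern known to be basename-only could be matched against just the final path component.

```python
import re

def match_basename(pattern_re, path):
    # Hypothetical helper: only inspect the part after the last '/',
    # avoiding a regex scan over the full path for basename-only patterns.
    basename = path.rsplit('/', 1)[-1]
    return pattern_re.match(basename) is not None

# e.g. the translation of a '*.pyc' ignore glob (assumed form)
ignore_re = re.compile(r'.*\.pyc$')
print(match_basename(ignore_re, 'src/deeply/nested/module.pyc'))  # True
print(match_basename(ignore_re, 'src/module.py'))                 # False
```

The win would be that for the common 'add 30k files' case, most paths are rejected after looking at only the basename, without the regex engine walking the directory prefix.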
I'm certainly willing to discuss it, but the easiest way to move forward is to just make the order of _add_patterns deterministic by iterating over a fixed ordering.
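A minimal sketch of that fixed-ordering suggestion (the type names and the `ordered_pattern_types` helper are assumptions for illustration, not the patch's actual identifiers): iterate over an explicit list of pattern types instead of whatever order the dict happens to yield.

```python
# Assumed pattern-type names, in the order we intended:
# extensions first, then basenames, then full paths.
PATTERN_ORDER = ['extension', 'basename', 'fullpath']

def ordered_pattern_types(pattern_lists):
    """Yield (pattern_type, patterns) pairs in a fixed, deterministic order."""
    for pattern_type in PATTERN_ORDER:
        if pattern_type in pattern_lists:
            yield pattern_type, pattern_lists[pattern_type]

lists = {'fullpath': ['./docs'], 'extension': ['*.pyc'], 'basename': ['.DS_Store']}
print([t for t, _ in ordered_pattern_types(lists)])
# ['extension', 'basename', 'fullpath']
```

The loop body stays exactly as in the patch; only the iteration source changes, so the cheap checks always run first regardless of dict hashing.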
79 -        self._add_patterns(ext_patterns, _sub_extension,
80 -            prefix=r'(?:.*/)?(?!.*/)(?:.*\.)')
81 -        self._add_patterns(base_patterns, _sub_basename,
82 -            prefix=r'(?:.*/)?(?!.*/)')
83 -        self._add_patterns(path_patterns, _sub_fullpath)
84 +            pattern_lists[Globster.identify(pat)].append(pat)
85 +        for pattern_type, patterns in pattern_lists.iteritems():
86 +            self._add_patterns(patterns,
87 +                Globster.translators[pattern_type],
88 +                Globster.prefixes[pattern_type])