Robert Collins wrote:
> On Tue, 2009-06-02 at 20:50 +0000, John A Meinel wrote:
>> John A Meinel has proposed merging lp:~jameinel/bzr/1.16-commit-fulltext into lp:bzr.
>>
>> Requested reviews:
>> bzr-core (bzr-core)
>>
>> This branch adds a new api, VersionedFiles.add_text(). If people really want, I could change it to VF.add_chunks(), but add_text() fits what I needed, and was expedient.
>
> Is it at all possible to use insert_record_stream?
>
> I'd really like to shrink the VF surface area, not increase it.
>
> -Rob
>
Not trivially.
1) It is an incompatible api change to insert_record_stream
2) It requires setting up a FulltextContextFactory and passing in a
stream of 1 entry just to add a text, which isn't particularly nice.
3) It requires adding lots of parameters like 'nostore_sha', and
'random_id', etc, onto insert_record_stream
4) It requires rewriting the internals of
KnitVersionedFiles.insert_record_stream to *not* thunk back to
self.add_lines(chunks_to_lines(record.get_bytes_as('chunked')))
5) nostore_sha especially doesn't fit with the theology of
insert_record_stream. It is really only applicable to a single text, and
insert_record_stream is really designed around many texts. Wedging new
parameters onto a function where it doesn't really fit doesn't seem
*better*.
6) As for VF surface area, there is at least a default implementation
that simply thunks over to .add_lines() for those that don't strictly
care about memory performance. (And thus works fine for Weaves, etc.)
In theory we could try to layer it so that we had an 'ongoing' stream,
and 'yield' texts to be inserted as we find them. But that really
doesn't fit 'nostore_sha' since that also needs to be passed in, and
needs to raise an exception which breaks the stream.
Also, I thought we *wanted* commit for groupcompress to not have to do
deltas, and if we stream the texts in, we would spend a modest amount of
time getting poor compression between text files. (Note that we were
already spending that time to compute the delta index, but I have a
patch which fixes that.)
I can understand wanting to shrink the api. If you really push on it,
I'm willing to deprecate .add_lines() and write a .add_chunks() that is
meant to replace it. (since you can .add_chunks(lines) and
.add_chunks([text])) However, chunks fits slightly worse for knits,
since the Content code and annotation and deltas needs lines anyway, and
Groupcompress wants fulltexts...
So if you push hard, I'll try to find the time to do it. But this was
*much* easier.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote: add_text( ). If people really want, I could change it to VF.add_chunks(), but add_text() fits what I needed, and was expedient. record_ stream?
> On Tue, 2009-06-02 at 20:50 +0000, John A Meinel wrote:
>> John A Meinel has proposed merging lp:~jameinel/bzr/1.16-commit-fulltext into lp:bzr.
>>
>> Requested reviews:
>> bzr-core (bzr-core)
>>
>> This branch adds a new api, VersionedFiles.
>
> Is it at all possible to use insert_
>
> I'd really like to shrink the VF surface area, not increase it.
>
> -Rob
>
Not trivially.
1) It is an incompatible api change to insert_ record_ stream
2) It requires setting up a FulltextContext Factory and passing in a
stream of 1 entry just to add a text, which isn't particularly nice.
3) It requires adding lots of parameters like 'nostore_sha', and record_ stream
'random_id', etc, onto insert_
4) It requires rewriting the internals of les.insert_ record_ stream to *not* thunk back to lines(chunks_ to_lines( record. get_bytes_ as('chunked' )))
KnitVersionedFi
self.add_
5) nostore_sha especially doesn't fit with the theology of record_ stream. It is really only applicable to a single text, and record_ stream is really designed around many texts. Wedging new
insert_
insert_
parameters onto a function where it doesn't really fit doesn't seem
*better*.
6) As for VF surface area, there is at least a default implementation
that simply thunks over to .add_lines() for those that don't strictly
care about memory performance. (And thus works fine for Weaves, etc.)
In theory we could try to layer it so that we had an 'ongoing' stream,
and 'yield' texts to be inserted as we find them. But that really
doesn't fit 'nostore_sha' since that also needs to be passed in, and
needs to raise an exception which breaks the stream.
Also, I thought we *wanted* commit for groupcompress to not have to do
deltas, and if we stream the texts in, we would spend a modest amount of
time getting poor compression between text files. (Note that we were
already spending that time to compute the delta index, but I have a
patch which fixes that.)
I can understand wanting to shrink the api. If you really push on it, [text]) ) However, chunks fits slightly worse for knits,
I'm willing to deprecate .add_lines() and write a .add_chunks() that is
meant to replace it. (since you can .add_chunks(lines) and
.add_chunks(
since the Content code and annotation and deltas needs lines anyway, and
Groupcompress wants fulltexts...
So if you push hard, I'll try to find the time to do it. But this was
*much* easier.
John
=:->
-----BEGIN PGP SIGNATURE----- enigmail. mozdev. org
l5pQACgkQJdeBCY SNAAMreQCgqQEE9 8oPL7DgmZnJRDZI f4F7 lLPwRI0rUwJpy/ F8+
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAko
g2gAn3sZ91KqokO
=wAGi
-----END PGP SIGNATURE-----