libpqxx:no-unquoted-backslash

Last commit made on 2022-10-26
Get this branch:
git clone -b no-unquoted-backslash https://git.launchpad.net/libpqxx


Branch information

Name: no-unquoted-backslash
Repository: lp:libpqxx

Recent commits

389c78e... by Jeroen T. Vermeulen

Backslashes do not occur in unquoted strings.

Removes a little wart in the array/composite/range field parser.
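Why the wart can go: PostgreSQL's array and composite output puts any element that contains a backslash in double quotes. So a scanner for an unquoted element never has to deal with escapes and can simply stop at the next terminator. This is a minimal C++ sketch; `scan_unquoted` and its parameters are hypothetical, not part of libpqxx.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not libpqxx's API). Returns the offset of the first
// terminator at or after `pos`, i.e. the end of the unquoted element that
// starts at `pos` (or text.size() if there is no terminator). Because the
// backend quotes any element containing a backslash, an unquoted element
// needs no escape handling. `terminators` would be e.g. ",}" for an array
// of a comma-separated type, or ",)" for a composite value.
inline std::size_t
scan_unquoted(std::string_view text, std::size_t pos, std::string_view terminators)
{
  assert(pos <= text.size());
  auto const end{text.find_first_of(terminators, pos)};
  return (end == std::string_view::npos) ? text.size() : end;
}
```

For `{abc,def}`, scanning from offset 1 stops at the comma at offset 4.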

4565895... by Jeroen Vermeulen

Overhaul array parser. Comma-separated types only. (#606)

The `array_parser` was seriously broken (#590): SQL arrays may contain elements that have a semicolon in them... _and the back-end won't put them in quotes._ The parser would always see that semicolon as a field separator. The same goes for commas in, e.g., the SQL "box" type, which uses the semicolon as its array separator but uses commas inside each value.

So I'm limiting `array_parser` to comma-separated types only, and planning a better, friendlier, faster, more flexible API for parsing arrays.

At the same time, I did manage to specialise `array_parser` internally for different encodings, which should make it considerably faster. These changes will also benefit the future array parsing API.
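The failure mode is easy to see with a toy splitter that takes the element type's separator as a parameter. This is a hypothetical sketch, not libpqxx's `array_parser`: it handles only one-dimensional arrays, keeps double quotes and escapes in its output, and ignores `NULL`. In PostgreSQL, `box` is the built-in type whose array separator is `;`, while a text array uses `,` and leaves an element such as `a;b` unquoted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch (not libpqxx's array_parser): split a one-dimensional
// SQL array literal into raw element texts, using the element type's own
// separator `sep`. A double-quoted element may contain `sep`; inside double
// quotes, a backslash escapes the next character. Quotes and escapes are
// kept in the output; NULL and nested arrays are not handled.
inline std::vector<std::string> split_array(std::string_view text, char sep)
{
  assert(text.size() >= 2 && text.front() == '{' && text.back() == '}');
  std::string_view const body{text.substr(1, text.size() - 2)};
  std::vector<std::string> elements;
  if (body.empty()) return elements;  // "{}" is an empty array.

  std::string current;
  bool in_quotes{false};
  for (std::size_t i{0}; i < body.size(); ++i)
  {
    char const c{body[i]};
    if (in_quotes && c == '\\' && i + 1 < body.size())
    {
      // Copy the backslash and the character it escapes, so that an
      // escaped quote does not end the quoted element.
      current += c;
      current += body[++i];
    }
    else if (c == sep && !in_quotes)
    {
      elements.push_back(current);  // End of one element.
      current.clear();
    }
    else
    {
      if (c == '"') in_quotes = !in_quotes;
      current += c;
    }
  }
  elements.push_back(current);
  return elements;
}
```

Splitting `{a;b,c}` on `,` yields `a;b` and `c`; treating `;` as the separator instead yields `a` and `b,c`, which is the kind of misparse described above. For a `box` array such as `{(3,4),(1,2);(6,7),(5,5)}`, only `;` gives the right split, which is why a parser has to know the element type's separator.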

b779194... by Jeroen Vermeulen

Make more functions `noexcept`. (#605)

509cf38... by Jeroen T. Vermeulen

Document.

ec1e30d... by Jeroen Vermeulen

Bump minimum libpq/postgres to version 10. (#604)

The PostgreSQL project no longer supports postgres 9, so I'm dropping
it from the libpqxx support list as well.

The only consequence code-wise is that libpqxx will now assume that
libpq has a `PQencryptPasswordConn()`.

It might actually be nice to have a compile-time check for an acceptable
libpq version. But that's a separate project.

9bf8be5... by Jeroen T. Vermeulen

Remove an unjustified `constexpr`.

e809439... by Jeroen Vermeulen

Speed up encoding handling in streams. (#601)

The optimisations are:
1. Inline the glyph scanning function into the search loop.
2. For "ASCII-safe" encodings, use the "monobyte" search loop.

The inlining optimisation works as follows. Previously the stream class
kept a pointer to a function that figures out glyph boundaries (the byte
where the next character begins in a byte string). It looked up the
function specialised for the current kind of encoding: UTF-8, GBK, SJIS,
etc., or "monobyte" for single-byte encodings. In libpqxx I call those
functions _glyph scanners._ But this way of working was painfully slow:
the stream called that function pointer for every single character it
tried to read. Here, I rewrite the loop to use a different specialised
function pointer, which works at a higher level: "Find any one of these
special characters." That means that the inner loop is now inside that
function, not on the outside calling in. This gives the compiler more of
a chance to optimise the loop, e.g. by inlining the glyph scanner.
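A minimal C++ sketch of that restructuring, under stated assumptions: the scanner signature and the names `find_special_slow`, `find_special`, `next_monobyte`, and `next_utf8` are hypothetical, not libpqxx's internals, and the UTF-8 scanner assumes valid input. The first search function calls through a pointer once per character. The second owns the loop and takes the scanner as a template argument, so the compiler can inline it.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical glyph scanner signature: given a buffer, its size, and the
// offset where a character starts, return where the next character starts.
using glyph_scanner = std::size_t(char const *, std::size_t, std::size_t);

// Monobyte scanner: every character is exactly one byte.
inline std::size_t next_monobyte(char const *, std::size_t, std::size_t start)
{
  return start + 1;
}

// Simplified UTF-8 scanner, assuming valid input: the lead byte encodes
// the length of the character.
inline std::size_t next_utf8(char const *buffer, std::size_t size, std::size_t start)
{
  assert(start < size);
  auto const b{static_cast<unsigned char>(buffer[start])};
  std::size_t const len{(b < 0x80) ? 1u : (b < 0xE0) ? 2u : (b < 0xF0) ? 3u : 4u};
  return (start + len < size) ? start + len : size;
}

// Before: the search loop makes an indirect call for every character.
// The special characters are assumed to be ASCII, so the lead byte of a
// multibyte character never matches one of them.
inline std::size_t find_special_slow(
  glyph_scanner *scan, std::string_view text, std::string_view specials)
{
  for (std::size_t here{0}; here < text.size();
       here = scan(text.data(), text.size(), here))
    if (specials.find(text[here]) != std::string_view::npos) return here;
  return text.size();
}

// After: one function per encoding owns the loop. The scanner is a
// template argument, so the compiler can inline it into the loop body.
template<glyph_scanner *SCAN>
inline std::size_t find_special(std::string_view text, std::string_view specials)
{
  for (std::size_t here{0}; here < text.size();
       here = SCAN(text.data(), text.size(), here))
    if (specials.find(text[here]) != std::string_view::npos) return here;
  return text.size();
}
```

In this sketch a stream would hold a pointer to, say, `find_special<next_utf8>` rather than a pointer to the scanner itself: still one indirect call, but per search instead of per character.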

The other change is based on the fact that many encodings have two
basic kinds of characters: ASCII ones, which are single bytes in the
0..127 range, and non-ASCII ones, every byte of which is in the 128..255
range (i.e. has its high bit set to 1). That means that we can never
have the "SJIS" situation where an ASCII byte value (such as that of a
backslash character) can also occur _inside_ a multibyte character.
When we know we're in an encoding where that can't ever happen (and
UTF-8 is one of those!) then we don't need that encoding's glyph
scanner for this search at all. We can just use the simpler "monobyte"
glyph scanner, which always returns `offset + 1`.
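A small C++ illustration of the difference, assuming valid input and a simplified reading of Shift-JIS lead bytes (0x81..0x9F and 0xE0..0xFC); the function names are hypothetical. In Shift-JIS the katakana ソ is encoded as 0x83 0x5C, and 0x5C is the ASCII backslash, so a byte-by-byte search reports a false match. In UTF-8 the same character is 0xE3 0x82 0xBD, with no byte below 0x80, so the byte-by-byte search is safe.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Monobyte search: treat every byte as a character. Safe for single-byte
// encodings and for "ASCII-safe" multibyte encodings such as UTF-8, where
// every byte of a multibyte character has its high bit set.
inline std::size_t find_byte_monobyte(std::string_view text, char c)
{
  for (std::size_t i{0}; i < text.size(); ++i)
    if (text[i] == c) return i;
  return text.size();
}

// Simplified Shift-JIS search (assumes valid input): a lead byte in
// 0x81..0x9F or 0xE0..0xFC starts a two-byte character, whose trail byte
// may have an ASCII value such as 0x5C (backslash).
inline std::size_t find_byte_sjis(std::string_view text, char c)
{
  std::size_t i{0};
  while (i < text.size())
  {
    auto const b{static_cast<unsigned char>(text[i])};
    bool const lead{(b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC)};
    if (lead)
    {
      i += 2;  // Skip the whole two-byte character, trail byte included.
      continue;
    }
    if (text[i] == c) return i;
    ++i;
  }
  return text.size();
}
```

On the Shift-JIS bytes for "ソa\", the monobyte search finds a "backslash" at offset 1 (the trail byte of ソ), while the encoding-aware search correctly finds it at offset 3. On the UTF-8 bytes for the same text, the monobyte search correctly finds it at offset 4.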

Neither of these optimisations is particularly powerful on its own.
Inlining UTF-8 scanning (for instance) will probably be a bit faster
than calling through a function pointer, but it won't be a huge
difference. And calling a simpler glyph scanner won't do us much good,
especially if that just means that we'll need to call it 3 times for a
3-byte character. But the two changes work well together: once inlined,
the monobyte scanner can be as simple as an `offset++`.

Unfortunately this is an ABI-breaking change: we're replacing a
function pointer field with a pointer to a different type of function. So I'm bumping the version to 7.8.0.

1acde50... by Jeroen Vermeulen

No such thing as single-quoted array/composite elements! (#587)

It turns out that elements of arrays or of composite values are never single-quoted. If we see an element that's surrounded by single quotes, those quotes are part of the string itself.
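A minimal C++ sketch of what that means when decoding a single raw array element. `decode_element` is a hypothetical helper, not libpqxx's API; it covers array elements only (composite values also double quotes inside quoted fields) and ignores the unquoted `NULL` marker. Only double quotes delimit an element, so single quotes pass through unchanged whether or not the element is double-quoted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not libpqxx's API): decode one raw array element.
// Only double quotes delimit an element; single quotes have no special
// meaning and stay part of the value. The unquoted NULL is not handled.
inline std::string decode_element(std::string_view raw)
{
  bool const quoted{raw.size() >= 2 && raw.front() == '"' && raw.back() == '"'};
  if (!quoted) return std::string{raw};  // Unquoted: taken verbatim.

  std::string value;
  for (std::size_t i{1}; i + 1 < raw.size(); ++i)
  {
    // Inside double quotes, a backslash escapes the next character.
    if (raw[i] == '\\' && i + 2 < raw.size()) ++i;
    value += raw[i];
  }
  return value;
}
```

So an element written as `'abc'` decodes to the five-character string `'abc'`, quotes included.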

477ced2... by Jeroen T. Vermeulen

Parallel test build.

571bfc1... by Jeroen Vermeulen

Test `stream_from` scanning better. (#599)

I was too ambitious in a previous attempt. I want to replace some text
scanning with a "find single-byte character" function specialised to the
applicable encoding. To begin with, I want to do this in `stream_from`.
I then hope to inline the scanning of character boundaries in that
function, and to use "monobyte" scanning for UTF-8, since UTF-8 never
has a byte inside a multibyte character that looks as if it were an
ASCII character. Those changes should make `stream_from` tons
faster, especially for monobyte encodings and "ASCII-safe" encodings
like UTF-8.

But that's a lot of work, with lots of opportunities to mess up. So, as
a first step, I'm testing the glyph scanning in `stream_from` more
thoroughly. This'll give me more confidence as I refactor the code.