Code review comment for lp:~danilo/launchpad/bug-408718

Revision history for this message
Данило Шеган (danilo) wrote :

= Bug #408717 =

Note: this is a cherrypick candidate so is kept as simple as possible.
It's critical for Launchpad Translations (poimport is blocked on this).

We had super-fast-imports query (which is a huge optimization for our
imports, resulting in 50x faster imports on average) cause Postgres to
eat up diskspace with repeated requests for more temp space. That
happened a month ago (when bug #408717 was filed and considered a
one-off), and it happened again on Friday morning. We all suspect a
Postgres bug, but that doesn't help us out.

However, since this is only an optimization, we can avoid this query
when we do hit a problem. So, how do we know when we'll hit a problem?

Simple: a timeout. 5 minute timeout (300s) should be enough to handle
all real query runs which do not exhibit the bad behaviour. If this
query takes more than 5 minutes, it's probably hitting a PG bug, so
let's kill it. Note that PG was completely operational when this
happened, so it should be able to fulfill this as well.

Note that exactly the same query that failed when it was ran manually
caused no issues: so, it's not a particular query that fails, it's a
particular set of circumstances that's out of our control (i.e.
something like a PG bug) that make the query fail.

= Pre-implementation notes =

Discussed this with Stuart: he thinks it's a great idea :)

He advised me about what I need to do to set the statement timeout and
how to best test it.

= Proposed fix =

So, introduce a statement timeout right around the potentially breaking
query. If it doesn't complete in the allotted time, kill it and go the
slow route for this particular import.

poimport.statement_timeout option allows one to set it to 'timeout'
which will trigger the timeout on purpose. This is used in a test.

= Tests =

bin/test -vvt poimport -t superfastimports

= Demo & QA =

Test poimport thoroughly.

« Back to merge proposal