lttng-modules:stable-2.1

Last commit made on 2013-07-11
Get this branch:
git clone -b stable-2.1 https://git.launchpad.net/lttng-modules

Branch merges

Branch information

Name:
stable-2.1
Repository:
lp:lttng-modules

Recent commits

4dc9488... by Mathieu Desnoyers

Version 2.1.3

Signed-off-by: Mathieu Desnoyers <email address hidden>

e6f88d6... by Mathieu Desnoyers

Fix: ring buffer: get_subbuf() checks should be performed on "consumed" parameter

This triggers lots of false-positive -EAGAIN errors in flight recorder
snapshots.

Reported-by: Julien Desfossez <email address hidden>
Signed-off-by: Mathieu Desnoyers <email address hidden>

7ae9296... by Mathieu Desnoyers

Fix: SWITCH_FLUSH new sub-buffer checks

The SWITCH_FLUSH, when performed on a completely empty sub-buffer, was
missing some checks (imported from space reservation).

Signed-off-by: Mathieu Desnoyers <email address hidden>

0348b08... by Mathieu Desnoyers

Fix: ring buffer: handle concurrent update in nested buffer wrap around check

With stress-test loads that trigger sub-buffer switch very frequently
(small 4kB sub-buffers, frequent flush), we currently observe this kind
of warnings once every few minutes:

[65335.896208] ring buffer relay-overwrite-mmap, cpu 5: records were lost. Caused by:
[65335.896208] [ 0 buffer full, 1 nest buffer wrap-around, 0 event too big ]

It appears that the check for nested buffer wrap-around does not take
into account that a concurrent execution contexts (either nested for
per-cpu buffers, or from another CPU or nested for global buffers) can
update the commit_count value concurrently.

What we really want to do with this check is to ensure that if we enter
a sub-buffer that had an unbalanced reserve/commit count, assuming there
is no hope that this gets rebalanced promptly, we detect this and drop
the current event. However, in the case where the commit counter has
been concurrently updated by another reserve or a switch, we want to
retry the entire reserve operation.

One way to detect this is to sample the reserve offset twice, around the
commit counter read, along with the appropriate memory barriers.
Therefore, we can detect if the mismatch between reserve and commit
counter is actually caused by a concurrent update, which necessarily has
updated the reserve counter.

Signed-off-by: Mathieu Desnoyers <email address hidden>

43b690e... by Mathieu Desnoyers

Fix: handle writes of length 0

lib_ring_buffer_write(), lib_ring_buffer_memset() and
lib_ring_buffer_copy_from_user_inatomic() could be passed a length of 0.
This typically has no side-effect as far as writing into the buffers is
concerned, except for one detail: in overwrite mode, there is a check to
make sure the sub-buffer can be written into. This check is performed
even if length is 0. In the case where this would fall exactly at the
end of a sub-buffer, the check would fail, because the offset would fall
exactly at the beginning of the next sub-buffer.

It triggers this warning:

[65356.890016] ------------[ cut here ]------------
[65356.890016] WARNING: at /home/compudj/git/lttng-modules/wrapper/ringbuffer/../../lib/ringbuffer/../../wrapper/ringbuffer/../../lib/ringbuffer/backend.h:110 lttng_event_write+0x118/0x140 [lttng_ring_buffer_client_mmap_overwrite]()
[65356.890016] Hardware name: X7DAL
[65356.890016] Modules linked in: lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kvm(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_jbd(O) lttng_probe_irq(O) lttng_probe_ext4(O) lttng_probe_ext3(O) lttng_probe_compaction(O) lttng_probe_btrfs(O) lttng_probe_block(O) lttng_types(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_kretprobes(O) lttng_ftrace(O) lttng_kprobes(O) lttng_statedump(O) lttng_lib_ring_buffer(O) cpufreq_ondemand loop e1000e kvm_intel kvm ptp pps_core [last unloaded: lttng_lib_ring_buffer]
[65357.287529] Pid: 0, comm: swapper/7 Tainted: G O 3.9.4-trace-test #143
[65357.309694] Call Trace:
[65357.317022] <IRQ> [<ffffffff8103a3ef>] warn_slowpath_common+0x7f/0xc0
[65357.336893] [<ffffffff8103a44a>] warn_slowpath_null+0x1a/0x20
[65357.354368] [<ffffffffa0ff17b8>] lttng_event_write+0x118/0x140 [lttng_ring_buffer_client_mmap_overwrite]
[65357.383025] [<ffffffffa100134f>] __event_probe__block_rq_with_error+0x1bf/0x220 [lttng_probe_block]
[65357.410376] [<ffffffff812ea134>] blk_update_request+0x324/0x720
[65357.428364] [<ffffffff812ea561>] blk_update_bidi_request+0x31/0x90
[65357.447136] [<ffffffff812eb68c>] blk_end_bidi_request+0x2c/0x80
[65357.465127] [<ffffffff812eb6f0>] blk_end_request+0x10/0x20
[65357.481822] [<ffffffff81406b7c>] scsi_io_completion+0x9c/0x670
[65357.499555] [<ffffffff813fe320>] scsi_finish_command+0xb0/0xe0
[65357.517283] [<ffffffff81406965>] scsi_softirq_done+0xa5/0x140
[65357.534758] [<ffffffff812f1d30>] blk_done_softirq+0x80/0xa0
[65357.551710] [<ffffffff81043b00>] __do_softirq+0xe0/0x440
[65357.567881] [<ffffffff81043ffe>] irq_exit+0x9e/0xb0
[65357.582754] [<ffffffff81026465>] smp_call_function_single_interrupt+0x35/0x40
[65357.604388] [<ffffffff8167be2f>] call_function_single_interrupt+0x6f/0x80
[65357.624976] <EOI> [<ffffffff8100ac06>] ? default_idle+0x46/0x300
[65357.643541] [<ffffffff8100ac04>] ? default_idle+0x44/0x300
[65357.660235] [<ffffffff8100b899>] cpu_idle+0x89/0xe0
[65357.675109] [<ffffffff81664911>] start_secondary+0x220/0x227

Always from an event that can write a 0-length field as last field of
its payload, and it always happen directly on a sub-buffer boundary.

While we are there, check for length 0 in lib_ring_buffer_read_cstr()
too.

Signed-off-by: Mathieu Desnoyers <email address hidden>

8dfa3bb... by Mathieu Desnoyers

Fix: ring buffer: RING_BUFFER_FLUSH ioctl buffer corruption

lib_ring_buffer_switch_slow() clearly states:

 * Note, however, that as a v_cmpxchg is used for some atomic
 * operations, this function must be called from the CPU which owns the
 * buffer for a ACTIVE flush.

But unfortunately, the RING_BUFFER_FLUSH ioctl does not follow these
important directives. Therefore, whenever the consumer daemon or session
daemon explicitly triggers a "flush" on a buffer, it can race with data
being written to the buffer, leading to corruption of the reserve/commit
counters, and therefore corruption of data in the buffer. It triggers
these warnings for overwrite mode buffers:

[65356.890016] WARNING: at
/home/compudj/git/lttng-modules/wrapper/ringbuffer/../../lib/ringbuffer/../../wrapper/ringbuffer/../../lib/ringbuffer/backend.h:110 lttng_event_write+0x118/0x140 [lttng_ring_buffer_client_mmap_overwrite]()

Which indicates that we are trying to write into a sub-buffer for which
we don't have exclusive access. It also causes the following warnings to
show up:

[65335.896208] ring buffer relay-overwrite-mmap, cpu 5: records were lost. Caused by:
[65335.896208] [ 0 buffer full, 80910 nest buffer wrap-around, 0 event too big ]

Which is caused by corrupted commit counter.

Fix this by sending an IPI to the CPU owning the flushed buffer for
per-cpu synchronization. For global synchronization, no IPI is needed,
since we allow writes from remote CPUs.

Signed-off-by: Mathieu Desnoyers <email address hidden>

21b2893... by Mathieu Desnoyers

Version 2.1.2

Signed-off-by: Mathieu Desnoyers <email address hidden>

88ec3fa... by Samuel Martin <email address hidden>

Fix build and load against linux-2.6.33.x

* lttng-event.h declared but did not implement
  lttng_add_perf_counter_to_ctx on kernel >=2.6.33, the implementation
  was in lttng-context-perf-counters.c, which was only included for
  kernel >=2.6.34. This prevented the module from being loaded.

* on kernel 2.6.33.x, lttng-context-perf-counters.c complains about
  implicit declaration for {get,put}_online_cpus and
  {,un}register_cpu_notifier; so fix header inclusion.

Signed-off-by: Samuel Martin <email address hidden>
Signed-off-by: Mathieu Desnoyers <email address hidden>

1aa9351... by Mathieu Desnoyers

fix block instrumentation probe signature mismatch for 3.9.x kernels

Was causing OOPS when tracing the block layer.

Signed-off-by: Mathieu Desnoyers <email address hidden>

94e7e92... by Mathieu Desnoyers

fix: block instrumentation: NULL pointer dereference

This backtrace appears with kernel 3.9.4:

[19311.313455] BUG: unable to handle kernel NULL pointer dereference at (null)
[19311.316589] IP: [<ffffffffa076392b>] __event_probe__block_bio+0x8b/0x1a0 [lttng_probe_block]
[19311.316589] PGD 41808a067 PUD 4190a3067 PMD 0
[19311.316589] Oops: 0000 [#1] PREEMPT SMP
[19311.316589] Modules linked in: lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lm
[19311.316589] CPU 4
[19311.316589] Pid: 1423, comm: kjournald Tainted: G O 3.9.4-trace-test #143 Supermicro X7DAL/X7DAL
[19311.316589] RIP: 0010:[<ffffffffa076392b>] [<ffffffffa076392b>] __event_probe__block_bio+0x8b/0x1a0 [lttng_probe_block]
[19311.316589] RSP: 0018:ffff88041c0d5aa8 EFLAGS: 00010246
[19311.316589] RAX: 0000000000000000 RBX: ffff88041a994c00 RCX: ffff88041b610800
[19311.316589] RDX: 0000000000000004 RSI: ffff88041c0d5b00 RDI: ffff88041c0d5ab0
[19311.316589] RBP: ffff88041c0d5b18 R08: fffffffffffc0000 R09: 0000000000000000
[19311.316589] R10: ffff88041a9947e8 R11: 0000000000000001 R12: ffff880419252c18
[19311.316589] R13: ffff88041c9e0a48 R14: ffff88041749bd10 R15: ffff88041749bd20
[19311.316589] FS: 0000000000000000(0000) GS:ffff88042fd00000(0000) knlGS:0000000000000000
[19311.316589] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[19311.316589] CR2: 0000000000000000 CR3: 0000000417a0a000 CR4: 00000000000007e0
[19311.316589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19311.316589] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[19311.316589] Process kjournald (pid: 1423, threadinfo ffff88041c0d4000, task ffff88041daa0000)
[19311.316589] Stack:
[19311.316589] 0000000000000000 ffff88041c3e4000 ffff880408987518 0000000000000024
[19311.316589] 0000000400000001 ffff88042fd19e30 000000000000002a 00000000000126c1
[19311.316589] 00000000000126bb 0000119043fa4efc ffff880400000000 0000000000000246
[19311.316589] Call Trace:
[19311.316589] [<ffffffff812e79fb>] bio_attempt_back_merge+0xfb/0x1e0
[19311.316589] [<ffffffff812ec96f>] blk_queue_bio+0x43f/0x570
[19311.316589] [<ffffffff812ea68a>] generic_make_request+0xca/0x100
[19311.316589] [<ffffffff81495bfe>] raid1_unplug+0x12e/0x170
[19311.316589] [<ffffffff812ec36c>] blk_flush_plug_list+0x9c/0x210
[19311.316589] [<ffffffff812ec4f8>] blk_finish_plug+0x18/0x50
[19311.316589] [<ffffffff81243a48>] journal_commit_transaction+0x728/0x1f30
[19311.316589] [<ffffffff8104e463>] ? try_to_del_timer_sync+0x53/0x70
[19311.316589] [<ffffffff8104e485>] ? del_timer_sync+0x5/0xd0
[19311.316589] [<ffffffff8124967d>] kjournald+0xed/0x290
[19311.316589] [<ffffffff8106a270>] ? __init_waitqueue_head+0x60/0x60
[19311.316589] [<ffffffff81249590>] ? commit_timeout+0x10/0x10
[19311.316589] [<ffffffff810695fb>] kthread+0xdb/0xe0
[19311.316589] [<ffffffff81672c60>] ? _raw_spin_unlock_irq+0x30/0x60
[19311.316589] [<ffffffff81069520>] ? __init_kthread_worker+0x70/0x70
[19311.316589] [<ffffffff8167af1c>] ret_from_fork+0x7c/0xb0
[19311.316589] [<ffffffff81069520>] ? __init_kthread_worker+0x70/0x70

Fixes #559

Signed-off-by: Mathieu Desnoyers <email address hidden>