aa-logprof: doesn't handle large logs

Bug #387657 reported by Tom Metro
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
AppArmor
Fix Released
Wishlist
Unassigned
apparmor (Ubuntu)
Confirmed
Wishlist
Unassigned

Bug Description

Binary package hint: apparmor

Ubuntu 8.04.2
Package: apparmor-utils
Version: 2.1+1075-0ubuntu9.2

My first experience with AppArmor was finding a kernel log file that was full of 800 MB of AppArmor warnings from Samba and CUPS. I'm not sure what suddenly enabled AppArmor on this LTS system, but while fixing the problem I noticed....

aa-logprof, while processing an 800 MB log file, drove the load average up into the high 40's. I suspect it was trying to load the whole thing into memory (on a system with 1 GB of RAM). I'd recommend revising the architecture so that it processes the lines as it sees them.

I worked around the problem by splitting the file into a dozen 75MB chunks.

Tags: aa-tools
Changed in apparmor (Ubuntu):
status: New → Confirmed
importance: Undecided → Wishlist
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Is this still an issue on Ubuntu 14.04 or 14.10?

Changed in apparmor (Ubuntu):
status: Confirmed → Incomplete
tags: added: aa-tools
Changed in apparmor:
importance: Undecided → Wishlist
status: New → Incomplete
Revision history for this message
Christian Boltz (cboltz) wrote :

I just checked the code - see aa.py do_logprof_pass(). Shortened quote (comments removed):

    log_reader = apparmor.logparser.ReadLog(pid, filename, existing_profiles, profile_dir, log)
    log = log_reader.read_log(logmark)

    for root in log:
        handle_children('', '', root)

    for pid in sorted(profile_changes.keys()):
        set_process(pid, profile_changes[pid])

    collapse_log()

So it seems first the full log is read, then handle_children processes the log entries, set_process() changes the profiles of running processes (if they have null-XY subprofiles) and finally collapse_log() is called.

handle_children() loops over all log events, so it should be easy to change it to get one call per log entry.

handle_children() changes profile_changes at various places, so integrating set_process() causes some work. The solution is probably to change all "profile_changes[pid] = ..." to call a helper function that
- checks if profile_changes[pid] is already set and, if it is, is identical to the new value
- if there is a real change, call set_process() for that pid
- and of course include profile_changes[pid] = ...

After that, integrating collapse_log() shouldn't be too hard.

With this change, only events that cause questions for profile changes will be kept in memory.

Changed in apparmor:
status: Incomplete → Triaged
Revision history for this message
Christian Boltz (cboltz) wrote :

A good (and even easier) start would be to check for duplicates in add_to_tree() before blindly adding events to self.pid[loc_pid].

Changed in apparmor (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Christian Boltz (cboltz) wrote :

Just for the records - bzr trunk contains some patches that make aa-logprof faster *a lot* (~70% time saved) (thanks Peter!). Those patches didn't even touch the functions I mentioned above, so there's still room for improvement ;-)

Revision history for this message
Christian Boltz (cboltz) wrote :

In the meantime (actually nearly a year ago), log parsing was rewritten and now does de-duplication instantly. This should reduce memory usage a lot - my experience is that especially large lots have lots of duplication included.

I also removed some intermediate steps in the chain from logfile to aa-logprof questions, which means less intermediate variables/arrays that eat quite some memory.

The new log parsing is currently only in git master (not in 2.13.x). It will be included in the next major release (probably 3.0).

Processing / asking about the log events one by one (instead of reading the whole log first) would in theory save even more memory. However, IMHO it would make the user experience worse because it would also mean loosing the ability to sort the questions aa-logprof asks by profile and rule type.

Therefore I'm closing this bug as "mostly fixed" ;-) - if you still see aa-logprof eating too much memory, feel free to open a new bug.

Changed in apparmor:
status: Triaged → Fix Committed
Revision history for this message
intrigeri (intrigeri) wrote :

1.5 later with no feedback, let's assume the tentative fix works.

Changed in apparmor:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.