UberProf performance improvements, or: when O(N^3 + N) is not fast enough

While working on improving the performance of the profiler, I got really annoyed. The UI times for updating sessions with a large number of statements were simply unacceptable. We had already virtualized the list UI, so it couldn’t be that. I started tracking it down, and it finally came down to the following piece of code:

protected override void Update(IEnumerable<IStatementSnapshot> snapshots)
{
    foreach (var statementSnapshot in snapshots)
    {
        var found = (from model in unfilteredStatements
                     where model.Id == statementSnapshot.StatementId
                     select model).FirstOrDefault();

        if (found == null)
        {
            unfilteredStatements.Add(
                new StatementModel(applicationModel, statementSnapshot) {CanJumpToSession = false});
        }
        else
        {
            found.UpdateWith(statementSnapshot);
        }
    }

    if (snapshots.Any())
    {
        hasPolledForStatements = true;
        StatementsChanged(this, 
            new StatementsChangedEventArgs {HasPolledForStatments = hasPolledForStatements});
    }
}

This looks pretty innocent, right? At a guess, what are the performance characteristics of this code?

It isn’t O(N), that is for sure. Look at the LINQ query: it performs a linear search through unfilteredStatements, which is an O(N) operation, and it does so once for every snapshot. *
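To make the cost of that scan concrete, here is a minimal sketch that counts how many Id comparisons the "search, then add if missing" pattern performs. The Model class, the int ids, and the CountComparisons name are stand-ins I made up for illustration; they are not the profiler's real types.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LinearUpsertDemo
{
    // Hypothetical stand-in for StatementModel; only the Id matters here.
    public sealed class Model
    {
        public int Id { get; set; }
    }

    // Adds `count` models with distinct ids using the same
    // "linear search, then add if missing" pattern as Update,
    // and returns how many Id comparisons were executed.
    public static long CountComparisons(int count)
    {
        var models = new List<Model>();
        long comparisons = 0;

        for (var id = 0; id < count; id++)
        {
            var found = models.FirstOrDefault(m =>
            {
                comparisons++;
                return m.Id == id;
            });

            if (found == null)
                models.Add(new Model { Id = id });
        }

        return comparisons;
    }
}
```

With every id being new, the i-th insert scans the i models already in the list, so N inserts cost N(N-1)/2 comparisons: CountComparisons(1000) returns 499,500, and ten times the input costs roughly a hundred times the work.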

At first glance, it looks like O(N*N), right? Wrong!

unfilteredStatements is an ObservableCollection, and guess what is going on in the CollectionChanged event? A statistics function that does yet another O(N) operation for every single added statement.
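The same kind of counting shows what a per-change handler costs. This is only a sketch: the handler below is a made-up stand-in for the statistics function (it just walks the collection), and CountHandlerWork is a name I invented for the example.

```csharp
using System.Collections.ObjectModel;

public static class CollectionChangedDemo
{
    // Adds `count` items to an ObservableCollection whose CollectionChanged
    // handler walks the entire collection, and returns the total number of
    // items the handler visited across all notifications.
    public static long CountHandlerWork(int count)
    {
        var items = new ObservableCollection<int>();
        long visited = 0;

        // Stand-in for the statistics function: an O(N) pass on every change.
        items.CollectionChanged += (sender, args) =>
        {
            foreach (var item in items)
                visited++;
        };

        for (var i = 0; i < count; i++)
            items.Add(i); // raises CollectionChanged once per Add

        return visited;
    }
}
```

The event is raised after each item is inserted, so the handler sees 1, 2, ..., N items in turn, for a total of N(N+1)/2 visits. CountHandlerWork(1000) returns 500,500, all of it hidden behind an innocent looking Add call.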

Finally, we have the StatementsChanged handler, which also performs an O(N) operation, but only once.

Here is the code after the fix:

protected override void Update(IEnumerable<IStatementSnapshot> snapshots)
{
    var hasStatements = false;
    foreach (var statementSnapshot in snapshots)
    {
        IStatementModel found;
        statementsById.TryGetValue(statementSnapshot.StatementId, out found);
        hasStatements = true;

        if (found == null)
        {
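            // Hold on to the new model so the same instance goes into both
            // the list and the lookup (found is null on this branch).
            var model = new StatementModel(applicationModel, statementSnapshot) {CanJumpToSession = false};
            unfilteredStatements.Add(model);
            statementsById.Add(statementSnapshot.StatementId, model);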
        }
        else
        {
            found.UpdateWith(statementSnapshot);
        }
    }

    if (hasStatements == false) 
        return;
    HandleStatementsOrFiltersChanging(true);
}

The changes here are subtle. First, we killed off the O(N) LINQ query in favor of an O(1) lookup in a hash table. Second, unfilteredStatements is now a simple List<T>, and HandleStatementsOrFiltersChanging is the one responsible for notifications. Just these two changes were sufficient to take us from the O(N^3+N) that we had before to a simple O(N+N) (because we still run some statistical stuff at the end), which collapses to a simple O(N).
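Stripped of the profiler specifics, the pattern is a plain list for ordering, a dictionary for lookups by id, and one notification per batch. Here is a minimal, self-contained sketch of it. IndexedModels, Model, Updates, and NotificationCount are all names I made up for the example, and the counter stands in for whatever the real notification does.

```csharp
using System.Collections.Generic;

public sealed class IndexedModels
{
    // Hypothetical stand-in for StatementModel.
    public sealed class Model
    {
        public int Id { get; set; }
        public int Updates { get; set; }
    }

    private readonly List<Model> models = new List<Model>();
    private readonly Dictionary<int, Model> modelsById = new Dictionary<int, Model>();

    // Counts batch notifications; stands in for raising a single change event.
    public int NotificationCount { get; private set; }

    public int Count
    {
        get { return models.Count; }
    }

    public int UpdatesFor(int id)
    {
        return modelsById[id].Updates;
    }

    public void Update(IEnumerable<int> ids)
    {
        var changed = false;

        foreach (var id in ids)
        {
            changed = true;

            Model found;
            if (modelsById.TryGetValue(id, out found))
            {
                // O(1) lookup replaces the linear scan.
                found.Updates++;
                continue;
            }

            // The same instance must go into both the list and the index.
            var model = new Model { Id = id };
            models.Add(model);
            modelsById.Add(id, model);
        }

        // Notify once per batch instead of once per added item.
        if (changed)
            NotificationCount++;
    }
}
```

Calling Update(new[] { 1, 2, 1 }) on a fresh instance leaves Count at 2, UpdatesFor(1) at 1, and NotificationCount at 1, while an empty batch triggers no notification at all. The price is keeping two collections in sync, which is exactly where it is easy to slip and store the wrong reference in the index.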

Once that is done, showing sessions with 50,000 statements in them became instantaneous.

* Yes, it is supposed to be O(M), because unfilteredStatements.Count is different from snapshots.Count, but I am simplifying here to make things easier.