The performance loop

A practical guide to profiling and benchmarking

[SimpleJob]
[MemoryDiagnoser]
public class StringJoinBenchmarks {

  [Benchmark]
  public string StringJoin() {
    return string.Join(", ", Enumerable.Range(0, 10).Select(i => i.ToString()));
  }

  [Benchmark]
  public string StringBuilder() {
    var sb = new StringBuilder();
    for (int i = 0; i < 10; i++)
    {
        sb.Append(i);
        sb.Append(", ");
    }

    return sb.ToString(0, sb.Length - 2);
  }

  [Benchmark]
  public string ValueStringBuilder() {
    var separator = new ReadOnlySpan<char>(new char[] { ',', ' ' });
    using var sb = new ValueStringBuilder(stackalloc char[30]);
    for (int i = 0; i < 10; i++)
    {
        sb.Append(i);
        sb.Append(separator);
    }

    return sb.AsSpan(0, sb.Length - 2).ToString();
  }
}

"simple"

"We were able to see Azure Compute cost reduction of up to 50% per month, on average we observed 24% monthly cost reduction after migrating to .NET 6. The reduction in cores reduced Azure spend by 24%."

Performance Aware

Bear Aware

Be curious....
Understand The Context

  • How will this code execute at scale, and what will its memory characteristics be (gut feeling)?
  • Are there simple, low-hanging fruits I can pick to speed up this code?
  • Can I move things off the hot path by restructuring my code a bit?
  • Which parts are under my control, and which are not?
  • What optimizations can I apply, and when should I stop?

The performance loop

Profile using a harness

   Improve a hot path

   Benchmark and compare

Profile improvements again

Ship to production

The performance loop

  • Profile at least CPU and memory using a profiling harness
  • Improve parts of the hot path
  • Benchmark and compare
  • Profile the improvements again with the harness and make adjustments where necessary
  • Ship, then turn your attention to other parts

NServiceBus

[Diagram: Queue, Code]

[Diagram: Queue, Message Pump, Behaviors, ..., Code]

Pipeline

ASP.NET Core Middleware

public class RequestCultureMiddleware {
    private readonly RequestDelegate _next;

    public RequestCultureMiddleware(RequestDelegate next) {
        _next = next;
    }

    public async Task InvokeAsync(HttpContext context) {
        // Do work that does something before
        await _next(context);
        // Do work that does something after
    }
}

Behaviors

public class Behavior : Behavior<IIncomingLogicalMessageContext> {
    // async is required because the body awaits next()
    public override async Task
        Invoke(IIncomingLogicalMessageContext context, Func<Task> next) {
        // Do work that does something before
        await next();
        // Do work that does something after
    }
}
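To make the "before / next / after" shape concrete, here is a minimal, self-contained sketch of how a chain of behaviors could be invoked. It is not the NServiceBus implementation: `ISketchBehavior`, `LoggingBehavior`, and `SketchPipeline` are hypothetical names invented for this illustration, and the context is reduced to a plain list of strings.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a pipeline behavior (not the NServiceBus type).
public interface ISketchBehavior
{
    Task Invoke(List<string> log, Func<Task> next);
}

// Records when it runs relative to the rest of the chain.
public sealed class LoggingBehavior : ISketchBehavior
{
    private readonly string name;

    public LoggingBehavior(string name) => this.name = name;

    public async Task Invoke(List<string> log, Func<Task> next)
    {
        log.Add($"{name}: before");
        await next();
        log.Add($"{name}: after");
    }
}

public static class SketchPipeline
{
    // Invokes the behaviors in order; each behavior decides when to call the next one.
    public static Task Invoke(IReadOnlyList<ISketchBehavior> behaviors, List<string> log)
    {
        return InvokeFrom(0);

        Task InvokeFrom(int index)
        {
            if (index == behaviors.Count)
            {
                return Task.CompletedTask; // end of the chain
            }

            // The lambda captures 'index', so a closure and a delegate are
            // allocated for every step on every invocation.
            return behaviors[index].Invoke(log, () => InvokeFrom(index + 1));
        }
    }
}
```

Invoking `SketchPipeline.Invoke` with two `LoggingBehavior` instances named A and B fills the log with "A: before", "B: before", "B: after", "A: after". The per-step closure in this naive version is the kind of hot-path allocation a memory profiler would surface.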

Profiling the pipeline

The harness: publish pipeline

var endpointConfiguration = new EndpointConfiguration("Harness");
endpointConfiguration.UseSerialization<JsonSerializer>();
var transport = endpointConfiguration.UseTransport<MsmqTransport>();
endpointConfiguration.UsePersistence<InMemoryPersistence>();

var endpointInstance = await Endpoint.Start(endpointConfiguration);

Console.WriteLine("Attach the profiler and hit <enter>.");
Console.ReadLine();

var tasks = new List<Task>(1000);
for (int i = 0; i < 1000; i++)
{
    tasks.Add(endpointInstance.Publish(new MyEvent()));
}
await Task.WhenAll(tasks);

Console.WriteLine("Publish 1000 done. Get a snapshot");
Console.ReadLine();

The harness: receive pipeline

public class MyEventHandler : IHandleMessages<MyEvent> {
    public Task Handle(MyEvent message, IMessageHandlerContext context)
    {
        return Task.CompletedTask;
    }
}

The harness

  • Compiled and executed in Release mode
  • Runs for a few seconds and keeps overhead minimal
  • Disabled Tiered JIT
    <TieredCompilation>false</TieredCompilation>
  • Emits full symbols
    <DebugType>pdbonly</DebugType>
    <DebugSymbols>true</DebugSymbols>
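
Put together, the settings above might sit in the harness project file roughly as follows. This is a sketch under assumptions: the SDK-style project layout, the `Exe` output type, and the `net8.0` target framework are placeholders to adapt to your own harness; only the three properties from the list come from the slides.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <!-- Assumption: use whichever target framework your harness needs. -->
    <TargetFramework>net8.0</TargetFramework>
    <!-- Disable tiered JIT so the profile is not skewed by re-jitting. -->
    <TieredCompilation>false</TieredCompilation>
    <!-- Emit symbols so the profiler can resolve method names. -->
    <DebugType>pdbonly</DebugType>
    <DebugSymbols>true</DebugSymbols>
  </PropertyGroup>
</Project>
```

Build and run it in Release mode, for example with `dotnet run -c Release`, before attaching the profiler.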

Memory Characteristics

[Profiler screenshots: Publish, Receive, BehaviorChain]

Context matters
You are the expert

  • MSMQ has a diminishing user base
  • Ramping up knowledge may not be feasible
  • Iterative gains on the hot path will lead to overall improvements
  • Pipeline optimizations benefit all users

CPU Characteristics

[Profiler screenshots: Publish, Receive]

Improving

Benchmarking the pipeline

Extract Code

  • Copy and paste relevant code
  • Adjust it to the bare essentials to create a controllable environment
  • Trim down to relevant behaviors
  • Replace the dependency injection container with direct creation of the relevant classes
  • Replace I/O operations with completed tasks
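
As an illustration of replacing I/O with completed tasks, the sketch below swaps a real dispatcher for a stand-in that never touches the network. The names `IMessageDispatcherSketch` and `NoOpDispatcher` are hypothetical and only serve this example; the extracted pipeline code in the talk's repository may look different.

```csharp
using System.Threading.Tasks;

// Hypothetical abstraction over the I/O at the end of the pipeline.
public interface IMessageDispatcherSketch
{
    Task Dispatch(byte[] body);
}

// Benchmark-only stand-in: no network or disk access, so the measurement
// reflects the pipeline code rather than the transport.
public sealed class NoOpDispatcher : IMessageDispatcherSketch
{
    // Lets a benchmark or test confirm the dispatcher was actually reached.
    public int Calls { get; private set; }

    public Task Dispatch(byte[] body)
    {
        Calls++;
        // Task.CompletedTask is a cached, already-completed task.
        return Task.CompletedTask;
    }
}
```

In the benchmark's setup, such a stand-in would be created directly with `new NoOpDispatcher()` instead of being resolved from a container, which keeps the environment small and controllable.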

Performance Culture

  • Get started with small steps
  • Culture change takes time
  • Make changes gradually

// Special knobs and dials
[Job]
[XyZDiagnoser]
public class Benchmark {

    // Permutations that influence your scenario
    
    [Params(...)]
    public int Parameter1 { get; set; }
    
    [Params(...)]
    public int Parameter2 { get; set; }    


    [GlobalSetup]
    public void SetUp()  {
      // Stuff that you don't want to measure
    }

    [Benchmark(Baseline = true)]
    public void Before() {
        // Your code before the changes
    }

    [Benchmark]
    public void After() {
        // Your code after the changes
    }
}

Unlike a unit test

  • Measures a distribution of values

| Method |     Mean |    Error |   StdDev | Ratio | Param |
|------- |---------:|---------:|---------:|------:|------:|
| Before | 51.57 us | 0.311 us | 0.291 us |  1.00 | Value |
| After  | 21.91 us | 0.138 us | 0.129 us |  0.42 | Value |

  • Executed until results are stable, potentially hundreds or thousands of times
  • Takes minutes or hours
  • Focuses on the most common cases of frequently used code (hot path) with the required number of permutations
  • Cases should be derived from production usage

[ShortRunJob]
[MemoryDiagnoser]
public class PipelineExecution {

    [Params(10, 20, 40)]
    public int PipelineDepth { get; set; }


    [GlobalSetup]
    public void SetUp()  {
        behaviorContext = new BehaviorContext();

        pipelineModificationsBeforeOptimizations = new PipelineModifications();
        for (int i = 0; i < PipelineDepth; i++)
        {
            pipelineModificationsBeforeOptimizations.Additions.Add(RegisterStep.Create(i.ToString(),
                typeof(BaseLineBehavior), i.ToString(), b => new BaseLineBehavior()));
        }

        pipelineModificationsAfterOptimizations = new PipelineModifications();
        for (int i = 0; i < PipelineDepth; i++)
        {
            pipelineModificationsAfterOptimizations.Additions.Add(RegisterStep.Create(i.ToString(),
                typeof(BehaviorOptimization), i.ToString(), b => new BehaviorOptimization()));
        }

        pipelineBeforeOptimizations = new BaseLinePipeline<IBehaviorContext>(null, new SettingsHolder(),
            pipelineModificationsBeforeOptimizations);
        pipelineAfterOptimizations = new PipelineOptimization<IBehaviorContext>(null, new SettingsHolder(),
            pipelineModificationsAfterOptimizations);
    }

    [Benchmark(Baseline = true)]
    public async Task Before() {
        await pipelineBeforeOptimizations.Invoke(behaviorContext);
    }

    [Benchmark]
    public async Task After() {
        await pipelineAfterOptimizations.Invoke(behaviorContext);
    }
}

| Method   | Calls  | Depth | Mean       | Error      | StdDev     | Ratio | RatioSD | Gen 0       | Allocated      |
|----------|--------|-------|------------|------------|------------|-------|---------|-------------|----------------|
| Before   | 20000  | 10    | 7.083 ms   | 3.1550 ms  | 0.1729 ms  | 1.00  | 0.00    | 3054.6875   | 19,200,023 B   |
| After    | 20000  | 10    | 1.588 ms   | 1.1607 ms  | 0.0636 ms  | 0.22  | 0.01    | -           | 1 B            |
| Before   | 20000  | 20    | 10.989 ms  | 9.0910 ms  | 0.4983 ms  | 1.00  | 0.00    | 6109.3750   | 38,400,049 B   |
| After    | 20000  | 20    | 2.830 ms   | 2.4414 ms  | 0.1338 ms  | 0.26  | 0.00    | -           | 2 B            |
| Before   | 20000  | 40    | 23.054 ms  | 11.1449 ms | 0.6109 ms  | 1.00  | 0.00    | 12218.7500  | 76,800,012 B   |
| After    | 20000  | 40    | 5.192 ms   | 4.4372 ms  | 0.2432 ms  | 0.23  | 0.02    | -           | 3 B            |
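
A rough reading of the table above, as a worked calculation. It rests on two assumptions the table does not state explicitly: that the Allocated column is the total for all 20,000 calls in a row, and that Depth is the number of behaviors in the pipeline.

```latex
\begin{align*}
\text{Depth 10:}\quad & \frac{19{,}200{,}023\ \text{B}}{20{,}000\ \text{calls}} \approx 960\ \text{B per call}
  \;\Rightarrow\; \frac{960}{10} = 96\ \text{B per behavior} \\
\text{Depth 20:}\quad & \frac{38{,}400{,}049\ \text{B}}{20{,}000\ \text{calls}} \approx 1{,}920\ \text{B per call}
  \;\Rightarrow\; \frac{1{,}920}{20} = 96\ \text{B per behavior} \\
\text{Depth 40:}\quad & \frac{76{,}800{,}012\ \text{B}}{20{,}000\ \text{calls}} \approx 3{,}840\ \text{B per call}
  \;\Rightarrow\; \frac{3{,}840}{40} = 96\ \text{B per behavior} \\
\text{Speedup at depth 10:}\quad & \frac{7.083\ \text{ms}}{1.588\ \text{ms}} \approx 4.5\times
  \quad (\text{Ratio} = 1.588 / 7.083 \approx 0.22)
\end{align*}
```

Under those assumptions, allocations before the optimization grow linearly with pipeline depth at about 96 B per behavior per call, while the optimized pipeline allocates essentially nothing.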

Best practices

  • Follows the Single Responsibility Principle
  • Has no side effects
  • Prevents dead code elimination
  • Delegates heavy lifting to the framework
  • Is explicit
    • No implicit casting
    • Explicit types where necessary
  • Avoid running any other resource-heavy processes while benchmarking

Benchmarking is really hard

BenchmarkDotNet will protect you from the common pitfalls because it does all the dirty work for you.

[ShortRunJob]
[MemoryDiagnoser]
public class Step1_PipelineWarmup {
    // rest almost the same

    [Benchmark(Baseline = true)]
    public BaseLinePipeline<IBehaviorContext> Before() {
        var pipelineBeforeOptimizations = new BaseLinePipeline<IBehaviorContext>(null, new SettingsHolder(),
            pipelineModificationsBeforeOptimizations);
        return pipelineBeforeOptimizations;
    }

    [Benchmark]
    public PipelineOptimization<IBehaviorContext> After() {
        var pipelineAfterOptimizations = new PipelineOptimization<IBehaviorContext>(null, new SettingsHolder(),
            pipelineModificationsAfterOptimizations);
        return pipelineAfterOptimizations;
    }
}

[ShortRunJob]
[MemoryDiagnoser]
public class Step2_PipelineException {
    [GlobalSetup]
    public void SetUp() {
        ...
        var stepId = PipelineDepth + 1;
        pipelineModificationsBeforeOptimizations.Additions.Add(RegisterStep.Create(stepId.ToString(), typeof(Throwing), "1", b => new Throwing()));

        ...
        pipelineModificationsAfterOptimizations.Additions.Add(RegisterStep.Create(stepId.ToString(), typeof(Throwing), "1", b => new Throwing()));

        pipelineBeforeOptimizations = new Step1.PipelineOptimization<IBehaviorContext>(null, new SettingsHolder(),
            pipelineModificationsBeforeOptimizations);
        pipelineAfterOptimizations = new PipelineOptimization<IBehaviorContext>(null, new SettingsHolder(),
            pipelineModificationsAfterOptimizations);
    }

    [Benchmark(Baseline = true)]
    public async Task Before() {
        try
        {
            await pipelineBeforeOptimizations.Invoke(behaviorContext).ConfigureAwait(false);
        }
        catch (InvalidOperationException)
        {
        }
    }

    [Benchmark]
    public async Task After() {
        try
        {
            await pipelineAfterOptimizations.Invoke(behaviorContext).ConfigureAwait(false);
        }
        catch (InvalidOperationException)
        {
        }
    }
    
    class Throwing : Behavior<IBehaviorContext> {
        public override Task Invoke(IBehaviorContext context, Func<Task> next)
        {
            throw new InvalidOperationException();
        }
    }
}

Profiling the pipeline (Again)

Memory Characteristics

[Profiler screenshots, before and after: Publish, Receive]

oh look, there is nothing 😌

CPU Characteristics

[Profiler screenshots, before and after: Publish, Receive]

Getting lower on the stack

[Diagram: NServiceBus Pipeline, NServiceBus Transport, MSMQ]

[Diagram: NServiceBus Pipeline, NServiceBus Transport, Azure.Messaging.ServiceBus, Microsoft.Azure.Amqp]

The harness

await using var serviceBusClient = new ServiceBusClient(connectionString);

await using var sender = serviceBusClient.CreateSender(destination);
var messages = new List<ServiceBusMessage>(1000);
for (int i = 0; i < 1000; i++) {
    messages.Add(new ServiceBusMessage(UTF8.GetBytes($"Deep Dive {i} Deep Dive {i} Deep Dive {i} Deep Dive {i} Deep Dive {i} Deep Dive {i}")));

    if (i % 100 == 0) {
        await sender.SendMessagesAsync(messages);
        messages.Clear();
    }
}

await sender.SendMessagesAsync(messages);

WriteLine("Messages sent");
Console.WriteLine("Take snapshot");
Console.ReadLine();

var countDownEvent = new CountdownEvent(1000);

var processorOptions = new ServiceBusProcessorOptions {
    AutoCompleteMessages = true,
    MaxConcurrentCalls = 100,
    MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(10),
    ReceiveMode = ServiceBusReceiveMode.PeekLock,
};

await using var receiver = serviceBusClient.CreateProcessor(destination, processorOptions);
receiver.ProcessMessageAsync += async messageEventArgs => {
    var message = messageEventArgs.Message;
    await Out.WriteLineAsync(
        $"Received message with '{message.MessageId}' and content '{UTF8.GetString(message.Body)}' / binary {message.Body}");
    countDownEvent.Signal();
};
// rest omitted
await receiver.StartProcessingAsync();

countDownEvent.Wait();

Console.WriteLine("Take snapshot");
Console.ReadLine();

await receiver.StopProcessingAsync();

Memory Characteristics

[Profiler screenshots]

Preventing regressions

C:\Projects\performance\src\benchmarks\micro> dotnet run -c Release -f net8.0 \
    --artifacts "C:\results\before"
C:\Projects\performance\src\benchmarks\micro> dotnet run -c Release -f net8.0 \
    --artifacts "C:\results\after"
C:\Projects\performance\src\tools\ResultsComparer> dotnet run --base "C:\results\before" --diff "C:\results\after" --threshold 2%

"CPU-bound benchmarks are much more stable than Memory/Disk-bound benchmarks, but the average performance levels still can be up to three times different across builds."

The performance loop

A practical guide to profiling and benchmarking

github.com/danielmarbach/BeyondSimpleBenchmarks

  • Use the performance loop to improve your code where it matters
  • Combine it with profiling to observe how the small changes add up
  • Optimize until you hit the point of diminishing returns
  • You'll learn a ton about potential improvements for a new design

Profile

Improve

Benchmark

Profile

Ship