Performance Improvements in RyuJIT in .NET Core and .NET Framework

by Joseph Tremoulet [MSFT] 

RyuJIT is the just-in-time compiler used by .NET Core on x64 and now x86 and by the .NET Framework on x64 to compile MSIL bytecode to native machine code when a managed assembly executes. I’d like to point out some of the past year’s improvements that have gone into RyuJIT, and how they make the generated code faster.

What follows is by no means a comprehensive list of RyuJIT optimization improvements, but rather a few hand-picked examples that should make for a fun read and point to some of the issues and pull requests on GitHub that highlight the great community interactions and contributions that have helped shape this work. Be sure to also check out Stephen Toub’s recent post about performance improvements in the runtime and base class libraries, if you haven’t already.

This post will be comparing the performance of RyuJIT in .NET Framework 4.6.2 to its performance in .NET Core 2.0 and .NET Framework 4.7.1. Note that .NET Framework 4.7.1 has not yet shipped and I am using an early private build of the product. The same RyuJIT compiler sources are shared between .NET Core and .NET Framework, so the compiler changes discussed here are present in both .NET Core 2.0 and .NET Framework 4.7.1 builds.

NOTE: Code examples included in this post use manual Stopwatch invocations, with arbitrarily fixed iteration counts and no statistical analysis, as a zero-dependency way to corroborate known large performance deltas. The timings quoted below were collected on the same machine, with compared runs executed back-to-back, but even so it would be ill-advised to extrapolate quantitative results; they serve only to confirm that the optimizations improve the performance of the targeted code sequences rather than degrade it. Active performance work, of course, demands real benchmarking, which comes with a whole host of subtle issues that it is well worth taking a dependency to manage properly. Andrey Akinshin recently wrote a great blog post discussing this, using the code snippets from Stephen’s post as examples. He will publish a follow-on post to this one with additional benchmarks soon. Thanks Andrey!

The machine code sequence that the just-in-time compiler emits for a virtual call necessarily involves some degree of indirection, so that the correct target override method can be determined when the machine code executes. Compared to a direct call, this indirection imposes nontrivial overhead. RyuJIT can now identify that certain virtual call sites will always have one particular target override, and replace those virtual calls with direct ones. This avoids the overhead of the virtual indirection and, better still, allows inlining the callee method into the call site, eliminating call overhead entirely and giving optimizations better insight into the effects of the code. This can happen when the target object has a sealed type, or when its allocation site is immediately apparent and thus its exact type is known. This optimization was introduced to RyuJIT in dotnet/coreclr #9230; was subsequently improved by dotnet/coreclr #10192, dotnet/coreclr #10432, and dotnet/coreclr #10471; and has plenty more room for improvement.
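As a rough illustration of the two situations described above (a receiver with a sealed type, and a receiver whose allocation site is immediately visible), consider the sketch below. The types Shape, Square, and Circle and the class DevirtualizationSketch are hypothetical names made up for this example; they do not come from the runtime or from the PRs. Whether a given call is actually devirtualized depends on the runtime version and on JIT heuristics, so the comments describe when a call is a candidate, not what any particular build is guaranteed to do.

```csharp
using System;

public abstract class Shape // hypothetical base type for illustration
{
    public abstract double Area();
}

// Sealed: no type can derive from Square, so Square.Area is the only
// override a Square-typed reference can ever dispatch to.
public sealed class Square : Shape
{
    private readonly double side;
    public Square(double side) { this.side = side; }
    public override double Area() => side * side;
}

// Not sealed: a subclass could still override Area().
public class Circle : Shape
{
    private readonly double radius;
    public Circle(double radius) { this.radius = radius; }
    public override double Area() => Math.PI * radius * radius;
}

public static class DevirtualizationSketch
{
    // Candidate, case 1: the receiver's static type is sealed.
    public static double AreaOfSealed(Square s) => s.Area();

    // Candidate, case 2: the receiver is allocated right at the call,
    // so its exact type (Circle) is known even though Circle is not sealed.
    public static double AreaOfFreshCircle(double radius) => new Circle(radius).Area();

    // Not a candidate under either condition: any Shape subclass may arrive
    // here, so the call needs virtual dispatch.
    public static double AreaOfAny(Shape shape) => shape.Area();

    public static void Main()
    {
        Console.WriteLine(AreaOfSealed(new Square(3.0)));
        Console.WriteLine(AreaOfFreshCircle(1.0));
        Console.WriteLine(AreaOfAny(new Square(2.0)));
    }
}
```

All three methods return the same values whether or not the calls are devirtualized; the optimization changes only how the call is dispatched and whether the callee can be inlined.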
The PRs for the changes include some statistics (e.g. 7.3% of virtual calls in System.Private.CoreLib get devirtualized) and real-world examples (e.g. this diff in ConcurrentStack.GetEnumerator()). To see the code diff at that link, you may have to scroll past the quoted output from jit-diff, a tool we use for assessing the impact of compiler changes. It reports any code size increase as a “regression”, though in this case the increases likely come from newly enabled inlines, which are actually an improvement. Here’s a minimal example to illustrate the optimization in action:

using System;
using System.Diagnostics; // for Stopwatch
using System.Runtime.CompilerServices; // for MethodImpl

public abstract class Operation  // abstract unary integer operation
{
    public abstract int Operate(int input);
    public int OperateTwice(int input) => Operate(Operate(input)); // two virtual calls to Operate
}

public sealed class Increment : Operation // concrete, sealed operation: increment by fixed amount
{
    public readonly int Amount;
    public Increment(int amount = 1) { Amount = amount; }
    public override int Operate(int input) => input + Amount;
}

class Test  // driver class
{
    int input;   // input for test method PostDoubleIncrement
    int output;  // output for ^

    [MethodImpl(MethodImplOptions.NoInlining)] // assumption: keeps this method from being inlined into Main
    int PostDoubleIncrement(Increment inc) // Parameter type is sealed Increment class
    {
        output = inc.OperateTwice(input);  // inlining OperateTwice brings in two virtual calls to Operate
        return input;  // returns input unchanged, but virtual calls obscure the unchanged-ness
    }

    public static int Main(string[] args)
    {
        var inc = new Increment();
        var test = new Test() { input = 12 };
        while (true)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 100000000; i++)
            {
                test.input = test.output;
                test.PostDoubleIncrement(inc); // assumption: the call under test, missing from the truncated listing
            }
            // assumption: the listing breaks off inside the loop; reporting the elapsed time is a minimal completion
            Console.WriteLine(sw.Elapsed);
        }
    }
}

Method Operation.OperateTwice takes its instance parameter (this) as the abstract type Operation, and makes two virtual calls to its Operate method.
When run with the version of the RyuJIT compiler included in .NET Framework 4.6.2, OperateTwice is inlined into Test.PostDoubleIncrement, leaving PostDoubleIncrement with two virtual calls.
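At the source level, that intermediate state, and the state that devirtualization makes reachable, can be pictured with hand-written equivalents. The sketch below is not compiler output: the class InliningSketch and the methods AfterInliningOperateTwice and AfterDevirtualization are names invented for illustration, and Operation and Increment are repeated only so the block stands alone. It assumes the JIT can use the sealed Increment type of the receiver once OperateTwice has been inlined.

```csharp
using System;

public abstract class Operation
{
    public abstract int Operate(int input);
    public int OperateTwice(int input) => Operate(Operate(input));
}

public sealed class Increment : Operation
{
    public readonly int Amount;
    public Increment(int amount = 1) { Amount = amount; }
    public override int Operate(int input) => input + Amount;
}

public static class InliningSketch // hypothetical, for illustration only
{
    // Roughly what is left once OperateTwice is inlined but nothing is
    // devirtualized: inside OperateTwice the receiver is typed as Operation,
    // so both calls go through virtual dispatch.
    public static int AfterInliningOperateTwice(Increment inc, int input)
    {
        Operation op = inc;
        return op.Operate(op.Operate(input));
    }

    // Increment is sealed, so each Operate call can only reach
    // Increment.Operate. Once the calls are direct they can be inlined too,
    // leaving plain arithmetic on the Amount field.
    public static int AfterDevirtualization(Increment inc, int input)
    {
        return (input + inc.Amount) + inc.Amount;
    }

    public static void Main()
    {
        var inc = new Increment();
        Console.WriteLine(inc.OperateTwice(12));               // original form
        Console.WriteLine(AfterInliningOperateTwice(inc, 12)); // two virtual calls
        Console.WriteLine(AfterDevirtualization(inc, 12));     // no calls at all
    }
}
```

All three lines print 14, since the transformations preserve behavior; only the cost of getting to the result changes.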

