The Machinery Behind the Magic: How Kotlin Turns suspend into State Machines

Jaewoong Eum (skydoves) | 27 min read

Kotlin Coroutines have become the standard for asynchronous programming on the JVM, offering developers a way to write sequential, readable code that can pause and resume without blocking threads. Most developers interact with coroutines through familiar APIs like launch, async, and Flow, treating suspend as a language keyword that "just works." But coroutines are not simply a library feature layered on top of the language. They are a compiler-level solution, built through the Kotlin compiler's IR lowering pipeline and bytecode generation, that transforms your sequential code into resumable state machines. The suspend keyword triggers a series of compiler transformations that rewrite your function's structure, signature, and control flow before it ever reaches the JVM.

In this article, you'll dive deep into the Kotlin compiler's coroutine machinery, exploring the six-stage transformation pipeline that converts a suspend function into a state machine. You'll trace through how the compiler injects hidden continuation parameters through CPS transformation, how it generates continuation classes with the clever sign bit trick for distinguishing fresh calls from resumptions, how the bytecode-level transformer collects suspension points and inserts a TABLESWITCH dispatch, how local variables are "spilled" into continuation fields to survive across suspension, and how tail call optimization lets the compiler skip the entire state machine when it can prove every suspension point is a tail call.

The fundamental problem: How do you make a function resumable?

Consider this suspend function:

suspend fun fetchUserData(): UserData {
    val user = fetchUser()
    val profile = fetchProfile(user.id)
    return UserData(user, profile)
}

This looks like ordinary sequential code, but both fetchUser() and fetchProfile() might perform network requests that take hundreds of milliseconds. The function must be able to pause at each call, release the thread entirely, and later resume execution at the exact point where it left off, with all local variables intact.

The JVM provides no native mechanism for this. A JVM method is a stack frame, and when a method returns, its stack frame is gone. There is no way to "freeze" a stack frame, release the thread, and later restore it. The function must return to release the thread, but returning destroys the local state.

The Kotlin compiler solves this by transforming each suspend function into a state machine. The function's body is split into segments between suspension points. Local variables are saved into fields of a continuation object before each suspension, and restored after resumption. A label field tracks which segment to execute next, and a TABLESWITCH at the function entry dispatches to the correct segment. The developer writes linear code; the compiler generates the machinery to break it apart and reassemble it on demand.
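To make the idea concrete before looking at the compiler, here is a hand-written approximation of what such a state machine could look like for fetchUserData. It is a sketch under loud assumptions: SUSPENDED, Callback, FetchUserDataState, pending, and runDemo are hypothetical names invented for illustration, results are plain strings instead of User and UserData, and none of this is the code the compiler actually emits.

```kotlin
// Hypothetical stand-ins: none of these names come from the Kotlin compiler or stdlib.
val SUSPENDED = Any()                          // sentinel: "paused, result arrives later"

fun interface Callback { fun resume(value: Any?) }

// Pending asynchronous completions, drained by the toy event loop in runDemo().
val pending = ArrayDeque<() -> Unit>()

// Suspends: schedules the answer for later and returns the sentinel.
fun fetchUser(cb: Callback): Any? {
    pending.addLast { cb.resume("alice") }
    return SUSPENDED
}

// Completes immediately (think: cache hit), so it returns the value directly.
fun fetchProfile(user: String, cb: Callback): Any? = "profile-of-$user"

class FetchUserDataState(val completion: Callback) : Callback {
    var label = 0               // which segment runs next
    var result: Any? = null     // value delivered on resume
    var user: String? = null    // "spilled" local variable

    override fun resume(value: Any?) {
        result = value
        val outcome = fetchUserData(this)          // re-enter the function
        if (outcome !== SUSPENDED) completion.resume(outcome)
    }
}

fun fetchUserData(state: FetchUserDataState): Any? {
    var r: Any? = state.result                     // on re-entry: the resumed value
    while (true) {
        when (state.label) {                       // plays the role of the TABLESWITCH
            0 -> {
                state.label = 1
                r = fetchUser(state)
                if (r === SUSPENDED) return SUSPENDED
            }
            1 -> {
                val user = r as String
                state.user = user                  // spill before the next call
                state.label = 2
                r = fetchProfile(user, state)
                if (r === SUSPENDED) return SUSPENDED
            }
            2 -> return "UserData(${state.user}, $r)"
            else -> throw IllegalStateException("unexpected label ${state.label}")
        }
    }
}

fun runDemo(): String {
    var finalValue: Any? = null
    val state = FetchUserDataState { value -> finalValue = value }
    check(fetchUserData(state) === SUSPENDED)      // first call pauses at fetchUser
    while (pending.isNotEmpty()) pending.removeFirst().invoke()
    return finalValue as String
}

fun main() {
    println(runDemo())
}
```

The first call returns the sentinel and gives the thread back; draining pending resumes the function at label 1, where fetchProfile happens to finish immediately and execution simply carries on to label 2.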

The six-stage pipeline: From suspend to state machine

The transformation happens across six distinct phases in the JVM backend. Understanding the full pipeline is essential to understanding why each phase exists and what it contributes.

  1. SuspendLambdaLowering: Converts suspend lambda expressions into anonymous continuation classes
  2. TailCallOptimizationLowering: Identifies suspend calls in tail position and marks them with IrReturn wrappers
  3. AddContinuationLowering: The central IR lowering; it generates continuation classes, injects $completion parameters, and creates static suspend implementations
  4. Code generation: Lowers IR to JVM bytecode, placing BeforeSuspendMarker/AfterSuspendMarker instructions around each suspension point
  5. CoroutineTransformerMethodVisitor: The bytecode-level state machine engine; it inserts the TABLESWITCH, spills variables, and generates resume paths
  6. Tail call optimization check: If all suspension points are tail calls, the state machine is skipped entirely

Let's trace through each phase.

CPS transformation: The invisible parameter

The foundation of coroutine compilation is Continuation Passing Style (CPS) transformation. Every suspend function, when compiled, receives a hidden additional parameter: the continuation. This continuation represents "what happens next" after the function completes or suspends.

When you write:

suspend fun fetchUser(): User {
    // ...
}

The compiler transforms the signature to:

fun fetchUser($completion: Continuation<User>?): Any?

Two changes happen. First, a $completion parameter of type Continuation is appended. Second, the return type becomes Any?, because the function can now return either the actual result or the special sentinel COROUTINE_SUSPENDED, indicating that the function has paused and will deliver its result later through the continuation.

Looking at how AddContinuationLowering performs this injection:

val continuationParameter = buildValueParameter(function) {
    kind = IrParameterKind.Regular
    name = Name.identifier(SUSPEND_FUNCTION_COMPLETION_PARAMETER_NAME) // "$completion"
    type = continuationType(context).substitute(substitutionMap)        // Continuation<RetType>?
    origin = JvmLoweredDeclarationOrigin.CONTINUATION_CLASS
}

The parameter is inserted before any default argument masks but after all regular parameters. This is invisible in source code but always present in the bytecode. Every call site of a suspend function is also rewritten to pass the current continuation as this extra argument.
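The two possible return values can be observed from ordinary Kotlin through the standard library's intrinsics. The following is a small sketch, not a description of compiler internals: cachedValue, parkedValue, parked, and observeReturnValues are hypothetical names, while startCoroutineUninterceptedOrReturn, suspendCoroutineUninterceptedOrReturn, and COROUTINE_SUSPENDED are the real declarations from kotlin.coroutines.intrinsics.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.intrinsics.COROUTINE_SUSPENDED
import kotlin.coroutines.intrinsics.startCoroutineUninterceptedOrReturn
import kotlin.coroutines.intrinsics.suspendCoroutineUninterceptedOrReturn
import kotlin.coroutines.resume

// Hypothetical demo state: the continuation captured by parkedValue().
var parked: Continuation<Int>? = null

// Never suspends: the value comes back as the direct return value.
suspend fun cachedValue(): Int = 42

// Always suspends: stores the continuation and returns the sentinel.
suspend fun parkedValue(): Int = suspendCoroutineUninterceptedOrReturn { cont ->
    parked = cont
    COROUTINE_SUSPENDED
}

fun observeReturnValues(): List<Any?> {
    val delivered = mutableListOf<Any?>()
    val completion = Continuation<Int>(EmptyCoroutineContext) { delivered += it.getOrNull() }

    val immediate: suspend () -> Int = { cachedValue() }
    val suspending: suspend () -> Int = { parkedValue() }

    // Both calls have the shape "(Continuation) -> Any?".
    val first = immediate.startCoroutineUninterceptedOrReturn(completion)
    val second = suspending.startCoroutineUninterceptedOrReturn(completion)

    parked!!.resume(7)   // the suspended call delivers its result through the continuation
    return listOf(first, second === COROUTINE_SUSPENDED, delivered)
}

fun main() {
    println(observeReturnValues())
}
```

The first start hands back 42 directly and never touches the completion; the second hands back the sentinel, and its eventual result (7) arrives only through the continuation.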

The continuation class: Where state lives

The central artifact of coroutine compilation is the continuation class. For each named suspend function, the compiler generates an inner class that extends ContinuationImpl and holds all the state needed to suspend and resume.

Looking at generateContinuationClassForNamedFunction in AddContinuationLowering.kt:

context.irFactory.buildClass {
    name = Name.special("<Continuation>")
    origin = JvmLoweredDeclarationOrigin.CONTINUATION_CLASS
}.apply {
    superTypes += context.symbols.continuationImplClass.owner.defaultType

    val resultField = addField(CONTINUATION_RESULT_FIELD_NAME, ...)   // "result"
    val labelField = addField(COROUTINE_LABEL_FIELD_NAME, ...)        // "label"
    val capturedThisField = ...  // captures outer `this` for instance methods

    addConstructorForNamedFunction(capturedThisField, ...)
    addInvokeSuspendForNamedFunction(irFunction, resultField, labelField, ...)
}

The generated class has three essential fields:

  • label: Int: The state machine index, tracking which segment of the function body to execute next
  • result: Any?: Holds the value passed to resumeWith when the coroutine resumes
  • this$0 (optional): Captures the dispatch receiver for instance methods

Additionally, spilled local variable fields (L$0, L$1, I$0, etc.) are added later during bytecode transformation. These hold local variables that must survive across suspension points.

The invokeSuspend method: The re-entry point

The continuation class overrides invokeSuspend, which the coroutine runtime calls when a suspended coroutine is resumed. This method stores the resume value, sets the sign bit on the label, and calls back into the original function:

override fun invokeSuspend(result: Result<Any?>): Any? {
    this.result = result
    this.label = this.label or (1 shl 31)  // SET the sign bit
    return foo(this)  // re-enter the function with `this` as the continuation
}

The sign bit trick is worth examining closely.

The sign bit trick: Distinguishing fresh calls from resumptions

When a suspend function receives a continuation as $completion, it needs to answer an important question: "Is this a fresh call, or am I being resumed from a previous suspension?" The answer determines whether to start from the beginning or jump to the saved state.

There are three scenarios:

  1. Direct call from another suspend function: A fresh call with a caller-provided continuation
  2. Resume via resumeWith: The runtime calls invokeSuspend, which re-enters the function with the continuation object itself as $completion
  3. Recursive call: The function calls itself, passing a continuation of the same type

To distinguish case 1 from cases 2 and 3, the compiler uses an INSTANCEOF check. If $completion is an instance of the function's own continuation class, it might be a resume or a recursive call. To distinguish case 2 from case 3, the compiler uses the sign bit of the label field:

val signBit = 1 shl 31   // 0x80000000
+irSetField(
    irGet(function.dispatchReceiverParameter!!), labelField,
    irCallOp(
        context.irBuiltIns.intClass.functions.single {
            it.owner.name == OperatorNameConventions.OR
        },
        context.irBuiltIns.intType,
        irGetField(irGet(function.dispatchReceiverParameter!!), labelField),
        irInt(signBit)  // label = label | 0x80000000
    )
)

When invokeSuspend is called (the resume path), it ORs 0x80000000 into the label. The function entry prelude then checks this bit: if set, this is a genuine resume. If not set, even though the continuation passes the INSTANCEOF check, it's a recursive call and should be treated as fresh.

The entry prelude, generated by prepareMethodNodePreludeForNamedFunction in the bytecode transformer, implements this logic in three stages.

First, it checks whether the incoming $completion is an instance of this function's own continuation class:

ALOAD $completion
INSTANCEOF Foo$1 // Is it our continuation class?
IFEQ createNewContinuation // No -> fresh call

If the INSTANCEOF check passes, the prelude casts it and inspects the sign bit of label. This is where the distinction between resume and recursive call happens:

ALOAD $completion
CHECKCAST Foo$1
ASTORE $continuation

ALOAD $continuation
GETFIELD label
ICONST 0x80000000
IAND // label & 0x80000000
IFEQ createNewContinuation // Sign bit not set -> recursive call, treat as fresh

If the sign bit is set, this is a genuine resume. The prelude clears the bit (restoring the original label value) and jumps to the state machine dispatch:

ALOAD $continuation
DUP
GETFIELD label
ICONST 0x80000000
ISUB // label - 0x80000000 (clears the sign bit)
PUTFIELD label
GOTO afterCreate

If either check fails (not our class, or no sign bit), the prelude allocates a fresh continuation and loads the resume value:

createNewContinuation:
  NEW Foo$1
  DUP
  ALOAD this
  ALOAD $completion
  INVOKESPECIAL Foo$1.<init>
  ASTORE $continuation

afterCreate:
  ALOAD $continuation
  GETFIELD result
  ASTORE $result // Load the resume value into $result local

This is cool. Combined with the INSTANCEOF check, a single bit in the label integer disambiguates all three calling scenarios, without requiring an additional field or any runtime cost beyond a bitwise AND on entry.
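The arithmetic behind the trick is easy to check in plain Kotlin. In this sketch the helper names (SIGN_BIT, markResumed, isResume, clearSignBit) are hypothetical; only the operations mirror the OR in invokeSuspend and the IAND and ISUB in the prelude.

```kotlin
// Hypothetical helper names; only the bit arithmetic mirrors the prelude described above.
const val SIGN_BIT = 1 shl 31                    // same value as Int.MIN_VALUE (0x80000000)

// What invokeSuspend does before re-entering the function.
fun markResumed(label: Int): Int = label or SIGN_BIT

// What the prelude tests with IAND.
fun isResume(label: Int): Boolean = (label and SIGN_BIT) != 0

// What the prelude does with ISUB: subtracting Int.MIN_VALUE wraps around
// and leaves the low 31 bits untouched, which clears the sign bit.
fun clearSignBit(label: Int): Int = label - SIGN_BIT

fun main() {
    val saved = 2                                // suspended at suspension point 2
    val marked = markResumed(saved)
    println(isResume(saved))                     // false: looks like a fresh or recursive call
    println(isResume(marked))                    // true: genuine resume
    println(clearSignBit(marked))                // 2: original label restored
}
```

Because labels are small non-negative integers, the top bit is otherwise unused, which is what makes it available as a flag.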

The static suspend implementation: Avoiding virtual dispatch

For overridable suspend functions (non-final, non-private), the compiler creates a static implementation method and rewrites the original to delegate:

private fun createStaticSuspendImpl(irFunction: IrSimpleFunction): IrSimpleFunction {
    val static = createStaticFunctionWithReceivers(
        irFunction.parent,
        irFunction.name.toSuspendImplementationName(),  // "foo$suspendImpl"
        irFunction,
        origin = JvmLoweredDeclarationOrigin.SUSPEND_IMPL_STATIC_FUNCTION,
    )
    static.body = irFunction.moveBodyTo(static)

    // Original method becomes a simple forwarder:
    irFunction.body = irBuilder.irBlockBody {
        +irReturn(irCall(static).also {
            it.arguments.assignFrom(irFunction.parameters, ::irGet)
        })
    }
    return static
}

The state machine lives in the static foo$suspendImpl method. The original virtual method simply delegates. This prevents an important problem: if a subclass overrides foo, the resumed continuation must call back into the original implementation's state machine, not the subclass's override. Static dispatch guarantees this.
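The dispatch problem can be mimicked without coroutines at all. The sketch below uses hypothetical classes (Base, Derived) and helpers (resumeVirtually, resumeStatically); fooImpl merely plays the role that foo$suspendImpl plays in generated code.

```kotlin
// Hypothetical classes that mimic the "virtual forwarder + static implementation" split.
open class Base {
    // Plays the role of the original virtual method: a forwarder only.
    open fun foo(trace: MutableList<String>) {
        fooImpl(this, trace)
    }

    companion object {
        // Plays the role of foo$suspendImpl: the body lives here.
        fun fooImpl(self: Base, trace: MutableList<String>) {
            trace += "Base body"
        }
    }
}

class Derived : Base() {
    override fun foo(trace: MutableList<String>) {
        trace += "Derived prologue"
        super.foo(trace)
    }
}

// Re-entering through the virtual method lands in the override first.
fun resumeVirtually(receiver: Base): List<String> =
    mutableListOf<String>().also { receiver.foo(it) }

// Re-entering through the static method goes straight to Base's body.
fun resumeStatically(receiver: Base): List<String> =
    mutableListOf<String>().also { Base.fooImpl(receiver, it) }

fun main() {
    println(resumeVirtually(Derived()))
    println(resumeStatically(Derived()))
}
```

If Base's body had suspended while being called via super.foo(), a virtual re-entry would run Derived's prologue a second time; the static re-entry does not.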

Suspend lambda transformation: Anonymous continuation classes

Suspend lambdas follow a different path through SuspendLambdaLowering. Each suspend lambda becomes an anonymous class extending SuspendLambda:

val suspendLambda =
    if (reference.isRestrictedSuspension)
        context.symbols.restrictedSuspendLambdaClass.owner
    else
        context.symbols.suspendLambdaClass.owner

// The class extends both SuspendLambda and FunctionN+1
superTypes = listOf(suspendLambda.defaultType, functionNType)

Lambda parameters are stored as fields using a naming convention based on their JVM type descriptor: L$0, L$1 for reference types, I$0 for ints, J$0 for longs, and so on. This naming convention matters because the bytecode transformer uses it to allocate spill fields without collisions.

For lambdas with arity 0 or 1, the compiler generates a create(completion) factory method. The constructor initially passes null for the completion parameter:

+irCall(continuation.constructors.single().symbol).apply {
    arguments[0] = irNull()  // completion = null initially
}

The actual completion is provided later through create(completion) or the invoke override. This separation allows the same lambda class to be instantiated once and invoked multiple times with different completions.
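From the outside, this reuse is visible through the standard library's createCoroutine, which accepts the same suspend lambda repeatedly with different completions. The sketch below only demonstrates that observable behavior; startSameLambdaTwice is a hypothetical name, and whether a given stdlib version routes the call through create(completion) is an implementation detail this example does not depend on.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.createCoroutine
import kotlin.coroutines.resume

fun startSameLambdaTwice(): List<String> {
    val log = mutableListOf<String>()

    // One lambda object...
    val block: suspend () -> Int = { 21 * 2 }

    // ...paired with two different completions.
    val first = block.createCoroutine(
        Continuation<Int>(EmptyCoroutineContext) { log += "first=${it.getOrNull()}" }
    )
    val second = block.createCoroutine(
        Continuation<Int>(EmptyCoroutineContext) { log += "second=${it.getOrNull()}" }
    )

    // Nothing runs until each coroutine is started by resuming it with Unit.
    second.resume(Unit)
    first.resume(Unit)
    return log
}

fun main() {
    println(startSameLambdaTwice())
}
```

Each coroutine reports to its own completion, in the order it was started.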

The bytecode transformer: Where the state machine is born

After IR lowering and code generation, each suspend function's bytecode still looks mostly linear, with synthetic BeforeSuspendMarker and AfterSuspendMarker instructions bracketing each suspension point. The CoroutineTransformerMethodVisitor is where these markers are consumed and the actual state machine is assembled.

This is the most complex piece of coroutine compilation. It operates on ASM MethodNode trees and performs the transformation in a carefully ordered sequence.

The transformation pipeline

Looking at performTransformations, the main driver method, the pipeline begins with cleanup passes that normalize the bytecode left over from IR code generation:

override fun performTransformations(methodNode: MethodNode) {
    removeFakeContinuationConstructorCall(methodNode)            // 1. Strip IR placeholders
    replaceReturnsUnitMarkersWithPushingUnitOnStack(methodNode)  // 2. Insert actual Unit pushes
    replaceFakeContinuationsWithRealOnes(methodNode)             // 3. Replace ACONST_NULL with real loads
    FixStackMethodTransformer().transform(...)                   // 4. Fix stack shape from inlining

Next, it identifies the suspension points and performs optimization passes:

    val suspensionPoints = collectSuspensionPoints(methodNode)                    // 5. Find all marker pairs
    RedundantLocalsEliminationMethodTransformer(suspensionPoints).transform(...)  // 6a. Dead code
    ChangeBoxingMethodTransformer.transform(...)                                  // 6b. Boxing cleanup
    checkForSuspensionPointInsideMonitor(methodNode, suspensionPoints)            // 7. Illegal suspend check

At this point, the transformer checks whether the full state machine can be skipped entirely. If every suspension point is a tail call, it takes the fast path:

    if (isForNamedFunction &&
        methodNode.allSuspensionPointsAreTailCalls(suspensionPoints, ...)) {
        methodNode.addCoroutineSuspendedChecks(suspensionPoints)
        dropSuspensionMarkers(methodNode)
        return  // NO state machine needed
    }

If the fast path doesn't apply, the transformer builds the full state machine. The remaining steps happen in order, each depending on the previous:

    prepareMethodNodePreludeForNamedFunction(methodNode)                       // 8. Entry prelude
    for (point in suspensionPoints) {
        splitTryCatchBlocksContainingSuspensionPoint(methodNode, point)        // 9. Split try-catch
    }
    spillVariables(suspensionPoints, methodNode)                               // 10. Spill variables
    val stateLabels = suspensionPoints.withIndex().map {
        transformCallAndReturnStateLabel(it.index + 1, it.value, methodNode, ...)  // 11. Per-point logic
    }
    generateStateMachinesTableswitch(methodNode, ..., suspensionPoints, stateLabels) // 12. TABLESWITCH
    dropSuspensionMarkers(methodNode)                                          // 13. Cleanup
}

Each step has a clear purpose. Let's examine the most important ones.

Try-catch splitting: Exception handling across suspension

Step 9 splits try-catch blocks around suspension points. The problem is that a single try-catch block in your source code might span multiple suspension points:

suspend fun riskyOperation(): String {
    try {
        val a = fetchA()    // suspension point 1
        val b = fetchB(a)   // suspension point 2
        return process(a, b)
    } catch (e: Exception) {
        return "fallback"
    }
}

In the JVM bytecode, a try-catch block is defined by a start label, an end label, and a handler label. But when the function suspends and returns COROUTINE_SUSPENDED, execution leaves the try-catch scope entirely. When it resumes at a later state label, it re-enters the method at the TABLESWITCH, which is outside the original try-catch range.

The transformer solves this by splitting each try-catch block that contains a suspension point into multiple blocks: one for the code before the suspension, and one for the resume path after. Each resume label gets its own try-catch entry that points to the same handler. This ensures that exceptions thrown during resumption (for example, if resumeWith delivers a failure result) are still caught by the original handler.
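The effect is observable from ordinary code: a failure delivered on resumption ends up in the catch block that surrounds the suspend call in source. The sketch below assumes a hypothetical fetchA that parks its continuation in a global variable (waiting) and a hypothetical driver failOnResume; suspendCoroutine, startCoroutine, and resumeWithException are real kotlin.coroutines functions.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resumeWithException
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

// Hypothetical demo state: the continuation captured while fetchA() is suspended.
var waiting: Continuation<String>? = null

suspend fun fetchA(): String = suspendCoroutine<String> { cont -> waiting = cont }

suspend fun riskyOperation(): String {
    try {
        val a = fetchA()            // suspends here; the method returns COROUTINE_SUSPENDED
        return "got $a"
    } catch (e: Exception) {
        return "fallback"           // still reached when the failure arrives on resume
    }
}

fun failOnResume(): String? {
    var outcome: String? = null
    val block: suspend () -> String = { riskyOperation() }
    block.startCoroutine(Continuation<String>(EmptyCoroutineContext) { outcome = it.getOrNull() })

    // The coroutine is now suspended inside the try block. Resume it with a failure.
    waiting!!.resumeWithException(IllegalStateException("network down"))
    return outcome
}

fun main() {
    println(failOnResume())
}
```

The exception is rethrown at the resume point, which sits inside a try-catch range that still points at the original handler, so the function completes with "fallback".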

The checkForSuspensionPointInsideMonitor step (step 7) is related but different: it detects suspend calls inside synchronized blocks and reports an error. Suspending inside a monitor would mean returning from the method while the lock is still held, and because a JVM monitor belongs to the thread that acquired it, a coroutine that resumes on a different thread could not release it correctly. The compiler catches this at compile time rather than allowing it to fail at runtime.

Suspension point collection: Finding the boundaries

Before building the state machine, the transformer must identify where suspension points are. During code generation, each call to a suspend function is bracketed by synthetic marker instructions:

ICONST_0
INVOKESTATIC InlineMarker.mark()    // BeforeSuspendMarker
... actual suspend call ...
ICONST_1
INVOKESTATIC InlineMarker.mark()    // AfterSuspendMarker

The collectSuspensionPoints method walks the bytecode, identifies each BeforeSuspendMarker/AfterSuspendMarker pair, and constructs a SuspensionPoint object:

private fun collectSuspensionPoints(methodNode: MethodNode): List<SuspensionPoint> {
    val cfg = ControlFlowGraph.build(methodNode, followExceptions = false)

    return methodNode.instructions.filter { isBeforeSuspendMarker(it) }
        .mapNotNull { start ->
            val ends = mutableSetOf<AbstractInsnNode>()
            collectSuspensionPointEnds(start, mutableSetOf(), ends)
            val end = ends.find { isAfterSuspendMarker(it) } ?: return@mapNotNull null
            SuspensionPoint(start.previous, end)
        }.toList()
}

Each SuspensionPoint carries a stateLabel, the LabelNode that the TABLESWITCH will jump to when resuming at that point.

Variable spilling: Saving locals across suspension

When a function suspends, its JVM stack frame is destroyed (the function returns COROUTINE_SUSPENDED). Any local variables that are needed after resumption must be saved somewhere persistent. The compiler saves them into fields of the continuation object, a process called "spilling."

The spillVariables method performs liveness analysis to determine which variables are alive at each suspension point, then generates save and restore bytecode:

private fun spillVariables(suspensionPoints, methodNode) {
    val frames = performSpilledVariableFieldTypesAnalysis(...)
    val livenessFrames = analyzeLiveness(methodNode)

    for (suspension in suspensionPoints) {
        val variablesToSpill = calculateVariablesToSpill(...)

        // Partition: references need nulling after spill to avoid GC leaks
        val (references, primitives) = variablesToSpill.partition {
            it.normalizedType == OBJECT_TYPE
        }

        for (variable in references + primitives) {
            generateSpillAndUnspill(methodNode, suspension, variable, ...)
        }
    }
}

For each live variable, the transformer inserts:

Before the suspension point (spill):

ALOAD $continuation
ALOAD localVar         // or ILOAD, LLOAD, etc.
PUTFIELD Foo$1.L$0     // save to continuation field

After the resume label (unspill):

ALOAD $continuation
GETFIELD Foo$1.L$0     // restore from continuation field
ASTORE localVar

Fields are named by type and index: L$0, L$1 for object references, I$0 for ints, J$0 for longs, D$0 for doubles. The compiler only promotes variables that are live across suspension points. Variables used entirely within a single state remain as normal stack-allocated locals.

The important observation: reference type variables are nulled out in the continuation after being restored. This prevents the continuation from holding strong references to objects that the function has already finished using, which would otherwise cause memory leaks if the coroutine remains suspended for a long time.
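At the source level, liveness is simply the question of whether a variable is read again after a suspension point. The sketch below marks which locals would be spill candidates under that rule; pause, gate, sumAroundPause, and runSumAroundPause are hypothetical names, and the comments describe what the rule above implies rather than verified compiler output.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

// Hypothetical demo state: the continuation captured while pause() is suspended.
var gate: Continuation<Unit>? = null

suspend fun pause(): Unit = suspendCoroutine<Unit> { cont -> gate = cont }

suspend fun sumAroundPause(input: Int): Int {
    val doubled = input * 2       // read after pause(): live across the suspension, spill candidate
    val scratch = doubled + 100   // last read before pause(): dead at the suspension, no spill needed
    println("scratch = $scratch")
    pause()                       // the stack frame is gone while suspended here
    return doubled + 1            // `doubled` must have survived somewhere
}

fun runSumAroundPause(): Int? {
    var outcome: Int? = null
    val block: suspend () -> Int = { sumAroundPause(20) }
    block.startCoroutine(Continuation<Int>(EmptyCoroutineContext) { outcome = it.getOrNull() })
    gate!!.resume(Unit)           // resume after the frame was destroyed
    return outcome
}

fun main() {
    println(runSumAroundPause())
}
```

The function still returns 41 after being resumed, which is only possible because doubled was preserved outside the original stack frame.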

The TABLESWITCH: State machine dispatch

The final piece of the state machine is the dispatch mechanism at the function entry. The generateStateMachinesTableswitch method inserts a TABLESWITCH instruction that reads the label field and jumps to the correct resume point.

First, it caches the COROUTINE_SUSPENDED sentinel and loads the current label:

methodNode.instructions.insertBefore(actualCoroutineStart, insnListOf(
    *withInstructionAdapter { loadCoroutineSuspendedMarker() }.toArray(),
    VarInsnNode(ASTORE, suspendMarkerVarIndex),   // cache the sentinel in a local
    VarInsnNode(ALOAD, continuationIndex),
    *withInstructionAdapter { getLabel() }.toArray(),  // GETFIELD label

Then it inserts the TABLESWITCH with one case per state:

    TableSwitchInsnNode(
        0,                          // min = 0 (initial call)
        suspensionPoints.size,      // max = N
        defaultLabel,               // default: throw IllegalStateException
        firstStateLabel,            // case 0: initial entry
        *stateLabels.toTypedArray() // case 1..N: resume points
    ),
    firstStateLabel
))

The default case catches illegal states, for example if a continuation is resumed more than once:

methodNode.instructions.insert(last, withInstructionAdapter {
    AsmUtil.genThrow(
        this,
        "java/lang/IllegalStateException",
        ILLEGAL_STATE_ERROR_MESSAGE  // "call to 'resume' before 'invoke' with coroutine"
    )
})

State 0 is the initial entry point (the function is being called for the first time). States 1 through N correspond to the resume points after each suspension.

The COROUTINE_SUSPENDED sentinel is loaded once and stored in a local variable ($suspendMarker) at the very top of the method. This avoids repeated static method calls to getCOROUTINE_SUSPENDED() at each suspension point check.

Each suspension point: Set label, check, return

For each suspension point, transformCallAndReturnStateLabel inserts three pieces of logic.

First, before the suspend call, it saves the current state by writing the suspension point's ID into the label field:

insertBefore(suspension.suspensionCallBegin, withInstructionAdapter {
    visitVarInsn(ALOAD, continuationIndex)
    iconst(id)
    setLabel()  // PUTFIELD label = id
})

After the suspend call returns, it checks whether the function actually suspended. If the return value is COROUTINE_SUSPENDED, it propagates the sentinel up the call stack. The resume label (where the TABLESWITCH jumps on re-entry) is placed immediately after:

insert(suspension.tryCatchBlockEndLabelAfterSuspensionCall, withInstructionAdapter {
    dup()
    load(suspendMarkerVarIndex, OBJECT_TYPE)  // load COROUTINE_SUSPENDED
    ifacmpne(continuationLabel)               // not suspended? skip

    load(suspendMarkerVarIndex, OBJECT_TYPE)
    areturn(OBJECT_TYPE)                      // return COROUTINE_SUSPENDED to caller

    visitLabel(suspension.stateLabel.label)   // resume label (TABLESWITCH target)
})

At the resume label, the transformer emits an exception check (in case resumeWith was called with a failure) and loads $result onto the stack as if the suspend call had returned normally:

insert(possibleTryCatchBlockStart, withInstructionAdapter {
    generateResumeWithExceptionCheck(dataIndex)  // ResultKt.throwOnFailure($result)
    load(dataIndex, OBJECT_TYPE)                 // push $result as the "return value"
})

The pattern is consistent for every suspension point:

  1. Set label = id so the TABLESWITCH knows where to jump on resume
  2. Make the actual suspend call (passing the continuation)
  3. Check if the return value is COROUTINE_SUSPENDED; if yes, propagate it upward
  4. If the call completed synchronously (fast path), continue to the next instruction
  5. At the resume label, call throwOnFailure to propagate exceptions from resumeWith, then load $result onto the stack as if the suspend call had returned normally

The fast path (step 4) is important. If a suspend function completes without actually suspending (for example, returning a cached value), execution continues straight into the next instruction. The label write and any spills have already happened, but the function never returns to its caller, nothing is resumed, and the TABLESWITCH is not revisited. This keeps the common case of synchronous completion cheap.

Tail call optimization: Skipping the state machine entirely

Not every suspend function needs a full state machine. If every suspension point in a function is a tail call (meaning the suspend call's return value is immediately returned), the compiler can skip the entire state machine and emit a much simpler form.

The optimization happens at two levels. First, at the IR level, TailCallOptimizationLowering identifies tail-position suspend calls:

override fun visitCall(expression: IrCall, data: TailCallOptimizationData?): IrExpression {
    val transformed = super.visitCall(expression, data) as IrExpression
    return if (data == null || expression !in data.tailCalls) transformed
    else IrReturnImpl(
        data.function.endOffset, data.function.endOffset,
        context.irBuiltIns.nothingType,
        data.function.symbol,
        if (data.returnsUnit) transformed.coerceToUnit() else transformed
    )
}

Then, at the bytecode level, allSuspensionPointsAreTailCalls in TailCallOptimization.kt verifies the optimization is safe by performing control flow analysis:

fun MethodNode.allSuspensionPointsAreTailCalls(suspensionPoints, ...): Boolean {
    val frames = MethodTransformer.analyze("fake", this, TcoInterpreter(suspensionPoints))

    return suspensionPoints.all { suspensionPoint ->
        // Must not be inside a try-catch block
        tryCatchBlocks.all { index < it.start || it.end <= index } &&
        // Only ARETURN (or POP + Unit + ARETURN) allowed after the call
        suspensionPoint.suspensionCallEnd.transitiveSuccessorsAreSafeOrReturns(...)
    }
}

If the check passes, instead of building a full state machine with TABLESWITCH, spilling, and continuation class instantiation, the transformer simply inserts a COROUTINE_SUSPENDED check after each call:

fun MethodNode.addCoroutineSuspendedChecks(suspensionPoints) {
    for (suspensionPoint in suspensionPoints) {
        if (suspensionPoint.suspensionCallEnd.nextMeaningful?.opcode == ARETURN) continue
        instructions.insert(suspensionPoint.suspensionCallEnd, withInstructionAdapter {
            dup()
            loadCoroutineSuspendedMarker()
            ifacmpne(label)
            areturn(OBJECT_TYPE)    // propagate COROUTINE_SUSPENDED
            mark(label)
        })
    }
}

This is a significant optimization. A tail-call-optimized suspend function has no continuation class allocation, no field spilling, and no TABLESWITCH. It's nearly as cheap as a regular function call, with one additional reference comparison per suspension point.
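In source terms, the conditions above separate functions like the following. This is an illustrative sketch with hypothetical function names (inner, forwards, addsOne, guarded, runAll); the comments restate the rules described in this section and are not verified against the bytecode of any particular compiler version.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

suspend fun inner(): Int = 1

// The only suspend call is returned directly: every suspension point is a tail call,
// so this shape is a candidate for skipping the state machine.
suspend fun forwards(): Int = inner()

// Work remains after the call (the addition), so the result must flow back
// into this function's body: not a tail call.
suspend fun addsOne(): Int = inner() + 1

// The call sits inside a try-catch block, which the check shown above rejects.
suspend fun guarded(): Int = try { inner() } catch (e: Exception) { -1 }

fun runAll(): List<Int> {
    val results = mutableListOf<Int>()
    val blocks = listOf<suspend () -> Int>({ forwards() }, { addsOne() }, { guarded() })
    for (block in blocks) {
        block.startCoroutine(Continuation<Int>(EmptyCoroutineContext) { results += it.getOrThrow() })
    }
    return results
}

fun main() {
    println(runAll())
}
```

All three behave identically from the caller's perspective; the difference is only in how much machinery the compiler has to generate for each.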

The bridge: IR codegen to bytecode transformation

The connection between IR code generation and the bytecode-level transformer happens in CoroutineCodegen.kt. The acceptWithStateMachine extension function wraps the generated MethodNode in a CoroutineTransformerMethodVisitor:

internal fun MethodNode.acceptWithStateMachine(
    irFunction: IrFunction,
    classCodegen: ClassCodegen,
    methodVisitor: MethodVisitor,
    varsCountByType: Map<Type, Int>,
    obtainContinuationClassBuilder: () -> ClassBuilder,
) {
    val visitor = CoroutineTransformerMethodVisitor(
        methodVisitor, access, name, desc,
        containingClassInternalName = classCodegen.type.internalName,
        obtainClassBuilderForCoroutineState = obtainContinuationClassBuilder,
        isForNamedFunction = irFunction.isSuspend,
        needDispatchReceiver = irFunction.isSuspend &&
            (irFunction.dispatchReceiverParameter != null || ...),
        initialVarsCountByType = varsCountByType,
    )
    accept(visitor)
}

The hasContinuation() predicate in JvmIrCoroutineUtils.kt gates which functions go through this path:

fun IrFunction.hasContinuation(): Boolean =
    isInvokeSuspendOfLambda() ||
    isSuspend && shouldContainSuspendMarkers() &&
    !isEffectivelyInlineOnly() &&
    origin != IrDeclarationOrigin.INLINE_LAMBDA &&
    origin != JvmLoweredDeclarationOrigin.FOR_INLINE_STATE_MACHINE_TEMPLATE

Functions that are effectively inline, or that serve as templates for the inliner, skip the state machine because their code will be transplanted into the caller's state machine instead.

The complete picture: Tracing a suspend function

Let's trace the complete transformation of a concrete function to see all the pieces working together:

suspend fun loadData(id: Int): String {
    val token = authenticate()       // suspension point 1
    val data = fetch(id, token)      // suspension point 2
    return process(data)             // suspension point 3 (tail call)
}

After AddContinuationLowering (IR level)

The function signature becomes:

fun loadData(id: Int, $completion: Continuation<String>?): Any?

A continuation class is generated:

class LoadData$1(
    var I$0: Int,       // spill field for `id`
    var result: Any?,
    var label: Int,
    completion: Continuation<*>?
) : ContinuationImpl(completion) {
    override fun invokeSuspend(result: Result<Any?>): Any? {
        this.result = result
        this.label = this.label or (1 shl 31)  // set sign bit (0x80000000)
        return loadData(0, this)               // re-enter
    }
}

After bytecode transformation

The final bytecode transformation happens in several layers. Let's walk through each one.

The method begins with the prelude: the INSTANCEOF check, the sign bit check, and continuation creation or reuse, exactly as described in the sign bit trick section. After the prelude, the COROUTINE_SUSPENDED sentinel is loaded once and cached, and the TABLESWITCH dispatches based on label:

INVOKESTATIC getCOROUTINE_SUSPENDED; ASTORE $suspended
ALOAD $cont; GETFIELD label
TABLESWITCH 0..3:
    0 -> state_0
    1 -> state_1
    2 -> state_2
    3 -> state_3
    default -> throw IllegalStateException

State 0 is the initial entry. The compiler spills id into the continuation (because it's needed after the first suspension), sets label = 1, and calls authenticate. If the call returns COROUTINE_SUSPENDED, the function returns immediately, releasing the thread:

state_0:
    ALOAD $cont; ILOAD id; PUTFIELD I$0      // spill id
    ALOAD $cont; ICONST 1; PUTFIELD label     // set next state
    ALOAD $cont; INVOKEVIRTUAL authenticate   // suspend call
    DUP; ALOAD $suspended; IF_ACMPNE -> state_0_continue
    ARETURN                                    // suspended: release the thread

State 1 is the resume point after authenticate() completes. The transformer first checks for exceptions (if resumeWith was called with a failure), then unspills id from the continuation and pushes $result onto the operand stack, exactly where the direct return value of authenticate() would have been:

state_1:
    ALOAD $result; INVOKESTATIC throwOnFailure  // throw if failure
    ALOAD $cont; GETFIELD I$0; ISTORE id        // unspill id
    ALOAD $result                                // push authenticate's result

Both paths now meet at state_0_continue with authenticate's result on top of the stack: the fast path jumps there directly, and the resume path falls through from state_1. The result is stored as token, and the function prepares for the second suspension. No variables need to be spilled here: id and token are consumed as arguments to fetch, and neither is referenced after the call returns. The only value needed after resumption is data, which arrives through $result:

state_0_continue:
    ASTORE token                                   // authenticate's result, from either path
    ALOAD $cont; ICONST 2; PUTFIELD label         // set next state
    ILOAD id; ALOAD token; ALOAD $cont; INVOKEVIRTUAL fetch
    DUP; ALOAD $suspended; IF_ACMPNE -> state_1_continue
    ARETURN                                        // suspended

State 2 resumes after fetch(). The result becomes data:

state_2:
    ALOAD $result; INVOKESTATIC throwOnFailure
    ALOAD $result; ASTORE data

The final call to process(data) is a tail call. The compiler still sets label = 3 and checks for COROUTINE_SUSPENDED, but no spilling is needed because nothing follows the call:

state_1_continue:
    ALOAD $cont; ICONST 3; PUTFIELD label
    ALOAD data; ALOAD $cont; INVOKEVIRTUAL process
    DUP; ALOAD $suspended; IF_ACMPNE -> state_2_continue
    ARETURN

state_3:
    ALOAD $result; INVOKESTATIC throwOnFailure
    ALOAD $result

state_2_continue:
    ARETURN                                        // return the final result

This is the complete transformation. The sequential three line function has become a state machine with four states, field spilling for one local variable (id, live across the first suspension), exception checking at each resume point, and a TABLESWITCH dispatch at the entry.
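To see the same shape outside of bytecode, here is a hand written Kotlin model of the state machine. It is a minimal sketch under stated assumptions, not what the compiler emits: LoadDataContinuation, spilledId, parked, and the three CPS style callees are hypothetical stand ins, and a while/when loop replaces the bytecode's fall through between labels.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.intrinsics.COROUTINE_SUSPENDED

// Hypothetical CPS style callees. `authenticate` really suspends by parking its
// continuation; `fetch` and `process` complete synchronously (the fast path).
var parked: Continuation<Any?>? = null
fun authenticate(cont: Continuation<Any?>): Any? { parked = cont; return COROUTINE_SUSPENDED }
fun fetch(id: Int, token: String, cont: Continuation<Any?>): Any? = "data($id,$token)"
fun process(data: String, cont: Continuation<Any?>): Any? = data.uppercase()

// Hypothetical stand in for the generated LoadData$1 class.
class LoadDataContinuation(private val completion: Continuation<Any?>) : Continuation<Any?> {
    var label = 0
    var result: Result<Any?> = Result.success(null)
    var spilledId = 0                                   // plays the role of I$0
    override val context: CoroutineContext get() = completion.context

    override fun resumeWith(result: Result<Any?>) {
        this.result = result
        label = label or Int.MIN_VALUE                  // mark the next call as a resumption
        val outcome = try {
            loadData(0, this)                           // re enter; the id argument is a dummy
        } catch (e: Throwable) {
            completion.resumeWith(Result.failure(e))
            return
        }
        if (outcome !== COROUTINE_SUSPENDED) completion.resumeWith(Result.success(outcome))
    }
}

fun loadData(id: Int, completion: Continuation<Any?>): Any? {
    // Prelude: reuse our own continuation on resumption, otherwise create a fresh one.
    val cont = if (completion is LoadDataContinuation && (completion.label and Int.MIN_VALUE) != 0) {
        completion.label -= Int.MIN_VALUE               // clear the sign bit
        completion
    } else {
        LoadDataContinuation(completion)
    }
    while (true) {
        when (cont.label) {
            0 -> {
                cont.spilledId = id                     // spill id
                cont.label = 1
                val r = authenticate(cont)
                if (r === COROUTINE_SUSPENDED) return r
                cont.result = Result.success(r)         // fast path: carry on in state 1
            }
            1 -> {
                val token = cont.result.getOrThrow() as String   // throws if resumed with a failure
                cont.label = 2
                val r = fetch(cont.spilledId, token, cont)       // unspill id
                if (r === COROUTINE_SUSPENDED) return r
                cont.result = Result.success(r)
            }
            2 -> {
                val data = cont.result.getOrThrow() as String
                cont.label = 3
                val r = process(data, cont)
                if (r === COROUTINE_SUSPENDED) return r
                cont.result = Result.success(r)
            }
            3 -> return cont.result.getOrThrow()
            else -> throw IllegalStateException("call to 'resume' before 'invoke' with coroutine")
        }
    }
}
```

Calling loadData(7, root) with some root continuation returns COROUTINE_SUSPENDED at the first suspension. Resuming the parked continuation with a token re enters at label 1, restores id from the spill field, and runs the remaining states on the fast path before handing the final value to root.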

The JS and Wasm backends: A different approach

The JVM backend performs the state machine transformation at the bytecode level using ASM tree manipulation. The JS and Wasm backends take a fundamentally different approach: they build the state machine entirely in IR.

AbstractSuspendFunctionsLowering provides the common framework:

abstract class AbstractSuspendFunctionsLowering<C : CommonBackendContext>(val context: C) {
    protected abstract val stateMachineMethodName: Name
    protected abstract fun buildStateMachine(
        stateMachineFunction: IrFunction,
        transformingFunction: IrFunction,
        argumentToPropertiesMap: Map<IrValueParameter, IrField>,
    )
}

The JS backend's StateMachineBuilder creates SuspendState nodes directly in the IR tree, where each state represents an atomic block of code between two suspension points:

class SuspendState(type: IrType) {
    val entryBlock: IrContainerExpression = JsIrBuilder.buildComposite(type)
    val successors = mutableSetOf<SuspendState>()
    var id = -1
}

The JS backend also classifies suspend functions into three categories before deciding what to generate:

sealed class SuspendFunctionKind {
    object NO_SUSPEND_CALLS : SuspendFunctionKind()
    class DELEGATING(val delegatingCall: IrCall) : SuspendFunctionKind()
    object NEEDS_STATE_MACHINE : SuspendFunctionKind()
}

Functions with no suspend calls become plain functions. Functions with a single tail position suspend call become simple delegations. Only functions that genuinely need a state machine get one. This classification avoids unnecessary overhead in the generated JavaScript.
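In source terms, the three kinds might look like the functions below. This is only an illustration: parse, parseDefault, parseAndSum, and the runToCompletion helper are hypothetical names, and which kind the backend actually assigns to each one is an assumption based on the description above.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// No suspend calls in the body: a candidate for NO_SUSPEND_CALLS.
suspend fun parse(raw: String): Int = raw.trim().toInt()

// A single suspend call in tail position: a candidate for DELEGATING.
suspend fun parseDefault(): Int = parse(" 42 ")

// Work remains after each suspend call returns: a candidate for NEEDS_STATE_MACHINE.
suspend fun parseAndSum(a: String, b: String): Int {
    val first = parse(a)
    val second = parse(b)
    return first + second
}

// Hypothetical stdlib only runner for blocks that finish without being parked.
fun <T> runToCompletion(block: suspend () -> T): T {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return checkNotNull(outcome) { "block suspended and was never resumed" }.getOrThrow()
}
```

All three behave identically from the caller's side; the classification only changes how much machinery is generated behind them.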

Inline suspend functions: Transplanting state machines

Inline suspend functions follow yet another path. When a call to an inline suspend function is inlined, no separate state machine runs for that call. Instead, the function's body is copied directly into the caller's bytecode by the inliner, and the caller's state machine absorbs the inlined suspension points.

This means an inline suspend function like Mutex.withLock or suspendCancellableCoroutine does not produce its own continuation class or TABLESWITCH at the call site. Its suspension points become part of the calling function's state machine, with the caller's continuation handling the spilling and dispatching. (withContext and coroutineScope, by contrast, are ordinary suspend functions and are not inlined.)

To support this, the compiler generates two copies of every inline suspend function during code generation:

  1. A normal version with a state machine, used when the function is called from non inlined contexts (for example, through a function reference)
  2. A version named foo$$forInline without a state machine, retaining the suspend markers, for the inliner to consume

The SuspendForInlineCopyingMethodVisitor in SuspendFunctionGenerationStrategy.kt handles this duplication:

class SuspendForInlineCopyingMethodVisitor(...) : TransformationMethodVisitor(...) {
    override fun performTransformations(methodNode: MethodNode) {
        methodNode.preprocessSuspendMarkers(forInline = false, keepFakeContinuation = false)
        newMethodNode.preprocessSuspendMarkers(forInline = true, keepFakeContinuation = true)
        newMethodNode.accept(newMethodVisitor)
    }
}

The forInline = true copy keeps the fake continuation markers intact so the inliner can later replace them with the actual caller's continuation. The forInline = false copy strips the markers and proceeds through the normal state machine transformation.

This dual copy approach is why inline suspend functions have essentially zero overhead when inlined: their suspension points merge directly into the caller's state machine, sharing the same continuation object and spill fields.
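A small example of the source side of this, as a sketch: traced, flush, loadGreeting, greetTwice, and runToCompletion are hypothetical names. The point is that the lambda passed to traced is not a suspend lambda, yet it may call suspend functions, because after inlining its body is simply part of greetTwice.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical suspend callees standing in for real asynchronous work.
suspend fun loadGreeting(name: String): String = "hello, $name"
suspend fun flush(trace: MutableList<String>, tag: String) { trace += "exit $tag" }

// Hypothetical inline suspend wrapper.
suspend inline fun <T> traced(trace: MutableList<String>, tag: String, block: () -> T): T {
    trace += "enter $tag"
    val value = block()        // the caller's lambda body is inlined here
    flush(trace, tag)          // a suspension point that belongs to the inline function
    return value
}

suspend fun greetTwice(trace: MutableList<String>): String =
    traced(trace, "greet") {
        // After inlining, these suspend calls sit directly in greetTwice.
        loadGreeting("a") + " / " + loadGreeting("b")
    }

// Hypothetical stdlib only runner for blocks that finish without being parked.
fun <T> runToCompletion(block: suspend () -> T): T {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return checkNotNull(outcome) { "block suspended and was never resumed" }.getOrThrow()
}
```

Here greetTwice ends up with three suspension points (two loadGreeting calls and one flush), all of which would be handled by its own continuation rather than by one belonging to traced.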

Real world implications: What this means for your code

The compiler machinery has direct implications for how you write and debug Kotlin code.

Stack traces and debugging

Coroutine stack traces show the state machine internals rather than your original code flow. When you see MyClass$fetchData$1.invokeSuspend(MyClass.kt:42), the $fetchData$1 is the generated continuation class, and invokeSuspend is the state machine's re entry point. The line number corresponds to the suspension point in your original source. If a coroutine appears stuck, you can inspect the continuation's label field (via kotlinx-coroutines-debug or a debugger) to identify exactly which suspension point it's waiting at.

Memory retention across suspension

Local variables promoted to continuation fields stay reachable for as long as the continuation holds them, which can be long after their last use. If you allocate a large bitmap in one state and the coroutine suspends for a long time in a later state, that bitmap can live in the continuation's L$0 field until the coroutine completes or the field is overwritten. This is a common source of unexpected memory pressure in long running coroutines. The mitigation is straightforward: set large references to null after you no longer need them, or restructure your code so the large allocation and the long suspension are in different functions.
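The restructuring option can be sketched as follows. All names here (awaitUploadSlot, checksumOf, the two upload functions, runToCompletion) are hypothetical, and the sketch does not depend on whether a particular compiler version clears dead spill fields on its own.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical suspend point standing in for a potentially long wait.
suspend fun awaitUploadSlot() {}

// Before: pixels is read after the first suspension, so it is live across it
// and has to be spilled into the continuation.
suspend fun uploadChecksumHolding(size: Int): Int {
    val pixels = ByteArray(size) { 1 }
    awaitUploadSlot()
    val checksum = pixels.sum()
    awaitUploadSlot()                 // long wait; pixels is no longer needed here
    return checksum
}

// After: the large array lives only inside a non suspend helper, so the only
// value that can be spilled in uploadChecksum is the Int checksum.
fun checksumOf(size: Int): Int = ByteArray(size) { 1 }.sum()

suspend fun uploadChecksum(size: Int): Int {
    val checksum = checksumOf(size)
    awaitUploadSlot()
    return checksum
}

// Hypothetical stdlib only runner for blocks that finish without being parked.
fun <T> runToCompletion(block: suspend () -> T): T {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return checkNotNull(outcome) { "block suspended and was never resumed" }.getOrThrow()
}
```

Both versions compute the same result; the second one just keeps the large allocation out of any suspend function's set of live locals.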

The importance of the fast path

In production systems, many suspend function calls complete synchronously. A channel send() that finds space in the buffer, a Mutex.lock() on an uncontended lock, a Deferred.await() on an already completed computation: these all return their result directly without suspending. The fast path (checking result != COROUTINE_SUSPENDED and continuing) means these calls have negligible overhead compared to non suspend calls. This is why using suspend functions liberally in your API design is not a performance concern in most cases.
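You can observe both outcomes with the standard library intrinsics alone. The following is a minimal sketch: Lookup and firstReturn are hypothetical, while suspendCoroutineUninterceptedOrReturn, startCoroutineUninterceptedOrReturn, and COROUTINE_SUSPENDED come from kotlin.coroutines.intrinsics.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.intrinsics.COROUTINE_SUSPENDED
import kotlin.coroutines.intrinsics.startCoroutineUninterceptedOrReturn
import kotlin.coroutines.intrinsics.suspendCoroutineUninterceptedOrReturn

// Hypothetical cache backed lookup: returns immediately on a hit and parks the
// caller on a miss (a real implementation would resume the waiter later).
class Lookup(private val cache: Map<Int, String>) {
    val waiters = mutableListOf<Continuation<String>>()

    suspend fun get(key: Int): String = suspendCoroutineUninterceptedOrReturn { cont ->
        cache[key] ?: run {
            waiters += cont
            COROUTINE_SUSPENDED
        }
    }
}

// Starts the block and reports what the very first call returned:
// either the finished value or the COROUTINE_SUSPENDED sentinel.
fun firstReturn(block: suspend () -> String): Any? =
    block.startCoroutineUninterceptedOrReturn(Continuation(EmptyCoroutineContext) { })
```

With val lookup = Lookup(mapOf(1 to "one")), firstReturn { lookup.get(1) } yields the string itself, while firstReturn { lookup.get(2) } yields the sentinel and leaves one parked waiter. The first case is the fast path: no resumption and no dispatch, just a return value compared against the sentinel.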

Tail calls in practice

Knowing that the compiler can optimize tail call suspend functions means you can write delegation patterns efficiently:

suspend fun fetchConditionally(id: Int): Data {
    return if (id > 0) fetchFromNetwork(id) else fetchFromCache(id)
    // both branches are tail calls
}

Because both suspend calls are in tail position and not inside try-catch blocks, this function does not need a state machine. The compiler generates minimal COROUTINE_SUSPENDED checks instead. The optimization is easy to lose, though: if you add logging after a suspend call, or wrap the call in try-catch (which places the suspension point inside the exception handler's range), the call is no longer a tail call and a full state machine is generated.
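For contrast, here are the two ways of losing tail position next to a plain delegation. This is a sketch with hypothetical callees that return String rather than the Data type used above; the comments describe what you would expect from the rules in this section, not verified compiler output.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical callees.
suspend fun fetchFromNetwork(id: Int): String =
    if (id > 0) "network:$id" else throw IllegalStateException("offline")
suspend fun fetchFromCache(id: Int): String = "cache:$id"

// Tail call: the callee's result is returned untouched.
suspend fun fetchDirect(id: Int): String = fetchFromNetwork(id)

// Not a tail call: the log line runs after the suspend call returns.
suspend fun fetchLogged(id: Int, log: MutableList<String>): String {
    val data = fetchFromNetwork(id)
    log += "fetched $data"
    return data
}

// Not a tail call either: the suspend call sits inside a try block.
suspend fun fetchWithFallback(id: Int): String =
    try {
        fetchFromNetwork(id)
    } catch (e: IllegalStateException) {
        fetchFromCache(id)
    }

// Hypothetical stdlib only runner for blocks that finish without being parked.
fun <T> runToCompletion(block: suspend () -> T): T {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return checkNotNull(outcome) { "block suspended and was never resumed" }.getOrThrow()
}
```

If a function like fetchLogged sits on a hot path, moving the logging into the caller or the callee is one way to keep the delegation itself in tail position.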

Conclusion

In this article, you've explored the complete compiler pipeline that transforms Kotlin's suspend keyword into JVM state machines. You've traced through CPS transformation (adding the hidden $completion parameter), continuation class generation (with the sign bit trick for distinguishing fresh calls from resumptions), suspension point collection (via marker instructions), variable spilling (saving live locals to continuation fields), TABLESWITCH generation (dispatching to the correct resume point), and tail call optimization (skipping the state machine when possible).

These internals directly inform how you reason about coroutine behavior. The sign bit trick explains why recursive suspend calls work correctly. Variable spilling explains why large objects referenced across suspension points can cause memory pressure. The fast path optimization explains why many suspend calls have negligible overhead. Tail call optimization explains why simple delegation functions are nearly free. These are the mechanics that determine coroutine performance characteristics in production systems.

Whether you're debugging a coroutine that seems stuck (check the label field to see which suspension point it's waiting at), optimizing a hot path that calls suspend functions in a tight loop (ensure synchronous completion hits the fast path), or designing coroutine based architectures (understand the per invocation allocation cost and spill overhead), this knowledge of the compiler machinery gives you the foundation for writing correct, performant Kotlin code. Coroutines are not just a library abstraction. They are a compiler level solution, and the depth of that solution is what makes the suspend keyword work as well as it does.

As always, happy coding!

— Jaewoong