The Machinery Behind the Magic: How Kotlin Turns suspend into State Machines
Kotlin Coroutines have become the standard for asynchronous programming on the JVM, offering developers a way to write sequential, readable code that can pause and resume without blocking threads. Most developers interact with coroutines through familiar APIs like launch, async, and Flow, treating suspend as a language keyword that "just works." But coroutines are not simply a library feature layered on top of the language. They are a compiler level solution, built through the Kotlin compiler's IR lowering pipeline and bytecode generation, that transforms your sequential code into resumable state machines. The suspend keyword triggers a series of compiler transformations that rewrite your function's structure, signature, and control flow before it ever reaches the JVM.
In this article, you'll dive deep into the Kotlin compiler's coroutine machinery, exploring the six stage transformation pipeline that converts a suspend function into a state machine. You'll trace through how the compiler injects hidden continuation parameters through CPS transformation, how it generates continuation classes with the clever sign bit trick for distinguishing fresh calls from resumptions, how the bytecode level transformer collects suspension points and inserts a TABLESWITCH dispatch, how local variables are "spilled" into continuation fields to survive across suspension, and how tail call optimization lets the compiler skip the entire state machine when it can prove every suspension point is a tail call.
The fundamental problem: How do you make a function resumable?
Consider this suspend function:
suspend fun fetchUserData(): UserData {
val user = fetchUser()
val profile = fetchProfile(user.id)
return UserData(user, profile)
}
This looks like ordinary sequential code, but both fetchUser() and fetchProfile() might perform network requests that take hundreds of milliseconds. The function must be able to pause at each call, release the thread entirely, and later resume execution at the exact point where it left off, with all local variables intact.
The JVM provides no native mechanism for this. A JVM method is a stack frame, and when a method returns, its stack frame is gone. There is no way to "freeze" a stack frame, release the thread, and later restore it. The function must return to release the thread, but returning destroys the local state.
The Kotlin compiler solves this by transforming each suspend function into a state machine. The function's body is split into segments between suspension points. Local variables are saved into fields of a continuation object before each suspension, and restored after resumption. A label field tracks which segment to execute next, and a TABLESWITCH at the function entry dispatches to the correct segment. The developer writes linear code; the compiler generates the machinery to break it apart and reassemble it on demand.
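To make the idea concrete, here is a hand-written sketch of what such a state machine could look like for fetchUserData. It is a simplified model, not the compiler's output: SUSPENDED stands in for COROUTINE_SUSPENDED, Callback stands in for Continuation, the pendingReplies queue imitates network replies, and the sign bit handling described later is left out. All of these names are assumptions made for illustration.

```kotlin
// Hand-written model of a compiled suspend function. All names are illustrative.
object SUSPENDED                                   // stands in for COROUTINE_SUSPENDED

fun interface Callback { fun resume(value: Any?) } // stands in for Continuation

data class User(val id: Int)
data class Profile(val bio: String)
data class UserData(val user: User, val profile: Profile)

// Replies that "arrive later"; draining the queue imitates the network answering.
val pendingReplies = ArrayDeque<() -> Unit>()

fun fetchUser(cb: Callback): Any? {
    pendingReplies.addLast { cb.resume(User(7)) }
    return SUSPENDED
}

fun fetchProfile(id: Int, cb: Callback): Any? {
    pendingReplies.addLast { cb.resume(Profile("bio of $id")) }
    return SUSPENDED
}

class FetchUserDataState(val completion: Callback) : Callback {
    var label = 0            // which segment runs next
    var result: Any? = null  // value delivered on resume
    var user: User? = null   // spilled local variable

    override fun resume(value: Any?) {
        result = value
        val outcome = fetchUserData(this)          // re-enter the function
        if (outcome !== SUSPENDED) completion.resume(outcome)
    }
}

fun fetchUserData(cb: Callback): Any? {
    // Reuse our own state object on resume, otherwise start fresh.
    val state = cb as? FetchUserDataState ?: FetchUserDataState(cb)
    while (true) {
        when (state.label) {                       // plays the role of the TABLESWITCH
            0 -> {
                state.label = 1
                val r = fetchUser(state)
                if (r === SUSPENDED) return SUSPENDED
                state.result = r                   // completed synchronously
            }
            1 -> {
                val user = state.result as User
                state.user = user                  // spill: needed after the next suspension
                state.label = 2
                val r = fetchProfile(user.id, state)
                if (r === SUSPENDED) return SUSPENDED
                state.result = r
            }
            2 -> return UserData(state.user!!, state.result as Profile)
            else -> error("unexpected label ${state.label}")
        }
    }
}

fun runFetchUserData(): UserData? {
    var delivered: UserData? = null
    fetchUserData { delivered = it as UserData }   // returns SUSPENDED immediately
    while (pendingReplies.isNotEmpty()) pendingReplies.removeFirst().invoke()
    return delivered
}
```

Calling runFetchUserData() returns UserData(User(7), Profile("bio of 7")): the function body ran in three separate invocations, with user surviving between them only because it was copied into the state object.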
The six stage pipeline: From suspend to state machine
The transformation happens across six distinct phases in the JVM backend. Understanding the full pipeline is essential to understanding why each phase exists and what it contributes.
1. SuspendLambdaLowering: Converts suspend lambda expressions into anonymous continuation classes
2. TailCallOptimizationLowering: Identifies suspend calls in tail position and marks them with IrReturn wrappers
3. AddContinuationLowering: The central IR lowering, generates continuation classes, injects $completion parameters, creates static suspend implementations
4. Code generation: Lowers IR to JVM bytecode, placing BeforeSuspendMarker/AfterSuspendMarker instructions around each suspension point
5. CoroutineTransformerMethodVisitor: The bytecode level state machine engine, inserts the TABLESWITCH, spills variables, generates resume paths
6. Tail call optimization check: If all suspension points are tail calls, the state machine is skipped entirely
Let's trace through each phase.
CPS transformation: The invisible parameter
The foundation of coroutine compilation is Continuation Passing Style (CPS) transformation. Every suspend function, when compiled, receives a hidden additional parameter: the continuation. This continuation represents "what happens next" after the function completes or suspends.
When you write:
suspend fun fetchUser(): User {
// ...
}
The compiler transforms the signature to:
fun fetchUser($completion: Continuation<User>?): Any?
Two changes happen. First, a $completion parameter of type Continuation is appended. Second, the return type becomes Any?, because the function can now return either the actual result or the special sentinel COROUTINE_SUSPENDED, indicating that the function has paused and will deliver its result later through the continuation.
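The two possible return values can be observed from ordinary Kotlin through the low-level intrinsics in kotlin.coroutines.intrinsics. The following is a small sketch, assuming only the standard library; cachedName, remoteName, Recorder, and observeReturnValues are hypothetical names introduced for the example.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.intrinsics.COROUTINE_SUSPENDED
import kotlin.coroutines.intrinsics.startCoroutineUninterceptedOrReturn
import kotlin.coroutines.intrinsics.suspendCoroutineUninterceptedOrReturn
import kotlin.coroutines.resume

var parked: Continuation<String>? = null

// Completes without suspending: the compiled function returns the value itself.
suspend fun cachedName(): String = "cached"

// Always suspends: stores the continuation and returns the sentinel.
suspend fun remoteName(): String = suspendCoroutineUninterceptedOrReturn { cont ->
    parked = cont
    COROUTINE_SUSPENDED
}

// A completion that simply records what it is resumed with.
class Recorder : Continuation<String> {
    var delivered: String? = null
    override val context: CoroutineContext = EmptyCoroutineContext
    override fun resumeWith(result: Result<String>) { delivered = result.getOrThrow() }
}

fun observeReturnValues(): Triple<Any?, Boolean, String?> {
    val fast: suspend () -> String = { cachedName() }
    val slow: suspend () -> String = { remoteName() }
    val recorder = Recorder()
    val immediate = fast.startCoroutineUninterceptedOrReturn(Recorder()) // the actual result
    val marker = slow.startCoroutineUninterceptedOrReturn(recorder)      // the sentinel
    parked!!.resume("late")                     // the result arrives through the continuation
    return Triple(immediate, marker === COROUTINE_SUSPENDED, recorder.delivered)
}
```

observeReturnValues() yields Triple("cached", true, "late"): the first call hands back its result directly, while the second returns COROUTINE_SUSPENDED and delivers "late" afterwards via resumeWith.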
Looking at how AddContinuationLowering performs this injection:
val continuationParameter = buildValueParameter(function) {
kind = IrParameterKind.Regular
name = Name.identifier(SUSPEND_FUNCTION_COMPLETION_PARAMETER_NAME) // "$completion"
type = continuationType(context).substitute(substitutionMap) // Continuation<RetType>?
origin = JvmLoweredDeclarationOrigin.CONTINUATION_CLASS
}
The parameter is inserted before any default argument masks but after all regular parameters. This is invisible in source code but always present in the bytecode. Every call site of a suspend function is also rewritten to pass the current continuation as this extra argument.
The continuation class: Where state lives
The central artifact of coroutine compilation is the continuation class. For each named suspend function, the compiler generates an inner class that extends ContinuationImpl and holds all the state needed to suspend and resume.
Looking at generateContinuationClassForNamedFunction in AddContinuationLowering.kt:
context.irFactory.buildClass {
name = Name.special("<Continuation>")
origin = JvmLoweredDeclarationOrigin.CONTINUATION_CLASS
}.apply {
superTypes += context.symbols.continuationImplClass.owner.defaultType
val resultField = addField(CONTINUATION_RESULT_FIELD_NAME, ...) // "result"
val labelField = addField(COROUTINE_LABEL_FIELD_NAME, ...) // "label"
val capturedThisField = ... // captures outer `this` for instance methods
addConstructorForNamedFunction(capturedThisField, ...)
addInvokeSuspendForNamedFunction(irFunction, resultField, labelField, ...)
}
The generated class has three essential fields:
- label: Int: The state machine index, tracking which segment of the function body to execute next
- result: Any?: Holds the value passed to resumeWith when the coroutine resumes
- this$0 (optional): Captures the dispatch receiver for instance methods
Additionally, spilled local variable fields (L$0, L$1, I$0, etc.) are added later during bytecode transformation. These hold local variables that must survive across suspension points.
The invokeSuspend method: The re entry point
The continuation class overrides invokeSuspend, which the coroutine runtime calls when a suspended coroutine is resumed. This method stores the resume value, sets the sign bit on the label, and calls back into the original function:
override fun invokeSuspend(result: Result<Any?>): Any? {
this.result = result
this.label = this.label or (1 shl 31) // SET the sign bit
return foo(this) // re enter the function with `this` as the continuation
}
The sign bit trick is worth examining closely.
The sign bit trick: Distinguishing fresh calls from resumptions
When a suspend function receives a continuation as $completion, it needs to answer an important question: "Is this a fresh call, or am I being resumed from a previous suspension?" The answer determines whether to start from the beginning or jump to the saved state.
There are three scenarios:
1. Direct call from another suspend function: A fresh call with a caller provided continuation
2. Resume via resumeWith: The runtime calls invokeSuspend, which re enters the function with the continuation object itself as $completion
3. Recursive call: The function calls itself, passing a continuation of the same type
To distinguish case 1 from cases 2 and 3, the compiler uses an INSTANCEOF check. If $completion is an instance of the function's own continuation class, it might be a resume or a recursive call. To distinguish case 2 from case 3, the compiler uses the sign bit of the label field:
val signBit = 1 shl 31 // 0x80000000
+irSetField(
irGet(function.dispatchReceiverParameter!!), labelField,
irCallOp(
context.irBuiltIns.intClass.functions.single {
it.owner.name == OperatorNameConventions.OR
},
context.irBuiltIns.intType,
irGetField(irGet(function.dispatchReceiverParameter!!), labelField),
irInt(signBit) // label = label | 0x80000000
)
)
When invokeSuspend is called (the resume path), it ORs 0x80000000 into the label. The function entry prelude then checks this bit: if set, this is a genuine resume. If not set, even though the continuation passes the INSTANCEOF check, it's a recursive call and should be treated as fresh.
The entry prelude, generated by prepareMethodNodePreludeForNamedFunction in the bytecode transformer, implements this logic in three stages.
First, it checks whether the incoming $completion is an instance of this function's own continuation class:
ALOAD $completion
INSTANCEOF Foo$1 // Is it our continuation class?
IFEQ createNewContinuation // No -> fresh call
If the INSTANCEOF check passes, the prelude casts it and inspects the sign bit of label. This is where the distinction between resume and recursive call happens:
ALOAD $completion
CHECKCAST Foo$1
ASTORE $continuation
ALOAD $continuation
GETFIELD label
ICONST 0x80000000
IAND // label & 0x80000000
IFEQ createNewContinuation // Sign bit not set -> recursive call, treat as fresh
If the sign bit is set, this is a genuine resume. The prelude clears the bit (restoring the original label value) and jumps to the state machine dispatch:
ALOAD $continuation
DUP
GETFIELD label
ICONST 0x80000000
ISUB // label - 0x80000000 (clears the sign bit)
PUTFIELD label
GOTO afterCreate
If either check fails (not our class, or no sign bit), the prelude allocates a fresh continuation and loads the resume value:
createNewContinuation:
NEW Foo$1
DUP
ALOAD this
ALOAD $completion
INVOKESPECIAL Foo$1.<init>
ASTORE $continuation
afterCreate:
ALOAD $continuation
GETFIELD result
ASTORE $result // Load the resume value into $result local
This is cool. A single bit in the label integer serves as a flag that perfectly disambiguates three different calling scenarios, without requiring an additional field or any runtime cost beyond a bitwise AND.
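The bit arithmetic itself is easy to check in plain Kotlin. The helpers below are illustrative only (their names do not come from the compiler); they mirror the OR in invokeSuspend and the IAND and ISUB in the prelude.

```kotlin
// Illustrative helpers; the names are not taken from the compiler.
const val SIGN_BIT: Int = Int.MIN_VALUE   // same bit pattern as 1 shl 31, i.e. 0x80000000

// What invokeSuspend does before re-entering the function.
fun markAsResumed(label: Int): Int = label or SIGN_BIT

// The prelude's IAND test.
fun isResumption(label: Int): Boolean = (label and SIGN_BIT) != 0

// The prelude's ISUB: subtracting Int.MIN_VALUE wraps around and clears the top bit.
fun clearResumeMark(label: Int): Int = label - SIGN_BIT

fun classify(label: Int): String =
    if (isResumption(label)) "resume at state ${clearResumeMark(label)}"
    else "fresh or recursive call"
```

For example, classify(markAsResumed(2)) gives "resume at state 2", while classify(2) gives "fresh or recursive call". The marked label is negative, which is why it can never collide with a real state index.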
The static suspend implementation: Avoiding virtual dispatch
For overridable suspend functions (non final, non private), the compiler creates a static implementation method and rewrites the original to delegate:
private fun createStaticSuspendImpl(irFunction: IrSimpleFunction): IrSimpleFunction {
val static = createStaticFunctionWithReceivers(
irFunction.parent,
irFunction.name.toSuspendImplementationName(), // "foo$suspendImpl"
irFunction,
origin = JvmLoweredDeclarationOrigin.SUSPEND_IMPL_STATIC_FUNCTION,
)
static.body = irFunction.moveBodyTo(static)
// Original method becomes a simple forwarder:
irFunction.body = irBuilder.irBlockBody {
+irReturn(irCall(static).also {
it.arguments.assignFrom(irFunction.parameters, ::irGet)
})
}
return static
}
The state machine lives in the static foo$suspendImpl method. The original virtual method simply delegates. This prevents an important problem: if a subclass overrides foo, the resumed continuation must call back into the original implementation's state machine, not the subclass's override. Static dispatch guarantees this.
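The difference between the two dispatch styles can be modelled without any coroutines at all. This is a plain Kotlin analogy under stated assumptions, not generated code: Service, LoggingService, fooSuspendImpl, and the two resume helpers are hypothetical names, and the trace list only records which bodies run.

```kotlin
open class Service {
    // Thin virtual forwarder, like the rewritten original method.
    open fun foo(trace: MutableList<String>) {
        fooSuspendImpl(this, trace)
    }

    companion object {
        // Static body holding the "state machine" (foo$suspendImpl in the article).
        fun fooSuspendImpl(self: Service, trace: MutableList<String>) {
            trace += "Service.foo body"
        }
    }
}

class LoggingService : Service() {
    override fun foo(trace: MutableList<String>) {
        trace += "LoggingService.foo prologue"
        super.foo(trace)
    }
}

// Re-entering through the virtual method lands in the override again.
fun resumeVirtually(target: Service): List<String> =
    mutableListOf<String>().also { target.foo(it) }

// Re-entering through the static method goes straight to the right body.
fun resumeStatically(target: Service): List<String> =
    mutableListOf<String>().also { Service.fooSuspendImpl(target, it) }
```

With a LoggingService instance, resumeVirtually produces ["LoggingService.foo prologue", "Service.foo body"], so the override's code would run again on every resume, whereas resumeStatically produces just ["Service.foo body"].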
Suspend lambda transformation: Anonymous continuation classes
Suspend lambdas follow a different path through SuspendLambdaLowering. Each suspend lambda becomes an anonymous class extending SuspendLambda:
val suspendLambda =
if (reference.isRestrictedSuspension)
context.symbols.restrictedSuspendLambdaClass.owner
else
context.symbols.suspendLambdaClass.owner
// The class extends both SuspendLambda and FunctionN+1
superTypes = listOf(suspendLambda.defaultType, functionNType)
Lambda parameters are stored as fields using a naming convention based on their JVM type descriptor: L$0, L$1 for reference types, I$0 for ints, J$0 for longs, and so on. This naming convention matters because the bytecode transformer uses it to allocate spill fields without collisions.
For lambdas with arity 0 or 1, the compiler generates a create(completion) factory method. The constructor initially passes null for the completion parameter:
+irCall(continuation.constructors.single().symbol).apply {
arguments[0] = irNull() // completion = null initially
}
The actual completion is provided later through create(completion) or the invoke override. This separation allows the same lambda class to be instantiated once and invoked multiple times with different completions.
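From the caller's side, this shows up as the ability to start one suspend lambda value several times, each time with its own completion. A minimal sketch using the standard library's startCoroutine and Continuation factory (runTwice and the log messages are made up for the example):

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

fun runTwice(): List<String> {
    val log = mutableListOf<String>()
    val block: suspend () -> Int = { 21 * 2 }   // one lambda object

    // Each start supplies a different completion to the same lambda.
    block.startCoroutine(Continuation<Int>(EmptyCoroutineContext) { r ->
        log += "first completion got ${r.getOrThrow()}"
    })
    block.startCoroutine(Continuation<Int>(EmptyCoroutineContext) { r ->
        log += "second completion got ${r.getOrThrow()}"
    })
    return log
}
```

runTwice() returns ["first completion got 42", "second completion got 42"], one entry per completion.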
The bytecode transformer: Where the state machine is born
After IR lowering and code generation, each suspend function's bytecode still looks mostly linear, with synthetic BeforeSuspendMarker and AfterSuspendMarker instructions bracketing each suspension point. The CoroutineTransformerMethodVisitor is where these markers are consumed and the actual state machine is assembled.
This is the most complex piece of coroutine compilation. It operates on ASM MethodNode trees and performs the transformation in a carefully ordered sequence.
The transformation pipeline
Looking at performTransformations, the main driver method, the pipeline begins with cleanup passes that normalize the bytecode left over from IR code generation:
override fun performTransformations(methodNode: MethodNode) {
removeFakeContinuationConstructorCall(methodNode) // 1. Strip IR placeholders
replaceReturnsUnitMarkersWithPushingUnitOnStack(methodNode) // 2. Insert actual Unit pushes
replaceFakeContinuationsWithRealOnes(methodNode) // 3. Replace ACONST_NULL with real loads
FixStackMethodTransformer().transform(...) // 4. Fix stack shape from inlining
Next, it identifies the suspension points and performs optimization passes:
val suspensionPoints = collectSuspensionPoints(methodNode) // 5. Find all marker pairs
RedundantLocalsEliminationMethodTransformer(suspensionPoints).transform(...) // 6. Dead code
ChangeBoxingMethodTransformer.transform(...) // 6. Boxing cleanup
checkForSuspensionPointInsideMonitor(methodNode, suspensionPoints) // 7. Illegal suspend check
At this point, the transformer checks whether the full state machine can be skipped entirely. If every suspension point is a tail call, it takes the fast path:
if (isForNamedFunction &&
methodNode.allSuspensionPointsAreTailCalls(suspensionPoints, ...)) {
methodNode.addCoroutineSuspendedChecks(suspensionPoints)
dropSuspensionMarkers(methodNode)
return // NO state machine needed
}
If the fast path doesn't apply, the transformer builds the full state machine. The remaining steps happen in order, each depending on the previous:
prepareMethodNodePreludeForNamedFunction(methodNode) // 8. Entry prelude
for (point in suspensionPoints) {
splitTryCatchBlocksContainingSuspensionPoint(methodNode, point) // 9. Split try-catch
}
spillVariables(suspensionPoints, methodNode) // 10. Spill variables
val stateLabels = suspensionPoints.withIndex().map {
transformCallAndReturnStateLabel(it.index + 1, it.value, methodNode, ...) // 11. Per-point logic
}
generateStateMachinesTableswitch(methodNode, ..., suspensionPoints, stateLabels) // 12. TABLESWITCH
dropSuspensionMarkers(methodNode) // 13. Cleanup
}
Each step has a clear purpose. Let's examine the most important ones.
Try-catch splitting: Exception handling across suspension
Step 9 splits try-catch blocks around suspension points. The problem is that a single try-catch block in your source code might span multiple suspension points:
suspend fun riskyOperation(): String {
try {
val a = fetchA() // suspension point 1
val b = fetchB(a) // suspension point 2
return process(a, b)
} catch (e: Exception) {
return "fallback"
}
}
In the JVM bytecode, a try-catch block is defined by a start label, an end label, and a handler label. But when the function suspends and returns COROUTINE_SUSPENDED, execution leaves the try-catch scope entirely. When it resumes at a later state label, it re enters the method at the TABLESWITCH, which is outside the original try-catch range.
The transformer solves this by splitting each try-catch block that contains a suspension point into multiple blocks: one for the code before the suspension, and one for the resume path after. Each resume label gets its own try-catch entry that points to the same handler. This ensures that exceptions thrown during resumption (for example, if resumeWith delivers a failure result) are still caught by the original handler.
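The observable effect is that a failure delivered on resume still reaches the source-level catch. The sketch below demonstrates this with the standard library only; pendingFetch, fetchA (here a stand-in that just parks its caller), and resumeWithFailure are hypothetical names for the example.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resumeWithException
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

// Stand-in for a network call: parks the caller until someone resumes it.
var pendingFetch: Continuation<String>? = null
suspend fun fetchA(): String = suspendCoroutine { cont -> pendingFetch = cont }

suspend fun riskyOperation(): String =
    try {
        fetchA().uppercase()          // suspension point inside the try block
    } catch (e: IllegalStateException) {
        "fallback"
    }

fun resumeWithFailure(): String? {
    var outcome: String? = null
    val block: suspend () -> String = { riskyOperation() }
    block.startCoroutine(Continuation<String>(EmptyCoroutineContext) {
        outcome = it.getOrThrow()
    })
    // riskyOperation has returned to us and is suspended; now deliver a failure.
    pendingFetch!!.resumeWithException(IllegalStateException("network down"))
    return outcome
}
```

resumeWithFailure() returns "fallback": the exception is rethrown at the resume point, and because that point is covered by its own try-catch entry, the original handler runs.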
The checkForSuspensionPointInsideMonitor step (step 7) is related but different: it detects suspend calls inside synchronized blocks and reports an error. JVM monitors are owned by a thread, so suspending inside one would return from the method with the lock still held, and the coroutine could later resume on a different thread that does not own the monitor. The compiler rejects this at compile time rather than letting it fail at runtime.
Suspension point collection: Finding the boundaries
Before building the state machine, the transformer must identify where suspension points are. During code generation, each call to a suspend function is bracketed by synthetic marker instructions:
ICONST_0
INVOKESTATIC InlineMarker.mark() // BeforeSuspendMarker
... actual suspend call ...
ICONST_1
INVOKESTATIC InlineMarker.mark() // AfterSuspendMarker
The collectSuspensionPoints method walks the bytecode, identifies each BeforeSuspendMarker/AfterSuspendMarker pair, and constructs a SuspensionPoint object:
private fun collectSuspensionPoints(methodNode: MethodNode): List<SuspensionPoint> {
val cfg = ControlFlowGraph.build(methodNode, followExceptions = false)
return methodNode.instructions.filter { isBeforeSuspendMarker(it) }
.mapNotNull { start ->
val ends = mutableSetOf<AbstractInsnNode>()
collectSuspensionPointEnds(start, mutableSetOf(), ends)
val end = ends.find { isAfterSuspendMarker(it) } ?: return@mapNotNull null
SuspensionPoint(start.previous, end)
}.toList()
}
Each SuspensionPoint carries a stateLabel, the LabelNode that the TABLESWITCH will jump to when resuming at that point.
Variable spilling: Saving locals across suspension
When a function suspends, its JVM stack frame is destroyed (the function returns COROUTINE_SUSPENDED). Any local variables that are needed after resumption must be saved somewhere persistent. The compiler saves them into fields of the continuation object, a process called "spilling."
The spillVariables method performs liveness analysis to determine which variables are alive at each suspension point, then generates save and restore bytecode:
private fun spillVariables(suspensionPoints, methodNode) {
val frames = performSpilledVariableFieldTypesAnalysis(...)
val livenessFrames = analyzeLiveness(methodNode)
for (suspension in suspensionPoints) {
val variablesToSpill = calculateVariablesToSpill(...)
// Partition: references need nulling after spill to avoid GC leaks
val (references, primitives) = variablesToSpill.partition {
it.normalizedType == OBJECT_TYPE
}
for (variable in references + primitives) {
generateSpillAndUnspill(methodNode, suspension, variable, ...)
}
}
}
For each live variable, the transformer inserts:
Before the suspension point (spill):
ALOAD $continuation
ALOAD localVar // or ILOAD, LLOAD, etc.
PUTFIELD Foo$1.L$0 // save to continuation field
After the resume label (unspill):
ALOAD $continuation
GETFIELD Foo$1.L$0 // restore from continuation field
ASTORE localVar
Fields are named by type and index: L$0, L$1 for object references, I$0 for ints, J$0 for longs, D$0 for doubles. The compiler only promotes variables that are live across suspension points. Variables used entirely within a single state remain as normal stack allocated locals.
The important observation: reference type variables are nulled out in the continuation after being restored. This prevents the continuation from holding strong references to objects that the function has already finished using, which would otherwise cause memory leaks if the coroutine remains suspended for a long time.
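The "which variables must be spilled" question is a liveness question: a variable needs a field only if it is read after the suspension point. As a rough illustration, here is a backward liveness pass over straight-line code. It is a deliberately simplified model (no branches, no types, no JVM frames), and Stmt, variablesToSpill, and loadDataBody are names invented for this sketch; the real transformer analyzes bytecode frames over the control flow graph.

```kotlin
// One statement: the variable it defines (if any) and the variables it reads.
data class Stmt(val defines: String?, val uses: Set<String>, val suspends: Boolean = false)

// For each suspending statement, the variables that are live after it and
// therefore have to survive the suspension. The variable defined by the call
// itself is excluded: its value arrives through $result on resume.
fun variablesToSpill(body: List<Stmt>): Map<Int, Set<String>> {
    val spills = mutableMapOf<Int, Set<String>>()
    var live = emptySet<String>()                 // nothing is live after the last statement
    for (i in body.indices.reversed()) {
        val stmt = body[i]
        val liveAfterMinusDef = live - setOfNotNull(stmt.defines)
        if (stmt.suspends) spills[i] = liveAfterMinusDef
        live = liveAfterMinusDef + stmt.uses      // live before this statement
    }
    return spills
}

// A model of: val token = authenticate(); val data = fetch(id, token); return process(data)
val loadDataBody = listOf(
    Stmt(defines = "token", uses = emptySet(), suspends = true),
    Stmt(defines = "data", uses = setOf("id", "token"), suspends = true),
    Stmt(defines = null, uses = setOf("data"), suspends = true),
)
```

For loadDataBody the result is {0=[id], 1=[], 2=[]}: only id has to be saved, and only across the first suspension, because token and data each arrive as a resume value and nothing else is read later.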
The TABLESWITCH: State machine dispatch
The final piece of the state machine is the dispatch mechanism at the function entry. The generateStateMachinesTableswitch method inserts a TABLESWITCH instruction that reads the label field and jumps to the correct resume point.
First, it caches the COROUTINE_SUSPENDED sentinel and loads the current label:
methodNode.instructions.insertBefore(actualCoroutineStart, insnListOf(
*withInstructionAdapter { loadCoroutineSuspendedMarker() }.toArray(),
VarInsnNode(ASTORE, suspendMarkerVarIndex), // cache the sentinel in a local
VarInsnNode(ALOAD, continuationIndex),
*withInstructionAdapter { getLabel() }.toArray(), // GETFIELD label
Then it inserts the TABLESWITCH with one case per state:
TableSwitchInsnNode(
0, // min = 0 (initial call)
suspensionPoints.size, // max = N
defaultLabel, // default: throw IllegalStateException
firstStateLabel, // case 0: initial entry
*stateLabels.toTypedArray() // case 1..N: resume points
),
firstStateLabel
))
The default case catches illegal states, for example if a continuation is resumed more than once:
methodNode.instructions.insert(last, withInstructionAdapter {
AsmUtil.genThrow(
this,
"java/lang/IllegalStateException",
ILLEGAL_STATE_ERROR_MESSAGE // "call to 'resume' before 'invoke' with coroutine"
)
})
State 0 is the initial entry point (the function is being called for the first time). States 1 through N correspond to the resume points after each suspension.
The COROUTINE_SUSPENDED sentinel is loaded once and stored in a local variable ($suspendMarker) at the very top of the method. This avoids repeated static method calls to getCOROUTINE_SUSPENDED() at each suspension point check.
Each suspension point: Set label, check, return
For each suspension point, transformCallAndReturnStateLabel inserts three pieces of logic.
First, before the suspend call, it saves the current state by writing the suspension point's ID into the label field:
insertBefore(suspension.suspensionCallBegin, withInstructionAdapter {
visitVarInsn(ALOAD, continuationIndex)
iconst(id)
setLabel() // PUTFIELD label = id
})
After the suspend call returns, it checks whether the function actually suspended. If the return value is COROUTINE_SUSPENDED, it propagates the sentinel up the call stack. The resume label (where the TABLESWITCH jumps on re entry) is placed immediately after:
insert(suspension.tryCatchBlockEndLabelAfterSuspensionCall, withInstructionAdapter {
dup()
load(suspendMarkerVarIndex, OBJECT_TYPE) // load COROUTINE_SUSPENDED
ifacmpne(continuationLabel) // not suspended? skip
load(suspendMarkerVarIndex, OBJECT_TYPE)
areturn(OBJECT_TYPE) // return COROUTINE_SUSPENDED to caller
visitLabel(suspension.stateLabel.label) // resume label (TABLESWITCH target)
})
At the resume label, the transformer emits an exception check (in case resumeWith was called with a failure) and loads $result onto the stack as if the suspend call had returned normally:
insert(possibleTryCatchBlockStart, withInstructionAdapter {
generateResumeWithExceptionCheck(dataIndex) // ResultKt.throwOnFailure($result)
load(dataIndex, OBJECT_TYPE) // push $result as the "return value"
})
The pattern is consistent for every suspension point:
1. Set label = id so the TABLESWITCH knows where to jump on resume
2. Make the actual suspend call (passing the continuation)
3. Check if the return value is COROUTINE_SUSPENDED; if yes, propagate it upward
4. If the call completed synchronously (fast path), continue to the next instruction
5. At the resume label, call throwOnFailure to propagate exceptions from resumeWith, then load $result onto the stack as if the suspend call had returned normally
The fast path (step 4) is important. If a suspend function completes without actually suspending (for example, returning a cached value), execution continues without any suspension machinery overhead. No state is saved, no thread switch happens, no dispatch is needed. This makes the common case of synchronous completion extremely cheap.
Tail call optimization: Skipping the state machine entirely
Not every suspend function needs a full state machine. If every suspension point in a function is a tail call (meaning the suspend call's return value is immediately returned), the compiler can skip the entire state machine and emit a much simpler form.
The optimization happens at two levels. First, at the IR level, TailCallOptimizationLowering identifies tail position suspend calls:
override fun visitCall(expression: IrCall, data: TailCallOptimizationData?): IrExpression {
val transformed = super.visitCall(expression, data) as IrExpression
return if (data == null || expression !in data.tailCalls) transformed
else IrReturnImpl(
data.function.endOffset, data.function.endOffset,
context.irBuiltIns.nothingType,
data.function.symbol,
if (data.returnsUnit) transformed.coerceToUnit() else transformed
)
}
Then, at the bytecode level, allSuspensionPointsAreTailCalls in TailCallOptimization.kt verifies the optimization is safe by performing control flow analysis:
fun MethodNode.allSuspensionPointsAreTailCalls(suspensionPoints, ...): Boolean {
val frames = MethodTransformer.analyze("fake", this, TcoInterpreter(suspensionPoints))
return suspensionPoints.all { suspensionPoint ->
// Must not be inside a try-catch block
tryCatchBlocks.all { index < it.start || it.end <= index } &&
// Only ARETURN (or POP + Unit + ARETURN) allowed after the call
suspensionPoint.suspensionCallEnd.transitiveSuccessorsAreSafeOrReturns(...)
}
}
If the check passes, instead of building a full state machine with TABLESWITCH, spilling, and continuation class instantiation, the transformer simply inserts a COROUTINE_SUSPENDED check after each call:
fun MethodNode.addCoroutineSuspendedChecks(suspensionPoints) {
for (suspensionPoint in suspensionPoints) {
if (suspensionPoint.suspensionCallEnd.nextMeaningful?.opcode == ARETURN) continue
instructions.insert(suspensionPoint.suspensionCallEnd, withInstructionAdapter {
dup()
loadCoroutineSuspendedMarker()
ifacmpne(label)
areturn(OBJECT_TYPE) // propagate COROUTINE_SUSPENDED
mark(label)
})
}
}
This is a significant optimization. A tail call optimized suspend function has no continuation class allocation, no field spilling, no TABLESWITCH. It's nearly as cheap as a regular function call with one additional reference comparison per suspension point.
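In CPS terms, a tail call optimized function just forwards its caller's continuation to the callee and returns whatever comes back. The toy model below illustrates that shape; SUSPENDED_MARK, Caller, fetchCps, and lookupCps are illustrative stand-ins, not compiler or library names.

```kotlin
// Toy CPS model; SUSPENDED_MARK and Caller are illustrative stand-ins.
object SUSPENDED_MARK
fun interface Caller { fun resume(value: Any?) }

var parkedCaller: Caller? = null

// A callee that either answers at once or parks the continuation it was given.
fun fetchCps(key: String, completion: Caller): Any? =
    if (key == "cached") {
        "hit"
    } else {
        parkedCaller = completion
        SUSPENDED_MARK
    }

// Tail position: no label, no spill fields, no continuation object of its own.
// The caller's continuation is passed straight through.
fun lookupCps(key: String, completion: Caller): Any? = fetchCps(key, completion)

fun demoTailCall(): List<Any?> {
    val delivered = mutableListOf<Any?>()
    val caller = Caller { delivered += it }
    val fast = lookupCps("cached", caller)     // value returned directly
    val slow = lookupCps("remote", caller)     // sentinel propagated upward
    parkedCaller!!.resume("late")              // resumes the original caller, bypassing lookupCps
    return listOf(fast, slow === SUSPENDED_MARK, parkedCaller === caller, delivered.single())
}
```

demoTailCall() returns ["hit", true, true, "late"]. The parked continuation is the caller's own object, so on resume lookupCps is never re-entered, which is why it needs no state of its own.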
The bridge: IR codegen to bytecode transformation
The connection between IR code generation and the bytecode level transformer happens in CoroutineCodegen.kt. The acceptWithStateMachine extension function wraps the generated MethodNode in a CoroutineTransformerMethodVisitor:
internal fun MethodNode.acceptWithStateMachine(
irFunction: IrFunction,
classCodegen: ClassCodegen,
methodVisitor: MethodVisitor,
varsCountByType: Map<Type, Int>,
obtainContinuationClassBuilder: () -> ClassBuilder,
) {
val visitor = CoroutineTransformerMethodVisitor(
methodVisitor, access, name, desc,
containingClassInternalName = classCodegen.type.internalName,
obtainClassBuilderForCoroutineState = obtainContinuationClassBuilder,
isForNamedFunction = irFunction.isSuspend,
needDispatchReceiver = irFunction.isSuspend &&
(irFunction.dispatchReceiverParameter != null || ...),
initialVarsCountByType = varsCountByType,
)
accept(visitor)
}
The hasContinuation() predicate in JvmIrCoroutineUtils.kt gates which functions go through this path:
fun IrFunction.hasContinuation(): Boolean =
isInvokeSuspendOfLambda() ||
isSuspend && shouldContainSuspendMarkers() &&
!isEffectivelyInlineOnly() &&
origin != IrDeclarationOrigin.INLINE_LAMBDA &&
origin != JvmLoweredDeclarationOrigin.FOR_INLINE_STATE_MACHINE_TEMPLATE
Functions that are effectively inline, or that serve as templates for the inliner, skip the state machine because their code will be transplanted into the caller's state machine instead.
The complete picture: Tracing a suspend function
Let's trace the complete transformation of a concrete function to see all the pieces working together:
suspend fun loadData(id: Int): String {
val token = authenticate() // suspension point 1
val data = fetch(id, token) // suspension point 2
return process(data) // suspension point 3 (tail call)
}
After AddContinuationLowering (IR level)
The function signature becomes:
fun loadData(id: Int, $completion: Continuation<String>?): Any?
A continuation class is generated:
class LoadData$1(
var I$0: Int, // spill field for `id` (added later by the bytecode transformer)
var result: Any?,
var label: Int,
completion: Continuation<*>?
) : ContinuationImpl(completion) {
override fun invokeSuspend(result: Result<Any?>): Any? {
this.result = result
this.label = this.label or Int.MIN_VALUE // set sign bit (0x80000000)
return loadData(0, this) // re enter
}
}
After bytecode transformation
The final bytecode transformation happens in several layers. Let's walk through each one.
The method begins with the prelude: the INSTANCEOF check, the sign bit check, and continuation creation or reuse, exactly as described in the sign bit trick section. After the prelude, the COROUTINE_SUSPENDED sentinel is loaded once and cached, and the TABLESWITCH dispatches based on label:
INVOKESTATIC getCOROUTINE_SUSPENDED; ASTORE $suspended
ALOAD $cont; GETFIELD label
TABLESWITCH 0..3:
0 -> state_0
1 -> state_1
2 -> state_2
3 -> state_3
default -> throw IllegalStateException
State 0 is the initial entry. The compiler spills id into the continuation (because it's needed after the first suspension), sets label = 1, and calls authenticate. If the call returns COROUTINE_SUSPENDED, the function returns immediately, releasing the thread:
state_0:
ALOAD $cont; ILOAD id; PUTFIELD I$0 // spill id
ALOAD $cont; ICONST 1; PUTFIELD label // set next state
ALOAD $cont; INVOKEVIRTUAL authenticate // suspend call
DUP; ALOAD $suspended; IF_ACMPNE -> state_0_continue
ARETURN // suspended: release the thread
State 1 is the resume point after authenticate() completes. The transformer first checks for exceptions (if resumeWith was called with a failure), then unspills id from the continuation and stores the result as token:
state_1:
ALOAD $result; INVOKESTATIC throwOnFailure // throw if failure
ALOAD $cont; GETFIELD I$0; ISTORE id // unspill id
ALOAD $result // push authenticate's result, as if the call had returned it
state_0_continue: // fast path and resume path join here, result on the stack
ASTORE token // authenticate's result
Execution then prepares for the second suspension. No variables need to be spilled here: id and token are consumed as arguments to fetch, and neither is referenced after the call returns. The only value needed after resumption is data, which arrives through $result:
ALOAD $cont; ICONST 2; PUTFIELD label // set next state
ILOAD id; ALOAD token; ALOAD $cont; INVOKEVIRTUAL fetch
DUP; ALOAD $suspended; IF_ACMPNE -> state_1_continue
ARETURN // suspended
State 2 resumes after fetch(). The result becomes data:
state_2:
ALOAD $result; INVOKESTATIC throwOnFailure
ALOAD $result // push fetch's result
state_1_continue: // fast path and resume path join here, result on the stack
ASTORE data
The final call to process(data) is a tail call. The compiler still sets label = 3 and checks for COROUTINE_SUSPENDED, but no spilling is needed because nothing follows the call:
ALOAD $cont; ICONST 3; PUTFIELD label
ALOAD data; ALOAD $cont; INVOKEVIRTUAL process
DUP; ALOAD $suspended; IF_ACMPNE -> state_2_continue
ARETURN
state_3:
ALOAD $result; INVOKESTATIC throwOnFailure
ALOAD $result
state_2_continue:
ARETURN // return the final result
This is the complete transformation. The sequential three line function has become a state machine with four states, field spilling for one local variable (id, live across the first suspension), exception checking at each resume point, and a TABLESWITCH dispatch at the entry.
The JS and Wasm backends: A different approach
The JVM backend performs the state machine transformation at the bytecode level using ASM tree manipulation. The JS and Wasm backends take a fundamentally different approach: they build the state machine entirely in IR.
AbstractSuspendFunctionsLowering provides the common framework:
abstract class AbstractSuspendFunctionsLowering<C : CommonBackendContext>(val context: C) {
protected abstract val stateMachineMethodName: Name
protected abstract fun buildStateMachine(
stateMachineFunction: IrFunction,
transformingFunction: IrFunction,
argumentToPropertiesMap: Map<IrValueParameter, IrField>,
)
}
The JS backend's StateMachineBuilder creates SuspendState nodes directly in the IR tree, where each state represents an atomic block of code between two suspension points:
class SuspendState(type: IrType) {
val entryBlock: IrContainerExpression = JsIrBuilder.buildComposite(type)
val successors = mutableSetOf<SuspendState>()
var id = -1
}
The JS backend also classifies suspend functions into three categories before deciding what to generate:
sealed class SuspendFunctionKind {
object NO_SUSPEND_CALLS : SuspendFunctionKind()
class DELEGATING(val delegatingCall: IrCall) : SuspendFunctionKind()
object NEEDS_STATE_MACHINE : SuspendFunctionKind()
}
Functions with no suspend calls become plain functions. Functions with a single tail position suspend call become simple delegations. Only functions that genuinely need a state machine get one. This classification avoids unnecessary overhead in the generated JavaScript.
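As a rough illustration of the three shapes, the functions below are written so that each would plausibly land in a different category according to the description above. The function names and the runNow driver are hypothetical, the code is ordinary common Kotlin rather than anything JS specific, and the category comments reflect the classification rules as stated here, not a check against the compiler.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// No suspend calls in the body: would fall under NO_SUSPEND_CALLS.
suspend fun cachedAnswer(): Int = 42

// A single suspend call in tail position: would fall under DELEGATING.
suspend fun answer(): Int = cachedAnswer()

// Work happens after a suspend call returns: would fall under NEEDS_STATE_MACHINE.
suspend fun answerPlusOne(): Int {
    val base = answer()
    return base + 1
}

// Tiny driver (hypothetical helper). It is only valid here because none of the
// functions above ever actually suspends, so the coroutine finishes synchronously.
fun <T> runNow(block: suspend () -> T): T {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return outcome!!.getOrThrow()
}

fun main() {
    println(runNow { answerPlusOne() }) // 43
}
```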
Inline suspend functions: Transplanting state machines
Inline suspend functions follow yet another path. When a call to an inline suspend function is inlined, no separate state machine runs for the callee. Instead, the inliner copies the function's body directly into the caller's bytecode, and the caller's state machine absorbs the inlined suspension points.
This means an inlined call to a function such as Mutex.withLock does not go through its own continuation class or TABLESWITCH. Its suspension points become part of the calling function's state machine, with the caller's continuation handling the spilling and dispatching.
To support this, the compiler generates two copies of every inline suspend function during code generation:
- A normal version with a state machine, used when the function is called from non inlined contexts (for example, through a function reference)
- A version named foo$$forInline without a state machine, retaining the suspend markers, for the inliner to consume
The SuspendForInlineCopyingMethodVisitor in SuspendFunctionGenerationStrategy.kt handles this duplication:
class SuspendForInlineCopyingMethodVisitor(...) : TransformationMethodVisitor(...) {
override fun performTransformations(methodNode: MethodNode) {
methodNode.preprocessSuspendMarkers(forInline = false, keepFakeContinuation = false)
newMethodNode.preprocessSuspendMarkers(forInline = true, keepFakeContinuation = true)
newMethodNode.accept(newMethodVisitor)
}
}
The forInline = true copy keeps the fake continuation markers intact so the inliner can later replace them with the actual caller's continuation. The forInline = false copy strips the markers and proceeds through the normal state machine transformation.
This dual copy approach is why inline suspend functions have essentially zero overhead when inlined: their suspension points merge directly into the caller's state machine, sharing the same continuation object and spill fields.
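The source level consequence can be seen in a small example. This is a sketch, not compiler output: traced, pause, caller, events, and parked are hypothetical names, and the driver in main is a minimal stand-in for a dispatcher. The point is that the pause() inside traced and the pause() inside the lambda are both ordinary suspension points of caller once inlining has happened, which is also why a suspend call is allowed inside a non suspend lambda parameter here.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

val events = mutableListOf<String>()
val parked = ArrayDeque<() -> Unit>()

// A real suspension point: parks the caller until the queued resumption is run.
suspend fun pause(): Unit = suspendCoroutine { cont -> parked.addLast { cont.resume(Unit) } }

// Hypothetical inline suspend helper. When a call to it is inlined, its pause()
// and any suspend calls inside `block` become suspension points of the caller.
suspend inline fun <T> traced(name: String, block: () -> T): T {
    events += "enter $name"
    pause()
    val value = block()
    events += "exit $name"
    return value
}

suspend fun caller(): Int = traced("work") {
    pause()   // legal in a non-suspend lambda because the lambda is inlined
    21 * 2
}

fun main() {
    var outcome: Result<Int>? = null
    val block: suspend () -> Int = { caller() }
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    while (parked.isNotEmpty()) parked.removeFirst()()  // drive both suspensions
    println(outcome?.getOrNull())  // 42
    println(events)                // [enter work, exit work]
}
```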
Real world implications: What this means for your code
The compiler machinery has direct implications for how you write and debug Kotlin code.
Stack traces and debugging
Coroutine stack traces show the state machine internals rather than your original code flow. When you see MyClass$fetchData$1.invokeSuspend(MyClass.kt:42), the $fetchData$1 is the generated continuation class, and invokeSuspend is the state machine's re entry point. The line number corresponds to the suspension point in your original source. If a coroutine appears stuck, you can inspect the continuation's label field (via kotlinx-coroutines-debug or a debugger) to identify exactly which suspension point it's waiting at.
Memory retention across suspension
Local variables spilled into continuation fields stay reachable for as long as the continuation holds them. If you allocate a large bitmap and still reference it after a long suspension, that bitmap sits in the continuation's L$0 field for the entire wait, until the field is overwritten or the continuation itself becomes garbage. This is a common source of unexpected memory pressure in long running coroutines. The mitigation is straightforward: stop referencing large objects before the long suspension (for example, set the variable to null once you no longer need it), or restructure your code so the large allocation and the long suspension are in different functions.
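The restructuring advice might look like the following sketch. All names here (awaitSignal, release, sizeAfterWaitHeavy, sizeAfterWaitLean, measureKb) are hypothetical, a ByteArray stands in for the bitmap, and the example assumes the work on the large object can be finished before the wait begins.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

// Hypothetical long wait: parks the caller until release() is called.
var parkedCaller: Continuation<Unit>? = null
suspend fun awaitSignal(): Unit = suspendCoroutine { cont -> parkedCaller = cont }
fun release() {
    val c = parkedCaller
    parkedCaller = null
    c?.resume(Unit)
}

// Heavy version: `pixels` is read after the wait, so it is live across the
// suspension and has to stay in a continuation field the whole time.
suspend fun sizeAfterWaitHeavy(): Int {
    val pixels = ByteArray(1024 * 1024)
    awaitSignal()
    return pixels.size / 1024
}

// Lean version: the buffer lives only inside a non-suspend helper, so the only
// value that crosses the suspension is a small Int.
fun measureKb(): Int {
    val pixels = ByteArray(1024 * 1024)
    return pixels.size / 1024
}

suspend fun sizeAfterWaitLean(): Int {
    val kb = measureKb()
    awaitSignal()
    return kb
}

fun main() {
    var outcome: Result<Int>? = null
    val work: suspend () -> Int = { sizeAfterWaitLean() }
    work.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    release()                      // end the wait
    println(outcome?.getOrNull())  // 1024
}
```

Both versions return the same value. The difference is what stays reachable from the continuation while the coroutine is parked in awaitSignal().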
The importance of the fast path
In production systems, many suspend function calls complete synchronously. A channel send() that finds space in the buffer, a Mutex.lock() on an uncontested lock, a Deferred.await() on an already completed computation: these all return their result directly without suspending. The fast path (checking result != COROUTINE_SUSPENDED and continuing) means these calls have negligible overhead compared to non suspend calls. This is why using suspend functions liberally in your API design is not a performance concern in most cases.
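A minimal sketch of a primitive with such a fast path, written against the stdlib intrinsic suspendCoroutineUninterceptedOrReturn, is shown below. The Cell class is hypothetical and only loosely in the spirit of Deferred.await(): it is single threaded, supports one waiter, and is not how kotlinx.coroutines implements its primitives.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.intrinsics.COROUTINE_SUSPENDED
import kotlin.coroutines.intrinsics.suspendCoroutineUninterceptedOrReturn
import kotlin.coroutines.resume
import kotlin.coroutines.startCoroutine

// Hypothetical single-value cell with one waiter, for illustration only.
class Cell<T : Any> {
    private var value: T? = null
    private var waiter: Continuation<T>? = null

    suspend fun await(): T = suspendCoroutineUninterceptedOrReturn<T> { cont ->
        val ready = value
        if (ready != null) {
            ready                  // fast path: hand the value back, no suspension
        } else {
            waiter = cont          // slow path: park the caller...
            COROUTINE_SUSPENDED    // ...and tell it that we suspended
        }
    }

    fun complete(v: T) {
        value = v
        val w = waiter
        waiter = null
        w?.resume(v)               // wake the parked caller, if any
    }
}

fun main() {
    val cell = Cell<String>()
    val seen = mutableListOf<String>()
    val reader: suspend () -> Unit = { seen += cell.await(); seen += cell.await() }
    reader.startCoroutine(Continuation(EmptyCoroutineContext) { })
    println(seen)        // []: the first await() suspended
    cell.complete("hi")
    println(seen)        // [hi, hi]: the second await() took the fast path
}
```

The second await() never returns COROUTINE_SUSPENDED, so the caller's state machine keeps running on the same stack, which is the behavior the paragraph above describes.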
Tail calls in practice
Knowing that the compiler can optimize tail call suspend functions means you can write delegation patterns efficiently:
suspend fun fetchConditionally(id: Int): Data {
return if (id > 0) fetchFromNetwork(id) else fetchFromCache(id)
// both branches are tail calls
}
Because both suspend calls are in tail position and not inside try-catch blocks, this function does not need a state machine. The compiler generates minimal COROUTINE_SUSPENDED checks instead. The optimization is fragile, though. Wrapping a tail call in try-catch disqualifies it, since the suspension point then falls within the exception handler's range, and adding any work after the call (logging, for example) means the call is no longer in tail position. In either case a full state machine is generated.
Conclusion
In this article, you've explored the complete compiler pipeline that transforms Kotlin's suspend keyword into JVM state machines. You've traced through CPS transformation (adding the hidden $completion parameter), continuation class generation (with the sign bit trick for distinguishing fresh calls from resumptions), suspension point collection (via marker instructions), variable spilling (saving live locals to continuation fields), TABLESWITCH generation (dispatching to the correct resume point), and tail call optimization (skipping the state machine when possible).
These internals directly inform how you reason about coroutine behavior. The sign bit trick explains why recursive suspend calls work correctly. Variable spilling explains why large objects referenced across suspension points can cause memory pressure. The fast path optimization explains why many suspend calls have negligible overhead. Tail call optimization explains why simple delegation functions are nearly free. These are the mechanics that determine coroutine performance characteristics in production systems.
Whether you're debugging a coroutine that seems stuck (check the label field to see which suspension point it's waiting at), optimizing a hot path that calls suspend functions in a tight loop (ensure synchronous completion hits the fast path), or designing coroutine based architectures (understand the per invocation allocation cost and spill overhead), this knowledge of the compiler machinery gives you the foundation for writing correct, performant Kotlin code. Coroutines are not a library abstraction. They are a compiler level solution, and the depth of that solution is what makes the suspend keyword work as well as it does.
As always, happy coding!
— Jaewoong

