Streaming

LangGraph is built with first-class support for streaming. There are several different ways to stream back outputs from a graph run.

Streaming graph outputs (.stream and .astream)

.stream and .astream are sync and async methods for streaming back outputs from a graph run. There are several different modes you can specify when calling these methods (e.g. `graph.stream(..., stream_mode="...")`):

  • "values": This streams the full value of the state after each step of the graph.
  • "updates": This streams the updates to the state after each step of the graph. If multiple updates are made in the same step (e.g. multiple nodes are run) then those updates are streamed separately.
  • "debug": This streams as much information as possible throughout the execution of the graph.

The below visualization shows the difference between the values and updates modes:

[Figure: values vs. updates]
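The difference between the two modes can also be sketched with a toy model in plain Python. This is not LangGraph's implementation: `run_steps`, the node names, and the sample state are hypothetical, and updates are merged by simple overwrite, whereas a real graph may use reducers.

```python
from typing import Any, Iterator

# One step is a list of (node_name, update) pairs; several pairs in one
# step model several nodes running in the same step.
Step = list[tuple[str, dict[str, Any]]]


def run_steps(state: dict[str, Any], steps: list[Step], mode: str) -> Iterator[dict[str, Any]]:
    """Yield what a toy graph would stream in "values" or "updates" mode."""
    state = dict(state)  # do not mutate the caller's input
    for step in steps:
        for node_name, update in step:
            state.update(update)  # assumption: plain overwrite, no reducers
            if mode == "updates":
                # one chunk per node, keyed by the node that produced it
                yield {node_name: update}
        if mode == "values":
            # one chunk per step, carrying the full state
            yield dict(state)


steps = [
    [("refine_topic", {"topic": "ice cream and cats"})],
    [("write_joke", {"joke": "..."}), ("write_poem", {"poem": "..."})],
]
values_chunks = list(run_steps({"topic": "ice cream"}, steps, "values"))
updates_chunks = list(run_steps({"topic": "ice cream"}, steps, "updates"))
print(values_chunks)
print(updates_chunks)
```

In this sketch the second step produces one "values" chunk (the full state) but two "updates" chunks, one per node that ran.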

Streaming LLM tokens and events (.astream_events)

In addition, you can use the astream_events method to stream back events that happen inside nodes. This is useful for streaming tokens of LLM calls.

This is a standard method on all LangChain objects. This means that as the graph is executed, certain events are emitted along the way and can be seen if you run the graph using .astream_events.

All events have (among other things) event, name, and data fields. What do these mean?

  • event: This is the type of event that is being emitted. You can find a detailed table of all callback events and triggers here.
  • name: This is the name of the event.
  • data: This is the data associated with the event.

What types of things cause events to be emitted?

  • Each node (runnable) emits on_chain_start when it starts execution, on_chain_stream during the node execution, and on_chain_end when the node finishes. Node events will have the node name in the event's name field.
  • The graph emits on_chain_start at the beginning of the graph execution, on_chain_stream after each node execution, and on_chain_end when the graph finishes. Graph events will have LangGraph in the event's name field.
  • Any writes to state channels (i.e. anytime you update the value of one of your state keys) will emit on_chain_start and on_chain_end events.

Additionally, any events that are created inside your nodes (LLM events, tool events, manually emitted events, etc.) will also be visible in the output of .astream_events.
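As a rough sketch, these rules can be turned into a small helper that guesses where an event came from. `classify_event` and its return labels are hypothetical, and the `ChannelWrite<` prefix is an assumption taken from the sample output in this section rather than a documented contract. A chain that runs inside a node also emits on_chain_* events and would be labeled "node" here.

```python
def classify_event(event: dict) -> str:
    """Guess the origin of an event from its `event` and `name` fields."""
    kind, name = event["event"], event["name"]
    if not kind.startswith("on_chain_"):
        # e.g. on_chat_model_* or on_tool_*: emitted by something inside a node
        return "inside-node"
    if name == "LangGraph":
        return "graph"
    if name.startswith("ChannelWrite<"):
        # assumption: state channel writes are named ChannelWrite<...>
        return "channel-write"
    return "node"


sample = [
    {"event": "on_chain_start", "name": "LangGraph"},
    {"event": "on_chain_start", "name": "call_model"},
    {"event": "on_chat_model_stream", "name": "ChatOpenAI"},
    {"event": "on_chain_end", "name": "ChannelWrite<call_model,messages>"},
]
labels = [classify_event(e) for e in sample]
print(labels)
```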

To make this more concrete, let's see what events are returned when we run a simple graph:

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END

model = ChatOpenAI(model="gpt-4o-mini")


def call_model(state: MessagesState):
    response = model.invoke(state['messages'])
    return {"messages": response}

workflow = StateGraph(MessagesState)
workflow.add_node(call_model)
workflow.add_edge(START, "call_model")
workflow.add_edge("call_model", END)
app = workflow.compile()

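inputs = [{"role": "user", "content": "hi!"}]
# Top-level `async for` works in a notebook or async REPL; in a script,
# wrap this loop in an async function and run it with asyncio.run().
async for event in app.astream_events({"messages": inputs}, version="v1"):
    kind = event["event"]
    print(f"{kind}: {event['name']}")

on_chain_start: LangGraph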
on_chain_start: __start__
on_chain_end: __start__
on_chain_start: call_model
on_chat_model_start: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_stream: ChatOpenAI
on_chat_model_end: ChatOpenAI
on_chain_start: ChannelWrite<call_model,messages>
on_chain_end: ChannelWrite<call_model,messages>
on_chain_stream: call_model
on_chain_end: call_model
on_chain_stream: LangGraph
on_chain_end: LangGraph

We start with the overall graph start (on_chain_start: LangGraph). We then write to the __start__ node (this is a special node that handles input). Next, we start the call_model node (on_chain_start: call_model). Inside it, we start the chat model invocation (on_chat_model_start: ChatOpenAI), stream back the response token by token (on_chat_model_stream: ChatOpenAI), and then finish the chat model (on_chat_model_end: ChatOpenAI). From there, we write the results back to the channel (ChannelWrite<call_model,messages>), finish the call_model node, and finally finish the graph as a whole.
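When reading a log like this, the long runs of identical on_chat_model_stream lines can hide the overall sequence. A minimal sketch that collapses consecutive repeats is shown below; `collapse_runs` is a hypothetical helper, and the sample lines are copied from the output above.

```python
from itertools import groupby


def collapse_runs(lines):
    """Collapse consecutive identical log lines into (line, count) pairs."""
    return [(line, sum(1 for _ in group)) for line, group in groupby(lines)]


log = [
    "on_chain_start: call_model",
    "on_chat_model_start: ChatOpenAI",
    *["on_chat_model_stream: ChatOpenAI"] * 11,
    "on_chat_model_end: ChatOpenAI",
]
collapsed = collapse_runs(log)
for line, count in collapsed:
    print(line if count == 1 else f"{line} (x{count})")
```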

This should give you a good sense of what events are emitted in a simple graph. But what data do these events contain? Each type of event contains data in a different format. Let's look at what on_chat_model_stream events look like. This is an important type of event since it is needed for streaming tokens from an LLM response.

These events look like:

{'event': 'on_chat_model_stream',
 'name': 'ChatOpenAI',
 'run_id': '3fdbf494-acce-402e-9b50-4eab46403859',
 'tags': ['seq:step:1'],
 'metadata': {'langgraph_step': 1,
  'langgraph_node': 'call_model',
  'langgraph_triggers': ['start:call_model'],
  'langgraph_task_idx': 0,
  'checkpoint_id': '1ef657a0-0f9d-61b8-bffe-0c39e4f9ad6c',
  'checkpoint_ns': 'call_model',
  'ls_provider': 'openai',
  'ls_model_name': 'gpt-4o-mini',
  'ls_model_type': 'chat',
  'ls_temperature': 0.7},
 'data': {'chunk': AIMessageChunk(content='Hello', id='run-3fdbf494-acce-402e-9b50-4eab46403859')},
 'parent_ids': []}

We can see that we have the event type and name (which we knew from before).

We also have a bunch of stuff in metadata. Notably, 'langgraph_node': 'call_model' is really helpful information: it tells us which node this model was invoked inside of.

Finally, data is a really important field. This contains the actual data for this event, which in this case is an AIMessageChunk. The chunk contains the content for the message, as well as an id. This is the ID of the overall AIMessage (not just this chunk) and is super helpful: it lets us track which chunks are part of the same message (so we can show them together in the UI).

This information contains all that is needed for creating a UI for streaming LLM tokens. You can see a guide for that here.
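As a minimal sketch of how these fields fit together, the helper below keeps only on_chat_model_stream events from one node and joins the chunk contents by message ID. `collect_tokens` and `fake_event` are hypothetical names; the fake events use a `SimpleNamespace` as a stand-in for AIMessageChunk so the example runs without any model call, and the code assumes each chunk's content is a plain string.

```python
from collections import defaultdict
from types import SimpleNamespace


def collect_tokens(events, node_name):
    """Join streamed token text per message ID for a single node."""
    messages = defaultdict(str)
    for event in events:
        if event["event"] != "on_chat_model_stream":
            continue  # not a token event
        if event["metadata"].get("langgraph_node") != node_name:
            continue  # token came from a model call in another node
        chunk = event["data"]["chunk"]
        messages[chunk.id] += chunk.content  # assumption: content is a str
    return dict(messages)


def fake_event(content, node="call_model", message_id="run-3fdbf494"):
    """Build a stand-in event with only the fields used above."""
    chunk = SimpleNamespace(content=content, id=message_id)
    return {
        "event": "on_chat_model_stream",
        "name": "ChatOpenAI",
        "metadata": {"langgraph_node": node},
        "data": {"chunk": chunk},
    }


events = [fake_event("Hello"), fake_event("!"), fake_event("skip", node="other_node")]
result = collect_tokens(events, "call_model")
print(result)
```

With real events from .astream_events, the same filtering would typically run inside the `async for` loop, emitting each chunk to the UI as it arrives instead of collecting everything first.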

ASYNC IN PYTHON<=3.10

You may fail to see events being emitted from inside a node when using .astream_events in Python <= 3.10. If you're using a LangChain RunnableLambda, a RunnableGenerator, or a Tool asynchronously inside your node, you will have to propagate callbacks to these objects manually. This is because LangChain cannot automatically propagate callbacks to child objects in this case. Please see examples here and here.
