{ "cells": [ { "cell_type": "markdown", "id": "51466c8d-8ce4-4b3d-be4e-18fdbeda5f53", "metadata": {}, "source": [ "# How to stream LLM tokens from your graph\n", "\n", "In this example we will stream tokens from the language model powering an agent. We will use a ReAct agent as an example. The main thing to bear in mind here is that using [async nodes](../async/) typically offers the best behavior for this, since we will be using the `astream_events` method.\n", "\n", "This how-to guide closely follows the others in this directory, so we will call out differences with the **STREAMING** tag below (if you just want to search for those).\n", "\n", "
"<div class=\"admonition tip\">\n", "    <p class=\"admonition-title\">Note</p>\n", "    <p>\n", "        In this how-to, we will create our agent from scratch to be transparent (but verbose). You can accomplish similar functionality using the <code>create_react_agent(model, tools=tool)</code> (<a href=\"https://langchain-ai.github.io/langgraph/reference/prebuilt/#create_react_agent\">API doc</a>) constructor. This may be more appropriate if you are used to LangChain\u2019s <code>AgentExecutor</code> class.\n", "    </p>\n", "</div>\n", "\n",
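"For reference, here is a rough sketch of that prebuilt path. The specific model and tool are placeholder assumptions, not part of this guide:\n", "\n", "```python\n", "# Sketch: the prebuilt equivalent of the agent we build from scratch below.\n", "from langchain_core.tools import tool\n", "from langchain_openai import ChatOpenAI\n", "from langgraph.prebuilt import create_react_agent\n", "\n", "\n", "@tool\n", "def search(query: str) -> str:\n", "    \"\"\"Look something up.\"\"\"\n", "    return \"...\"\n", "\n", "\n", "model = ChatOpenAI(model=\"gpt-4o\")\n", "app = create_react_agent(model, tools=[search])\n", "```\n", "\n",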
"
Note on Python < 3.11
\n", "\n",
" When using python 3.8, 3.9, or 3.10, please ensure you manually pass the RunnableConfig through to the llm when invoking it like so: llm.ainvoke(..., config)
.\n",
" The astream_events method collects all events from your nested code using a streaming tracer passed as a callback. In 3.11 and above, this is automatically handled via contextvar's; prior to 3.11, asyncio's tasks lacked proper contextvar support, meaning that the callbacks will only propagate if you manually pass the config through. We do this in the call_model
method below.\n",
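"A minimal sketch of that manual propagation (assuming the `State` schema and `llm` defined later in this guide):\n", "\n", "```python\n", "from langchain_core.runnables import RunnableConfig\n", "\n", "\n", "async def call_model(state: State, config: RunnableConfig):\n", "    # Passing `config` explicitly lets the streaming tracer used by\n", "    # `astream_events` reach this nested LLM call on Python < 3.11.\n", "    response = await llm.ainvoke(state[\"messages\"], config)\n", "    return {\"messages\": [response]}\n", "```\n", "\n",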
"
Manual Callback Propagation
\n", "\n",
" Note that in call_model(state: State, config: RunnableConfig):
below, we a) accept the RunnableConfig in the node and b) pass this in as the second arg for llm.ainvoke(..., config)
. This is optional for python 3.11 and later. If you ever have a problem where the LLM tokens are not streamed when using `astream_events` and you are using an older version of python, it's worth checking to ensure that the callbacks are manually propagated.