# Short Term Memory API Reference

Classes:

- `SummarizationNode` – A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.
- `SummarizationResult` – Result of message summarization.
- `RunningSummary` – Object for storing information about the previous summarization.

Functions:

- `summarize_messages` – Summarize messages when they exceed a token limit and replace them with a summary message.
## SummarizationNode

Bases: `RunnableCallable`

A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

Methods:

- `__init__` – A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.
### __init__

```python
__init__(
    *,
    model: LanguageModelLike,
    max_tokens: int,
    max_summary_tokens: int = 256,
    token_counter: TokenCounter = count_tokens_approximately,
    initial_summary_prompt: ChatPromptTemplate = DEFAULT_INITIAL_SUMMARY_PROMPT,
    existing_summary_prompt: ChatPromptTemplate = DEFAULT_EXISTING_SUMMARY_PROMPT,
    final_prompt: ChatPromptTemplate = DEFAULT_FINAL_SUMMARY_PROMPT,
    input_messages_key: str = "messages",
    output_messages_key: str = "summarized_messages",
    name: str = "summarization",
) -> None
```
A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

Processes the messages from oldest to newest: once the cumulative number of message tokens reaches `max_tokens`, all messages within `max_tokens` are summarized (excluding the system message, if any) and replaced with a new summary message. The resulting list of messages is `[summary_message] + remaining_messages`.
Parameters:

- `model` (`LanguageModelLike`) – The language model to use for generating summaries.
- `max_tokens` (`int`) – Maximum number of tokens to return. Also used as the threshold for triggering summarization: once the cumulative number of message tokens reaches `max_tokens`, all messages within `max_tokens` will be summarized.

  Note: If the last message within `max_tokens` is an AI message with tool calls or a human message, it will not be summarized; instead, it will be added to the returned messages.

- `max_summary_tokens` (`int`, default: `256`) – Maximum number of tokens to budget for the summary.

  Note: This parameter is not passed to the summary-generating LLM to limit the length of the summary. It is only used for correctly estimating the maximum allowed token budget. If you want to enforce it, pass `model.bind(max_tokens=max_summary_tokens)` as the `model` parameter, as shown in the sketch after the example below.

- `token_counter` (`TokenCounter`, default: `count_tokens_approximately`) – Function to count tokens in a message. Defaults to approximate counting. For more accurate counts you can use `model.get_num_tokens_from_messages`.
- `initial_summary_prompt` (`ChatPromptTemplate`, default: `DEFAULT_INITIAL_SUMMARY_PROMPT`) – Prompt template for generating the first summary.
- `existing_summary_prompt` (`ChatPromptTemplate`, default: `DEFAULT_EXISTING_SUMMARY_PROMPT`) – Prompt template for updating an existing (running) summary.
- `final_prompt` (`ChatPromptTemplate`, default: `DEFAULT_FINAL_SUMMARY_PROMPT`) – Prompt template that combines the summary with the remaining messages before returning.
- `input_messages_key` (`str`, default: `'messages'`) – Key in the input graph state that contains the list of messages to summarize.
- `output_messages_key` (`str`, default: `'summarized_messages'`) – Key in the state update that contains the list of updated messages.

  Note: `output_messages_key` must be different from `input_messages_key`. This decouples the summarized messages from the main list of messages in the graph state (i.e., `input_messages_key`). If you want to update or overwrite the main list of messages, use the `summarize_messages` function directly or wrap the invocation of this node in a different node.

- `name` (`str`, default: `'summarization'`) – Name of the summarization node.
Returns:

- `None`
Example:

```python
>>> from typing import Any, TypedDict
>>> from langchain_openai import ChatOpenAI
>>> from langchain_core.messages import AnyMessage
>>> from langgraph.graph import StateGraph, START, MessagesState
>>> from langgraph.checkpoint.memory import InMemorySaver
>>> from langmem.short_term import SummarizationNode, RunningSummary
>>>
>>> model = ChatOpenAI(model="gpt-4o")
>>> summarization_model = model.bind(max_tokens=128)
>>>
>>> class State(MessagesState):
...     context: dict[str, Any]
...
>>> class LLMInputState(TypedDict):
...     summarized_messages: list[AnyMessage]
...     context: dict[str, Any]
...
>>> summarization_node = SummarizationNode(
...     model=summarization_model,
...     max_tokens=256,
...     max_summary_tokens=128,
... )
>>>
>>> def call_model(state: LLMInputState):
...     response = model.invoke(state["summarized_messages"])
...     return {"messages": [response]}
...
>>> checkpointer = InMemorySaver()
>>> workflow = StateGraph(State)
>>> workflow.add_node(call_model)
>>> workflow.add_node("summarize", summarization_node)
>>> workflow.add_edge(START, "summarize")
>>> workflow.add_edge("summarize", "call_model")
>>> graph = workflow.compile(checkpointer=checkpointer)
>>>
>>> config = {"configurable": {"thread_id": "1"}}
>>> graph.invoke({"messages": "hi, my name is bob"}, config)
>>> graph.invoke({"messages": "write a short poem about cats"}, config)
>>> graph.invoke({"messages": "now do the same but for dogs"}, config)
>>> graph.invoke({"messages": "what's my name?"}, config)
```
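To enforce the summary budget and get more accurate token counts, you can combine the two notes above: bind `max_tokens` on the summarization model and pass the model's own token counter. A minimal sketch, assuming an OpenAI chat model (the parameter values are illustrative):

```python
from langchain_openai import ChatOpenAI
from langmem.short_term import SummarizationNode

model = ChatOpenAI(model="gpt-4o")

summarization_node = SummarizationNode(
    # Enforce the summary length limit on the summarizer itself.
    model=model.bind(max_tokens=128),
    max_tokens=512,
    max_summary_tokens=128,
    # Count tokens with the model's tokenizer rather than the
    # default approximate counter.
    token_counter=model.get_num_tokens_from_messages,
)
```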
## SummarizationResult

`dataclass`

Result of message summarization.

Attributes:

- `messages` (`list[AnyMessage]`) – List of updated messages that are ready to be input to the LLM after summarization, including a message with a summary (if any).
- `running_summary` (`RunningSummary | None`) – Information about the previous summarization (the summary and the IDs of the previously summarized messages).
### messages

`instance-attribute`

```python
messages: list[AnyMessage]
```

List of updated messages that are ready to be input to the LLM after summarization, including a message with a summary (if any).
### running_summary

`class-attribute` `instance-attribute`

```python
running_summary: RunningSummary | None = None
```

Information about the previous summarization (the summary and the IDs of the previously summarized messages). Can be `None` if no summarization was performed (not enough messages to summarize).
## RunningSummary

`dataclass`

Object for storing information about the previous summarization.

Used on subsequent calls to `summarize_messages` to avoid summarizing the same messages (see the sketch after the attribute list below).

Attributes:

- `summary` (`str`) – Latest summary of the messages, updated every time summarization is performed.
- `summarized_message_ids` (`set[str]`) – The IDs of all of the messages that have been previously summarized.
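For instance, threading the returned `RunningSummary` into the next call ensures that already-summarized messages are skipped. A minimal sketch outside of a graph (message contents and parameter values are illustrative):

```python
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI
from langmem.short_term import summarize_messages

model = ChatOpenAI(model="gpt-4o")
messages = [
    HumanMessage("hi, my name is bob", id="1"),
    AIMessage("Hi Bob! How can I help you today?", id="2"),
    HumanMessage("write a short poem about cats", id="3"),
    AIMessage("Here is a poem about cats: ...", id="4"),
]

# First call: no previous summarization.
result = summarize_messages(
    messages,
    running_summary=None,
    model=model.bind(max_tokens=128),
    max_tokens=64,
    max_summary_tokens=32,
)

# Subsequent call: pass the previous RunningSummary (it may be None if
# the first call never reached the token threshold). Messages whose IDs
# appear in summarized_message_ids will not be summarized again.
messages.append(HumanMessage("now do the same but for dogs", id="5"))
result = summarize_messages(
    messages,
    running_summary=result.running_summary,
    model=model.bind(max_tokens=128),
    max_tokens=64,
    max_summary_tokens=32,
)
```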
## summarize_messages

```python
summarize_messages(
    messages: list[AnyMessage],
    *,
    running_summary: RunningSummary | None,
    model: LanguageModelLike,
    max_tokens: int,
    max_summary_tokens: int = 256,
    token_counter: TokenCounter = count_tokens_approximately,
    initial_summary_prompt: ChatPromptTemplate = DEFAULT_INITIAL_SUMMARY_PROMPT,
    existing_summary_prompt: ChatPromptTemplate = DEFAULT_EXISTING_SUMMARY_PROMPT,
    final_prompt: ChatPromptTemplate = DEFAULT_FINAL_SUMMARY_PROMPT,
) -> SummarizationResult
```
Summarize messages when they exceed a token limit and replace them with a summary message.

This function processes the messages from oldest to newest: once the cumulative number of message tokens reaches `max_tokens`, all messages within `max_tokens` are summarized (excluding the system message, if any) and replaced with a new summary message. The resulting list of messages is `[summary_message] + remaining_messages`.
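For example, when the conversation starts with a system message, the system message is excluded from the summary and preserved in the returned messages. A minimal sketch (message contents and parameter values are illustrative):

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langmem.short_term import summarize_messages

model = ChatOpenAI(model="gpt-4o")

result = summarize_messages(
    [
        SystemMessage("You are a helpful assistant.", id="0"),
        HumanMessage("hi, my name is bob", id="1"),
        AIMessage("Hi Bob! How can I help you today?", id="2"),
        HumanMessage("write a short poem about cats", id="3"),
    ],
    running_summary=None,
    model=model.bind(max_tokens=128),
    max_tokens=64,
    max_summary_tokens=32,
)
# If summarization was triggered, result.messages is roughly
# [system_message, summary_message] + remaining_messages.
```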
Parameters:

- `messages` (`list[AnyMessage]`) – The list of messages to process.
- `running_summary` (`RunningSummary | None`) – Optional running summary object with information about the previous summarization. If provided:
  - only messages that were not previously summarized will be processed;
  - if no new summary is generated, the running summary will be added to the returned messages;
  - if a new summary needs to be generated, it is generated by incorporating the existing summary value from the running summary.
- `model` (`LanguageModelLike`) – The language model to use for generating summaries.
- `max_tokens` (`int`) – Maximum number of tokens to return. Also used as the threshold for triggering summarization: once the cumulative number of message tokens reaches `max_tokens`, all messages within `max_tokens` will be summarized.

  Note: If the last message within `max_tokens` is an AI message with tool calls or a human message, it will not be summarized; instead, it will be added to the returned messages.

- `max_summary_tokens` (`int`, default: `256`) – Maximum number of tokens to budget for the summary.

  Note: This parameter is not passed to the summary-generating LLM to limit the length of the summary. It is only used for correctly estimating the maximum allowed token budget. If you want to enforce it, pass `model.bind(max_tokens=max_summary_tokens)` as the `model` parameter to this function.

- `token_counter` (`TokenCounter`, default: `count_tokens_approximately`) – Function to count tokens in a message. Defaults to approximate counting. For more accurate counts you can use `model.get_num_tokens_from_messages`.
- `initial_summary_prompt` (`ChatPromptTemplate`, default: `DEFAULT_INITIAL_SUMMARY_PROMPT`) – Prompt template for generating the first summary.
- `existing_summary_prompt` (`ChatPromptTemplate`, default: `DEFAULT_EXISTING_SUMMARY_PROMPT`) – Prompt template for updating an existing (running) summary.
- `final_prompt` (`ChatPromptTemplate`, default: `DEFAULT_FINAL_SUMMARY_PROMPT`) – Prompt template that combines the summary with the remaining messages before returning.
Returns:

- `SummarizationResult` – A `SummarizationResult` object containing the updated messages and a running summary:
  - `messages`: list of updated messages ready to be input to the LLM
  - `running_summary`: `RunningSummary` object
    - `summary`: text of the latest summary
    - `summarized_message_ids`: set of message IDs that were previously summarized
Example:

```python
>>> from langgraph.graph import StateGraph, START, MessagesState
>>> from langgraph.checkpoint.memory import InMemorySaver
>>> from langmem.short_term import summarize_messages, RunningSummary
>>> from langchain_openai import ChatOpenAI
>>> model = ChatOpenAI(model="gpt-4o")
>>> summarization_model = model.bind(max_tokens=128)
>>> class SummaryState(MessagesState):
...     summary: RunningSummary | None
>>> def call_model(state):
...     summarization_result = summarize_messages(
...         state["messages"],
...         running_summary=state.get("summary"),
...         model=summarization_model,
...         max_tokens=256,
...         max_summary_tokens=128,
...     )
...     response = model.invoke(summarization_result.messages)
...     state_update = {"messages": [response]}
...     if summarization_result.running_summary:
...         state_update["summary"] = summarization_result.running_summary
...     return state_update
>>> checkpointer = InMemorySaver()
>>> workflow = StateGraph(SummaryState)
>>> workflow.add_node(call_model)
>>> workflow.add_edge(START, "call_model")
>>> graph = workflow.compile(checkpointer=checkpointer)
>>> config = {"configurable": {"thread_id": "1"}}
>>> graph.invoke({"messages": "hi, my name is bob"}, config)
>>> graph.invoke({"messages": "write a short poem about cats"}, config)
>>> graph.invoke({"messages": "now do the same but for dogs"}, config)
>>> graph.invoke({"messages": "what's my name?"}, config)
```