Short Term Memory API Reference

Classes:

  • SummarizationNode –

    A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

  • SummarizationResult –

    Result of message summarization.

  • RunningSummary –

    Object for storing information about the previous summarization.

Functions:

  • summarize_messages –

    Summarize messages when they exceed a token limit and replace them with a summary message.

SummarizationNode

Bases: RunnableCallable

A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

Methods:

  • __init__ –

    A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

__init__

__init__(
    *,
    model: LanguageModelLike,
    max_tokens: int,
    max_summary_tokens: int = 256,
    token_counter: TokenCounter = count_tokens_approximately,
    initial_summary_prompt: ChatPromptTemplate = DEFAULT_INITIAL_SUMMARY_PROMPT,
    existing_summary_prompt: ChatPromptTemplate = DEFAULT_EXISTING_SUMMARY_PROMPT,
    final_prompt: ChatPromptTemplate = DEFAULT_FINAL_SUMMARY_PROMPT,
    input_messages_key: str = "messages",
    output_messages_key: str = "summarized_messages",
    name: str = "summarization",
) -> None

A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

Processes the messages from oldest to newest: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens are summarized (excluding the system message, if any) and replaced with a new summary message. The resulting list of messages is [summary_message] + remaining_messages.
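For intuition, here is an illustrative sketch of that split (message names and token counts are hypothetical, not actual output):

# Suppose max_tokens=256 and each message is ~100 tokens. msg3 pushes the
# cumulative count past the threshold, so the messages that fit within
# max_tokens (msg1, msg2) are summarized and the rest are carried through:
#
#   input:  [msg1, msg2, msg3, msg4]
#   output: [summary_message, msg3, msg4]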

Parameters:

  • model (LanguageModelLike) –

    The language model to use for generating summaries.

  • max_tokens (int) –

    Maximum number of tokens to return. Will also be used as a threshold for triggering the summarization: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens will be summarized.

    Note

    If the last message within max_tokens is an AI message with tool calls or a human message, this message will not be summarized, and instead will be added to the returned messages.

  • max_summary_tokens (int, default: 256 ) –

    Maximum number of tokens to budget for the summary.

    Note

    This parameter is not passed to the summary-generating LLM to limit the length of the summary. It is only used for correctly estimating the maximum allowed token budget. If you want to enforce it, pass model.bind(max_tokens=max_summary_tokens) as the model argument when constructing this node.

  • token_counter (TokenCounter, default: count_tokens_approximately ) –

    Function to count tokens in a list of messages. Defaults to approximate counting. For more accurate counts, you can use model.get_num_tokens_from_messages.

  • initial_summary_prompt (ChatPromptTemplate, default: DEFAULT_INITIAL_SUMMARY_PROMPT ) –

    Prompt template for generating the first summary.

  • existing_summary_prompt (ChatPromptTemplate, default: DEFAULT_EXISTING_SUMMARY_PROMPT ) –

    Prompt template for updating an existing (running) summary.

  • final_prompt (ChatPromptTemplate, default: DEFAULT_FINAL_SUMMARY_PROMPT ) –

    Prompt template that combines summary with the remaining messages before returning.

  • input_messages_key (str, default: 'messages' ) –

    Key in the input graph state that contains the list of messages to summarize.

  • output_messages_key (str, default: 'summarized_messages' ) –

    Key in the state update that contains the list of updated messages.

    Note

    output_messages_key must be different from input_messages_key. This is done to decouple the summarized messages from the main list of messages in the graph state (i.e., input_messages_key). If you want to update or overwrite the main list of messages, use the summarize_messages function directly or wrap the invocation of this node in a different node (see the sketch after the example below).

  • name (str, default: 'summarization' ) –

    Name of the summarization node.

Returns:

  • None –

    When invoked, the node returns a LangGraph state update in the following format:

    {
        output_messages_key: <list of updated messages ready to be input to the LLM after summarization, including a message with a summary (if any)>,
        "context": {"running_summary": <RunningSummary object>}
    }
    

Example
>>> from typing import Any, TypedDict
>>> from langchain_openai import ChatOpenAI
>>> from langchain_core.messages import AnyMessage
>>> from langgraph.graph import StateGraph, START, MessagesState
>>> from langgraph.checkpoint.memory import InMemorySaver
>>> from langmem.short_term import SummarizationNode, RunningSummary
>>>
>>> model = ChatOpenAI(model="gpt-4o")
>>> # Bind max_tokens to also cap the length of the generated summaries
>>> summarization_model = model.bind(max_tokens=128)
>>>
>>> class State(MessagesState):
...     context: dict[str, Any]
...
>>> # Input schema for the model-calling node: reads the node's output key
>>> class LLMInputState(TypedDict):
...     summarized_messages: list[AnyMessage]
...     context: dict[str, Any]
...
>>> summarization_node = SummarizationNode(
...     model=summarization_model,
...     max_tokens=256,
...     max_summary_tokens=128,
... )
>>>
>>> def call_model(state: LLMInputState):
...     # Invoke the LLM on the summarized history rather than the raw message list
...     response = model.invoke(state["summarized_messages"])
...     return {"messages": [response]}
...
>>> checkpointer = InMemorySaver()
>>> workflow = StateGraph(State)
>>> workflow.add_node(call_model)
>>> workflow.add_node("summarize", summarization_node)
>>> workflow.add_edge(START, "summarize")
>>> workflow.add_edge("summarize", "call_model")
>>> graph = workflow.compile(checkpointer=checkpointer)
>>>
>>> config = {"configurable": {"thread_id": "1"}}
>>> graph.invoke({"messages": "hi, my name is bob"}, config)
>>> graph.invoke({"messages": "write a short poem about cats"}, config)
>>> graph.invoke({"messages": "now do the same but for dogs"}, config)
>>> graph.invoke({"messages": "what's my name?"}, config)

SummarizationResult dataclass

Result of message summarization.

Attributes:

  • messages (list[AnyMessage]) –

    List of updated messages that are ready to be input to the LLM after summarization, including a message with a summary (if any).

  • running_summary (RunningSummary | None) –

    Information about the previous summarization (the summary and the IDs of the previously summarized messages).

messages instance-attribute

messages: list[AnyMessage]

List of updated messages that are ready to be input to the LLM after summarization, including a message with a summary (if any).

running_summary class-attribute instance-attribute

running_summary: RunningSummary | None = None

Information about the previous summarization (the summary and the IDs of the previously summarized messages). Can be None if no summarization was performed (not enough messages to summarize).

RunningSummary dataclass

Object for storing information about the previous summarization.

Used on subsequent calls to summarize_messages to avoid summarizing the same messages.

Attributes:

  • summary (str) –

    Latest summary of the messages, updated every time the summarization is performed.

  • summarized_message_ids (set[str]) –

    The IDs of all of the messages that have been previously summarized.

summary instance-attribute

summary: str

Latest summary of the messages, updated every time the summarization is performed.

summarized_message_ids instance-attribute

summarized_message_ids: set[str]

The IDs of all of the messages that have been previously summarized.
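To make the bookkeeping concrete, here is a minimal sketch (model name, token limits, and turn texts are illustrative) of carrying a RunningSummary across repeated summarize_messages calls so that already-summarized messages are skipped:

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langmem.short_term import summarize_messages

model = ChatOpenAI(model="gpt-4o")
history = []            # full message history, grows across turns
running_summary = None  # carried between summarize_messages calls

for text in ["hi, my name is bob", "write a short poem about cats"]:
    history.append(HumanMessage(text))
    result = summarize_messages(
        history,
        running_summary=running_summary,  # previously summarized messages are skipped
        model=model.bind(max_tokens=128),
        max_tokens=256,
        max_summary_tokens=128,
    )
    if result.running_summary is not None:
        # Keep the latest summary and summarized message IDs for the next call.
        running_summary = result.running_summary
    # result.messages is [summary message (if any)] + remaining messages
    history.append(model.invoke(result.messages))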

summarize_messages

summarize_messages(
    messages: list[AnyMessage],
    *,
    running_summary: RunningSummary | None,
    model: LanguageModelLike,
    max_tokens: int,
    max_summary_tokens: int = 256,
    token_counter: TokenCounter = count_tokens_approximately,
    initial_summary_prompt: ChatPromptTemplate = DEFAULT_INITIAL_SUMMARY_PROMPT,
    existing_summary_prompt: ChatPromptTemplate = DEFAULT_EXISTING_SUMMARY_PROMPT,
    final_prompt: ChatPromptTemplate = DEFAULT_FINAL_SUMMARY_PROMPT,
) -> SummarizationResult

Summarize messages when they exceed a token limit and replace them with a summary message.

This function processes the messages from oldest to newest: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens are summarized (excluding the system message, if any) and replaced with a new summary message. The resulting list of messages is [summary_message] + remaining_messages.

Parameters:

  • messages (list[AnyMessage]) –

    The list of messages to process.

  • running_summary (RunningSummary | None) –

    Optional running summary object with information about the previous summarization. If provided:

      - only messages that were not previously summarized will be processed
      - if no new summary is generated, the running summary will be added to the returned messages
      - if a new summary needs to be generated, it is generated by incorporating the existing summary value from the running summary

  • model (LanguageModelLike) –

    The language model to use for generating summaries.

  • max_tokens (int) –

    Maximum number of tokens to return. Will also be used as a threshold for triggering the summarization: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens will be summarized.

    Note

    If the last message within max_tokens is an AI message with tool calls or a human message, this message will not be summarized, and instead will be added to the returned messages.

  • max_summary_tokens (int, default: 256 ) –

    Maximum number of tokens to budget for the summary.

    Note

    This parameter is not passed to the summary-generating LLM to limit the length of the summary. It is only used for correctly estimating the maximum allowed token budget. If you want to enforce it, pass model.bind(max_tokens=max_summary_tokens) as the model parameter to this function.

  • token_counter (TokenCounter, default: count_tokens_approximately ) –

    Function to count tokens in a list of messages. Defaults to approximate counting. For more accurate counts, you can use model.get_num_tokens_from_messages (see the sketch after the example below).

  • initial_summary_prompt (ChatPromptTemplate, default: DEFAULT_INITIAL_SUMMARY_PROMPT ) –

    Prompt template for generating the first summary.

  • existing_summary_prompt (ChatPromptTemplate, default: DEFAULT_EXISTING_SUMMARY_PROMPT ) –

    Prompt template for updating an existing (running) summary.

  • final_prompt (ChatPromptTemplate, default: DEFAULT_FINAL_SUMMARY_PROMPT ) –

    Prompt template that combines summary with the remaining messages before returning.

Returns:

  • SummarizationResult –

    A SummarizationResult object containing the updated messages and a running summary:

      - messages: list of updated messages ready to be input to the LLM
      - running_summary: RunningSummary object
          - summary: text of the latest summary
          - summarized_message_ids: set of message IDs that were previously summarized

Example
>>> from langgraph.graph import StateGraph, START, MessagesState
>>> from langgraph.checkpoint.memory import InMemorySaver
>>> from langmem.short_term import summarize_messages, RunningSummary
>>> from langchain_openai import ChatOpenAI

>>> model = ChatOpenAI(model="gpt-4o")
>>> summarization_model = model.bind(max_tokens=128)

>>> class SummaryState(MessagesState):
...     summary: RunningSummary | None

>>> def call_model(state):
...     # Summarize the accumulated history before each LLM call
...     summarization_result = summarize_messages(
...         state["messages"],
...         running_summary=state.get("summary"),
...         model=summarization_model,
...         max_tokens=256,
...         max_summary_tokens=128
...     )
...     response = model.invoke(summarization_result.messages)
...     state_update = {"messages": [response]}
...     if summarization_result.running_summary:
...         # Persist the summary so later calls skip already-summarized messages
...         state_update["summary"] = summarization_result.running_summary
...     return state_update

>>> checkpointer = InMemorySaver()
>>> workflow = StateGraph(SummaryState)
>>> workflow.add_node(call_model)
>>> workflow.add_edge(START, "call_model")
>>> graph = workflow.compile(checkpointer=checkpointer)

>>> config = {"configurable": {"thread_id": "1"}}
>>> graph.invoke({"messages": "hi, my name is bob"}, config)
>>> graph.invoke({"messages": "write a short poem about cats"}, config)
>>> graph.invoke({"messages": "now do the same but for dogs"}, config)
>>> graph.invoke({"messages": "what's my name?"}, config)
