(2) Pull a Llama 3 model, e.g. by running "ollama pull llama3" - this is the model the code below uses. You can also try Mistral models, one of the quantized Cohere Command-R models, or any other model you'd like from the Ollama library - just be sure that your computer has sufficient RAM.
Now that you've chosen and set up your local models, load and index some source documents. The code below uses some of Lilian Weng's blog posts on LLMs and agents as a data source, loads them into a demo MemoryVectorStore instance, and then creates a retriever from that vector store for later use. Note that embeddings are computed locally too, using Ollama's mxbai-embed-large model - pull it first with "ollama pull mxbai-embed-large".
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OllamaEmbeddings } from "@langchain/ollama";

const urls = [
  "https://lilianweng.github.io/posts/2023-06-23-agent/",
  "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
  "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
];

const docs = await Promise.all(
  urls.map((url) => {
    const loader = new CheerioWebBaseLoader(url);
    return loader.load();
  })
);
const docsList = docs.flat();

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 250,
  chunkOverlap: 0,
});
const splitDocs = await textSplitter.splitDocuments(docsList);

const embeddings = new OllamaEmbeddings({
  model: "mxbai-embed-large",
});

// Add to vector store
const vectorStore = await MemoryVectorStore.fromDocuments(
  splitDocs,
  embeddings,
);
const retriever = vectorStore.asRetriever();
First, create a chain that routes incoming questions either to your vector store, if they are related to LLMs or agents, or to a general web search if they are not. You'll use Ollama's JSON mode to help keep the output format consistent.
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { ChatOllama } from "@langchain/ollama";

const jsonModeLlm = new ChatOllama({
  model: "llama3",
  format: "json",
  temperature: 0,
});

const QUESTION_ROUTER_SYSTEM_TEMPLATE = `You are an expert at routing a user question to a vectorstore or web search.
Use the vectorstore for questions on LLM agents, prompt engineering, and adversarial attacks.
You do not need to be stringent with the keywords in the question related to these topics.
Otherwise, use web-search. Give a binary choice 'web_search' or 'vectorstore' based on the question.
Return a JSON with a single key 'datasource' and no preamble or explanation.`;

const questionRouterPrompt = ChatPromptTemplate.fromMessages([
  ["system", QUESTION_ROUTER_SYSTEM_TEMPLATE],
  ["human", "{question}"],
]);

const questionRouter = questionRouterPrompt
  .pipe(jsonModeLlm)
  .pipe(new JsonOutputParser());

await questionRouter.invoke({ question: "llm agent memory" });
{ datasource: 'vectorstore' }
Above, notice that you invoked the router with a query related to the knowledge your vector store contains, so it responds accordingly. Here's what happens if you ask something irrelevant:
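For example, you might try an off-topic question like this (a hypothetical query; any question unrelated to LLMs or agents should work):

await questionRouter.invoke({
  question: "What's the best way to grill a burger?",
});

Since this has nothing to do with LLM agents, prompt engineering, or adversarial attacks, the router should instead return { datasource: 'web_search' }.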
Create a grader that will check retrieved documents from our vector store for
relevancy:
const GRADER_TEMPLATE = `You are a grader assessing relevance of a retrieved document to a user question.
Here is the retrieved document:

<document>
{content}
</document>

Here is the user question:
<question>
{question}
</question>

If the document contains keywords related to the user question, grade it as relevant.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const graderPrompt = ChatPromptTemplate.fromTemplate(GRADER_TEMPLATE);

const retrievalGrader = graderPrompt
  .pipe(jsonModeLlm)
  .pipe(new JsonOutputParser());

// Test run
const testQuestion = "agent memory";
const docs2 = await retriever.invoke(testQuestion);
await retrievalGrader.invoke({
  question: testQuestion,
  content: docs2[0].pageContent,
});
{ score: 'yes' }
And you can see that it marks the first retrieved document as related to
"agent memory".
Next, create a chain that generates an answer based on retrieved documents.
import * as hub from "langchain/hub";
import { StringOutputParser } from "@langchain/core/output_parsers";
import type { Document } from "@langchain/core/documents";

// https://smith.langchain.com/hub/rlm/rag-prompt
const ragPrompt = await hub.pull("rlm/rag-prompt");

// Post-processing
const formatDocs = (docs: Document[]) => {
  return docs.map((doc) => doc.pageContent).join("\n\n");
};

// Initialize a new model without JSON mode active
const llm = new ChatOllama({
  model: "llama3",
  temperature: 0,
});

// Chain
const ragChain = ragPrompt.pipe(llm).pipe(new StringOutputParser());

// Test run
const testQuestion2 = "agent memory";
const docs3 = await retriever.invoke(testQuestion2);
await ragChain.invoke({ context: formatDocs(docs3), question: testQuestion2 });
Based on the provided context, it appears that an agent's memory refers to its ability to record and reflect on past experiences, using both long-term and short-term memory modules. The long-term memory module, or "memory stream," stores a comprehensive list of agents' experiences in natural language, while the reflection mechanism synthesizes these memories into higher-level inferences over time to guide future behavior.
Create a chain that reviews a generated answer and checks for hallucinations.
We'll return to using JSON mode for this one:
const HALLUCINATION_GRADER_TEMPLATE = `You are a grader assessing whether an answer is grounded in / supported by a set of facts.
Here are the facts used as context to generate the answer:

<context>
{context}
</context>

Here is the answer:
<answer>
{generation}
</answer>

Give a binary score 'yes' or 'no' to indicate whether the answer is grounded in / supported by a set of facts.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const hallucinationGraderPrompt = ChatPromptTemplate.fromTemplate(
  HALLUCINATION_GRADER_TEMPLATE,
);

// Reuse the JSON mode model so the output parses cleanly
const hallucinationGrader = hallucinationGraderPrompt
  .pipe(jsonModeLlm)
  .pipe(new JsonOutputParser());

// Test run
const generation2 = await ragChain.invoke({
  context: formatDocs(docs3),
  question: testQuestion2,
});
await hallucinationGrader.invoke({
  context: formatDocs(docs3),
  generation: generation2,
});
Create a chain for checking the relevancy of the final answer:
const ANSWER_GRADER_PROMPT_TEMPLATE = `You are a grader assessing whether an answer is useful to resolve a question.
Here is the answer:

<answer>
{generation}
</answer>

Here is the question:
<question>
{question}
</question>

Give a binary score 'yes' or 'no' to indicate whether the answer is useful to resolve a question.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const answerGraderPrompt = ChatPromptTemplate.fromTemplate(
  ANSWER_GRADER_PROMPT_TEMPLATE,
);

const answerGrader = answerGraderPrompt
  .pipe(jsonModeLlm)
  .pipe(new JsonOutputParser());

// Test run
const generation3 = await ragChain.invoke({
  context: formatDocs(docs3),
  question: testQuestion2,
});
await answerGrader.invoke({ question: testQuestion2, generation: generation3 });
Create a question rewriter. This chain performs
query analysis
on the user questions and optimizes them for RAG to help handle difficult
queries.
const REWRITER_PROMPT_TEMPLATE = `You are a question re-writer that converts an input question to a better version that is optimized
for vectorstore retrieval. Look at the initial question and formulate an improved question.

Here is the initial question:

<question>
{question}
</question>

Respond only with an improved question. Do not include any preamble or explanation.`;

const rewriterPrompt = ChatPromptTemplate.fromTemplate(
  REWRITER_PROMPT_TEMPLATE,
);

const rewriter = rewriterPrompt.pipe(llm).pipe(new StringOutputParser());

// Test run
// Test question is "agent memory"
await rewriter.invoke({ question: testQuestion2 });
Finally, you'll need a web search tool that can handle questions out of scope of the indexed documents. The code below initializes a Tavily-powered search tool:
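A minimal sketch of the setup (it assumes a TAVILY_API_KEY environment variable is set, and the test query is hypothetical, chosen to match the output that follows):

import { TavilySearchResults } from "@langchain/community/tools/tavily_search";

// The tool returns a JSON string of search results
const webSearchTool = new TavilySearchResults({ maxResults: 3 });

// Test run (hypothetical query)
await webSearchTool.invoke("Where does Red Robin sell burgers?");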
[{"title":"Family Friendly Burger Restaurant | Red Robin","url":"https://www.redrobin.com/","content":"Red Robin is donating 10¢ to Make-A-Wish ® for every Kids Meal purchased. You can contribute to life-changing wishes by simply purchasing a Kids Meal at Red Robin for Dine-in or To-Go. Join us for a memorable meal or order online and help transform lives, one wish at a time.","score":0.998043,"raw_content":null},{"title":"Red Robin United States of America Directory","url":"https://locations.redrobin.com/locations-list/us/","content":"Maps, Driving Directions and Local Restaurant Information for Red Robin Restaurants in United States","score":0.99786776,"raw_content":null},{"title":"Red Robin Restaurant Locations","url":"https://locations.redrobin.com/","content":"Maps, Driving Directions and Local Restaurant Information for Red Robin","score":0.99718815,"raw_content":null}]
Define the graph state like this. Since question and generation are simple strings, the default last-value-wins behavior of a bare Annotation<string> is all they need; documents gets an explicit reducer (which also just overrides the current value) plus a default of an empty array:
import type { Document } from "@langchain/core/documents";
import { Annotation } from "@langchain/langgraph";

// This defines the agent state.
// Returned documents from a node will override the current
// "documents" value in the state object.
const GraphState = Annotation.Root({
  question: Annotation<string>,
  generation: Annotation<string>,
  documents: Annotation<Document[]>({
    reducer: (_, y) => y,
    default: () => [],
  }),
});
Let's wrap our components in functions that match the interfaces required by
LangGraph. These functions will handle formatting inputs and outputs.
We'll use some components within nodes, and others to define conditional edges.
Each will take the graph state as a parameter. Nodes return state properties to
be updated, while conditional edges return the name of the next node to execute.
import { Document } from "@langchain/core/documents";

/* ---Nodes--- */

// Retrieve documents for a question
const retrieve = async (
  state: typeof GraphState.State,
): Promise<Partial<typeof GraphState.State>> => {
  console.log("---RETRIEVE---");
  const documents = await retriever.invoke(state.question);
  // Add sources to the state
  return { documents };
};

// RAG generation
const generate = async (
  state: typeof GraphState.State,
): Promise<Partial<typeof GraphState.State>> => {
  console.log("---GENERATE---");
  const generation = await ragChain.invoke({
    context: formatDocs(state.documents),
    question: state.question,
  });
  // Add generation to the state
  return { generation };
};

// Determines whether the retrieved documents are relevant to the question.
const gradeDocuments = async (
  state: typeof GraphState.State,
): Promise<Partial<typeof GraphState.State>> => {
  console.log("---CHECK DOCUMENT RELEVANCE TO QUESTION---");
  // Score each doc
  const relevantDocs: Document[] = [];
  for (const doc of state.documents) {
    const grade: { score: string } = await retrievalGrader.invoke({
      question: state.question,
      content: doc.pageContent,
    });
    if (grade.score === "yes") {
      console.log("---GRADE: DOCUMENT RELEVANT---");
      relevantDocs.push(doc);
    } else {
      console.log("---GRADE: DOCUMENT NOT RELEVANT---");
    }
  }
  return { documents: relevantDocs };
};

// Re-write question
const transformQuery = async (
  state: typeof GraphState.State,
): Promise<Partial<typeof GraphState.State>> => {
  console.log("---TRANSFORM QUERY---");
  const betterQuestion = await rewriter.invoke({ question: state.question });
  return { question: betterQuestion };
};

// Web search based on the re-phrased question
const webSearch = async (
  state: typeof GraphState.State,
): Promise<Partial<typeof GraphState.State>> => {
  console.log("---WEB SEARCH---");
  const stringifiedSearchResults = await webSearchTool.invoke(state.question);
  return {
    documents: [new Document({ pageContent: stringifiedSearchResults })],
  };
};

/* ---Edges--- */

// Decide on the datasource to route the initial question to.
const routeQuestion = async (state: typeof GraphState.State) => {
  const source: { datasource: string } = await questionRouter.invoke({
    question: state.question,
  });
  if (source.datasource === "web_search") {
    console.log(`---ROUTING QUESTION "${state.question}" TO WEB SEARCH---`);
    return "web_search";
  } else {
    console.log(`---ROUTING QUESTION "${state.question}" TO RAG---`);
    return "retrieve";
  }
};

// Decide whether the current documents are sufficiently relevant
// to come up with a good answer.
const decideToGenerate = async (state: typeof GraphState.State) => {
  const filteredDocuments = state.documents;
  if (filteredDocuments.length === 0) {
    // All documents have been filtered as irrelevant.
    // Regenerate a new query and try again.
    console.log(
      "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---",
    );
    return "transform_query";
  } else {
    // We have relevant documents, so generate answer.
    console.log("---DECISION: GENERATE---");
    return "generate";
  }
};

// Determines whether the generation is grounded in the documents and answers the question.
const gradeGenerationDocumentsAndQuestion = async (
  state: typeof GraphState.State,
) => {
  const hallucinationGrade: { score: string } = await hallucinationGrader.invoke({
    generation: state.generation,
    context: formatDocs(state.documents),
  });
  // Check for hallucination
  if (hallucinationGrade.score === "yes") {
    console.log("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---");
    // Check question answering
    console.log("---GRADING GENERATION vs. QUESTION---");
    const onTopicGrade: { score: string } = await answerGrader.invoke({
      question: state.question,
      generation: state.generation,
    });
    if (onTopicGrade.score === "yes") {
      console.log("---DECISION: GENERATION ADDRESSES QUESTION---");
      return "useful";
    } else {
      console.log("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---");
      return "not_useful";
    }
  } else {
    console.log(
      "---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RETRY---",
    );
    return "not_supported";
  }
};
Now we build the graph. For fun, let's add a checkpointer and have the compiled
graph pause before making a web search. This will simulate asking for
permission.
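Here's a sketch of that assembly. It uses the node names returned by the conditional edge functions above ("web_search", "retrieve", "transform_query", "generate") and LangGraph's in-memory MemorySaver checkpointer; the original wiring isn't shown here, so treat this as one reasonable construction:

import { StateGraph, START, END, MemorySaver } from "@langchain/langgraph";

const graph = new StateGraph(GraphState)
  .addNode("web_search", webSearch)
  .addNode("retrieve", retrieve)
  .addNode("grade_documents", gradeDocuments)
  .addNode("generate", generate)
  .addNode("transform_query", transformQuery)
  // Route the initial question to either web search or the vector store
  .addConditionalEdges(START, routeQuestion)
  .addEdge("web_search", "generate")
  .addEdge("retrieve", "grade_documents")
  // Generate from relevant docs, or rewrite the query and retry
  .addConditionalEdges("grade_documents", decideToGenerate)
  .addEdge("transform_query", "retrieve")
  // Grade the generation for groundedness and usefulness
  .addConditionalEdges("generate", gradeGenerationDocumentsAndQuestion, {
    useful: END,
    not_useful: "transform_query",
    not_supported: "generate",
  });

// Pausing before "web_search" requires a checkpointer to save state
// so that the run can be resumed later.
const app = graph.compile({
  checkpointer: new MemorySaver(),
  interruptBefore: ["web_search"],
});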
You're all set! Time to ask some questions. First, try a question about
something related to agents:
awaitapp.invoke({question:"What are some features of long-term memory?",},{configurable:{thread_id:"1"}},);
---ROUTING QUESTION "What are some features of long-term memory? TO WEB SEARCH---{ question: 'What are some features of long-term memory?', documents: []}
You can see that your graph correctly routes the query to the vector store and
answers the question, filtering out some documents as necessary.
If you ask something not related to agents or LLMs, the graph should fall back
to information gleaned from the web. The graph will pause before executing, as
specified above:
awaitapp.invoke({question:"Where are the 2024 Euros being held?",},{configurable:{thread_id:"2"}},);
---ROUTING QUESTION "Where are the 2024 Euros being held? TO WEB SEARCH---{ question: 'Where are the 2024 Euros being held?', documents: [] }
You can see the graph paused before running the web search. And now we continue
by invoking the graph with null:
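The call looks like this - passing null as the input resumes the saved run for the matching thread_id instead of starting a new one:

await app.invoke(null, { configurable: { thread_id: "2" } });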