{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "eec8815f-79f5-49b1-a29d-48acc7795e57",
"metadata": {},
"source": [
"# Multi-modal eval: GPT-4 w/ multi-modal embeddings and multi-vector retriever\n",
"\n",
"`Multi-modal slide decks` is a public dataset that contains a dataset of question-answer pairs from slide decks with visual content.\n",
"\n",
"The question-answer pairs are derived from the visual content in the decks, testing the ability of RAG to perform visual reasoning.\n",
"\n",
"We evaluate this dataset using two approaches: \n",
"\n",
"(1) Vectorstore with multimodal embeddings\n",
"\n",
"(2) Multi-vector retriever with indexed image summaries\n",
"\n",
"## Pre-requisites"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3838e98a-0d68-406f-87ef-096cf9d7424f",
"metadata": {},
"outputs": [],
"source": [
"# %pip install -U langchain langsmith langchain_benchmarks\n",
"# %pip install -U openai chromadb pypdfium2 open-clip-torch pillow"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "50caa523-7d2b-458f-a06c-364157a17fe9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
"env_vars = [\"LANGCHAIN_API_KEY\", \"OPENAI_API_KEY\"]\n",
"for var in env_vars:\n",
" if var not in os.environ:\n",
" os.environ[var] = getpass.getpass(prompt=f\"Enter your {var}: \")"
]
},
{
"cell_type": "markdown",
"id": "131ef3c8-909a-4fd6-88c8-a64a30045927",
"metadata": {},
"source": [
"## Dataset\n",
"\n",
"We can browse the available LangChain benchmark datasets for retrieval."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "44c3a6ba-2e7a-474b-a9a3-51591917a3a6",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"Name | Type | Dataset ID | Description |
\n",
"\n",
"\n",
"LangChain Docs Q&A | RetrievalTask | 452ccafc-18e1-4314-885b-edd735f17b9d | Questions and answers based on a snapshot of the LangChain python docs.\n",
"\n",
"The environment provides the documents and the retriever information.\n",
"\n",
"Each example is composed of a question and reference answer.\n",
"\n",
"Success is measured based on the accuracy of the answer relative to the reference answer.\n",
"We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
\n",
"Semi-structured Reports | RetrievalTask | c47d9617-ab99-4d6e-a6e6-92b8daf85a7d | Questions and answers based on PDFs containing tables and charts.\n",
"\n",
"The task provides the raw documents as well as factory methods to easily index them\n",
"and create a retriever.\n",
"\n",
"Each example is composed of a question and reference answer.\n",
"\n",
"Success is measured based on the accuracy of the answer relative to the reference answer.\n",
"We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
\n",
"Multi-modal slide decks | RetrievalTask | 40afc8e7-9d7e-44ed-8971-2cae1eb59731 | This public dataset is a work-in-progress and will be extended over time.\n",
" \n",
"Questions and answers based on slide decks containing visual tables and charts.\n",
"\n",
"Each example is composed of a question and reference answer.\n",
"\n",
"Success is measured based on the accuracy of the answer relative to the reference answer. |
\n",
"\n",
"
"
],
"text/plain": [
"Registry(tasks=[RetrievalTask(name='LangChain Docs Q&A', dataset_id='https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/d', description=\"Questions and answers based on a snapshot of the LangChain python docs.\\n\\nThe environment provides the documents and the retriever information.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\nWe also measure the faithfulness of the model's response relative to the retrieved documents (if any).\\n\", get_docs=, retriever_factories={'basic': , 'parent-doc': , 'hyde': }, architecture_factories={'conversational-retrieval-qa': }), RetrievalTask(name='Semi-structured Reports', dataset_id='https://smith.langchain.com/public/c47d9617-ab99-4d6e-a6e6-92b8daf85a7d/d', description=\"Questions and answers based on PDFs containing tables and charts.\\n\\nThe task provides the raw documents as well as factory methods to easily index them\\nand create a retriever.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\nWe also measure the faithfulness of the model's response relative to the retrieved documents (if any).\\n\", get_docs=, retriever_factories={'basic': , 'parent-doc': , 'hyde': }, architecture_factories={}), RetrievalTask(name='Multi-modal slide decks', dataset_id='https://smith.langchain.com/public/40afc8e7-9d7e-44ed-8971-2cae1eb59731/d', description='This public dataset is a work-in-progress and will be extended over time.\\n \\nQuestions and answers based on slide decks containing visual tables and charts.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\n', get_docs={}, retriever_factories={}, architecture_factories={})])"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_benchmarks import clone_public_dataset, registry\n",
"\n",
"registry = registry.filter(Type=\"RetrievalTask\")\n",
"registry"
]
},
{
"cell_type": "markdown",
"id": "f532a368-ed9b-4c4f-8a06-87a56ea547fa",
"metadata": {},
"source": [
"`Multi-modal slide decks` is the relevant dataset for our task."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "02ba8543-2c77-4b92-ae32-651b1699d0df",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"Name | Multi-modal slide decks |
\n",
"Type | RetrievalTask |
\n",
"Dataset ID | 40afc8e7-9d7e-44ed-8971-2cae1eb59731 |
\n",
"Description | This public dataset is a work-in-progress and will be extended over time.\n",
" \n",
"Questions and answers based on slide decks containing visual tables and charts.\n",
"\n",
"Each example is composed of a question and reference answer.\n",
"\n",
"Success is measured based on the accuracy of the answer relative to the reference answer. |
\n",
"Retriever Factories | |
\n",
"Architecture Factories | |
\n",
"get_docs | {} |
\n",
"\n",
"
"
],
"text/plain": [
"RetrievalTask(name='Multi-modal slide decks', dataset_id='https://smith.langchain.com/public/40afc8e7-9d7e-44ed-8971-2cae1eb59731/d', description='This public dataset is a work-in-progress and will be extended over time.\\n \\nQuestions and answers based on slide decks containing visual tables and charts.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\n', get_docs={}, retriever_factories={}, architecture_factories={})"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"task = registry[\"Multi-modal slide decks\"]\n",
"task"
]
},
{
"cell_type": "markdown",
"id": "563c4276-448e-4090-8b6c-56ea3fefbed2",
"metadata": {},
"source": [
"Clone the dataset so that it's available in our LangSmith datasets."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a3ea657a-754c-4cf6-a7a6-c4633d78cb1e",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "43da988af2244ede808ad5e8212f8350",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/10 [00:00, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished fetching examples. Creating dataset...\n",
"New dataset created you can access it at https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/08a29acb-5ad6-42ce-a482-574c9e2e5306.\n",
"Done creating dataset.\n"
]
}
],
"source": [
"clone_public_dataset(task.dataset_id, dataset_name=task.name)"
]
},
{
"cell_type": "markdown",
"id": "14116e9c-f765-4afa-a918-1865562895cc",
"metadata": {},
"source": [
"Fetch the associated PDFs from remote cache for the dataset so that we can perform ingestion."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f0d56161-3672-4c23-9653-deefb3e340a2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_benchmarks.rag.tasks.multi_modal_slide_decks import get_file_names\n",
"\n",
"file_names = list(get_file_names()) # PosixPath"
]
},
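{
"cell_type": "markdown",
"id": "f0d56161-filelist-check-md",
"metadata": {},
"source": [
"Optionally, list the fetched file names to confirm the download from the remote cache succeeded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0d56161-filelist-check-code",
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: the PDFs fetched for the dataset\n",
"[f.name for f in file_names]"
]
},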
{
"cell_type": "markdown",
"id": "b0ddce4b-f94e-42d8-9648-d627508ae476",
"metadata": {},
"source": [
"## Load\n",
"\n",
"For each presentation, extract an image for each slide."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "119c6cb3-71e0-414e-b8ae-275c6b07cbef",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Extracting 30 images for DDOG_Q3_earnings_deck.pdf\n"
]
}
],
"source": [
"import os\n",
"from pathlib import Path\n",
"\n",
"import pypdfium2 as pdfium\n",
"\n",
"\n",
"def get_images(file):\n",
" \"\"\"\n",
" Get PIL images from PDF pages and save them to a specified directory\n",
" :param file: Path to file\n",
" :return: A list of PIL images\n",
" \"\"\"\n",
"\n",
" # Get presentation\n",
" pdf = pdfium.PdfDocument(file)\n",
" n_pages = len(pdf)\n",
"\n",
" # Extracting file name and creating the directory for images\n",
" file_name = Path(file).stem # Gets the file name without extension\n",
" img_dir = os.path.join(Path(file).parent, \"img\")\n",
" os.makedirs(img_dir, exist_ok=True)\n",
"\n",
" # Get images\n",
" pil_images = []\n",
" print(f\"Extracting {n_pages} images for {file.name}\")\n",
" for page_number in range(n_pages):\n",
" page = pdf.get_page(page_number)\n",
" bitmap = page.render(scale=1, rotation=0, crop=(0, 0, 0, 0))\n",
" pil_image = bitmap.to_pil()\n",
" pil_images.append(pil_image)\n",
"\n",
" # Saving the image with the specified naming convention\n",
" image_path = os.path.join(img_dir, f\"{file_name}_image_{page_number + 1}.jpg\")\n",
" pil_image.save(image_path, format=\"JPEG\")\n",
"\n",
" return pil_images\n",
"\n",
"\n",
"images = []\n",
"for fi in file_names:\n",
" images.extend(get_images(fi))"
]
},
{
"cell_type": "markdown",
"id": "f9ce4a16-89de-49f1-a6c6-24f0c14d607b",
"metadata": {},
"source": [
"Now, we convert each PIL image to a Base64 encoded string and set the image size.\n",
"\n",
"Base64 encoded string can be input to GPT-4V."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a28a9c74-835a-43b2-be25-fc26f1daca0f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import base64\n",
"import io\n",
"from io import BytesIO\n",
"\n",
"from PIL import Image\n",
"\n",
"\n",
"def resize_base64_image(base64_string, size=(128, 128)):\n",
" \"\"\"\n",
" Resize an image encoded as a Base64 string\n",
"\n",
" :param base64_string: Base64 string\n",
" :param size: Image size\n",
" :return: Re-sized Base64 string\n",
" \"\"\"\n",
" # Decode the Base64 string\n",
" img_data = base64.b64decode(base64_string)\n",
" img = Image.open(io.BytesIO(img_data))\n",
"\n",
" # Resize the image\n",
" resized_img = img.resize(size, Image.LANCZOS)\n",
"\n",
" # Save the resized image to a bytes buffer\n",
" buffered = io.BytesIO()\n",
" resized_img.save(buffered, format=img.format)\n",
"\n",
" # Encode the resized image to Base64\n",
" return base64.b64encode(buffered.getvalue()).decode(\"utf-8\")\n",
"\n",
"\n",
"def convert_to_base64(pil_image):\n",
" \"\"\"\n",
" Convert PIL images to Base64 encoded strings\n",
"\n",
" :param pil_image: PIL image\n",
" :return: Re-sized Base64 string\n",
" \"\"\"\n",
"\n",
" buffered = BytesIO()\n",
" pil_image.save(buffered, format=\"JPEG\") # You can change the format if needed\n",
" img_str = base64.b64encode(buffered.getvalue()).decode(\"utf-8\")\n",
" img_str = resize_base64_image(img_str, size=(960, 540))\n",
" return img_str\n",
"\n",
"\n",
"images_base_64 = [convert_to_base64(i) for i in images]"
]
},
{
"cell_type": "markdown",
"id": "93638ca6-62f6-419c-bf66-e6ff474dd73a",
"metadata": {},
"source": [
"If desired, we can plot the images to confirm that they were extracted correctly."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "19002dee-e52a-4887-a461-626ec97e96e6",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from IPython.display import HTML, display\n",
"\n",
"\n",
"def plt_img_base64(img_base64):\n",
" \"\"\"\n",
" Disply base64 encoded string as image\n",
"\n",
" :param img_base64: Base64 string\n",
" \"\"\"\n",
" # Create an HTML img tag with the base64 string as the source\n",
" image_html = f'
'\n",
" # Display the image by rendering the HTML\n",
" display(HTML(image_html))\n",
"\n",
"\n",
"i = 10\n",
"plt_img_base64(images_base_64[i])"
]
},
{
"cell_type": "markdown",
"id": "7f8eacc2-fda4-4118-8c3e-99ae6201ef31",
"metadata": {},
"source": [
"## Index\n",
"\n",
"We will test two approaches.\n",
"\n",
"### Option 1: Vectorstore with multimodal embeddings\n",
"\n",
"Here we will use [OpenCLIP multimodal embeddings](https://github.com/mlfoundations/open_clip).\n",
"\n",
"There are [many to choose from](https://github.com/mlfoundations/open_clip/blob/main/docs/openclip_results.csv).\n",
"\n",
"By default, it will use `model_name=\"ViT-H-14\", checkpoint=\"laion2b_s32b_b79k\"`.\n",
"\n",
"This model favorably balances memory and performance.\n",
"\n",
"However, you can test different models by passing them to OpenCLIPEmbeddings as `model_name=, checkpoint=`."
]
},
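{
"cell_type": "markdown",
"id": "288a3818-openclip-alt-md",
"metadata": {},
"source": [
"For example, a smaller OpenCLIP model can be swapped in as sketched below (left commented out, since the evaluation uses the default). The `ViT-B-32` / `laion2b_s34b_b79k` pairing is one entry from the OpenCLIP results table; any listed pair should work the same way, and the variable name is only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "288a3818-openclip-alt-code",
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: swap in a smaller OpenCLIP model (faster, less memory, lower\n",
"# retrieval quality). \"ViT-B-32\" / \"laion2b_s34b_b79k\" is one pairing from the\n",
"# OpenCLIP results table; substitute any model_name / checkpoint pair listed there.\n",
"# The variable name `clip_embd_small` is just illustrative.\n",
"\n",
"# from langchain_experimental.open_clip import OpenCLIPEmbeddings\n",
"#\n",
"# clip_embd_small = OpenCLIPEmbeddings(\n",
"#     model_name=\"ViT-B-32\", checkpoint=\"laion2b_s34b_b79k\"\n",
"# )"
]
},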
{
"cell_type": "code",
"execution_count": 9,
"id": "288a3818-b7f2-430c-b624-0c7d5317dc52",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.vectorstores import Chroma\n",
"from langchain_experimental.open_clip import OpenCLIPEmbeddings\n",
"\n",
"# Make vectorstore\n",
"vectorstore_mmembd = Chroma(\n",
" collection_name=\"multi-modal-rag\",\n",
" embedding_function=OpenCLIPEmbeddings(),\n",
")\n",
"\n",
"# Read images we extracted above\n",
"img_dir = os.path.join(Path(file_names[0]).parent, \"img\")\n",
"image_uris = sorted(\n",
" [\n",
" os.path.join(img_dir, image_name)\n",
" for image_name in os.listdir(img_dir)\n",
" if image_name.endswith(\".jpg\")\n",
" ]\n",
")\n",
"\n",
"# Add images\n",
"vectorstore_mmembd.add_images(uris=image_uris)\n",
"\n",
"# Make retriever\n",
"retriever_mmembd = vectorstore_mmembd.as_retriever()"
]
},
{
"cell_type": "markdown",
"id": "98c29e5a-19d0-4960-8755-9054b091726f",
"metadata": {},
"source": [
"### Option 2: Multi-vector retriever\n",
"\n",
"This approach will generate and index image summaries. See detail [here](https://blog.langchain.dev/semi-structured-multi-modal-rag/).\n",
"\n",
"It will then retrieve the raw image to pass to GPT-4V for final synthesis.\n",
"\n",
"The idea is that retrieval on image summaries:\n",
"\n",
"* Does not rely on multi-modal embeddings\n",
"* Can perform better at retrieval of visually / semantically similar, but quantitatively different slide content\n",
"\n",
"Note: there OpenAI's GPT-4V API can experince [non-deterministic](https://community.openai.com/t/400-errors-on-gpt-vision-api-since-today/534538/16) `BadRequestError`, which we handle. Hopefully this is resolved soon."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "72dca949-a255-4543-a290-f898ffb50962",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.schema.messages import HumanMessage\n",
"\n",
"\n",
"def image_summarize(img_base64, prompt):\n",
" \"\"\"\n",
" Make image summary\n",
"\n",
" :param img_base64: Base64 encoded string for image\n",
" :param prompt: Text prompt for summarizatiomn\n",
" :return: Image summarization prompt\n",
"\n",
" \"\"\"\n",
" chat = ChatOpenAI(model=\"gpt-4-vision-preview\", max_tokens=1024)\n",
"\n",
" msg = chat.invoke(\n",
" [\n",
" HumanMessage(\n",
" content=[\n",
" {\"type\": \"text\", \"text\": prompt},\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\"url\": f\"data:image/jpeg;base64,{img_base64}\"},\n",
" },\n",
" ]\n",
" )\n",
" ]\n",
" )\n",
" return msg.content\n",
"\n",
"\n",
"def generate_img_summaries(img_base64_list):\n",
" \"\"\"\n",
" Generate summaries for images\n",
"\n",
" :param img_base64_list: Base64 encoded images\n",
" :return: List of image summaries and processed images\n",
" \"\"\"\n",
"\n",
" # Store image summaries\n",
" image_summaries = []\n",
" processed_images = []\n",
"\n",
" # Prompt\n",
" prompt = \"\"\"You are an assistant tasked with summarizing images for retrieval. \\\n",
" These summaries will be embedded and used to retrieve the raw image. \\\n",
" Give a concise summary of the image that is well optimized for retrieval.\"\"\"\n",
"\n",
" # Apply summarization to images\n",
" for i, base64_image in enumerate(img_base64_list):\n",
" try:\n",
" image_summaries.append(image_summarize(base64_image, prompt))\n",
" processed_images.append(base64_image)\n",
" except:\n",
" print(f\"BadRequestError with image {i+1}\")\n",
"\n",
" return image_summaries, processed_images\n",
"\n",
"\n",
"# Image summaries\n",
"image_summaries, images_base_64_processed = generate_img_summaries(images_base_64)"
]
},
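{
"cell_type": "markdown",
"id": "72dca949-summary-check-md",
"metadata": {},
"source": [
"Optionally, check how many summaries were produced; any shortfall relative to the number of input images corresponds to images skipped by the error handling above. We can also inspect a sample summary."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72dca949-summary-check-code",
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: count summaries vs. input images and inspect one\n",
"print(f\"{len(image_summaries)} summaries for {len(images_base_64)} input images\")\n",
"image_summaries[0]"
]
},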
{
"cell_type": "markdown",
"id": "81ae20ca-71d3-4644-993a-ebec14b1f762",
"metadata": {},
"source": [
"Add raw docs and doc summaries to [Multi Vector Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector#summary): \n",
"\n",
"* Store the raw images in the `docstore`.\n",
"* Store the image summaries in the `vectorstore` for semantic retrieval."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9d5dc727-7057-4962-91e1-dd99bf78ef07",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.schema.document import Document\n",
"from langchain.schema.output_parser import StrOutputParser\n",
"from langchain.storage import InMemoryStore\n",
"\n",
"\n",
"def create_multi_vector_retriever(vectorstore, image_summaries, images):\n",
" \"\"\"\n",
" Create retriever that indexes summaries, but returns raw images or texts\n",
"\n",
" :param vectorstore: Vectorstore to store embedded image sumamries\n",
" :param image_summaries: Image summaries\n",
" :param images: Base64 encoded images\n",
" :return: Retriever\n",
" \"\"\"\n",
"\n",
" # Initialize the storage layer\n",
" store = InMemoryStore()\n",
" id_key = \"doc_id\"\n",
"\n",
" # Create the multi-vector retriever\n",
" retriever = MultiVectorRetriever(\n",
" vectorstore=vectorstore,\n",
" docstore=store,\n",
" id_key=id_key,\n",
" )\n",
"\n",
" # Helper function to add documents to the vectorstore and docstore\n",
" def add_documents(retriever, doc_summaries, doc_contents):\n",
" doc_ids = [str(uuid.uuid4()) for _ in doc_contents]\n",
" summary_docs = [\n",
" Document(page_content=s, metadata={id_key: doc_ids[i]})\n",
" for i, s in enumerate(doc_summaries)\n",
" ]\n",
" retriever.vectorstore.add_documents(summary_docs)\n",
" retriever.docstore.mset(list(zip(doc_ids, doc_contents)))\n",
"\n",
" add_documents(retriever, image_summaries, images)\n",
"\n",
" return retriever\n",
"\n",
"\n",
"# The vectorstore to use to index the summaries\n",
"vectorstore_mvr = Chroma(\n",
" collection_name=\"multi-modal-rag-mv\", embedding_function=OpenAIEmbeddings()\n",
")\n",
"\n",
"# Create retriever\n",
"retriever_multi_vector_img = create_multi_vector_retriever(\n",
" vectorstore_mvr,\n",
" image_summaries,\n",
" images_base_64_processed,\n",
")"
]
},
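{
"cell_type": "markdown",
"id": "9d5dc727-retrieval-check-md",
"metadata": {},
"source": [
"As an optional sanity check, we can query the retriever directly. It should return the raw Base64-encoded images stored in the `docstore` (not the summaries used for indexing), which we can render with `plt_img_base64`. The question below is only an illustrative placeholder."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d5dc727-retrieval-check-code",
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: the retriever returns raw base64 image strings from the docstore.\n",
"# The query below is an illustrative placeholder; use any question about the ingested deck.\n",
"docs = retriever_multi_vector_img.get_relevant_documents(\n",
"    \"What was revenue growth in the most recent quarter?\"\n",
")\n",
"print(f\"Retrieved {len(docs)} image(s)\")\n",
"plt_img_base64(docs[0])"
]
},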
{
"cell_type": "markdown",
"id": "47c44a67-8639-4a30-b659-5957df89c5d6",
"metadata": {},
"source": [
"## RAG\n",
"\n",
"Create a pipeline for retrieval of relevant images based on semantic similarity to the input question.\n",
"\n",
"Pass the images to GPT-4V for answer synthesis."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "b032d4e0-63d5-4e7f-bafc-e0550b5c2db0",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.schema.runnable import RunnableLambda, RunnablePassthrough\n",
"\n",
"\n",
"def prepare_images(docs):\n",
" \"\"\"\n",
" Prepare iamges for prompt\n",
"\n",
" :param docs: A list of base64-encoded images from retriever.\n",
" :return: Dict containing a list of base64-encoded strings.\n",
" \"\"\"\n",
" b64_images = []\n",
" for doc in docs:\n",
" if isinstance(doc, Document):\n",
" doc = doc.page_content\n",
" b64_images.append(doc)\n",
" return {\"images\": b64_images}\n",
"\n",
"\n",
"def img_prompt_func(data_dict, num_images=2):\n",
" \"\"\"\n",
" GPT-4V prompt for image analysis.\n",
"\n",
" :param data_dict: A dict with images and a user-provided question.\n",
" :param num_images: Number of images to include in the prompt.\n",
" :return: A list containing message objects for each image and the text prompt.\n",
" \"\"\"\n",
" messages = []\n",
" if data_dict[\"context\"][\"images\"]:\n",
" for image in data_dict[\"context\"][\"images\"][:num_images]:\n",
" image_message = {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image}\"},\n",
" }\n",
" messages.append(image_message)\n",
" text_message = {\n",
" \"type\": \"text\",\n",
" \"text\": (\n",
" \"You are an analyst tasked with answering questions about visual content.\\n\"\n",
" \"You will be give a set of image(s) from a slide deck / presentation.\\n\"\n",
" \"Use this information to answer the user question. \\n\"\n",
" f\"User-provided question: {data_dict['question']}\\n\\n\"\n",
" ),\n",
" }\n",
" messages.append(text_message)\n",
" return [HumanMessage(content=messages)]\n",
"\n",
"\n",
"def multi_modal_rag_chain(retriever):\n",
" \"\"\"\n",
" Multi-modal RAG chain\n",
" \"\"\"\n",
"\n",
" # Multi-modal LLM\n",
" model = ChatOpenAI(temperature=0, model=\"gpt-4-vision-preview\", max_tokens=1024)\n",
"\n",
" # RAG pipeline\n",
" chain = (\n",
" {\n",
" \"context\": retriever | RunnableLambda(prepare_images),\n",
" \"question\": RunnablePassthrough(),\n",
" }\n",
" | RunnableLambda(img_prompt_func)\n",
" | model\n",
" | StrOutputParser()\n",
" )\n",
"\n",
" return chain\n",
"\n",
"\n",
"# Create RAG chain\n",
"chain_multimodal_rag = multi_modal_rag_chain(retriever_multi_vector_img)\n",
"chain_multimodal_rag_mmembd = multi_modal_rag_chain(retriever_mmembd)"
]
},
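{
"cell_type": "markdown",
"id": "b032d4e0-smoke-test-md",
"metadata": {},
"source": [
"Before running the full evaluation, we can optionally smoke-test the multi-vector chain on a single question. The question below is a placeholder; substitute any question about the ingested deck."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b032d4e0-smoke-test-code",
"metadata": {},
"outputs": [],
"source": [
"# Optional smoke test of the multi-vector RAG chain on a single placeholder question\n",
"chain_multimodal_rag.invoke(\"How did revenue trend over the most recent quarters?\")"
]
},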
{
"cell_type": "markdown",
"id": "f7c5f379-317c-4a2e-9190-61f6dfbbc77d",
"metadata": {},
"source": [
"## Eval\n",
"\n",
"Run evaluation on our dataset:\n",
"\n",
"* `task.name` is the dataset of QA pairs that we cloned\n",
"* `eval_config` specifies the [LangSmith evaluator](https://docs.smith.langchain.com/evaluation/evaluator-implementations#correctness-qa-evaluation) for our dataset, which will use GPT-4 as a grader\n",
"* The grader will evaluate the chain-generated answer to each question relative to ground truth"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "424f62cb-3dcd-4403-aa29-6f7497836439",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"View the evaluation results for project 'multi_modal_mvretriever_gpt4v-f6f7' at:\n",
"https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/08a29acb-5ad6-42ce-a482-574c9e2e5306/compare?selectedSessions=15dd3901-382c-4f0f-8433-077963fc4bb7\n",
"\n",
"View all tests for Dataset Multi-modal slide decks at:\n",
"https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/08a29acb-5ad6-42ce-a482-574c9e2e5306\n",
"[------------------------------------------------->] 10/10"
]
},
{
"data": {
"text/html": [
"Experiment Results:
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" output | \n",
" feedback.COT Contextual Accuracy | \n",
" error | \n",
" execution_time | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 10 | \n",
" 10.0 | \n",
" 0 | \n",
" 10.000000 | \n",
"
\n",
" \n",
" unique | \n",
" 10 | \n",
" NaN | \n",
" 0 | \n",
" NaN | \n",
"
\n",
" \n",
" top | \n",
" As of the third quarter of 2023 (Q3 2023), Dat... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" freq | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" mean | \n",
" NaN | \n",
" 1.0 | \n",
" NaN | \n",
" 13.430077 | \n",
"
\n",
" \n",
" std | \n",
" NaN | \n",
" 0.0 | \n",
" NaN | \n",
" 3.656360 | \n",
"
\n",
" \n",
" min | \n",
" NaN | \n",
" 1.0 | \n",
" NaN | \n",
" 10.319160 | \n",
"
\n",
" \n",
" 25% | \n",
" NaN | \n",
" 1.0 | \n",
" NaN | \n",
" 10.809424 | \n",
"
\n",
" \n",
" 50% | \n",
" NaN | \n",
" 1.0 | \n",
" NaN | \n",
" 11.675873 | \n",
"
\n",
" \n",
" 75% | \n",
" NaN | \n",
" 1.0 | \n",
" NaN | \n",
" 15.971083 | \n",
"
\n",
" \n",
" max | \n",
" NaN | \n",
" 1.0 | \n",
" NaN | \n",
" 20.940341 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" output \\\n",
"count 10 \n",
"unique 10 \n",
"top As of the third quarter of 2023 (Q3 2023), Dat... \n",
"freq 1 \n",
"mean NaN \n",
"std NaN \n",
"min NaN \n",
"25% NaN \n",
"50% NaN \n",
"75% NaN \n",
"max NaN \n",
"\n",
" feedback.COT Contextual Accuracy error execution_time \n",
"count 10.0 0 10.000000 \n",
"unique NaN 0 NaN \n",
"top NaN NaN NaN \n",
"freq NaN NaN NaN \n",
"mean 1.0 NaN 13.430077 \n",
"std 0.0 NaN 3.656360 \n",
"min 1.0 NaN 10.319160 \n",
"25% 1.0 NaN 10.809424 \n",
"50% 1.0 NaN 11.675873 \n",
"75% 1.0 NaN 15.971083 \n",
"max 1.0 NaN 20.940341 "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"View the evaluation results for project 'multi_modal_mmembd_gpt4v-f6f7' at:\n",
"https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/08a29acb-5ad6-42ce-a482-574c9e2e5306/compare?selectedSessions=ed6255b4-23b5-45ee-82f7-bcf6744c3f8e\n",
"\n",
"View all tests for Dataset Multi-modal slide decks at:\n",
"https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/08a29acb-5ad6-42ce-a482-574c9e2e5306\n",
"[------------------------------------------------->] 10/10"
]
},
{
"data": {
"text/html": [
"Experiment Results:
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" output | \n",
" feedback.COT Contextual Accuracy | \n",
" error | \n",
" execution_time | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 10 | \n",
" 10.000000 | \n",
" 0 | \n",
" 10.000000 | \n",
"
\n",
" \n",
" unique | \n",
" 10 | \n",
" NaN | \n",
" 0 | \n",
" NaN | \n",
"
\n",
" \n",
" top | \n",
" The images provided do not contain information... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" freq | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" mean | \n",
" NaN | \n",
" 0.500000 | \n",
" NaN | \n",
" 15.596197 | \n",
"
\n",
" \n",
" std | \n",
" NaN | \n",
" 0.527046 | \n",
" NaN | \n",
" 2.716853 | \n",
"
\n",
" \n",
" min | \n",
" NaN | \n",
" 0.000000 | \n",
" NaN | \n",
" 11.661625 | \n",
"
\n",
" \n",
" 25% | \n",
" NaN | \n",
" 0.000000 | \n",
" NaN | \n",
" 12.941465 | \n",
"
\n",
" \n",
" 50% | \n",
" NaN | \n",
" 0.500000 | \n",
" NaN | \n",
" 16.246343 | \n",
"
\n",
" \n",
" 75% | \n",
" NaN | \n",
" 1.000000 | \n",
" NaN | \n",
" 17.723280 | \n",
"
\n",
" \n",
" max | \n",
" NaN | \n",
" 1.000000 | \n",
" NaN | \n",
" 18.488639 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" output \\\n",
"count 10 \n",
"unique 10 \n",
"top The images provided do not contain information... \n",
"freq 1 \n",
"mean NaN \n",
"std NaN \n",
"min NaN \n",
"25% NaN \n",
"50% NaN \n",
"75% NaN \n",
"max NaN \n",
"\n",
" feedback.COT Contextual Accuracy error execution_time \n",
"count 10.000000 0 10.000000 \n",
"unique NaN 0 NaN \n",
"top NaN NaN NaN \n",
"freq NaN NaN NaN \n",
"mean 0.500000 NaN 15.596197 \n",
"std 0.527046 NaN 2.716853 \n",
"min 0.000000 NaN 11.661625 \n",
"25% 0.000000 NaN 12.941465 \n",
"50% 0.500000 NaN 16.246343 \n",
"75% 1.000000 NaN 17.723280 \n",
"max 1.000000 NaN 18.488639 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import uuid\n",
"\n",
"from langchain.smith import RunEvalConfig\n",
"from langsmith.client import Client\n",
"\n",
"# Evaluator configuration\n",
"client = Client()\n",
"eval_config = RunEvalConfig(\n",
" evaluators=[\"cot_qa\"],\n",
")\n",
"\n",
"# Experiments\n",
"chain_map = {\n",
" \"multi_modal_mvretriever_gpt4v\": chain_multimodal_rag,\n",
" \"multi_modal_mmembd_gpt4v\": chain_multimodal_rag_mmembd,\n",
"}\n",
"\n",
"# Run evaluation\n",
"run_id = uuid.uuid4().hex[:4]\n",
"test_runs = {}\n",
"for project_name, chain in chain_map.items():\n",
" test_runs[project_name] = client.run_on_dataset(\n",
" dataset_name=task.name,\n",
" llm_or_chain_factory=lambda: (lambda x: x[\"Question\"]) | chain,\n",
" evaluation=eval_config,\n",
" verbose=True,\n",
" project_name=f\"{project_name}-{run_id}\",\n",
" project_metadata={\"chain\": project_name},\n",
" )"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}