{ "cells": [ { "cell_type": "markdown", "id": "60bb467d-861d-4b07-a48d-8e5aa177c969", "metadata": {}, "source": [ "# Email Extraction\n", "\n", "\n", "Let's evaluate an LLM on its ability to extract structured information from email texts." ] }, { "cell_type": "code", "execution_count": null, "id": "47de0d20-d20b-44be-9e41-d2275f0866e8", "metadata": { "tags": [] }, "outputs": [], "source": [ "%pip install -U langchain langchain_benchmarks openai rapidfuzz" ] }, { "cell_type": "code", "execution_count": null, "id": "c401de19-814e-4bd7-bb9c-7ea6e217985c", "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "\n", "# Get your API key from https://smith.langchain.com/settings\n", "os.environ[\"LANGCHAIN_API_KEY\"] = \"sk-...\"\n", "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"" ] }, { "cell_type": "code", "execution_count": 1, "id": "b39159d0-9ea1-414f-a9d8-4a7b22b3d2cc", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain_benchmarks import clone_public_dataset, registry" ] }, { "cell_type": "markdown", "id": "03488ab1-31ed-41c2-8da2-46b02599b181", "metadata": {}, "source": [ "For this code to work, please configure LangSmith environment variables with your credentials." ] }, { "cell_type": "code", "execution_count": 2, "id": "60f22779-a948-4833-8e8c-ace9ef17f56f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<table>
<tr><th>Name</th><td>Email Extraction</td></tr>
<tr><th>Type</th><td>ExtractionTask</td></tr>
<tr><th>Dataset ID</th><td>a1742786-bde5-4f51-a1d8-e148e5251ddb</td></tr>
<tr><th>Description</th><td>A dataset of 42 real emails deduped from a spam folder, with semantic HTML tags removed, as well as a script for initial extraction and formatting of other emails from an arbitrary .mbox file like the one exported by Gmail.\n", "\n", "Some additional cleanup of the data was done by hand after the initial pass.\n", "\n", "See https://github.com/jacoblee93/oss-model-extraction-evals.</td></tr>
</table>
" ], "text/plain": [ "ExtractionTask(name='Email Extraction', dataset_id='https://smith.langchain.com/public/a1742786-bde5-4f51-a1d8-e148e5251ddb/d', description='A dataset of 42 real emails deduped from a spam folder, with semantic HTML tags removed, as well as a script for initial extraction and formatting of other emails from an arbitrary .mbox file like the one exported by Gmail.\\n\\nSome additional cleanup of the data was done by hand after the initial pass.\\n\\nSee https://github.com/jacoblee93/oss-model-extraction-evals.\\n ', schema=, instructions=ChatPromptTemplate(input_variables=['input'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are an expert researcher.')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='What can you tell me about the following email? Make sure to extract the question in the correct format. Here is the email:\\n ```\\n{input}\\n```'))]))" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "task = registry[\"Email Extraction\"]\n", "task" ] }, { "cell_type": "code", "execution_count": 3, "id": "49be36d2-343e-49df-8369-dd5bac405d5e", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A dataset of 42 real emails deduped from a spam folder, with semantic HTML tags removed, as well as a script for initial extraction and formatting of other emails from an arbitrary .mbox file like the one exported by Gmail.\n", "\n", "Some additional cleanup of the data was done by hand after the initial pass.\n", "\n", "See https://github.com/jacoblee93/oss-model-extraction-evals.\n", " \n" ] } ], "source": [ "print(task.description)" ] }, { "cell_type": "markdown", "id": "bc33a639-3caf-4314-8ea7-1c7c8b1d114d", "metadata": {}, "source": [ "Clone the dataset associated with this task" ] }, { "cell_type": "code", "execution_count": 4, "id": "70369f67-deb4-467a-801a-6d38c3d0460d", 
"metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset Email Extraction already exists. Skipping.\n", "You can access the dataset at https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/309a2fce-ce68-43aa-befb-67f94d0c3570.\n" ] } ], "source": [ "clone_public_dataset(task.dataset_id, dataset_name=task.name)" ] }, { "cell_type": "markdown", "id": "86f1378a-9a62-477e-bdb8-a7fd10915b62", "metadata": {}, "source": [ "#### Schema\n", "\n", "Each extraction task has an expected output schema defined in a Pydantic BaseModel object, which we can use to\n", "get a JSON schema object." ] }, { "cell_type": "code", "execution_count": 5, "id": "12e302e6-9b3d-42a4-b612-d672c591e8f0", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'definitions': {'ToneEnum': {'description': 'The tone of the email.',\n", " 'enum': ['positive', 'negative'],\n", " 'title': 'ToneEnum',\n", " 'type': 'string'}},\n", " 'description': 'Relevant information about an email.',\n", " 'properties': {'action_items': {'description': 'A list of action items '\n", " 'requested by the email',\n", " 'items': {'type': 'string'},\n", " 'title': 'Action Items',\n", " 'type': 'array'},\n", " 'sender': {'description': \"The sender's name, if available\",\n", " 'title': 'Sender',\n", " 'type': 'string'},\n", " 'sender_address': {'description': \"The sender's address, if \"\n", " 'available',\n", " 'title': 'Sender Address',\n", " 'type': 'string'},\n", " 'sender_phone_number': {'description': \"The sender's phone \"\n", " 'number, if available',\n", " 'title': 'Sender Phone Number',\n", " 'type': 'string'},\n", " 'tone': {'allOf': [{'$ref': '#/definitions/ToneEnum'}],\n", " 'description': 'The tone of the email.'},\n", " 'topic': {'description': 'High level description of what the '\n", " 'email is about',\n", " 'title': 'Topic',\n", " 'type': 'string'}},\n", " 'required': ['action_items', 'topic', 
'tone'],\n", " 'title': 'Email',\n", " 'type': 'object'}\n" ] } ], "source": [ "import pprint\n", "\n", "pprint.pprint(task.schema.schema())" ] }, { "cell_type": "markdown", "id": "b462f7b8-fd42-4613-ab5f-5f3cbbc37d28", "metadata": {}, "source": [ "## Define an extraction chain\n", "\n", "Let's build the extraction chain that we can use to get structured information from the emails." ] }, { "cell_type": "code", "execution_count": 6, "id": "ade7077c-4602-4e5b-ad6d-3eb43cbd0247", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.chat_models import ChatOpenAI\n", "from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser\n", "\n", "llm = ChatOpenAI(model=\"gpt-3.5-turbo-16k\", temperature=0).bind_functions(\n", " functions=[task.schema],\n", " function_call=task.schema.schema()[\"title\"],\n", ")\n", "\n", "output_parser = JsonOutputFunctionsParser()\n", "extraction_chain = task.instructions | llm | output_parser | (lambda x: {\"output\": x})" ] }, { "cell_type": "code", "execution_count": 7, "id": "f66ed218-e1db-49b5-bde3-40ebec961723", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'output': {'sender': 'Unknown',\n", " 'sender_phone_number': '000-1212-1111',\n", " 'sender_address': '12345 My Gold Way',\n", " 'action_items': ['Buy an envelope',\n", " 'Put gold inside',\n", " 'Close the envelope',\n", " \"Mail it to sender's address\"],\n", " 'topic': 'Request to send gold',\n", " 'tone': 'positive'}}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "extraction_chain.invoke(\n", " {\n", " \"input\": \"Hello Dear MR. I want you to send me gold to get rich.\"\n", " \" First buy an envelope. Then open it and put some gold inside. 
\"\n", " \"Then close it and finally mail it to my address at 12345 My Gold Way.\"\n", " \" You can call me any time at 000-1212-1111.\"\n", " }\n", ")" ] }, { "cell_type": "markdown", "id": "87a64f76-65ae-4367-b43f-f2be3431e7af", "metadata": {}, "source": [ "Now it's time to measure our chain's effectiveness!" ] }, { "cell_type": "markdown", "id": "3821e4b0-8e67-418a-840c-470fcde42df0", "metadata": {}, "source": [ "## Evaluate\n", "\n", "Let's evaluate the chain now." ] }, { "cell_type": "code", "execution_count": 8, "id": "513042fe-2878-44f8-ae84-05b9d521c1de", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langsmith.client import Client\n", "\n", "from langchain_benchmarks.extraction import get_eval_config" ] }, { "cell_type": "code", "execution_count": 9, "id": "2bedd9d1-fc06-4066-9f89-b874ae818d82", "metadata": { "tags": [] }, "outputs": [], "source": [ "client = Client()" ] }, { "cell_type": "code", "execution_count": 10, "id": "6826a2c6-8443-4215-9e15-b6f4bb570405", "metadata": { "tags": [] }, "outputs": [], "source": [ "eval_llm = ChatOpenAI(model=\"gpt-4\", model_kwargs={\"seed\": 42})\n", "eval_config = get_eval_config(eval_llm)" ] }, { "cell_type": "code", "execution_count": 11, "id": "aab7514e-a6ef-4c21-b90f-d9cbefcf5af1", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "View the evaluation results for project 'monthly-look-12' at:\n", "https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/projects/p/177d564f-516d-4b65-bae0-37154b529470?eval=true\n", "\n", "View all tests for Dataset Email Extraction at:\n", "https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/309a2fce-ce68-43aa-befb-67f94d0c3570\n", "[------------------------------------------------->] 42/42\n", " Eval quantiles:\n", " inputs.input \\\n", "count 42 \n", "unique 42 \n", "top --- \\n|\\n\\nEvery business faces its set of cu... 
\n", "freq 1 \n", "mean NaN \n", "std NaN \n", "min NaN \n", "25% NaN \n", "50% NaN \n", "75% NaN \n", "max NaN \n", "\n", " outputs.output \\\n", "count 42 \n", "unique 42 \n", "top {'sender': 'EMC Financial', 'sender_address': ... \n", "freq 1 \n", "mean NaN \n", "std NaN \n", "min NaN \n", "25% NaN \n", "50% NaN \n", "75% NaN \n", "max NaN \n", "\n", " feedback.json_edit_distance feedback.score_string:accuracy error \\\n", "count 42.000000 42.000000 0 \n", "unique NaN NaN 0 \n", "top NaN NaN NaN \n", "freq NaN NaN NaN \n", "mean 0.566434 0.485714 NaN \n", "std 0.178473 0.235374 NaN \n", "min 0.190883 0.100000 NaN \n", "25% 0.441978 0.300000 NaN \n", "50% 0.581750 0.300000 NaN \n", "75% 0.687949 0.700000 NaN \n", "max 0.901852 0.900000 NaN \n", "\n", " execution_time \n", "count 42.000000 \n", "unique NaN \n", "top NaN \n", "freq NaN \n", "mean 3.527634 \n", "std 0.518258 \n", "min 2.579424 \n", "25% 3.153659 \n", "50% 3.525745 \n", "75% 3.796416 \n", "max 5.144408 \n" ] } ], "source": [ "test_run = client.run_on_dataset(\n", " dataset_name=task.name,\n", " llm_or_chain_factory=extraction_chain,\n", " evaluation=eval_config,\n", " verbose=True,\n", " project_metadata={\n", " \"arch\": \"openai-functions\",\n", " },\n", ")" ] }, { "cell_type": "markdown", "id": "d9828990-f498-4d3f-9e51-76d72bf8f4e9", "metadata": {}, "source": [ "## Compare to another LLM\n", "\n", "Let's compare to an Anthropic LLM." 
] }, { "cell_type": "code", "execution_count": 12, "id": "b9914571-d3f2-4f48-bdbb-2dfcfb03f26d", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.chat_models import ChatAnthropic\n", "from langchain.output_parsers.xml import XMLOutputParser\n", "from langchain.prompts import ChatPromptTemplate\n", "\n", "# This is the schema the model will populate\n", "xsd = \"\"\"\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\"\"\"\n", "\n", "prompt = ChatPromptTemplate.from_messages(\n", "    [\n", "        (\n", "            \"system\",\n", "            \"You are a data extraction bot. Always respond \"\n", "            \"only with XML of the following schema:\\n{xsd}\",\n", "        ),\n", "        (\n", "            \"user\",\n", "            \"Extract Email from the following Document:\\n\"\n", "            \"\\n{input}\\n\\n\"\n", "            \"RESPOND ONLY IN XML THEN STOP.\",\n", "        ),\n", "    ]\n", ").partial(xsd=xsd)\n", "\n", "claude = ChatAnthropic(model=\"claude-2\", temperature=1)\n", "\n", "\n", "def convert_parsed_email(email_dict: dict) -> dict:\n", "    \"\"\"Convert the XML-parsed dictionary to a flattened dict.\"\"\"\n", "    if \"Email\" not in email_dict:\n", "        return email_dict\n", "    # Flatten the tags\n", "    result = {k: v for item in email_dict[\"Email\"] for k, v in item.items()}\n", "    result[\"action_items\"] = [\n", "        item[\"item\"] for item in (result.get(\"action_items\") or [])\n", "    ]\n", "    return {\"output\": result}\n", "\n", "\n", "claude_extraction_chain = prompt | claude | XMLOutputParser() | convert_parsed_email" ] }, { "cell_type": "code", "execution_count": 13, "id": "83576fba-913c-4476-bfe5-fa2dfb7cdd95", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'output': {'sender': None,\n", " 'sender_phone_number': '000-1212-1111',\n", " 'sender_address': '12345 My Gold Way',\n", " 'action_items': ['buy an envelope',\n", " 'open it',\n", " 'put some gold inside',\n", " 'close it',\n", 
'mail it to my address'],\n", " 'topic': 'sending gold',\n", " 'tone': 'negative'}}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = claude_extraction_chain.invoke(\n", " {\n", " \"input\": \"Hello Dear MR. I want you to send me gold to get rich.\"\n", " \" First buy an envelope. Then open it and put some gold inside. \"\n", " \"Then close it and finally mail it to my address at 12345 My Gold Way.\"\n", " \" You can call me any time at 000-1212-1111.\"\n", " }\n", ")\n", "result" ] }, { "cell_type": "code", "execution_count": 14, "id": "f8d58656-108d-48d2-ba16-815fc9bdebcc", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "View the evaluation results for project 'frosty-moon-4' at:\n", "https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/projects/p/81d41017-bcda-450d-8991-9bf744c7ebb8?eval=true\n", "\n", "View all tests for Dataset Email Extraction at:\n", "https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/309a2fce-ce68-43aa-befb-67f94d0c3570\n", "[--------------------------------------> ] 33/42" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Chain failed for example 9a707fca-4ba7-4f7d-8912-b9fd71e9901e with inputs {'input': \"---|---|---|--- \\n \\nBook with Fall Sale Extras Through November 21! Savings! OBC! Visa Gift Card\\n+ More \\n \\n--- \\n|\\n\\n| | | | | | | \\n--- \\n| | \\n--- \\n| | SHOP THE FALL CRUISE SALE \\n--- \\n| | \\n--- \\n \\n**Celebrity Cruises** Celebrity Cruises receive **Exclusive Pricing** with\\nup to **$450 BONUS Savings per Stateroom** based on double\\noccupancyand even more for extra guests! Enjoy **Exclusive Tips**\\non 2024 sailings, up to**$2150 Onboard Credit** , and up to a **$1700 Visa\\nGiftCard** on Galapagos sailings or up to a **$650 Visa Gift Card** on\\nother departures. **Drinks** and **Wi-Fi** are All Included, too! 
**See=\\nThis Offer =E2=96=B8** \\n \\n| | \\n--- \\n \\n**Viking** Enjoy your favorite Viking voyages with up to=C2=A0 **$1200\\nShipboard Credit** from Online Vacation Center when you book by Nov 21!\\nPlus, select sailings get **Airfare** , **Stateroom Upgrades** , **Special\\nFares** =C2=A0and only **$25 Deposits** on the world's #1 Cruise Line for\\nOceans, Rivers & Expeditions! Guided Tours, Wi-Fi, Select Beverages, Meals &\\nMore Included. **SeeThis Offer =E2=96=B8** \\n \\n| | \\n--- \\n \\n**Royal Caribbean** Sail Away on Royal Caribbean withup to **$1000 BONUS\\nOnboard Credit** and **Specialty Dining** exclusively from Online Vacation\\nCenter!=C2=A0Plus, up to **30% SAVINGS** on all Cruises, **Kids Sail =\\nFree** on select sailings and up to **$500 Savings on Airfare** on select\\nAlaska and Europe sailings. **SeeThis Offer =E2=96=B8** \\n \\n| | \\n--- \\n \\n**Oceania Cruises** Choose Your Offer! Receive **Prepaid Gratuities** on\\nselect sailings OR receive up to **$1000 Onboard Credit** on 30 Europe\\nvoyages. Enjoy _simply_ MOREā„¢ with **2 for 1** Cruise Fares, **Roundtrip\\nAirfare** , Transfers & Taxes, **Unlimited Wi-Fi** , up to **$1600 Shore\\nExcursion Credit** , Specialty Dining, Champagne, Wine, and more. Plus,\\nreceive up to a **$1500 Visa Gift Card** from Online Vacation Center!\\n**SeeThis Offer =E2=96=B8** \\n \\n| | \\n--- \\n \\n**Regent Seven Seas Cruises** Book your luxury cruise on Regent Seven Seas\\nby Nov 21 and receive up to **$2000** in **Exclusive Savings** per Suite on\\nall sailings through June 2026! Plus, enjoy **Bonus Savings =** worth up to\\n**30%** on select 2024 sailings when you book by Nov 12. Receive up to a\\n**$1400 Visa Gift Card** from us, and enjoy Regent standard inclusions like\\n**Business Class Airfare** on intercontinental flights and **Airfare** on\\ndomestic flights, **Shore Excursions** , **Gratuities** and More. 
**See This\\nOffer =E2=96=B8** \\n \\n| | \\n--- \\n \\n**Azamara** Enjoy up to **$1500 Onboard Credit** , up to an=C2=A0 **$800\\nVisa Gift Card** , **Stateroom Upgrades** and **20% Off Suites** onselect\\nsailings, and More on Azamara during our Fall Sale! Plus up to a **$200\\nBONUS Visa Gift Card** on our Exclusive Cruise Packages. Receive Azamara\\nstandard inclusions like select **Beverages , **Gratuities** and More. **See\\nThis Offer =E2=96=B8** \\n \\n| | \\n--- \\n \\n**Norwegian Cruise Line** Enjoy up to **$1000 Onboard Credit** and\\n**Gratuities** on 7+ night Balconies or higher during our Fall Sale! Plus\\n**50% OFF** Cruise Fares and **Free at Sea:** Open Bar, Specialty Dining, =\\nWi-Fi, Shore Excursion Credits and extra guests on select sailings. **See=\\nThis Offer =E2=96=B8** \\n \\n| | \\n--- \\n \\n**Luxury Hotels** Whether your personal definition ofluxury is an urban\\noasis or an opulent villa, a wine-country cottage or a Caribbean hammock,\\nOnline Vacation Center has the perfect accommodations for your next\\nvacation. Book now for **Exclusive Offers** **Discounts** ,\\n**Extra Nights** , **Resort Credits** , **Complimentary Amenities** and\\nMore! **SeeThis Offer =E2=96=B8** \\n \\n| | \\n--- \\n \\n**Enrichment Journeys** Book an **Enrichment Journey** on Celebrity Cruises\\nfor up to **$2150 Onboard Credit** , up to **$450 Off** per stateroom and up\\nto a **$650 Visa Gift Card** with **Exclusive Tips** on 2024 sailings +\\n**Drinks** and **Wi-Fi** All Included. Journeys include **Airfare**\\n, 4-star+ **Hotel** Stays, **Transfers** , **Taxes** , select **Meals**\\nand More. **SeeThis Offer =E2=96=B8** \\n \\n| | \\n--- \\n \\n**Princess Cruises** Enjoy up to **$1200 Onboard Credit** , up to **50% Off\\nCruise Fares =** & **50% Off Deposits** during our Fall Sale! Choose =\\nPrincess Plus to receive Included **Drinks, Crew Appreciation** & **Wi-Fi**\\n_(over $950 in added value!)_ OR skip the frills for the lowest rate. 
**See\\nThis Offer =E2=96=B8** \\n \\n| | \\n--- \\n \\n**Holland America Line** Get more on your Holland America cruise with up to\\n**$1450 Onboard Credit** and **Gratuities** on select sailings, exclusively\\nfrom us! Plus, **Have It All** with **Wi-Fi, Beverages, Specialty Dining**\\nand **Shore Excursions** or skip the frills for a lower cruise fare. For a\\nlimited time, enjoy **BONUS Shore Excursion** & **Air Credits** , $99\\nDeposits and **Kids Sail Free** on select 2024 sailings. **SeeThis Offer\\n=E2=96=B8** \\n \\n| | \\n--- \\n| | \\n--- \\n| | \\n--- \\n|\\n\\n### Hours of Operation\\n\\n**Monday=E2=80=93Friday** 9 am=E2=80=936 pm ET **Saturday** 10 am=E2=80=934\\npm ET **Sunday** Closed \\n \\n--- \\n| | \\n--- \\n \\n**Terms and Conditions** : New Bookings Only. Select Sailings Apply.\\nRates, itinerary and any available amenities are by sail date and are\\nsubject to change. **Repricing an existing reservation or requesting a\\ncancel/rebook is not permitted for this promotion. This promotion is not\\napplicable for reservations that used FCCs or utilized Lift & Shift program.\\nCall to see what you qualify for (please note that any modifications may\\nresult in a $100 per person change fee). Fall Sale**: Offer expires\\n11/21/23. Airfare is included on select sailings from select gateways.\\nAdditional gateways may be available for lowadd-ons. The identity of the air\\ncarrier, which may include the carrier's code-share partner, will be\\nassigned and disclosed at a later date. Purchases made onboard plane or in\\nterminal not included. Onboard Credit isper stateroom on select sailings.\\nPrices are per person, double occupancy.Prices and itineraries are based on\\navailability and are subject to changewithout notice. Offer can be withdrawn\\nat any time. All fares may be subject to fuel surcharges if imposed by\\ncruise lines and airlines. 
Government taxes, air taxes, transfers, service\\nfees and other ancillary charges are additional unless otherwise noted.\\nAdditional terms, conditionsand restrictions apply; view individual offers\\nfor more information. Online Vacation Center reserves the right to cancel\\nthe Offer at any time, correct any errors, inaccuracies or omissions, and\\nchange or update fares, fees and surcharges at any time without prior\\nnotice. Online Vacation Center is a registered Seller of Travel with the\\nStates of Florida (ST-32947), California (CST-2064227-40) and Washington (WA\\nSOT 602250083). 110823CB \\n \\n| | \\n--- \\n \\n* * *\\n\\nThis message was sent to address: jacob@gmail.com \\n \\nMore Travel Deals \\\\- Sign Up \\\\- Forward to Friend \\\\- Unsubscribe \\\\- Privacy \\\\-\\nDisclaimers \\n \\n(C) 2023 Dunhill Vacations Inc. - 2307 W. Broward Blvd, Ste 402 - Fort\\nLauderdale, FL 33312 \\n \\n--- \\n\\\\----_NmP-64d90535a0e2740e-Part_1--\\n\\n\"}\n", "Error Type: ValueError, Message: Could not parse output: \n", " \n", " \n", " \n", " \n", " Book Celebrity Cruises by Nov 21 for exclusive pricing, bonuses, and gifts\n", " Book Viking by Nov 21 for bonuses and special offers\n", " Book Royal Caribbean by Nov 21 for onboard credits, dining, and savings\n", " Book Oceania Cruises by Nov 21 for prepaid gratuities or onboard credits\n", " Book Regent Seven Seas by Nov 21 for exclusive savings and gift cards\n", " Book Azamara by Nov 21 for onboard credits, upgrades, and savings\n", " Book Norwegian Cruise Line for discounts, amenities, and savings\n", " Book luxury hotels for exclusive offers and discounts\n", " Book an Enrichment Journey on Celebrity Cruises for bonuses and inclusions\n", " Book Princess Cruises for discounts, amenities, and onboard credits\n", " Book Holland America Line for bonuses,\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[------------------------------------------------->] 42/42\n", " Eval quantiles:\n", " inputs.input 
\\\n", "count 42 \n", "unique 42 \n", "top --- \\n|\\n\\nEvery business faces its set of cu... \n", "freq 1 \n", "mean NaN \n", "std NaN \n", "min NaN \n", "25% NaN \n", "50% NaN \n", "75% NaN \n", "max NaN \n", "\n", " outputs.output \\\n", "count 41 \n", "unique 41 \n", "top {'sender': 'Sam', 'sender_phone_number': '800.... \n", "freq 1 \n", "mean NaN \n", "std NaN \n", "min NaN \n", "25% NaN \n", "50% NaN \n", "75% NaN \n", "max NaN \n", "\n", " feedback.json_edit_distance feedback.score_string:accuracy \\\n", "count 41.000000 41.000000 \n", "unique NaN NaN \n", "top NaN NaN \n", "freq NaN NaN \n", "mean 0.382352 0.565854 \n", "std 0.164442 0.238338 \n", "min 0.107011 0.100000 \n", "25% 0.252252 0.300000 \n", "50% 0.375427 0.700000 \n", "75% 0.532982 0.700000 \n", "max 0.753704 1.000000 \n", "\n", " error execution_time \n", "count 1 42.000000 \n", "unique 1 NaN \n", "top Could not parse output: \\n \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<table>
<tr><th></th><th>inputs.input</th><th>outputs.output</th><th>reference</th><th>feedback.json_edit_distance</th><th>feedback.score_string:accuracy</th><th>error</th><th>execution_time</th><th>inputs.input_claude</th><th>outputs.output_claude</th><th>reference_claude</th><th>feedback.json_edit_distance_claude</th><th>feedback.score_string:accuracy_claude</th><th>error_claude</th><th>execution_time_claude</th></tr>
<tr><th>61c40266-b994-49a2-8768-d54704cee079</th><td>--- \\n|\\n\\nEvery business faces its set of cu...</td><td>{'sender': 'EMC Financial', 'sender_address': ...</td><td>{'output': {'tone': 'positive', 'topic': 'Busi...</td><td>0.562112</td><td>0.7</td><td>None</td><td>4.358837</td><td>--- \\n|\\n\\nEvery business faces its set of cu...</td><td>{'sender': 'Sam', 'sender_phone_number': '800....</td><td>{'output': {'tone': 'positive', 'topic': 'Busi...</td><td>0.301242</td><td>0.7</td><td>None</td><td>10.501042</td></tr>
<tr><th>2dcfadff-51dc-458c-8af0-f47a795d0c9b</th><td>Hello Jacob!\\n\\n \\n\\nHave you noticed thesurg...</td><td>{'sender': 'Sam at EMC', 'action_items': ['Fil...</td><td>{'output': {'tone': 'positive', 'topic': 'Gree...</td><td>0.505338</td><td>0.7</td><td>None</td><td>3.946547</td><td>Hello Jacob!\\n\\n \\n\\nHave you noticed thesurg...</td><td>{'sender': 'Sam at EMC', 'sender_phone_number'...</td><td>{'output': {'tone': 'positive', 'topic': 'Gree...</td><td>0.113879</td><td>0.7</td><td>None</td><td>8.511848</td></tr>
<tr><th>a9c481ba-9ca5-408c-8c9c-f29127a70f7b</th><td>Hi there,\\n\\n | \\n--- \\n \\nWe've updated ou...</td><td>{'sender': 'Crunchbase Team', 'action_items': ...</td><td>{'output': {'tone': 'positive', 'topic': 'Upda...</td><td>0.245283</td><td>0.9</td><td>None</td><td>3.972396</td><td>Hi there,\\n\\n | \\n--- \\n \\nWe've updated ou...</td><td>{'sender': None, 'sender_phone_number': None, ...</td><td>{'output': {'tone': 'positive', 'topic': 'Upda...</td><td>0.343434</td><td>0.7</td><td>None</td><td>9.739630</td></tr>
<tr><th>98358188-6e36-42ef-9298-83acf8d9dd12</th><td>Consider all ways to give to \\nSave the Redwo...</td><td>{'sender': 'Tim Whalen', 'sender_address': 'Sa...</td><td>{'output': {'tone': 'positive', 'topic': 'Dona...</td><td>0.280556</td><td>0.7</td><td>None</td><td>3.890567</td><td>Consider all ways to give to \\nSave the Redwo...</td><td>{'sender': None, 'sender_phone_number': None, ...</td><td>{'output': {'tone': 'positive', 'topic': 'Dona...</td><td>0.255556</td><td>0.3</td><td>None</td><td>9.640687</td></tr>
<tr><th>0f29e857-fc08-45dd-b1ea-dde1e00c4a62</th><td>Some travelers plan ahead; others prefer a bit...</td><td>{'sender': 'Dunhill Vacations Inc.', 'sender_a...</td><td>{'output': {'tone': 'positive', 'topic': 'Trav...</td><td>0.552463</td><td>0.7</td><td>None</td><td>4.252478</td><td>Some travelers plan ahead; others prefer a bit...</td><td>{'sender': 'Dunhill Vacations Inc.', 'sender_p...</td><td>{'output': {'tone': 'positive', 'topic': 'Trav...</td><td>0.584582</td><td>0.3</td><td>None</td><td>6.803259</td></tr>
</table>
\n", "" ], "text/plain": [ " inputs.input \\\n", "61c40266-b994-49a2-8768-d54704cee079 --- \\n|\\n\\nEvery business faces its set of cu... \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b Hello Jacob!\\n\\n \\n\\nHave you noticed thesurg... \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b Hi there,\\n\\n | \\n--- \\n \\nWe've updated ou... \n", "98358188-6e36-42ef-9298-83acf8d9dd12 Consider all ways to give to \\nSave the Redwo... \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 Some travelers plan ahead; others prefer a bit... \n", "\n", " outputs.output \\\n", "61c40266-b994-49a2-8768-d54704cee079 {'sender': 'EMC Financial', 'sender_address': ... \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b {'sender': 'Sam at EMC', 'action_items': ['Fil... \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b {'sender': 'Crunchbase Team', 'action_items': ... \n", "98358188-6e36-42ef-9298-83acf8d9dd12 {'sender': 'Tim Whalen', 'sender_address': 'Sa... \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 {'sender': 'Dunhill Vacations Inc.', 'sender_a... \n", "\n", " reference \\\n", "61c40266-b994-49a2-8768-d54704cee079 {'output': {'tone': 'positive', 'topic': 'Busi... \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b {'output': {'tone': 'positive', 'topic': 'Gree... \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b {'output': {'tone': 'positive', 'topic': 'Upda... \n", "98358188-6e36-42ef-9298-83acf8d9dd12 {'output': {'tone': 'positive', 'topic': 'Dona... \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 {'output': {'tone': 'positive', 'topic': 'Trav... 
\n", "\n", " feedback.json_edit_distance \\\n", "61c40266-b994-49a2-8768-d54704cee079 0.562112 \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b 0.505338 \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b 0.245283 \n", "98358188-6e36-42ef-9298-83acf8d9dd12 0.280556 \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 0.552463 \n", "\n", " feedback.score_string:accuracy error \\\n", "61c40266-b994-49a2-8768-d54704cee079 0.7 None \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b 0.7 None \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b 0.9 None \n", "98358188-6e36-42ef-9298-83acf8d9dd12 0.7 None \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 0.7 None \n", "\n", " execution_time \\\n", "61c40266-b994-49a2-8768-d54704cee079 4.358837 \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b 3.946547 \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b 3.972396 \n", "98358188-6e36-42ef-9298-83acf8d9dd12 3.890567 \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 4.252478 \n", "\n", " inputs.input_claude \\\n", "61c40266-b994-49a2-8768-d54704cee079 --- \\n|\\n\\nEvery business faces its set of cu... \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b Hello Jacob!\\n\\n \\n\\nHave you noticed thesurg... \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b Hi there,\\n\\n | \\n--- \\n \\nWe've updated ou... \n", "98358188-6e36-42ef-9298-83acf8d9dd12 Consider all ways to give to \\nSave the Redwo... \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 Some travelers plan ahead; others prefer a bit... \n", "\n", " outputs.output_claude \\\n", "61c40266-b994-49a2-8768-d54704cee079 {'sender': 'Sam', 'sender_phone_number': '800.... \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b {'sender': 'Sam at EMC', 'sender_phone_number'... \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b {'sender': None, 'sender_phone_number': None, ... \n", "98358188-6e36-42ef-9298-83acf8d9dd12 {'sender': None, 'sender_phone_number': None, ... \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 {'sender': 'Dunhill Vacations Inc.', 'sender_p... 
\n", "\n", " reference_claude \\\n", "61c40266-b994-49a2-8768-d54704cee079 {'output': {'tone': 'positive', 'topic': 'Busi... \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b {'output': {'tone': 'positive', 'topic': 'Gree... \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b {'output': {'tone': 'positive', 'topic': 'Upda... \n", "98358188-6e36-42ef-9298-83acf8d9dd12 {'output': {'tone': 'positive', 'topic': 'Dona... \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 {'output': {'tone': 'positive', 'topic': 'Trav... \n", "\n", " feedback.json_edit_distance_claude \\\n", "61c40266-b994-49a2-8768-d54704cee079 0.301242 \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b 0.113879 \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b 0.343434 \n", "98358188-6e36-42ef-9298-83acf8d9dd12 0.255556 \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 0.584582 \n", "\n", " feedback.score_string:accuracy_claude \\\n", "61c40266-b994-49a2-8768-d54704cee079 0.7 \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b 0.7 \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b 0.7 \n", "98358188-6e36-42ef-9298-83acf8d9dd12 0.3 \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 0.3 \n", "\n", " error_claude execution_time_claude \n", "61c40266-b994-49a2-8768-d54704cee079 None 10.501042 \n", "2dcfadff-51dc-458c-8af0-f47a795d0c9b None 8.511848 \n", "a9c481ba-9ca5-408c-8c9c-f29127a70f7b None 9.739630 \n", "98358188-6e36-42ef-9298-83acf8d9dd12 None 9.640687 \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 None 6.803259 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(5)" ] }, { "cell_type": "code", "execution_count": 30, "id": "f2b570b4-8f09-4fe5-9c99-f6391dc34301", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(0.5664337704936568, 0.382351925386955)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", " df[\"feedback.json_edit_distance\"].mean(),\n", " df[\"feedback.json_edit_distance_claude\"].mean(),\n", ")" ] }, { "cell_type": "code", 
"execution_count": 33, "id": "d5fbeb46-79b4-4b90-817b-233a6caab9d8", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(0.48571428571428565, 0.5658536585365853)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", " df[\"feedback.score_string:accuracy\"].mean(),\n", " df[\"feedback.score_string:accuracy_claude\"].mean(),\n", ")" ] }, { "cell_type": "code", "execution_count": 46, "id": "f19b30ae-e51b-4364-999b-1a0c71836737", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>inputs.input</th>\n", "      <th>outputs.output</th>\n", "      <th>outputs.output_claude</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>98358188-6e36-42ef-9298-83acf8d9dd12</th>\n", "      <td>Consider all ways to give to \\nSave the Redwo...</td>\n", "      <td>{'sender': 'Tim Whalen', 'sender_address': 'Sa...</td>\n", "      <td>{'sender': None, 'sender_phone_number': None, ...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>0f29e857-fc08-45dd-b1ea-dde1e00c4a62</th>\n", "      <td>Some travelers plan ahead; others prefer a bit...</td>\n", "      <td>{'sender': 'Dunhill Vacations Inc.', 'sender_a...</td>\n", "      <td>{'sender': 'Dunhill Vacations Inc.', 'sender_p...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>35414bbc-4d38-41ed-876f-2a6a067e66d5</th>\n", "      <td>--- \\n \\n|\\n\\nWe Passed the Stop Dangerous P...</td>\n", "      <td>{'sender': 'Matt Haney', 'sender_address': '10...</td>\n", "      <td>{'sender': 'Matt Haney', 'sender_phone_number'...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>ff1b2ed6-26a7-4501-96aa-6e3e10eadc72</th>\n", "      <td>--- \\n|\\n\\n# We Provide Unique Financing Opti...</td>\n", "      <td>{'sender': 'info@championadvance.com', 'sender...</td>\n", "      <td>{'sender': None, 'sender_phone_number': None, ...</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " inputs.input \\\n", "98358188-6e36-42ef-9298-83acf8d9dd12 Consider all ways to give to \\nSave the Redwo... \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 Some travelers plan ahead; others prefer a bit... \n", "35414bbc-4d38-41ed-876f-2a6a067e66d5 --- \\n \\n|\\n\\nWe Passed the Stop Dangerous P... \n", "ff1b2ed6-26a7-4501-96aa-6e3e10eadc72 --- \\n|\\n\\n# We Provide Unique Financing Opti... \n", "\n", " outputs.output \\\n", "98358188-6e36-42ef-9298-83acf8d9dd12 {'sender': 'Tim Whalen', 'sender_address': 'Sa... \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 {'sender': 'Dunhill Vacations Inc.', 'sender_a... \n", "35414bbc-4d38-41ed-876f-2a6a067e66d5 {'sender': 'Matt Haney', 'sender_address': '10... \n", "ff1b2ed6-26a7-4501-96aa-6e3e10eadc72 {'sender': 'info@championadvance.com', 'sender... \n", "\n", " outputs.output_claude \n", "98358188-6e36-42ef-9298-83acf8d9dd12 {'sender': None, 'sender_phone_number': None, ... \n", "0f29e857-fc08-45dd-b1ea-dde1e00c4a62 {'sender': 'Dunhill Vacations Inc.', 'sender_p... \n", "35414bbc-4d38-41ed-876f-2a6a067e66d5 {'sender': 'Matt Haney', 'sender_phone_number'... \n", "ff1b2ed6-26a7-4501-96aa-6e3e10eadc72 {'sender': None, 'sender_phone_number': None, ... " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Rows for which OAI > Claude by at least 30%, according to the LLM-based evaluator\n", "oai_beats_claude = df[\n", " (df[\"feedback.score_string:accuracy\"] - df[\"feedback.score_string:accuracy_claude\"])\n", " >= 0.3\n", "]\n", "\n", "oai_beats_claude[[\"inputs.input\", \"outputs.output\", \"outputs.output_claude\"]]" ] }, { "cell_type": "code", "execution_count": 51, "id": "9f81cc5e-bb0e-4b27-bfca-766c08ce0f2b", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>inputs.input</th>\n", "      <th>outputs.output</th>\n", "      <th>outputs.output_claude</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>02cfdfc4-c3dc-47e6-ad44-8e437ebf2dce</th>\n", "      <td>---|---|---|--- \\n \\n| \\n--- \\n **Limited ...</td>\n", "      <td>{'action_items': [], 'topic': 'Limited Time Up...</td>\n", "      <td>{'sender': 'Dunhill Vacations Inc.', 'sender_p...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>198dc232-8f98-484a-a65e-048cfb517282</th>\n", "      <td>Hello Jacob,\\n\\n \\n\\nFor many small businesse...</td>\n", "      <td>{'sender': 'Sam at EMC', 'action_items': ['Kic...</td>\n", "      <td>{'sender': 'Sam at EMC', 'sender_phone_number'...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>c222957f-cc7e-46af-9cca-1270f3fa5621</th>\n", "      <td>Hello Jacob,\\n\\n \\n\\nDo you know what Fortune...</td>\n", "      <td>{'sender': 'Sam at EMC', 'action_items': ['qua...</td>\n", "      <td>{'sender': 'Sam at EMC', 'sender_phone_number'...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>119ef037-8744-4eb9-93df-64458278e4f8</th>\n", "      <td>--- \\n| | QUALIFY NOW \\n--- \\n \\n \\nHell...</td>\n", "      <td>{'sender': 'Sam at EMC', 'action_items': ['Che...</td>\n", "      <td>{'sender': 'Sam at EMC id:2023-09-19-20:17:53:...</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " inputs.input \\\n", "02cfdfc4-c3dc-47e6-ad44-8e437ebf2dce ---|---|---|--- \\n \\n| \\n--- \\n **Limited ... \n", "198dc232-8f98-484a-a65e-048cfb517282 Hello Jacob,\\n\\n \\n\\nFor many small businesse... \n", "c222957f-cc7e-46af-9cca-1270f3fa5621 Hello Jacob,\\n\\n \\n\\nDo you know what Fortune... \n", "119ef037-8744-4eb9-93df-64458278e4f8 --- \\n| | QUALIFY NOW \\n--- \\n \\n \\nHell... \n", "\n", " outputs.output \\\n", "02cfdfc4-c3dc-47e6-ad44-8e437ebf2dce {'action_items': [], 'topic': 'Limited Time Up... \n", "198dc232-8f98-484a-a65e-048cfb517282 {'sender': 'Sam at EMC', 'action_items': ['Kic... \n", "c222957f-cc7e-46af-9cca-1270f3fa5621 {'sender': 'Sam at EMC', 'action_items': ['qua... \n", "119ef037-8744-4eb9-93df-64458278e4f8 {'sender': 'Sam at EMC', 'action_items': ['Che... \n", "\n", " outputs.output_claude \n", "02cfdfc4-c3dc-47e6-ad44-8e437ebf2dce {'sender': 'Dunhill Vacations Inc.', 'sender_p... \n", "198dc232-8f98-484a-a65e-048cfb517282 {'sender': 'Sam at EMC', 'sender_phone_number'... \n", "c222957f-cc7e-46af-9cca-1270f3fa5621 {'sender': 'Sam at EMC', 'sender_phone_number'... \n", "119ef037-8744-4eb9-93df-64458278e4f8 {'sender': 'Sam at EMC id:2023-09-19-20:17:53:... 
" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Rows for which Claude > OAI by at least 50%, according to the LLM-based evaluator\n", "oai_beats_claude = df[\n", " (df[\"feedback.score_string:accuracy_claude\"] - df[\"feedback.score_string:accuracy\"])\n", " >= 0.5\n", "]\n", "\n", "oai_beats_claude[[\"inputs.input\", \"outputs.output\", \"outputs.output_claude\"]]" ] }, { "cell_type": "code", "execution_count": null, "id": "fa7f74a3-8cfe-48f1-ab85-249dc88aa307", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }