{ "cells": [ { "cell_type": "markdown", "id": "60bb467d-861d-4b07-a48d-8e5aa177c969", "metadata": { "tags": [] }, "source": [ "# Multiverse Math\n", "\n", "In this task, the agent operates in an alternate universe in which basic mathematical operations such as addition and multiplication behave differently.\n", "\n", "The agent must use tools that allow it to carry out calculations in this universe.\n", "\n", "This task can help verify that an agent is able to ignore its own knowledge of math and instead correctly use information returned by the tools.\n", "\n", "The modified mathematical operations yield different results, but still retain some properties (e.g., the modified multiplication operation is still commutative).\n", "\n", "Please note that the modified operations are not guaranteed to make sense in the real world, since not all properties are retained (e.g., the distributive property).\n", "\n", "------------------" ] }, { "cell_type": "markdown", "id": "8e3b729e-b851-4ab8-a3a9-be34b329b985", "metadata": { "tags": [] }, "source": [ "For this code to work, please configure LangSmith environment variables with your credentials.\n", "\n", "```python\n", "import os\n", "\n", "os.environ[\"LANGCHAIN_API_KEY\"] = \"ls_..\" # Your LangSmith API key\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "id": "b39159d0-9ea1-414f-a9d8-4a7b22b3d2cc", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain_benchmarks import registry" ] }, { "cell_type": "code", "execution_count": 2, "id": "1aef2b32-a5df-421f-8be3-a2ef27372ece", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Name Multiverse Math
Type ToolUsageTask
Dataset ID 47ed57bc-e852-4f84-a23e-cce4793864e9
Description An environment that contains a few basic math operations, but with altered results.\n", "\n", "For example, multiplication of 5*3 will be re-interpreted as 5*3*1.1. The basic operations retain some basic properties, such as commutativity, associativity, and distributivity; however, the results are different than expected.\n", "\n", "The objective of this task is to evaluate the ability to use the provided tools to solve simple math questions and ignore any innate knowledge about math.\n", "\n", "This task is associated with 20 test examples.
" ], "text/plain": [ "ToolUsageTask(name='Multiverse Math', dataset_id='https://smith.langchain.com/public/47ed57bc-e852-4f84-a23e-cce4793864e9/d', description='An environment that contains a few basic math operations, but with altered results.\\n\\nFor example, multiplication of 5*3 will be re-interpreted as 5*3*1.1. The basic operations retain some basic properties, such as commutativity, associativity, and distributivity; however, the results are different than expected.\\n\\nThe objective of this task is to evaluate the ability to use the provided tools to solve simple math questions and ignore any innate knowledge about math.\\n\\nThis task is associated with 20 test examples.\\n', create_environment=, instructions='You are requested to solve math questions in an alternate mathematical universe. The operations have been altered to yield different results than expected. Do not guess the answer or rely on your innate knowledge of math. Use the provided tools to answer the question. While associativity and commutativity apply, distributivity does not. Answer the question using the fewest possible tools. 
Only include the numeric response without any clarifications.', eval_params={'output_evaluation': 'qa_math_without_question'})" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "task = registry[\"Multiverse Math\"]\n", "task" ] }, { "cell_type": "markdown", "id": "bc33a639-3caf-4314-8ea7-1c7c8b1d114d", "metadata": {}, "source": [ "Clone the dataset associated with this task" ] }, { "cell_type": "markdown", "id": "cede4edd-884d-4330-a186-5058b712394b", "metadata": {}, "source": [ "## The Environment\n", "\n", "Let's inspect the first five tools that the environment provides." ] }, { "cell_type": "code", "execution_count": 3, "id": "e2439d0c-ccb9-4f5b-a127-548725025a98", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[StructuredTool(name='multiply', description='multiply(a: float, b: float) -> float - Multiply two numbers; a * b.', args_schema=, func=),\n", " StructuredTool(name='add', description='add(a: float, b: float) -> float - Add two numbers; a + b.', args_schema=, func=),\n", " StructuredTool(name='divide', description='divide(a: float, b: float) -> float - Divide two numbers; a / b.', args_schema=, func=),\n", " StructuredTool(name='subtract', description='subtract(a: float, b: float) -> float - Subtract two numbers; a - b.', args_schema=, func=),\n", " StructuredTool(name='power', description='power(a: float, b: float) -> float - Raise a number to a power; a ** b.', args_schema=, func=)]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env = task.create_environment()\n", "env.tools[:5]" ] }, { "cell_type": "markdown", "id": "1941e187-55ee-4d38-b529-4744ea2474b0", "metadata": {}, "source": [ "In this universe, multiplying 2 by 4 yields 8.8, not 8." 
] }, { "cell_type": "code", "execution_count": 4, "id": "f5a100bd-6e19-498f-8a36-393b5c19bcb9", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "8.8" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env.tools[0].invoke({\"a\": 2, \"b\": 4})" ] }, { "cell_type": "markdown", "id": "bc60ef11-6300-4a83-989e-ec5b7f196796", "metadata": {}, "source": [ "The task instructions:" ] }, { "cell_type": "code", "execution_count": 5, "id": "31afb08b-17b8-4866-86c1-ee24e804415c", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'You are requested to solve math questions in an alternate mathematical universe. The operations have been altered to yield different results than expected. Do not guess the answer or rely on your innate knowledge of math. Use the provided tools to answer the question. While associativity and commutativity apply, distributivity does not. Answer the question using the fewest possible tools. Only include the numeric response without any clarifications.'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "task.instructions" ] }, { "cell_type": "markdown", "id": "92d65770-6a4f-4029-beba-5fa9aeb18809", "metadata": {}, "source": [ "## Agent Factory\n", "\n", "For evaluation, we need an agent factory that will create a new instance of an agent executor for every evaluation run.\n", "\n", "We'll use the `StandardAgentFactory` provided with LangChain Benchmarks; see the `intro` section to learn how to define your own." 
] }, { "cell_type": "code", "execution_count": 6, "id": "6142cf4e-862c-47a3-aa75-81d7d3231308", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain_core.prompts import ChatPromptTemplate\n", "from langchain_openai.chat_models import ChatOpenAI\n", "\n", "from langchain_benchmarks.tool_usage.agents import StandardAgentFactory\n", "\n", "model = ChatOpenAI(temperature=0)\n", "prompt = ChatPromptTemplate.from_messages(\n", " [\n", " (\"system\", \"{instructions}\"), # Populated from task.instructions automatically\n", " (\"human\", \"{question}\"), # Populated from the test data\n", " (\n", " \"placeholder\",\n", " \"{agent_scratchpad}\",\n", " ), # Workspace where the agent can do its work (e.g., call multiple tools)\n", " ]\n", ")\n", "\n", "agent_factory = StandardAgentFactory(task, model, prompt)" ] }, { "cell_type": "code", "execution_count": 7, "id": "11e4fff5-e184-45e1-a472-c0a9f70e897a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3m\n", "Invoking: `add` with `{'a': 2, 'b': 5}`\n", "\n", "\n", "\u001b[0m\u001b[33;1m\u001b[1;3m8.2\u001b[0m\u001b[32;1m\u001b[1;3mThe result of 2 + 5 in this alternate mathematical universe is 8.2.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "{'question': 'how much is 2+5',\n", " 'output': 'The result of 2 + 5 in this alternate mathematical universe is 8.2.',\n", " 'intermediate_steps': [(ToolAgentAction(tool='add', tool_input={'a': 2, 'b': 5}, log=\"\\nInvoking: `add` with `{'a': 2, 'b': 5}`\\n\\n\\n\", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_MZMnEZrae7AuXYtWzH0l9xKL', 'function': {'arguments': '{\"a\":2,\"b\":5}', 'name': 'add'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-b7548303-194d-40ee-85bf-3d43cac39526', tool_calls=[{'name': 'add', 
'args': {'a': 2, 'b': 5}, 'id': 'call_MZMnEZrae7AuXYtWzH0l9xKL'}], tool_call_chunks=[{'name': 'add', 'args': '{\"a\":2,\"b\":5}', 'id': 'call_MZMnEZrae7AuXYtWzH0l9xKL', 'index': 0}])], tool_call_id='call_MZMnEZrae7AuXYtWzH0l9xKL'),\n", " 8.2)]}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langchain import globals\n", "\n", "globals.set_verbose(True)\n", "\n", "agent = agent_factory()\n", "agent.invoke({\"question\": \"how much is 2+5\"})" ] }, { "cell_type": "markdown", "id": "b29a915c-1041-4108-a234-a877b6f59de4", "metadata": {}, "source": [ "## Benchmarking\n", "\n", "See `introduction` and `benchmark all` for information on how to run benchmarks. This notebook is just here to explain and explore the task." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }