{ "cells": [ { "cell_type": "markdown", "id": "60bb467d-861d-4b07-a48d-8e5aa177c969", "metadata": { "tags": [] }, "source": [ "# Typewriter: Single Tool\n", "\n", "In this task, an agent is given access to a single tool called \"type_letter\".\n", "This tool takes one argument called \"letter\" which is expected to be a character.\n", "\n", "The agent must repeat the input string from the user, printing one\n", "character a time on a piece of virtual paper.\n", "\n", "The agent is evaluated based on its ability to print the correct string using\n", "the \"type_letter\" tool.\n", "\n", "--------" ] }, { "cell_type": "code", "execution_count": 1, "id": "b39159d0-9ea1-414f-a9d8-4a7b22b3d2cc", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain_benchmarks import registry" ] }, { "cell_type": "code", "execution_count": 2, "id": "1aef2b32-a5df-421f-8be3-a2ef27372ece", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Name Tool Usage - Typewriter (1 tool)
Type ToolUsageTask
Dataset ID 59577193-8938-4ccf-92a7-e8a96bcf4f86
DescriptionEnvironment with a single tool that accepts a single letter as input, and prints it on a piece of virtual paper.\n", "\n", "The objective of this task is to evaluate the ability of the model to use the provided tools to repeat a given input string.\n", "\n", "For example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\n", "\n", "The dataset includes examples of varying difficulty. The difficulty is measured by the length of the string.
" ], "text/plain": [ "ToolUsageTask(name='Tool Usage - Typewriter (1 tool)', dataset_id='https://smith.langchain.com/public/59577193-8938-4ccf-92a7-e8a96bcf4f86/d', description=\"Environment with a single tool that accepts a single letter as input, and prints it on a piece of virtual paper.\\n\\nThe objective of this task is to evaluate the ability of the model to use the provided tools to repeat a given input string.\\n\\nFor example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\\n\\nThe dataset includes examples of varying difficulty. The difficulty is measured by the length of the string.\\n\", create_environment=, instructions=\"Repeat the given string using the provided tools. Do not write anything else or provide any explanations. For example, if the string is 'abc', you must print the letters 'a', 'b', and 'c' one at a time and in that order. \", eval_params={'output_evaluation': 'none'})" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "task = registry[\"Tool Usage - Typewriter (1 tool)\"]\n", "task" ] }, { "cell_type": "markdown", "id": "fc78a3e1-80da-4607-98c3-a99c2037e7ca", "metadata": {}, "source": [ "## The Environment\n", "\n", "The environment consists of a single tool and a virtual paper.\n", "\n", "The tool accepts a single letter as an input and prints the leter on the virtual paper. If successful, the tool returns the output \"OK\".\n", "\n", "To determine what's written on the paper, one needs to read the environment state." ] }, { "cell_type": "code", "execution_count": 3, "id": "64e538ae-5cf2-4cd5-a312-25ee6924e869", "metadata": { "tags": [] }, "outputs": [], "source": [ "env = task.create_environment()" ] }, { "cell_type": "code", "execution_count": 4, "id": "5516a34b-1e9b-4f1e-9462-cfc4d5bc29f9", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[StructuredTool(name='type_letter', description='type_letter(letter: str) -> str - Print the given letter on the paper.', args_schema=, func=.type_letter at 0x73a65909ee80>)]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env.tools" ] }, { "cell_type": "code", "execution_count": 5, "id": "80501e1a-f1f6-4b38-8637-894503029d86", "metadata": { "tags": [] }, "outputs": [], "source": [ "tool = env.tools[0]" ] }, { "cell_type": "code", "execution_count": 6, "id": "3f352e32-fdb6-4d9e-b1c4-3d78b4f50646", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'OK'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tool.invoke({\"letter\": \"a\"})" ] }, { "cell_type": "code", "execution_count": 7, "id": "ec9c2e68-b55e-4087-bc1a-c38f4cfd401b", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'OK'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tool.invoke({\"letter\": \"b\"})" ] }, { "cell_type": "code", "execution_count": 8, "id": "2cc5b174-25a4-4d5a-8535-56ecea62ea81", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'ab'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env.read_state()" ] }, { "cell_type": "markdown", "id": "cd13d120-1bf9-481c-9392-c15ebdd9d77f", "metadata": {}, "source": [ "## Explore the task\n", "\n", "For evaluation, we need an agent factory that will create a new instance of an agent executor for every evaluation run.\n", "\n", "We'll use the `StandardAgentFactory` -- look at the `intro` for more information about what it does and/or how to create a custom one." ] }, { "cell_type": "code", "execution_count": 23, "id": "e2acab1e-78a7-4198-8e79-4529c95ce7e2", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain_core.prompts import ChatPromptTemplate\n", "from langchain_openai.chat_models import ChatOpenAI\n", "\n", "from langchain_benchmarks.tool_usage.agents import StandardAgentFactory\n", "\n", "model = ChatOpenAI(temperature=0)\n", "prompt = ChatPromptTemplate.from_messages(\n", " [\n", " (\"system\", \"{instructions}\"), # Populated from task.instructions automatically\n", " (\"human\", \"{question}\"), # Populated from the test data\n", " (\n", " \"placeholder\",\n", " \"{agent_scratchpad}\",\n", " ), # Work where the agent can do its work (e.g., call multiple tools)\n", " ]\n", ")\n", "\n", "agent_factory = StandardAgentFactory(task, model, prompt)" ] }, { "cell_type": "code", "execution_count": 25, "id": "ceaa8edf-292b-48a1-be94-e6bfea0e75b1", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3m\n", "Invoking: `type_letter` with `{'letter': 'a'}`\n", "\n", "\n", "\u001b[0m\u001b[36;1m\u001b[1;3mOK\u001b[0m\u001b[32;1m\u001b[1;3m\n", "Invoking: `type_letter` with `{'letter': 'b'}`\n", "\n", "\n", "\u001b[0m\u001b[36;1m\u001b[1;3mOK\u001b[0m\u001b[32;1m\u001b[1;3m\n", "Invoking: `type_letter` with `{'letter': 'c'}`\n", "\n", "\n", "\u001b[0m\u001b[36;1m\u001b[1;3mOK\u001b[0m\u001b[32;1m\u001b[1;3mabc\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "{'question': 'abc',\n", " 'output': 'abc',\n", " 'intermediate_steps': [(ToolAgentAction(tool='type_letter', tool_input={'letter': 'a'}, log=\"\\nInvoking: `type_letter` with `{'letter': 'a'}`\\n\\n\\n\", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_f4exPQMfz4VWxFJw4LhyMc80', 'function': {'arguments': '{\"letter\": \"a\"}', 'name': 'type_letter'}, 'type': 'function'}, {'index': 1, 'id': 'call_DHOJfLJEKuOKdzBa8ZLRYJZq', 'function': {'arguments': '{\"letter\": \"b\"}', 'name': 'type_letter'}, 'type': 'function'}, {'index': 2, 'id': 'call_EziJvB6jtUEg3CmXSsQ7OWBj', 'function': {'arguments': '{\"letter\": \"c\"}', 'name': 'type_letter'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-7d6be045-b9e2-4f24-991c-8e34ccd53b98', tool_calls=[{'name': 'type_letter', 'args': {'letter': 'a'}, 'id': 'call_f4exPQMfz4VWxFJw4LhyMc80'}, {'name': 'type_letter', 'args': {'letter': 'b'}, 'id': 'call_DHOJfLJEKuOKdzBa8ZLRYJZq'}, {'name': 'type_letter', 'args': {'letter': 'c'}, 'id': 'call_EziJvB6jtUEg3CmXSsQ7OWBj'}], tool_call_chunks=[{'name': 'type_letter', 'args': '{\"letter\": \"a\"}', 'id': 'call_f4exPQMfz4VWxFJw4LhyMc80', 'index': 0}, {'name': 'type_letter', 'args': '{\"letter\": \"b\"}', 'id': 'call_DHOJfLJEKuOKdzBa8ZLRYJZq', 'index': 1}, {'name': 'type_letter', 'args': '{\"letter\": \"c\"}', 'id': 'call_EziJvB6jtUEg3CmXSsQ7OWBj', 'index': 2}])], tool_call_id='call_f4exPQMfz4VWxFJw4LhyMc80'),\n", " 'OK'),\n", " (ToolAgentAction(tool='type_letter', tool_input={'letter': 'b'}, log=\"\\nInvoking: `type_letter` with `{'letter': 'b'}`\\n\\n\\n\", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_f4exPQMfz4VWxFJw4LhyMc80', 'function': {'arguments': '{\"letter\": \"a\"}', 'name': 'type_letter'}, 'type': 'function'}, {'index': 1, 'id': 'call_DHOJfLJEKuOKdzBa8ZLRYJZq', 'function': {'arguments': '{\"letter\": \"b\"}', 'name': 'type_letter'}, 'type': 'function'}, {'index': 2, 'id': 'call_EziJvB6jtUEg3CmXSsQ7OWBj', 'function': {'arguments': '{\"letter\": \"c\"}', 'name': 'type_letter'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-7d6be045-b9e2-4f24-991c-8e34ccd53b98', tool_calls=[{'name': 'type_letter', 'args': {'letter': 'a'}, 'id': 'call_f4exPQMfz4VWxFJw4LhyMc80'}, {'name': 'type_letter', 'args': {'letter': 'b'}, 'id': 'call_DHOJfLJEKuOKdzBa8ZLRYJZq'}, {'name': 'type_letter', 'args': {'letter': 'c'}, 'id': 'call_EziJvB6jtUEg3CmXSsQ7OWBj'}], tool_call_chunks=[{'name': 'type_letter', 'args': '{\"letter\": \"a\"}', 'id': 'call_f4exPQMfz4VWxFJw4LhyMc80', 'index': 0}, {'name': 'type_letter', 'args': '{\"letter\": \"b\"}', 'id': 'call_DHOJfLJEKuOKdzBa8ZLRYJZq', 'index': 1}, {'name': 'type_letter', 'args': '{\"letter\": \"c\"}', 'id': 'call_EziJvB6jtUEg3CmXSsQ7OWBj', 'index': 2}])], tool_call_id='call_DHOJfLJEKuOKdzBa8ZLRYJZq'),\n", " 'OK'),\n", " (ToolAgentAction(tool='type_letter', tool_input={'letter': 'c'}, log=\"\\nInvoking: `type_letter` with `{'letter': 'c'}`\\n\\n\\n\", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_f4exPQMfz4VWxFJw4LhyMc80', 'function': {'arguments': '{\"letter\": \"a\"}', 'name': 'type_letter'}, 'type': 'function'}, {'index': 1, 'id': 'call_DHOJfLJEKuOKdzBa8ZLRYJZq', 'function': {'arguments': '{\"letter\": \"b\"}', 'name': 'type_letter'}, 'type': 'function'}, {'index': 2, 'id': 'call_EziJvB6jtUEg3CmXSsQ7OWBj', 'function': {'arguments': '{\"letter\": \"c\"}', 'name': 'type_letter'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-7d6be045-b9e2-4f24-991c-8e34ccd53b98', tool_calls=[{'name': 'type_letter', 'args': {'letter': 'a'}, 'id': 'call_f4exPQMfz4VWxFJw4LhyMc80'}, {'name': 'type_letter', 'args': {'letter': 'b'}, 'id': 'call_DHOJfLJEKuOKdzBa8ZLRYJZq'}, {'name': 'type_letter', 'args': {'letter': 'c'}, 'id': 'call_EziJvB6jtUEg3CmXSsQ7OWBj'}], tool_call_chunks=[{'name': 'type_letter', 'args': '{\"letter\": \"a\"}', 'id': 'call_f4exPQMfz4VWxFJw4LhyMc80', 'index': 0}, {'name': 'type_letter', 'args': '{\"letter\": \"b\"}', 'id': 'call_DHOJfLJEKuOKdzBa8ZLRYJZq', 'index': 1}, {'name': 'type_letter', 'args': '{\"letter\": \"c\"}', 'id': 'call_EziJvB6jtUEg3CmXSsQ7OWBj', 'index': 2}])], tool_call_id='call_EziJvB6jtUEg3CmXSsQ7OWBj'),\n", " 'OK')],\n", " 'state': 'abc'}" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langchain import globals\n", "\n", "globals.set_verbose(True)\n", "\n", "agent = agent_factory()\n", "agent.invoke({\"question\": \"abc\"})" ] }, { "cell_type": "markdown", "id": "4729e72c-3903-478a-b298-4a586af33912", "metadata": {}, "source": [ "## Benchmarking\n", "\n", "See `introduction` and `benchmark all` for information on how to run benchmarks. This notebook is just to here to explain and explore the task." ] }, { "cell_type": "code", "execution_count": null, "id": "87055296-62e1-4fa9-8868-5c213f4ea2e6", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }