{ "cells": [ { "cell_type": "markdown", "id": "15e64bc9-bdf7-4fe5-9839-5d160c425c61", "metadata": { "tags": [] }, "source": [ "# Typewriter: 26 Tools\n", "\n", "This is a variation of the typewriter task in which the agent has access to 26 parameterless tools.\n", "\n", "Each tool represents a letter of the alphabet (e.g., 'a', 'b', 'c').\n", "\n", "The agent can use each tool to \"print\" the corresponding letter on a piece of virtual paper.\n", "\n", "The objective for the agent is to \"print\" the user's input on the paper exactly.\n", "\n", "---------\n", "\n", "For this code to work, please configure the LangSmith environment variables with your credentials. The agent built below also uses `ChatOpenAI`, which reads your OpenAI key from `OPENAI_API_KEY`.\n", "\n", "```python\n", "import os\n", "\n", "os.environ[\"LANGCHAIN_API_KEY\"] = \"sk-...\" # Your LangSmith API key.\n", "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\" # Your OpenAI API key.\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "id": "b39159d0-9ea1-414f-a9d8-4a7b22b3d2cc", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain_benchmarks import registry" ] }, { "cell_type": "code", "execution_count": 2, "id": "1aef2b32-a5df-421f-8be3-a2ef27372ece", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Name Tool Usage - Typewriter (26 tools)
Type ToolUsageTask
Dataset ID 128af05e-aa00-4e3b-a958-d166dd450581
DescriptionEnvironment with 26 tools each tool represents a letter of the alphabet.\n", "\n", "The objective of this task is to evaluate the model's ability the use tools\n", "for a simple repetition task.\n", "\n", "For example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\n", "\n", "The dataset includes examples of varying difficulty. The difficulty is measured by the length of the string.\n", "\n", "This is a variation of the typer writer task, where 26 parameterless tools are\n", "given instead of a single tool that takes a letter as an argument.
" ], "text/plain": [ "ToolUsageTask(name='Tool Usage - Typewriter (26 tools)', dataset_id='https://smith.langchain.com/public/128af05e-aa00-4e3b-a958-d166dd450581/d', description=\"Environment with 26 tools each tool represents a letter of the alphabet.\\n\\nThe objective of this task is to evaluate the model's ability the use tools\\nfor a simple repetition task.\\n\\nFor example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\\n\\nThe dataset includes examples of varying difficulty. The difficulty is measured by the length of the string.\\n\\nThis is a variation of the typer writer task, where 26 parameterless tools are\\ngiven instead of a single tool that takes a letter as an argument.\\n\", create_environment=, instructions=\"Repeat the given string by using the provided tools. Do not write anything else or provide any explanations. For example, if the string is 'abc', you must invoke the tools 'a', 'b', and 'c' in that order. Please invoke the functions without any arguments.\", eval_params={'output_evaluation': 'none'})" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "task = registry[\"Tool Usage - Typewriter (26 tools)\"]\n", "task" ] }, { "cell_type": "markdown", "id": "b462f7b8-fd42-4613-ab5f-5f3cbbc37d28", "metadata": {}, "source": [ "Let's build an agent that we can use for evaluation." ] }, { "cell_type": "markdown", "id": "6ce51f81-1b3a-4dda-a382-c2fed3013af1", "metadata": {}, "source": [ "## The Environment\n", "\n", "The environment consists of 26 tools and a virtual paper.\n", "\n", "Each tool prints its own letter on the paper (tool `a` prints 'a', tool `b` prints 'b', and so on)." 
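, "\n", "\n", "The mechanics described above can be sketched in plain Python: 26 zero-argument functions that append to one shared list standing in for the paper, each returning `'OK'` as the real tools do. This is an assumption-based sketch, not the actual `langchain_benchmarks` implementation; `make_typewriter_env`, `make_tool`, and `paper` are hypothetical names.\n", "\n", "```python\n", "import string\n", "from typing import Callable, Dict, List, Tuple\n", "\n", "\n", "def make_typewriter_env() -> Tuple[Dict[str, Callable[[], str]], Callable[[], str]]:\n", "    \"\"\"Hypothetical stand-in: 26 zero-argument tools that share one virtual paper.\"\"\"\n", "    paper: List[str] = []  # the virtual paper, one entry per typed letter\n", "\n", "    def make_tool(letter: str) -> Callable[[], str]:\n", "        def tool() -> str:\n", "            paper.append(letter)  # \"print\" this tool's letter\n", "            return \"OK\"  # acknowledgement string, like the real tools return\n", "\n", "        return tool\n", "\n", "    tools = {letter: make_tool(letter) for letter in string.ascii_lowercase}\n", "\n", "    def read_state() -> str:\n", "        return \"\".join(paper)  # everything typed so far, in order\n", "\n", "    return tools, read_state\n", "\n", "\n", "tools, read_state = make_typewriter_env()\n", "tools[\"a\"]()\n", "tools[\"d\"]()\n", "print(read_state())  # ad\n", "```"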
] }, { "cell_type": "code", "execution_count": 3, "id": "61535a75-24f6-4727-9549-f76c263e9153", "metadata": { "tags": [] }, "outputs": [], "source": [ "env = task.create_environment()" ] }, { "cell_type": "code", "execution_count": 4, "id": "f35a0a1d-5a1e-4de1-8d8c-c7c9a264a6c7", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[StructuredTool(name='a', description='a() -> str - Run to Type the letter \"a\".', args_schema=, func=.func at 0x75aa9defc180>),\n", " StructuredTool(name='b', description='b() -> str - Run to Type the letter \"b\".', args_schema=, func=.func at 0x75aa9defc220>),\n", " StructuredTool(name='c', description='c() -> str - Run to Type the letter \"c\".', args_schema=, func=.func at 0x75aa9defc2c0>),\n", " StructuredTool(name='d', description='d() -> str - Run to Type the letter \"d\".', args_schema=, func=.func at 0x75aa9defc360>),\n", " StructuredTool(name='e', description='e() -> str - Run to Type the letter \"e\".', args_schema=, func=.func at 0x75aa9defc400>)]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env.tools[:5]" ] }, { "cell_type": "code", "execution_count": 5, "id": "5bea0190-39ec-4f30-9a00-90136bc6bf0b", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'OK'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env.tools[0].invoke({})" ] }, { "cell_type": "code", "execution_count": 6, "id": "bf7444da-15a1-455a-b22e-639cbfff8432", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'OK'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env.tools[3].invoke({})" ] }, { "cell_type": "code", "execution_count": 7, "id": "d12bd710-5c01-4539-a4b9-afbf03164923", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'ad'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env.read_state()" ] }, { 
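"cell_type": "markdown", "id": "3f8a2c71-5d4e-4b9a-9c06-1e7b2d4f8a53", "metadata": {}, "source": [ "Because the objective is to reproduce the input *exactly*, the quantity of interest after a run is the paper returned by `env.read_state()`. When a run fails, it can help to know where the paper first diverges from the input. The helper below is a hypothetical sketch for that purpose; `first_mismatch` is not part of `langchain_benchmarks` and is not the benchmark's own evaluator." ] }, { "cell_type": "code", "execution_count": null, "id": "8b1e6d39-2a7c-4f05-a4d8-6c3e9f0b7d12", "metadata": { "tags": [] }, "outputs": [], "source": [ "from typing import Optional\n", "\n", "\n", "def first_mismatch(expected: str, typed: str) -> Optional[int]:\n", "    \"\"\"Hypothetical helper: index where `typed` first differs from `expected`, or None if identical.\"\"\"\n", "    for i, (want, got) in enumerate(zip(expected, typed)):\n", "        if want != got:\n", "            return i\n", "    if len(expected) != len(typed):\n", "        # One string is a prefix of the other: they diverge where the shorter one ends.\n", "        return min(len(expected), len(typed))\n", "    return None\n", "\n", "\n", "print(first_mismatch(\"abc\", \"abc\"))  # None: exact reproduction\n", "print(first_mismatch(\"abc\", \"abd\"))  # 2: wrong letter\n", "print(first_mismatch(\"abc\", \"ab\"))  # 2: missing letter" ] }, { 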
"cell_type": "markdown", "id": "f1d62a13-3771-460f-b131-4443f669ca3d", "metadata": {}, "source": [ "## Explore the task\n", "\n", "For evaluation, we need an agent factory that creates a fresh agent executor for every evaluation run.\n", "\n", "We'll use the `StandardAgentFactory`. See the `intro` for more information about what it does and how to create a custom one." ] }, { "cell_type": "code", "execution_count": 10, "id": "6142cf4e-862c-47a3-aa75-81d7d3231308", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain_core.prompts import ChatPromptTemplate\n", "from langchain_openai.chat_models import ChatOpenAI\n", "\n", "from langchain_benchmarks.tool_usage.agents import StandardAgentFactory\n", "\n", "model = ChatOpenAI(temperature=0)\n", "prompt = ChatPromptTemplate.from_messages(\n", " [\n", " (\"system\", \"{instructions}\"), # Populated from task.instructions automatically\n", " (\"human\", \"{question}\"), # Populated from the test data\n", " (\n", " \"placeholder\",\n", " \"{agent_scratchpad}\",\n", " ), # Scratchpad where the agent records its intermediate steps (e.g., multiple tool calls)\n", " ]\n", ")\n", "\n", "agent_factory = StandardAgentFactory(task, model, prompt)" ] }, { "cell_type": "code", "execution_count": 12, "id": "fb32763c-79ab-426a-8fc6-bf8ebb0dd432", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3m\n", "Invoking: `a` with `{}`\n", "\n", "\n", "\u001b[0m\u001b[36;1m\u001b[1;3mOK\u001b[0m\u001b[32;1m\u001b[1;3m\n", "Invoking: `b` with `{}`\n", "\n", "\n", "\u001b[0m\u001b[33;1m\u001b[1;3mOK\u001b[0m\u001b[32;1m\u001b[1;3m\n", "Invoking: `c` with `{}`\n", "\n", "\n", "\u001b[0m\u001b[38;5;200m\u001b[1;3mOK\u001b[0m\u001b[32;1m\u001b[1;3mabcabcabc\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] }, { "data": { "text/plain": [ "{'question': 'abc',\n", " 'output': 
'abcabcabc',\n", " 'intermediate_steps': [(ToolAgentAction(tool='a', tool_input={}, log='\\nInvoking: `a` with `{}`\\n\\n\\n', message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_OrpjShN5uNzw2Rsb1tWF6swI', 'function': {'arguments': '{}', 'name': 'a'}, 'type': 'function'}, {'index': 1, 'id': 'call_2XO5RNgt9FjGvTXztgD0tKqW', 'function': {'arguments': '{}', 'name': 'b'}, 'type': 'function'}, {'index': 2, 'id': 'call_MRAOAgbi8vT445clqC8OybMR', 'function': {'arguments': '{}', 'name': 'c'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-9a1af767-29e4-4759-ab28-5b29236e8f22', tool_calls=[{'name': 'a', 'args': {}, 'id': 'call_OrpjShN5uNzw2Rsb1tWF6swI'}, {'name': 'b', 'args': {}, 'id': 'call_2XO5RNgt9FjGvTXztgD0tKqW'}, {'name': 'c', 'args': {}, 'id': 'call_MRAOAgbi8vT445clqC8OybMR'}], tool_call_chunks=[{'name': 'a', 'args': '{}', 'id': 'call_OrpjShN5uNzw2Rsb1tWF6swI', 'index': 0}, {'name': 'b', 'args': '{}', 'id': 'call_2XO5RNgt9FjGvTXztgD0tKqW', 'index': 1}, {'name': 'c', 'args': '{}', 'id': 'call_MRAOAgbi8vT445clqC8OybMR', 'index': 2}])], tool_call_id='call_OrpjShN5uNzw2Rsb1tWF6swI'),\n", " 'OK'),\n", " (ToolAgentAction(tool='b', tool_input={}, log='\\nInvoking: `b` with `{}`\\n\\n\\n', message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_OrpjShN5uNzw2Rsb1tWF6swI', 'function': {'arguments': '{}', 'name': 'a'}, 'type': 'function'}, {'index': 1, 'id': 'call_2XO5RNgt9FjGvTXztgD0tKqW', 'function': {'arguments': '{}', 'name': 'b'}, 'type': 'function'}, {'index': 2, 'id': 'call_MRAOAgbi8vT445clqC8OybMR', 'function': {'arguments': '{}', 'name': 'c'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-9a1af767-29e4-4759-ab28-5b29236e8f22', tool_calls=[{'name': 'a', 'args': {}, 'id': 'call_OrpjShN5uNzw2Rsb1tWF6swI'}, {'name': 'b', 'args': {}, 'id': 'call_2XO5RNgt9FjGvTXztgD0tKqW'}, {'name': 'c', 'args': {}, 'id': 
'call_MRAOAgbi8vT445clqC8OybMR'}], tool_call_chunks=[{'name': 'a', 'args': '{}', 'id': 'call_OrpjShN5uNzw2Rsb1tWF6swI', 'index': 0}, {'name': 'b', 'args': '{}', 'id': 'call_2XO5RNgt9FjGvTXztgD0tKqW', 'index': 1}, {'name': 'c', 'args': '{}', 'id': 'call_MRAOAgbi8vT445clqC8OybMR', 'index': 2}])], tool_call_id='call_2XO5RNgt9FjGvTXztgD0tKqW'),\n", " 'OK'),\n", " (ToolAgentAction(tool='c', tool_input={}, log='\\nInvoking: `c` with `{}`\\n\\n\\n', message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_OrpjShN5uNzw2Rsb1tWF6swI', 'function': {'arguments': '{}', 'name': 'a'}, 'type': 'function'}, {'index': 1, 'id': 'call_2XO5RNgt9FjGvTXztgD0tKqW', 'function': {'arguments': '{}', 'name': 'b'}, 'type': 'function'}, {'index': 2, 'id': 'call_MRAOAgbi8vT445clqC8OybMR', 'function': {'arguments': '{}', 'name': 'c'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-9a1af767-29e4-4759-ab28-5b29236e8f22', tool_calls=[{'name': 'a', 'args': {}, 'id': 'call_OrpjShN5uNzw2Rsb1tWF6swI'}, {'name': 'b', 'args': {}, 'id': 'call_2XO5RNgt9FjGvTXztgD0tKqW'}, {'name': 'c', 'args': {}, 'id': 'call_MRAOAgbi8vT445clqC8OybMR'}], tool_call_chunks=[{'name': 'a', 'args': '{}', 'id': 'call_OrpjShN5uNzw2Rsb1tWF6swI', 'index': 0}, {'name': 'b', 'args': '{}', 'id': 'call_2XO5RNgt9FjGvTXztgD0tKqW', 'index': 1}, {'name': 'c', 'args': '{}', 'id': 'call_MRAOAgbi8vT445clqC8OybMR', 'index': 2}])], tool_call_id='call_MRAOAgbi8vT445clqC8OybMR'),\n", " 'OK')],\n", " 'state': 'abc'}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langchain import globals\n", "\n", "globals.set_verbose(True)\n", "\n", "agent = agent_factory()\n", "agent.invoke({\"question\": \"abc\"})" ] }, { "cell_type": "markdown", "id": "89124d06-41f7-4432-9f2e-542c0d85e2e5", "metadata": {}, "source": [ "## Benchmarking\n", "\n", "See `introduction` and `benchmark all` for information on how to run benchmarks. 
This notebook is only meant to explain and explore the task." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }