Datasets

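Here, we’ll see how to work with LangSmith datasets. We’ll download a public dataset to the local file system, and then clone it into our own LangSmith tenant.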

%pip install -U langchain-benchmarks
from langchain_benchmarks import clone_public_dataset, download_public_dataset

Let’s first download the dataset to the local file system.

download_public_dataset(
    "https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/examples"
)
Fetching examples...
Done fetching examples.

We can then load the downloaded JSON file and take a look at the first two examples.

import json

# Load the downloaded examples and pretty-print the first two
with open("./e95d45da-aaa3-44b3-ba2b-7c15ff6e46f5.json", "r", encoding="utf-8") as f:
    print(json.dumps(json.load(f)[:2], indent=2, sort_keys=True))
[
  {
    "created_at": "2023-11-15T15:26:53.511629",
    "dataset_id": "9f73165c-d333-4d14-8f59-bd7eede5db08",
    "id": "0703a989-2693-4039-a1f6-7281fc1b4cb0",
    "inputs": {
      "question": "do bob and alice live in the same city?"
    },
    "modified_at": "2023-11-15T15:26:53.511629",
    "outputs": {
      "expected_steps": [
        "find_users_by_name",
        "get_user_location",
        "get_city_for_location",
        "get_user_location",
        "get_city_for_location"
      ],
      "order_matters": false,
      "reference": "no"
    },
    "runs": []
  },
  {
    "created_at": "2023-11-15T15:26:53.491359",
    "dataset_id": "9f73165c-d333-4d14-8f59-bd7eede5db08",
    "id": "b258b95a-9524-4da7-b758-c5481109322d",
    "inputs": {
      "question": "Is it likely that Donna is outside with an umbrella at this time?"
    },
    "modified_at": "2023-11-15T15:26:53.491359",
    "outputs": {
      "expected_steps": [
        "find_users_by_name",
        "get_user_location",
        "get_current_time_for_location",
        "get_current_weather_for_location"
      ],
      "order_matters": false,
      "reference": "yes"
    },
    "runs": []
  }
]
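Each example stores its question under inputs and the expected agent behaviour under outputs. expected_steps appears to list the tool calls the agent should make, order_matters presumably says whether their order is significant, and reference appears to hold the expected answer. The following is a minimal sketch of reading these fields and comparing an agent's tool calls against them. It assumes that order_matters set to false means the same tools must appear the same number of times in any order. The helpers load_examples and steps_match are hypothetical and are not part of langchain-benchmarks.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path
from typing import Any


def load_examples(path: str | Path) -> list[dict[str, Any]]:
    """Hypothetical helper: read the list of examples from a downloaded JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def steps_match(predicted: list[str], expected: list[str], order_matters: bool) -> bool:
    """Hypothetical check of an agent's tool calls against expected_steps.

    Assumption: order_matters=False means each tool must be called the same
    number of times in any order (a multiset comparison via Counter), and
    order_matters=True means the two sequences must be identical.
    """
    if order_matters:
        return predicted == expected
    return Counter(predicted) == Counter(expected)


# The second example shown above, trimmed to the fields used here
example = {
    "inputs": {"question": "Is it likely that Donna is outside with an umbrella at this time?"},
    "outputs": {
        "expected_steps": [
            "find_users_by_name",
            "get_user_location",
            "get_current_time_for_location",
            "get_current_weather_for_location",
        ],
        "order_matters": False,
        "reference": "yes",
    },
}

outputs = example["outputs"]
# Same tools in reverse order: still a match because order_matters is False
shuffled = list(reversed(outputs["expected_steps"]))
print(steps_match(shuffled, outputs["expected_steps"], outputs["order_matters"]))  # True

# Usage with the downloaded file:
# examples = load_examples("./e95d45da-aaa3-44b3-ba2b-7c15ff6e46f5.json")
```

A Counter is used rather than a set because the first example expects get_user_location and get_city_for_location to be called twice each. A set would silently accept a trajectory that calls them only once.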

We can also clone the dataset into our own LangSmith tenant.

Executing the cell below copies the public dataset into your LangSmith tenant under the name passed as dataset_name. For this to work, you must have a LangSmith account and an API key.

import os

# Replace with your own API key, created at https://smith.langchain.com/settings
os.environ["LANGCHAIN_API_KEY"] = "ls_..."
clone_public_dataset(
    "https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/examples",
    dataset_name="Agent Dataset",
)
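Hard-coding the key in a notebook cell makes it easy to commit by accident. Below is a minimal sketch of an alternative that reads LANGCHAIN_API_KEY from the environment and prompts for it with the standard library's getpass only if it is missing. The helper name ensure_langsmith_api_key is hypothetical.

```python
import os
from getpass import getpass


def ensure_langsmith_api_key(env_var: str = "LANGCHAIN_API_KEY") -> str:
    """Hypothetical helper: return the API key, prompting only if it is not set.

    getpass does not echo the input, so the key never appears in the
    notebook source or its output.
    """
    key = os.environ.get(env_var)
    if not key:
        key = getpass(f"Enter your LangSmith API key ({env_var}): ")
        os.environ[env_var] = key
    return key
```

Calling ensure_langsmith_api_key() before clone_public_dataset(...) can then replace the hard-coded os.environ assignment.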