Datasets
Here, we’ll see how to work with LangSmith datasets.
%pip install -U langchain-benchmarks
from langchain_benchmarks import clone_public_dataset, download_public_dataset
Let’s first download the dataset to the local file system.
download_public_dataset(
    "https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/examples"
)
Fetching examples...
Done fetching examples.
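The call above does not print the name of the file it wrote. Here is a minimal sketch for locating it, assuming the examples are saved as a `.json` file in the current working directory. `newest_json_file` is a hypothetical helper for illustration, not part of `langchain_benchmarks`.

```python
from pathlib import Path
from typing import Optional


def newest_json_file(directory: str = ".") -> Optional[Path]:
    """Return the most recently modified .json file in `directory`, or None."""
    candidates = list(Path(directory).glob("*.json"))
    if not candidates:
        return None
    # Assumption: the file just written by the download is the newest one.
    return max(candidates, key=lambda p: p.stat().st_mtime)


print(newest_json_file("."))
```

If the directory holds other, more recently modified JSON files, this heuristic picks the wrong one, so it is worth checking the printed path before loading it.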
We can take a look at the first two examples:
import json
with open("./e95d45da-aaa3-44b3-ba2b-7c15ff6e46f5.json", "r", encoding="utf-8") as f:
    print(json.dumps(json.load(f)[:2], indent=2, sort_keys=True))
[
  {
    "created_at": "2023-11-15T15:26:53.511629",
    "dataset_id": "9f73165c-d333-4d14-8f59-bd7eede5db08",
    "id": "0703a989-2693-4039-a1f6-7281fc1b4cb0",
    "inputs": {
      "question": "do bob and alice live in the same city?"
    },
    "modified_at": "2023-11-15T15:26:53.511629",
    "outputs": {
      "expected_steps": [
        "find_users_by_name",
        "get_user_location",
        "get_city_for_location",
        "get_user_location",
        "get_city_for_location"
      ],
      "order_matters": false,
      "reference": "no"
    },
    "runs": []
  },
  {
    "created_at": "2023-11-15T15:26:53.491359",
    "dataset_id": "9f73165c-d333-4d14-8f59-bd7eede5db08",
    "id": "b258b95a-9524-4da7-b758-c5481109322d",
    "inputs": {
      "question": "Is it likely that Donna is outside with an umbrella at this time?"
    },
    "modified_at": "2023-11-15T15:26:53.491359",
    "outputs": {
      "expected_steps": [
        "find_users_by_name",
        "get_user_location",
        "get_current_time_for_location",
        "get_current_weather_for_location"
      ],
      "order_matters": false,
      "reference": "yes"
    },
    "runs": []
  }
]
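Each example pairs a question with the tool calls an agent is expected to make (`expected_steps`), an `order_matters` flag, and a `reference` answer. The sketch below shows one way such a record could be checked against the tool calls an agent actually made. It assumes that `order_matters: false` means the steps are compared as a multiset (same calls, same counts, any order). That reading is an assumption, and `steps_match` is a hypothetical helper, not the library's evaluator.

```python
from collections import Counter
from typing import Any, Dict, List


def steps_match(outputs: Dict[str, Any], actual_steps: List[str]) -> bool:
    """Compare an agent's tool calls against an example's expected steps."""
    expected = outputs["expected_steps"]
    # Assumption: a missing flag is treated as "order matters".
    if outputs.get("order_matters", True):
        return list(actual_steps) == list(expected)
    # Order is ignored, but each tool must be called the same number of times.
    return Counter(actual_steps) == Counter(expected)


# The "outputs" field of the second example shown above.
outputs = {
    "expected_steps": [
        "find_users_by_name",
        "get_user_location",
        "get_current_time_for_location",
        "get_current_weather_for_location",
    ],
    "order_matters": False,
    "reference": "yes",
}

# Same calls with the last two swapped: accepted because order does not matter.
actual = [
    "find_users_by_name",
    "get_user_location",
    "get_current_weather_for_location",
    "get_current_time_for_location",
]
print(steps_match(outputs, actual))
```

Using a `Counter` rather than a `set` keeps repeated calls significant, which matters for the first example, where `get_user_location` is expected twice.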
We can also clone the dataset to our own LangSmith tenant.
Running the following command clones the dataset into your LangSmith tenant. For this to work, you must have a LangSmith account and an API key.
import os
# Get from https://smith.langchain.com/settings
os.environ["LANGCHAIN_API_KEY"] = "ls_..."
clone_public_dataset(
    "https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/examples",
    dataset_name="Agent Dataset",
)
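Once the clone finishes, it can be useful to confirm that the examples arrived. The following is a minimal sketch, assuming the `langsmith` SDK is installed and that its `Client.list_examples` method accepts a `dataset_name` keyword. `fetch_examples` is a hypothetical helper introduced here for illustration.

```python
from typing import Any, List


def fetch_examples(client: Any, dataset_name: str = "Agent Dataset") -> List[Any]:
    """Return all examples of the named dataset as a list."""
    # Assumption: `client` behaves like `langsmith.Client`, whose
    # `list_examples(dataset_name=...)` yields example objects.
    return list(client.list_examples(dataset_name=dataset_name))


# Usage (requires LANGCHAIN_API_KEY to be set and network access):
# from langsmith import Client
# examples = fetch_examples(Client())
# print(len(examples))
# print(examples[0].inputs)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no LangSmith account is available.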