LangSmithLoader#

class langchain_core.document_loaders.langsmith.LangSmithLoader(*, dataset_id: UUID | str | None = None, dataset_name: str | None = None, example_ids: Sequence[UUID | str] | None = None, as_of: datetime | str | None = None, splits: Sequence[str] | None = None, inline_s3_urls: bool = True, offset: int = 0, limit: int | None = None, metadata: dict | None = None, filter: str | None = None, content_key: str = '', format_content: Callable[..., str] | None = None, client: Client | None = None, **client_kwargs: Any)[source]#

Load LangSmith Dataset examples as Documents.

Loads the example inputs as the Document page content and places the entire example into the Document metadata. This allows you to easily create few-shot example retrievers from the loaded documents.

Lazy load
from langchain_core.document_loaders import LangSmithLoader

loader = LangSmithLoader(dataset_id="...", limit=100)
docs = []
for doc in loader.lazy_load():
    docs.append(doc)
# -> [Document("...", metadata={"inputs": {...}, "outputs": {...}, ...}), ...]

New in version 0.2.34.
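
Because each loaded Document carries the full example in its metadata, the documents can be indexed for similarity search and retrieved as few-shot examples. A minimal sketch, assuming a populated dataset named "my-dataset" whose inputs contain a "question" key (both names are illustrative), with an in-memory vector store and a deterministic fake embedding standing in for production components:

from langchain_core.document_loaders import LangSmithLoader
from langchain_core.embeddings import DeterministicFakeEmbedding
from langchain_core.vectorstores import InMemoryVectorStore

loader = LangSmithLoader(dataset_name="my-dataset", content_key="question", limit=50)
docs = loader.load()

# Index the example inputs; each Document keeps the full example in its metadata.
store = InMemoryVectorStore.from_documents(docs, DeterministicFakeEmbedding(size=256))
retriever = store.as_retriever(search_kwargs={"k": 2})

for doc in retriever.invoke("How do I build a few-shot prompt?"):
    print(doc.metadata["inputs"], doc.metadata["outputs"])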

Parameters:
  • dataset_id (UUID | str | None) – The ID of the dataset to filter by. Defaults to None.

  • dataset_name (str | None) – The name of the dataset to filter by. Defaults to None.

  • content_key (str) – The inputs key to set as Document page content. "." characters are interpreted as nested keys, e.g. content_key="first.second" results in Document(page_content=format_content(example.inputs["first"]["second"])). See the sketch following this parameter list.

  • format_content (Callable[..., str] | None) – Function for converting the content extracted from the example inputs into a string. Defaults to JSON-encoding the contents.

  • example_ids (Sequence[UUID | str] | None) – The IDs of the examples to filter by. Defaults to None.

  • as_of (datetime | str | None) – The dataset version tag OR timestamp to retrieve the examples as of. Response examples will only be those that were present at the time of the tagged (or timestamped) version.

  • splits (Sequence[str] | None) – A list of dataset splits, which are divisions of your dataset such as ‘train’, ‘test’, or ‘validation’. Returns examples only from the specified splits.

  • inline_s3_urls (bool) – Whether to inline S3 URLs. Defaults to True.

  • offset (int) – The offset to start from. Defaults to 0.

  • limit (int | None) – The maximum number of examples to return.

  • filter (str | None) – A structured filter string to apply to the examples.

  • client (Client | None) – LangSmith Client. If not provided, one will be initialized from client_kwargs.

  • client_kwargs (Any) – Keyword args to pass to the LangSmith client init. Should only be specified if client isn't provided.

  • metadata (dict | None) – Example metadata to filter by. Defaults to None.
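
A hedged sketch of content_key and format_content together; the dataset name and the nested shape of the example inputs are assumptions for illustration:

import json

from langchain_core.document_loaders import LangSmithLoader

def render(content: dict) -> str:
    # Replace the default JSON encoding with pretty-printed, key-sorted JSON.
    return json.dumps(content, indent=2, sort_keys=True)

loader = LangSmithLoader(
    dataset_name="my-dataset",   # hypothetical dataset
    content_key="first.second",  # "." descends into nested keys of example.inputs
    format_content=render,
)
# Each Document's page_content is render(example.inputs["first"]["second"]).
docs = loader.load()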

Methods

__init__(*[, dataset_id, dataset_name, ...])

Initialize the LangSmithLoader.

alazy_load()

A lazy loader for Documents.

aload()

Load data into Document objects.

lazy_load()

A lazy loader for Documents.

load()

Load data into Document objects.

load_and_split([text_splitter])

Load Documents and split into chunks.

__init__(*, dataset_id: UUID | str | None = None, dataset_name: str | None = None, example_ids: Sequence[UUID | str] | None = None, as_of: datetime | str | None = None, splits: Sequence[str] | None = None, inline_s3_urls: bool = True, offset: int = 0, limit: int | None = None, metadata: dict | None = None, filter: str | None = None, content_key: str = '', format_content: Callable[..., str] | None = None, client: Client | None = None, **client_kwargs: Any) → None[source]#
Parameters:
  • dataset_id (UUID | str | None) – The ID of the dataset to filter by. Defaults to None.

  • dataset_name (str | None) – The name of the dataset to filter by. Defaults to None.

  • content_key (str) – The inputs key to set as Document page content. "." characters are interpreted as nested keys, e.g. content_key="first.second" results in Document(page_content=format_content(example.inputs["first"]["second"])).

  • format_content (Callable[..., str] | None) – Function for converting the content extracted from the example inputs into a string. Defaults to JSON-encoding the contents.

  • example_ids (Sequence[UUID | str] | None) – The IDs of the examples to filter by. Defaults to None.

  • as_of (datetime | str | None) – The dataset version tag OR timestamp to retrieve the examples as of. Response examples will only be those that were present at the time of the tagged (or timestamped) version.

  • splits (Sequence[str] | None) – A list of dataset splits, which are divisions of your dataset such as ‘train’, ‘test’, or ‘validation’. Returns examples only from the specified splits.

  • inline_s3_urls (bool) – Whether to inline S3 URLs. Defaults to True.

  • offset (int) – The offset to start from. Defaults to 0.

  • limit (int | None) – The maximum number of examples to return.

  • filter (str | None) – A structured filter string to apply to the examples.

  • client (Client | None) – LangSmith Client. If not provided, one will be initialized from client_kwargs.

  • client_kwargs (Any) – Keyword args to pass to the LangSmith client init. Should only be specified if client isn't provided.

  • metadata (dict | None) – Example metadata to filter by. Defaults to None.

Return type:

None

async alazy_load() → AsyncIterator[Document]#

A lazy loader for Documents.

Return type:

AsyncIterator[Document]
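
A minimal async sketch (the dataset ID is a placeholder); alazy_load yields Documents as they are fetched rather than materializing the full list:

import asyncio

from langchain_core.document_loaders import LangSmithLoader

async def main() -> None:
    loader = LangSmithLoader(dataset_id="...", limit=10)
    # Documents are yielded one at a time without blocking the event loop.
    async for doc in loader.alazy_load():
        print(doc.page_content[:80])

asyncio.run(main())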

async aload() → List[Document]#

Load data into Document objects.

Return type:

List[Document]

lazy_load() → Iterator[Document][source]#

A lazy loader for Documents.

Return type:

Iterator[Document]

load() → List[Document]#

Load data into Document objects.

Return type:

List[Document]

load_and_split(text_splitter: TextSplitter | None = None) → List[Document]#

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated!

Parameters:

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns:

List of Documents.

Return type:

List[Document]
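
A short sketch of load_and_split with an explicit splitter; it assumes the separate langchain-text-splitters package is installed, and the chunk sizes are arbitrary:

from langchain_core.document_loaders import LangSmithLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = LangSmithLoader(dataset_id="...", limit=10)
chunks = loader.load_and_split(
    RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
)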
