AstraDBLoader

Load DataStax Astra DB documents.

Parameters:

collection_name (str) – name of the Astra DB collection to use.
token (str | TokenProvider | None) – API token for Astra DB usage, either as a string or as an instance of a subclass of astrapy.authentication.TokenProvider. If not provided, the environment variable ASTRA_DB_APPLICATION_TOKEN is inspected.
api_endpoint (str | None) – full URL to the API endpoint, such as https://<DB-ID>-us-east1.apps.astra.datastax.com. If not provided, the environment variable ASTRA_DB_API_ENDPOINT is inspected.
environment (str | None) – a string specifying the environment of the target Data API. If omitted, defaults to “prod” (Astra DB production). Other accepted values are listed in the astrapy.constants.Environment enum class.
namespace (str | None) – namespace (aka keyspace) where the collection resides. If not provided, the environment variable ASTRA_DB_KEYSPACE is inspected; if that is also unset, the database’s “default namespace” is used.
filter_criteria (dict[str, Any] | None) – Data API filter used to select which documents are read.
projection (dict[str, Any] | None) – Specifies the fields to return. If not provided, reads fall back to the Data API default projection.
limit (int | None) – maximum number of documents to return in the read query.
nb_prefetched (int) – maximum number of documents to pre-fetch. Ignored starting from version 0.3.5, because astrapy v1.0+ does not support it.
page_content_mapper (Callable[[dict], str]) – Function applied to collection documents to create the page_content of the LangChain Document. Defaults to json.dumps.
metadata_mapper (Callable[[dict], dict[str, Any]] | None) – Function applied to collection documents to create the metadata of the LangChain Document. Defaults to returning the namespace, API endpoint and collection name.
ext_callers (list[tuple[str | None, str | None] | str | None] | None) – one or more caller identities used to identify Data API calls in the User-Agent header. Supply a list of (name, version) pairs, or plain strings when no version information is available; if provided, these identities become the leading part of the User-Agent string in all API requests related to this component.
api_options (APIOptions | None) – an instance of astrapy.utils.api_options.APIOptions that can be supplied to customize the interaction with the Data API regarding serialization/deserialization, timeouts, custom headers and so on. The provided options are applied on top of settings already tailored to this library, and if specified will take precedence. Passing None (default) means no customization is requested. Refer to the astrapy documentation for details.
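Both mappers are plain callables that receive one collection document (a dict) and return, respectively, the page content string and the metadata dict. The sketch below shows two such functions and exercises them on a sample document without touching a database. The document shape and the field names "text" and "title" are assumptions for illustration only; the commented-out constructor call shows where the mappers would plug in, with a placeholder collection name and filter field.

```python
import json
from typing import Any

# Hypothetical collection document; the field names are assumptions.
sample_doc = {"_id": "doc-1", "title": "Intro", "text": "Hello, Astra DB."}


def text_page_content(doc: dict) -> str:
    """Use the (hypothetical) "text" field as page content, else the whole JSON."""
    text = doc.get("text")
    return text if isinstance(text, str) else json.dumps(doc)


def title_metadata(doc: dict) -> dict[str, Any]:
    """Keep only the document ID and the (hypothetical) "title" field."""
    return {"_id": doc.get("_id"), "title": doc.get("title")}


# The documented default page_content_mapper is json.dumps (the whole document).
default_page_content = json.dumps(sample_doc)

# With langchain-astradb installed and credentials in the environment, the
# mappers would be passed like this (collection name and filter are placeholders):
# from langchain_astradb import AstraDBLoader
# loader = AstraDBLoader(
#     "my_collection",
#     filter_criteria={"lang": "en"},
#     page_content_mapper=text_page_content,
#     metadata_mapper=title_metadata,
# )
```

Falling back to `json.dumps` in `text_page_content` keeps documents that lack the expected field from being silently dropped or turned into empty content.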

Methods

`__init__`(collection_name, *[, token, ...])	Load DataStax Astra DB documents.
`alazy_load`()	A lazy loader for Documents.
`aload`()	Load data into Document objects.
`lazy_load`()	A lazy loader for Documents.
`load`()	Load data into Document objects.
`load_and_split`([text_splitter])	Load Documents and split into chunks.

__init__(collection_name, *[, token, ...]) → None

Load DataStax Astra DB documents.

Return type:

None

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type: AsyncIterator[Document]

async aload() → list[Document]

Load data into Document objects.

Return type: list[Document]
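The two async methods differ only in how results are delivered: `alazy_load` yields Documents one at a time, while `aload` collects them all into a list (langchain_core's base loader implements `aload` essentially as an async comprehension over `alazy_load`). The sketch below mirrors that relationship with a hypothetical `InMemoryAsyncLoader` whose in-memory rows stand in for the Astra DB collection; the `Document` dataclass is a minimal stand-in, not the real `langchain_core` class.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Any, AsyncIterator


@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.Document (assumption)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class InMemoryAsyncLoader:
    """Hypothetical loader: an in-memory list replaces the Astra DB collection."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows

    async def alazy_load(self) -> AsyncIterator[Document]:
        # Yield one Document per row so callers can process results as they arrive.
        for row in self.rows:
            await asyncio.sleep(0)  # stands in for awaiting the next network page
            yield Document(page_content=json.dumps(row), metadata={"source": "demo"})

    async def aload(self) -> list[Document]:
        # aload simply drains alazy_load into a list.
        return [doc async for doc in self.alazy_load()]


async def main() -> tuple[list[Document], list[str]]:
    loader = InMemoryAsyncLoader([{"a": 1}, {"b": 2}])
    streamed = [doc.page_content async for doc in loader.alazy_load()]
    return await loader.aload(), streamed


docs, streamed = asyncio.run(main())
```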

lazy_load() → Iterator[Document]

A lazy loader for Documents.

Return type: Iterator[Document]

load() → list[Document]

Load data into Document objects.

Return type: list[Document]
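The practical difference between `lazy_load` and `load` is memory and early termination: `load` materializes every Document (langchain_core's base loader defines it as `list(self.lazy_load())`), while iterating `lazy_load` lets you stop after the first few. The sketch below makes this visible with a hypothetical `InMemoryLoader` that counts how many rows it has pulled from its stand-in "collection"; the class, its `rows_read` counter, and the `Document` dataclass are illustrative assumptions, not part of the library.

```python
import itertools
import json
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.Document (assumption)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class InMemoryLoader:
    """Hypothetical loader; an in-memory list replaces the Astra DB collection."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows
        self.rows_read = 0  # how many rows have been pulled so far

    def lazy_load(self) -> Iterator[Document]:
        # Rows are read only when the caller asks for the next Document.
        for row in self.rows:
            self.rows_read += 1
            yield Document(page_content=json.dumps(row))

    def load(self) -> list[Document]:
        # Eager variant: drain the generator into a list.
        return list(self.lazy_load())


rows = [{"n": i} for i in range(1000)]

lazy = InMemoryLoader(rows)
first_three = list(itertools.islice(lazy.lazy_load(), 3))  # stops after 3 rows

eager = InMemoryLoader(rows)
everything = eager.load()  # reads all 1000 rows
```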

load_and_split(text_splitter: TextSplitter | None = None) → list[Document]

Load Documents and split them into chunks. Chunks are returned as Documents.

Do not override this method; it should be considered deprecated.

Parameters: text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Returns: List of Documents.
Return type: list[Document]
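Since `load_and_split` is discouraged, the same result can be reached in two explicit steps: call `load()`, then split each Document with a text splitter. The sketch below illustrates the splitting step with a naive fixed-size character splitter (a deliberately simplified stand-in, not `RecursiveCharacterTextSplitter`) in which every chunk becomes its own Document and keeps a copy of its parent's metadata. The `Document` dataclass, the `split_documents` helper, and the "collection" metadata key are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.Document (assumption)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def split_documents(docs: list[Document], chunk_size: int = 10) -> list[Document]:
    """Naive fixed-size character splitter (hypothetical helper).

    Each chunk becomes a new Document carrying a copy of the parent's metadata.
    """
    chunks: list[Document] = []
    for doc in docs:
        text = doc.page_content
        for start in range(0, len(text), chunk_size):
            chunks.append(
                Document(
                    page_content=text[start:start + chunk_size],
                    metadata=dict(doc.metadata),  # copy, so chunks don't share a dict
                )
            )
    return chunks


# Stand-in for the output of loader.load().
loaded = [Document("abcdefghijklmnopqrstuvwxy", {"collection": "demo"})]
chunks = split_documents(loaded)
```

A real splitter would also try to break on paragraph or sentence boundaries and may overlap chunks; the fixed-size version only shows the Document-in, Documents-out shape.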

Examples using AstraDBLoader

AstraDB