OpenAIEmbeddings

class langchain_openai.embeddings.base.OpenAIEmbeddings

Bases: BaseModel, Embeddings

OpenAI embedding model integration.

Setup:

Install langchain_openai and set environment variable OPENAI_API_KEY.

pip install -U langchain_openai
export OPENAI_API_KEY="your-api-key"
Key init args — embedding params:
model: str

Name of OpenAI model to use.

dimensions: Optional[int] = None

The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.

Key init args — client params:
api_key: Optional[SecretStr] = None

OpenAI API key.

organization: Optional[str] = None

OpenAI organization ID. If not passed in, it is read from the env var OPENAI_ORG_ID.

max_retries: int = 2

Maximum number of retries to make when generating embeddings.

request_timeout: Optional[Union[float, Tuple[float, float], Any]] = None

Timeout for requests to the OpenAI embeddings API.

See full list of supported init args and their descriptions in the params section.

Instantiate:
from langchain_openai import OpenAIEmbeddings

embed = OpenAIEmbeddings(
    model="text-embedding-3-large",
    # With the `text-embedding-3` class
    # of models, you can specify the size
    # of the embeddings you want returned.
    # dimensions=1024
)
Embed single text:
input_text = "The meaning of life is 42"
vector = embed.embed_query("hello")
print(vector[:3])
[-0.024603435769677162, -0.007543657906353474, 0.0039630369283258915]
Embed multiple texts:
vectors = embed.embed_documents(["hello", "goodbye"])
print(len(vectors))
# Showing only the first 3 coordinates
print(vectors[0][:3])
2
[-0.024603435769677162, -0.007543657906353474, 0.0039630369283258915]
Async:
vector = await embed.aembed_query(input_text)
print(vector[:3])

# multiple:
# vectors = await embed.aembed_documents(["hello", "goodbye"])
[-0.009100092574954033, 0.005071679595857859, -0.0029193938244134188]

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

param allowed_special: Literal['all'] | Set[str] | None = None
param check_embedding_ctx_length: bool = True

Whether to check the token length of inputs and automatically split inputs longer than embedding_ctx_length.

param chunk_size: int = 1000

Maximum number of texts to embed in each batch.
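
To make the batch limit concrete, the following is a minimal pure-Python sketch of splitting a list of texts into batches of at most chunk_size items. The helper name batch_texts is hypothetical and is not part of langchain_openai; the library's own batching may differ in detail.

```python
from typing import Iterator, List


def batch_texts(texts: List[str], chunk_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive batches holding at most `chunk_size` texts each.

    Hypothetical helper for illustration only; not a langchain_openai function.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(texts), chunk_size):
        yield texts[start:start + chunk_size]


# 2,500 texts with a chunk_size of 1000 give batches of 1000, 1000 and 500.
texts = [f"document {i}" for i in range(2500)]
batch_sizes = [len(batch) for batch in batch_texts(texts)]
print(batch_sizes)  # [1000, 1000, 500]
```

A smaller chunk_size means more requests with fewer texts each; a larger one means fewer, bigger requests.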

param default_headers: Mapping[str, str] | None = None
param default_query: Mapping[str, object] | None = None
param deployment: str | None = 'text-embedding-ada-002'
param dimensions: int | None = None

The number of dimensions the resulting output embeddings should have.

Only supported in text-embedding-3 and later models.

param disallowed_special: Literal['all'] | Set[str] | Sequence[str] | None = None
param embedding_ctx_length: int = 8191

The maximum number of tokens to embed at once.
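
As a rough illustration of what splitting an over-long input means when check_embedding_ctx_length is enabled, the sketch below cuts a token sequence into consecutive pieces of at most embedding_ctx_length tokens. The function split_tokens and the placeholder token IDs are assumptions made for this example; real token IDs would come from tiktoken, and how the library combines the embeddings of the pieces is not shown here.

```python
from typing import List


def split_tokens(tokens: List[int], embedding_ctx_length: int = 8191) -> List[List[int]]:
    """Split a token sequence into consecutive pieces of at most
    `embedding_ctx_length` tokens.

    Hypothetical helper for illustration only; not a langchain_openai function.
    """
    if embedding_ctx_length <= 0:
        raise ValueError("embedding_ctx_length must be a positive integer")
    return [
        tokens[start:start + embedding_ctx_length]
        for start in range(0, len(tokens), embedding_ctx_length)
    ]


# Placeholder input: 20,000 made-up token IDs instead of real tokenizer output.
tokens = list(range(20_000))
piece_lengths = [len(piece) for piece in split_tokens(tokens)]
print(piece_lengths)  # [8191, 8191, 3618]
```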

param headers: Any = None
param http_async_client: Any | None = None

Optional httpx.AsyncClient. Only used for async invocations. Must specify http_client as well if you’d like a custom client for sync invocations.

param http_client: Any | None = None

Optional httpx.Client. Only used for sync invocations. Must specify http_async_client as well if you’d like a custom client for async invocations.

param max_retries: int = 2

Maximum number of retries to make when generating embeddings.

param model: str = 'text-embedding-ada-002'
param model_kwargs: Dict[str, Any] [Optional]

Holds any model parameters valid for create call not explicitly specified.

param openai_api_base: str | None [Optional] (alias 'base_url')

Base URL path for API requests. Leave blank if not using a proxy or service emulator.

param openai_api_key: SecretStr | None [Optional] (alias 'api_key')

Automatically inferred from env var OPENAI_API_KEY if not provided.

param openai_api_type: str | None [Optional]
param openai_api_version: str | None [Optional] (alias 'api_version')

Automatically inferred from env var OPENAI_API_VERSION if not provided.

param openai_organization: str | None [Optional] (alias 'organization')

Automatically inferred from env var OPENAI_ORG_ID if not provided.

param openai_proxy: str | None [Optional]
param request_timeout: float | Tuple[float, float] | Any | None = None (alias 'timeout')

Timeout for requests to the OpenAI embeddings API. Can be float, httpx.Timeout or None.

param retry_max_seconds: int = 20

Maximum number of seconds to wait between retries.

param retry_min_seconds: int = 4

Minimum number of seconds to wait between retries.
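
One way to picture how retry_min_seconds and retry_max_seconds bound the wait is the sketch below, which clamps an exponentially growing delay to that range. The exponential schedule and the function name retry_wait_seconds are assumptions made for illustration; the text above only states that the two parameters bound the wait between retries.

```python
def retry_wait_seconds(
    attempt: int,
    retry_min_seconds: int = 4,
    retry_max_seconds: int = 20,
) -> int:
    """Return an assumed exponential wait (2 ** attempt seconds) clamped to
    the range [retry_min_seconds, retry_max_seconds].

    Hypothetical helper for illustration only; not a langchain_openai function.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return max(retry_min_seconds, min(retry_max_seconds, 2 ** attempt))


# Raw waits 2, 4, 8, 16, 32, 64 are clamped to the 4..20 second window.
waits = [retry_wait_seconds(attempt) for attempt in range(1, 7)]
print(waits)  # [4, 4, 8, 16, 20, 20]
```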

param show_progress_bar: bool = False

Whether to show a progress bar when embedding.

param skip_empty: bool = False

Whether to skip empty strings when embedding or raise an error. Defaults to not skipping.

param tiktoken_enabled: bool = True

Set this to False for non-OpenAI implementations of the embeddings API, e.g. the --extensions openai extension for text-generation-webui.

param tiktoken_model_name: str | None = None

The model name to pass to tiktoken when using this class. Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit. By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.

async aembed_documents(texts: List[str], chunk_size: int | None = None) → List[List[float]]

Asynchronously call OpenAI’s embedding endpoint to embed search documents.

Parameters:
  • texts (List[str]) – The list of texts to embed.

  • chunk_size (int | None) – Maximum number of texts to embed in each batch. If None, the chunk size specified by the class is used.

Returns:

List of embeddings, one for each text.

Return type:

List[List[float]]

async aembed_query(text: str) → List[float]

Asynchronously call OpenAI’s embedding endpoint to embed query text.

Parameters:

text (str) – The text to embed.

Returns:

Embedding for the text.

Return type:

List[float]
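
Outside a notebook or other running event loop, a bare await is not valid, so the async methods have to be driven from a coroutine. The sketch below shows one way to do that with asyncio.run and asyncio.gather. StubEmbeddings is a hypothetical stand-in that only mimics the method names aembed_query and aembed_documents so the example runs without an API key; in real use an OpenAIEmbeddings instance would be passed instead.

```python
import asyncio
from typing import List, Tuple


class StubEmbeddings:
    """Hypothetical stand-in with the same async method names as
    OpenAIEmbeddings. It returns made-up vectors and makes no network calls."""

    async def aembed_query(self, text: str) -> List[float]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return [float(len(text)), 0.0, 1.0]

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        return [await self.aembed_query(text) for text in texts]


async def main(embed: StubEmbeddings) -> Tuple[List[float], List[List[float]]]:
    # Run the query embedding and the document embeddings concurrently.
    query_vec, doc_vecs = await asyncio.gather(
        embed.aembed_query("hello"),
        embed.aembed_documents(["hello", "goodbye"]),
    )
    return query_vec, doc_vecs


query_vec, doc_vecs = asyncio.run(main(StubEmbeddings()))
print(len(doc_vecs))
print(query_vec)
```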

embed_documents(texts: List[str], chunk_size: int | None = None) → List[List[float]]

Call OpenAI’s embedding endpoint to embed search documents.

Parameters:
  • texts (List[str]) – The list of texts to embed.

  • chunk_size (int | None) – Maximum number of texts to embed in each batch. If None, the chunk size specified by the class is used.

Returns:

List of embeddings, one for each text.

Return type:

List[List[float]]

embed_query(text: str) → List[float]

Call OpenAI’s embedding endpoint to embed query text.

Parameters:

text (str) – The text to embed.

Returns:

Embedding for the text.

Return type:

List[float]
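
A common use of the two return values is to compare a query vector against document vectors. The sketch below ranks documents by cosine similarity using only the standard library. The helper names and the three-dimensional vectors are made up for illustration; real vectors would come from embed_query and embed_documents and have many more dimensions.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vector: Sequence[float], doc_vectors: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vector, doc) for doc in doc_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Made-up vectors standing in for embed_query / embed_documents output.
query_vector = [1.0, 0.0, 0.0]
doc_vectors = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_documents(query_vector, doc_vectors)
print(ranking)  # [1, 2, 0]
```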
