OllamaEmbeddings#

class langchain_ollama.embeddings.OllamaEmbeddings[source]#

Bases: BaseModel, Embeddings

Ollama embedding model integration.

Set up a local Ollama instance:

Install the Ollama package and set up a local Ollama instance using the instructions at ollama/ollama (https://github.com/ollama/ollama).

You will need to choose a model to serve.

You can view a list of available models via the model library (https://ollama.com/library).

To fetch a model from the Ollama model library, use ollama pull <name-of-model>.

For example, to pull the llama3 model:

ollama pull llama3

This will download the default tagged version of the model. Typically, the default tag points to the latest model with the smallest parameter count.

  • On Mac, the models will be downloaded to ~/.ollama/models

  • On Linux (or WSL), the models will be stored at /usr/share/ollama/.ollama/models

You can specify an exact version of a model by including its tag, for example ollama pull vicuna:13b-v1.5-16k-q4_0.

To view pulled models:

ollama list

To start serving:

ollama serve

View the Ollama documentation for more commands.

ollama help
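Before wiring up LangChain, you can confirm that the local server is reachable and see which models have been pulled. A minimal sketch, assuming the default serve address and the Ollama REST endpoint /api/tags:

import httpx

# Assumes `ollama serve` is running on the default address; adjust the URL otherwise.
resp = httpx.get("http://localhost:11434/api/tags", timeout=5.0)
resp.raise_for_status()
print([m["name"] for m in resp.json().get("models", [])])  # e.g. ['llama3:latest']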
Install the langchain-ollama integration package:
pip install -U langchain_ollama
Key init args:
model: str

Name of Ollama model to use.

base_url: Optional[str]

Base URL the model is hosted under.

See full list of supported init args and their descriptions in the params section.

Instantiate:
from langchain_ollama import OllamaEmbeddings

embed = OllamaEmbeddings(
    model="llama3"
)
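If the Ollama server is not running on the default host and port, base_url can point at it. A minimal sketch; the URL shown is the usual default and is only a placeholder:

from langchain_ollama import OllamaEmbeddings

embed = OllamaEmbeddings(
    model="llama3",
    base_url="http://localhost:11434",  # adjust if the server runs elsewhere
)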
Embed single text:
input_text = "The meaning of life is 42"
vector = embed.embed_query(input_text)
print(vector[:3])
[-0.024603435769677162, -0.007543657906353474, 0.0039630369283258915]
Embed multiple texts:
input_texts = ["Document 1...", "Document 2..."]
vectors = embed.embed_documents(input_texts)
print(len(vectors))
# The first 3 coordinates for the first vector
print(vectors[0][:3])
2
[-0.024603435769677162, -0.007543657906353474, 0.0039630369283258915]
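The returned vectors are plain Python lists of floats, so they can be compared directly. A minimal sketch ranking the documents above against a query by cosine similarity; this post-processing is not part of the class itself:

import math

query_vec = embed.embed_query("What is document 1 about?")
doc_vecs = embed.embed_documents(input_texts)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Sort documents by similarity to the query, most similar first.
ranked = sorted(zip(input_texts, doc_vecs), key=lambda p: cosine(query_vec, p[1]), reverse=True)
print(ranked[0][0])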
Async:
vector = await embed.aembed_query(input_text)
print(vector[:3])

# multiple:
# await embed.aembed_documents(input_texts)
[-0.009100092574954033, 0.005071679595857859, -0.0029193938244134188]
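The await expressions above assume an already-running event loop (e.g. a notebook); in a plain script they can be wrapped with asyncio.run, as in this sketch:

import asyncio

async def main():
    vector = await embed.aembed_query(input_text)
    vectors = await embed.aembed_documents(input_texts)
    print(vector[:3], len(vectors))

asyncio.run(main())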

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

param base_url: str | None = None#

Base URL the model is hosted under.

param client_kwargs: dict | None = {}#

Additional kwargs to pass to the httpx Client. For a full list of the params, see https://pydoc.dev/httpx/latest/httpx.Client.html
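For example, client_kwargs can forward a custom timeout or headers to the underlying httpx client. A minimal sketch with illustrative values only:

from langchain_ollama import OllamaEmbeddings

embed = OllamaEmbeddings(
    model="llama3",
    client_kwargs={
        "timeout": 60.0,                                 # httpx.Client timeout
        "headers": {"Authorization": "Bearer <token>"},  # placeholder header
    },
)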

param mirostat: int | None = None#

Enable Mirostat sampling for controlling perplexity. (Default: 0; 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)

param mirostat_eta: float | None = None#

Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1)

param mirostat_tau: float | None = None#

Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0)

param model: str [Required]#

Model name to use.

param num_ctx: int | None = None#

Sets the size of the context window used to generate the next token. (Default: 2048)

param num_gpu: int | None = None#

The number of GPUs to use. On macOS it defaults to 1 to enable Metal support; set 0 to disable.

param num_thread: int | None = None#

Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores).

param repeat_last_n: int | None = None#

Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)

param repeat_penalty: float | None = None#

Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)

param stop: List[str] | None = None#

Sets the stop tokens to use.

param temperature: float | None = None#

The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)

param tfs_z: float | None = None#

Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (Default: 1)

param top_k: int | None = None#

Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)

param top_p: float | None = None#

Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
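These model parameters are all passed at construction time. A minimal sketch with illustrative values; see the descriptions above for defaults and meaning:

from langchain_ollama import OllamaEmbeddings

embed = OllamaEmbeddings(
    model="llama3",
    num_ctx=4096,   # larger context window
    num_thread=8,   # roughly the number of physical CPU cores
    top_k=40,
    top_p=0.9,
)
vector = embed.embed_query("The meaning of life is 42")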

async aembed_documents(texts: List[str]) List[List[float]][source]#

Embed search docs.

Parameters:

texts (List[str])

Return type:

List[List[float]]

async aembed_query(text: str) List[float][source]#

Embed query text.

Parameters:

text (str)

Return type:

List[float]
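Both async methods can be awaited concurrently. A minimal sketch using asyncio.gather, assuming an embed instance constructed as above:

import asyncio

async def embed_all(embed, query, docs):
    # Run the query and document embeddings concurrently.
    return await asyncio.gather(
        embed.aembed_query(query),
        embed.aembed_documents(docs),
    )

query_vec, doc_vecs = asyncio.run(embed_all(embed, "hello", ["doc 1", "doc 2"]))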

embed_documents(texts: List[str]) List[List[float]][source]#

Embed search docs.

Parameters:

texts (List[str])

Return type:

List[List[float]]

embed_query(text: str) List[float][source]#

Embed query text.

Parameters:

text (str)

Return type:

List[float]

Examples using OllamaEmbeddings