JSFrameworkTextSplitter

class langchain_text_splitters.jsx.JSFrameworkTextSplitter(

separators: list[str] | None = None,

chunk_size: int = 2000,

chunk_overlap: int = 0,

**kwargs: Any,

)

Text splitter that handles React (JSX), Vue, and Svelte code.

This splitter extends RecursiveCharacterTextSplitter to handle React (JSX), Vue, and Svelte code by:

Detecting and extracting custom component tags from the text
Using those tags as additional separators along with standard JS syntax

The splitter combines:

Custom component tags as separators (e.g. <Component, <div)
JavaScript syntax elements (function, const, if, etc.)
Standard text splitting on newlines

This allows chunks to break at natural boundaries in React, Vue, and Svelte component code.
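The separator assembly described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the regular expression, the abbreviated JS_SEPARATORS list, the helper name build_separators, and the priority order (tags, then JS keywords, then newlines) are all assumptions.

```python
import re

# Matches an opening tag such as <div>, <Button onClick={go}>, or <Card />
# and captures the tag name. Closing tags like </div> do not match.
OPENING_TAG = re.compile(r"<\s*([A-Za-z][A-Za-z0-9]*)[^>]*>")

# Abbreviated, hypothetical JS keyword separators; the library's list may differ.
JS_SEPARATORS = ["\nexport ", "\nfunction ", "\nconst ", "\nlet ", "\nif "]


def build_separators(text: str) -> list[str]:
    """Combine tag, JS-keyword, and newline separators (order is an assumption)."""
    tags: list[str] = []
    for name in OPENING_TAG.findall(text):
        if name not in tags:  # keep first-seen order, drop duplicates
            tags.append(name)
    tag_separators = [f"<{name}" for name in tags]
    return tag_separators + JS_SEPARATORS + ["\n\n", "\n"]


component = "<App>\n  <Button onClick={go}>Hi</Button>\n  <Button />\n</App>"
print(build_separators(component)[:2])  # ['<App', '<Button']
```

Because the tag names are read from the input itself, a custom component such as <Button is treated as a split point without being listed anywhere in advance.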

Initialize the JS Framework text splitter.

Parameters:

separators (list[str] | None) – Optional list of custom separator strings to use
chunk_size (int) – Maximum size of each chunk, as measured by the length function (characters by default)
chunk_overlap (int) – Overlap between consecutive chunks, measured the same way as chunk_size
**kwargs (Any) – Additional arguments to pass to parent class

Methods

`__init__`([separators, chunk_size, chunk_overlap])	Initialize the JS Framework text splitter.
`atransform_documents`(documents, **kwargs)	Asynchronously transform a list of documents.
`create_documents`(texts[, metadatas])	Create documents from a list of texts.
`from_huggingface_tokenizer`(tokenizer, **kwargs)	Create a text splitter that uses a HuggingFace tokenizer to count length.
`from_language`(language, **kwargs)	Return an instance of this class based on a specific language.
`from_tiktoken_encoder`([encoding_name, ...])	Create a text splitter that uses a tiktoken encoder to count length.
`get_separators_for_language`(language)	Retrieve a list of separators specific to the given language.
`split_documents`(documents)	Split documents into chunks, keeping each document's metadata.
`split_text`(text)	Split text into chunks.
`transform_documents`(documents, **kwargs)	Transform sequence of documents by splitting them.

__init__(

separators: list[str] | None = None,

chunk_size: int = 2000,

chunk_overlap: int = 0,

**kwargs: Any,

) → None

Initialize the JS Framework text splitter.

Parameters:

separators (list[str] | None) – Optional list of custom separator strings to use
chunk_size (int) – Maximum size of each chunk, as measured by the length function (characters by default)
chunk_overlap (int) – Overlap between consecutive chunks, measured the same way as chunk_size
**kwargs (Any) – Additional arguments to pass to parent class

Return type:

None

async atransform_documents(

documents: Sequence[Document],

**kwargs: Any,

) → Sequence[Document]

Asynchronously transform a list of documents.

Parameters:

documents (Sequence[Document]) – A sequence of Documents to be transformed.
kwargs (Any)

Returns:

A sequence of transformed Documents.

Return type:

Sequence[Document]
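Async callers can await this method without blocking the event loop. One common way to provide such a method is to run the synchronous transform in a worker thread; the sketch below does that with asyncio.to_thread. It uses plain strings instead of Document objects, and the transform_documents stand-in (which splits on blank lines) is hypothetical; the library's actual implementation may differ.

```python
import asyncio
from collections.abc import Sequence
from typing import Any


def transform_documents(documents: Sequence[str], **kwargs: Any) -> list[str]:
    """Hypothetical synchronous transform: split each text on blank lines."""
    return [part for doc in documents for part in doc.split("\n\n")]


async def atransform_documents(documents: Sequence[str], **kwargs: Any) -> list[str]:
    # Run the blocking transform in a worker thread so the event loop stays free.
    return await asyncio.to_thread(transform_documents, documents, **kwargs)


print(asyncio.run(atransform_documents(["a\n\nb", "c"])))  # ['a', 'b', 'c']
```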

create_documents(

texts: list[str],

metadatas: list[dict[Any, Any]] | None = None,

) → list[Document]

Create documents from a list of texts.

Parameters:

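texts (list[str]) – Texts to split into chunks.
metadatas (list[dict[Any, Any]] | None) – Optional metadata dicts, one per text; every chunk produced from a text receives that text's metadata.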

Return type:

list[Document]
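The pairing of texts with metadatas can be sketched as follows. The Document dataclass is a minimal stand-in for LangChain's Document, the splitting function is passed in as a parameter, and copying the metadata separately for each chunk is an assumption of this sketch rather than a statement about the library's internals.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document (assumption)."""

    page_content: str
    metadata: dict[Any, Any] = field(default_factory=dict)


def create_documents(
    split_text: Callable[[str], list[str]],
    texts: list[str],
    metadatas: Optional[list[dict[Any, Any]]] = None,
) -> list[Document]:
    # With no metadatas given, every text gets an empty metadata dict.
    metadatas = metadatas or [{}] * len(texts)
    docs: list[Document] = []
    for text, meta in zip(texts, metadatas):
        for chunk in split_text(text):
            # Each chunk gets its own copy so edits to one chunk's metadata
            # do not leak into its siblings.
            docs.append(Document(page_content=chunk, metadata=copy.deepcopy(meta)))
    return docs
```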

classmethod from_huggingface_tokenizer(

tokenizer: PreTrainedTokenizerBase,

**kwargs: Any,

) → TextSplitter

Create a text splitter that uses a HuggingFace tokenizer to count length.

Parameters:

tokenizer (PreTrainedTokenizerBase)
kwargs (Any)

Return type:

TextSplitter
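The effect of this constructor is that chunk_size and chunk_overlap are counted in tokens rather than characters. TextSplitter exposes this through its length_function argument; the sketch below shows the idea with a hypothetical whitespace tokenizer standing in for a real HuggingFace tokenizer, which would produce subword tokens. The helper names make_token_length and whitespace_tokenize are illustrative, not library API.

```python
from typing import Callable


def make_token_length(tokenize: Callable[[str], list[str]]) -> Callable[[str], int]:
    """Return a length function that counts tokens instead of characters."""

    def length(text: str) -> int:
        return len(tokenize(text))

    return length


def whitespace_tokenize(text: str) -> list[str]:
    # Hypothetical toy tokenizer; real tokenizers emit subword pieces.
    return text.split()


token_len = make_token_length(whitespace_tokenize)
print(token_len("const x = 1;"), len("const x = 1;"))  # 4 12
```

The same snippet of code is 4 units long under this length function but 12 under the default character count, so a given chunk_size admits far more text when lengths are measured in tokens.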

classmethod from_language(

language: Language,

**kwargs: Any,

) → RecursiveCharacterTextSplitter

Return an instance of this class based on a specific language.

This method initializes the text splitter with language-specific separators.

Parameters:

language (Language) – The language to configure the text splitter for.
**kwargs (Any) – Additional keyword arguments to customize the splitter.

Returns:

An instance of the text splitter configured for the specified language.

Return type:

RecursiveCharacterTextSplitter

classmethod from_tiktoken_encoder(

encoding_name: str = 'gpt2',

model_name: str | None = None,

allowed_special: Literal['all'] | AbstractSet[str] = {},

disallowed_special: Literal['all'] | Collection[str] = 'all',

**kwargs: Any,

) → Self

Create a text splitter that uses a tiktoken encoder to count length.

Parameters:

encoding_name (str)
model_name (str | None)
allowed_special (Literal['all'] | AbstractSet[str])
disallowed_special (Literal['all'] | Collection[str])
kwargs (Any)

Return type:

Self

static get_separators_for_language(

language: Language,

) → list[str]

Retrieve a list of separators specific to the given language.

Parameters:

language (Language) – The language for which to get the separators.

Returns:

A list of separators appropriate for the specified language.

Return type:

list[str]

split_documents(

documents: Iterable[Document],

) → list[Document]

Split documents into chunks, carrying each document's metadata over to its chunks.

Parameters:

documents (Iterable[Document])

Return type:

list[Document]

split_text(text: str) → list[str]

Split text into chunks.

This method splits the text into chunks by:

Extracting the unique opening component tags with a regular expression
Building a separator list from the extracted tags and the JS separators
Splitting the text on those separators by calling the parent class method

Parameters:

text (str) – String containing code to split

Returns:

List of text chunks split on component and JS boundaries

Return type:

list[str]
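The three steps of split_text can be sketched end to end as below. This is a simplified stand-in, not the library's implementation: the regular expression, the abbreviated JS separator list, the helper names, and the choice to re-attach each separator to the start of the following piece are assumptions, and unlike the parent class the sketch ignores chunk_overlap and does not re-merge small pieces across recursion levels.

```python
import re

# Hypothetical, abbreviated JS separator list (the library's own list differs).
JS_SEPARATORS = ["\nexport ", "\nfunction ", "\nconst ", "\nlet ", "\nif "]
OPENING_TAG = re.compile(r"<\s*([A-Za-z][A-Za-z0-9]*)[^>]*>")


def _split_keep(text: str, sep: str) -> list[str]:
    """Split on sep, re-attaching sep to the start of each following piece."""
    parts = text.split(sep)
    pieces = [parts[0]] + [sep + p for p in parts[1:]]
    return [p for p in pieces if p]


def _recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    if len(text) <= chunk_size:
        return [text]
    # Use the first separator that actually occurs in the text.
    for i, sep in enumerate(separators):
        if sep in text:
            rest = separators[i + 1:]
            break
    else:
        # No separator applies: fall back to fixed-size character slices.
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]
    chunks: list[str] = []
    current = ""
    for piece in _split_keep(text, sep):
        if len(piece) > chunk_size:
            # Flush what we have, then split the oversized piece further.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(_recursive_split(piece, rest, chunk_size))
        elif len(current) + len(piece) <= chunk_size:
            current += piece  # greedily merge small pieces
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def split_jsx(text: str, chunk_size: int = 80) -> list[str]:
    # Step 1: unique opening tag names, in first-seen order.
    tags = list(dict.fromkeys(OPENING_TAG.findall(text)))
    # Step 2: tag separators, then JS keyword separators, then newline fallbacks.
    separators = [f"<{t}" for t in tags] + JS_SEPARATORS + ["\n\n", "\n"]
    # Step 3: recursive splitting (stands in for the parent class's method).
    return _recursive_split(text, separators, chunk_size)
```

Running split_jsx on a small component with chunk_size=40 yields chunks that begin at tag boundaries such as <Header and <Main, and concatenating the chunks reproduces the input exactly.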

transform_documents(

documents: Sequence[Document],

**kwargs: Any,

) → Sequence[Document]

Transform sequence of documents by splitting them.

Parameters:

documents (Sequence[Document])
kwargs (Any)

Return type:

Sequence[Document]