create_openai_data_generator

langchain_experimental.tabular_synthetic_data.openai.create_openai_data_generator(output_schema: Dict[str, Any] | Type[BaseModel], llm: ChatOpenAI, prompt: BasePromptTemplate, output_parser: BaseLLMOutputParser | None = None, **kwargs: Any) → SyntheticDataGenerator

Create an instance of SyntheticDataGenerator tailored for OpenAI models.

This function creates an LLM chain designed for structured output based on the provided schema, language model, and prompt template. The resulting chain is then used to instantiate and return a SyntheticDataGenerator.

Parameters:
  • output_schema (Union[Dict[str, Any], Type[BaseModel]]) – Schema for expected output. This can be either a dictionary representing a valid JsonSchema or a Pydantic BaseModel class.

  • llm (ChatOpenAI) – OpenAI language model to use.

  • prompt (BasePromptTemplate) – Template to be used for generating prompts.

  • output_parser (Optional[BaseLLMOutputParser], optional) – Parser for processing model outputs. If none is provided, a default will be inferred from the function types.

  • kwargs (Any) – Additional keyword arguments to be passed to create_structured_output_chain.

Returns:

An instance of the data generator set up with the constructed chain.

Return type:

SyntheticDataGenerator

Usage:

To generate synthetic data with a structured output, first define your desired output schema. Then, use this function to create a SyntheticDataGenerator instance. After obtaining the generator, you can utilize its methods to produce the desired synthetic data.
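The workflow described above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe. `PERSON_SCHEMA`, `EXAMPLES`, and `build_generator` are hypothetical names introduced here. The few-shot prompt pieces (`FewShotPromptTemplate`, `SYNTHETIC_FEW_SHOT_PREFIX`, `SYNTHETIC_FEW_SHOT_SUFFIX`), the `langchain_openai` import path, and the `generate(subject=..., extra=..., runs=...)` call are assumptions based on the LangChain synthetic data guide and are not stated on this page, so check them against your installed version. The LangChain imports are deferred into the function so the module loads without them, and the model call only runs when `OPENAI_API_KEY` is set.

```python
import os

# Hypothetical JSON Schema describing one synthetic row (illustrative only).
PERSON_SCHEMA = {
    "title": "Person",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Hypothetical few-shot examples; each dict fills the "{example}" slot below.
EXAMPLES = [
    {"example": "name: Ada Park, age: 41"},
    {"example": "name: Luis Romero, age: 29"},
]


def build_generator():
    """Build a SyntheticDataGenerator for PERSON_SCHEMA.

    Imports are deferred so this module can be loaded without LangChain.
    The import paths below are assumptions; verify them for your version.
    """
    from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
    from langchain_experimental.tabular_synthetic_data.openai import (
        create_openai_data_generator,
    )
    from langchain_experimental.tabular_synthetic_data.prompts import (
        SYNTHETIC_FEW_SHOT_PREFIX,
        SYNTHETIC_FEW_SHOT_SUFFIX,
    )
    from langchain_openai import ChatOpenAI

    # Few-shot prompt: the assumed suffix expects "subject" and "extra" inputs.
    prompt = FewShotPromptTemplate(
        prefix=SYNTHETIC_FEW_SHOT_PREFIX,
        examples=EXAMPLES,
        suffix=SYNTHETIC_FEW_SHOT_SUFFIX,
        input_variables=["subject", "extra"],
        example_prompt=PromptTemplate(
            input_variables=["example"], template="{example}"
        ),
    )

    # Schema, model, and prompt are the three required arguments.
    return create_openai_data_generator(
        output_schema=PERSON_SCHEMA,
        llm=ChatOpenAI(temperature=1),
        prompt=prompt,
    )


# Only call the OpenAI API when run as a script with credentials available.
if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    generator = build_generator()
    rows = generator.generate(
        subject="person", extra="vary the ages widely", runs=3
    )
    print(rows)
```

A dictionary schema is used here only to avoid depending on a particular Pydantic version. Passing a Pydantic BaseModel class as `output_schema` is the other form the parameter accepts.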

Examples using create_openai_data_generator