Invoke

Beta
POST
/api/v0/tools/structured-data-extractor/invoke

Extract structured data.

tl;dr:

  • pass a valid JSON schema in json_schema
  • pass the page chunks as a list of Chunk objects, by default: {"type": "text", "content": "..."}
  • leave all other fields as default

Detailed configuration (only relevant for complex use cases):

The structured data extractor’s architecture follows the map-reduce pattern, where the asset is divided into chunks, the schema is extracted from each chunk, and the chunks are then reduced to a single structured data object.

In some applications, you may not want to:

  • map (if your input asset is small enough)
  • reduce (if your output object is large enough that it will overflow the output length; if you’re extracting a long list of entities; if youre ) to extract all instances of the schema).

You can configure these behaviors with the map and reduce fields.

Request

This endpoint expects an object.
chunkslist of objectsRequired

The chunks from which to extract structured data.

json_schemamap from strings to anyRequired

The JSON schema to use for validation (version draft 2020-12). See the docs here.

chunk_messageslist of objectsOptional

The prompt to use for the data extraction over each individual chunk. It must be a list of messages. The chunk content will be appended as a list of human messages.

reducebooleanOptional

If map, whether to reduce the chunks to a single structured object (true) or return the full list (false). Use True unless you want to preserve duplicates from each page or expect the object to overflow the output context.

reduce_messageslist of objectsOptional

The prompt to use for the reduce steps. It must be a list of messages. The two extraction attempts will be appended as a list of human messages.

Response

This endpoint returns an object.
chunk_by_chunk_datalist of objectsOptional

The extracted structured data for each chunk. A list where each element is guaranteed to match json_schema.

reduced_datamap from strings to anyOptional

If reduce is True, the reduced structured data, otherwise null. Guaranteed to match json_schema.

Built with