Invoke
Extract structured data.
tl;dr:
- pass a valid JSON schema in json_schema
- pass the page chunks as a list of Chunk objects, by default: {"type": "text", "content": "..."}
- leave all other fields as default
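As a minimal sketch of the tl;dr above, the snippet below assembles a request body in Python. The json_schema field and the chunk shape come from this page; the top-level key name chunks, the helper build_request, and the sample schema are assumptions made for illustration, and no endpoint URL or authentication is shown because this page does not specify them.

```python
import json


def build_request(json_schema: dict, texts: list[str]) -> dict:
    """Assemble a minimal request body, leaving every other field at its default.

    The "chunks" key name is an assumption; the chunk shape follows the
    default given on this page: {"type": "text", "content": "..."}.
    """
    chunks = [{"type": "text", "content": text} for text in texts]
    return {"json_schema": json_schema, "chunks": chunks}


# Hypothetical schema, used only to illustrate the payload shape.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title"],
}

payload = build_request(schema, ["Page one text.", "Page two text."])
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized as the JSON body of the request with whichever HTTP client you already use.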
Detailed configuration (only relevant for complex use cases):
The structured data extractor’s architecture follows the map-reduce pattern: the asset is divided into chunks, data matching the schema is extracted from each chunk (map), and the per-chunk results are then merged into a single structured data object (reduce).
In some applications, you may not want to:
- map (if your input asset is small enough)
- reduce (if your output object is large enough that it will overflow the output length, for example when you’re extracting a long list of entities, or if you want to extract all instances of the schema).
You can configure these behaviors with the map and reduce fields.
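To make the effect of the map and reduce fields concrete, here is a local sketch of the pattern in Python. It is not the service's implementation: toy_extract and toy_merge are hypothetical stand-ins for the model-driven extraction and reduce steps, and the pairwise left-to-right merge order is an assumption.

```python
from functools import reduce as fold
from typing import Callable, Optional


def extract(
    chunks: list[dict],
    extract_one: Callable[[str], dict],
    merge: Callable[[dict, dict], dict],
    do_map: bool = True,
    do_reduce: bool = True,
) -> tuple[list[dict], Optional[dict]]:
    """Return (per-chunk results, reduced result or None)."""
    if do_map:
        # Map: run one extraction per chunk.
        per_chunk = [extract_one(c["content"]) for c in chunks]
    else:
        # No map: treat the whole asset as a single input.
        per_chunk = [extract_one("\n".join(c["content"] for c in chunks))]
    # Reduce: merge results pairwise into one object, or skip and keep the list.
    reduced = fold(merge, per_chunk) if (do_reduce and per_chunk) else None
    return per_chunk, reduced


def toy_extract(text: str) -> dict:
    # Hypothetical extractor: treats capitalized words as names.
    return {"names": [w for w in text.split() if w.istitle()]}


def toy_merge(a: dict, b: dict) -> dict:
    # Hypothetical reducer: concatenates and de-duplicates, keeping order.
    return {"names": list(dict.fromkeys(a["names"] + b["names"]))}


pages = [
    {"type": "text", "content": "Alice met Bob"},
    {"type": "text", "content": "Bob met Carol"},
]
per_chunk, merged = extract(pages, toy_extract, toy_merge)
unmerged, nothing = extract(pages, toy_extract, toy_merge, do_reduce=False)
```

With reduce enabled, the duplicate "Bob" is collapsed into one merged object; with reduce disabled, both per-chunk results are returned untouched and the reduced value is None, which mirrors the trade-off described above.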
Headers
Request
The chunks from which to extract structured data.
The JSON schema to use for validation (version draft 2020-12). See the docs here.
The prompt to use for the data extraction over each individual chunk. It must be a list of messages. The chunk content will be appended as a list of human messages.
If map is true, whether to reduce the per-chunk results to a single structured object (true) or return the full list (false). Use true unless you want to preserve duplicates from each page or expect the object to overflow the output context.
The prompt to use for the reduce steps. It must be a list of messages. The two extraction attempts will be appended as a list of human messages.
Response
Successful Response
The extracted structured data for each chunk. A list where each element is guaranteed to match json_schema.
If reduce is true, the reduced structured data; otherwise null. Guaranteed to match json_schema.
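Since the reduced value is null whenever reduce is disabled, client code usually needs to handle both cases. The sketch below shows one way to do that in Python. The response key names chunk_results and reduced_result are hypothetical placeholders, because the actual field names are not given on this page; substitute the real ones from the response you receive.

```python
def pick_result(response: dict):
    """Prefer the reduced object; fall back to the per-chunk list.

    "reduced_result" and "chunk_results" are hypothetical key names.
    """
    reduced = response.get("reduced_result")
    if reduced is not None:
        return reduced
    return response["chunk_results"]


# Hypothetical responses illustrating both cases.
with_reduce = {
    "chunk_results": [{"title": "A"}, {"title": "A"}],
    "reduced_result": {"title": "A"},
}
without_reduce = {
    "chunk_results": [{"title": "A"}, {"title": "B"}],
    "reduced_result": None,
}

single_object = pick_result(with_reduce)
object_list = pick_result(without_reduce)
```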