Structured Output

Use the Structured Data Extractor to pull structured data out of text chunks according to a JSON schema you supply. This is useful for parsing unstructured text into well-defined data structures.

Key features:

  • Define output structure using JSON Schema (draft 2020-12)
  • Process multiple text chunks with a map-reduce pattern
  • Get validated, structured output matching your schema
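The map-reduce pattern mentioned above can be pictured with a small, self-contained sketch. It does not call Athena and is not the service's implementation: `toy_extract` is a hypothetical regex stand-in for the model-backed map step, and the merge rule in `reduce_records` (first non-empty value per field wins) is an assumption chosen only for illustration.

```python
import re
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[object]]


def toy_extract(text: str) -> Record:
    """Hypothetical map step: pull an age and an email out of one chunk."""
    age = re.search(r"(\d+) year old", text)
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
    return {
        "age": int(age.group(1)) if age else None,
        "email": email.group(0) if email else None,
    }


def reduce_records(records: List[Record]) -> Record:
    """Assumed reduce step: keep the first non-empty value seen for each field."""
    merged: Record = {}
    for record in records:
        for key, value in record.items():
            if merged.get(key) is None and value is not None:
                merged[key] = value
    return merged


def map_reduce(chunks: List[str], extract: Callable[[str], Record]) -> Record:
    # Map: extract one record per chunk. Reduce: merge them into one.
    return reduce_records([extract(chunk) for chunk in chunks])


chunks = [
    "John Smith is a 35 year old software developer.",
    "Contact him at john.smith@example.com",
]
result = map_reduce(chunks, toy_extract)
print(result)
```

Each chunk contributes only the fields it mentions, and the reduce step fills in one combined record from the pieces.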

1. Install Package

    !pip install -U athena-intelligence

The leading ! runs the command from a notebook cell; omit it when installing from a regular shell.

2. Set Up Client

    from athena import Athena, Chunk, ChunkContentItem_Text

    # Replace the placeholder with your own API key
    athena = Athena(api_key="<YOUR_API_KEY>")
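Hard-coding the key is fine for a quick test, but one common alternative is to read it from the environment. The sketch below assumes a variable named `ATHENA_API_KEY`; that name and the `load_api_key` helper are illustrative choices, not something the SDK defines.

```python
import os


def load_api_key(env_var: str = "ATHENA_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# With the SDK installed, the client could then be created as:
# athena = Athena(api_key=load_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error from the first API call.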

3. Define Your Schema

Create a JSON schema that describes the structure you want to extract:

    person_schema = {
        "title": "Person",
        "description": "Information about a person",
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "email": {"type": "string"}
        },
        "required": ["name"]
    }
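To make the schema's constraints concrete, the following sketch checks a plain dict against the `required` list and the primitive `type` of each property. `check_record` is a hypothetical helper covering only those two keywords; it is not a full JSON Schema validator and is not part of the Athena SDK.

```python
from typing import Any, Dict, List

person_schema = {
    "title": "Person",
    "description": "Information about a person",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name"],
}

# JSON Schema primitive type -> Python types (only a few common ones).
_TYPES = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
}


def check_record(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field in schema.get("required", []):
        if field not in record:
            problems.append(f"missing required field: {field}")
    for field, value in record.items():
        expected = schema.get("properties", {}).get(field, {}).get("type")
        if expected not in _TYPES:
            continue  # unknown field or unsupported type keyword: skip
        # bool is a subclass of int in Python, so reject it for non-boolean types
        is_bool_mismatch = isinstance(value, bool) and expected != "boolean"
        if is_bool_mismatch or not isinstance(value, _TYPES[expected]):
            problems.append(f"{field}: expected {expected}")
    return problems


print(check_record({"name": "John Smith", "age": 35}, person_schema))
print(check_record({"age": "35"}, person_schema))
```

Only `name` is listed under `required`, so a record without `age` or `email` still passes, while a record without `name` does not.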

4. Extract Structured Data

Pass text chunks and your schema to the structured data extractor:

    response = athena.tools.structured_data_extractor.invoke(
        chunks=[
            Chunk(
                chunk_id="1",
                content=[
                    ChunkContentItem_Text(
                        text="John Smith is a 35 year old software developer. Contact him at john.smith@example.com"
                    )
                ]
            ),
            Chunk(
                chunk_id="2",
                content=[
                    ChunkContentItem_Text(
                        text="Jane Doe is a 28 year old data scientist. Her email is jane.doe@example.com"
                    )
                ]
            )
        ],
        json_schema=person_schema,
        # reduce=True combines the per-chunk results into one object
        reduce=True
    )

    print(response.reduced_data)

Output:

    {
        "name": "John Smith",
        "age": 35,
        "email": "john.smith@example.com"
    }
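Writing each `Chunk` by hand gets repetitive once there are more than a few texts. A minimal sketch of a builder follows; `build_chunks` is a hypothetical helper, and it takes the two constructors as arguments so that it can run here with plain `dict` stand-ins instead of the SDK classes. It relies only on the keyword arguments shown above (`chunk_id`, `content`, `text`).

```python
from typing import Any, Callable, List


def build_chunks(
    texts: List[str],
    make_chunk: Callable[..., Any],
    make_text: Callable[..., Any],
) -> List[Any]:
    """Wrap each string in a chunk with a sequential string id ("1", "2", ...)."""
    return [
        make_chunk(chunk_id=str(index), content=[make_text(text=text)])
        for index, text in enumerate(texts, start=1)
    ]


# Stand-in constructors (plain dicts) so the sketch runs without the SDK.
# With the SDK installed: build_chunks(texts, Chunk, ChunkContentItem_Text)
texts = [
    "John Smith is a 35 year old software developer.",
    "Jane Doe is a 28 year old data scientist.",
]
chunks = build_chunks(texts, make_chunk=dict, make_text=dict)
print(chunks[0])
```

The resulting list could then be passed as the `chunks` argument of the call shown above.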

5. Access Chunk-by-Chunk Results

To get extracted data from each chunk individually, set reduce=False:

    response = athena.tools.structured_data_extractor.invoke(
        chunks=[
            Chunk(
                chunk_id="1",
                content=[
                    ChunkContentItem_Text(
                        text="John Smith is a 35 year old software developer."
                    )
                ]
            ),
            Chunk(
                chunk_id="2",
                content=[
                    ChunkContentItem_Text(
                        text="Jane Doe is a 28 year old data scientist."
                    )
                ]
            )
        ],
        json_schema=person_schema,
        # reduce=False keeps one result per chunk
        reduce=False
    )

    # Each result carries the id of the chunk it was extracted from
    for chunk_result in response.chunk_by_chunk_data:
        print(f"Chunk {chunk_result.chunk_id}: {chunk_result.data}")

Output:

    Chunk 1: {'name': 'John Smith', 'age': 35}
    Chunk 2: {'name': 'Jane Doe', 'age': 28}
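The per-chunk results can be gathered into a lookup keyed by `chunk_id`, for example to spot chunks where a required field was not extracted. The sketch uses `SimpleNamespace` stand-ins that mirror the `chunk_id` and `data` attributes shown above, so it runs without an API call. `index_results` and `missing_required` are hypothetical helpers, the stand-in values are made up, and the code assumes `data` behaves like a dict.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def index_results(chunk_results: Iterable[Any]) -> Dict[str, dict]:
    """Map each chunk_id to its extracted data."""
    return {result.chunk_id: result.data for result in chunk_results}


def missing_required(
    by_chunk: Dict[str, dict], required: List[str]
) -> Dict[str, List[str]]:
    """For each chunk, list the required fields that were not extracted."""
    gaps = {}
    for chunk_id, data in by_chunk.items():
        absent = [field for field in required if data.get(field) is None]
        if absent:
            gaps[chunk_id] = absent
    return gaps


# Stand-ins shaped like the items of response.chunk_by_chunk_data.
fake_results = [
    SimpleNamespace(chunk_id="1", data={"name": "John Smith", "age": 35}),
    SimpleNamespace(chunk_id="2", data={"age": 28}),
]
by_chunk = index_results(fake_results)
print(missing_required(by_chunk, required=["name"]))
```

With a real response, `index_results(response.chunk_by_chunk_data)` would take the place of the stand-in list.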