Load Data Frames

Set up environment

!pip install -U athena-intelligence

import os

from athena.client import Athena

# Read the API key from the environment rather than hard-coding it.
ATHENA_API_KEY = os.environ["ATHENA_API_KEY"]

athena = Athena(
    api_key=ATHENA_API_KEY,
)
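
The lookup os.environ["ATHENA_API_KEY"] raises a bare KeyError when the variable is missing. A minimal sketch of a friendlier check follows; require_env is a hypothetical helper written for this guide, not part of the Athena SDK:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a readable error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail early with a message that says what to fix.
        raise RuntimeError(
            f"Set the {name} environment variable before creating the client."
        )
    return value
```

With this helper, the setup above could read ATHENA_API_KEY = require_env("ATHENA_API_KEY") instead of indexing os.environ directly.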

Load a JSON-serialisable data frame

Call tools.data_frame() to load a data frame from a CSV or Excel file:

df = athena.tools.data_frame(
    asset_id='doc_9249292-d118-42d3-95b4-00eccfe0754f'
)
df

Athena returns a simple pandas DataFrame representation with the default parsing options. You can adjust the following options:

  • row_limit (int): number of rows to load
  • index_column (int): column to use as an index
  • columns (list[str | int]): indices or names of columns to include
  • sheet_name (str | int): name of the sheet to load; only applicable to Excel files
  • separator (str): separator to use when parsing; only applicable to CSV files

For example, when working with large datasets, it can help to examine the first five rows before loading everything:

df_head = athena.tools.data_frame(
    asset_id='doc_9249292-d118-42d3-95b4-00eccfe0754f',
    row_limit=5
)
df_head
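
Because sheet_name applies only to Excel files and separator only to CSV files, it can be convenient to build the keyword arguments from the file type. The sketch below is illustrative only: data_frame_options is a hypothetical helper, and detecting the file type from the file name extension is an assumption, not something the SDK is documented to do.

```python
from pathlib import PurePath


def data_frame_options(filename, row_limit=None, separator=None, sheet_name=None):
    """Build keyword arguments for tools.data_frame() from the file type.

    Hypothetical helper: options that were not supplied, or that do not
    apply to this file type, are left out of the result.
    """
    options = {}
    if row_limit is not None:
        options["row_limit"] = row_limit
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".csv" and separator is not None:
        options["separator"] = separator  # CSV files only
    elif suffix in {".xls", ".xlsx"} and sheet_name is not None:
        options["sheet_name"] = sheet_name  # Excel files only
    return options


# Usage, assuming `athena` is the client created during setup:
# df = athena.tools.data_frame(
#     asset_id='doc_9249292-d118-42d3-95b4-00eccfe0754f',
#     **data_frame_options("sales.xlsx", row_limit=5, sheet_name="Q1"),
# )
```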

Load a large or complex data frame

The tools.data_frame() method is sufficient for well-formatted, medium-sized data frames and provides an interface that does not depend on the SDK version (a sister method is available in the TypeScript SDK).

However, if your Excel files include values that cannot be JSON-serialized, can be serialized only with a loss of precision, or carry additional metadata, you may prefer the tools.read_data_frame() method. This method skips the JSON serialization step and passes the raw byte stream to the pandas read_csv or read_excel method, as appropriate.
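
To see why a JSON step can be a problem, the following sketch uses only the Python standard library to show two common cases: a timestamp that has no native JSON representation, and a high-precision decimal that loses digits once it is converted to a float. It illustrates general JSON behaviour and is not a description of how Athena serializes data internally; json_round_trip is a hypothetical helper.

```python
import json
from datetime import datetime
from decimal import Decimal


def json_round_trip(value):
    """Serialize a value to JSON and parse it back, as a JSON transport would."""
    return json.loads(json.dumps(value))


# Case 1: a timestamp cell cannot be serialized without a custom encoder.
try:
    json_round_trip(datetime(2024, 1, 31, 12, 0))
    timestamp_ok = True
except TypeError:
    timestamp_ok = False

# Case 2: a high-precision decimal must become a float first, which drops digits.
exact = Decimal("0.1234567890123456789012345")
restored = json_round_trip(float(exact))
precision_lost = Decimal(str(restored)) != exact

print(timestamp_ok, precision_lost)
```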

The keyword arguments provided to read_data_frame will be passed to the underlying read_csv/read_excel, depending on the file type.

import numpy as np

df_head = athena.tools.read_data_frame(
    asset_id='doc_9249292-d118-42d3-95b4-00eccfe0754f',
    dtype={"a": np.float64, "b": np.int32}
)
df_head

Load a data frame with another package

If you prefer another data frame implementation, you can get the raw byte stream with the tools.get_file() method, which accepts a single argument: the document identifier, passed as asset_id. The returned object follows the io.BytesIO interface and works with most data frame libraries, for example:

import polars as pl

bytes_io = athena.tools.get_file(
    asset_id='doc_9249292-d118-42d3-95b4-00eccfe0754f',
)
df = pl.read_csv(bytes_io)
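
Since the returned object behaves like io.BytesIO, it can also be read without any data frame library. The sketch below parses CSV content with the standard csv module; the in-memory bytes are made-up sample data standing in for the result of athena.tools.get_file(), so the example runs without credentials.

```python
import csv
import io

# Stand-in for the stream returned by athena.tools.get_file() (sample data, assumed).
bytes_io = io.BytesIO(b"name,score\nada,9.5\ngrace,8.0\n")

# csv works on text, so wrap the binary stream in a decoding layer.
text_stream = io.TextIOWrapper(bytes_io, encoding="utf-8", newline="")
rows = list(csv.DictReader(text_stream))

for row in rows:
    print(row["name"], row["score"])
```

All values arrive as strings here; type conversion is left to the caller, which is one reason a data frame library is usually the more convenient choice.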