Load Data Frames
Set up environment
Load a JSON-serialisable data frame
Call tools.data_frame()
to load a data frame from a CSV or Excel file.
Athena returns a simple pandas DataFrame
representation with the default parsing options. You can adjust the following options:
row_limit: int
number of rows to load
index_column: int
column to use as an index
columns: list[str | int]
indices or names of columns to include
sheet_name: str | int
name of the sheet to load, only applicable to Excel files
separator: str
separator to use when parsing, only applicable to CSV files
For example, when working with large datasets, it can be useful to first examine only the first five rows:
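A minimal sketch of such a preview. The SDK call appears only in a comment because it needs a configured client; the names `athena` and `document_id` are assumptions, not confirmed by this page. The runnable part reproduces the idea with plain pandas on an in-memory CSV, assuming `row_limit` and `index_column` behave like the pandas `nrows` and `index_col` options.

```python
import io

import pandas as pd

# With a configured client, the call might look like this (names are assumptions):
#   preview = athena.tools.data_frame(document_id, row_limit=5)

# Stand-in data: a 1,000-row CSV held in memory instead of a stored document.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))
raw = io.BytesIO(csv_text.encode("utf-8"))

# nrows=5 plays the role of row_limit=5; index_col=0 plays the role of index_column=0.
preview = pd.read_csv(raw, nrows=5, index_col=0)
print(preview)
```

Only the first five data rows are parsed, so the preview stays cheap even when the underlying file is large.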
Load a large or complex data frame
The tools.data_frame()
method is sufficient for handling well-formatted,
medium-sized data frames and provides an interface that is agnostic to the SDK version
(a sister method is available in the TypeScript SDK).
However, if your Excel files include values that cannot be JSON-serialized,
can only be serialized with a loss of precision, or contain additional metadata,
you may prefer to use the tools.read_data_frame()
method.
This method skips the JSON serialization step and passes the raw byte stream
to the pandas read_csv
or read_excel
methods, as appropriate.
The keyword arguments provided to read_data_frame
will be passed to
the underlying read_csv
/read_excel
, depending on the file type.
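The forwarding described above can be sketched as follows. `read_frame` is a hypothetical stand-in written for illustration, not the SDK's implementation; it only shows how keyword arguments could reach `pandas.read_csv` or `pandas.read_excel` unchanged, and how that helps keep values that would otherwise lose precision.

```python
import io

import pandas as pd


def read_frame(stream, file_type, **kwargs):
    """Hypothetical helper: hand the raw stream and all kwargs to pandas."""
    if file_type == "csv":
        return pd.read_csv(stream, **kwargs)
    if file_type == "excel":
        return pd.read_excel(stream, **kwargs)
    raise ValueError(f"unsupported file type: {file_type}")


# Stand-in for the raw byte stream of a semicolon-separated CSV document.
raw = io.BytesIO(b"id;amount\n1;0.1\n2;12345678901234567890.123456789\n")

# sep and dtype are ordinary read_csv options; reading "amount" as text
# preserves every digit instead of rounding to a float.
df = read_frame(raw, "csv", sep=";", dtype={"amount": str})
print(df)
```

Because the arguments are passed through as-is, any parsing option supported by the underlying pandas reader is available without a JSON round trip.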
Load a data frame with another package
If you prefer to use another data frame implementation, you can access the
raw byte stream object using the tools.get_file()
method, which accepts
a single argument: the document identifier. The resulting object complies
with the io.BytesIO
interface and can be used with most data frame libraries,
for example:
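A minimal sketch of consuming such an object. It uses only Python's standard `csv` module so that it runs without extra dependencies, and an in-memory `io.BytesIO` stands in for the result of `tools.get_file()`. Readers that accept binary file-like objects, such as `polars.read_csv`, can be given the same object directly.

```python
import csv
import io

# Stand-in for the object returned by tools.get_file(...); illustrative data only.
file_obj = io.BytesIO(b"name,score\na,1\nb,2\n")

# Rewind in case the stream was already read, then decode the bytes as text.
file_obj.seek(0)
text = io.TextIOWrapper(file_obj, encoding="utf-8", newline="")

# Each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(text))
print(rows)
```

The same pattern applies to any consumer that expects a readable binary stream: rewind the object if needed, then pass it in place of a file path.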