Reading/writing to storage

Importing data

Terality uses the same methods as pandas to load data, such as read_csv, read_parquet and similar. Example:
import terality as te

# Load all parquet files at this S3 location
df = te.read_parquet("s3://my-datasets/path/to/objects/")

# Load a CSV file from disk
df = te.read_csv("/path/to/my/data.csv")
These functions accept local file paths as well as cloud storage locations (such as AWS S3). The currently supported functions are listed in the Data formats section.
You can also read multiple files at once by passing a folder path to the read function. This is supported for the following functions:
  • read_csv
  • read_parquet
  • read_excel
  • read_json
Do not hesitate to contact us if you want us to implement any other read function.
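For comparison, here is a minimal sketch of what reading a folder amounts to in plain pandas, where the files have to be listed and concatenated by hand. The helper name read_csv_folder_with_pandas and the temporary folder are hypothetical, and the sketch assumes all files share the same columns. It illustrates the idea only and is not how Terality implements folder reads.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_folder_with_pandas(folder):
    """Read every CSV file in `folder` and stack the rows into one DataFrame."""
    files = sorted(Path(folder).glob("*.csv"))
    frames = [pd.read_csv(f) for f in files]
    return pd.concat(frames, ignore_index=True)


# Build a small hypothetical folder with two CSV files sharing the same columns.
with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"id": [1, 2], "value": [10.0, 20.0]}).to_csv(
        Path(folder) / "part_0.csv", index=False
    )
    pd.DataFrame({"id": [3, 4], "value": [30.0, 40.0]}).to_csv(
        Path(folder) / "part_1.csv", index=False
    )

    combined = read_csv_folder_with_pandas(folder)

    # With Terality, the folder path is passed directly to the read function:
    # df = te.read_csv(folder)

print(combined)
```

With Terality, the glob-and-concatenate step is replaced by a single call on the folder path.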
In addition, Terality provides a way to convert pandas objects into Terality structures, using the from_pandas method.
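A minimal sketch of such a conversion follows. It assumes that from_pandas is exposed as a class method on te.DataFrame; check the API reference for the exact form. The wrapper to_terality and the sample data are hypothetical, and the Terality import is done lazily so that the pandas part runs on its own.

```python
import pandas as pd


def to_terality(pandas_df):
    """Convert a pandas DataFrame into a Terality DataFrame.

    Assumption: from_pandas is a class method on te.DataFrame.
    """
    import terality as te  # lazy import: needs Terality installed and configured

    return te.DataFrame.from_pandas(pandas_df)


# An ordinary pandas DataFrame built locally (illustrative data).
pandas_df = pd.DataFrame(
    {"city": ["Paris", "Lyon"], "population": [2_100_000, 520_000]}
)

# Uncomment once a Terality account is configured:
# df = to_terality(pandas_df)
```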

Exporting data

Once you are done working on a DataFrame for the moment, or if it is still too big to fit in memory on your computer, you may want to save it to your local drive or to cloud storage. To do this, use the same API as pandas:
# For instance, for AWS S3 and parquet:
df.to_parquet("s3://my_bucket/my_key/my_data.parquet")
You can also export to multiple files using to_csv_folder or to_parquet_folder from the Terality API.
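To illustrate what a multi-file export produces, here is a plain pandas sketch that splits a DataFrame into several CSV files of bounded size. The helper write_csv_chunks, its rows_per_file parameter and the file naming scheme are hypothetical. They do not reflect the signature of to_csv_folder or to_parquet_folder, which is documented in the Terality API reference.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_csv_chunks(df, folder, rows_per_file):
    """Hypothetical helper: write `df` as CSV files of at most `rows_per_file` rows."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for file_index, start in enumerate(range(0, len(df), rows_per_file)):
        path = folder / f"part_{file_index:04d}.csv"
        df.iloc[start:start + rows_per_file].to_csv(path, index=False)
        paths.append(path)
    return paths


df = pd.DataFrame({"id": range(10), "value": [i * 1.5 for i in range(10)]})

with tempfile.TemporaryDirectory() as folder:
    written = write_csv_chunks(df, folder, rows_per_file=4)
    file_names = [p.name for p in written]
    row_counts = [len(pd.read_csv(p)) for p in written]

print(file_names)  # ['part_0000.csv', 'part_0001.csv', 'part_0002.csv']
print(row_counts)  # [4, 4, 2]
```

Splitting an export this way keeps each file small enough to be read back or processed independently.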
Best practice: we recommend adopting a modern and scalable data workflow by using:
  • cloud storage rather than local storage (so that transfers are not limited by your bandwidth)
  • a modern, fast, scalable and powerful data format such as parquet, rather than CSV