From/to pandas


Building a Terality DataFrame from a pandas DataFrame

If you have a pandas DataFrame or Series, you can import it into Terality with the from_pandas method. You can also convert a Terality DataFrame back to a pandas DataFrame with to_pandas.

This is useful when your data is in a format that Terality does not support yet: read it with pandas first, then call from_pandas.

import terality as te
import pandas as pd

df_pd = pd.DataFrame({"a": ["he", "llo", "wor", "ld", "!"]})
df_te = te.DataFrame.from_pandas(df_pd)
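
As a rough sketch of the "read with pandas first" workflow, the snippet below parses a small fixed-width text table with pd.read_fwf. The fixed-width format is only an illustrative assumption here: the text above does not say which formats Terality lacks, and the sample data is made up. The final from_pandas call is the same one shown above and is left commented out so the pandas part runs on its own.

```python
import io

import pandas as pd

# Hypothetical fixed-width data, standing in for a file in a format
# that Terality is assumed (for illustration only) not to read directly.
raw = io.StringIO(
    "id  name \n"
    "1   alice\n"
    "2   bob  \n"
)

# Parse it locally with pandas: 4 characters for "id", 5 for "name".
df_pd = pd.read_fwf(raw, widths=[4, 5])
print(df_pd)

# Then hand the in-memory result to Terality, as in the example above.
# import terality as te
# df_te = te.DataFrame.from_pandas(df_pd)
```

Because the data passes through local memory before being sent over the network, this route is best kept for files that pandas can load comfortably.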

Loading a file into memory with pandas (for example with pd.read_parquet) and then calling te.DataFrame.from_pandas is slower than reading the file directly with Terality (for example with te.read_parquet). Prefer reading directly from a file when possible.

import pandas as pd
import terality as te

path = "..."

# Read the DataFrame in memory locally...
df_pd = pd.read_parquet(path)
# ...then send it over the network to Terality.
df_te = te.DataFrame.from_pandas(df_pd)

# Better option:
# The file is read only on the Terality cluster. Local memory is not used.
# If the file is stored with a cloud provider, Terality does an efficient
# cloud-to-cloud transfer that is not limited by your own bandwidth.
df_te = te.read_parquet(path)

Retrieving a Terality data structure as a pandas data structure in memory

If you want to continue working on your data in Python, for example for machine learning, you can export the Terality DataFrame back to a local pandas DataFrame and use your favorite ML framework.

For this option, you must ensure that the DataFrame is small enough to fit in memory. You're leaving the world of Terality's unlimited memory!

# Convert the Terality DataFrame back to a pandas DataFrame.
df_pd_2 = df_te.to_pandas()

# Check that we recovered the original pandas DataFrame.
pd.testing.assert_frame_equal(df_pd, df_pd_2)
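
Before calling to_pandas, it can help to estimate whether the result will fit in local memory. The helper below is a minimal sketch, not part of Terality: fits_in_memory and budget_bytes are hypothetical names, and the check relies on the pandas memory_usage method. It is demonstrated on a pandas DataFrame; whether a Terality DataFrame exposes memory_usage in the same way is an assumption you should verify.

```python
import pandas as pd


def fits_in_memory(df, budget_bytes):
    """Return True if the estimated footprint of df is within budget_bytes.

    Hypothetical helper: uses the pandas-style memory_usage(deep=True),
    which also counts the contents of object (string) columns.
    """
    estimated = int(df.memory_usage(deep=True).sum())
    return estimated <= budget_bytes


# Demonstrated on a small pandas DataFrame with a 1 GB budget.
sample = pd.DataFrame({"a": ["he", "llo", "wor", "ld", "!"]})
print(fits_in_memory(sample, budget_bytes=1_000_000_000))
```

The estimate is only a lower bound on what you need: leave headroom for the copies that downstream libraries may make.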
