From/to pandas
Building a Terality DataFrame from a pandas DataFrame
If you have a pandas DataFrame or Series, you can import it into Terality with the from_pandas method. You can also convert a Terality DataFrame to a pandas DataFrame with to_pandas.
This is useful when you want to import data in a format not yet supported by Terality: read it with pandas first, then call from_pandas.
import terality as te
import pandas as pd
df_pd = pd.DataFrame({"a": ["he", "llo", "wor", "ld", "!"]})
df_te = te.DataFrame.from_pandas(df_pd)
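As a sketch of the workflow described above, the snippet below parses a fixed-width text table with pandas and then hands the result to Terality. Fixed-width text is only an illustrative stand-in for a format not yet supported by Terality (this page does not say whether Terality reads it natively), and the helper name load_fixed_width is hypothetical. The terality import is guarded so that the pandas part still runs where the package is not installed.

```python
import io

import pandas as pd

try:
    import terality as te  # optional: only needed for the final conversion
except ImportError:
    te = None

# Hypothetical sample data in a fixed-width layout.
RAW = (
    "name   qty\n"
    "apple    3\n"
    "pear    12\n"
)


def load_fixed_width(text):
    """Parse fixed-width text locally with pandas."""
    return pd.read_fwf(io.StringIO(text))


df_pd = load_fixed_width(RAW)

# Send the locally parsed data to Terality when the package is available.
df_te = te.DataFrame.from_pandas(df_pd) if te is not None else None
```

The same pattern should apply to any reader that returns a pandas DataFrame: parse locally, then call from_pandas once on the result.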
Loading a pandas DataFrame into memory (for example with pd.read_parquet) and then calling te.DataFrame.from_pandas is slower than reading the file directly with Terality (for example with te.read_parquet). Prefer reading directly from a file when possible.
import pandas as pd
import terality as te
path = "..."
# Read the DataFrame in memory locally...
df_pd = pd.read_parquet(path)
# ...then send it over the network to Terality.
df_te = te.DataFrame.from_pandas(df_pd)
# Better option:
# The file is read on the Terality cluster only; local memory is not used.
# If the file is stored with a cloud provider, Terality performs an efficient
# cloud-to-cloud transfer that is not limited by your own bandwidth.
df_te = te.read_parquet(path)
Retrieving a Terality data structure as a pandas data structure in memory
If you want to continue working on your data in Python, for example to do machine learning, you can export the Terality DataFrame back to a local pandas DataFrame and use your favorite ML framework.
For this option, you must ensure that the DataFrame is small enough to fit in memory. You're leaving the world of Terality's unlimited memory!
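One way to check this before exporting is to compare the data's footprint with a memory budget. The sketch below does so on a pandas DataFrame using memory_usage(deep=True). It assumes that a Terality DataFrame exposes the same method, which this page does not confirm. The helper name fits_in_budget and the 1 GiB budget are hypothetical.

```python
import pandas as pd


def fits_in_budget(df, budget_bytes):
    """Return True if the DataFrame's reported footprint is within the budget."""
    # deep=True also counts the contents of object columns such as strings.
    footprint = int(df.memory_usage(deep=True).sum())
    return footprint <= budget_bytes


df = pd.DataFrame({"a": ["he", "llo", "wor", "ld", "!"]})

ONE_GIB = 1024 ** 3  # hypothetical budget; choose one that matches your machine
small_enough = fits_in_budget(df, ONE_GIB)
```

If the assumption holds, the equivalent check on the Terality side would be to call the helper on df_te before calling df_te.to_pandas().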
# Convert the Terality DataFrame back to a pandas DataFrame.
df_pd_2 = df_te.to_pandas()
# Check that we recovered the original pandas DataFrame.
pd.testing.assert_frame_equal(df_pd, df_pd_2)