Write to multiple files

Terality provides two additional methods that let you store a DataFrame across multiple files, as CSV or Parquet: DataFrame.to_csv_folder and DataFrame.to_parquet_folder.

In addition to the original parameters of their respective counterparts, to_csv and to_parquet, these methods provide four extra parameters that let you choose how to split and store the DataFrame across several files. All other parameters are strictly identical to those of DataFrame.to_csv and DataFrame.to_parquet.

DataFrame.to_csv_folder(
        path_or_buf=None, 
        num_files=None,  # new
        num_rows_per_file=None,  # new
        in_memory_file_size=None,  # new
        with_leading_zeros=False,  # new
        sep=',', 
        na_rep='', 
        float_format=None,    
        columns=None, 
        header=True, 
        index=True, 
        index_label=None, 
        mode='w', 
        encoding=None, 
        compression='infer', 
        quoting=None, 
        quotechar='"', 
        line_terminator=None, 
        chunksize=None, 
        date_format=None, 
        doublequote=True, 
        escapechar=None, 
        decimal='.', 
        errors='strict', 
        storage_options=None
) -> None
DataFrame.to_parquet_folder(
        path=None, 
        num_files=None,  # new
        num_rows_per_file=None,  # new
        in_memory_file_size=None,  # new
        with_leading_zeros=False,  # new
        engine='auto', 
        compression='snappy',
        index=None, 
        partition_cols=None, 
        storage_options=None, 
        **kwargs
) -> None

path/path_or_buf: The location where to store the files. The base name must contain the special character * , which will be replaced by the file number. Example: path="path/to/folder/file_name_*.parquet".

num_files: Optional[int] -> The number of output files.

num_rows_per_file: Optional[int] -> The number of rows in each output file. The total number of files is deduced from the DataFrame's number of rows.

in_memory_file_size: Optional[int] -> The in-memory size, in megabytes, of each chunk of the input DataFrame to save. The total number of files is deduced from the DataFrame's memory size.

with_leading_zeros: Optional[bool] -> Whether file numbers should be padded with leading zeros so that all file names have the same length. Defaults to False.

Only one of num_files, num_rows_per_file, and in_memory_file_size can be provided. If none of them is provided, the DataFrame is stored in chunks of 1 GB (the maximum allowed size).
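
For example, here is a minimal sketch combining num_files with with_leading_zeros (the padded file names shown in the comments assume the naming behavior described above):

import terality as pd

df = pd.DataFrame({"A": range(1_000_000)})

# Split the DataFrame into exactly 12 files. With with_leading_zeros=True,
# file numbers are zero-padded so all names have the same length and sort
# lexicographically: folder/file_name_00.csv, ..., folder/file_name_11.csv
df.to_csv_folder(
    path_or_buf="folder/file_name_*.csv",
    num_files=12,
    with_leading_zeros=True,
)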

Here is an example of how to use to_csv_folder. The syntax is identical for to_parquet_folder.

import terality as pd

df = pd.DataFrame({"A": range(10_000)})
df.to_csv_folder(path_or_buf="folder/file_name_*.csv", num_rows_per_file=4000)
# creates 3 files:
#   - folder/file_name_0.csv with 4000 rows.
#   - folder/file_name_1.csv with 4000 rows.
#   - folder/file_name_2.csv with 2000 rows.
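
The same call shape works for Parquet output. As a sketch, assuming the same splitting semantics as to_csv_folder, this splits the DataFrame by in-memory size instead of a fixed row count:

df.to_parquet_folder(
    path="folder/file_name_*.parquet",
    in_memory_file_size=512,  # each output file holds at most ~512 MB of in-memory data
)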

