Writing to multiple files

When dealing with big data, it is often inconvenient to store everything in a single huge file of several GBs (or worse, tens or hundreds of GBs). To help with this, we added functions that let you save your DataFrame across several files:

df.to_parquet_folder(
    path="s3://my_bucket/my_key/part_*.parquet",
    num_rows_per_file=1_000_000,
)

Here, setting num_rows_per_file specifies the number of rows in each resulting file. Alternatively, you can specify the number of files to produce, or the in-memory size per file.
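The splitting behind num_rows_per_file can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation; the helper name, the zero-padded part index, and the handling of the `*` placeholder are all assumptions:

```python
def plan_partitions(num_rows, num_rows_per_file, path_pattern):
    """Hypothetical sketch: split num_rows into consecutive chunks of at
    most num_rows_per_file rows, and expand the '*' in the path pattern
    into a zero-padded part index for each output file."""
    files = []
    start = 0
    part = 0
    while start < num_rows:
        end = min(start + num_rows_per_file, num_rows)
        # (file name, first row index, one past the last row index)
        files.append((path_pattern.replace("*", f"{part:05d}"), start, end))
        start = end
        part += 1
    return files

# 2.5M rows at 1M rows per file yields three files,
# the last one holding the remaining 500k rows.
plan = plan_partitions(2_500_000, 1_000_000, "part_*.parquet")
```

The last file simply receives whatever rows remain, so it may be smaller than num_rows_per_file.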

Check out the API reference for the full list of parameters and for the other formats (such as to_csv_folder).