Write to multiple files
Terality provides two additional methods that let you store a DataFrame as multiple CSV or Parquet files: DataFrame.to_csv_folder and DataFrame.to_parquet_folder.
In addition to the original parameters of their respective counterparts, to_csv and to_parquet, these methods accept four additional parameters that control how the DataFrame is split and stored across several files:
path/path_or_buf:
-> The location where the files are stored. The basename must contain the special character *, which is replaced by the file number. Example: path="path/to/folder/file_name_*.parquet".
num_files: Optional[int]
-> The number of output files.
num_rows_per_file: Optional[int]
-> The number of rows in each output file. The total number of files is deduced from the number of rows in the DataFrame.
in_memory_file_size: Optional[int]
-> The in-memory size, in megabytes, of each chunk of the DataFrame to save. The total number of files is deduced from the memory size of the DataFrame. If none of num_files, num_rows_per_file or in_memory_file_size is provided, chunks of 1GB (the maximum allowed size) are used.
with_leading_zeros: Optional[bool]
-> Whether the file numbers in file names are padded with leading zeros so that all file names have the same length. Default: False.
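To make the naming rule concrete, here is a minimal Python sketch of how the * placeholder and with_leading_zeros could map to output file names. It is not Terality's implementation: the helper output_file_names is hypothetical, and numbering files from 0 and padding every number to the width of the largest one are assumptions.

```python
from __future__ import annotations


def output_file_names(path: str, num_files: int, with_leading_zeros: bool = False) -> list[str]:
    """Hypothetical helper: expand the '*' in the basename into one name per output file.

    Assumptions (not confirmed by Terality): files are numbered from 0, and
    with_leading_zeros pads every number to the width of the largest one.
    """
    directory, sep, basename = path.rpartition("/")
    if "*" not in basename:
        raise ValueError("the basename must contain the special character '*'")
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    width = len(str(num_files - 1)) if with_leading_zeros else 0
    names = []
    for i in range(num_files):
        number = str(i).zfill(width)  # zfill(0) leaves the number unpadded
        names.append(directory + sep + basename.replace("*", number))
    return names


print(output_file_names("path/to/folder/file_name_*.parquet", 12, with_leading_zeros=True)[:2])
# ['path/to/folder/file_name_00.parquet', 'path/to/folder/file_name_01.parquet']
```

Without padding, the same call would produce file_name_0.parquet through file_name_11.parquet, and file_name_10 would sort before file_name_2 in a plain lexicographic listing, which is the usual reason to pad.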
Other parameters are strictly identical to DataFrame.to_csv and DataFrame.to_parquet.
Only one of num_files, num_rows_per_file or in_memory_file_size can be provided. If none of them is provided, the DataFrame is stored in chunks of 1GB.
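The rules for choosing the number of output files can be summarized in a small Python sketch. The function resolve_num_files and its total_rows and total_size_mb inputs are hypothetical; reading 1GB as 1024 MB and rounding a partial last chunk up to one extra file are assumptions, not documented Terality behavior.

```python
from __future__ import annotations

import math
from typing import Optional

MAX_CHUNK_MB = 1024  # assumption: the documented 1GB limit, read as 1024 MB


def resolve_num_files(
    total_rows: int,
    total_size_mb: float,
    num_files: Optional[int] = None,
    num_rows_per_file: Optional[int] = None,
    in_memory_file_size: Optional[int] = None,
) -> int:
    """Hypothetical helper: deduce how many files a split produces."""
    provided = [p for p in (num_files, num_rows_per_file, in_memory_file_size) if p is not None]
    if len(provided) > 1:
        raise ValueError("provide only one of num_files, num_rows_per_file, in_memory_file_size")
    if num_files is not None:
        return num_files
    if num_rows_per_file is not None:
        # The last file holds whatever rows remain.
        return max(1, math.ceil(total_rows / num_rows_per_file))
    if in_memory_file_size is not None and in_memory_file_size > MAX_CHUNK_MB:
        raise ValueError("in_memory_file_size cannot exceed 1GB")
    # Fall back to 1GB chunks when no splitting parameter was provided.
    chunk_mb = in_memory_file_size if in_memory_file_size is not None else MAX_CHUNK_MB
    return max(1, math.ceil(total_size_mb / chunk_mb))


print(resolve_num_files(10_000, 2500.0))  # 3 files of at most 1GB each
```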
Here is an example of how to use to_csv_folder. The syntax is identical for to_parquet_folder.
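As a rough local illustration of what a call such as df.to_csv_folder("path/to/folder/file_name_*.csv", num_rows_per_file=3) produces, the sketch below writes a list of rows into several CSV files using only Python's csv module. It is a stand-in, not Terality: write_csv_folder is a hypothetical helper, and repeating the header in every file and numbering files from 0 are assumptions.

```python
from __future__ import annotations

import csv
import os
import tempfile


def write_csv_folder(path: str, header: list[str], rows: list[list], num_rows_per_file: int) -> list[str]:
    """Hypothetical stand-in for to_csv_folder(..., num_rows_per_file=...), stdlib only.

    Writes consecutive slices of `rows` to separate CSV files, repeating the header
    in each, and replaces the '*' in the basename with the file number (from 0).
    """
    directory, basename = os.path.split(path)
    if "*" not in basename:
        raise ValueError("the basename must contain '*'")
    if num_rows_per_file < 1:
        raise ValueError("num_rows_per_file must be at least 1")
    if directory:
        os.makedirs(directory, exist_ok=True)
    written = []
    for file_number, start in enumerate(range(0, len(rows), num_rows_per_file)):
        target = os.path.join(directory, basename.replace("*", str(file_number)))
        with open(target, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows[start:start + num_rows_per_file])
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as folder:
    rows = [[n, n * n] for n in range(7)]
    files = write_csv_folder(os.path.join(folder, "squares_*.csv"), ["n", "n_squared"], rows, num_rows_per_file=3)
    print([os.path.basename(f) for f in files])  # ['squares_0.csv', 'squares_1.csv', 'squares_2.csv']
```

Seven rows split three per file give two full files and a last file holding the single remaining row, mirroring how the number of files is deduced from the row count.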