pandas.DataFrame.to_parquet
DataFrame.to_parquet(fname, engine='auto', compression='snappy', index=None, partition_cols=None, **kwargs)
Write a DataFrame to the binary parquet format.
New in version 0.21.0.
This function writes the dataframe as a parquet file. You can choose between parquet backends and optionally compress the output. See the user guide for more details.
Parameters
fname : str
File path or root directory path. It is used as the root directory path when writing a partitioned dataset.
Changed in version 0.24.0.
engine : {‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’
Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
compression : {‘snappy’, ‘gzip’, ‘brotli’, None}, default ‘snappy’
Name of the compression to use. Use None for no compression.
index : bool, default None
If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, the behavior depends on the chosen engine.
New in version 0.24.0.
partition_cols : list, optional, default None
Column names by which to partition the dataset. Columns are partitioned in the order they are given.
New in version 0.24.0.
**kwargs
Additional arguments passed to the parquet library. See pandas io for more details.
See also
read_parquet : Read a parquet file.
DataFrame.to_csv : Write a csv file.
DataFrame.to_sql : Write to a sql table.
DataFrame.to_hdf : Write to hdf.
Notes
This function requires either the fastparquet or pyarrow library.
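Because one of the two libraries must be installed, it can help to check which is importable before writing. The sketch below mirrors the order described for engine='auto' (pyarrow first, then fastparquet). The helper pick_parquet_engine is hypothetical, is not part of pandas, and is not how pandas implements the lookup internally.

```python
import importlib.util


def pick_parquet_engine(candidates=("pyarrow", "fastparquet")):
    """Return the first importable module name from candidates, or None.

    Hypothetical helper for illustration only; not a pandas function.
    """
    for name in candidates:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


engine = pick_parquet_engine()
if engine is None:
    print("Neither pyarrow nor fastparquet is installed.")
else:
    print("Available parquet engine:", engine)
```

The returned name could then be passed explicitly as the engine argument instead of relying on 'auto'.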
Examples
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
...               compression='gzip')
>>> pd.read_parquet('df.parquet.gzip')
   col1  col2
0     1     3
1     2     4