emrpy.data.loaders
Data Loading Utilities
Functions for loading CSV and Parquet files with support for both pandas and polars, lazy loading, and sampling options.
Functions
- load_csv: Load CSV files with pandas or polars backend.
- load_parquet: Load Parquet files with pandas or polars backend.
- emrpy.data.loaders.load_csv(file_path, engine='pandas', lazy=False, sample_frac=None, sample_n=None, **kwargs)
Load CSV files with pandas or polars backend.
- Return type: Union[pandas.DataFrame, polars.DataFrame, polars.LazyFrame]
Parameters:
- file_path : str or Path
Path to the CSV file
- engine : str, default "pandas"
Backend to use: "pandas" or "polars"
- lazy : bool, default False
Whether to use lazy loading (polars only, ignored for pandas)
- sample_frac : float, optional
Fraction of data to sample (0.0 to 1.0)
- sample_n : int, optional
Number of rows to sample (takes precedence over sample_frac)
- **kwargs
Additional arguments passed to pandas.read_csv() or polars.read_csv()
Returns:
A pandas DataFrame when engine="pandas"; with engine="polars", a polars DataFrame, or a polars LazyFrame when lazy=True.
Examples:
>>> # Load with pandas
>>> df = load_csv("data.csv")
>>> # Load with polars (lazy)
>>> df = load_csv("data.csv", engine="polars", lazy=True)
>>> # Sample 10% of data
>>> df = load_csv("data.csv", sample_frac=0.1)
>>> # Sample 1000 rows
>>> df = load_csv("data.csv", sample_n=1000)
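The precedence rule between sample_n and sample_frac can be sketched in plain Python. This is only an illustration of the documented rule, not emrpy's actual code: the helper name resolve_sample_rows is hypothetical, and both the capping of sample_n at the number of available rows and the rounding of the fractional count are assumptions the documentation does not state.

```python
from typing import Optional


def resolve_sample_rows(
    total_rows: int,
    sample_frac: Optional[float] = None,
    sample_n: Optional[int] = None,
) -> int:
    """Hypothetical helper: how many rows a sampled load would keep."""
    if sample_n is not None:
        # Documented rule: sample_n takes precedence over sample_frac.
        if sample_n < 0:
            raise ValueError("sample_n must be non-negative")
        # Assumption: never ask for more rows than exist.
        return min(sample_n, total_rows)
    if sample_frac is not None:
        # Documented range for sample_frac is 0.0 to 1.0.
        if not 0.0 <= sample_frac <= 1.0:
            raise ValueError("sample_frac must be between 0.0 and 1.0")
        # Assumption: round to the nearest whole row.
        return round(total_rows * sample_frac)
    # No sampling requested: keep every row.
    return total_rows


print(resolve_sample_rows(1000, sample_frac=0.1))               # fraction only
print(resolve_sample_rows(1000, sample_frac=0.1, sample_n=50))  # sample_n wins
```

When both arguments are given, the fraction is ignored entirely, which mirrors the "takes precedence" wording above.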
- emrpy.data.loaders.load_parquet(file_path, engine='pandas', lazy=False, sample_frac=None, sample_n=None, **kwargs)
Load Parquet files with pandas or polars backend.
- Return type: Union[pandas.DataFrame, polars.DataFrame, polars.LazyFrame]
Parameters:
- file_path : str or Path
Path to the Parquet file
- engine : str, default "pandas"
Backend to use: "pandas" or "polars"
- lazy : bool, default False
Whether to use lazy loading (polars only, ignored for pandas)
- sample_frac : float, optional
Fraction of data to sample (0.0 to 1.0)
- sample_n : int, optional
Number of rows to sample (takes precedence over sample_frac)
- **kwargs
Additional arguments passed to pandas.read_parquet() or polars.read_parquet()
Returns:
A pandas DataFrame when engine="pandas"; with engine="polars", a polars DataFrame, or a polars LazyFrame when lazy=True.
Examples:
>>> # Load with pandas
>>> df = load_parquet("data.parquet")
>>> # Load with polars (lazy)
>>> df = load_parquet("data.parquet", engine="polars", lazy=True)
>>> # Sample 5% of data
>>> df = load_parquet("data.parquet", sample_frac=0.05)
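How engine and lazy together determine the return type, for both loaders, can be summarized in a small lookup function. This is a sketch of the documented rules only: the helper name expected_return_kind is hypothetical, and raising ValueError for an unknown engine is an assumption, since the documentation does not say how invalid values are handled.

```python
def expected_return_kind(engine: str = "pandas", lazy: bool = False) -> str:
    """Hypothetical helper: name the frame type the documented rules imply."""
    if engine == "pandas":
        # Documented rule: lazy is ignored for pandas.
        return "pandas.DataFrame"
    if engine == "polars":
        # Documented rule: lazy loading is polars only.
        return "polars.LazyFrame" if lazy else "polars.DataFrame"
    # Assumption: unknown engines are rejected.
    raise ValueError(f"engine must be 'pandas' or 'polars', got {engine!r}")


for engine in ("pandas", "polars"):
    for lazy in (False, True):
        print(engine, lazy, expected_return_kind(engine, lazy))
```

In polars, a LazyFrame only describes a query; calling its collect() method runs it and produces a DataFrame, so code that requests lazy=True typically ends with a collect() call.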