emrpy.data.loaders

Data Loading Utilities

Functions for loading CSV and Parquet files with support for both pandas and polars, lazy loading, and sampling options.

Functions

load_csv(file_path[, engine, lazy, ...])

Load CSV files with pandas or polars backend.

load_parquet(file_path[, engine, lazy, ...])

Load Parquet files with pandas or polars backend.

emrpy.data.loaders.load_csv(file_path, engine='pandas', lazy=False, sample_frac=None, sample_n=None, **kwargs)

Load CSV files with pandas or polars backend.

Return type:

Union[pandas.DataFrame, polars.DataFrame, polars.LazyFrame]

Parameters:

file_path : str or Path

Path to the CSV file

engine : str, default “pandas”

Backend to use: “pandas” or “polars”

lazy : bool, default False

Whether to use lazy loading (polars only, ignored for pandas)

sample_frac : float, optional

Fraction of data to sample (0.0 to 1.0)

sample_n : int, optional

Number of rows to sample (takes precedence over sample_frac)

**kwargs

Additional arguments passed to pandas.read_csv() or polars.read_csv()

Returns:

A pandas DataFrame when engine is “pandas”; a polars DataFrame, or a polars LazyFrame when lazy is True, when engine is “polars”

Examples:

>>> from emrpy.data.loaders import load_csv
>>> # Load with pandas
>>> df = load_csv("data.csv")
>>> # Load with polars (lazy)
>>> df = load_csv("data.csv", engine="polars", lazy=True)
>>> # Sample 10% of data
>>> df = load_csv("data.csv", sample_frac=0.1)
>>> # Sample 1000 rows
>>> df = load_csv("data.csv", sample_n=1000)
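
The precedence rule between sample_n and sample_frac can be illustrated with a small standalone sketch. The helper name resolve_sample_size is hypothetical and not part of emrpy. Capping sample_n at the number of available rows and rounding the fractional count are assumptions made for illustration; the library's actual behavior in those cases is not documented here.

```python
from typing import Optional


def resolve_sample_size(
    total_rows: int,
    sample_frac: Optional[float] = None,
    sample_n: Optional[int] = None,
) -> int:
    """Hypothetical helper: how many rows a sampled load would keep."""
    if sample_n is not None:
        if sample_n < 0:
            raise ValueError("sample_n must be non-negative")
        # sample_n takes precedence over sample_frac (documented rule).
        # Capping at total_rows is an assumption of this sketch.
        return min(sample_n, total_rows)
    if sample_frac is not None:
        if not 0.0 <= sample_frac <= 1.0:
            raise ValueError("sample_frac must be between 0.0 and 1.0")
        # Rounding to the nearest whole row is an assumption of this sketch.
        return round(total_rows * sample_frac)
    # No sampling requested: keep every row.
    return total_rows


print(resolve_sample_size(10_000, sample_frac=0.1))                # fraction only
print(resolve_sample_size(10_000, sample_frac=0.1, sample_n=250))  # sample_n wins
```

When both arguments are given, only sample_n affects the result, which matches the documented precedence.
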

emrpy.data.loaders.load_parquet(file_path, engine='pandas', lazy=False, sample_frac=None, sample_n=None, **kwargs)

Load Parquet files with pandas or polars backend.

Return type:

Union[pandas.DataFrame, polars.DataFrame, polars.LazyFrame]

Parameters:

file_path : str or Path

Path to the Parquet file

engine : str, default “pandas”

Backend to use: “pandas” or “polars”

lazy : bool, default False

Whether to use lazy loading (polars only, ignored for pandas)

sample_frac : float, optional

Fraction of data to sample (0.0 to 1.0)

sample_n : int, optional

Number of rows to sample (takes precedence over sample_frac)

**kwargs

Additional arguments passed to pandas.read_parquet() or polars.read_parquet()

Returns:

A pandas DataFrame when engine is “pandas”; a polars DataFrame, or a polars LazyFrame when lazy is True, when engine is “polars”

Examples:

>>> from emrpy.data.loaders import load_parquet
>>> # Load with pandas
>>> df = load_parquet("data.parquet")
>>> # Load with polars (lazy)
>>> df = load_parquet("data.parquet", engine="polars", lazy=True)
>>> # Sample 5% of data
>>> df = load_parquet("data.parquet", sample_frac=0.05)
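
For both loaders, the returned object depends on the engine and lazy arguments. The mapping described in the parameter notes can be summarized with a short sketch. The function expected_return_type is hypothetical and only returns descriptive type names. Raising ValueError for an unrecognized engine is an assumption, since the behavior for unsupported engines is not documented here.

```python
def expected_return_type(engine: str = "pandas", lazy: bool = False) -> str:
    """Hypothetical helper: name of the type a loader call is expected to return."""
    if engine == "pandas":
        # lazy is documented as polars-only and ignored for pandas.
        return "pandas.DataFrame"
    if engine == "polars":
        return "polars.LazyFrame" if lazy else "polars.DataFrame"
    # Assumption of this sketch: unsupported engines are rejected.
    raise ValueError(f"unsupported engine: {engine!r}")


for engine, lazy in [("pandas", False), ("pandas", True),
                     ("polars", False), ("polars", True)]:
    print(engine, lazy, expected_return_type(engine, lazy))
```

A polars LazyFrame defers execution, so callers that need concrete rows typically call its collect() method before inspecting the data.
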