src.load_data

Load dossier data from CSV files with Polars and from PDF files with OCR.

Functions

load_dossier(raw_data_path, dossier_id)

Load and merge data from CSV files using Polars, keeping only the rows for the given dossier ID.

load_pdf_dossier(upload_data_path)

Load and transform the data from PDF files using OCR.

src.load_data.load_dossier(raw_data_path, dossier_id)

Load and merge data from CSV files using Polars, keeping only the rows for the given dossier ID.

Parameters:
  • raw_data_path (Path) – The path to the raw data files.
  • dossier_id (str) – The dossier ID used to filter the data.

Return type:

DataFrame

Returns:

A merged dataframe with all data filtered by the selected dossier ID.

src.load_data.load_pdf_dossier(upload_data_path)

Load and transform the data from PDF files using OCR.

Parameters:
  • upload_data_path (Path) – The path to the raw uploaded PDF files.

Return type:

DataFrame

Returns:

A dataframe with the extracted text from the PDF files.