src.preprocessing

Preprocessing package.

Modules

`src.preprocessing.cluster_documents`	Compute the membership vectors for each cluster.
`src.preprocessing.cluster_explainer`	Generate a summary of the clusters.
`src.preprocessing.compute_layout`	Compute the x and y coordinates of the nodes in the graph, based on the story that each row belongs to.
`src.preprocessing.compute_topical_distributions`	Generates topical distributions.
`src.preprocessing.create_events`	Cluster the documents based on time and event similarity.
`src.preprocessing.extract_dates_regex`	Extract the creation dates from the full text of documents.
`src.preprocessing.extract_important_sentences`	A TransformationBlock that extracts the most important sentences from the data.
`src.preprocessing.filter_redundant_edges`	Filter redundant edges from the data.
`src.preprocessing.find_storylines`	Find the storylines in the data.
`src.preprocessing.generate_roberta_embedding`	Generate the embeddings for the data using a RoBERTa model.
`src.preprocessing.impute_dates`	Impute missing dates by filling them with the date of the document with the most similar embedding.
`src.preprocessing.linear_programming`	Perform linear programming on the clusters.
`src.preprocessing.pdf_to_text`	Extract text from PDFs.
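
Example

The summary for `src.preprocessing.impute_dates` describes filling a missing date from the most similar embedding. The sketch below illustrates that general idea only; it is not the package's implementation. The function names `cosine_similarity` and `impute_dates_by_similarity`, the choice of cosine similarity as the measure, and the list-based inputs are all assumptions made for illustration.

```python
import math
from datetime import date
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def impute_dates_by_similarity(
    embeddings: Sequence[Sequence[float]],
    dates: Sequence[Optional[date]],
) -> List[Optional[date]]:
    """Hypothetical helper: copy the date of the most similar dated document.

    Documents whose date is None receive the date of the dated document
    whose embedding has the highest cosine similarity to their own.
    """
    # Indices of documents that already have a date.
    dated = [i for i, d in enumerate(dates) if d is not None]
    result = list(dates)
    if not dated:
        # Nothing to copy from; leave every date as-is.
        return result
    for i, d in enumerate(dates):
        if d is None:
            nearest = max(
                dated,
                key=lambda j: cosine_similarity(embeddings[i], embeddings[j]),
            )
            result[i] = dates[nearest]
    return result


# Usage: the third document has no date and is closest to the first one.
example_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
example_dates = [date(2020, 1, 1), date(2021, 6, 15), None]
imputed = impute_dates_by_similarity(example_embeddings, example_dates)
```

Only documents with a known date are used as donors, so imputed values never propagate from one undated document to another.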