src.preprocessing.compute_topical_distributions

Generates topical distributions for documents using a pretrained LDA model.

Classes

TopicalDistribution(...)

Pipeline block that computes topical distributions for documents.

class src.preprocessing.compute_topical_distributions.TopicalDistribution(pretrained_model_name_or_path, dictionary_name_or_path)[source]

Initialize the topical distribution pipeline block.

Parameters:
  • pretrained_model_name_or_path (str) – LDA model name or path.

  • dictionary_name_or_path (str) – Corpus dictionary name or path.

Attributes:
  • _pretrained_lda – Loaded LDA model instance.

  • _lemmatizer – WordNetLemmatizer instance.

  • _dict – Dictionary of the corpus.

pretrained_model_name_or_path: str
dictionary_name_or_path: str
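
A minimal usage sketch for constructing the block follows. The file paths are placeholders, not paths from the project; substitute the locations of the pretrained LDA model and the corpus dictionary.

    from src.preprocessing.compute_topical_distributions import TopicalDistribution

    # Hypothetical paths; point these at the actual pretrained LDA model
    # and corpus dictionary files.
    block = TopicalDistribution(
        pretrained_model_name_or_path="models/lda.model",
        dictionary_name_or_path="models/corpus.dict",
    )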
custom_transform(data, **transform_args)[source]

Ensure the input DataFrame has the relevant columns.

Then compute the topical distributions for each document.

Parameters:
  • data (DataFrame) – The input dataframe.

  • transform_args (Never) – [UNUSED] Additional keyword arguments.

Return type:

DataFrame

Returns:

The transformed data.
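
A hedged example of calling this method, continuing the construction sketch above. The column name "text" is an assumption for illustration; the required columns are validated by custom_transform itself.

    import pandas as pd

    # Assumed input layout: one document per row in a text column.
    df = pd.DataFrame({"text": ["first document ...", "second document ..."]})
    result = block.custom_transform(df)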

preprocess_documents(docs)[source]

Preprocess a list of documents.

Tokenize, remove stopwords, and lemmatize the documents.

Parameters:

docs (Iterable[str]) – The list of document texts.

Return type:

list[list[str]]

Returns:

The preprocessed documents.
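
A minimal sketch of the tokenize / stopword-removal / lemmatization steps this method describes, written with NLTK. It is an approximation, not the project's implementation, and assumes the punkt, stopwords, and wordnet NLTK data packages are available.

    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    def preprocess_documents_sketch(docs):
        """Tokenize, drop stopwords, and lemmatize each document."""
        lemmatizer = WordNetLemmatizer()
        stop_words = set(stopwords.words("english"))
        processed = []
        for doc in docs:
            tokens = word_tokenize(doc.lower())
            kept = [t for t in tokens if t.isalpha() and t not in stop_words]
            processed.append([lemmatizer.lemmatize(t) for t in kept])
        return processed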

get_topic_dist(doc_bow)[source]

Compute the topical distribution for a given document.

Parameters:

doc_bow (list[tuple[int, int]]) – BoW representation of the document.

Return type:

ndarray[Any, dtype[float64]]

Returns:

The topical distribution.
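
A sketch of how a topical distribution can be derived from a BoW document with Gensim, which the LDA model, dictionary, and BoW inputs suggest; the use of Gensim and the model/dictionary paths are assumptions, not confirmed by this module's documentation.

    import numpy as np
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    lda = LdaModel.load("models/lda.model")             # assumed path
    dictionary = Dictionary.load("models/corpus.dict")  # assumed path

    doc_bow = dictionary.doc2bow(["topic", "model", "inference"])

    # Convert the sparse (topic_id, probability) pairs into a dense
    # float64 vector over all topics.
    dist = np.zeros(lda.num_topics, dtype=np.float64)
    for topic_id, prob in lda.get_document_topics(doc_bow, minimum_probability=0.0):
        dist[topic_id] = prob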