src.preprocessing.extract_important_sentences

A TransformationBlock that extracts the most important sentences from the data.

Classes

ExtractImportantSentences – A TransformationBlock that extracts the most important sentences from the data.

class src.preprocessing.extract_important_sentences.ExtractImportantSentences

A TransformationBlock that extracts the most important sentences from the data.

Expects a dataframe with a full_text column and returns the most important sentences in a summary column.

custom_transform(data, **transform_args)

Extract the most important sentences from the data.

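Parameters: data (DataFrame) – a dataframe with a full_text column; **transform_args – additional keyword arguments.
Return type: DataFrame
Returns: a dataframe with the most important sentences in a summary column.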

merge_whitespace(data)

Merge consecutive whitespace in the data.
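The page does not say how whitespace is merged. A minimal sketch, assuming the step collapses each run of spaces, tabs, and newlines into a single space, is shown below. The function name `collapse_whitespace` is hypothetical, and the sketch works on a plain string rather than a dataframe column to stay dependency-free.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with one space and trim both ends.

    Hypothetical helper: this is an assumed reading of what a
    whitespace-merging step might do, not the block's actual code.
    """
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(collapse_whitespace("First  line.\n\n\tSecond   line. "))
```

On a dataframe, the same function could be applied to each value of the text column.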

tokenize_sentences(data)

Tokenize the sentences in the data.

Parameters: data (DataFrame) – a dataframe with a filtered_text column.
Return type: DataFrame
Returns: a dataframe where the filtered_text column is a list of sentences.
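To illustrate the string-to-list conversion described above, here is a rough sketch that splits on whitespace following sentence-ending punctuation. It is an assumption for illustration only: the page does not name the tokenizer, the actual block may rely on a dedicated NLP library, and `split_sentences` is a hypothetical name. A regex like this will mis-split abbreviations such as "e.g. this".

```python
import re

# Split at whitespace that directly follows '.', '!' or '?'.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the sentences of `text` as a list of strings.

    Hypothetical, naive splitter: it does not handle abbreviations,
    decimals, or quoted speech.
    """
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


if __name__ == "__main__":
    print(split_sentences("It rains. Is it cold? Yes!"))
```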

adjust_summary_size(num_sentences)

Dynamically adjust the number of sentences for the summary.
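The sizing rule itself is not documented here. One plausible heuristic, given purely as an assumption, keeps a fixed fraction of the input sentences and clamps the result between a lower and an upper bound. The name `summary_size` and the parameters `ratio`, `minimum`, and `maximum` are all hypothetical.

```python
import math


def summary_size(num_sentences: int, ratio: float = 0.2,
                 minimum: int = 1, maximum: int = 10) -> int:
    """Pick a summary length that grows with the input but stays bounded.

    Hypothetical rule: the default ratio and bounds are illustrative
    values, not taken from the documented block.
    """
    if num_sentences <= 0:
        return 0
    target = math.ceil(num_sentences * ratio)
    bounded = max(minimum, min(maximum, target))
    # Never ask for more sentences than the text contains.
    return min(num_sentences, bounded)


if __name__ == "__main__":
    for n in (0, 3, 23, 200):
        print(n, summary_size(n))
```

A clamp like this keeps short texts from producing empty summaries and long texts from producing unwieldy ones.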

extract_important_sentences(data)

Extract the important sentences from the data.

Parameters: data (DataFrame) – a dataframe with a filtered_text column, which is a list of sentences.
Return type: DataFrame
Returns: a dataframe with the most important sentences in the summary column.
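The page does not state how importance is measured. As a sketch under that caveat, the example below uses a simple word-frequency score: each sentence is rated by the average document-wide frequency of its words, and the top `k` sentences are returned in their original order. This is a swapped-in technique for illustration, not necessarily the block's method, and `top_sentences` is a hypothetical name.

```python
import re
from collections import Counter


def top_sentences(sentences: list[str], k: int) -> list[str]:
    """Return the k highest-scoring sentences, kept in document order.

    Hypothetical scoring: mean word frequency across all sentences,
    with no stop-word removal or stemming.
    """
    tokenized = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]
    freq = Counter(word for words in tokenized for word in words)

    def score(words: list[str]) -> float:
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score; the stable sort keeps earlier
    # sentences first when scores tie.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = [
        "Cats chase mice.",
        "Dogs bark.",
        "Cats and mice share the barn with cats.",
    ]
    print(top_sentences(doc, 2))
```

Restoring document order after ranking keeps the summary readable, since the selected sentences appear in the sequence the author wrote them.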