src.preprocessing.extract_important_sentences
A TransformationBlock that extracts the most important sentences from the data.
Classes
ExtractImportantSentences | A TransformationBlock that extracts the most important sentences from the data.
- class src.preprocessing.extract_important_sentences.ExtractImportantSentences
A TransformationBlock that extracts the most important sentences from the data.
Expects a dataframe with a full_text column and returns the most important sentences in a summary column.
- custom_transform(data, **transform_args)
Extract the most important sentences from the data.
- Parameters:
  - data (DataFrame) – A pandas dataframe with a full_text column.
  - transform_args (Never) – [UNUSED] Additional keyword arguments.
- Return type: DataFrame
- Returns: A dataframe with the most important sentences in a summary column.
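The documentation does not say how sentence importance is scored. The following is a minimal sketch of the `custom_transform` contract, assuming a simple word-frequency heuristic (a basic extractive-summarization technique: each sentence is scored by how often its words occur in the whole text). The helper names `score_sentences`, `extract_important_sentences` and `custom_transform_sketch`, the `top_n` parameter, the regex sentence splitter and joining the kept sentences into one string are all hypothetical and not taken from the project.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical helpers: the project's actual scoring method is not documented.
_WORD_RE = re.compile(r"[a-z']+")
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def score_sentences(sentences: list[str]) -> list[float]:
    """Score each sentence by the mean frequency (within the text) of its words."""
    words_per_sentence = [_WORD_RE.findall(s.lower()) for s in sentences]
    freq = Counter(w for words in words_per_sentence for w in words)
    return [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in words_per_sentence
    ]


def extract_important_sentences(text: str, top_n: int = 2) -> str:
    """Keep the top_n highest-scoring sentences, in their original order."""
    sentences = [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:top_n])  # restore document order
    return " ".join(sentences[i] for i in keep)


def custom_transform_sketch(data: pd.DataFrame, top_n: int = 2) -> pd.DataFrame:
    """Return a copy of data with a summary column derived from full_text."""
    data = data.copy()
    data["summary"] = data["full_text"].apply(extract_important_sentences, top_n=top_n)
    return data


df = pd.DataFrame({"full_text": ["Cats are great. Dogs are great too. Fish swim."]})
print(custom_transform_sketch(df)["summary"].iloc[0])
# Cats are great. Dogs are great too.
```

Keeping the selected sentences in document order usually reads more naturally than ordering them by score. A practical scorer would likely also ignore stopwords, since frequent function words such as "the" otherwise dominate the scores.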
- merge_whitespace(data)
Merge consecutive whitespace (runs of newlines and spaces) in the full_text column.
- Parameters: data (DataFrame) – A dataframe with a full_text column.
- Return type: DataFrame
- Returns: The dataframe with merged newlines and spaces.
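A minimal pandas sketch of what `merge_whitespace` might do. It assumes that "merging" means collapsing any whitespace run containing a newline into a single newline and any run of spaces or tabs into a single space, and that the result overwrites `full_text`. Both choices, and the name `merge_whitespace_sketch`, are assumptions for illustration only.

```python
import pandas as pd


def merge_whitespace_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of data with whitespace runs in full_text collapsed.

    Assumption: runs containing a newline become one newline, runs of
    spaces/tabs become one space, and the result overwrites full_text.
    """
    data = data.copy()
    text = data["full_text"]
    # Any whitespace run that contains at least one newline -> a single "\n".
    text = text.str.replace(r"\s*\n\s*", "\n", regex=True)
    # Remaining runs of spaces or tabs -> a single space.
    text = text.str.replace(r"[ \t]+", " ", regex=True)
    data["full_text"] = text.str.strip()
    return data


df = pd.DataFrame({"full_text": ["Hello   world.\n\n\n  Next\tline.  "]})
print(repr(merge_whitespace_sketch(df)["full_text"].iloc[0]))
# 'Hello world.\nNext line.'
```

Handling newline runs before space runs keeps paragraph breaks while still removing the stray spaces that often surround them.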
- tokenize_sentences(data)
Split the text in the filtered_text column into sentences.
- Parameters: data (DataFrame) – A dataframe with a filtered_text column.
- Return type: DataFrame
- Returns: A dataframe where the filtered_text column is a list of sentences.
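The sentence splitter used by `tokenize_sentences` is not documented. The sketch below assumes a naive regular-expression rule (split after `.`, `!` or `?` followed by whitespace) and stores the resulting list back in `filtered_text`, matching the documented return value. `split_sentences` and `tokenize_sentences_sketch` are hypothetical names.

```python
import re

import pandas as pd

# Naive rule (an assumption): a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty pieces."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize_sentences_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of data whose filtered_text column holds lists of sentences."""
    data = data.copy()
    data["filtered_text"] = data["filtered_text"].apply(split_sentences)
    return data


df = pd.DataFrame({"filtered_text": ["First one. Second one!\nThird one? Done"]})
print(tokenize_sentences_sketch(df)["filtered_text"].iloc[0])
# ['First one.', 'Second one!', 'Third one?', 'Done']
```

This rule mis-splits abbreviations such as "e.g." or "Dr."; a trained tokenizer like NLTK's `sent_tokenize` handles such cases better, though nothing here confirms which tokenizer the project actually uses.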