src.preprocessing.create_events

Cluster the documents based on time and event similarity.

Classes

CreateEvents([period]) – Create a narrative graph from the clusters and the memberships.

class src.preprocessing.create_events.CreateEvents(period=4)

Create a narrative graph from the clusters and the memberships.

find_most_similar(candidates, target)

Find the candidate most similar to the target within the time period around the target.

Parameters:

candidates (ndarray[Any, dtype[float64]]) – the embeddings of the candidates.
target (ndarray[Any, dtype[float64]]) – the embedding of the discarded document.

Return type:

int

Returns:

the index of the most similar candidate.
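
As an illustration of the selection step described above, here is a minimal sketch. It assumes cosine similarity as the measure, which this page does not confirm, and it assumes that the candidates have already been restricted to the relevant time period. The name find_most_similar_sketch is hypothetical and is not part of the module.

```python
import numpy as np


def find_most_similar_sketch(candidates: np.ndarray, target: np.ndarray) -> int:
    """Return the row index of the candidate closest to ``target``.

    Hypothetical stand-in: cosine similarity is an assumption, since the
    measure used by CreateEvents.find_most_similar is not documented here.
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Product of norms, clamped so zero vectors do not cause a division by zero.
    denom = np.maximum(
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(target), 1e-12
    )
    scores = candidates @ target / denom
    return int(np.argmax(scores))


candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
target = np.array([0.9, 1.0])
best = find_most_similar_sketch(candidates, target)
```

Here the third candidate points in nearly the same direction as the target, so its index is the one selected.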

modify_adjacency(data)

Transform the adjacency list with the clusters.

Parameters:

data (DataFrame) – pandas dataframe with an “adj_list” column.

Return type:

DataFrame

Returns:

The data with the modified adjacency list.
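
One plausible reading of this transformation is sketched below: each neighbouring document in "adj_list" is replaced by the cluster it belongs to. The "cluster" column, the use of index labels as neighbour references, and the removal of self-links are all assumptions made for illustration. Only the "adj_list" column is stated on this page, and modify_adjacency_sketch is a hypothetical name.

```python
import pandas as pd


def modify_adjacency_sketch(data: pd.DataFrame) -> pd.DataFrame:
    # Assumed layout: "adj_list" holds index labels of neighbouring documents,
    # and a hypothetical "cluster" column holds each document's cluster id.
    doc_to_cluster = data["cluster"].to_dict()

    def remap(neighbours, own_cluster):
        # Map neighbours to cluster ids, drop links back to the document's
        # own cluster, and return the remaining ids sorted and de-duplicated.
        return sorted({doc_to_cluster[n] for n in neighbours} - {own_cluster})

    out = data.copy()
    out["adj_list"] = [
        remap(adj, own) for adj, own in zip(data["adj_list"], data["cluster"])
    ]
    return out


docs = pd.DataFrame(
    {"cluster": [0, 0, 1, 2], "adj_list": [[1, 2], [2, 3], [3], []]}
)
result = modify_adjacency_sketch(docs)
```

The input frame is copied rather than modified in place, so the original adjacency lists remain available for comparison.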

custom_transform(data, **transform_args)

Cluster the documents based on the event and the date similarity.

Parameters:

Return type:

DataFrame

Returns:

The transformed data.
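
To make the idea of clustering on both event and date similarity concrete, the following is a rough sketch and not the implementation behind custom_transform. It assumes a "date" column and an "embedding" column, interprets period as a number of days, and uses a hypothetical cosine-similarity threshold. None of these details are confirmed by this page.

```python
import numpy as np
import pandas as pd


def cluster_events_sketch(data, period=4, threshold=0.8):
    # Assumed columns: "date" (datetime64) and "embedding" (float vectors).
    # "period" is read as days, which the documentation does not confirm.
    data = data.sort_values("date").reset_index(drop=True)
    seeds, last_dates, labels = [], [], []
    for date, emb in zip(data["date"], data["embedding"]):
        emb = np.asarray(emb, dtype=np.float64)
        emb = emb / max(np.linalg.norm(emb), 1e-12)
        best, best_score = -1, threshold
        for cid, (seed, last) in enumerate(zip(seeds, last_dates)):
            # Skip clusters whose latest document is too far back in time.
            if (date - last).days > period:
                continue
            score = float(seed @ emb)  # cosine similarity of unit vectors
            if score >= best_score:
                best, best_score = cid, score
        if best == -1:
            # No recent, similar cluster: this document seeds a new one.
            seeds.append(emb)
            last_dates.append(date)
            labels.append(len(seeds) - 1)
        else:
            last_dates[best] = date
            labels.append(best)
    out = data.copy()
    out["cluster"] = labels
    return out


events = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-20"]
        ),
        "embedding": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]],
    }
)
clustered = cluster_events_sketch(events)
```

In this toy input the last document has the same embedding as the first but arrives well outside the period, so it starts a new cluster instead of joining the first one.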