src.preprocessing.create_events

Cluster the documents based on time and event similarity.

Classes

CreateEvents([period])

Create a narrative graph from the clusters and the memberships.

class src.preprocessing.create_events.CreateEvents(period=4)

Create a narrative graph from the clusters and the memberships.

Parameters:

period (int) – size of the time period around the discarded document. Defaults to 4.

period: int = 4

find_most_similar(candidates, target)

Find the candidate most similar to the target within the time period around it.

Parameters:
  • candidates (ndarray[Any, dtype[float64]]) – the embeddings of the candidates.

  • target (ndarray[Any, dtype[float64]]) – the embedding of the discarded document.

Return type:

int

Returns:

the index of the most similar candidate.
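
The lookup described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes that similarity means cosine similarity and that the candidates have already been restricted to the time period around the target. The name most_similar_index is hypothetical.

```python
import numpy as np


def most_similar_index(candidates: np.ndarray, target: np.ndarray) -> int:
    """Return the row index of the candidate closest to target.

    Assumption: closeness is measured with cosine similarity; the
    documented method may use a different measure.
    """
    # Normalise each vector; the small epsilon guards against zero-length rows.
    eps = 1e-12
    cand_unit = candidates / (np.linalg.norm(candidates, axis=1, keepdims=True) + eps)
    target_unit = target / (np.linalg.norm(target) + eps)

    # One dot product per candidate gives its cosine similarity to the target.
    similarities = cand_unit @ target_unit
    return int(np.argmax(similarities))


# Three candidate embeddings (one per row) and one discarded-document embedding.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
target = np.array([0.6, 0.8])
best = most_similar_index(candidates, target)
```

Here best is a plain int, matching the documented return type, and it indexes into the rows of candidates.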

modify_adjacency(data)

Transform the adjacency list with the clusters.

Parameters:

data (DataFrame) – pandas DataFrame with an “adj_list” column.

Return type:

DataFrame

Returns:

The data with the modified adjacency list.
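
One possible reading of this transformation is sketched below. Everything beyond the “adj_list” column is an assumption made for illustration: the “cluster” column, the use of the row index as document id, and the choice to drop duplicates and self-references are all hypothetical, and the actual method may behave differently. The name remap_adjacency is also hypothetical.

```python
import pandas as pd


def remap_adjacency(data: pd.DataFrame) -> pd.DataFrame:
    """Rewrite each adjacency list in terms of cluster labels (illustrative only)."""
    # Assumption: the index holds document ids and "cluster" holds their labels.
    doc_to_cluster = data["cluster"].to_dict()

    result = data.copy()
    result["adj_list"] = [
        # Map neighbours to clusters, drop duplicates and the row's own cluster.
        sorted({doc_to_cluster[n] for n in neighbours} - {own})
        for neighbours, own in zip(data["adj_list"], data["cluster"])
    ]
    return result


# Four documents: the first two share cluster 0.
docs = pd.DataFrame(
    {"cluster": [0, 0, 1, 2], "adj_list": [[1, 2], [0, 3], [0], [1, 2]]}
)
remapped = remap_adjacency(docs)
```

The sketch returns a copy, so the input frame is left unchanged.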

custom_transform(data, **transform_args)

Cluster the documents based on event and date similarity.

Parameters:
  • data (DataFrame) – The data to transform.

  • transform_args (Never) – [UNUSED] Additional keyword arguments.

Return type:

DataFrame

Returns:

The transformed data.