Undocumented
Function | calculate | Calculate the cosine similarity between articles in a DataFrame using the sklearn library. |
Function | process | Process similarity pairs based on the given dataframe corpus and similarity dataframe. |
Calculate the cosine similarity between articles in a DataFrame using the sklearn library.

Args:
    df (pd.DataFrame): DataFrame containing articles with 'source', 'id', 'title', and 'text' columns.

Returns:
    pd.DataFrame: DataFrame containing the cosine similarity scores between articles.
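The docstring above does not say how articles are turned into vectors or how the returned DataFrame is labeled. The following is a minimal sketch, not the module's implementation. It assumes TF-IDF features built from the concatenated 'title' and 'text' columns and a square result indexed by 'id' on both axes. The function name `cosine_similarity_sketch` and the sample data are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def cosine_similarity_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for calculate(); TF-IDF is an assumption."""
    # Combine title and text into one string per article.
    documents = (df["title"].fillna("") + " " + df["text"].fillna("")).tolist()
    # Vectorize with TF-IDF (assumed; the source only names sklearn).
    tfidf = TfidfVectorizer().fit_transform(documents)
    # Pairwise cosine similarity, shape (n_articles, n_articles).
    scores = cosine_similarity(tfidf)
    # Label rows and columns by article id (assumed layout).
    ids = df["id"].tolist()
    return pd.DataFrame(scores, index=ids, columns=ids)


# Made-up sample articles for illustration only.
articles = pd.DataFrame(
    {
        "source": ["reddit", "arxiv", "arxiv"],
        "id": ["r1", "a1", "a2"],
        "title": ["Transformers explained", "Attention in transformers", "Protein folding"],
        "text": [
            "A gentle introduction to transformers and attention.",
            "We study attention mechanisms in transformers.",
            "We predict protein structure from amino acid sequences.",
        ],
    }
)
similarity = cosine_similarity_sketch(articles)
```

In this sketch each article has similarity 1 with itself, and articles that share no vocabulary score 0, so a caller would usually ignore the diagonal when looking for related articles.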
Process similarity pairs based on the given dataframe corpus and similarity dataframe.

Args:
    df_corpus (pandas.DataFrame): The dataframe corpus containing unique IDs.
    similarity_df (pandas.DataFrame): The similarity dataframe.

Returns:
    dict: A dictionary containing similarity pairs categorized by source and ID.
    The structure of the dictionary is as follows:

    {
        'reddit': {
            'source_id': [
                {
                    'similar_source': 'similar_source',
                    'similar_id': 'similar_id',
                    'similarity': similarity_value
                },
                ...
            ],
            ...
        },
        'arxiv': {
            'source_id': [
                {
                    'similar_source': 'similar_source',
                    'similar_id': 'similar_id',
                    'similarity': similarity_value
                },
                ...
            ],
            ...
        }
    }
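The nested return structure can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: it assumes `similarity_df` is a square matrix indexed by article id on both axes, that `df_corpus` has 'source' and 'id' columns, and that self-pairs are skipped. The `threshold` parameter and the name `process_sketch` are hypothetical; the docstring does not mention any cutoff.

```python
import pandas as pd


def process_sketch(df_corpus, similarity_df, threshold=0.5):
    """Hypothetical stand-in for process(); threshold is an assumption."""
    # Map each unique article id to its source ('reddit', 'arxiv', ...).
    source_by_id = dict(zip(df_corpus["id"], df_corpus["source"]))
    pairs = {}
    for source_id, row in similarity_df.iterrows():
        # Keep every other article whose score reaches the threshold.
        matches = [
            {
                "similar_source": source_by_id[other_id],
                "similar_id": other_id,
                "similarity": float(score),
            }
            for other_id, score in row.items()
            if other_id != source_id and score >= threshold
        ]
        # Group the matches first by source, then by article id.
        pairs.setdefault(source_by_id[source_id], {})[source_id] = matches
    return pairs


# Made-up corpus and similarity scores for illustration only.
corpus = pd.DataFrame(
    {"source": ["reddit", "arxiv", "arxiv"], "id": ["r1", "a1", "a2"]}
)
ids = corpus["id"].tolist()
similarity_df = pd.DataFrame(
    [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]],
    index=ids,
    columns=ids,
)
pairs = process_sketch(corpus, similarity_df, threshold=0.5)
```

With these sample scores, 'r1' and 'a1' list each other as matches, while 'a2' maps to an empty list because none of its scores reach the assumed threshold.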