Utils
- stanscofi.utils.compute_sparsity(df)
Computes the sparsity number of a collaborative filtering dataset
…
Parameters
- df : pandas.DataFrame of shape (n_items, n_users)
the matrix of ratings where unknown matchings are denoted with 0
Returns
- sparsity : float
the percentage of non-missing values in the matrix of ratings
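As an illustration of the quantity described above, here is a minimal sketch that counts the nonzero entries of a toy ratings matrix. The helper name `sparsity_sketch` is hypothetical, and the 0-100 scale is an assumption: the description does not say whether the library returns a fraction or a value out of 100.

```python
import pandas as pd


def sparsity_sketch(df: pd.DataFrame) -> float:
    """Percentage of known (nonzero) entries in a ratings matrix.

    Hypothetical helper, not the library's implementation; the 0-100
    scale is an assumption.
    """
    n_known = int((df.values != 0).sum())  # entries different from 0
    return 100.0 * n_known / df.size       # df.size = n_items * n_users


# Toy matrix of shape (n_items, n_users) with 0 for unknown matchings
toy_matrix = pd.DataFrame(
    [[1, 0, -1],
     [0, 0, 1]],
    index=["drug_a", "drug_b"],
    columns=["disease_x", "disease_y", "disease_z"],
)
toy_sparsity = sparsity_sketch(toy_matrix)  # 3 known entries out of 6
```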
- stanscofi.utils.load_dataset(model_name, save_folder='./', sep_feature='-')
Loads a drug repurposing dataset
…
Parameters
- model_name : str
the name of the dataset to load. Should belong to the following list: [“Gottlieb”, “DNdataset”, “Cdataset”, “LRSSL”, “PREDICT_Gottlieb”, “TRANSCRIPT”, “PREDICT”]
- save_folder : str
the path to the folder where dataset-related files are or will be stored
Returns
- dataset_di : dictionary
a dictionary where key “ratings” contains the drug-disease matching pandas.DataFrame of shape (n_drugs, n_diseases) (where missing values are denoted by 0), key “users” contains the disease feature pandas.DataFrame of shape (n_disease_features, n_diseases), and key “items” contains the drug feature pandas.DataFrame of shape (n_drug_features, n_drugs)
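The shape conventions of the returned dictionary can be checked with a few lines of code. The sketch below is only an illustration on toy data: `check_dataset_shapes` is a hypothetical helper, not part of stanscofi, and the toy frames stand in for what `load_dataset` would return.

```python
import pandas as pd


def check_dataset_shapes(dataset_di: dict) -> bool:
    """Check that the three frames agree on n_drugs and n_diseases.

    Hypothetical helper written for illustration only.
    """
    ratings = dataset_di["ratings"]  # (n_drugs, n_diseases)
    users = dataset_di["users"]      # (n_disease_features, n_diseases)
    items = dataset_di["items"]      # (n_drug_features, n_drugs)
    same_drugs = ratings.shape[0] == items.shape[1]
    same_diseases = ratings.shape[1] == users.shape[1]
    return bool(same_drugs and same_diseases)


drugs = ["drug_a", "drug_b"]
diseases = ["disease_x", "disease_y", "disease_z"]

# Toy stand-in for a loaded dataset: 2 drugs, 3 diseases
toy_dataset = {
    "ratings": pd.DataFrame(0, index=drugs, columns=diseases),
    "users": pd.DataFrame(0.0, index=["f1", "f2", "f3", "f4"], columns=diseases),
    "items": pd.DataFrame(0.0, index=["g1", "g2", "g3", "g4", "g5"], columns=drugs),
}
shapes_ok = check_dataset_shapes(toy_dataset)
```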
- stanscofi.utils.matrix2ratings(df, user_col='user', item_col='item', rating_col='rating')
Converts a matrix into a list of ratings
…
Parameters
- df : pandas.DataFrame of shape (n_items, n_users)
the matrix of ratings in {-1, 1, 0} where unknown matchings are denoted with 0
- user_col : str
column denoting users
- item_col : str
column denoting items
- rating_col : str
column denoting ratings in {-1, 0, 1}
Returns
- ratings : pandas.DataFrame of shape (n_ratings, 3)
the list of known ratings where the first column corresponds to users, the second to items, and the third to ratings
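To make the conversion concrete, the following sketch turns a small matrix into a list of known ratings with the column order described above (users, items, ratings). It is a plain pandas illustration under the name `matrix2ratings_sketch`, which is hypothetical; the library's own implementation and row ordering may differ.

```python
import pandas as pd


def matrix2ratings_sketch(df, user_col="user", item_col="item", rating_col="rating"):
    """List the nonzero entries of an (n_items, n_users) matrix.

    Hypothetical helper for illustration; rows are items, columns are users.
    """
    rows = [
        (user, item, int(df.loc[item, user]))
        for item in df.index
        for user in df.columns
        if df.loc[item, user] != 0  # 0 denotes an unknown matching
    ]
    return pd.DataFrame(rows, columns=[user_col, item_col, rating_col])


toy_matrix = pd.DataFrame(
    [[1, 0],
     [-1, 1]],
    index=["drug_a", "drug_b"],
    columns=["disease_x", "disease_y"],
)
toy_ratings = matrix2ratings_sketch(toy_matrix)
```

The unknown pair (drug_a, disease_y) is dropped, so the toy list has three rows.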
- stanscofi.utils.merge_ratings(rating_dfs, user_col, item_col, rating_col)
Merges rating lists from several sources and resolves conflicting ratings as follows: if at least one negative rating (-1) is reported for a (drug, disease) pair, the final rating is negative (-1); if at least one positive rating (1) and no negative rating (-1) are reported, the final rating is positive (1)
…
Parameters
- rating_dfs : list of pandas.DataFrame of shape (n_ratings, 3)
the list of rating lists where one column (of name user_col) is associated with users, one column (of name item_col) is associated with items, and one column (of name rating_col) is associated with ratings in {-1, 0, 1}
- user_col : str
column denoting users
- item_col : str
column denoting items
- rating_col : str
column denoting ratings in {-1, 0, 1}
- verbose : bool
Returns
- rating_df : pandas.DataFrame of shape (n_ratings, 3)
the merged list of ratings where one column (of name user_col) is associated with users, one column (of name item_col) is associated with items, and one column (of name rating_col) is associated with ratings in {-1, 0, 1}
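The conflict rule stated above can be sketched with a pandas group-by. This is an illustration only, not the library's code: `merge_ratings_sketch` and `resolve` are hypothetical names, and returning 0 when a pair has neither a positive nor a negative rating is an assumption, since the description does not cover that case.

```python
import pandas as pd


def resolve(values: pd.Series) -> int:
    """Apply the stated rule to all ratings reported for one pair."""
    if (values == -1).any():  # any negative rating wins
        return -1
    if (values == 1).any():   # otherwise any positive rating wins
        return 1
    return 0                  # assumption: only unknowns were reported


def merge_ratings_sketch(rating_dfs, user_col, item_col, rating_col):
    """Stack the rating lists and resolve each (user, item) pair."""
    stacked = pd.concat(rating_dfs, ignore_index=True)
    merged = stacked.groupby([user_col, item_col])[rating_col].agg(resolve)
    return merged.reset_index()


source_1 = pd.DataFrame({"user": ["d1", "d2"], "item": ["i1", "i1"], "rating": [1, 1]})
source_2 = pd.DataFrame({"user": ["d1", "d2"], "item": ["i1", "i1"], "rating": [-1, 0]})
toy_merged = merge_ratings_sketch([source_1, source_2], "user", "item", "rating")
```

In the toy data, the pair (d1, i1) has conflicting ratings 1 and -1 and resolves to -1, while (d2, i1) has ratings 1 and 0 and resolves to 1.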
- stanscofi.utils.print_dataset(ratings, user_col, item_col, rating_col)
Prints values of a drug repurposing dataset
…
Parameters
- ratings : pandas.DataFrame of shape (n_ratings, 3)
the list of ratings with columns user_col, item_col, rating_col
- user_col : str
column denoting users
- item_col : str
column denoting items
- rating_col : str
column denoting ratings in {-1, 0, 1}
Returns
None
Prints
The number of items/drugs, users/diseases, and the number of positive (1), negative (-1) and unknown (0) matchings.
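The printed quantities can be reproduced on a toy rating list as follows. This is a sketch under assumptions: `summarize_ratings` is a hypothetical helper that returns the counts instead of printing them, and it counts as unknown only the rows whose rating is explicitly 0, which may not match how the library counts them.

```python
import pandas as pd


def summarize_ratings(ratings, user_col, item_col, rating_col):
    """Count items, users, and ratings of each kind in a rating list.

    Hypothetical helper for illustration only.
    """
    values = ratings[rating_col].astype(int)
    return {
        "n_items": int(ratings[item_col].nunique()),
        "n_users": int(ratings[user_col].nunique()),
        "n_positive": int((values == 1).sum()),
        "n_negative": int((values == -1).sum()),
        "n_unknown": int((values == 0).sum()),  # assumption: explicit 0 rows only
    }


toy_ratings = pd.DataFrame({
    "user": ["disease_x", "disease_x", "disease_y", "disease_y"],
    "item": ["drug_a", "drug_b", "drug_b", "drug_a"],
    "rating": [1, -1, 1, 0],
})
toy_summary = summarize_ratings(toy_ratings, "user", "item", "rating")
```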
- stanscofi.utils.ratings2matrix(ratings, user_col, item_col, rating_col)
Converts a list of ratings into a matrix
…
Parameters
- ratings : pandas.DataFrame of shape (n_ratings, 3)
the list of known ratings where the first column (user_col) corresponds to users, the second (item_col) to items, and the third (rating_col) to ratings in {-1, 0, 1}
- user_col : str
column denoting users
- item_col : str
column denoting items
- rating_col : str
column denoting ratings in {-1, 0, 1}
Returns
- df : pandas.DataFrame of shape (n_items, n_users)
the matrix of ratings in {-1, 1, 0} where unknown matchings are denoted with 0
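The reverse conversion can be sketched with a pandas pivot, filling pairs that are absent from the list with 0. The name `ratings2matrix_sketch` is hypothetical and this is not the library's implementation; in particular, the sketch assumes each (user, item) pair appears at most once, since `pivot` raises an error on duplicates.

```python
import pandas as pd


def ratings2matrix_sketch(ratings, user_col, item_col, rating_col):
    """Build an (n_items, n_users) matrix from a list of ratings.

    Hypothetical helper; assumes unique (user, item) pairs.
    """
    matrix = ratings.pivot(index=item_col, columns=user_col, values=rating_col)
    return matrix.fillna(0).astype(int)  # missing pairs become 0 (unknown)


toy_ratings = pd.DataFrame({
    "user": ["disease_x", "disease_x", "disease_y"],
    "item": ["drug_a", "drug_b", "drug_b"],
    "rating": [1, -1, 1],
})
toy_matrix = ratings2matrix_sketch(toy_ratings, "user", "item", "rating")
```

The pair (drug_a, disease_y) is not in the toy list, so its entry in the resulting matrix is 0.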