Utils

stanscofi.utils.compute_sparsity(df)

Computes the sparsity number of a collaborative filtering dataset

Parameters

df : pandas.DataFrame of shape (n_items, n_users)

the matrix of ratings where unknown matchings are denoted with 0

Returns

sparsity : float

the percentage of non-missing (non-zero) values in the matrix of ratings
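As a rough illustration of the quantity described above, the following sketch counts non-zero entries with plain pandas and NumPy. It is not the library's implementation: the helper name `sparsity_sketch` is hypothetical, and reporting the result on a 0 to 100 scale is an assumption based on the word "percentage".

```python
import numpy as np
import pandas as pd


def sparsity_sketch(df: pd.DataFrame) -> float:
    """Percentage of entries that are non-zero, i.e. known ratings (hypothetical helper)."""
    n_known = int(np.count_nonzero(df.to_numpy()))
    # Assumption: the result is expressed on a 0-100 scale.
    return 100.0 * n_known / df.size


# Toy matrix of shape (n_items, n_users); 0 marks an unknown matching.
ratings = pd.DataFrame(
    [[1, 0, -1],
     [0, 0, 1]],
    index=["item_a", "item_b"],
    columns=["user_1", "user_2", "user_3"],
)
value = sparsity_sketch(ratings)  # 3 known entries out of 6
```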

stanscofi.utils.load_dataset(model_name, save_folder='./', sep_feature='-')

Loads a drug repurposing dataset

Parameters

model_name : str

the name of the dataset to load. Should belong to the following list: [“Gottlieb”, “DNdataset”, “Cdataset”, “LRSSL”, “PREDICT_Gottlieb”, “TRANSCRIPT”, “PREDICT”]

save_folder : str

the path to the folder where dataset-related files are or will be stored

Returns

dataset_di : dictionary

a dictionary where key “ratings” contains the drug-disease matching pandas.DataFrame of shape (n_drugs, n_diseases) (where missing values are denoted by 0), key “users” contains the disease feature pandas.DataFrame of shape (n_disease_features, n_diseases), and key “items” contains the drug feature pandas.DataFrame of shape (n_drug_features, n_drugs)
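To make the layout of the returned dictionary concrete, here is a small sketch that builds a toy dictionary with the same three keys and checks that the shapes agree. It does not call `load_dataset`; every label and value is made up, and the helper `shapes_are_consistent` is hypothetical.

```python
import pandas as pd

# Toy stand-in for the dictionary described above; all labels and values are invented.
drugs = ["drug_a", "drug_b", "drug_c"]
diseases = ["disease_x", "disease_y"]

toy_dataset = {
    # (n_drugs, n_diseases), with 0 for unknown matchings
    "ratings": pd.DataFrame([[1, 0], [0, -1], [0, 1]], index=drugs, columns=diseases),
    # (n_disease_features, n_diseases)
    "users": pd.DataFrame(
        [[0.1, 0.7], [0.3, 0.2]],
        index=["disease_feat_1", "disease_feat_2"],
        columns=diseases,
    ),
    # (n_drug_features, n_drugs)
    "items": pd.DataFrame([[1.0, 0.0, 0.5]], index=["drug_feat_1"], columns=drugs),
}


def shapes_are_consistent(dataset_di):
    """Check that the feature matrices have one column per disease and per drug."""
    n_drugs, n_diseases = dataset_di["ratings"].shape
    return (
        dataset_di["users"].shape[1] == n_diseases
        and dataset_di["items"].shape[1] == n_drugs
    )


consistent = shapes_are_consistent(toy_dataset)
```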

stanscofi.utils.matrix2ratings(df, user_col='user', item_col='item', rating_col='rating')

Converts a matrix into a list of ratings

Parameters

df : pandas.DataFrame of shape (n_items, n_users)

the matrix of ratings in {-1, 1, 0} where unknown matchings are denoted with 0

user_col : str

column denoting users

item_col : str

column denoting items

rating_col : str

column denoting ratings in {-1, 0, 1}

Returns

ratings : pandas.DataFrame of shape (n_ratings, 3)

the list of known ratings, where the first column corresponds to users, the second to items, and the third to ratings
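The conversion described above can be approximated with `DataFrame.melt`. This is a sketch under stated assumptions, not the library's code: the helper `matrix2ratings_sketch` is hypothetical, and it assumes that "known ratings" means the non-zero entries of the matrix.

```python
import pandas as pd


def matrix2ratings_sketch(df, user_col="user", item_col="item", rating_col="rating"):
    """Turn an (n_items, n_users) matrix into a long list of known ratings (hypothetical helper)."""
    # Rows are items and columns are users, so the index becomes the item column.
    long = (
        df.rename_axis(index=item_col)
        .reset_index()
        .melt(id_vars=item_col, var_name=user_col, value_name=rating_col)
    )
    # Assumption: zeros are unknown matchings and are dropped from the list.
    known = long[long[rating_col] != 0]
    # Column order follows the description: users, items, ratings.
    return known[[user_col, item_col, rating_col]].reset_index(drop=True)


matrix = pd.DataFrame(
    [[1, 0, -1],
     [0, 0, 1]],
    index=["item_a", "item_b"],
    columns=["user_1", "user_2", "user_3"],
)
ratings_list = matrix2ratings_sketch(matrix)
```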

stanscofi.utils.merge_ratings(rating_dfs, user_col, item_col, rating_col)

Merges rating lists from several sources by solving conflicts. Conflicting ratings are resolved as follows: if there is at least one negative rating (-1) reported for a (drug, disease) pair, then the final rating is negative (-1); if there is at least one positive rating (1) and no negative rating (-1) reported, then the final rating is positive (1)

Parameters

rating_dfs : list of pandas.DataFrame of shape (n_ratings, 3)

the list of rating lists where one column (of name user_col) is associated with users, one column (of name item_col) is associated with items, and one column (of name rating_col) is associated with ratings in {-1, 0, 1}

user_col : str

column denoting users

item_col : str

column denoting items

rating_col : str

column denoting ratings in {-1, 0, 1}

verbose : bool

Returns

rating_df : pandas.DataFrame of shape (n_ratings, 3)

the merged list of ratings, where one column (of name user_col) is associated with users, one column (of name item_col) is associated with items, and one column (of name rating_col) is associated with ratings in {-1, 0, 1}
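The conflict rule stated for this function (any -1 wins, otherwise any 1 wins) can be expressed with a group-wise minimum and maximum. The sketch below only illustrates that rule with plain pandas; `merge_ratings_sketch` is a hypothetical helper and makes no claim about how the library implements the merge. Pairs for which only 0 is reported are assumed to stay at 0.

```python
import numpy as np
import pandas as pd


def merge_ratings_sketch(rating_dfs, user_col, item_col, rating_col):
    """Merge rating lists with the rule: any -1 gives -1, else any 1 gives 1 (hypothetical helper)."""
    stacked = pd.concat(rating_dfs, ignore_index=True)
    grouped = stacked.groupby([user_col, item_col])[rating_col].agg(["min", "max"])
    # min < 0 means at least one negative rating; otherwise max is 1 if any
    # positive rating exists, and 0 if every source reported 0.
    merged = np.where(grouped["min"] < 0, -1, grouped["max"])
    out = grouped.index.to_frame(index=False)
    out[rating_col] = merged
    return out


source_a = pd.DataFrame(
    {"user": ["dis_1", "dis_2"], "item": ["drug_a", "drug_a"], "rating": [1, 0]}
)
source_b = pd.DataFrame(
    {
        "user": ["dis_1", "dis_2", "dis_2"],
        "item": ["drug_a", "drug_a", "drug_b"],
        "rating": [-1, 1, 1],
    }
)
merged_df = merge_ratings_sketch([source_a, source_b], "user", "item", "rating")
merged_lookup = {(u, i): r for u, i, r in merged_df.itertuples(index=False, name=None)}
```

In this toy input, the pair (dis_1, drug_a) has conflicting reports (1 and -1) and ends up negative, while (dis_2, drug_a) is reported as 0 and 1 and ends up positive.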

stanscofi.utils.print_dataset(ratings, user_col, item_col, rating_col)

Prints values of a drug repurposing dataset

Parameters

ratings : pandas.DataFrame of shape (n_ratings, 3)

the list of ratings with columns user_col, item_col, rating_col

user_col : str

column denoting users

item_col : str

column denoting items

rating_col : str

column denoting ratings in {-1, 0, 1}

Returns

None

Prints

The number of items/drugs, users/diseases, and the number of positive (1), negative (-1) and unknown (0) matchings.
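The summary described above can be reproduced approximately with a few pandas calls. This is a sketch, not the library's output format: `summarize_ratings_sketch` is a hypothetical helper, and counting as unknown every (item, user) pair without a -1 or 1 rating is an assumption that also relies on each pair appearing at most once in the list.

```python
import pandas as pd


def summarize_ratings_sketch(ratings, user_col, item_col, rating_col):
    """Count items, users, and positive/negative/unknown matchings (hypothetical helper)."""
    n_items = int(ratings[item_col].nunique())
    n_users = int(ratings[user_col].nunique())
    n_pos = int((ratings[rating_col] == 1).sum())
    n_neg = int((ratings[rating_col] == -1).sum())
    # Assumption: every (item, user) pair without a -1 or 1 rating counts as unknown.
    n_unknown = n_items * n_users - n_pos - n_neg
    summary = {
        "items": n_items,
        "users": n_users,
        "positive": n_pos,
        "negative": n_neg,
        "unknown": n_unknown,
    }
    print(summary)
    return summary


toy_ratings = pd.DataFrame(
    {
        "user": ["dis_1", "dis_2", "dis_2"],
        "item": ["drug_a", "drug_a", "drug_b"],
        "rating": [1, -1, 1],
    }
)
summary = summarize_ratings_sketch(toy_ratings, "user", "item", "rating")
```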

stanscofi.utils.ratings2matrix(ratings, user_col, item_col, rating_col)

Converts a list of ratings into a matrix

Parameters

ratings : pandas.DataFrame of shape (n_ratings, 3)

the list of known ratings, where the first column (user_col) corresponds to users, the second (item_col) to items, and the third (rating_col) to ratings in {-1, 0, 1}

user_col : str

column denoting users

item_col : str

column denoting items

rating_col : str

column denoting ratings in {-1, 0, 1}

Returns

df : pandas.DataFrame of shape (n_items, n_users)

the matrix of ratings in {-1, 1, 0} where unknown matchings are denoted with 0
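The reverse conversion can be sketched with `DataFrame.pivot`. This is an illustration under stated assumptions rather than the library's implementation: `ratings2matrix_sketch` is a hypothetical helper, each (item, user) pair is assumed to appear at most once (`pivot` raises an error on duplicates), and only items and users that occur in the list get a row or column.

```python
import pandas as pd


def ratings2matrix_sketch(ratings, user_col, item_col, rating_col):
    """Turn a long list of ratings into an (n_items, n_users) matrix (hypothetical helper)."""
    # Items become rows and users become columns, matching the documented shape.
    matrix = ratings.pivot(index=item_col, columns=user_col, values=rating_col)
    # Pairs missing from the list are unknown matchings, denoted with 0.
    return matrix.fillna(0).astype(int)


toy_ratings = pd.DataFrame(
    {
        "user": ["user_1", "user_3", "user_3"],
        "item": ["item_a", "item_a", "item_b"],
        "rating": [1, -1, 1],
    }
)
matrix = ratings2matrix_sketch(toy_ratings, "user", "item", "rating")
```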