Models

class stanscofi.models.BasicModel(params)

Bases: object

A class used to encode a drug repurposing model

…

Parameters

params (dict): dictionary which contains method-wise parameters

Attributes

name (str): the name of the model
model (depends on the implemented method): may contain an instance of an sklearn classifier class

…: other attributes might be present depending on the type of model

Methods

__init__(params): Initializes the model with preselected parameters
fit(train_dataset, seed=1234): Preprocesses and fits the model
predict_proba(test_dataset): Outputs properly formatted predictions of the fitted model on test_dataset
predict(scores, threshold=0.5): Applies the following decision rule: if score<threshold, then return the negative label, otherwise return the positive label
recommend_k_pairs(dataset, k=1, threshold=None): Outputs the top-k (item, user) candidates (or the candidates whose score is higher than a threshold) in the input dataset
print_scores(scores): Prints out information about scores
print_classification(predictions): Prints out information about predicted labels
preprocessing(dataset, is_training=True) [not implemented in BasicModel]: Preprocesses the input dataset into inputs for self.model_fit, if it exists
model_fit(train_dataset) [not implemented in BasicModel]: Fits the model on train_dataset
model_predict_proba(test_dataset) [not implemented in BasicModel]: Outputs predictions of the fitted model on test_dataset
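
The split between the public methods and the three hooks that BasicModel leaves unimplemented can be pictured with the minimal sketch below. It is not the stanscofi source: SketchModel, MeanScoreModel, and the list-of-triplets "dataset" are hypothetical stand-ins, and the sketch assumes that fit chains preprocessing and model_fit while predict_proba chains preprocessing and model_predict_proba.

```python
class SketchModel:
    """Hypothetical stand-in for a BasicModel-like base class (not stanscofi code)."""

    def __init__(self, params):
        self.name = "SketchModel"
        self.params = params

    # Hooks that a concrete model is expected to override.
    def preprocessing(self, dataset, is_training=True):
        raise NotImplementedError

    def model_fit(self, *args):
        raise NotImplementedError

    def model_predict_proba(self, *args):
        raise NotImplementedError

    # Public methods shared by every subclass.
    def fit(self, train_dataset, seed=1234):
        # seed is kept for signature parity; this toy model is deterministic.
        args = self.preprocessing(train_dataset, is_training=True)
        self.model_fit(*args)

    def predict_proba(self, test_dataset):
        args = self.preprocessing(test_dataset, is_training=False)
        return self.model_predict_proba(*args)


class MeanScoreModel(SketchModel):
    """Toy subclass: scores every pair with the mean training rating."""

    def preprocessing(self, dataset, is_training=True):
        # Here a "dataset" is just a list of (item, user, rating) triplets.
        pairs = [(item, user) for item, user, _ in dataset]
        ratings = [rating for _, _, rating in dataset]
        return (pairs, ratings) if is_training else (pairs,)

    def model_fit(self, pairs, ratings):
        self.mean_ = sum(ratings) / len(ratings)

    def model_predict_proba(self, pairs):
        return {pair: self.mean_ for pair in pairs}


train = [("drug1", "diseaseA", 1), ("drug2", "diseaseA", -1), ("drug1", "diseaseB", 1)]
model = MeanScoreModel(params={})
model.fit(train)
scores = model.predict_proba([("drug2", "diseaseB", 0)])
```

With this layout a new algorithm only has to supply the three hooks; the public entry points stay identical across models.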

fit(train_dataset, seed=1234)

Fitting the model on the training dataset.

Not implemented in the BasicModel class.

…

Parameters

train_dataset (stanscofi.Dataset): training dataset on which the model should fit
seed (int, default: 1234): random seed

model_fit()

Fitting the model on the training dataset.

…

Parameters

……: appropriate inputs to the classifier (vary across algorithms)

model_predict_proba()

Making predictions using the model on the testing dataset.

…

Parameters

……: appropriate inputs to the classifier (vary across algorithms)

…

Returns

scores (array_like of shape (n_items, n_users)): values predicted by the model

predict(scores, threshold=0.5)

Outputs class labels based on the scores, using the following formula: prediction = -1 if (score<threshold) else 1

…

Parameters

scores (COO-array of shape (n_items, n_users)): sparse matrix in COOrdinate format
threshold (float): the threshold of classification into the positive class

Returns

predictions (COO-array of shape (n_items, n_users)): sparse matrix in COOrdinate format with values in {-1,1}
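
The decision rule above can be sketched on a SciPy COO matrix as follows. This is an illustration rather than the library's code: threshold_scores is a hypothetical helper, and the score values are made up.

```python
import numpy as np
from scipy.sparse import coo_matrix


def threshold_scores(scores, threshold=0.5):
    """Label each stored score: -1 if score < threshold, else 1."""
    scores = coo_matrix(scores)
    labels = np.where(scores.data < threshold, -1, 1)
    # Only stored entries receive a label; unstored pairs stay empty.
    return coo_matrix((labels, (scores.row, scores.col)), shape=scores.shape)


# Three scored pairs in a 3 items x 2 users matrix.
scores = coo_matrix(
    (np.array([0.2, 0.5, 0.9]), (np.array([0, 1, 2]), np.array([0, 0, 1]))),
    shape=(3, 2),
)
predictions = threshold_scores(scores)
```

A score equal to the threshold falls in the positive class, since the rule tests score<threshold strictly.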

predict_proba(test_dataset, default_zero_val=1e-31)

Outputs properly formatted scores (not necessarily in [0,1]!) from the fitted model on test_dataset. Internally calls model_predict_proba(), then reformats the scores.

…

Parameters

test_dataset (stanscofi.Dataset): dataset on which predictions should be made

Returns

scores (COO-array of shape (n_items, n_users)): sparse matrix in COOrdinate format, with nonzero values corresponding to predictions on available pairs in the dataset
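
To make the returned format concrete, the sketch below places one predicted value per available (item, user) pair into a COO matrix. The helper scores_to_coo and the numbers are hypothetical; how stanscofi performs the reformatting internally is not shown here.

```python
import numpy as np
from scipy.sparse import coo_matrix


def scores_to_coo(item_idx, user_idx, values, n_items, n_users):
    """Place one predicted value per available (item, user) pair."""
    return coo_matrix(
        (np.asarray(values, dtype=float), (np.asarray(item_idx), np.asarray(user_idx))),
        shape=(n_items, n_users),
    )


# Three available pairs in a dataset with 3 items and 2 users (made-up values,
# deliberately outside [0, 1] to match the remark above).
scores = scores_to_coo([0, 2, 1], [1, 0, 1], [0.8, -0.3, 1.7], n_items=3, n_users=2)
```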

preprocessing(dataset, is_training=True)

Preprocessing step, which converts elements of a dataset (ratings matrix, user feature matrix, item feature matrix) into appropriate inputs to the classifier (e.g., X feature matrix for each (user, item) pair, y response vector).

…

Parameters

dataset (stanscofi.Dataset): dataset to convert
is_training (bool): whether the preprocessing happens prior to training (True) or testing (False)

Returns

……: appropriate inputs to the classifier (vary across algorithms)

print_classification(predictions)

Prints out information about the predicted classes

…

Parameters

predictions (COO-array): sparse matrix in COOrdinate format

print_scores(scores)

Prints out information about the scores

…

Parameters

scores (COO-array): sparse matrix in COOrdinate format

recommend_k_pairs(dataset, k=1, threshold=None)

Outputs the top-k (item, user) candidates (or the candidates whose score is higher than a threshold) in the input dataset

…

Parameters

dataset (stanscofi.Dataset): dataset on which predictions should be made
k (int or None, default: 1): number of pair candidates to return (with ties)
threshold (float or None, default: None): threshold on candidate scores. If k is not None, the k best candidates are returned regardless of the value of threshold

…

Returns

candidates (list of tuples of size 3): list of (item, user, score) candidates (by name as present in the dataset)
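
The interplay between k and threshold can be sketched in plain Python as below. The helper top_candidates is hypothetical, and the exact tie and threshold handling is an assumption drawn from the description above (ties with the k-th score are kept; the threshold only applies when k is None).

```python
def top_candidates(candidates, k=1, threshold=None):
    """Select candidates from (item, user, score) triplets.

    Assumed semantics: if k is not None, keep the k best scores plus any
    ties with the k-th one; otherwise keep scores above threshold.
    """
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    if k is not None:
        if k <= 0 or not ranked:
            return []
        cutoff = ranked[min(k, len(ranked)) - 1][2]
        return [c for c in ranked if c[2] >= cutoff]
    if threshold is None:
        return ranked
    return [c for c in ranked if c[2] > threshold]


candidates = [
    ("drug1", "diseaseA", 0.9),
    ("drug2", "diseaseA", 0.7),
    ("drug3", "diseaseB", 0.7),
    ("drug4", "diseaseB", 0.1),
]
best_two = top_candidates(candidates, k=2)                      # 3 triplets: tie at 0.7
above = top_candidates(candidates, k=None, threshold=0.8)       # only the 0.9 pair
```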

class stanscofi.models.LogisticRegression(params)

Bases: BasicModel

Logistic Regression (calls sklearn.linear_model.LogisticRegression internally). It uses the very same parameters as sklearn.linear_model.LogisticRegression, so please refer to help(sklearn.linear_model.LogisticRegression).

…

Parameters

params (dict): dictionary which contains sklearn.linear_model.LogisticRegression parameters, plus a key called “preprocessing”, which determines which preprocessing function (in stanscofi.preprocessing) should be applied to the data, and a key called “subset”, which gives the maximum number of features to consider in the model (the features kept are the “subset” ones with the highest variance across samples)
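
The effect of the “subset” key can be illustrated with the NumPy sketch below, which keeps the columns with the highest variance. The helper top_variance_features is hypothetical, and the layout (samples as rows, features as columns) is an assumption made for the example, not a statement about stanscofi's internal data layout.

```python
import numpy as np


def top_variance_features(X, subset):
    """Keep at most `subset` columns of X, chosen by decreasing variance."""
    variances = np.var(X, axis=0)
    n_keep = min(subset, X.shape[1])
    # Stable sort on negated variances so ties keep their original order.
    keep = np.sort(np.argsort(-variances, kind="stable")[:n_keep])
    return X[:, keep], keep


X = np.array([
    [1.0, 0.0, 5.0],
    [1.0, 2.0, -5.0],
    [1.0, 4.0, 5.0],
])
# Column 0 is constant (zero variance), so it is the one dropped.
X_small, kept = top_variance_features(X, subset=2)
```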

Attributes

Same as BasicModel class

Methods

Same as BasicModel class

preprocessing(train_dataset): Preprocesses the input dataset into something that is an input to fit
model_fit(train_dataset): Preprocesses and fits the model
model_predict_proba(test_dataset): Outputs predictions of the fitted model on test_dataset

model_fit(X, y)

Fitting the Logistic Regression model on the training dataset.

…

Parameters

X (array-like of shape (n_ratings, n_pair_features)): (user, item) feature matrix (the actual contents of the matrix depend on the parameters “preprocessing” and “subset” given as input)
y (array-like of shape (n_ratings, )): response vector for each (user, item) pair

model_predict_proba(X)

Making predictions using the Logistic Regression model on the testing dataset.

…

Parameters

X (array-like of shape (n_ratings, n_pair_features)): (user, item) feature matrix (the actual contents of the matrix depend on the parameters “preprocessing” and “subset” given as input)
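
Since the class delegates to sklearn.linear_model.LogisticRegression, the fit and predict steps on a pair-feature matrix can be sketched directly with scikit-learn. The data below is randomly generated for illustration, and reading the score from the positive-class column is an assumption, not a documented behavior of stanscofi.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1234)
# Made-up pair features: 40 (user, item) pairs, 3 features each.
X = rng.normal(size=(40, 3))
# Labels in {-1, 1}, driven by the sign of the first feature.
y = np.where(X[:, 0] > 0, 1, -1)

clf = LogisticRegression(C=1.0, max_iter=200)
clf.fit(X, y)

# predict_proba returns one column per class, ordered as in clf.classes_.
positive_col = list(clf.classes_).index(1)
scores = clf.predict_proba(X)[:, positive_col]
```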

preprocessing(dataset, is_training=True)

Preprocessing step, which converts elements of a dataset (ratings matrix, user feature matrix, item feature matrix) into appropriate inputs to the Logistic Regression classifier.

…

Parameters

dataset (stanscofi.Dataset): dataset to convert
is_training (bool): whether the preprocessing happens prior to training (True) or testing (False)

Returns

args: contains

X (array-like of shape (n_ratings, n_pair_features)): (user, item) feature matrix (the actual contents of the matrix depend on the parameters “preprocessing” and “subset” given as input)
y (array-like of shape (n_ratings, )): response vector for each (user, item) pair
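
One simple way to obtain such an (X, y) pair is to concatenate the item and user feature vectors of every known rating, as in the sketch below. This is only one possible preprocessing, not necessarily any of the functions in stanscofi.preprocessing; pair_features is a hypothetical helper, and treating 0 as "unknown rating" is an assumption.

```python
import numpy as np


def pair_features(ratings, item_features, user_features):
    """Build (X, y) from known ratings by concatenating feature vectors.

    ratings: (n_items, n_users) array, with 0 assumed to mean "unknown".
    item_features: (n_items, n_item_features).
    user_features: (n_users, n_user_features).
    """
    item_idx, user_idx = np.nonzero(ratings)
    X = np.hstack([item_features[item_idx], user_features[user_idx]])
    y = ratings[item_idx, user_idx]
    return X, y


ratings = np.array([[1, 0], [-1, 1], [0, 0]])
item_features = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
user_features = np.array([[1.0], [2.0]])
X, y = pair_features(ratings, item_features, user_features)
```

Here X has one row per known rating and n_item_features + n_user_features columns, which plays the role of n_pair_features.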

class stanscofi.models.NMF(params)

Bases: BasicModel

Non-negative Matrix Factorization (calls sklearn.decomposition.NMF internally). It uses the very same parameters as sklearn.decomposition.NMF, so please refer to help(sklearn.decomposition.NMF).

…

Parameters

params (dict): dictionary which contains sklearn.decomposition.NMF parameters

Attributes

Same as BasicModel class

Methods

Same as BasicModel class

preprocessing(train_dataset): Preprocesses the input dataset into something that is an input to fit
model_fit(train_dataset): Preprocesses and fits the model
model_predict_proba(test_dataset): Outputs predictions of the fitted model on test_dataset

model_fit(input)

Fitting the NMF model on the preprocessed training dataset.

…

Parameters

input (array-like of shape (n_samples, n_features)): training data

model_predict_proba(input)

Making predictions using the NMF model on the testing dataset.

…

Parameters

input (array-like of shape (n_samples, n_features)): testing data

…

Returns

result : array-like of shape (n_samples,n_features)
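
Because the class wraps sklearn.decomposition.NMF, a result with the same shape as the input can be obtained by multiplying the two learned factors, as sketched below. Using the reconstruction W @ H as the prediction is an assumption made for illustration; the matrix and hyperparameters are made up.

```python
import numpy as np
from sklearn.decomposition import NMF

# Made-up non-negative matrix of shape (n_samples, n_features).
A = np.array([
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 2.0, 1.0, 1.0],
    [0.0, 1.0, 2.0, 1.0],
])

nmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=1234)
W = nmf.fit_transform(A)   # shape (n_samples, n_components)
H = nmf.components_        # shape (n_components, n_features)
result = W @ H             # low-rank reconstruction, same shape as A
```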

preprocessing(dataset, is_training=True)

Preprocessing step, which converts elements of a dataset (ratings matrix, user feature matrix, item feature matrix) into appropriate inputs to the NMF model.

…

Parameters

dataset (stanscofi.Dataset): dataset to convert
is_training (bool): whether the preprocessing happens prior to training (True) or testing (False)

Returns

args: contains

A (array-like of shape (n_users, n_items)): the transposed association matrix, translated so that all its values are non-negative
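
The transposition and translation described above can be sketched with NumPy as follows. The helper to_nonnegative is hypothetical, and shifting by the minimum value is an assumed reading of "translated"; the ratings matrix is made up.

```python
import numpy as np


def to_nonnegative(ratings):
    """Transpose an (n_items, n_users) matrix and shift it so its minimum is 0."""
    A = np.asarray(ratings, dtype=float).T
    # Shift only if some value is negative; non-negative input is left as is.
    return A - min(A.min(), 0.0)


ratings = np.array([[1, 0], [-1, 1], [0, 0]])   # 3 items x 2 users
A = to_nonnegative(ratings)                      # 2 users x 3 items, values in {0, 1, 2}
```

NMF requires non-negative input, which is why a matrix holding -1 entries has to be shifted before factorization.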