Models

class stanscofi.models.BasicModel(params)

Bases: object

A class used to encode a drug repurposing model

Parameters

params : dict

dictionary which contains method-wise parameters

Attributes

name : str

the name of the model

model : depends on the implemented method

may contain an instance of a class of sklearn classifiers

other attributes might be present depending on the type of model

Methods

__init__(params)

Initializes the model with preselected parameters

fit(train_dataset, seed=1234)

Preprocesses and fits the model

predict_proba(test_dataset)

Outputs properly formatted predictions of the fitted model on test_dataset

predict(scores)

Applies the following decision rule: if score<threshold, then return the negative label, otherwise return the positive label

recommend_k_pairs(dataset, k=1, threshold=None)

Outputs the top-k (item, user) candidates (or the candidates whose score is higher than a threshold) in the input dataset

print_scores(scores)

Prints out information about scores

print_classification(predictions)

Prints out information about predicted labels

preprocessing(train_dataset) [not implemented in BasicModel]

Preprocesses the input dataset into the inputs expected by self.model_fit, if it exists

model_fit(train_dataset) [not implemented in BasicModel]

Fits the model on train_dataset

model_predict_proba(test_dataset) [not implemented in BasicModel]

Outputs predictions of the fitted model on test_dataset
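The split described above, where fit preprocesses the dataset and hands the result to model_fit, can be sketched with a stand-in base class. Everything in this block is hypothetical: SketchModel and MeanScoreModel are not part of stanscofi, and the "dataset" is a plain list of (item, user, rating) tuples rather than a stanscofi.Dataset.

```python
import random


class SketchModel:
    """Hypothetical stand-in for a base model; NOT the stanscofi implementation."""

    def __init__(self, params):
        self.name = "SketchModel"
        self.params = params
        self.model = None

    def fit(self, train_dataset, seed=1234):
        # Fix the seed, preprocess, then delegate to the subclass-specific fit.
        random.seed(seed)
        args = self.preprocessing(train_dataset, is_training=True)
        self.model_fit(*args)

    # The three hooks below are left to subclasses.
    def preprocessing(self, dataset, is_training=True):
        raise NotImplementedError

    def model_fit(self, *args):
        raise NotImplementedError

    def model_predict_proba(self, *args):
        raise NotImplementedError


class MeanScoreModel(SketchModel):
    """Toy subclass: predicts the mean training rating for every pair."""

    def preprocessing(self, dataset, is_training=True):
        # Assumption: dataset is a list of (item, user, rating) tuples.
        return [[rating for _, _, rating in dataset]]

    def model_fit(self, ratings):
        self.model = sum(ratings) / len(ratings)

    def model_predict_proba(self, ratings):
        return [self.model for _ in ratings]


toy_model = MeanScoreModel({})
toy_model.fit([("a", "u", 1), ("b", "u", -1), ("c", "v", 1)])
```

A subclass only has to supply the three hooks; the shared fit logic stays in the base class.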

fit(train_dataset, seed=1234)

Fitting the model on the training dataset.

Not implemented in the BasicModel class.

Parameters

train_dataset : stanscofi.Dataset

training dataset on which the model should fit

seed : int (default: 1234)

random seed

model_fit()

Fitting the model on the training dataset.

<Not implemented in the BasicModel class.>

Parameters

appropriate inputs to the classifier (vary across algorithms)

model_predict_proba()

Making predictions using the model on the testing dataset.

<Not implemented in the BasicModel class.>

Parameters

appropriate inputs to the classifier (vary across algorithms)

Returns

scores : array_like of shape (n_items, n_users)

prediction values by the model

predict(scores, threshold=0.5)

Outputs class labels based on the scores, using the following formula:

prediction = -1 if (score<threshold) else 1

Parameters

scores : COO-array of shape (n_items, n_users)

sparse matrix in COOrdinate format

threshold : float

classification threshold: scores at or above it are assigned to the positive class

Returns

predictions : COO-array of shape (n_items, n_users)

sparse matrix in COOrdinate format with values in {-1,1}
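The decision rule above can be sketched on a SciPy sparse matrix as follows. This is a minimal illustration, not the library's code: threshold_scores is a hypothetical helper, and it assumes the rule is applied only to the stored (nonzero) entries of the COO matrix, so pairs without a score stay unlabelled.

```python
import numpy as np
from scipy.sparse import coo_matrix


def threshold_scores(scores, threshold=0.5):
    """Hypothetical helper: label stored scores as -1 (below threshold) or 1."""
    scores = coo_matrix(scores)
    # Same rule as in the text: -1 if score < threshold, else 1.
    labels = np.where(scores.data < threshold, -1, 1)
    return coo_matrix((labels, (scores.row, scores.col)), shape=scores.shape)


# The exact zero at position (0, 1) is not stored, hence receives no label.
scores = coo_matrix(np.array([[0.9, 0.0], [0.2, 0.5]]))
predictions = threshold_scores(scores, threshold=0.5)
```

Note that a score exactly equal to the threshold falls into the positive class under this rule.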

predict_proba(test_dataset, default_zero_val=1e-31)

Outputs properly formatted scores (not necessarily in [0,1]!) from the fitted model on test_dataset. Internally calls model_predict_proba(), then reformats the scores.

Parameters

test_dataset : stanscofi.Dataset

dataset on which predictions should be made

Returns

scores : COO-array of shape (n_items, n_users)

sparse matrix in COOrdinate format, with nonzero values corresponding to predictions on available pairs in the dataset
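One way to picture the reformatting step is the sketch below. It rests on an assumption that the text above does not confirm: that default_zero_val replaces scores exactly equal to zero, so that they remain stored in the sparse matrix. The helper format_scores and the boolean available_mask are hypothetical names introduced for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix


def format_scores(raw_scores, available_mask, default_zero_val=1e-31):
    """Hypothetical helper: keep scores only for available (item, user) pairs."""
    raw = np.asarray(raw_scores, dtype=float)
    mask = np.asarray(available_mask, dtype=bool)
    rows, cols = np.nonzero(mask)
    vals = raw[rows, cols]
    # Assumption: an exact zero would be indistinguishable from "no prediction"
    # in sparse storage, so it is nudged to a tiny nonzero value.
    vals = np.where(vals == 0, default_zero_val, vals)
    return coo_matrix((vals, (rows, cols)), shape=raw.shape)


raw = [[0.0, 2.5], [-1.0, 0.7]]
mask = [[True, False], [True, True]]
formatted = format_scores(raw, mask)
```

Scores are left unbounded here (for instance -1.0), in line with the remark that they are not necessarily in [0,1].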

preprocessing(dataset, is_training=True)

Preprocessing step, which converts elements of a dataset (ratings matrix, user feature matrix, item feature matrix) into appropriate inputs to the classifier (e.g., X feature matrix for each (user, item) pair, y response vector).

<Not implemented in the BasicModel class.>

Parameters

dataset : stanscofi.Dataset

dataset to convert

is_training : bool

whether the preprocessing happens prior to training (True) or testing (False)

Returns

appropriate inputs to the classifier (vary across algorithms)

print_classification(predictions)

Prints out information about the predicted classes

Parameters

predictions : COO-array

sparse matrix in COOrdinate format

print_scores(scores)

Prints out information about the scores

Parameters

scores : COO-array

sparse matrix in COOrdinate format

recommend_k_pairs(dataset, k=1, threshold=None)

Outputs the top-k (item, user) candidates (or the candidates whose score is higher than a threshold) in the input dataset

Parameters

dataset : stanscofi.Dataset

dataset on which predictions should be made

k : int or None (default: 1)

number of pair candidates to return (with ties)

threshold : float or None (default: None)

threshold on candidate scores. If k is not None, the k best candidates are returned regardless of the value of threshold

Returns

candidates : list of tuples of size 3

list of (item, user, score) candidates (by name as present in the dataset)
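The selection logic described above (top-k with ties, or a score threshold when k is None) can be sketched in plain Python. The function top_pairs is a hypothetical stand-in operating on a ready-made list of (item, user, score) tuples; it assumes "higher than a threshold" means strictly greater, which the text does not spell out.

```python
def top_pairs(scored_pairs, k=1, threshold=None):
    """Hypothetical helper: pick candidates from (item, user, score) tuples."""
    ranked = sorted(scored_pairs, key=lambda t: t[2], reverse=True)
    if k is not None:
        # k takes precedence over threshold, as stated in the text.
        if k <= 0 or not ranked:
            return []
        cutoff = ranked[min(k, len(ranked)) - 1][2]
        # Keep ties: every pair scoring at least as high as the k-th best.
        return [t for t in ranked if t[2] >= cutoff]
    if threshold is not None:
        # Assumption: strict inequality.
        return [t for t in ranked if t[2] > threshold]
    return ranked


pairs = [
    ("drugA", "disease1", 0.9),
    ("drugB", "disease1", 0.7),
    ("drugC", "disease2", 0.7),
    ("drugD", "disease2", 0.1),
]
best_two = top_pairs(pairs, k=2)
above = top_pairs(pairs, k=None, threshold=0.8)
```

With k=2, three candidates come back because drugB and drugC are tied for second place.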

class stanscofi.models.LogisticRegression(params)

Bases: BasicModel

Logistic Regression (calls sklearn.linear_model.LogisticRegression internally). It uses the very same parameters as sklearn.linear_model.LogisticRegression, so please refer to help(sklearn.linear_model.LogisticRegression).

Parameters

params : dict

dictionary which contains sklearn.linear_model.LogisticRegression parameters, plus two extra keys: “preprocessing”, which selects the preprocessing function (in stanscofi.preprocessing) to apply to the data, and “subset”, which gives the maximum number of features to consider in the model (the features kept are the top “subset” ones in terms of variance across samples)
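The effect of the “subset” key can be illustrated with a short NumPy sketch. The helper top_variance_features is hypothetical and assumes samples are stored in rows and features in columns; the library's own orientation and tie-breaking may differ.

```python
import numpy as np


def top_variance_features(X, subset):
    """Hypothetical helper: indices of the `subset` highest-variance columns."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)  # variance of each feature across samples
    keep = np.argsort(variances)[::-1][:subset]
    return np.sort(keep)  # restore the original column order


X_toy = np.array([
    [1.0, 10.0, 0.0],
    [1.0, 20.0, 1.0],
    [1.0, 30.0, 0.0],
])
kept = top_variance_features(X_toy, subset=2)
X_reduced = X_toy[:, kept]
```

The constant first column has zero variance and is the one dropped.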

Attributes

Same as BasicModel class

Methods

Same as BasicModel class

preprocessing(train_dataset)

Preprocesses the input dataset into something that is an input to fit

model_fit(train_dataset)

Preprocesses and fits the model

model_predict_proba(test_dataset)

Outputs predictions of the fitted model on test_dataset

model_fit(X, y)

Fitting the Logistic Regression model on the training dataset.

Parameters

X : array-like of shape (n_ratings, n_pair_features)

(user, item) feature matrix (the actual contents of the matrix depend on the parameters “preprocessing” and “subset” given as input)

y : array-like of shape (n_ratings, )

response vector for each (user, item) pair

model_predict_proba(X)

Making predictions using the Logistic Regression model on the testing dataset.

Parameters

X : array-like of shape (n_ratings, n_pair_features)

(user, item) feature matrix (the actual contents of the matrix depend on the parameters “preprocessing” and “subset” given as input)

preprocessing(dataset, is_training=True)

Preprocessing step, which converts elements of a dataset (ratings matrix, user feature matrix, item feature matrix) into appropriate inputs to the Logistic Regression classifier.

Parameters

dataset : stanscofi.Dataset

dataset to convert

is_training : bool

whether the preprocessing happens prior to training (True) or testing (False)

Returns

args : contains

X : array-like of shape (n_ratings, n_pair_features)

(user, item) feature matrix (the actual contents of the matrix depend on the parameters “preprocessing” and “subset” given as input)

y : array-like of shape (n_ratings, )

response vector for each (user, item) pair
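To make the shapes of X and y concrete, here is one possible way to turn a ratings matrix and two feature matrices into pair-level inputs. This is only an assumed scheme (concatenating item and user features for each rated pair); the actual construction depends on the “preprocessing” parameter. The helper pairs_to_xy and the orientation of its inputs are hypothetical.

```python
import numpy as np


def pairs_to_xy(ratings, item_features, user_features):
    """Hypothetical helper: one row of X and one entry of y per rated pair.

    Assumed shapes: ratings (n_items, n_users), item_features
    (n_items, n_item_features), user_features (n_users, n_user_features).
    """
    ratings = np.asarray(ratings)
    item_features = np.asarray(item_features, dtype=float)
    user_features = np.asarray(user_features, dtype=float)
    items, users = np.nonzero(ratings)  # rated pairs only
    # Concatenate the item's features with the user's features.
    X = np.hstack([item_features[items], user_features[users]])
    y = ratings[items, users]
    return X, y


ratings = [[1, 0], [-1, 1]]
item_features = [[0.1, 0.2], [0.3, 0.4]]
user_features = [[5.0], [6.0]]
X, y = pairs_to_xy(ratings, item_features, user_features)
```

Here n_ratings is 3 (the number of nonzero ratings) and n_pair_features is 3 (two item features plus one user feature).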

class stanscofi.models.NMF(params)

Bases: BasicModel

Non-negative Matrix Factorization (calls sklearn.decomposition.NMF internally). It uses the very same parameters as sklearn.decomposition.NMF, so please refer to help(sklearn.decomposition.NMF).

Parameters

params : dict

dictionary which contains sklearn.decomposition.NMF parameters

Attributes

Same as BasicModel class

Methods

Same as BasicModel class

preprocessing(train_dataset)

Preprocesses the input dataset into something that is an input to fit

model_fit(train_dataset)

Preprocesses and fits the model

model_predict_proba(test_dataset)

Outputs predictions of the fitted model on test_dataset

model_fit(input)

Fitting the NMF model on the preprocessed training dataset.

Parameters

input : array-like of shape (n_samples, n_features)

training data

model_predict_proba(input)

Making predictions using the NMF model on the testing dataset.

Parameters

input : array-like of shape (n_samples, n_features)

testing data

Returns

result : array-like of shape (n_samples,n_features)
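Since the returned array has the same shape as the input, a natural reading is that the scores are the low-rank reconstruction of the input matrix. The sketch below shows that idea directly with sklearn.decomposition.NMF; that the wrapper computes its scores this way is an assumption, and the matrix A_toy and the chosen parameters are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy non-negative matrix of shape (n_samples, n_features).
A_toy = np.array([[2.0, 1.0], [1.0, 2.0], [0.0, 2.0]])

nmf = NMF(n_components=2, init="random", random_state=0, max_iter=1000)
W = nmf.fit_transform(A_toy)  # shape (n_samples, n_components)
H = nmf.components_           # shape (n_components, n_features)

# Assumed scoring scheme: reconstruct the matrix from its two factors.
reconstruction = W @ H
```

Both factors are non-negative, so every reconstructed score is non-negative as well.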

preprocessing(dataset, is_training=True)

Preprocessing step, which converts elements of a dataset (ratings matrix, user feature matrix, item feature matrix) into appropriate inputs to the NMF classifier.

Parameters

dataset : stanscofi.Dataset

dataset to convert

is_training : bool

whether the preprocessing happens prior to training (True) or testing (False)

Returns

args : contains A : array-like of shape (n_users, n_items)

the transposed association matrix, translated so that all its values are non-negative
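The transformation described above can be sketched in a few lines of NumPy. The helper to_nonnegative is hypothetical, and the choice of offset (subtracting the minimum when it is negative) is an assumption; the text only states that the result must be non-negative.

```python
import numpy as np


def to_nonnegative(ratings):
    """Hypothetical helper: transpose, then shift so the minimum is >= 0."""
    R = np.asarray(ratings, dtype=float)
    offset = min(R.min(), 0.0)  # shift only if negative values are present
    return R.T - offset


# Ratings of shape (n_items, n_users) = (2, 3), with values in {-1, 0, 1}.
ratings_toy = [[1, 0, -1], [0, 1, 1]]
A = to_nonnegative(ratings_toy)  # shape (n_users, n_items) = (3, 2)
```

With ratings in {-1, 0, 1}, the shifted values land in {0, 1, 2}, which is a valid input for a non-negative factorization.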