Models
- class stanscofi.models.BasicModel(params)
Bases: object
A class used to encode a drug repurposing model
…
Parameters
- params : dict
dictionary which contains method-wise parameters
Attributes
- name : str
the name of the model
- model : depends on the implemented method
may contain an instance of a class of sklearn classifiers
- …
other attributes might be present depending on the type of model
Methods
- __init__(params)
Initializes the model with preselected parameters
- fit(train_dataset, seed=1234)
Preprocesses and fits the model
- predict_proba(test_dataset)
Outputs properly formatted predictions of the fitted model on test_dataset
- predict(scores, threshold=0.5)
Applies the following decision rule: if score<threshold, then return the negative label, otherwise return the positive label
- recommend_k_pairs(dataset, k=1, threshold=None)
Outputs the top-k (item, user) candidates (or the candidates whose score is higher than a threshold) in the input dataset
- print_scores(scores)
Prints out information about scores
- print_classification(predictions)
Prints out information about predicted labels
- preprocessing(train_dataset) [not implemented in BasicModel]
Preprocesses the input dataset into a valid input for self.model_fit, if it exists
- model_fit(train_dataset) [not implemented in BasicModel]
Fits the model on train_dataset
- model_predict_proba(test_dataset) [not implemented in BasicModel]
Outputs predictions of the fitted model on test_dataset
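The method list above suggests a template-method layout: fit and predict_proba are shared, while preprocessing, model_fit and model_predict_proba are supplied by each concrete model. The following is a minimal sketch of that pattern only. SketchModel and MeanScoreModel are hypothetical stand-ins, not stanscofi classes, and treating a "dataset" as a plain ratings array and seeding both random and NumPy are assumptions made for illustration.

```python
import random

import numpy as np


class SketchModel:
    """Hypothetical stand-in mimicking the hooks listed above (not stanscofi code)."""

    def __init__(self, params):
        self.name = "SketchModel"
        self.params = params
        self.model = None

    # The three hooks a concrete model is expected to provide.
    def preprocessing(self, dataset, is_training=True):
        raise NotImplementedError

    def model_fit(self, *args):
        raise NotImplementedError

    def model_predict_proba(self, *args):
        raise NotImplementedError

    def fit(self, train_dataset, seed=1234):
        # Assumption: seed the generators, preprocess, then delegate.
        random.seed(seed)
        np.random.seed(seed)
        self.model_fit(*self.preprocessing(train_dataset, is_training=True))

    def predict_proba(self, test_dataset):
        inputs = self.preprocessing(test_dataset, is_training=False)
        return self.model_predict_proba(*inputs)


class MeanScoreModel(SketchModel):
    """Toy subclass: predicts the mean training rating for every pair."""

    def preprocessing(self, dataset, is_training=True):
        # Assumption: the "dataset" is just a dense ratings array.
        return (np.asarray(dataset, dtype=float),)

    def model_fit(self, ratings):
        self.model = ratings.mean()

    def model_predict_proba(self, ratings):
        return np.full(ratings.shape, self.model)


model = MeanScoreModel(params={})
model.fit([[1, -1], [1, 1]])
scores = model.predict_proba([[0, 0], [0, 0]])
```

With this layout a new model only overrides the three hooks and inherits the shared fitting and prediction flow.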
- fit(train_dataset, seed=1234)
Fitting the model on the training dataset.
Not implemented in the BasicModel class.
…
Parameters
- train_dataset : stanscofi.Dataset
training dataset on which the model should fit
- seed : int (default: 1234)
random seed
- model_fit()
Fitting the model on the training dataset.
<Not implemented in the BasicModel class.>
…
Parameters
- ……
appropriate inputs to the classifier (vary across algorithms)
- model_predict_proba()
Making predictions using the model on the testing dataset.
<Not implemented in the BasicModel class.>
…
Parameters
- ……
appropriate inputs to the classifier (vary across algorithms)
…
Returns
- scores : array_like of shape (n_items, n_users)
prediction values by the model
- predict(scores, threshold=0.5)
Outputs class labels based on the scores, using the following rule:
prediction = -1 if (score<threshold) else 1
…
Parameters
- scores : COO-array of shape (n_items, n_users)
sparse matrix in COOrdinate format
- threshold : float
the threshold of classification into the positive class
Returns
- predictions : COO-array of shape (n_items, n_users)
sparse matrix in COOrdinate format with values in {-1,1}
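The decision rule of predict can be illustrated on a small sparse matrix. This is a sketch written with scipy.sparse.coo_array, not the library's implementation. threshold_scores is a hypothetical helper, and labelling only the explicitly stored entries (leaving absent pairs at 0) is an assumption.

```python
import numpy as np
from scipy.sparse import coo_array


def threshold_scores(scores, threshold=0.5):
    """Map each stored score to -1 (score < threshold) or 1 (otherwise)."""
    scores = coo_array(scores)
    labels = np.where(scores.data < threshold, -1, 1)
    # Assumption: entries that are not stored stay absent (read back as 0).
    return coo_array((labels, (scores.row, scores.col)), shape=scores.shape)


# Three stored scores; the pair at row 0, column 1 has no stored value.
scores = coo_array(np.array([[0.9, 0.0], [0.2, 0.5]]))
predictions = threshold_scores(scores)
```

Note that a score equal to the threshold falls on the positive side, as the rule only sends scores strictly below the threshold to -1.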
- predict_proba(test_dataset, default_zero_val=1e-31)
Outputs properly formatted scores (not necessarily in [0,1]!) from the fitted model on test_dataset. Internally calls model_predict_proba(), then reformats the scores
…
Parameters
- test_dataset : stanscofi.Dataset
dataset on which predictions should be made
Returns
- scores : COO-array of shape (n_items, n_users)
sparse matrix in COOrdinate format, with nonzero values corresponding to predictions on available pairs in the dataset
- preprocessing(dataset, is_training=True)
Preprocessing step, which converts elements of a dataset (ratings matrix, user feature matrix, item feature matrix) into appropriate inputs to the classifier (e.g., X feature matrix for each (user, item) pair, y response vector).
<Not implemented in the BasicModel class.>
…
Parameters
- dataset : stanscofi.Dataset
dataset to convert
- is_training : bool
whether the preprocessing precedes training (True) or testing (False)
Returns
- ……
appropriate inputs to the classifier (vary across algorithms)
- print_classification(predictions)
Prints out information about the predicted classes
…
Parameters
- predictions : COO-array
sparse matrix in COOrdinate format
- print_scores(scores)
Prints out information about the scores
…
Parameters
- scores : COO-array
sparse matrix in COOrdinate format
- recommend_k_pairs(dataset, k=1, threshold=None)
Outputs the top-k (item, user) candidates (or the candidates whose score is higher than a threshold) in the input dataset
…
Parameters
- dataset : stanscofi.Dataset
dataset on which predictions should be made
- k : int or None (default: 1)
number of pair candidates to return (with ties)
- threshold : float or None (default: None)
threshold on candidate scores. If k is not None, the k best candidates are returned regardless of the value of threshold
…
Returns
- candidates : list of tuples of size 3
list of (item, user, score) candidates (by name as present in the dataset)
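The selection logic described for recommend_k_pairs (k best candidates with ties, otherwise a score threshold) can be sketched in plain Python. top_k_pairs and the drug and disease names are hypothetical, and treating "higher than a threshold" as a strict comparison is an assumption.

```python
def top_k_pairs(scored_pairs, k=1, threshold=None):
    """Select candidates from a list of (item, user, score) tuples."""
    ranked = sorted(scored_pairs, key=lambda pair: pair[2], reverse=True)
    if k is not None:
        # k takes precedence over threshold, as stated above.
        if k <= 0 or not ranked:
            return []
        cutoff = ranked[min(k, len(ranked)) - 1][2]
        # Keep every pair tied with the k-th best score.
        return [pair for pair in ranked if pair[2] >= cutoff]
    if threshold is not None:
        # Assumption: "higher than" means strictly greater.
        return [pair for pair in ranked if pair[2] > threshold]
    return ranked


pairs = [
    ("drugA", "disease1", 0.9),
    ("drugB", "disease1", 0.7),
    ("drugC", "disease2", 0.7),
    ("drugD", "disease2", 0.1),
]
best_two = top_k_pairs(pairs, k=2)
above = top_k_pairs(pairs, k=None, threshold=0.8)
```

Because ties are kept, asking for k=2 here yields three candidates: the second and third pairs share the same score.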
- class stanscofi.models.LogisticRegression(params)
Bases: BasicModel
Logistic Regression (calls sklearn.linear_model.LogisticRegression internally). It uses the very same parameters as sklearn.linear_model.LogisticRegression, so please refer to help(sklearn.linear_model.LogisticRegression).
…
Parameters
- params : dict
dictionary of sklearn.linear_model.LogisticRegression parameters, plus two extra keys: “preprocessing”, which selects the preprocessing function (in stanscofi.preprocessing) applied to the data, and “subset”, the maximum number of features used by the model (the “subset” features with the highest variance across samples are kept)
Attributes
Same as BasicModel class
Methods
Same as BasicModel class
- preprocessing(train_dataset)
Preprocesses the input dataset into something that is an input to fit
- model_fit(train_dataset)
Preprocesses and fits the model
- model_predict_proba(test_dataset)
Outputs predictions of the fitted model on test_dataset
- model_fit(X, y)
Fitting the Logistic Regression model on the training dataset.
…
Parameters
- X : array-like of shape (n_ratings, n_pair_features)
(user, item) feature matrix (the actual contents of the matrix depend on the parameters “preprocessing” and “subset” given as input)
- y : array-like of shape (n_ratings, )
response vector for each (user, item) pair
- model_predict_proba(X)
Making predictions using the Logistic Regression model on the testing dataset.
…
Parameters
- X : array-like of shape (n_ratings, n_pair_features)
(user, item) feature matrix (the actual contents of the matrix depend on the parameters “preprocessing” and “subset” given as input)
- preprocessing(dataset, is_training=True)
Preprocessing step, which converts elements of a dataset (ratings matrix, user feature matrix, item feature matrix) into appropriate inputs to the Logistic Regression classifier.
…
Parameters
- dataset : stanscofi.Dataset
dataset to convert
- is_training : bool
whether the preprocessing precedes training (True) or testing (False)
Returns
args, which contains:
- X : array-like of shape (n_ratings, n_pair_features)
(user, item) feature matrix (the actual contents of the matrix depend on the parameters “preprocessing” and “subset” given as input)
- y : array-like of shape (n_ratings, )
response vector for each (user, item) pair
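The “subset” key described for this class keeps only the features with the highest variance across samples. A minimal NumPy sketch of that kind of selection follows. top_variance_features is a hypothetical helper, not a function of stanscofi.preprocessing, and the library's exact tie-breaking and column ordering are not known from this page.

```python
import numpy as np


def top_variance_features(X, subset):
    """Keep the `subset` columns of X with the largest variance across rows."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)
    # Indices of the highest-variance columns, kept in their original order.
    keep = np.sort(np.argsort(variances)[::-1][:subset])
    return X[:, keep], keep


# 4 (user, item) pairs described by 3 features; the first one is constant.
X = np.array([
    [1.0, 10.0, 0.0],
    [1.0, -10.0, 1.0],
    [1.0, 10.0, 2.0],
    [1.0, -10.0, 3.0],
])
X_small, kept = top_variance_features(X, subset=2)
```

The constant first column carries no information for a classifier and is the one dropped here.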
- class stanscofi.models.NMF(params)
Bases: BasicModel
Non-negative Matrix Factorization (calls sklearn.decomposition.NMF internally). It uses the very same parameters as sklearn.decomposition.NMF, so please refer to help(sklearn.decomposition.NMF).
…
Parameters
- params : dict
dictionary which contains sklearn.decomposition.NMF parameters
Attributes
Same as BasicModel class
Methods
Same as BasicModel class
- preprocessing(train_dataset)
Preprocesses the input dataset into something that is an input to fit
- model_fit(train_dataset)
Preprocesses and fits the model
- model_predict_proba(test_dataset)
Outputs predictions of the fitted model on test_dataset
- model_fit(input)
Fitting the NMF model on the preprocessed training dataset.
…
Parameters
- input : array-like of shape (n_samples, n_features)
training data
- model_predict_proba(input)
Making predictions using the NMF model on the testing dataset.
…
Parameters
- input : array-like of shape (n_samples, n_features)
testing data
…
Returns
- result : array-like of shape (n_samples, n_features)
- preprocessing(dataset, is_training=True)
Preprocessing step, which converts elements of a dataset (ratings matrix, user feature matrix, item feature matrix) into appropriate inputs to the NMF classifier.
…
Parameters
- dataset : stanscofi.Dataset
dataset to convert
- is_training : bool
whether the preprocessing precedes training (True) or testing (False)
Returns
args, which contains:
- A : array-like of shape (n_users, n_items)
contains the transposed translated association matrix so that all its values are non-negative
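NMF requires a non-negative input, which is why the association matrix is translated before fitting. The sketch below shows one way to obtain a transposed, non-negative matrix from ratings in {-1, 0, 1}. nonnegative_transposed is a hypothetical helper, and shifting by the matrix minimum is an assumption, since the offset actually used by the library is not stated here.

```python
import numpy as np


def nonnegative_transposed(ratings):
    """Transpose an (n_items, n_users) matrix and shift it so its minimum is 0."""
    A = np.asarray(ratings, dtype=float).T
    # Assumption: translate by the smallest value so every entry is >= 0.
    return A - A.min()


# 2 items x 3 users, with ratings in {-1, 0, 1}.
ratings = np.array([
    [1, -1, 0],
    [0, 1, -1],
])
A = nonnegative_transposed(ratings)
```

The result has shape (n_users, n_items), here (3, 2), and could then be passed to a non-negative factorization such as sklearn.decomposition.NMF.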