@HaomingJiang 2018-02-21T08:34:41.000000Z

Code Template

To keep our code and project easy to manage, please develop your model following this document.

Introduction

For simplicity, currently, you only need to work with two R scripts:
- testmodel.R
- your_model_file_name.R (an example is model_los_poireg.R)

For other scripts:
- server.R and ui.R are for the R shiny visualization application
- model.R provides a unified interface to all models; for now you can ignore it.
- scorecard.R generates the PDF scorecard. (TODO)

Data has been cleaned and saved in .RData format under the data folder:
- Procedure all.RData, the data related to a certain procedure. We only focus on Colectomy for now.
- Procedure train.RData, the data before 2017, which is used for training the model
- Procedure test.RData, the data after 2017, which is used for testing the model
- Procedure surgeon.RData, the data related to a randomly selected surgeon for this procedure. It is used for testing the visualization.
- cleandata.R, the code used for data cleaning
- NSQIP CDW Metadata.csv, the variable dictionary
- NSQIP_CDW_v1.0_DEIDENTIFIED.csv, the raw data

Workflow

Step1

First, you should make a local copy on your own computer.
Then, take a look at testmodel.R

testmodel.R
Feel free to modify testmodel.R for your own purposes, since it is only a trigger script for testing the functionality of your model. It will not be included in the final application.

Step2

Create an R script for your model in the same folder, so that testmodel.R can load it. Please follow our naming convention: model_<metric>_<method>.R, where <metric> is the short name of the metric and must be one of: los (length of stay), read (readmission), vdcost (variable direct cost), or (unplanned return to OR). You may choose your own short name for <method>. An example is model_los_poireg.R.
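The naming convention can be checked mechanically. The following is a minimal sketch, not part of the project: the helper name is_valid_model_file is hypothetical, and it assumes that <method> consists of letters and digits only.

```r
# Hypothetical helper (not part of the project): checks a file name against
# the convention model_<metric>_<method>.R described above.
valid_metrics <- c("los", "read", "vdcost", "or")

is_valid_model_file <- function(filename) {
  # Assumption: <method> is made of letters and digits only.
  pattern <- paste0("^model_(", paste(valid_metrics, collapse = "|"),
                    ")_[A-Za-z0-9]+\\.R$")
  grepl(pattern, filename)
}

ok_example  <- is_valid_model_file("model_los_poireg.R")  # example name from this document
bad_example <- is_valid_model_file("poireg_los.R")        # does not follow the convention
```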

Step3

Implement two functions, model and model_eval, in model_<metric>_<method>.R (if you need other libraries, load them at the beginning of the file):
1. model

model <- function(x,y,parameter=list(),doeval = FALSE){
    ########################################################
    #                 Model Implementation                 #
    ########################################################
    if(doeval)
    {
    ########################################################
    #                 Model Evaluation                     #
    ########################################################
    }
    return(fit)
}

x is the predictor and y is the response. parameter is a list that can be used to control your model. If your model does not need any additional control variables, just leave it as the empty default.
The return value must be the fitted model (or whatever object represents it); it is used only by the second function, model_eval.
You can also do some basic analysis in the Model Evaluation section, such as showing the residual plot or printing the model summary. Printing this information is helpful while building the model. Note that in the final visualization application doeval is turned off, whereas in testmodel.R it is turned on.
Otherwise you can implement the function any way you like. There are only three rules:
i. Do not change the function's arguments; pass anything additional through parameter
ii. Make sure there is no additional output, print, or plot when doeval = FALSE in the final application
iii. Return the model through the return value, which will be used in the second function model_eval
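As an illustration of the three rules, here is a minimal sketch of a model function for a Poisson regression. It is not the content of model_los_poireg.R; it assumes x is a data.frame of predictors and y is a vector of counts, the control variable parameter$family is hypothetical, and the data are synthetic.

```r
# Sketch of a `model` implementation, assuming x is a data.frame of
# predictors and y is a vector of non-negative counts (e.g. length of stay).
model <- function(x, y, parameter = list(), doeval = FALSE) {
  # Combine predictors and response so that glm() can use a formula.
  dat <- data.frame(x, .response = y)
  # Hypothetical control variable: parameter$family, defaulting to Poisson.
  fam <- if (is.null(parameter$family)) poisson() else parameter$family
  fit <- glm(.response ~ ., data = dat, family = fam)

  if (doeval) {
    # Diagnostics are produced only when doeval = TRUE (rule ii).
    print(summary(fit))
    plot(fitted(fit), residuals(fit, type = "deviance"),
         xlab = "Fitted values", ylab = "Deviance residuals")
  }
  return(fit)  # rule iii: the fitted model is the return value
}

# Synthetic data for illustration only (not the NSQIP data).
set.seed(1)
x <- data.frame(age = rnorm(200, 60, 10), emergency = rbinom(200, 1, 0.3))
y <- rpois(200, lambda = exp(0.5 + 0.01 * x$age + 0.4 * x$emergency))
fit <- model(x, y)
```

Extra settings travel through parameter rather than new arguments, so the signature stays identical across all models.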

2. model_eval

model_eval <- function(fitted_model, newx, newy, parameter=list(), doeval = FALSE){
    ########################################################
    #                 Model Implementation                 #
    ########################################################
    if(doeval)
    {
    ########################################################
    #                 Model Evaluation                     #
    ########################################################
    }
    return(evaluation)
}

fitted_model is the return value of the first function, model. newx and newy are the predictors and responses of the new data set, such as the testing data when you evaluate the model, or the surgeon data when you evaluate the performance of surgeons. parameter and doeval serve the same purposes as in model. evaluation is a list containing everything important to the final application, such as the model accuracy, the estimated length of stay, the estimated variance, and the confidence interval.

There are also three rules that you have to stick to:
i. Do not change the function's arguments; pass anything additional through parameter
ii. Make sure there is no additional output, print, or plot when doeval = FALSE in the final application
iii. Return the evaluation on the new data, which will be used in the final application, for example in the scorecard
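A possible shape for model_eval is sketched below. It assumes fitted_model is a glm object, as a Poisson regression model might return. The control variable parameter$level and the names of the entries in evaluation are hypothetical, and the interval is a Wald confidence interval for the expected response, not a prediction interval.

```r
# Sketch of `model_eval`, assuming fitted_model is a glm object and
# newx / newy hold the predictors and responses of the new data.
model_eval <- function(fitted_model, newx, newy, parameter = list(), doeval = FALSE) {
  # Hypothetical control variable: parameter$level, the confidence level.
  level <- if (is.null(parameter$level)) 0.95 else parameter$level
  z <- qnorm(1 - (1 - level) / 2)

  # Predict on the link scale, then map back to the response scale.
  pred <- predict(fitted_model, newdata = newx, type = "link", se.fit = TRUE)
  linkinv <- family(fitted_model)$linkinv
  estimate <- linkinv(pred$fit)

  evaluation <- list(
    estimate = estimate,                                # estimated mean response
    lower    = linkinv(pred$fit - z * pred$se.fit),     # Wald interval, lower bound
    upper    = linkinv(pred$fit + z * pred$se.fit),     # Wald interval, upper bound
    mse      = mean((newy - estimate)^2)                # mean squared error on new data
  )

  if (doeval) {
    # Printed only when doeval = TRUE (rule ii).
    cat("MSE on new data:", evaluation$mse, "\n")
  }
  return(evaluation)
}

# Synthetic data for illustration only (not the NSQIP data).
set.seed(2)
train <- data.frame(age = rnorm(300, 60, 10))
train$los <- rpois(300, lambda = exp(0.5 + 0.01 * train$age))
test_x <- data.frame(age = rnorm(50, 60, 10))
test_y <- rpois(50, lambda = exp(0.5 + 0.01 * test_x$age))

fit <- glm(los ~ age, data = train, family = poisson())
evaluation <- model_eval(fit, test_x, test_y)
```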

Some Basic Ideas of Models and Evaluation

Linear or Poisson models for regression (a linear model might be easier to analyze)
Logistic regression for classification
Plot the residuals against the response to check whether the residuals satisfy the i.i.d. normal assumption. If you see an irregular pattern, consider applying a transformation.
If you need to apply a transformation, you may use Box-Cox or see the reference
You can print the summary of the model to see whether there are many insignificant variables. If so, you can do variable selection, either step-wise or manual.
Evaluate the model via F-test, R^2, MSE, AUC, etc.
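Some of these ideas can be sketched with base R alone. The example below uses synthetic data, with all variable names made up for illustration, to show backward step-wise selection with step(), R^2 and held-out MSE for a linear model, and an AUC computed from the rank-sum statistic for a logistic regression. It is one way to do these checks, not a prescribed procedure.

```r
# Synthetic data for illustration only: y depends on x1 and x2, not on noise.
set.seed(3)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
dat$y <- 2 + 1.5 * dat$x1 - 1 * dat$x2 + rnorm(n)
train <- dat[1:200, ]
test  <- dat[201:n, ]

# Full linear model, then backward step-wise selection by AIC.
full <- lm(y ~ ., data = train)
selected <- step(full, direction = "backward", trace = 0)  # trace = 0: no progress output

# A residual plot could be drawn here, e.g. residuals against fitted values:
# plot(fitted(selected), residuals(selected))

# R^2 on the training data and MSE on the held-out data.
r2  <- summary(selected)$r.squared
mse <- mean((test$y - predict(selected, newdata = test))^2)

# Logistic regression for a binary outcome, evaluated by AUC.
dat$event <- rbinom(n, 1, plogis(dat$x1))
logit_fit <- glm(event ~ x1 + x2, data = dat[1:200, ], family = binomial())
p   <- predict(logit_fit, newdata = dat[201:n, ], type = "response")
lab <- dat$event[201:n]

# AUC via the rank-sum (Mann-Whitney) statistic.
r   <- rank(p)
n1  <- sum(lab == 1)
n0  <- sum(lab == 0)
auc <- (sum(r[lab == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
```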