TSM (a.k.a. The Smith Method) selects a desired number of features (4 by default) by deliberately dropping highly correlated features, i.e., by picking a set of representative features that best explain a binary outcome. In plain language, it works as follows. The first representative feature is the one with the highest AUC (Area Under the ROC Curve) among all features. The next representative feature is the one with the highest AUC among the features that remain after dropping those highly correlated with the first representative feature. The third, fourth, and subsequent representative features are picked in the same way as the second.
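The selection loop described above can be sketched in a few lines of base R. This is only an illustrative sketch, not the package's actual code: the helper names auc_single and greedy_select are hypothetical, and the use of a rank-based AUC, a direction-agnostic AUC, and the absolute correlation are assumptions made here for the sake of a runnable example. It handles a single correlation threshold.

```r
# Hypothetical helper: AUC of one numeric feature against a 0/1 outcome,
# computed from ranks (the Mann-Whitney U statistic divided by n_pos * n_neg).
auc_single <- function(feature, outcome) {
  r <- rank(feature)                          # average ranks handle ties
  n_pos <- sum(outcome == 1)
  n_neg <- sum(outcome == 0)
  u <- sum(r[outcome == 1]) - n_pos * (n_pos + 1) / 2
  auc <- u / (n_pos * n_neg)
  max(auc, 1 - auc)                           # assumption: direction does not matter
}

# Hypothetical sketch of the greedy selection for ONE correlation threshold.
greedy_select <- function(features, outcome, threshold = 0.4, k = 4,
                          method = "spearman") {
  aucs <- vapply(features, auc_single, numeric(1), outcome = outcome)
  cors <- abs(cor(features, method = method)) # assumption: absolute correlation
  remaining <- names(sort(aucs, decreasing = TRUE))
  selected <- character(0)
  while (length(remaining) > 0 && length(selected) < k) {
    best <- remaining[1]                      # highest AUC among what is left
    selected <- c(selected, best)
    # drop the pick itself and every feature too correlated with it
    keep <- cors[best, remaining] < threshold & remaining != best
    remaining <- remaining[keep]
  }
  selected
}

# Toy data (made up for this sketch): f2 is a near-copy of f1.
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
toy <- data.frame(
  f1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  f2 = c(1, 2, 3, 5, 4, 6, 7, 8),
  f3 = c(3, 1, 6, 5, 4, 2, 8, 7),
  f4 = c(5, 2, 7, 4, 1, 6, 3, 8)
)
picked <- greedy_select(toy, y, threshold = 0.7)
print(picked)  # "f1" "f3" "f4": f2 is dropped as too correlated with f1
```

In this toy run, f2 has the second-highest AUC but is never picked, because its Spearman correlation with f1 exceeds the threshold; that is the behaviour the description above is after.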
Usage
TSM(x, method = "spearman", corr = seq(0.1, 0.7, by = 0.1), k = 4, verbose = F)
Arguments
- x
Path to the input file
- method
A character string, either "pearson" or "spearman" (the default shown in Usage); it is the same as the method parameter of cor().
- corr
A numeric vector for the thresholds of correlation coefficients.
- k
The number of desired features (default: 4)
- verbose
Logical (default: FALSE)
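Because method is handed to cor(), the choice between "pearson" and "spearman" changes which features count as highly correlated. The short base R sketch below, on made-up data, shows the difference for a relationship that is monotone but not linear; the variable names are illustrative only.

```r
# Made-up data: v is a monotone but non-linear function of u.
u <- 1:10
v <- u^3

# Pearson measures linear association; Spearman works on ranks.
r_pearson  <- cor(u, v, method = "pearson")
r_spearman <- cor(u, v, method = "spearman")

print(c(pearson = r_pearson, spearman = r_spearman))
```

Here the Spearman coefficient is 1 because the ranks of u and v coincide, while the Pearson coefficient is lower. With a threshold that falls between the two values, the pair would be treated as highly correlated under one method and not under the other.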
Examples
input=read.csv(system.file("extdata","demo_input.csv",package="TSM")) # read the example input
TSM(x=input) # run TSM with default parameters
#> calculating AUC for each features...
#> cor0.1
#> cor0.2
#> cor0.3
#> cor0.4
#> cor0.5
#> cor0.6
#> cor0.7
#> Cor Num features Features
#> 1: 0.4 4 F1,F7,F8,F14
#> 2: 0.5 7 F1,F7,F13,F8,F9,F14,F15
#> 3: 0.7 16 F1,F4,F6,F5,F7,F13,F8,F9,F12,F11,F14,F15,F21,F17,F18,F20
#> 4: 0.6 11 F1,F5,F7,F13,F8,F9,F14,F15,F21,F17,F18
#> 5: 0.1 1 F1
#> 6: 0.2 1 F1
#> 7: 0.3 1 F1
#> Best features AIC BIC AUC AUC(LPOCV)
#> 1: F1,F7,F8,F14 106.9498 120.8872 0.8815629 0.8659951
#> 2: F1,F7,F13,F8 108.8605 122.7980 0.8806471 0.8580586
#> 3: F1,F4,F6,F5 111.8757 125.8131 0.8684371 0.8489011
#> 4: F1,F5,F7,F13 111.5937 125.5312 0.8666056 0.8476801
#> 5: F1 113.1852 118.7602 0.8446276 0.8446276
#> 6: F1 113.1852 118.7602 0.8446276 0.8446276
#> 7: F1 113.1852 118.7602 0.8446276 0.8446276
TSM(x=input, corr=c(0.4, 0.5)) # two correlation coefficients only
#> calculating AUC for each features...
#> cor0.4
#> cor0.5
#> Cor Num features Features Best features AIC BIC
#> 1: 0.4 4 F1,F7,F8,F14 F1,F7,F8,F14 106.9498 120.8872
#> 2: 0.5 7 F1,F7,F13,F8,F9,F14,F15 F1,F7,F13,F8 108.8605 122.7980
#> AUC AUC(LPOCV)
#> 1: 0.8815629 0.8659951
#> 2: 0.8806471 0.8580586
TSM(x=input, method="pearson") # pearson method for cor()
#> calculating AUC for each features...
#> cor0.1
#> cor0.2
#> cor0.3
#> cor0.4
#> cor0.5
#> cor0.6
#> cor0.7
#> Cor Num features Features
#> 1: 0.6 9 F1,F7,F13,F8,F9,F16,F14,F15,F17
#> 2: 0.5 4 F1,F13,F15,F18
#> 3: 0.4 3 F1,F11,F14
#> 4: 0.7 14 F1,F5,F7,F13,F8,F9,F12,F11,F14,F15,F21,F17,F18,F20
#> 5: 0.1 1 F1
#> 6: 0.2 1 F1
#> 7: 0.3 1 F1
#> Best features AIC BIC AUC AUC(LPOCV)
#> 1: F1,F7,F13,F8 108.8605 122.7980 0.8806471 0.8580586
#> 2: F1,F13,F15,F18 108.2224 122.1599 0.8717949 0.8522589
#> 3: F1,F11,F14 109.9812 121.1312 0.8672161 0.8489011
#> 4: F1,F5,F7,F13 111.5937 125.5312 0.8666056 0.8476801
#> 5: F1 113.1852 118.7602 0.8446276 0.8446276
#> 6: F1 113.1852 118.7602 0.8446276 0.8446276
#> 7: F1 113.1852 118.7602 0.8446276 0.8446276