featureSelector
USE:
[BIG,x,Adrem,val] = featureSelector(X,y,nbx,nby,nf,nb,cvsets,k,alpha,gama,flag)
DESCRIPTION:
featureSelector is a feature selection algorithm for CLASSIFICATION and REGRESSION.
The relevance criterion is J=alpha*AR+gama*ADC, where the Accuracy Rate (AR) is obtained by cross-validation with a k-nearest neighbor classifier (CLASSIFICATION only) and ADC is the Asymmetric Dependency Coefficient from the information-theoretic framework.
The combinatorial optimization method is "plus l take away r".
This function indicates which among the columns of a matrix X are best suited to map y by ranking them in decreasing order of their importance.
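The ADC term of the criterion can be illustrated with a minimal Python sketch (not part of the toolbox; it assumes X's column and y have already been discretized into nbx and nby bins, and takes ADC as the mutual information I(X;Y) normalized by the entropy H(Y), which is the usual definition of the asymmetric dependency coefficient):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def adc(x, y):
    """Asymmetric Dependency Coefficient ADC = I(X;Y) / H(Y).

    x, y: discretized (binned) sequences of equal length.
    Returns 1 when x fully determines y and 0 when they are independent.
    """
    h_y = entropy(y)
    mi = entropy(x) + h_y - entropy(list(zip(x, y)))  # I(X;Y) = H(X)+H(Y)-H(X,Y)
    return mi / h_y if h_y > 0 else 0.0
```

A feature identical to y gives ADC 1, while an independent one gives ADC 0.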
ARGUMENTS:
X must be an mxp matrix (p>=2) and y an m-row vector. y may be a continuous response (REGRESSION problems) or categorical (CLASSIFICATION problems).
nbx and nby are the numbers of small regions in which the probability density function for each component of X and for y, respectively, is considered constant. Recommended values for nbx are 10 to 50, while the limits are 2<nbx<100. nby must be equal to the number of classes if y is the class index to which each row of X belongs.
nf is the number of forward steps (l), while nb is the number of backward steps (r) in the (l,r) search algorithm. Set flag to 0 if details on the computation are not desired.
The remaining arguments of this function are given for each type of situation:
I) CLASSIFICATION
nby must be equal to the number of classes, i.e. the number of distinct values in y.
alpha, in the range [0 1], is the importance given to the Accuracy Rate, while gama, also in [0 1], is the importance coefficient given to ADC.
cvsets is the number of cross-validation folds, and k the number of neighbors to consider.
II) REGRESSION
nby should be set according to the same recommendations as for nbx, while alpha must be 0.
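The (l,r) search driven by nf and nb can be sketched as follows (a simplified Python illustration, not the toolbox code: score stands in for the criterion J, l must exceed r so the search advances, and the best subset seen is recorded after every forward step):

```python
def plus_l_take_away_r(n_features, score, l=2, r=1):
    """Sequential (l,r) search: repeatedly add the l individually best
    features, then drop the r least useful ones, recording the best
    subset evaluated along the way. Requires l > r to terminate."""
    selected, best, best_val = [], (), float("-inf")
    while len(selected) < n_features:
        for _ in range(l):                                # l forward steps
            remaining = [f for f in range(n_features) if f not in selected]
            if not remaining:
                break
            selected.append(max(remaining,
                                key=lambda f: score(tuple(selected + [f]))))
            val = score(tuple(selected))
            if val > best_val:
                best, best_val = tuple(selected), val
        if len(selected) == n_features:
            break
        for _ in range(r):                                # r backward steps
            if len(selected) <= 1:
                break
            drop = max(selected,                          # cheapest to lose
                       key=lambda f: score(tuple(s for s in selected if s != f)))
            selected.remove(drop)
    return best, best_val
```

With r=0 (i.e. nb=0) the backward loop never runs and the procedure reduces to plain forward selection, as in the examples below.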
VALUES:
BIG is a structure with a single field, textval. BIG.textval stores all the combinations of features that were evaluated, while val contains the value of the relevance criterion J for each of those combinations. Adrem is the index of the features that were added and removed at each step of the algorithm. The user should choose as best selected combination the smallest one for which the value of J is maximum.
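The selection rule above (take the smallest combination whose J reaches the maximum) can be sketched with a hypothetical helper, assuming the evaluated combinations and their J values have been flattened into parallel Python lists:

```python
def pick_best(combos, vals, tol=1e-12):
    """Return the smallest evaluated feature combination whose
    criterion value J is (within tol of) the maximum observed."""
    j_max = max(vals)
    return min((c for c, v in zip(combos, vals) if v >= j_max - tol),
               key=len)
```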
EXAMPLES:
CLASSIFICATION
// load the iris data set; there are 3 classes with 50 samples each
> irisdata
// search for the best features of the iris data set, using as feature relevance measure the ADC only, and forward selection only since nb is set to 0
> [BIG,x,Adrem,val] = featureSelector(X,y,10,3,1,0,3,5,0,1,1)
// do the same, but use as relevance measure the 3-fold cross-validated Accuracy Rate of a 5-nearest neighbor classifier
> [BIG,x,Adrem,val] = featureSelector(X,y,10,3,1,0,3,5,1,0,1)
// use as relevance measure both AR and ADC, and use (l,r) search with r<>0
> [BIG,x,Adrem,val] = featureSelector(X,y,10,3,2,1,3,1,0.5,0.5,1)
REGRESSION
// generate a 5-feature matrix X of random numbers; y is generated in such a way that the importance of the features of X in predicting y decreases from the 1st to the 3rd, while the remaining 2 are completely unimportant
> X=randn(200,5)
> y=(2*X(:,1).^2-X(:,3))./X(:,2)
// search for the best features, using as feature relevance measure the ADC, and forward selection only since nb is set to 0
> [BIG,x,Adrem,val] = featureSelector(X,y,20,20,1,0,3,5,0,1,1)
// use (l,r) search with r<>0
> [BIG,x,Adrem,val] = featureSelector(X,y,20,20,3,1,3,5,0,1,1)
COMMENTS:
Created by Laurentiu Adi Tarca. Revised 10.07.2003.
This help was created on 12.03.2004.