featureSelector

 

 

USE:

[BIG,x,Adrem,val] = featureSelector(X,y,nbx,nby,nf,nb,cvsets,k,alpha,gama,flag)

 

DESCRIPTION:

featureSelector is a feature selection algorithm for both CLASSIFICATION and REGRESSION problems.

The relevance criterion is J=alpha*AR+gama*ADC where:

The Accuracy Rate (AR) is obtained by cross-validation with a k-nearest-neighbor classifier (CLASSIFICATION only), and ADC is the Asymmetric Dependency Coefficient from the information-theoretic framework.

The combinatorial optimization method is "plus l take away r".

This function indicates which columns of a matrix X are best suited to map y, ranking them in decreasing order of importance.
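The ADC term of the criterion J can be illustrated with a short sketch (in Python for illustration only, not the toolbox's own language; the histogram-based estimate, function names, and default bin counts below are assumptions, not the actual implementation). ADC(y|x) is the mutual information I(x;y) normalized by the entropy H(y), with densities estimated by binning as the nbx/nby arguments suggest:

```python
# Illustrative sketch only: ADC(y|x) = I(x;y) / H(y) for one feature x
# against a target y, estimated with a 2-D histogram. Names and binning
# defaults are assumptions, not the featureSelector implementation.
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector; 0*log(0) treated as 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def adc(x, y, nbx=10, nby=10):
    """ADC(y|x): mutual information I(x;y) normalized by H(y)."""
    joint, _, _ = np.histogram2d(x, y, bins=(nbx, nby))
    pxy = joint / joint.sum()            # joint probability estimate
    px = pxy.sum(axis=1)                 # marginal of x
    py = pxy.sum(axis=0)                 # marginal of y
    mi = entropy(px) + entropy(py) - entropy(pxy.ravel())
    return mi / entropy(py)
```

A fully dependent pair gives ADC close to 1, while an independent pair gives a value near 0 (slightly above, due to the finite-sample bias of histogram estimates).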

 

ARGUMENTS:

X must be an m-by-p matrix (p>=2) and y a vector with m rows. y may be a continuous response (REGRESSION problems) or categorical (CLASSIFICATION problems).

nbx and nby are the numbers of small regions in which the probability density function of each component of X and of y, respectively, is considered constant. Recommended values for nbx are 10 to 50, while the limits are 2<nbx<100. nby must equal the number of classes if y is the class index to which each row of X belongs.

nf is the number of forward steps (l), while nb is the number of backward steps (r) in the (l,r) search algorithm. Set flag to 0 if details on the computation are not desired.
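The "plus l take away r" search can be sketched as follows (an illustrative Python sketch, not the toolbox code; `criterion` is a stand-in for J = alpha*AR + gama*ADC, and nf must exceed nb for the search to advance):

```python
# Hedged sketch of (l,r) sequential search: repeatedly add the nf best
# features one at a time, then drop the nb least useful ones, greedily
# maximizing a relevance criterion. Not the featureSelector implementation.
def plus_l_take_away_r(n_features, criterion, nf=2, nb=1):
    """Rank all features by (l,r) search; requires nf > nb to terminate."""
    selected, history = [], []
    while len(selected) < n_features:
        for _ in range(nf):  # l forward steps: add the best remaining feature
            candidates = [f for f in range(n_features) if f not in selected]
            if not candidates:
                break
            best = max(candidates, key=lambda f: criterion(selected + [f]))
            selected.append(best)
            history.append(('+', best, criterion(selected)))
        if len(selected) == n_features:
            break  # everything ranked; skip the final backward pass
        for _ in range(nb):  # r backward steps: drop the least useful feature
            if len(selected) <= 1:
                break
            worst = max(selected,
                        key=lambda f: criterion([g for g in selected if g != f]))
            selected.remove(worst)
            history.append(('-', worst, criterion(selected)))
    return selected, history
```

With nb=0 this reduces to plain forward selection, matching the first example below; the history list plays a role analogous to Adrem/val in the function's outputs.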

The remaining arguments are described below for each type of problem:

I) CLASSIFICATION

nby must equal the number of classes, i.e. the number of distinct values in y.

alpha, in the range [0 1], is the importance given to the Accuracy Rate, while gama, also in [0 1], is the importance coefficient given to ADC.

cvsets is the number of cross-validation folds, and k is the number of neighbors to consider.

II) REGRESSION

nby should be set according to the same recommendations as for nbx, while alpha must be 0.

 

VALUES:

BIG is a structure with a single field, textval. BIG.textval stores all the combinations of features that were evaluated, while val contains the value of the relevance criterion J for each of those combinations. Adrem is the index of the features that were added and removed at each step of the algorithm. As the best selected combination, the user should choose the smallest one for which the value of J is maximum.
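This selection rule can be illustrated with a tiny sketch (Python, made-up numbers; the combinations and J values below are stand-ins for BIG.textval and val, not real outputs):

```python
# Pick the best combination: maximum J, and on ties the fewest features.
combos = [[1], [1, 3], [1, 3, 2], [1, 3, 2, 4]]  # stand-in for BIG.textval
val = [0.61, 0.83, 0.83, 0.79]                   # stand-in for val (J values)
best = min((c for c, v in zip(combos, val) if v == max(val)), key=len)
# best is [1, 3]: the highest J is reached with the fewest features
```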

 

EXAMPLES:

CLASSIFICATION

// Load the iris data set. There are 3 classes with 50 samples each.

> irisdata

// Search for the best features of the iris data set using the ADC alone as the relevance measure, and forward selection only, as nb is set to 0.

> [BIG,x,Adrem,val] = featureSelector(X,y,10,3,1,0,3,5,0,1,1)

 

// Do the same, but use as relevance measure the 3-fold cross-validated Accuracy Rate of a 1-nearest-neighbor classifier.

> [BIG,x,Adrem,val] = featureSelector(X,y,10,3,1,0,3,1,1,0,1)

 

// Use both AR and ADC as the relevance measure, and (l,r) search with r<>0.

> [BIG,x,Adrem,val] = featureSelector(X,y,10,3,2,1,3,1,0.5,0.5,1)

 

REGRESSION

// Generate a matrix X of 5 random features. y is generated so that the importance of the features of X in predicting y decreases from the 1st to the 3rd, while the remaining 2 are completely unimportant.

> X=randn(200,5)

> y=(2*X(:,1).^2-X(:,3))./X(:,2)

 

// Search for the best features using the ADC as the relevance measure, and forward selection only, as nb is set to 0.

> [BIG,x,Adrem,val] = featureSelector(X,y,20,20,1,0,3,5,0,1,1)

 

// Use (l,r) search with r<>0.

> [BIG,x,Adrem,val] = featureSelector(X,y,20,20,3,1,3,5,0,1,1)

 

COMMENTS:

Created by Laurentiu Adi Tarca. Revised 10.07.2003

This help was created on 12.03.2004