itss_ADC

 

 

USE:

[ADC,HX,Hy,binX,biny] = itss_ADC(X,y,nbx,nby)

 

DESCRIPTION:

Computes the Asymmetric Dependency Coefficient (ADC), an information-theoretic measure of dependency, between a feature variable X and a response variable y. For details, see D.V. Sridhar, Computers chem. Engng, Vol. 22, No. 4/5, pp. 613-626.
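In information-theoretic terms, the coefficient is usually written as the mutual information between X and y normalized by the entropy of y. The formulation below sketches that standard definition. It is assumed, not checked here, to match the one in the cited paper. The probabilities P_k are estimated from the bin counts described under ARGUMENTS.

```latex
\mathrm{ADC}(X;y) = \frac{I(X;y)}{H(y)}, \qquad
I(X;y) = H(X) + H(y) - H(X,y), \qquad
H(Z) = -\sum_{k} P_k \log P_k
```

For discrete variables 0 <= I(X;y) <= H(y), so the ratio lies in [0,1]. It is asymmetric because the normalization uses H(y) rather than H(X).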

 

ARGUMENTS:

X must be an m x p matrix (p>=1) and y a vector with m rows. y may be a continuous response (REGRESSION problems) or a categorical one (CLASSIFICATION problems).

nbx and nby are the numbers of small regions (bins) into which the range of each component of X and of y, respectively, is divided. Within each region the probability density function is considered constant. Recommended values for nbx are 10 to 50, while the limits are 2<nbx<100. If y is the index of the class to which each row of X belongs, nby must be equal to the number of classes.
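The reason nby must match the number of classes can be illustrated by discretizing a column of class labels into nb bins. The sketch below assumes equal-width bins spanning the data range. Both this scheme and the helper name `equal_width_bins` are hypothetical, and itss_ADC's own binning may differ. With labels 1, 2, 3 and nb=3, each class gets its own bin. With nb=2, classes 2 and 3 fall into the same bin and the class information is blurred.

```python
def equal_width_bins(values, nb):
    """Map each value to a bin index 0..nb-1 using nb equal-width bins
    spanning [min(values), max(values)]. Hypothetical helper, not part of tarcafun."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # a constant column falls entirely into a single bin
        return [0] * len(values)
    width = (hi - lo) / nb
    # the maximum value would land in bin nb, so clamp it into the last bin
    return [min(int((v - lo) / width), nb - 1) for v in values]


if __name__ == "__main__":
    labels = [1, 1, 2, 2, 3, 3]
    print(equal_width_bins(labels, 3))  # [0, 0, 1, 1, 2, 2]: one bin per class
    print(equal_width_bins(labels, 2))  # [0, 0, 1, 1, 1, 1]: classes 2 and 3 merged
```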

 

VALUES:

ADC is the Asymmetric Dependency Coefficient, bounded in [0,1]. 0 means that X provides no information on y, while 1 means that y is completely predictable if X is known.

HX and Hy are the entropies of X and y. binX and biny are for internal use by the tarcafun package.
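A minimal pure-Python sketch of how ADC, HX and Hy could be obtained from histogram counts is shown here. It assumes equal-width bins and base-2 logarithms, and both are assumptions: itss_ADC may bin differently or use another log base. A different base rescales HX and Hy but leaves ADC unchanged. The function name `adc` and its list-of-rows input format are hypothetical. For p>1 each row's cell is the tuple of its per-column bin indices, so HX is the entropy of the joint histogram of all p components.

```python
import math
from collections import Counter


def _bins(values, nb):
    """Equal-width discretization into nb bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / nb
    # clamp the maximum value into the last bin
    return [min(int((v - lo) / width), nb - 1) for v in values]


def _entropy(cells):
    """Shannon entropy (bits) of the empirical distribution of hashable cells."""
    n = len(cells)
    return -sum((c / n) * math.log2(c / n) for c in Counter(cells).values())


def adc(X, y, nbx, nby):
    """Hypothetical sketch: return (ADC, HX, Hy) for X (m rows of p values) and y (m values)."""
    p = len(X[0])
    per_column = [_bins([row[j] for row in X], nbx) for j in range(p)]
    x_cells = list(zip(*per_column))      # joint p-dimensional bin of each row
    y_cells = _bins(list(y), nby)
    hx = _entropy(x_cells)
    hy = _entropy(y_cells)
    hxy = _entropy(list(zip(x_cells, y_cells)))
    mutual_info = hx + hy - hxy           # I(X;y) = H(X) + H(y) - H(X,y)
    # convention of this sketch: a constant y (Hy = 0) is reported as 0
    ratio = mutual_info / hy if hy > 0 else 0.0
    return ratio, hx, hy


if __name__ == "__main__":
    x = [[float(i)] for i in range(100)]
    print(adc(x, [float(i) for i in range(100)], 10, 10)[0])  # y equals X: 1.0
    print(adc([[0.0]] * 4, [0.0, 1.0, 2.0, 3.0], 2, 2)[0])    # constant X: 0.0
```

The two calls at the end show the bounds: a y that is a copy of X gives 1, and a constant X carries no information on y and gives 0.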

 

EXAMPLES:

a) REGRESSION

// generate a 5-feature matrix X of random numbers. y is generated so that the importance of the features of X in predicting y decreases from the 1st to the 3rd, while the remaining 2 are completely unimportant.

> X=randn(200,5)

> y=(2*X(:,1).^2-X(:,3))./X(:,2)

// compute ADC for the first feature

> ADC1 = itss_ADC(X(:,1),y,20,20)

// compute ADC for the second feature

> ADC2 = itss_ADC(X(:,2),y,20,20)

// compute ADC for features 2 and 3 jointly

> ADC23 = itss_ADC(X(:,2:3),y,20,20)

 

b) CLASSIFICATION

// load the iris data set (3 classes, 50 samples each)

> irisdata

// compute ADC for the first feature

> ADC1 = itss_ADC(X(:,1),y,20,3)

// compute ADC for all features

> ADC2 = itss_ADC(X,y,20,3)

 

COMMENTS:

Created by Laurentiu Adi Tarca. Revised 10.07.2003

This help was created on 12.03.2004