itss_ADC
USE:
[ADC,HX,Hy,binX,biny] = itss_ADC(X,y,nbx,nby)
DESCRIPTION:
Computes the Asymmetric Dependency Coefficient (ADC), an information-theoretic
measure of the dependency between a feature variable X and a response variable y.
For details see Computers chem. Engng, Vol. 22, No. 4/5, pp. 613-626, by D.V. Sridhar.
ARGUMENTS:
X must be an m x p matrix (p>=1) and y a vector with m rows. y may be a
continuous response (REGRESSION problems) or a categorical one
(CLASSIFICATION problems).
nbx and nby are the numbers of small regions within which the probability
density function of each component of X and of y, respectively, is considered
constant. Recommended values for nbx are 10 to 50; the limits are 2<nbx<100.
If y is the class index of each row of X, nby must equal the number of classes.
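To illustrate what nbx and nby control, here is a minimal Python sketch. It is not part of the tarcafun package. The name equal_width_bins is hypothetical, and the use of equal-width regions is an assumption, since this help does not say how the regions are chosen. The sketch maps each value of a column to one of nb regions, and shows why nby should equal the number of classes: with class indices as y, each class then falls into its own region.

```python
import numpy as np

def equal_width_bins(v, nb):
    """Map each value of the 1-D array v to a region index in 0..nb-1.

    Equal-width regions are an assumption made for illustration only.
    """
    v = np.asarray(v, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:
        # A constant column occupies a single region.
        return np.zeros(v.shape, dtype=int)
    edges = np.linspace(lo, hi, nb + 1)
    # Only the interior edges are used, so the maximum lands in the last region.
    return np.digitize(v, edges[1:-1])

# Class indices 1..3 with nb = 3: every class gets its own region.
print(equal_width_bins([1, 2, 3, 1, 2, 3], 3))
```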
VALUES:
ADC is the Asymmetric Dependency Coefficient, bounded in [0,1]. 0 means that
X provides no information on y, while 1 means that y is completely
predictable if X is known.
HX and Hy are the entropies of X and y. binX and biny are for internal use
by the tarcafun package.
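The Python sketch below shows how such a coefficient can be estimated from histograms. It is not the tarcafun implementation. It assumes that the ADC is the mutual information between X and y divided by the entropy of y, which is consistent with the [0,1] bounds described above, and it assumes equal-width regions. The names adc_sketch and entropy are hypothetical.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (natural log) of a histogram given as raw counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def adc_sketch(X, y, nbx, nby):
    """Histogram estimate of I(X;y)/H(y); returns (ADC, HX, Hy)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a vector as one feature
    y = np.asarray(y, dtype=float).ravel()
    p = X.shape[1]
    # Joint histogram: nbx regions per column of X, nby regions for y.
    joint, _ = np.histogramdd(np.column_stack([X, y]), bins=[nbx] * p + [nby])
    Hxy = entropy(joint.ravel())
    HX = entropy(joint.sum(axis=-1).ravel())              # marginal over y
    Hy = entropy(joint.sum(axis=tuple(range(p))).ravel())  # marginal over X
    if Hy == 0:
        return 0.0, HX, Hy                  # constant y: nothing to predict
    return (HX + Hy - Hxy) / Hy, HX, Hy

# y fully determined by X gives a value of 1 (up to rounding).
x = np.arange(100.0)
print(adc_sketch(x, x, 10, 10)[0])
```

The joint histogram has nbx^p * nby cells, so this direct approach is only practical for a small number of features at a time.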
EXAMPLES:
a) REGRESSION
// generate a 5-feature matrix X of random numbers. y is generated in such
// a way that the importance of the features of X in predicting y decreases
// from the 1st to the 3rd, while the remaining 2 are completely unimportant.
> X=randn(200,5)
> y=(2*X(:,1).^2-X(:,3))./X(:,2)
// compute ADC for the first feature
> ADC1 = itss_ADC(X(:,1),y,20,20)
// compute ADC for the second feature
> ADC2 = itss_ADC(X(:,2),y,20,20)
// compute ADC for features 2 and 3 simultaneously
> ADC23 = itss_ADC(X(:,2:3),y,20,20)
b) CLASSIFICATION
// load the iris data set. There are 3 classes with 50 samples each
> irisdata
// compute ADC for the first feature
> ADC1 = itss_ADC(X(:,1),y,20,3)
// compute ADC for all features
> ADC2 = itss_ADC(X,y,20,3)
COMMENTS:
Created by Laurentiu Adi Tarca. Revised 10.07.2003
This help was created on 12.03.2004