weka.classifiers.functions
Class Logistic

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.functions.Logistic
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable, WeightedInstancesHandler

public class Logistic
extends Classifier
implements OptionHandler, WeightedInstancesHandler

Second implementation for building and using a multinomial logistic regression model with a ridge estimator.

There are some modifications, however, compared to the paper of le Cessie and van Houwelingen (1992):
If there are k classes for n instances with m attributes, the parameter matrix B to be calculated will be an m*(k-1) matrix.
The probability for class j except the last class is
Pj(Xi) = exp(Xi*Bj)/((sum[j=1..(k-1)]exp(Xi*Bj))+1)
The last class has probability
1-(sum[j=1..(k-1)]Pj(Xi)) = 1/((sum[j=1..(k-1)]exp(Xi*Bj))+1)
The (negative) multinomial log-likelihood is thus:
L = -sum[i=1..n]{ sum[j=1..(k-1)](Yij * ln(Pj(Xi))) + (1 - (sum[j=1..(k-1)]Yij)) * ln(1 - sum[j=1..(k-1)]Pj(Xi)) } + ridge * (B^2)
To find the matrix B that minimises L, a Quasi-Newton method searches for the optimal values of the m*(k-1) variables. Note that before the optimization procedure is run, the matrix B is "squeezed" into a vector of length m*(k-1). For details of the optimization procedure, see the weka.core.Optimization class.
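
The formulas above can be sketched in plain Java. This is an illustration only, not the Weka source: the class and method names are hypothetical, "ridge * (B^2)" is assumed to mean ridge times the sum of squared entries of B (the text does not say whether any entry is excluded from the penalty), and the row-by-row layout of the flattened vector is likewise an assumption.

```java
import java.util.Arrays;

/** Illustrative sketch of the model equations; not the Weka source. */
final class LogisticModelSketch {

    /**
     * Class probabilities for one instance x (length m), given the
     * m x (k-1) parameter matrix b. Entry k-1 of the result is the last class.
     */
    static double[] probabilities(double[] x, double[][] b) {
        int kMinus1 = b[0].length;
        double[] p = new double[kMinus1 + 1];
        double denom = 1.0; // the "+1" in the denominator (last class)
        for (int j = 0; j < kMinus1; j++) {
            double dot = 0.0;
            for (int a = 0; a < x.length; a++) {
                dot += x[a] * b[a][j];
            }
            p[j] = Math.exp(dot); // naive exp; may overflow for large Xi*Bj
            denom += p[j];
        }
        for (int j = 0; j < kMinus1; j++) {
            p[j] /= denom;
        }
        p[kMinus1] = 1.0 / denom;
        return p;
    }

    /**
     * Negative log-likelihood plus ridge penalty. y[i] is the class index of
     * instance i in 0..k-1. Assumption: "B^2" is the sum of squared entries.
     */
    static double objective(double[][] xs, int[] y, double[][] b, double ridge) {
        double nll = 0.0;
        for (int i = 0; i < xs.length; i++) {
            // With one-hot Yij, the inner sums reduce to ln of the true class probability.
            nll -= Math.log(probabilities(xs[i], b)[y[i]]);
        }
        double sumSq = 0.0;
        for (double[] row : b) {
            for (double v : row) {
                sumSq += v * v;
            }
        }
        return nll + ridge * sumSq;
    }

    /** Flattens b row by row into a vector of length m*(k-1) (layout assumed). */
    static double[] flatten(double[][] b) {
        int cols = b[0].length;
        double[] v = new double[b.length * cols];
        for (int a = 0; a < b.length; a++) {
            System.arraycopy(b[a], 0, v, a * cols, cols);
        }
        return v;
    }

    public static void main(String[] args) {
        double[][] b = new double[2][2]; // m = 2, k = 3, all parameters zero
        System.out.println(Arrays.toString(probabilities(new double[] {1.0, 2.0}, b)));
    }
}
```

With all parameters at zero, every exp term is 1, so each of the three classes receives probability 1/3.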

Although the original logistic regression does not deal with instance weights, the algorithm is modified slightly to handle them.
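
The text does not say how the weights enter the model. One common approach, assumed here purely for illustration, is to multiply each instance's log-likelihood term by its weight, so that an instance of weight 2 counts the same as two identical instances of weight 1. The class and method names below are hypothetical.

```java
/** Illustrative sketch: instance weights as multipliers on log-likelihood terms. */
final class WeightedLikelihoodSketch {

    /** Same probability formula as in the class description; last entry is the last class. */
    static double[] probabilities(double[] x, double[][] b) {
        int kMinus1 = b[0].length;
        double[] p = new double[kMinus1 + 1];
        double denom = 1.0;
        for (int j = 0; j < kMinus1; j++) {
            double dot = 0.0;
            for (int a = 0; a < x.length; a++) {
                dot += x[a] * b[a][j];
            }
            p[j] = Math.exp(dot);
            denom += p[j];
        }
        for (int j = 0; j < kMinus1; j++) {
            p[j] /= denom;
        }
        p[kMinus1] = 1.0 / denom;
        return p;
    }

    /**
     * Weighted negative log-likelihood (ridge term omitted for brevity).
     * Assumption: w[i] scales the contribution of instance i.
     */
    static double weightedNll(double[][] xs, int[] y, double[] w, double[][] b) {
        double nll = 0.0;
        for (int i = 0; i < xs.length; i++) {
            nll -= w[i] * Math.log(probabilities(xs[i], b)[y[i]]);
        }
        return nll;
    }

    public static void main(String[] args) {
        double[][] b = {{0.5, -0.25}, {0.1, 0.3}};
        double[][] xs = {{1.0, 2.0}};
        System.out.println(weightedNll(xs, new int[] {1}, new double[] {2.0}, b));
    }
}
```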

Reference: le Cessie, S. and van Houwelingen, J.C. (1992). Ridge Estimators in Logistic Regression. Applied Statistics, Vol. 41, No. 1, pp. 191-201.

Missing values are replaced using a ReplaceMissingValues filter, and nominal attributes are transformed into numeric attributes using a NominalToBinary filter.
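
To make the two preprocessing steps concrete, here is a minimal plain-Java sketch of the general idea. It does not use or reproduce the Weka filters, whose exact behaviour may differ; the helper names are hypothetical, NaN stands in for a missing value, and mean replacement and one-indicator-per-value encoding are assumptions chosen for illustration.

```java
/** Illustrative sketch of missing-value replacement and nominal-to-binary encoding. */
final class PreprocessingSketch {

    /** Replaces NaN entries (standing in for missing values) with the mean of the observed ones. */
    static double[] replaceMissingWithMean(double[] column) {
        double sum = 0.0;
        int count = 0;
        for (double v : column) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        double mean = count > 0 ? sum / count : 0.0; // fallback when nothing is observed
        double[] out = column.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = mean;
            }
        }
        return out;
    }

    /** Encodes a nominal value index in 0..numValues-1 as numValues 0/1 indicators. */
    static double[] toBinaryIndicators(int valueIndex, int numValues) {
        double[] out = new double[numValues];
        out[valueIndex] = 1.0;
        return out;
    }

    public static void main(String[] args) {
        double[] filled = replaceMissingWithMean(new double[] {1.0, Double.NaN, 3.0});
        System.out.println(java.util.Arrays.toString(filled));
    }
}
```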

Valid options are:

-D
Turn on debugging output.

-R
Set the ridge parameter for the log-likelihood.

-M
Set the maximum number of iterations (default -1, iterates until convergence).
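
The flag format can be illustrated with a small stand-alone parser. This is a hypothetical stand-in written for this page, not Weka's own option handling. The default ridge value is not stated in the text above, so the sketch leaves it unset (NaN).

```java
/** Illustrative sketch of the -D / -R / -M flag format; not Weka's parser. */
final class LogisticOptionsSketch {
    boolean debug = false;
    double ridge = Double.NaN; // default not stated in the text; left unset here
    int maxIts = -1;           // -1 means: iterate until convergence

    static LogisticOptionsSketch parse(String[] options) {
        LogisticOptionsSketch parsed = new LogisticOptionsSketch();
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            if (flag.equals("-D")) {
                parsed.debug = true;
            } else if (flag.equals("-R") || flag.equals("-M")) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("Missing value for " + flag);
                }
                String value = options[++i]; // consume the flag's argument
                if (flag.equals("-R")) {
                    parsed.ridge = Double.parseDouble(value);
                } else {
                    parsed.maxIts = Integer.parseInt(value);
                }
            } else {
                throw new IllegalArgumentException("Unsupported option: " + flag);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        LogisticOptionsSketch o = parse(new String[] {"-D", "-R", "0.01", "-M", "50"});
        System.out.println(o.debug + " " + o.ridge + " " + o.maxIts);
    }
}
```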

Version:
$Revision: 1.32 $
Author:
Xin Xu (xx5@cs.waikato.ac.nz)
See Also:
Serialized Form

Nested Class Summary
private  class Logistic.OptEng
           
 
Field Summary
private  RemoveUseless m_AttFilter
           
protected  int m_ClassIndex
          The index of the class attribute
protected  double[][] m_Data
          The data saved as a matrix
protected  boolean m_Debug
          Debugging output
protected  double m_LL
          Log-likelihood of the fitted model
private  int m_MaxIts
          The maximum number of iterations.
private  NominalToBinary m_NominalToBinary
          The filter used to make attributes numeric.
protected  int m_NumClasses
          The number of class labels
protected  int m_NumPredictors
          The number of attributes in the model
protected  double[][] m_Par
          The coefficients (optimized parameters) of the model
private  ReplaceMissingValues m_ReplaceMissingValues
          The filter used to get rid of missing values.
protected  double m_Ridge
          The ridge parameter.
 
Constructor Summary
Logistic()
           
 
Method Summary
 void buildClassifier(Instances train)
          Builds the classifier
 java.lang.String debugTipText()
          Returns the tip text for this property
 double[] distributionForInstance(Instance instance)
          Computes the distribution for a given instance
private  double[] evaluateProbability(double[] data)
          Compute the posterior distribution using optimized parameter values and the testing instance.
 boolean getDebug()
          Gets whether debugging output will be printed.
 int getMaxIts()
          Get the value of MaxIts.
 java.lang.String[] getOptions()
          Gets the current settings of the classifier.
 double getRidge()
          Gets the ridge in the log-likelihood.
 java.lang.String globalInfo()
          Returns a string describing this classifier
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
static void main(java.lang.String[] argv)
          Main method for testing this class.
 java.lang.String maxItsTipText()
          Returns the tip text for this property
 java.lang.String ridgeTipText()
          Returns the tip text for this property
 void setDebug(boolean debug)
          Sets whether debugging output will be printed.
 void setMaxIts(int newMaxIts)
          Set the value of MaxIts.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setRidge(double ridge)
          Sets the ridge in the log-likelihood.
 java.lang.String toString()
          Gets a string describing the classifier.
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Par

protected double[][] m_Par
The coefficients (optimized parameters) of the model


m_Data

protected double[][] m_Data
The data saved as a matrix


m_NumPredictors

protected int m_NumPredictors
The number of attributes in the model


m_ClassIndex

protected int m_ClassIndex
The index of the class attribute


m_NumClasses

protected int m_NumClasses
The number of class labels


m_Ridge

protected double m_Ridge
The ridge parameter.


m_AttFilter

private RemoveUseless m_AttFilter

m_NominalToBinary

private NominalToBinary m_NominalToBinary
The filter used to make attributes numeric.


m_ReplaceMissingValues

private ReplaceMissingValues m_ReplaceMissingValues
The filter used to get rid of missing values.


m_Debug

protected boolean m_Debug
Debugging output


m_LL

protected double m_LL
Log-likelihood of the fitted model


m_MaxIts

private int m_MaxIts
The maximum number of iterations.

Constructor Detail

Logistic

public Logistic()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this classifier

Returns:
a description of the classifier suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class Classifier
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-D
Turn on debugging output.

-R ridge
Set the ridge parameter for the log-likelihood.

-M num
Set the maximum number of iterations. (default -1, until convergence)

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class Classifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class Classifier
Returns:
an array of strings suitable for passing to setOptions

debugTipText

public java.lang.String debugTipText()
Returns the tip text for this property

Overrides:
debugTipText in class Classifier
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setDebug

public void setDebug(boolean debug)
Sets whether debugging output will be printed.

Overrides:
setDebug in class Classifier
Parameters:
debug - true if debugging output should be printed

getDebug

public boolean getDebug()
Gets whether debugging output will be printed.

Overrides:
getDebug in class Classifier
Returns:
true if debugging output will be printed

ridgeTipText

public java.lang.String ridgeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setRidge

public void setRidge(double ridge)
Sets the ridge in the log-likelihood.

Parameters:
ridge - the ridge

getRidge

public double getRidge()
Gets the ridge in the log-likelihood.

Returns:
the ridge

maxItsTipText

public java.lang.String maxItsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getMaxIts

public int getMaxIts()
Get the value of MaxIts.

Returns:
Value of MaxIts.

setMaxIts

public void setMaxIts(int newMaxIts)
Set the value of MaxIts.

Parameters:
newMaxIts - Value to assign to MaxIts.

buildClassifier

public void buildClassifier(Instances train)
                     throws java.lang.Exception
Builds the classifier

Specified by:
buildClassifier in class Classifier
Parameters:
train - the training data to be used for generating the logistic regression model.
Throws:
java.lang.Exception - if the classifier could not be built successfully

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Computes the distribution for a given instance

Overrides:
distributionForInstance in class Classifier
Parameters:
instance - the instance for which distribution is computed
Returns:
the distribution
Throws:
java.lang.Exception - if the distribution can't be computed successfully

evaluateProbability

private double[] evaluateProbability(double[] data)
Compute the posterior distribution using optimized parameter values and the testing instance.

Parameters:
data - the testing instance
Returns:
the posterior probability distribution

toString

public java.lang.String toString()
Gets a string describing the classifier.

Returns:
a string describing the classifier built.

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain the command line arguments to the scheme (see Evaluation)