weka.classifiers.trees.j48
Class C45Split

java.lang.Object
  extended by weka.classifiers.trees.j48.ClassifierSplitModel
      extended by weka.classifiers.trees.j48.C45Split
All Implemented Interfaces:
java.lang.Cloneable, java.io.Serializable

public class C45Split
extends ClassifierSplitModel

Class implementing a C4.5-type split on an attribute.

Version:
$Revision: 1.8 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
private static GainRatioSplitCrit gainRatioCrit
          Static reference to splitting criterion.
private static InfoGainSplitCrit infoGainCrit
          Static reference to splitting criterion.
private  int m_attIndex
          Attribute to split on.
private  int m_complexityIndex
          Desired number of branches.
private  double m_gainRatio
          GainRatio of split.
private  int m_index
          Number of split points.
private  double m_infoGain
          InfoGain of split.
private  int m_minNoObj
          Minimum number of objects in a split.
private  double m_splitPoint
          Value of split point.
private  double m_sumOfWeights
          The sum of the weights of the instances.
 
Fields inherited from class weka.classifiers.trees.j48.ClassifierSplitModel
m_distribution, m_numSubsets
 
Constructor Summary
C45Split(int attIndex, int minNoObj, double sumOfWeights)
          Initializes the split model.
 
Method Summary
 int attIndex()
          Returns index of attribute for which split was generated.
 void buildClassifier(Instances trainInstances)
          Creates a C4.5-type split on the given data.
 double classProb(int classIndex, Instance instance, int theSubset)
          Gets class probability for instance.
 double codingCost()
          Returns coding cost for split (used in rule learner).
 double gainRatio()
          Returns (C4.5-type) gain ratio for the generated split.
private  void handleEnumeratedAttribute(Instances trainInstances)
          Creates split on enumerated attribute.
private  void handleNumericAttribute(Instances trainInstances)
          Creates split on numeric attribute.
 double infoGain()
          Returns (C4.5-type) information gain for the generated split.
 java.lang.String leftSide(Instances data)
          Prints the left side of the condition.
 double[][] minsAndMaxs(Instances data, double[][] minsAndMaxs, int index)
          Returns the minima and maxima (minsAndMaxs) for the index-th subset.
 void resetDistribution(Instances data)
          Sets distribution associated with model.
 java.lang.String rightSide(int index, Instances data)
          Prints the condition satisfied by instances in a subset.
 void setSplitPoint(Instances allInstances)
          Sets the split point to the greatest value in the given data that is smaller than or equal to the old split point.
 java.lang.String sourceExpression(int index, Instances data)
          Returns a string containing Java source code equivalent to the test made at this node.
 double[] weights(Instance instance)
          Returns weights if instance is assigned to more than one subset.
 int whichSubset(Instance instance)
          Returns index of subset instance is assigned to.
 
Methods inherited from class weka.classifiers.trees.j48.ClassifierSplitModel
checkModel, classifyInstance, classProbLaplace, clone, distribution, dumpLabel, dumpModel, numSubsets, sourceClass, split
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_complexityIndex

private int m_complexityIndex
Desired number of branches.


m_attIndex

private int m_attIndex
Attribute to split on.


m_minNoObj

private int m_minNoObj
Minimum number of objects in a split.


m_splitPoint

private double m_splitPoint
Value of split point.


m_infoGain

private double m_infoGain
InfoGain of split.


m_gainRatio

private double m_gainRatio
GainRatio of split.


m_sumOfWeights

private double m_sumOfWeights
The sum of the weights of the instances.


m_index

private int m_index
Number of split points.


infoGainCrit

private static InfoGainSplitCrit infoGainCrit
Static reference to splitting criterion.


gainRatioCrit

private static GainRatioSplitCrit gainRatioCrit
Static reference to splitting criterion.

Constructor Detail

C45Split

public C45Split(int attIndex,
                int minNoObj,
                double sumOfWeights)
Initializes the split model.

Method Detail

buildClassifier

public void buildClassifier(Instances trainInstances)
                     throws java.lang.Exception
Creates a C4.5-type split on the given data. Assumes that none of the class values is missing.

Specified by:
buildClassifier in class ClassifierSplitModel
Throws:
java.lang.Exception - if something goes wrong

attIndex

public final int attIndex()
Returns index of attribute for which split was generated.


classProb

public final double classProb(int classIndex,
                              Instance instance,
                              int theSubset)
                       throws java.lang.Exception
Gets class probability for instance.

Overrides:
classProb in class ClassifierSplitModel
Throws:
java.lang.Exception - if something goes wrong

codingCost

public final double codingCost()
Returns coding cost for split (used in rule learner).

Overrides:
codingCost in class ClassifierSplitModel

gainRatio

public final double gainRatio()
Returns (C4.5-type) gain ratio for the generated split.


handleEnumeratedAttribute

private void handleEnumeratedAttribute(Instances trainInstances)
                                throws java.lang.Exception
Creates split on enumerated attribute.

Throws:
java.lang.Exception - if something goes wrong

handleNumericAttribute

private void handleNumericAttribute(Instances trainInstances)
                             throws java.lang.Exception
Creates split on numeric attribute.

Throws:
java.lang.Exception - if something goes wrong

infoGain

public final double infoGain()
Returns (C4.5-type) information gain for the generated split.
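
As a rough illustration of the two quantities returned by infoGain() and gainRatio(), the following self-contained sketch computes information gain and gain ratio from a table of per-branch class weights. The class GainRatioSketch and its method names are hypothetical and are not part of Weka. The sketch ignores missing values and any correction C4.5 applies to numeric attributes, so it should not be read as the code behind InfoGainSplitCrit or GainRatioSplitCrit.

```java
import java.util.Arrays;

/** Illustrative only; not Weka's InfoGainSplitCrit or GainRatioSplitCrit. */
final class GainRatioSketch {

    /** Logarithm to base 2. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Entropy in bits of a vector of non-negative (weighted) counts. */
    static double entropy(double[] counts) {
        double total = Arrays.stream(counts).sum();
        double h = 0.0;
        for (double c : counts) {
            if (c > 0.0) {
                h -= (c / total) * log2(c / total);
            }
        }
        return h;
    }

    /** Information gain; bags[b][c] is the weight of class c in branch b. */
    static double infoGain(double[][] bags) {
        int numClasses = bags[0].length;
        double[] classTotals = new double[numClasses];
        double total = 0.0;
        for (double[] bag : bags) {
            for (int c = 0; c < numClasses; c++) {
                classTotals[c] += bag[c];
                total += bag[c];
            }
        }
        // Weighted average of the class entropy inside each branch.
        double remainder = 0.0;
        for (double[] bag : bags) {
            double bagTotal = Arrays.stream(bag).sum();
            remainder += (bagTotal / total) * entropy(bag);
        }
        return entropy(classTotals) - remainder;
    }

    /** Gain ratio: information gain divided by the entropy of the branch sizes. */
    static double gainRatio(double[][] bags) {
        double[] bagTotals = new double[bags.length];
        for (int b = 0; b < bags.length; b++) {
            bagTotals[b] = Arrays.stream(bags[b]).sum();
        }
        double splitInfo = entropy(bagTotals);
        return splitInfo == 0.0 ? 0.0 : infoGain(bags) / splitInfo;
    }

    public static void main(String[] args) {
        // Two branches, two classes; each branch contains a single class.
        double[][] bags = { {4.0, 0.0}, {0.0, 4.0} };
        System.out.println("infoGain  = " + infoGain(bags));
        System.out.println("gainRatio = " + gainRatio(bags));
    }
}
```

Dividing by the split information penalizes splits with many small branches, which is the usual motivation for preferring gain ratio over raw information gain.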


leftSide

public final java.lang.String leftSide(Instances data)
Prints the left side of the condition.

Specified by:
leftSide in class ClassifierSplitModel
Parameters:
data - training set.

rightSide

public final java.lang.String rightSide(int index,
                                        Instances data)
Prints the condition satisfied by instances in a subset.

Specified by:
rightSide in class ClassifierSplitModel
Parameters:
index - index of the subset
data - training set.
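
To show how leftSide and rightSide might combine into a readable condition, here is a minimal sketch. It assumes the left side is the attribute name and the right side is " = value" for a nominal attribute or " <= point" / " > point" for a numeric one. That text format, the class ConditionTextSketch, and its parameters are assumptions made for illustration, not the documented output of C45Split.

```java
/** Illustrative only; the exact text produced by C45Split may differ. */
final class ConditionTextSketch {

    /** Left side of the condition: assumed here to be the attribute name. */
    static String leftSide(String attributeName) {
        return attributeName;
    }

    /**
     * Right side of the condition for the subset with the given index.
     * For a nominal attribute the index selects a value; for a numeric
     * attribute, index 0 is the "<=" branch and any other index the ">" branch.
     */
    static String rightSide(int index, boolean nominal,
                            String[] nominalValues, double splitPoint) {
        if (nominal) {
            return " = " + nominalValues[index];
        }
        return (index == 0 ? " <= " : " > ") + splitPoint;
    }

    public static void main(String[] args) {
        String[] outlook = {"sunny", "overcast", "rainy"};
        System.out.println(leftSide("outlook") + rightSide(1, true, outlook, 0.0));
        System.out.println(leftSide("humidity") + rightSide(0, false, null, 2.5));
    }
}
```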

sourceExpression

public final java.lang.String sourceExpression(int index,
                                               Instances data)
Returns a string containing Java source code equivalent to the test made at this node. The instance being tested is called "i".

Specified by:
sourceExpression in class ClassifierSplitModel
Parameters:
index - index of the nominal value tested
data - the data containing instance structure info
Returns:
the Java source expression as a String
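
The kind of string sourceExpression returns can be sketched as follows. The sketch assumes the generated code treats the instance "i" as an Object[] indexed by attribute. That representation, the exact expression text, and the class SourceExpressionSketch are assumptions made for illustration only.

```java
/** Illustrative only; the expression text actually generated may differ. */
final class SourceExpressionSketch {

    /** Test for a nominal attribute: compares the stored value with a label. */
    static String nominalTest(int attIndex, String value) {
        return "i[" + attIndex + "].equals(\"" + value + "\")";
    }

    /** Test for a numeric attribute: index 0 is the "<=" branch, otherwise ">". */
    static String numericTest(int attIndex, int index, double splitPoint) {
        String op = (index == 0) ? " <= " : " > ";
        return "((Double) i[" + attIndex + "]).doubleValue()" + op + splitPoint;
    }

    public static void main(String[] args) {
        System.out.println(nominalTest(0, "sunny"));
        System.out.println(numericTest(2, 1, 2.5));
    }
}
```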

setSplitPoint

public final void setSplitPoint(Instances allInstances)
Sets the split point to the greatest value in the given data that is smaller than or equal to the old split point. (C4.5 does this for some strange reason).
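
The adjustment described above can be sketched in isolation. The class SplitPointSketch and the method snap are hypothetical. Two choices are assumptions of this sketch rather than documented behaviour: missing values are represented as NaN, and the old split point is returned unchanged when no value qualifies.

```java
/** Illustrative only; not the Weka implementation of setSplitPoint. */
final class SplitPointSketch {

    /**
     * Returns the greatest non-missing value that is smaller than or equal
     * to oldSplitPoint. Missing values are encoded as NaN (an assumption).
     * If no value qualifies, oldSplitPoint is returned (also an assumption).
     */
    static double snap(double[] values, double oldSplitPoint) {
        double best = Double.NEGATIVE_INFINITY;
        boolean found = false;
        for (double v : values) {
            if (!Double.isNaN(v) && v <= oldSplitPoint && v > best) {
                best = v;
                found = true;
            }
        }
        return found ? best : oldSplitPoint;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 4.0, Double.NaN};
        // A threshold lying between 2.0 and 4.0 moves down to an observed value.
        System.out.println(snap(values, 3.0));
    }
}
```

The effect is that the threshold always coincides with a value that actually occurs in the data, and no instance changes branch as a result.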


minsAndMaxs

public final double[][] minsAndMaxs(Instances data,
                                    double[][] minsAndMaxs,
                                    int index)
Returns the minima and maxima (minsAndMaxs) for the index-th subset.


resetDistribution

public void resetDistribution(Instances data)
                       throws java.lang.Exception
Sets distribution associated with model.

Overrides:
resetDistribution in class ClassifierSplitModel
Throws:
java.lang.Exception

weights

public final double[] weights(Instance instance)
Returns weights if instance is assigned to more than one subset. Returns null if instance is only assigned to one subset.

Specified by:
weights in class ClassifierSplitModel

whichSubset

public final int whichSubset(Instance instance)
                      throws java.lang.Exception
Returns index of subset instance is assigned to. Returns -1 if instance is assigned to more than one subset.

Specified by:
whichSubset in class ClassifierSplitModel
Throws:
java.lang.Exception - if something goes wrong
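
The contract shared by weights and whichSubset can be sketched for a numeric split. The sketch assumes, as in C4.5, that an instance is assigned to more than one subset exactly when its value for the split attribute is missing. It encodes a missing value as NaN and distributes such an instance in proportion to the branch weights. The class SubsetAssignmentSketch and its parameters are hypothetical.

```java
/** Illustrative only; mirrors the documented return conventions, not Weka's code. */
final class SubsetAssignmentSketch {

    /** Returns 0 or 1 for a known value, or -1 if the value is missing (NaN). */
    static int whichSubset(double value, double splitPoint) {
        if (Double.isNaN(value)) {
            return -1; // assigned to more than one subset
        }
        return value <= splitPoint ? 0 : 1;
    }

    /**
     * Returns null for a known value. For a missing value, returns the
     * fraction of the total weight that falls into each branch.
     */
    static double[] weights(double value, double[] branchWeights) {
        if (!Double.isNaN(value)) {
            return null; // assigned to exactly one subset
        }
        double total = 0.0;
        for (double w : branchWeights) {
            total += w;
        }
        double[] result = new double[branchWeights.length];
        for (int i = 0; i < branchWeights.length; i++) {
            result[i] = branchWeights[i] / total;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] branchWeights = {3.0, 1.0};
        System.out.println(whichSubset(1.5, 2.0));
        System.out.println(whichSubset(Double.NaN, 2.0));
        System.out.println(java.util.Arrays.toString(weights(Double.NaN, branchWeights)));
    }
}
```

A caller would typically check whichSubset first and consult weights only when it returns -1.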