weka.classifiers.bayes
Class NaiveBayesMultinomial

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.bayes.NaiveBayesMultinomial
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable, WeightedInstancesHandler

public class NaiveBayesMultinomial
extends Classifier
implements WeightedInstancesHandler

The core equation for this classifier is Bayes' rule: P[Ci|D] = (P[D|Ci] x P[Ci]) / P[D], where Ci is class i and D is a document.
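
As a rough illustration of Bayes' rule above, the following sketch turns per-class log-likelihoods ln P[D|Ci] and priors P[Ci] into posteriors P[Ci|D]. The class BayesRuleSketch and its posterior method are hypothetical and not part of Weka. Working in log space and shifting by the maximum is an assumption made here for numerical stability; this page does not say how the class itself normalizes.

```java
public class BayesRuleSketch {
    /**
     * Hypothetical helper (not a Weka method): combines ln P[D|Ci] with P[Ci]
     * and normalizes so the result sums to 1. The normalizing sum plays the
     * role of P[D] in Bayes' rule.
     */
    public static double[] posterior(double[] logLikelihood, double[] prior) {
        int k = prior.length;
        double[] logJoint = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < k; i++) {
            // ln(P[D|Ci] x P[Ci])
            logJoint[i] = logLikelihood[i] + Math.log(prior[i]);
            max = Math.max(max, logJoint[i]);
        }
        double[] post = new double[k];
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            // Shift by the maximum so exp() does not underflow for long documents.
            post[i] = Math.exp(logJoint[i] - max);
            sum += post[i];
        }
        for (int i = 0; i < k; i++) {
            post[i] /= sum;
        }
        return post;
    }

    public static void main(String[] args) {
        // Two classes with equal priors; the document is twice as likely under class 0.
        double[] p = posterior(
            new double[] {Math.log(0.02), Math.log(0.01)},
            new double[] {0.5, 0.5});
        System.out.println(p[0] + " " + p[1]);
    }
}
```

With equal priors, the posterior is proportional to the likelihoods alone, so the first class receives about two thirds of the probability mass in this example.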

See Also:
Serialized Form

Field Summary
(package private)  Instances headerInfo
           
private  double[] lnFactorialCache
           
private  int numAttributes
           
private  int numClasses
           
private  double[] probOfClass
           
private  double[][] probOfWordGivenClass
           
 
Fields inherited from class weka.classifiers.Classifier
m_Debug
 
Constructor Summary
NaiveBayesMultinomial()
           
 
Method Summary
 void buildClassifier(Instances instances)
          Generates the classifier.
 double[] distributionForInstance(Instance instance)
          Calculates the class membership probabilities for the given test instance.
 java.lang.String globalInfo()
          Returns a string describing this classifier
 double lnFactorial(int n)
          Fast computation of ln(n!)
static void main(java.lang.String[] argv)
          Main method for testing this class.
private  double probOfDocGivenClass(Instance inst, int classIndex)
          Computes the log of the probability of the document occurring given the class.
 java.lang.String toString()
           
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, getDebug, getOptions, listOptions, makeCopies, setDebug, setOptions
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

probOfWordGivenClass

private double[][] probOfWordGivenClass

probOfClass

private double[] probOfClass

numAttributes

private int numAttributes

numClasses

private int numClasses

lnFactorialCache

private double[] lnFactorialCache

headerInfo

Instances headerInfo
Constructor Detail

NaiveBayesMultinomial

public NaiveBayesMultinomial()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this classifier

Returns:
a description of the classifier suitable for displaying in the explorer/experimenter gui

buildClassifier

public void buildClassifier(Instances instances)
                     throws java.lang.Exception
Generates the classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
instances - set of instances serving as training data
Throws:
java.lang.Exception - if the classifier has not been generated successfully

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Calculates the class membership probabilities for the given test instance.

Overrides:
distributionForInstance in class Classifier
Parameters:
instance - the instance to be classified
Returns:
predicted class probability distribution
Throws:
java.lang.Exception - if there is a problem generating the prediction

probOfDocGivenClass

private double probOfDocGivenClass(Instance inst,
                                   int classIndex)
Computes log(N!) + (sum over all the words i)(log(Pi^ni) - log(ni!)), where N is the total number of words in the document, Pi is the probability of obtaining word i, and ni is the number of times the word at index i occurs in the document.

Parameters:
inst - The instance to be classified
classIndex - The index of the class we are calculating the probability with respect to
Returns:
The log of the probability of the document occurring given the class
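
The formula for this method can be sketched outside Weka as follows. This is a minimal illustration under stated assumptions: the class DocLogLikelihoodSketch, the method logProbOfDoc, and the plain arrays counts (the ni values) and wordProb (the Pi values for one class) are hypothetical stand-ins for the private fields and the Instance argument, and word counts are assumed to be non-negative integers.

```java
public class DocLogLikelihoodSketch {
    /** ln(n!) by direct summation; adequate for a small illustration. */
    public static double lnFactorial(int n) {
        double s = 0.0;
        for (int i = 2; i <= n; i++) {
            s += Math.log(i);
        }
        return s;
    }

    /**
     * Hypothetical helper (not a Weka method).
     * counts[i]   = ni, the number of times word i occurs in the document
     * wordProb[i] = Pi, the probability of word i under the class
     * Returns log(N!) + sum_i (ni * log(Pi) - log(ni!)).
     */
    public static double logProbOfDoc(int[] counts, double[] wordProb) {
        int total = 0;   // N, the total number of words
        double sum = 0.0;
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] == 0) {
                continue; // an absent word contributes log(Pi^0) - log(0!) = 0
            }
            total += counts[i];
            // log(Pi^ni) is computed as ni * log(Pi)
            sum += counts[i] * Math.log(wordProb[i]) - lnFactorial(counts[i]);
        }
        return lnFactorial(total) + sum;
    }

    public static void main(String[] args) {
        // Word 0 occurs twice and word 1 once; both words have probability 0.5.
        double logP = logProbOfDoc(new int[] {2, 1}, new double[] {0.5, 0.5});
        System.out.println(Math.exp(logP));
    }
}
```

For the document in main, the multinomial probability is 3!/(2! x 1!) x 0.5^3 = 0.375, which is what exponentiating the returned log value should recover up to floating-point rounding.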

lnFactorial

public double lnFactorial(int n)
Fast computation of ln(n!) for non-negative ints. Negative ints are passed on to the general gamma-function based version in weka.core.SpecialFunctions. If the current n value is higher than any previous one, the cache is extended and filled to cover it. The common case is reduced to a simple array lookup.

Parameters:
n - the integer
Returns:
ln(n!)
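
The caching behaviour described for this method might look roughly like the sketch below. The class LnFactorialCacheSketch is hypothetical and has no Weka dependency, so negative arguments are rejected here instead of being passed on to weka.core.SpecialFunctions; that substitution, and the use of Arrays.copyOf to grow the cache, are assumptions of the sketch.

```java
import java.util.Arrays;

public class LnFactorialCacheSketch {
    // cache[n] holds ln(n!); it starts with ln(0!) = 0.
    private double[] cache = {0.0};

    public double lnFactorial(int n) {
        if (n < 0) {
            // Assumption of this sketch: the documented class delegates this
            // case to a gamma-function based routine instead of throwing.
            throw new IllegalArgumentException("negative n is outside this sketch");
        }
        if (n >= cache.length) {
            // n is higher than any previous value: extend and fill the cache.
            int oldLength = cache.length;
            cache = Arrays.copyOf(cache, n + 1);
            for (int i = oldLength; i <= n; i++) {
                cache[i] = cache[i - 1] + Math.log(i); // ln(i!) = ln((i-1)!) + ln(i)
            }
        }
        return cache[n]; // common case: a simple array lookup
    }

    public static void main(String[] args) {
        LnFactorialCacheSketch f = new LnFactorialCacheSketch();
        System.out.println(f.lnFactorial(5)); // fills the cache up to n = 5
        System.out.println(f.lnFactorial(3)); // answered from the cache
    }
}
```

Because each new entry is derived from its predecessor, extending the cache costs one logarithm per new value, and repeated calls with small n never recompute anything.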

toString

public java.lang.String toString()

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options