weka.associations
Class Apriori

java.lang.Object
  extended by weka.associations.Associator
      extended by weka.associations.Apriori
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable

public class Apriori
extends Associator
implements OptionHandler

Class implementing an Apriori-type algorithm. Iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence.
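
The support-lowering loop described above can be sketched in isolation. This is a minimal illustration and not the Weka source: the class name SupportLoweringSketch, the method lowerSupport, and the ruleCounter callback (a stand-in for the actual rule mining) are all hypothetical, and the sketch assumes the search starts at the upper bound and steps down by delta, which may differ from where Weka actually starts.

```java
import java.util.function.DoubleToIntFunction;

final class SupportLoweringSketch {

    /**
     * Lowers the minimum support from upperBound in steps of delta until
     * ruleCounter reports at least numRules rules or lowerBound is reached.
     * Returns the last support level that was tried. Assumes delta > 0.
     */
    static double lowerSupport(double upperBound, double lowerBound, double delta,
                               int numRules, DoubleToIntFunction ruleCounter) {
        // Count steps with an integer to avoid accumulating floating-point drift.
        int maxSteps = (int) Math.floor((upperBound - lowerBound) / delta + 1e-9);
        double minSupport = upperBound;
        for (int step = 0; step <= maxSteps; step++) {
            minSupport = upperBound - step * delta;
            if (ruleCounter.applyAsInt(minSupport) >= numRules) {
                break; // enough rules were found at this support level
            }
        }
        return minSupport;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for rule mining: 12 rules once support <= 0.82, else 3.
        DoubleToIntFunction toyCounter = s -> s <= 0.82 ? 12 : 3;
        // The upper bound of 1.0 is an arbitrary value chosen for the demo.
        System.out.println(lowerSupport(1.0, 0.1, 0.05, 10, toyCounter));
    }
}
```

A smaller delta gives a finer search at the cost of more mining passes, since each step repeats the itemset and rule generation.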

Reference: R. Agrawal, R. Srikant (1994). Fast algorithms for mining association rules in large databases. Proc International Conference on Very Large Databases, pp. 478-499. Santiago, Chile: Morgan Kaufmann, Los Altos, CA.

Valid options are:

-N required number of rules
The required number of rules (default: 10).

-T type of metric by which to sort rules
0 = confidence | 1 = lift | 2 = leverage | 3 = conviction.

-C minimum confidence of a rule
The minimum confidence of a rule (default: 0.9).

-D delta for minimum support
The delta by which the minimum support is decreased in each iteration (default: 0.05).

-U upper bound for minimum support
The upper bound for minimum support. Don't explicitly look for rules with more than this level of support.

-M lower bound for minimum support
The lower bound for the minimum support (default = 0.1).

-S significance level
If used, rules are tested for significance at the given level. Slower (default = no significance testing).

-R
If set then columns that contain all missing values are removed from the data.

-I
If set the itemsets found are also output (default = no).
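
The four metric types selectable with -T can be illustrated with their common textbook definitions, computed from transaction counts. This is a sketch only: the class RuleMetricsSketch and its method names are hypothetical, and Weka's own computation may differ in details (for example, smoothing in the conviction denominator).

```java
final class RuleMetricsSketch {

    /** confidence(A => B) = count(A and B) / count(A). */
    static double confidence(int both, int premise) {
        return (double) both / premise;
    }

    /** lift = confidence / P(B); values above 1 suggest a positive association. */
    static double lift(int both, int premise, int consequence, int total) {
        return confidence(both, premise) / ((double) consequence / total);
    }

    /** leverage = P(A and B) - P(A) * P(B). */
    static double leverage(int both, int premise, int consequence, int total) {
        double pBoth = (double) both / total;
        double pPremise = (double) premise / total;
        double pConsequence = (double) consequence / total;
        return pBoth - pPremise * pConsequence;
    }

    /** conviction = P(A) * P(not B) / P(A and not B); infinite if the rule never fails. */
    static double conviction(int both, int premise, int consequence, int total) {
        double pPremise = (double) premise / total;
        double pNotConsequence = 1.0 - (double) consequence / total;
        double pPremiseNotConsequence = (double) (premise - both) / total;
        return pPremise * pNotConsequence / pPremiseNotConsequence;
    }

    public static void main(String[] args) {
        // Toy counts: 100 transactions, 40 contain A, 50 contain B, 30 contain both.
        System.out.println(confidence(30, 40));
        System.out.println(lift(30, 40, 50, 100));
        System.out.println(leverage(30, 40, 50, 100));
        System.out.println(conviction(30, 40, 50, 100));
    }
}
```

Note that only confidence is bounded by 1, so a -C threshold such as 0.9 has a different meaning when another metric type is selected.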

Version:
$Revision: 1.15 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Mark Hall (mhall@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
protected static int CONFIDENCE
          Metric types.
protected static int CONVICTION
           
protected static int LEVERAGE
           
protected static int LIFT
           
protected  FastVector[] m_allTheRules
          The list of all generated rules.
protected  int m_cycles
          Number of cycles used before required number of rules was found.
protected  double m_delta
          Delta by which m_minSupport is decreased in each iteration.
protected  FastVector m_hashtables
          The same information stored in hash tables.
protected  Instances m_instances
          The instances (transactions) to be used for generating the association rules.
protected  double m_lowerBoundMinSupport
          The lower bound for the minimum support.
protected  FastVector m_Ls
          The set of all sets of itemsets L.
protected  int m_metricType
          The selected metric type.
protected  double m_minMetric
          The minimum metric score.
protected  double m_minSupport
          The minimum support.
protected  int m_numRules
          The maximum number of rules that are output.
protected  boolean m_outputItemSets
          Output itemsets found?
protected  boolean m_removeMissingCols
           
protected  double m_significanceLevel
          Significance level for optional significance test.
protected  double m_upperBoundMinSupport
          The upper bound on the support
protected  boolean m_verbose
          Report progress iteratively
static Tag[] TAGS_SELECTION
           
 
Constructor Summary
Apriori()
          Constructor that sets default values for the minimum confidence and the maximum number of rules.
 
Method Summary
 void buildAssociations(Instances instances)
          Method that generates all large itemsets with a minimum support, and from these all association rules with a minimum confidence.
 java.lang.String deltaTipText()
          Returns the tip text for this property
private  void findLargeItemSets(Instances instances)
          Method that finds all large itemsets for the given set of instances.
private  void findRulesBruteForce()
          Method that finds all association rules and performs significance test.
private  void findRulesQuickly()
          Method that finds all association rules.
 double getDelta()
          Get the value of delta.
 double getLowerBoundMinSupport()
          Get the value of lowerBoundMinSupport.
 SelectedTag getMetricType()
          Get the metric type
 double getMinMetric()
          Get the minimum metric score (the minimum confidence when the metric type is confidence).
 int getNumRules()
          Get the value of numRules.
 java.lang.String[] getOptions()
          Gets the current settings of the Apriori object.
 boolean getRemoveAllMissingCols()
          Returns whether columns containing all missing values are to be removed
 double getSignificanceLevel()
          Get the value of significanceLevel.
 double getUpperBoundMinSupport()
          Get the value of upperBoundMinSupport.
 java.lang.String globalInfo()
          Returns a string describing this associator
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 java.lang.String lowerBoundMinSupportTipText()
          Returns the tip text for this property
static void main(java.lang.String[] options)
          Main method for testing this class.
 java.lang.String metricTypeTipText()
          Returns the tip text for this property
 java.lang.String minMetricTipText()
          Returns the tip text for this property
 java.lang.String numRulesTipText()
          Returns the tip text for this property
 java.lang.String removeAllMissingColsTipText()
          Returns the tip text for this property
private  Instances removeMissingColumns(Instances instances)
          Removes columns that are all missing from the data
 void resetOptions()
          Resets the options to the default values.
 void setDelta(double v)
          Set the value of delta.
 void setLowerBoundMinSupport(double v)
          Set the value of lowerBoundMinSupport.
 void setMetricType(SelectedTag d)
          Set the metric type for ranking rules
 void setMinMetric(double v)
          Set the minimum metric score (the minimum confidence when the metric type is confidence).
 void setNumRules(int v)
          Set the value of numRules.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setRemoveAllMissingCols(boolean r)
          Remove columns containing all missing values.
 void setSignificanceLevel(double v)
          Set the value of significanceLevel.
 void setUpperBoundMinSupport(double v)
          Set the value of upperBoundMinSupport.
 java.lang.String significanceLevelTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Outputs the size of all the generated sets of itemsets and the rules.
 java.lang.String upperBoundMinSupportTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.associations.Associator
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_minSupport

protected double m_minSupport
The minimum support.


m_upperBoundMinSupport

protected double m_upperBoundMinSupport
The upper bound on the support


m_lowerBoundMinSupport

protected double m_lowerBoundMinSupport
The lower bound for the minimum support.


CONFIDENCE

protected static final int CONFIDENCE
Metric types.

See Also:
Constant Field Values

LIFT

protected static final int LIFT
See Also:
Constant Field Values

LEVERAGE

protected static final int LEVERAGE
See Also:
Constant Field Values

CONVICTION

protected static final int CONVICTION
See Also:
Constant Field Values

TAGS_SELECTION

public static final Tag[] TAGS_SELECTION

m_metricType

protected int m_metricType
The selected metric type.


m_minMetric

protected double m_minMetric
The minimum metric score.


m_numRules

protected int m_numRules
The maximum number of rules that are output.


m_delta

protected double m_delta
Delta by which m_minSupport is decreased in each iteration.


m_significanceLevel

protected double m_significanceLevel
Significance level for optional significance test.


m_cycles

protected int m_cycles
Number of cycles used before required number of rules was found.


m_Ls

protected FastVector m_Ls
The set of all sets of itemsets L.


m_hashtables

protected FastVector m_hashtables
The same information stored in hash tables.


m_allTheRules

protected FastVector[] m_allTheRules
The list of all generated rules.


m_instances

protected Instances m_instances
The instances (transactions) to be used for generating the association rules.


m_outputItemSets

protected boolean m_outputItemSets
Output itemsets found?


m_removeMissingCols

protected boolean m_removeMissingCols

m_verbose

protected boolean m_verbose
Report progress iteratively

Constructor Detail

Apriori

public Apriori()
Constructor that sets default values for the minimum confidence and the maximum number of rules.

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this associator

Returns:
a description of the associator suitable for displaying in the explorer/experimenter gui

resetOptions

public void resetOptions()
Resets the options to the default values.


removeMissingColumns

private Instances removeMissingColumns(Instances instances)
                                throws java.lang.Exception
Removes columns that are all missing from the data

Parameters:
instances - the instances
Returns:
a new set of instances with all missing columns removed
Throws:
java.lang.Exception

buildAssociations

public void buildAssociations(Instances instances)
                       throws java.lang.Exception
Method that generates all large itemsets with a minimum support, and from these all association rules with a minimum confidence.

Specified by:
buildAssociations in class Associator
Parameters:
instances - the instances to be used for generating the associations
Throws:
java.lang.Exception - if rules can't be built successfully

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-N required number of rules
The required number of rules (default: 10).

-T type of metric by which to sort rules
0 = confidence | 1 = lift | 2 = leverage | 3 = conviction.

-C minimum metric score of a rule
The minimum confidence of a rule (default: 0.9).

-D delta for minimum support
The delta by which the minimum support is decreased in each iteration (default: 0.05).

-U upper bound for minimum support
The upper bound for minimum support. Don't explicitly look for rules with more than this level of support.

-M lower bound for minimum support
The lower bound for the minimum support (default = 0.1).

-S significance level
If used, rules are tested for significance at the given level. Slower (default = no significance testing).

-I
If set the itemsets found are also output (default = no).

-V
If set then progress is reported iteratively during execution.

-R
If set then columns that contain all missing values are removed from the data.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
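
As an illustration of the option format, the following sketch assembles a string array from typed values using the flags listed above. The class AprioriOptionsSketch and the method buildOptions are hypothetical helpers, not part of Weka; the resulting array is the kind of value one would pass to setOptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AprioriOptionsSketch {

    /** Builds an option array using the flags documented for setOptions. */
    static String[] buildOptions(int numRules, int metricType, double minMetric,
                                 double delta, double upperBound, double lowerBound,
                                 boolean outputItemSets) {
        List<String> opts = new ArrayList<>();
        opts.add("-N"); opts.add(String.valueOf(numRules));    // required number of rules
        opts.add("-T"); opts.add(String.valueOf(metricType));  // 0..3, see the metric list
        opts.add("-C"); opts.add(String.valueOf(minMetric));   // minimum metric score
        opts.add("-D"); opts.add(String.valueOf(delta));       // support decrement
        opts.add("-U"); opts.add(String.valueOf(upperBound));  // upper bound for support
        opts.add("-M"); opts.add(String.valueOf(lowerBound));  // lower bound for support
        if (outputItemSets) {
            opts.add("-I");                                    // flag without a value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Example values only; 1.0 for the upper bound is an arbitrary choice.
        String[] options = buildOptions(20, 0, 0.8, 0.05, 1.0, 0.1, true);
        System.out.println(Arrays.toString(options));
    }
}
```

Flags such as -I, -V and -R take no value, so they appear as single array elements, while every other flag is followed by its value as a separate element.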

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Apriori object.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

toString

public java.lang.String toString()
Outputs the size of all the generated sets of itemsets and the rules.


removeAllMissingColsTipText

public java.lang.String removeAllMissingColsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setRemoveAllMissingCols

public void setRemoveAllMissingCols(boolean r)
Remove columns containing all missing values.

Parameters:
r - true if columns are to be removed.

getRemoveAllMissingCols

public boolean getRemoveAllMissingCols()
Returns whether columns containing all missing values are to be removed

Returns:
true if columns are to be removed.

upperBoundMinSupportTipText

public java.lang.String upperBoundMinSupportTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getUpperBoundMinSupport

public double getUpperBoundMinSupport()
Get the value of upperBoundMinSupport.

Returns:
Value of upperBoundMinSupport.

setUpperBoundMinSupport

public void setUpperBoundMinSupport(double v)
Set the value of upperBoundMinSupport.

Parameters:
v - Value to assign to upperBoundMinSupport.

lowerBoundMinSupportTipText

public java.lang.String lowerBoundMinSupportTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getLowerBoundMinSupport

public double getLowerBoundMinSupport()
Get the value of lowerBoundMinSupport.

Returns:
Value of lowerBoundMinSupport.

setLowerBoundMinSupport

public void setLowerBoundMinSupport(double v)
Set the value of lowerBoundMinSupport.

Parameters:
v - Value to assign to lowerBoundMinSupport.

getMetricType

public SelectedTag getMetricType()
Get the metric type

Returns:
the type of metric to use for ranking rules

metricTypeTipText

public java.lang.String metricTypeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMetricType

public void setMetricType(SelectedTag d)
Set the metric type for ranking rules

Parameters:
d - the type of metric

minMetricTipText

public java.lang.String minMetricTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getMinMetric

public double getMinMetric()
Get the minimum metric score (the minimum confidence when the metric type is confidence).

Returns:
Value of the minimum metric score.

setMinMetric

public void setMinMetric(double v)
Set the minimum metric score (the minimum confidence when the metric type is confidence).

Parameters:
v - Value to assign to the minimum metric score.

numRulesTipText

public java.lang.String numRulesTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumRules

public int getNumRules()
Get the value of numRules.

Returns:
Value of numRules.

setNumRules

public void setNumRules(int v)
Set the value of numRules.

Parameters:
v - Value to assign to numRules.

deltaTipText

public java.lang.String deltaTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getDelta

public double getDelta()
Get the value of delta.

Returns:
Value of delta.

setDelta

public void setDelta(double v)
Set the value of delta.

Parameters:
v - Value to assign to delta.

significanceLevelTipText

public java.lang.String significanceLevelTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getSignificanceLevel

public double getSignificanceLevel()
Get the value of significanceLevel.

Returns:
Value of significanceLevel.

setSignificanceLevel

public void setSignificanceLevel(double v)
Set the value of significanceLevel.

Parameters:
v - Value to assign to significanceLevel.

findLargeItemSets

private void findLargeItemSets(Instances instances)
                        throws java.lang.Exception
Method that finds all large itemsets for the given set of instances.

Throws:
java.lang.Exception - if an attribute is numeric
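
For readers unfamiliar with the term, a "large" itemset is one whose support reaches the minimum support. The level-wise search can be sketched as follows. This is a simplified, self-contained illustration and not Weka's implementation: the class LargeItemSetsSketch and its methods are hypothetical, transactions are modelled as sets of strings, and candidates are formed by plain pairwise unions without the subset pruning or hash tables a production implementation would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class LargeItemSetsSketch {

    /** Fraction of transactions that contain every item of itemSet. */
    static double support(List<Set<String>> transactions, Set<String> itemSet) {
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemSet)) {
                hits++;
            }
        }
        return (double) hits / transactions.size();
    }

    /** Level-wise search: size-k candidates are unions of two large sets of size k-1. */
    static List<Set<String>> largeItemSets(List<Set<String>> transactions, double minSupport) {
        // Level 1 candidates: every distinct single item.
        Set<Set<String>> candidates = new LinkedHashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                candidates.add(new TreeSet<>(Collections.singleton(item)));
            }
        }
        List<Set<String>> result = new ArrayList<>();
        int k = 1;
        while (!candidates.isEmpty()) {
            // Keep only the candidates that reach the minimum support.
            List<Set<String>> large = new ArrayList<>();
            for (Set<String> c : candidates) {
                if (support(transactions, c) >= minSupport) {
                    large.add(c);
                }
            }
            result.addAll(large);
            // Join pairs of large sets to form the next level's candidates.
            k++;
            candidates = new LinkedHashSet<>();
            for (int i = 0; i < large.size(); i++) {
                for (int j = i + 1; j < large.size(); j++) {
                    Set<String> union = new TreeSet<>(large.get(i));
                    union.addAll(large.get(j));
                    if (union.size() == k) {
                        candidates.add(union);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> tx = new ArrayList<>();
        tx.add(new TreeSet<>(Arrays.asList("bread", "milk")));
        tx.add(new TreeSet<>(Arrays.asList("bread", "butter")));
        tx.add(new TreeSet<>(Arrays.asList("bread", "milk", "butter")));
        tx.add(new TreeSet<>(Arrays.asList("milk")));
        System.out.println(largeItemSets(tx, 0.5));
    }
}
```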

findRulesBruteForce

private void findRulesBruteForce()
                          throws java.lang.Exception
Method that finds all association rules and performs significance test.

Throws:
java.lang.Exception - if an attribute is numeric

findRulesQuickly

private void findRulesQuickly()
                       throws java.lang.Exception
Method that finds all association rules.

Throws:
java.lang.Exception - if an attribute is numeric

main

public static void main(java.lang.String[] options)
Main method for testing this class.