weka.associations
Class ItemSet

java.lang.Object
  extended byweka.associations.ItemSet
All Implemented Interfaces:
java.io.Serializable

public class ItemSet
extends java.lang.Object
implements java.io.Serializable

Class for storing a set of items. Item sets are stored in a lexicographic order, which is determined by the header information of the set of instances used for generating the set of items. All methods in this class assume that item sets are stored in lexicographic order.

Version:
$Revision: 1.8 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
protected  int m_counter
          Counter for how many transactions contain this item set.
protected  int[] m_items
          The items stored as an array of of ints.
protected  int m_totalTransactions
          The total number of transactions
 
Constructor Summary
ItemSet(int totalTrans)
          Constructor
 
Method Summary
static double confidenceForRule(ItemSet premise, ItemSet consequence)
          Outputs the confidence for a rule.
 boolean containedBy(Instance instance)
          Checks if an instance contains an item set.
 double convictionForRule(ItemSet premise, ItemSet consequence, int premiseCount, int consequenceCount)
          Outputs the conviction for a rule.
static FastVector deleteItemSets(FastVector itemSets, int minSupport, int maxSupport)
          Deletes all item sets that don't have minimum support.
 boolean equals(java.lang.Object itemSet)
          Tests if two item sets are equal.
 FastVector[] generateRules(double minConfidence, FastVector hashtables, int numItemsInSet)
          Generates all rules for an item set.
 FastVector[] generateRulesBruteForce(double minMetric, int metricType, FastVector hashtables, int numItemsInSet, int numTransactions, double significanceLevel)
          Generates all significant rules for an item set.
static java.util.Hashtable getHashtable(FastVector itemSets, int initialSize)
          Return a hashtable filled with the given item sets.
 int hashCode()
          Produces a hash code for a item set.
 double leverageForRule(ItemSet premise, ItemSet consequence, int premiseCount, int consequenceCount)
          Outputs the leverage for a rule.
 double liftForRule(ItemSet premise, ItemSet consequence, int consequenceCount)
          Outputs the lift for a rule.
static FastVector mergeAllItemSets(FastVector itemSets, int size, int totalTrans)
          Merges all item sets in the set of (k-1)-item sets to create the (k)-item sets and updates the counters.
private  FastVector[] moreComplexRules(FastVector[] rules, int numItemsInSet, int numItemsInConsequence, double minConfidence, FastVector hashtables)
          Generates rules with more than one item in the consequence.
static FastVector pruneItemSets(FastVector toPrune, java.util.Hashtable kMinusOne)
          Prunes a set of (k)-item sets using the given (k-1)-item sets.
static void pruneRules(FastVector[] rules, double minConfidence)
          Prunes a set of rules.
static FastVector singletons(Instances instances)
          Converts the header info of the given set of instances into a set of item sets (singletons).
 ItemSet subtract(ItemSet toSubtract)
          Subtracts an item set from another one.
 int support()
          Outputs the support for an item set.
 java.lang.String toString(Instances instances)
          Returns the contents of an item set as a string.
 void upDateCounter(Instance instance)
          Updates counter of item set with respect to given transaction.
static void upDateCounters(FastVector itemSets, Instances instances)
          Updates counters for a set of item sets and a set of instances.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_items

protected int[] m_items
The items stored as an array of of ints.


m_counter

protected int m_counter
Counter for how many transactions contain this item set.


m_totalTransactions

protected int m_totalTransactions
The total number of transactions

Constructor Detail

ItemSet

public ItemSet(int totalTrans)
Constructor

Parameters:
totalTrans - the total number of transactions in the data
Method Detail

confidenceForRule

public static double confidenceForRule(ItemSet premise,
                                       ItemSet consequence)
Outputs the confidence for a rule.

Parameters:
premise - the premise of the rule
consequence - the consequence of the rule
Returns:
the confidence on the training data

liftForRule

public double liftForRule(ItemSet premise,
                          ItemSet consequence,
                          int consequenceCount)
Outputs the lift for a rule. Lift is defined as:
confidence / prob(consequence)

Parameters:
premise - the premise of the rule
consequence - the consequence of the rule
consequenceCount - how many times the consequence occurs independent of the premise
Returns:
the lift on the training data

leverageForRule

public double leverageForRule(ItemSet premise,
                              ItemSet consequence,
                              int premiseCount,
                              int consequenceCount)
Outputs the leverage for a rule. Leverage is defined as:
prob(premise & consequence) - (prob(premise) * prob(consequence))

Parameters:
premise - the premise of the rule
consequence - the consequence of the rule
premiseCount - how many times the premise occurs independent of the consequent
consequenceCount - how many times the consequence occurs independent of the premise
Returns:
the leverage on the training data

convictionForRule

public double convictionForRule(ItemSet premise,
                                ItemSet consequence,
                                int premiseCount,
                                int consequenceCount)
Outputs the conviction for a rule. Conviction is defined as:
prob(premise) * prob(!consequence) / prob(premise & !consequence)

Parameters:
premise - the premise of the rule
consequence - the consequence of the rule
premiseCount - how many times the premise occurs independent of the consequent
consequenceCount - how many times the consequence occurs independent of the premise
Returns:
the conviction on the training data

containedBy

public final boolean containedBy(Instance instance)
Checks if an instance contains an item set.

Parameters:
instance - the instance to be tested
Returns:
true if the given instance contains this item set

deleteItemSets

public static FastVector deleteItemSets(FastVector itemSets,
                                        int minSupport,
                                        int maxSupport)
Deletes all item sets that don't have minimum support.

Parameters:
itemSets - the set of item sets to be pruned
minSupport - the minimum number of transactions to be covered
Returns:
the reduced set of item sets

equals

public final boolean equals(java.lang.Object itemSet)
Tests if two item sets are equal.

Parameters:
itemSet - another item set
Returns:
true if this item set contains the same items as the given one

generateRules

public final FastVector[] generateRules(double minConfidence,
                                        FastVector hashtables,
                                        int numItemsInSet)
Generates all rules for an item set.

Parameters:
minConfidence - the minimum confidence the rules have to have
hashtables - containing all(!) previously generated item sets
numItemsInSet - the size of the item set for which the rules are to be generated
Returns:
all the rules with minimum confidence for the given item set

generateRulesBruteForce

public final FastVector[] generateRulesBruteForce(double minMetric,
                                                  int metricType,
                                                  FastVector hashtables,
                                                  int numItemsInSet,
                                                  int numTransactions,
                                                  double significanceLevel)
                                           throws java.lang.Exception
Generates all significant rules for an item set.

Parameters:
minMetric - the minimum metric (confidence, lift, leverage, improvement) the rules have to have
metricType - (confidence=0, lift, leverage, improvement)
hashtables - containing all(!) previously generated item sets
numItemsInSet - the size of the item set for which the rules are to be generated
Returns:
all the rules with minimum metric for the given item set
Throws:
java.lang.Exception - if something goes wrong

getHashtable

public static java.util.Hashtable getHashtable(FastVector itemSets,
                                               int initialSize)
Return a hashtable filled with the given item sets.

Parameters:
itemSets - the set of item sets to be used for filling the hash table
initialSize - the initial size of the hashtable
Returns:
the generated hashtable

hashCode

public final int hashCode()
Produces a hash code for a item set.

Returns:
a hash code for a set of items

mergeAllItemSets

public static FastVector mergeAllItemSets(FastVector itemSets,
                                          int size,
                                          int totalTrans)
Merges all item sets in the set of (k-1)-item sets to create the (k)-item sets and updates the counters.

Parameters:
itemSets - the set of (k-1)-item sets
size - the value of (k-1)
Returns:
the generated (k)-item sets

pruneItemSets

public static FastVector pruneItemSets(FastVector toPrune,
                                       java.util.Hashtable kMinusOne)
Prunes a set of (k)-item sets using the given (k-1)-item sets.

Parameters:
toPrune - the set of (k)-item sets to be pruned
kMinusOne - the (k-1)-item sets to be used for pruning
Returns:
the pruned set of item sets

pruneRules

public static void pruneRules(FastVector[] rules,
                              double minConfidence)
Prunes a set of rules.

Parameters:
rules - a two-dimensional array of lists of item sets. The first list of item sets contains the premises, the second one the consequences.
minConfidence - the minimum confidence the rules have to have

singletons

public static FastVector singletons(Instances instances)
                             throws java.lang.Exception
Converts the header info of the given set of instances into a set of item sets (singletons). The ordering of values in the header file determines the lexicographic order.

Parameters:
instances - the set of instances whose header info is to be used
Returns:
a set of item sets, each containing a single item
Throws:
java.lang.Exception - if singletons can't be generated successfully

subtract

public final ItemSet subtract(ItemSet toSubtract)
Subtracts an item set from another one.

Parameters:
toSubtract - the item set to be subtracted from this one.
Returns:
an item set that only contains items form this item sets that are not contained by toSubtract

support

public final int support()
Outputs the support for an item set.

Returns:
the support

toString

public final java.lang.String toString(Instances instances)
Returns the contents of an item set as a string.

Parameters:
instances - contains the relevant header information
Returns:
string describing the item set

upDateCounter

public final void upDateCounter(Instance instance)
Updates counter of item set with respect to given transaction.

Parameters:
instance - the instance to be used for ubdating the counter

upDateCounters

public static void upDateCounters(FastVector itemSets,
                                  Instances instances)
Updates counters for a set of item sets and a set of instances.

Parameters:
itemSets - the set of item sets which are to be updated
instances - the instances to be used for updating the counters

moreComplexRules

private final FastVector[] moreComplexRules(FastVector[] rules,
                                            int numItemsInSet,
                                            int numItemsInConsequence,
                                            double minConfidence,
                                            FastVector hashtables)
Generates rules with more than one item in the consequence.

Parameters:
rules - all the rules having (k-1)-item sets as consequences
numItemsInSet - the size of the item set for which the rules are to be generated
numItemsInConsequence - the value of (k-1)
minConfidence - the minimum confidence a rule has to have
hashtables - the hashtables containing all(!) previously generated item sets
Returns:
all the rules having (k)-item sets as consequences