weka.experiment
Class PairedTTester

java.lang.Object
  extended by weka.experiment.PairedTTester
All Implemented Interfaces:
OptionHandler
Direct Known Subclasses:
PairedCorrectedTTester

public class PairedTTester
extends java.lang.Object
implements OptionHandler

Calculates T-Test statistics on data stored in a set of instances.

Valid options from the command-line are:

-D num,num2...
The column numbers that uniquely specify a dataset. (default last)

-R num
The column number containing the run number. (default last)

-F num
The column number containing the fold number. (default none)

-S num
The significance level for T-Tests. (default 0.05)

-G num,num2...
The column numbers that uniquely specify one result generator (eg: scheme name plus options). (default last)

-V
Show standard deviations

-L
Produce comparison tables in Latex table format

Version:
$Revision: 1.17 $
Author:
Len Trigg (trigg@cs.waikato.ac.nz)

Nested Class Summary
protected  class PairedTTester.Dataset
           
protected  class PairedTTester.DatasetSpecifiers
           
protected  class PairedTTester.Resultset
           
 
Field Summary
protected  int[] m_DatasetKeyColumns
          An array containing the indexes of just the selected dataset key columns
protected  Range m_DatasetKeyColumnsRange
          The range of columns that specify a unique "dataset"
protected  PairedTTester.DatasetSpecifiers m_DatasetSpecifiers
          The list of dataset specifiers
protected  int m_FoldColumn
          The option setting for the fold number column (-1 means none)
protected  Instances m_Instances
          The set of instances we will analyse
protected  boolean m_latexOutput
          Produce tables in latex format
protected  int[] m_ResultsetKeyColumns
          An array containing the indexes of just the selected resultset key columns
protected  Range m_ResultsetKeyColumnsRange
          The range of columns that specify a unique result set (eg: scheme plus configuration)
protected  FastVector m_Resultsets
          Stores a vector for each resultset holding all instances in each set
protected  boolean m_ResultsetsValid
          Indicates whether the instances have been partitioned
protected  int m_RunColumn
          The index of the column containing the run number
protected  int m_RunColumnSet
          The option setting for the run number column (-1 means last)
protected  boolean m_ShowStdDevs
          Indicates whether standard deviations should be displayed
protected  double m_SignificanceLevel
          The significance level for comparisons
 
Constructor Summary
PairedTTester()
           
 
Method Summary
 PairedStats calculateStatistics(Instance datasetSpecifier, int resultset1Index, int resultset2Index, int comparisonColumn)
          Computes a paired t-test comparison for a specified dataset between two resultsets.
 Range getDatasetKeyColumns()
          Get the value of DatasetKeyColumns.
 int getFoldColumn()
          Get the value of FoldColumn.
 Instances getInstances()
          Get the value of Instances.
 int getNumDatasets()
          Gets the number of datasets in the resultsets
 int getNumResultsets()
          Gets the number of resultsets in the data.
 java.lang.String[] getOptions()
          Gets current settings of the PairedTTester.
 boolean getProduceLatex()
          Get whether latex is output
 Range getResultsetKeyColumns()
          Get the value of ResultsetKeyColumns.
 java.lang.String getResultsetName(int index)
          Gets a string descriptive of the specified resultset.
 int getRunColumn()
          Get the value of RunColumn.
 boolean getShowStdDevs()
          Returns true if standard deviations have been requested.
 double getSignificanceLevel()
          Get the value of SignificanceLevel.
 java.lang.String header(int comparisonColumn)
          Creates a "header" string describing the current resultsets.
 java.util.Enumeration listOptions()
          Lists options understood by this object.
static void main(java.lang.String[] args)
          Test the class from the command line.
 java.lang.String multiResultsetFull(int baseResultset, int comparisonColumn)
          Creates a comparison table where a base resultset is compared to the other resultsets.
private  java.lang.String multiResultsetFullLatex(int baseResultset, int comparisonColumn, int maxWidthMean, int maxWidthStdDev)
          Generates a comparison table in latex table format
private  java.lang.String multiResultsetFullPlainText(int baseResultset, int comparisonColumn, int maxWidthMean, int maxWidthStdDev)
          Generates a comparison table in plain text format
 java.lang.String multiResultsetRanking(int comparisonColumn)
           
 java.lang.String multiResultsetSummary(int comparisonColumn)
          Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other.
 int[][] multiResultsetWins(int comparisonColumn)
          Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other.
protected  void prepareData()
          Separates the instances into resultsets and by dataset/run.
 java.lang.String resultsetKey()
          Creates a key that maps resultset numbers to their descriptions.
 void setDatasetKeyColumns(Range newDatasetKeyColumns)
          Set the value of DatasetKeyColumns.
 void setFoldColumn(int newFoldColumn)
          Set the value of FoldColumn.
 void setInstances(Instances newInstances)
          Set the value of Instances.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setProduceLatex(boolean l)
          Set whether latex is output
 void setResultsetKeyColumns(Range newResultsetKeyColumns)
          Set the value of ResultsetKeyColumns.
 void setRunColumn(int newRunColumn)
          Set the value of RunColumn.
 void setShowStdDevs(boolean s)
          Set whether standard deviations are displayed or not.
 void setSignificanceLevel(double newSignificanceLevel)
          Set the value of SignificanceLevel.
protected  java.lang.String templateString(Instance template)
          Returns a string descriptive of the key column values for the "datasets".
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_Instances

protected Instances m_Instances
The set of instances we will analyse


m_RunColumn

protected int m_RunColumn
The index of the column containing the run number


m_RunColumnSet

protected int m_RunColumnSet
The option setting for the run number column (-1 means last)


m_FoldColumn

protected int m_FoldColumn
The option setting for the fold number column (-1 means none)


m_SignificanceLevel

protected double m_SignificanceLevel
The significance level for comparisons


m_DatasetKeyColumnsRange

protected Range m_DatasetKeyColumnsRange
The range of columns that specify a unique "dataset"


m_DatasetKeyColumns

protected int[] m_DatasetKeyColumns
An array containing the indexes of just the selected dataset key columns


m_DatasetSpecifiers

protected PairedTTester.DatasetSpecifiers m_DatasetSpecifiers
The list of dataset specifiers


m_ResultsetKeyColumnsRange

protected Range m_ResultsetKeyColumnsRange
The range of columns that specify a unique result set (eg: scheme plus configuration)


m_ResultsetKeyColumns

protected int[] m_ResultsetKeyColumns
An array containing the indexes of just the selected resultset key columns


m_Resultsets

protected FastVector m_Resultsets
Stores a vector for each resultset holding all instances in each set


m_ResultsetsValid

protected boolean m_ResultsetsValid
Indicates whether the instances have been partitioned


m_ShowStdDevs

protected boolean m_ShowStdDevs
Indicates whether standard deviations should be displayed


m_latexOutput

protected boolean m_latexOutput
Produce tables in latex format

Constructor Detail

PairedTTester

public PairedTTester()
Method Detail

templateString

protected java.lang.String templateString(Instance template)
Returns a string descriptive of the key column values for the "datasets".

Parameters:
template - the template
Returns:
a string describing the key column values of the template

setProduceLatex

public void setProduceLatex(boolean l)
Set whether latex is output

Parameters:
l - true if tables are to be produced in Latex format

getProduceLatex

public boolean getProduceLatex()
Get whether latex is output

Returns:
true if Latex is to be output

setShowStdDevs

public void setShowStdDevs(boolean s)
Set whether standard deviations are displayed or not.

Parameters:
s - true if standard deviations are to be displayed

getShowStdDevs

public boolean getShowStdDevs()
Returns true if standard deviations have been requested.

Returns:
true if standard deviations are to be displayed.

prepareData

protected void prepareData()
                    throws java.lang.Exception
Separates the instances into resultsets and by dataset/run.

Throws:
java.lang.Exception - if the TTest parameters have not been set.
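
The partitioning described above can be pictured with a small, self-contained sketch. It does not use Weka's classes and is not the actual implementation: rows are plain String arrays, and the PartitionSketch class, its key and partition helpers, the column layout and the sample values are all assumptions made for illustration. The two index arrays play the role that m_ResultsetKeyColumns and m_DatasetKeyColumns appear to play.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Weka code: groups result rows by key columns.
class PartitionSketch {

    // Joins the values of the selected key columns into one lookup key.
    static String key(String[] row, int[] keyColumns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < keyColumns.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append(row[keyColumns[i]]);
        }
        return sb.toString();
    }

    // Groups rows by resultset key first, then by dataset key.
    static Map<String, Map<String, List<String[]>>> partition(
            List<String[]> rows, int[] resultsetKeyColumns, int[] datasetKeyColumns) {
        Map<String, Map<String, List<String[]>>> resultsets = new LinkedHashMap<>();
        for (String[] row : rows) {
            resultsets
                .computeIfAbsent(key(row, resultsetKeyColumns), k -> new LinkedHashMap<>())
                .computeIfAbsent(key(row, datasetKeyColumns), k -> new ArrayList<>())
                .add(row);
        }
        return resultsets;
    }

    public static void main(String[] args) {
        // Assumed column layout: dataset, run, scheme, score.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"iris", "1", "J48", "94.0"});
        rows.add(new String[] {"iris", "2", "J48", "96.0"});
        rows.add(new String[] {"iris", "1", "ZeroR", "33.3"});
        rows.add(new String[] {"glass", "1", "J48", "66.8"});
        Map<String, Map<String, List<String[]>>> parts =
            partition(rows, new int[] {2}, new int[] {0});
        System.out.println(parts.keySet());
    }
}
```

Each inner list then holds the runs of one scheme on one dataset, which is the unit a paired comparison would work on.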

getNumDatasets

public int getNumDatasets()
Gets the number of datasets in the resultsets

Returns:
the number of datasets in the resultsets

getNumResultsets

public int getNumResultsets()
Gets the number of resultsets in the data.

Returns:
the number of resultsets in the data

getResultsetName

public java.lang.String getResultsetName(int index)
Gets a string descriptive of the specified resultset.

Parameters:
index - the index of the resultset
Returns:
a descriptive string for the resultset

calculateStatistics

public PairedStats calculateStatistics(Instance datasetSpecifier,
                                       int resultset1Index,
                                       int resultset2Index,
                                       int comparisonColumn)
                                throws java.lang.Exception
Computes a paired t-test comparison for a specified dataset between two resultsets.

Parameters:
datasetSpecifier - the dataset specifier
resultset1Index - the index of the first resultset
resultset2Index - the index of the second resultset
comparisonColumn - the column containing values to compare
Returns:
the results of the paired comparison
Throws:
java.lang.Exception - if an error occurs
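
For orientation, the textbook paired t statistic can be sketched as follows. This is a minimal illustration of the standard formula (mean of the paired differences divided by its standard error, with n - 1 degrees of freedom), not the code behind PairedStats; the PairedTSketch class, its tStatistic method and the sample values are hypothetical.

```java
// Hypothetical sketch of the standard paired t statistic; not Weka code.
class PairedTSketch {

    // Returns the t statistic for paired samples x and y (df = n - 1).
    // If all differences are identical the variance is zero and the
    // result is infinite or NaN; a real implementation would guard this.
    static double tStatistic(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException(
                "need two equal-length samples with n >= 2");
        }
        int n = x.length;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i] - y[i];
        }
        double mean = sum / n;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double dev = (x[i] - y[i]) - mean;
            sumSq += dev * dev;
        }
        double variance = sumSq / (n - 1); // sample variance of the differences
        return mean / Math.sqrt(variance / n);
    }

    public static void main(String[] args) {
        double[] a = {2.0, 4.0, 6.0, 8.0};
        double[] b = {1.0, 2.0, 3.0, 4.0};
        System.out.println(tStatistic(a, b));
    }
}
```

The statistic would then be compared against the t distribution with n - 1 degrees of freedom at the chosen significance level; that lookup is left out of the sketch.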

resultsetKey

public java.lang.String resultsetKey()
Creates a key that maps resultset numbers to their descriptions.

Returns:
a string mapping resultset numbers to their descriptions

header

public java.lang.String header(int comparisonColumn)
Creates a "header" string describing the current resultsets.

Parameters:
comparisonColumn - the index of the comparison column
Returns:
the header string

multiResultsetWins

public int[][] multiResultsetWins(int comparisonColumn)
                           throws java.lang.Exception
Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other.

Parameters:
comparisonColumn - the index of the comparison column
Returns:
a 2d array where element [i][j] is the number of times resultset j performed significantly better than resultset i.
Throws:
java.lang.Exception - if an error occurs
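
The indexing convention of the returned array is easy to get backwards, so here is a small sketch of how such a matrix could be filled. It is an illustration under assumptions, not Weka's implementation: the WinsSketch class and its Comparison interface are hypothetical, and the fixed margin used in main is an arbitrary stand-in for a real significance test.

```java
import java.util.Arrays;

// Hypothetical sketch, not Weka code: fills a wins matrix where
// win[i][j] counts the datasets on which resultset j beat resultset i.
class WinsSketch {

    // Stand-in for a significance test: returns +1 if resultset b is
    // significantly better than a, -1 if significantly worse, else 0.
    interface Comparison {
        int compare(int dataset, int a, int b);
    }

    static int[][] wins(int numDatasets, int numResultsets, Comparison test) {
        int[][] win = new int[numResultsets][numResultsets];
        for (int i = 0; i < numResultsets; i++) {
            for (int j = i + 1; j < numResultsets; j++) {
                for (int d = 0; d < numDatasets; d++) {
                    int outcome = test.compare(d, i, j);
                    if (outcome > 0) {
                        win[i][j]++; // j beat i on dataset d
                    } else if (outcome < 0) {
                        win[j][i]++; // i beat j on dataset d
                    }
                }
            }
        }
        return win;
    }

    public static void main(String[] args) {
        // scores[dataset][resultset]; a difference above 1.0 counts as a
        // "significant" win here purely for illustration.
        double[][] scores = {{90.0, 92.0, 80.0}, {70.0, 70.5, 75.0}};
        int[][] w = wins(2, 3, (d, a, b) -> {
            double diff = scores[d][b] - scores[d][a];
            return diff > 1.0 ? 1 : (diff < -1.0 ? -1 : 0);
        });
        System.out.println(Arrays.deepToString(w)); // [[0, 1, 1], [0, 0, 1], [1, 1, 0]]
    }
}
```

Reading row i gives the resultsets that beat i; reading column j gives the resultsets that j beat.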

multiResultsetSummary

public java.lang.String multiResultsetSummary(int comparisonColumn)
                                       throws java.lang.Exception
Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other. The results are summarized in a table.

Parameters:
comparisonColumn - the index of the comparison column
Returns:
the results in a string
Throws:
java.lang.Exception - if an error occurs

multiResultsetRanking

public java.lang.String multiResultsetRanking(int comparisonColumn)
                                       throws java.lang.Exception
Throws:
java.lang.Exception

multiResultsetFullLatex

private java.lang.String multiResultsetFullLatex(int baseResultset,
                                                 int comparisonColumn,
                                                 int maxWidthMean,
                                                 int maxWidthStdDev)
Generates a comparison table in latex table format

Parameters:
baseResultset - the index of the base resultset
comparisonColumn - the index of the column to compare over
maxWidthMean - width for the mean
maxWidthStdDev - width for the standard deviation
Returns:
the comparison table string

multiResultsetFullPlainText

private java.lang.String multiResultsetFullPlainText(int baseResultset,
                                                     int comparisonColumn,
                                                     int maxWidthMean,
                                                     int maxWidthStdDev)
Generates a comparison table in plain text format

Parameters:
baseResultset - the index of the base resultset
comparisonColumn - the index of the column to compare over
maxWidthMean - width for the mean
maxWidthStdDev - width for the standard deviation
Returns:
the comparison table string

multiResultsetFull

public java.lang.String multiResultsetFull(int baseResultset,
                                           int comparisonColumn)
                                    throws java.lang.Exception
Creates a comparison table where a base resultset is compared to the other resultsets. Results are presented for every dataset.

Parameters:
baseResultset - the index of the base resultset
comparisonColumn - the index of the column to compare over
Returns:
the comparison table string
Throws:
java.lang.Exception - if an error occurs

listOptions

public java.util.Enumeration listOptions()
Lists options understood by this object.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of Options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-D num,num2...
The column numbers that uniquely specify a dataset. (default last)

-R num
The column number containing the run number. (default last)

-F num
The column number containing the fold number. (default none)

-S num
The significance level for T-Tests. (default 0.05)

-G num,num2...
The column numbers that uniquely specify one result generator (eg: scheme name plus options). (default last)

-V
Show standard deviations

-L
Produce comparison tables in Latex table format

Specified by:
setOptions in interface OptionHandler
Parameters:
options - an array containing options to set.
Throws:
java.lang.Exception - if invalid options are given
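
To make the option list concrete, the sketch below parses an options array of the shape described above into plain fields. It is a self-contained illustration and not Weka's parsing code: the OptionsSketch class, its field names and its error handling are hypothetical, and the false defaults for -V and -L are an assumption, since the list above states defaults only for the other flags.

```java
// Hypothetical sketch, not Weka code: parses the flags listed above.
class OptionsSketch {
    String datasetColumns = "last";   // -D (default last)
    String runColumn = "last";        // -R (default last)
    String foldColumn = "none";       // -F (default none)
    double significance = 0.05;       // -S (default 0.05)
    String resultsetColumns = "last"; // -G (default last)
    boolean showStdDevs = false;      // -V (assumed default)
    boolean latex = false;            // -L (assumed default)

    static OptionsSketch parse(String[] options) {
        OptionsSketch o = new OptionsSketch();
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            switch (flag) {
                case "-V":
                    o.showStdDevs = true;
                    break;
                case "-L":
                    o.latex = true;
                    break;
                case "-D":
                    o.datasetColumns = value(options, ++i, flag);
                    break;
                case "-R":
                    o.runColumn = value(options, ++i, flag);
                    break;
                case "-F":
                    o.foldColumn = value(options, ++i, flag);
                    break;
                case "-S":
                    o.significance = Double.parseDouble(value(options, ++i, flag));
                    break;
                case "-G":
                    o.resultsetColumns = value(options, ++i, flag);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + flag);
            }
        }
        return o;
    }

    // Returns the argument that follows a flag, or fails if it is missing.
    static String value(String[] options, int index, String flag) {
        if (index >= options.length) {
            throw new IllegalArgumentException("Missing value for " + flag);
        }
        return options[index];
    }

    public static void main(String[] args) {
        OptionsSketch o = parse(new String[] {"-D", "1", "-G", "2,3", "-S", "0.01", "-V"});
        System.out.println(o.datasetColumns + " " + o.resultsetColumns
            + " " + o.significance + " " + o.showStdDevs);
    }
}
```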

getOptions

public java.lang.String[] getOptions()
Gets current settings of the PairedTTester.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings containing current options.

getResultsetKeyColumns

public Range getResultsetKeyColumns()
Get the value of ResultsetKeyColumns.

Returns:
Value of ResultsetKeyColumns.

setResultsetKeyColumns

public void setResultsetKeyColumns(Range newResultsetKeyColumns)
Set the value of ResultsetKeyColumns.

Parameters:
newResultsetKeyColumns - Value to assign to ResultsetKeyColumns.

getSignificanceLevel

public double getSignificanceLevel()
Get the value of SignificanceLevel.

Returns:
Value of SignificanceLevel.

setSignificanceLevel

public void setSignificanceLevel(double newSignificanceLevel)
Set the value of SignificanceLevel.

Parameters:
newSignificanceLevel - Value to assign to SignificanceLevel.

getDatasetKeyColumns

public Range getDatasetKeyColumns()
Get the value of DatasetKeyColumns.

Returns:
Value of DatasetKeyColumns.

setDatasetKeyColumns

public void setDatasetKeyColumns(Range newDatasetKeyColumns)
Set the value of DatasetKeyColumns.

Parameters:
newDatasetKeyColumns - Value to assign to DatasetKeyColumns.
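
The relation between a column list such as the one given to -D and an index array such as m_DatasetKeyColumns can be sketched as below. This is a simplified illustration, not the behaviour of Weka's Range class: the ColumnListSketch class is hypothetical, it handles only comma-separated numbers and the word "last", and it assumes that column numbers are 1-based while the resulting indexes are 0-based.

```java
import java.util.Arrays;

// Hypothetical sketch, not Weka code: turns a column list into indexes.
class ColumnListSketch {

    // Converts a 1-based list such as "1,3,last" into 0-based indexes.
    static int[] toIndexes(String spec, int numColumns) {
        String[] parts = spec.split(",");
        int[] indexes = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i].trim();
            int column = part.equals("last") ? numColumns : Integer.parseInt(part);
            if (column < 1 || column > numColumns) {
                throw new IllegalArgumentException("Column out of range: " + part);
            }
            indexes[i] = column - 1;
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toIndexes("1,3,last", 5))); // [0, 2, 4]
    }
}
```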

getRunColumn

public int getRunColumn()
Get the value of RunColumn.

Returns:
Value of RunColumn.

setRunColumn

public void setRunColumn(int newRunColumn)
Set the value of RunColumn.

Parameters:
newRunColumn - Value to assign to RunColumn.

getFoldColumn

public int getFoldColumn()
Get the value of FoldColumn.

Returns:
Value of FoldColumn.

setFoldColumn

public void setFoldColumn(int newFoldColumn)
Set the value of FoldColumn.

Parameters:
newFoldColumn - Value to assign to FoldColumn.

getInstances

public Instances getInstances()
Get the value of Instances.

Returns:
Value of Instances.

setInstances

public void setInstances(Instances newInstances)
Set the value of Instances.

Parameters:
newInstances - Value to assign to Instances.

main

public static void main(java.lang.String[] args)
Test the class from the command line.

Parameters:
args - the command-line options for the t-tests