weka.attributeSelection
Class PrincipalComponents

java.lang.Object
  extended by weka.attributeSelection.ASEvaluation
      extended by weka.attributeSelection.AttributeEvaluator
          extended by weka.attributeSelection.UnsupervisedAttributeEvaluator
              extended by weka.attributeSelection.PrincipalComponents
All Implemented Interfaces:
AttributeTransformer, OptionHandler, java.io.Serializable

public class PrincipalComponents
extends UnsupervisedAttributeEvaluator
implements AttributeTransformer, OptionHandler

Class for performing principal components analysis/transformation.

Valid options are:

-N
Don't normalize the input data.

-R <num>
Retain enough PCs to account for this proportion of the variance.

-T
Transform through the PC space and back to the original space.

Version:
$Revision: 1.25 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz), Gabi Schmidberger (gabi@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
private  Remove m_attribFilter
          used to remove the class column if a class column is set
private  Remove m_attributeFilter
           
private  int m_classIndex
          Class index
private  double[][] m_correlation
          Correlation matrix for the original data
private  double m_coverVariance
          The proportion of variance in the original data to cover when retaining the best n PCs
private  double[] m_eigenvalues
          Eigenvalues for the corresponding eigenvectors
private  double[][] m_eigenvectors
          Will hold the unordered linear transformations of the (normalized) original data
private  double[][] m_eTranspose
          holds the transposed eigenvectors for converting back to the original space
private  boolean m_hasClass
          Data has a class set
private  NominalToBinary m_nominalToBinFilter
           
private  boolean m_normalize
          Whether to normalize the input data
private  Normalize m_normalizeFilter
           
private  int m_numAttribs
          Number of attributes
private  int m_numInstances
          Number of instances
private  Instances m_originalSpaceFormat
          The header for data transformed back to the original space
private  int m_outputNumAtts
          The number of attributes in the PC-transformed data
private  ReplaceMissingValues m_replaceMissingFilter
          Filters for original data
private  int[] m_sortedEigens
          Indices of the eigenvalues, in sorted order
private  double m_sumOfEigenValues
          Sum of the eigenvalues
private  Instances m_trainCopy
          Keep a copy for the class attribute (if set)
private  Instances m_trainInstances
          The data to analyse/transform
private  boolean m_transBackToOriginal
          Whether to transform the data through the PC space and back to the original space
private  Instances m_transformedFormat
          The header for the transformed data format
 
Constructor Summary
PrincipalComponents()
           
 
Method Summary
private  void buildAttributeConstructor(Instances data)
           
 void buildEvaluator(Instances data)
          Initializes principal components and performs the analysis
 Instance convertInstance(Instance instance)
          Transform an instance in the original (unnormalized) format.
private  Instance convertInstanceToOriginal(Instance inst)
          Convert a PC-transformed instance back to the original space
 double evaluateAttribute(int att)
          Evaluates the merit of a transformed attribute.
private  void fillCorrelation()
          Fill the correlation matrix
 boolean getNormalize()
          Gets whether or not input data is to be normalized
 java.lang.String[] getOptions()
          Gets the current settings of PrincipalComponents
 boolean getTransformBackToOriginal()
          Gets whether the data is to be transformed back to the original space.
 double getVarianceCovered()
          Gets the proportion of total variance to account for when retaining principal components
 java.lang.String globalInfo()
          Returns a string describing this attribute transformer
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class
private  java.lang.String matrixToString(double[][] matrix)
          Return a matrix as a String
 java.lang.String normalizeTipText()
          Returns the tip text for this property
private  java.lang.String principalComponentsSummary()
          Return a summary of the analysis
private  void resetOptions()
          Reset to defaults
 void setNormalize(boolean n)
          Set whether input data will be normalized.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
private  Instances setOutputFormat()
          Set the format for the transformed data
private  Instances setOutputFormatOriginal()
          Set up the header for the PC->original space dataset
 void setTransformBackToOriginal(boolean b)
          Sets whether the data should be transformed back to the original space
 void setVarianceCovered(double vc)
          Sets the amount of variance to account for when retaining principal components
 java.lang.String toString()
          Returns a description of this attribute transformer
 java.lang.String transformBackToOriginalTipText()
          Returns the tip text for this property
 Instances transformedData()
          Gets the transformed training data.
 Instances transformedHeader()
          Returns just the header for the transformed data (i.e., an empty set of instances).
 java.lang.String varianceCoveredTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.attributeSelection.ASEvaluation
forName, makeCopies, postProcess
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_trainInstances

private Instances m_trainInstances
The data to analyse/transform


m_trainCopy

private Instances m_trainCopy
Keep a copy for the class attribute (if set)


m_transformedFormat

private Instances m_transformedFormat
The header for the transformed data format


m_originalSpaceFormat

private Instances m_originalSpaceFormat
The header for data transformed back to the original space


m_hasClass

private boolean m_hasClass
Data has a class set


m_classIndex

private int m_classIndex
Class index


m_numAttribs

private int m_numAttribs
Number of attributes


m_numInstances

private int m_numInstances
Number of instances


m_correlation

private double[][] m_correlation
Correlation matrix for the original data


m_eigenvectors

private double[][] m_eigenvectors
Will hold the unordered linear transformations of the (normalized) original data


m_eigenvalues

private double[] m_eigenvalues
Eigenvalues for the corresponding eigenvectors


m_sortedEigens

private int[] m_sortedEigens
Indices of the eigenvalues, in sorted order


m_sumOfEigenValues

private double m_sumOfEigenValues
Sum of the eigenvalues


m_replaceMissingFilter

private ReplaceMissingValues m_replaceMissingFilter
Filters for original data


m_normalizeFilter

private Normalize m_normalizeFilter

m_nominalToBinFilter

private NominalToBinary m_nominalToBinFilter

m_attributeFilter

private Remove m_attributeFilter

m_attribFilter

private Remove m_attribFilter
used to remove the class column if a class column is set


m_outputNumAtts

private int m_outputNumAtts
The number of attributes in the PC-transformed data


m_normalize

private boolean m_normalize
Whether to normalize the input data


m_coverVariance

private double m_coverVariance
The proportion of variance in the original data to cover when retaining the best n PCs


m_transBackToOriginal

private boolean m_transBackToOriginal
Whether to transform the data through the PC space and back to the original space


m_eTranspose

private double[][] m_eTranspose
holds the transposed eigenvectors for converting back to the original space

Constructor Detail

PrincipalComponents

public PrincipalComponents()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this attribute transformer

Returns:
a description of the evaluator suitable for displaying in the Explorer/Experimenter GUI

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-N
Don't normalize the input data.

-R <num>
Retain enough PCs to account for this proportion of the variance.

-T
Transform through the PC space and back to the original space.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
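A minimal plain-Java sketch of the option semantics listed above, shown in isolation. It does not use Weka's own option utilities; the class name `PcaOptionsSketch`, the default of 0.95 for `-R`, and the range check on the proportion are assumptions made for illustration.

```java
/** Hypothetical stand-in for the three PrincipalComponents settings; not part of Weka. */
class PcaOptionsSketch {
    boolean normalize = true;       // cleared by -N
    double varianceCovered = 0.95;  // set by -R <num>; 0.95 is an assumed default
    boolean transformBack = false;  // set by -T

    /** Parses -N, -R <num> and -T in any order; any other token is rejected. */
    static PcaOptionsSketch parse(String[] args) {
        PcaOptionsSketch o = new PcaOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-N":
                    o.normalize = false;
                    break;
                case "-T":
                    o.transformBack = true;
                    break;
                case "-R": {
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("-R requires a proportion");
                    }
                    double vc = Double.parseDouble(args[++i]);
                    // Range check is this sketch's own choice.
                    if (vc <= 0 || vc > 1) {
                        throw new IllegalArgumentException("-R must be in (0, 1]: " + vc);
                    }
                    o.varianceCovered = vc;
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unsupported option: " + args[i]);
            }
        }
        return o;
    }
}
```

For example, parsing `{"-N", "-R", "0.9"}` turns normalization off and sets the variance target to 0.9, leaving back-transformation off.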

resetOptions

private void resetOptions()
Reset to defaults


normalizeTipText

public java.lang.String normalizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setNormalize

public void setNormalize(boolean n)
Set whether input data will be normalized.

Parameters:
n - true if input data is to be normalized
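When normalization is on, input attributes are rescaled before the analysis (the class keeps a `Normalize` filter for this). A minimal sketch of per-attribute min-max scaling into [0, 1], assuming that is what the filter does with its default settings; `MinMaxSketch` is a hypothetical name, and mapping constant columns to 0 is this sketch's own choice.

```java
/** Hypothetical min-max scaler; rows are instances, columns are numeric attributes. */
class MinMaxSketch {
    /** Rescales each column into [0, 1] and returns a new array; the input is not modified. */
    static double[][] normalize(double[][] data) {
        int n = data.length, m = data[0].length;
        double[][] out = new double[n][m];
        for (int j = 0; j < m; j++) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            double range = max - min;
            for (int i = 0; i < n; i++) {
                // A constant column carries no variance; this sketch maps it to 0.
                out[i][j] = range == 0 ? 0.0 : (data[i][j] - min) / range;
            }
        }
        return out;
    }
}
```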

getNormalize

public boolean getNormalize()
Gets whether or not input data is to be normalized

Returns:
true if input data is to be normalized

varianceCoveredTipText

public java.lang.String varianceCoveredTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setVarianceCovered

public void setVarianceCovered(double vc)
Sets the amount of variance to account for when retaining principal components

Parameters:
vc - the proportion of total variance to account for
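Because the variance explained by each principal component equals its eigenvalue, the proportion set here can be turned into a number of components to keep. A minimal sketch, assuming the rule is "keep the fewest largest-eigenvalue components whose eigenvalues sum to at least `vc` of the total"; the exact comparison Weka applies is not confirmed here, and `VarianceCoverageSketch` is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: how many components are needed to cover a variance proportion. */
class VarianceCoverageSketch {
    static int componentsToRetain(double[] eigenvalues, double proportion) {
        double[] sorted = eigenvalues.clone();
        Arrays.sort(sorted);                              // ascending order
        double total = 0;
        for (double v : sorted) total += v;
        double cumulative = 0;
        int k = 0;
        for (int i = sorted.length - 1; i >= 0; i--) {    // walk from the largest eigenvalue down
            cumulative += sorted[i];
            k++;
            if (cumulative / total >= proportion) break;
        }
        return k;
    }
}
```

With eigenvalues {0.5, 2.0, 1.5} and a proportion of 0.8, the two largest components explain 3.5 / 4.0 = 0.875 of the variance, so two are kept.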

getVarianceCovered

public double getVarianceCovered()
Gets the proportion of total variance to account for when retaining principal components

Returns:
the proportion of variance to account for

transformBackToOriginalTipText

public java.lang.String transformBackToOriginalTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setTransformBackToOriginal

public void setTransformBackToOriginal(boolean b)
Sets whether the data should be transformed back to the original space

Parameters:
b - true if the data should be transformed back to the original space

getTransformBackToOriginal

public boolean getTransformBackToOriginal()
Gets whether the data is to be transformed back to the original space.

Returns:
true if the data is to be transformed back to the original space

getOptions

public java.lang.String[] getOptions()
Gets the current settings of PrincipalComponents

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()
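`getOptions()` is meant to be the inverse of `setOptions()`. A minimal sketch of that round trip for the three documented options; `PcaOptionsToArgs` is hypothetical, and Weka's real method may lay out its array differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical serializer producing arguments that a -N / -R <num> / -T parser accepts. */
class PcaOptionsToArgs {
    static String[] toArgs(boolean normalize, double varianceCovered, boolean transformBack) {
        List<String> args = new ArrayList<>();
        if (!normalize) {
            args.add("-N");                    // flag present only when normalization is off
        }
        args.add("-R");
        args.add(String.valueOf(varianceCovered));
        if (transformBack) {
            args.add("-T");
        }
        return args.toArray(new String[0]);
    }
}
```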

buildEvaluator

public void buildEvaluator(Instances data)
                    throws java.lang.Exception
Initializes principal components and performs the analysis

Specified by:
buildEvaluator in class ASEvaluation
Parameters:
data - the instances to analyse/transform
Throws:
java.lang.Exception - if analysis fails
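The analysis hinges on an eigen-decomposition of the correlation matrix: the eigenvectors become the principal components and the eigenvalues measure the variance each one explains. The sketch below uses the cyclic Jacobi method for real symmetric matrices to illustrate the technique; it is not necessarily the routine Weka itself calls, and the class name, sweep limit and convergence threshold are assumptions.

```java
/** Hypothetical cyclic-Jacobi eigen-decomposition for a real symmetric matrix. */
class JacobiSketch {
    final double[] values;     // eigenvalues, in no particular order
    final double[][] vectors;  // column k is the unit eigenvector for values[k]

    JacobiSketch(double[][] symmetric) {
        int n = symmetric.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) a[i] = symmetric[i].clone();
        double[][] v = new double[n][n];
        for (int i = 0; i < n; i++) v[i][i] = 1.0;

        for (int sweep = 0; sweep < 100; sweep++) {      // assumed sweep limit
            double off = 0;
            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) off += a[p][q] * a[p][q];
            }
            if (off < 1e-22) break;                        // assumed convergence threshold
            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) {
                    if (a[p][q] == 0.0) continue;
                    // Choose the rotation angle so that the rotated a[p][q] becomes zero.
                    double theta = (a[q][q] - a[p][p]) / (2 * a[p][q]);
                    double t = theta == 0.0 ? 1.0
                            : Math.signum(theta) / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
                    double c = 1 / Math.sqrt(t * t + 1);
                    double s = t * c;
                    for (int k = 0; k < n; k++) {          // A <- A J (columns p and q)
                        double akp = a[k][p], akq = a[k][q];
                        a[k][p] = c * akp - s * akq;
                        a[k][q] = s * akp + c * akq;
                    }
                    for (int k = 0; k < n; k++) {          // A <- J^T A (rows p and q)
                        double apk = a[p][k], aqk = a[q][k];
                        a[p][k] = c * apk - s * aqk;
                        a[q][k] = s * apk + c * aqk;
                    }
                    for (int k = 0; k < n; k++) {          // V <- V J accumulates the eigenvectors
                        double vkp = v[k][p], vkq = v[k][q];
                        v[k][p] = c * vkp - s * vkq;
                        v[k][q] = s * vkp + c * vkq;
                    }
                }
            }
        }
        values = new double[n];
        for (int i = 0; i < n; i++) values[i] = a[i][i];
        vectors = v;
    }
}
```

For the matrix {{2, 1}, {1, 2}} this returns eigenvalue 1 with an eigenvector along (1, -1) and eigenvalue 3 with one along (1, 1). Jacobi is simple and accurate for the small, dense, symmetric matrices a correlation matrix produces, at O(n³) cost per sweep.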

buildAttributeConstructor

private void buildAttributeConstructor(Instances data)
                                throws java.lang.Exception
Throws:
java.lang.Exception

transformedHeader

public Instances transformedHeader()
                            throws java.lang.Exception
Returns just the header for the transformed data (i.e., an empty set of instances). This is so that AttributeSelection can determine the structure of the transformed data without actually having to get all the transformed data through transformedData().

Specified by:
transformedHeader in interface AttributeTransformer
Returns:
the header of the transformed data.
Throws:
java.lang.Exception - if the header of the transformed data can't be determined.

transformedData

public Instances transformedData()
                          throws java.lang.Exception
Gets the transformed training data.

Specified by:
transformedData in interface AttributeTransformer
Returns:
the transformed training data
Throws:
java.lang.Exception - if transformed data can't be returned

evaluateAttribute

public double evaluateAttribute(int att)
                         throws java.lang.Exception
Evaluates the merit of a transformed attribute. This is defined to be 1 minus the cumulative variance explained. Merit can't be meaningfully evaluated if the data is to be transformed back to the original space.

Specified by:
evaluateAttribute in class AttributeEvaluator
Parameters:
att - the attribute to be evaluated
Returns:
the merit of a transformed attribute
Throws:
java.lang.Exception - if attribute can't be evaluated
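The merit definition above can be written out directly. A minimal sketch, assuming transformed attribute `att` corresponds to the component with the (att+1)-th largest eigenvalue and that "cumulative variance explained" covers components 0 through `att`; `PcMeritSketch` is hypothetical.

```java
/** Hypothetical computation of a transformed attribute's merit. */
class PcMeritSketch {
    /**
     * Returns 1 minus the proportion of variance explained by components 0..att together.
     * eigenvaluesDescending must already be sorted from largest to smallest.
     */
    static double merit(double[] eigenvaluesDescending, int att) {
        double total = 0;
        for (double v : eigenvaluesDescending) total += v;
        double cumulative = 0;
        for (int i = 0; i <= att; i++) cumulative += eigenvaluesDescending[i];
        return 1.0 - cumulative / total;
    }
}
```

For eigenvalues {2.0, 1.5, 0.5} the merits are 0.5, 0.125 and 0.0, so ranking by merit keeps the components in eigenvalue order.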

fillCorrelation

private void fillCorrelation()
Fill the correlation matrix
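A minimal sketch of filling a correlation matrix, assuming the Pearson correlation between every pair of (already preprocessed) attributes; `CorrelationSketch` is hypothetical, and giving a constant column zero correlation with the others is this sketch's choice rather than documented behaviour.

```java
/** Hypothetical Pearson correlation matrix; rows are instances, columns are attributes. */
class CorrelationSketch {
    static double[][] correlation(double[][] data) {
        int n = data.length, m = data[0].length;
        double[] mean = new double[m], sd = new double[m];
        for (int j = 0; j < m; j++) {
            for (double[] row : data) mean[j] += row[j];
            mean[j] /= n;
            for (double[] row : data) sd[j] += (row[j] - mean[j]) * (row[j] - mean[j]);
            sd[j] = Math.sqrt(sd[j]);   // 1/(n-1) factors are omitted; they cancel in the ratio
        }
        double[][] corr = new double[m][m];
        for (int a = 0; a < m; a++) {
            corr[a][a] = 1.0;
            for (int b = a + 1; b < m; b++) {
                double cov = 0;
                for (double[] row : data) cov += (row[a] - mean[a]) * (row[b] - mean[b]);
                double r = (sd[a] == 0 || sd[b] == 0) ? 0.0 : cov / (sd[a] * sd[b]);
                corr[a][b] = r;          // the matrix is symmetric
                corr[b][a] = r;
            }
        }
        return corr;
    }
}
```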


principalComponentsSummary

private java.lang.String principalComponentsSummary()
Return a summary of the analysis

Returns:
a summary of the analysis.

toString

public java.lang.String toString()
Returns a description of this attribute transformer

Returns:
a String describing this attribute transformer

matrixToString

private java.lang.String matrixToString(double[][] matrix)
Return a matrix as a String

Parameters:
matrix - the matrix to be described as a string
Returns:
a String describing a matrix
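A minimal sketch of such a helper, assuming one row per line and a fixed-width, three-decimal layout; the exact layout Weka prints is not documented here, and `MatrixFormatSketch` is hypothetical.

```java
import java.util.Locale;

/** Hypothetical matrix formatter. */
class MatrixFormatSketch {
    /** One row per line; each value right-aligned in 8 characters with 3 decimals. */
    static String matrixToString(double[][] matrix) {
        StringBuilder sb = new StringBuilder();
        for (double[] row : matrix) {
            for (int j = 0; j < row.length; j++) {
                if (j > 0) sb.append(' ');
                // Locale.ROOT keeps '.' as the decimal separator regardless of the platform locale.
                sb.append(String.format(Locale.ROOT, "%8.3f", row[j]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```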

convertInstanceToOriginal

private Instance convertInstanceToOriginal(Instance inst)
                                    throws java.lang.Exception
Convert a PC-transformed instance back to the original space

Throws:
java.lang.Exception
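With `m_eTranspose` holding the transposed eigenvectors, converting back amounts to multiplying the PC scores by the retained eigenvectors. A minimal sketch of that linear step only, assuming unit-length eigenvectors stored as columns; it does not undo any normalization applied beforehand, and `BackTransformSketch` is hypothetical.

```java
/** Hypothetical back-transformation from PC scores to the (preprocessed) attribute space. */
class BackTransformSketch {
    /**
     * eigenvectors[i][k] is component i of the k-th retained eigenvector, so
     * x[i] = sum over k of eigenvectors[i][k] * scores[k].
     */
    static double[] toOriginal(double[] scores, double[][] eigenvectors) {
        int m = eigenvectors.length;
        double[] x = new double[m];
        for (int i = 0; i < m; i++) {
            for (int k = 0; k < scores.length; k++) {
                x[i] += eigenvectors[i][k] * scores[k];
            }
        }
        return x;
    }
}
```

For instance, with unit eigenvectors (1, 1)/√2 and (1, -1)/√2, the point (3, 1) has scores 4/√2 and 2/√2; back-transforming both returns (3, 1), while keeping only the first gives the approximation (2, 2).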

convertInstance

public Instance convertInstance(Instance instance)
                         throws java.lang.Exception
Transform an instance in the original (unnormalized) format. Convert it back to the original space if requested.

Specified by:
convertInstance in interface AttributeTransformer
Parameters:
instance - an instance in the original (unnormalized) format
Returns:
a transformed instance
Throws:
java.lang.Exception - if the instance can't be transformed
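The core of the transformation is a projection onto the retained eigenvectors. A minimal sketch of that step, assuming the instance has already gone through the same missing-value replacement, nominal-to-binary conversion and normalization as the training data, and that `retained` lists eigenvector columns from largest to smallest eigenvalue (the kind of ordering `m_sortedEigens` records). `ProjectionSketch` is hypothetical.

```java
/** Hypothetical projection of a preprocessed instance onto retained principal components. */
class ProjectionSketch {
    /** Returns z with z[j] = sum over i of x[i] * eigenvectors[i][retained[j]]. */
    static double[] project(double[] x, double[][] eigenvectors, int[] retained) {
        double[] z = new double[retained.length];
        for (int j = 0; j < retained.length; j++) {
            int col = retained[j];
            for (int i = 0; i < x.length; i++) {
                z[j] += x[i] * eigenvectors[i][col];
            }
        }
        return z;
    }
}
```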

setOutputFormatOriginal

private Instances setOutputFormatOriginal()
                                   throws java.lang.Exception
Set up the header for the PC->original space dataset

Throws:
java.lang.Exception

setOutputFormat

private Instances setOutputFormat()
                           throws java.lang.Exception
Set the format for the transformed data

Returns:
a set of empty Instances (header only) in the new format
Throws:
java.lang.Exception - if the output format can't be set

main

public static void main(java.lang.String[] argv)
Main method for testing this class

Parameters:
argv - should contain the command line arguments to the evaluator/transformer (see AttributeSelection)