weka.filters.unsupervised.attribute
Class RandomProjection

java.lang.Object
  extended by weka.filters.Filter
      extended by weka.filters.unsupervised.attribute.RandomProjection
All Implemented Interfaces:
OptionHandler, java.io.Serializable, UnsupervisedFilter

public class RandomProjection
extends Filter
implements UnsupervisedFilter, OptionHandler

Reduces the dimensionality of the data by projecting it onto a lower-dimensional subspace using a random matrix with columns of unit length. This reduces the number of attributes in the data while preserving much of its variation, as PCA does, but at a much lower computational cost.
It first applies the NominalToBinary filter to convert all attributes to numeric before reducing the dimension. It preserves the class attribute.
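
As a rough illustration of the idea described above (not the filter's actual code), the following sketch builds a Gaussian random matrix, scales each column to unit length, and projects one row of numeric values onto k dimensions. The class and method names are hypothetical, and the nominal-to-binary conversion and class-attribute handling are left out.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not the Weka implementation.
final class RandomProjectionSketch {

    /** Builds an n-by-k Gaussian random matrix whose columns are scaled to unit length. */
    static double[][] randomMatrix(int n, int k, long seed) {
        Random rnd = new Random(seed);
        double[][] m = new double[n][k];
        for (int j = 0; j < k; j++) {
            double sumSq = 0.0;
            for (int i = 0; i < n; i++) {
                m[i][j] = rnd.nextGaussian();
                sumSq += m[i][j] * m[i][j];
            }
            double norm = Math.sqrt(sumSq);
            for (int i = 0; i < n; i++) {
                m[i][j] /= norm; // column j now has Euclidean length 1
            }
        }
        return m;
    }

    /** Projects one row of n values onto k dimensions: y = x * M. */
    static double[] project(double[] x, double[][] m) {
        int k = m[0].length;
        double[] y = new double[k];
        for (int j = 0; j < k; j++) {
            for (int i = 0; i < x.length; i++) {
                y[j] += x[i] * m[i][j];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] m = randomMatrix(10, 3, 42L);
        double[] y = project(new double[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, m);
        System.out.println(Arrays.toString(y)); // three projected values
    }
}
```

The projection costs one matrix-vector product per instance, and no statistics of the data are needed to build the matrix.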

Valid filter-specific options are:

-N
The number of dimensions (attributes) the data should be reduced to (exclusive of the class attribute).

-P
The percentage of dimensions (attributes) the data should be reduced to (exclusive of the class attribute). The -N option is ignored if this option is present or is greater than zero.

-D
The distribution to use for calculating the random matrix.

  • 1 - Sparse distribution of: (default)
    sqrt(3)*{+1 with prob(1/6), 0 with prob(2/3), -1 with prob(1/6)}
  • 2 - Sparse distribution of:
    {+1 with prob(1/2), -1 with prob(1/2)}
  • 3 - Gaussian distribution

-M
Replace missing values using the ReplaceMissingValues filter

-R
Specify the random seed for the random number generator for calculating the random matrix.
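
The three -D choices listed above can be sketched as a single sampling routine. This is a minimal illustration using only java.util.Random; the class name ProjectionEntrySketch and the method sample are hypothetical, and the real filter may draw its values differently.

```java
import java.util.Random;

// Hypothetical illustration of the three -D choices; not the Weka source.
final class ProjectionEntrySketch {
    static final double SQRT3 = Math.sqrt(3.0);

    /** Draws one matrix entry for distribution type 1, 2, or 3. */
    static double sample(int type, Random rnd) {
        switch (type) {
            case 1: { // sqrt(3) * {+1 with prob 1/6, 0 with prob 2/3, -1 with prob 1/6}
                int u = rnd.nextInt(6);   // uniform on 0..5
                if (u == 0) return SQRT3;
                if (u == 1) return -SQRT3;
                return 0.0;               // the remaining 4 of 6 outcomes
            }
            case 2: // {+1 with prob 1/2, -1 with prob 1/2}
                return rnd.nextBoolean() ? 1.0 : -1.0;
            case 3: // standard Gaussian
                return rnd.nextGaussian();
            default:
                throw new IllegalArgumentException("Unknown distribution type: " + type);
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L);
        for (int type = 1; type <= 3; type++) {
            System.out.println(type + " -> " + sample(type, rnd));
        }
    }
}
```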

    Version:
    1.0 - 22 July 2003 - Initial version (Ashraf M. Kibriya)
    Author:
    Ashraf M. Kibriya (amk14@cs.waikato.ac.nz)
    See Also:
    Serialized Form

    Field Summary
    private static int GAUSSIAN
              The types of distributions that can be used for calculating the random matrix
    private  int m_distribution
              Stores the distribution to use for calculating the random matrix
    private  int m_k
              Stores the number of dimensions to reduce the data to
    private  boolean m_OutputFormatDefined
              Keeps track of output format if it is defined or not
    private  double m_percent
              Stores the dimensionality the data should be reduced to as percentage of the original dimension
    private  boolean m_replaceMissing
              Whether missing values should be replaced using the unsupervised ReplaceMissingValues filter
    private  long m_rndmSeed
              Stores the random seed used to generate the random matrix
    private  boolean m_useGaussian
              Whether the random matrix will be computed using a Gaussian distribution
    private  Filter ntob
              The NominalToBinary filter applied to the data before this filter
    private  java.util.Random r
              The random number generator used for generating the random matrix
    private  Filter replaceMissing
              The ReplaceMissingValues filter
    private  double[][] rmatrix
              The random matrix
    private static int SPARSE1
              The types of distributions that can be used for calculating the random matrix
    private static int SPARSE2
              The types of distributions that can be used for calculating the random matrix
    private static double sqrt3
               
    static Tag[] TAGS_DSTRS_TYPE
               
    private static int[] vals
               
    private static int[] vals2
               
    private static int[] weights
               
    private static int[] weights2
               
     
    Fields inherited from class weka.filters.Filter
    m_NewBatch
     
    Constructor Summary
    RandomProjection()
               
     
    Method Summary
     boolean batchFinished()
              Signify that this batch of input to the filter is finished.
    private  Instance convertInstance(Instance currentInstance)
              converts a single instance to the required format
     java.lang.String distributionTipText()
              Returns the tip text for this property
     SelectedTag getDistribution()
              Returns the current distribution that will be used for calculating the random matrix
     int getNumberOfAttributes()
              Gets the current number of attributes (dimensionality) to which the data will be reduced.
     java.lang.String[] getOptions()
              Gets the current settings of the filter.
     double getPercent()
              Gets the percentage of attributes (dimensions) the data will be reduced to
     long getRandomSeed()
              Gets the random seed of the random number generator
     boolean getReplaceMissingValues()
              Gets the current setting for using ReplaceMissingValues filter
     java.lang.String globalInfo()
              Returns a string describing this filter
     boolean input(Instance instance)
              Input an instance for filtering.
     java.util.Enumeration listOptions()
              Returns an enumeration describing the available options.
    static void main(java.lang.String[] argv)
              Main method for testing this class.
     java.lang.String numberOfAttributesTipText()
              Returns the tip text for this property
     java.lang.String percentTipText()
              Returns the tip text for this property
     java.lang.String randomSeedTipText()
              Returns the tip text for this property
     java.lang.String replaceMissingValuesTipText()
              Returns the tip text for this property
    private  double rndmNum(boolean useDstrWithZero)
              returns a double x such that x = sqrt(3) * { -1 with prob. 1/6, 0 with prob. 2/3, 1 with prob. 1/6 }
     void setDistribution(SelectedTag newDstr)
              Sets the distribution to use for calculating the random matrix
     boolean setInputFormat(Instances instanceInfo)
              Sets the format of the input instances.
     void setNumberOfAttributes(int newAttNum)
              Sets the number of attributes (dimensions) the data should be reduced to
     void setOptions(java.lang.String[] options)
              Parses the options for this object.
    private  void setOutputFormat()
              Sets the output format
     void setPercent(double newPercent)
              Sets the percentage of attributes (dimensions) the data should be reduced to
     void setRandomSeed(long seed)
              Sets the random seed of the random number generator
     void setReplaceMissingValues(boolean t)
              Sets whether to use the ReplaceMissingValues filter
    private  int weightedDistribution(int[] weights)
              Calculates a weighted distribution
     
    Methods inherited from class weka.filters.Filter
    batchFilterFile, bufferInput, copyStringValues, copyStringValues, filterFile, flushInput, getInputFormat, getInputStringIndex, getOutputFormat, getOutputStringIndex, getStringIndices, inputFormat, inputFormatPeek, isOutputFormatDefined, numPendingOutput, output, outputFormat, outputFormatPeek, outputPeek, push, resetQueue, setOutputFormat, useFilter
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
     

    Field Detail

    m_k

    private int m_k
    Stores the number of dimensions to reduce the data to


    m_percent

    private double m_percent
    Stores the dimensionality the data should be reduced to as percentage of the original dimension


    m_useGaussian

    private boolean m_useGaussian
    Whether the random matrix will be computed using a Gaussian distribution


    SPARSE1

    private static final int SPARSE1
    The types of distributions that can be used for calculating the random matrix

    See Also:
    Constant Field Values

    SPARSE2

    private static final int SPARSE2
    The types of distributions that can be used for calculating the random matrix

    See Also:
    Constant Field Values

    GAUSSIAN

    private static final int GAUSSIAN
    The types of distributions that can be used for calculating the random matrix

    See Also:
    Constant Field Values

    TAGS_DSTRS_TYPE

    public static final Tag[] TAGS_DSTRS_TYPE

    m_distribution

    private int m_distribution
    Stores the distribution to use for calculating the random matrix


    m_replaceMissing

    private boolean m_replaceMissing
    Whether missing values should be replaced using the unsupervised ReplaceMissingValues filter


    m_OutputFormatDefined

    private boolean m_OutputFormatDefined
    Keeps track of output format if it is defined or not


    ntob

    private Filter ntob
    The NominalToBinary filter applied to the data before this filter


    replaceMissing

    private Filter replaceMissing
    The ReplaceMissingValues filter


    m_rndmSeed

    private long m_rndmSeed
    Stores the random seed used to generate the random matrix


    rmatrix

    private double[][] rmatrix
    The random matrix


    r

    private java.util.Random r
    The random number generator used for generating the random matrix


    weights

    private static final int[] weights

    vals

    private static final int[] vals

    weights2

    private static final int[] weights2

    vals2

    private static final int[] vals2

    sqrt3

    private static final double sqrt3
    Constructor Detail

    RandomProjection

    public RandomProjection()
    Method Detail

    listOptions

    public java.util.Enumeration listOptions()
    Returns an enumeration describing the available options.

    Specified by:
    listOptions in interface OptionHandler
    Returns:
    an enumeration of all the available options.

    setOptions

    public void setOptions(java.lang.String[] options)
                    throws java.lang.Exception
    Parses the options for this object. Valid options are:

    -N
    The number of dimensions (attributes) the data should be reduced to (exclusive of the class attribute).

    -P
    The percentage of dimensions (attributes) the data should be reduced to (exclusive of the class attribute). The -N option is ignored if this option is present or is greater than zero.

    -D
    The distribution to use for calculating the random matrix.

  • 1 - Sparse distribution of: (default)
    sqrt(3)*{+1 with prob(1/6), 0 with prob(2/3), -1 with prob(1/6)}
  • 2 - Sparse distribution of:
    {+1 with prob(1/2), -1 with prob(1/2)}
  • 3 - Gaussian distribution

    -M
    Replace missing values using the ReplaceMissingValues filter

    -R
    Specify the random seed for the random number generator for calculating the random matrix.

    Specified by:
    setOptions in interface OptionHandler
    Parameters:
    options - the list of options as an array of strings
    Throws:
    java.lang.Exception - if an option is not supported

    getOptions

    public java.lang.String[] getOptions()
    Gets the current settings of the filter.

    Specified by:
    getOptions in interface OptionHandler
    Returns:
    an array of strings suitable for passing to setOptions

    globalInfo

    public java.lang.String globalInfo()
    Returns a string describing this filter

    Returns:
    a description of the filter suitable for displaying in the explorer/experimenter gui

    numberOfAttributesTipText

    public java.lang.String numberOfAttributesTipText()
    Returns the tip text for this property

    Returns:
    tip text for this property suitable for displaying in the explorer/experimenter gui

    setNumberOfAttributes

    public void setNumberOfAttributes(int newAttNum)
    Sets the number of attributes (dimensions) the data should be reduced to


    getNumberOfAttributes

    public int getNumberOfAttributes()
    Gets the current number of attributes (dimensionality) to which the data will be reduced.


    percentTipText

    public java.lang.String percentTipText()
    Returns the tip text for this property

    Returns:
    tip text for this property suitable for displaying in the explorer/experimenter gui

    setPercent

    public void setPercent(double newPercent)
    Sets the percentage of attributes (dimensions) the data should be reduced to


    getPercent

    public double getPercent()
    Gets the percentage of attributes (dimensions) the data will be reduced to
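
The option descriptions say that the number of attributes is ignored when a percentage greater than zero is given. The sketch below shows one way that precedence could be resolved. It rests on assumptions: the helper targetDimension is hypothetical, the percentage is taken to be on a 0-100 scale, and the result is truncated toward zero; the filter itself may scale or round differently.

```java
// Hypothetical helper; the real filter may round or scale differently.
final class TargetDimensionSketch {

    /**
     * @param numAttributes total attributes, including the class attribute if present
     * @param hasClass      whether a class attribute is present
     * @param n             the requested number of attributes (-N)
     * @param percent       the requested percentage (-P), assumed to be on a 0-100 scale
     */
    static int targetDimension(int numAttributes, boolean hasClass, int n, double percent) {
        // The class attribute is excluded from the count being reduced.
        int inputDims = hasClass ? numAttributes - 1 : numAttributes;
        if (percent > 0) {
            // A positive percentage takes precedence over the attribute count.
            return (int) (inputDims * percent / 100.0);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(targetDimension(101, true, 10, 25.0)); // prints 25
        System.out.println(targetDimension(101, true, 10, 0.0));  // prints 10
    }
}
```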


    randomSeedTipText

    public java.lang.String randomSeedTipText()
    Returns the tip text for this property

    Returns:
    tip text for this property suitable for displaying in the explorer/experimenter gui

    setRandomSeed

    public void setRandomSeed(long seed)
    Sets the random seed of the random number generator


    getRandomSeed

    public long getRandomSeed()
    Gets the random seed of the random number generator
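
The practical effect of fixing the seed is that the same random matrix can be regenerated. The sketch below demonstrates this with java.util.Random alone; SeedSketch and gaussianMatrix are hypothetical names, and no Weka classes are involved.

```java
import java.util.Arrays;
import java.util.Random;

// Illustration with java.util.Random only; no Weka classes are used.
final class SeedSketch {

    /** Fills a rows-by-cols matrix with Gaussian values from a seeded generator. */
    static double[][] gaussianMatrix(int rows, int cols, long seed) {
        Random rnd = new Random(seed);
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int j = 0; j < cols; j++) {
                row[j] = rnd.nextGaussian();
            }
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] a = gaussianMatrix(4, 2, 42L);
        double[][] b = gaussianMatrix(4, 2, 42L);
        System.out.println(Arrays.deepEquals(a, b)); // true: same seed, same matrix
    }
}
```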


    distributionTipText

    public java.lang.String distributionTipText()
    Returns the tip text for this property

    Returns:
    tip text for this property suitable for displaying in the explorer/experimenter gui

    setDistribution

    public void setDistribution(SelectedTag newDstr)
    Sets the distribution to use for calculating the random matrix


    getDistribution

    public SelectedTag getDistribution()
    Returns the current distribution that will be used for calculating the random matrix


    replaceMissingValuesTipText

    public java.lang.String replaceMissingValuesTipText()
    Returns the tip text for this property

    Returns:
    tip text for this property suitable for displaying in the explorer/experimenter gui

    setReplaceMissingValues

    public void setReplaceMissingValues(boolean t)
    Sets whether to use the ReplaceMissingValues filter


    getReplaceMissingValues

    public boolean getReplaceMissingValues()
    Gets the current setting for using ReplaceMissingValues filter


    setInputFormat

    public boolean setInputFormat(Instances instanceInfo)
                           throws java.lang.Exception
    Sets the format of the input instances.

    Overrides:
    setInputFormat in class Filter
    Parameters:
    instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
    Returns:
    true if the outputFormat may be collected immediately
    Throws:
    java.lang.Exception - if the input format can't be set successfully

    input

    public boolean input(Instance instance)
                  throws java.lang.Exception
    Input an instance for filtering.

    Overrides:
    input in class Filter
    Parameters:
    instance - the input instance
    Returns:
    true if the filtered instance may now be collected with output().
    Throws:
    java.lang.IllegalStateException - if no input format has been set
    java.lang.Exception - if the input instance was not of the correct format or if there was a problem with the filtering.

    batchFinished

    public boolean batchFinished()
                          throws java.lang.Exception
    Signify that this batch of input to the filter is finished.

    Overrides:
    batchFinished in class Filter
    Returns:
    true if there are instances pending output
    Throws:
    java.lang.NullPointerException - if no input structure has been defined
    java.lang.Exception - if there was a problem finishing the batch.

    setOutputFormat

    private void setOutputFormat()
    Sets the output format


    convertInstance

    private Instance convertInstance(Instance currentInstance)
    converts a single instance to the required format


    rndmNum

    private double rndmNum(boolean useDstrWithZero)
    returns a double x such that x = sqrt(3) * { -1 with prob. 1/6, 0 with prob. 2/3, 1 with prob. 1/6 }
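
One way to read the sqrt(3) factor: with it, each value drawn from this distribution has zero mean and unit variance. The short calculation below is an added worked check based only on the probabilities stated above.

```latex
\begin{aligned}
\mathbb{E}[x]   &= \sqrt{3}\left(\tfrac{1}{6}\cdot(-1) + \tfrac{2}{3}\cdot 0 + \tfrac{1}{6}\cdot(+1)\right) = 0,\\
\mathbb{E}[x^2] &= 3\left(\tfrac{1}{6}\cdot 1 + \tfrac{2}{3}\cdot 0 + \tfrac{1}{6}\cdot 1\right) = 1.
\end{aligned}
```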


    weightedDistribution

    private int weightedDistribution(int[] weights)
    Calculates a weighted distribution
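
A weighted draw over integer weights can be sketched as follows. This is an assumption about what such a helper might look like, not the method's actual body: weightedIndex is a hypothetical name, it takes an explicit Random argument for testability, and the weights and values in main are chosen only to match the 1/6, 2/3, 1/6 proportions described for the sparse distribution.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; the weights and values below are assumptions for illustration.
final class WeightedChoiceSketch {

    /** Returns index i with probability weights[i] / sum(weights). */
    static int weightedIndex(int[] weights, Random rnd) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        int pick = rnd.nextInt(total); // uniform on 0..total-1
        for (int i = 0; i < weights.length; i++) {
            if (pick < weights[i]) {
                return i;
            }
            pick -= weights[i];
        }
        throw new IllegalStateException("unreachable when all weights are non-negative");
    }

    public static void main(String[] args) {
        int[] weights = {1, 4, 1}; // proportions 1/6, 2/3, 1/6
        int[] vals = {-1, 0, 1};
        Random rnd = new Random(7L);
        int[] counts = new int[3];
        for (int t = 0; t < 6000; t++) {
            counts[weightedIndex(weights, rnd)]++;
        }
        System.out.println(Arrays.toString(counts)); // middle count should dominate
        System.out.println(Math.sqrt(3.0) * vals[weightedIndex(weights, rnd)]);
    }
}
```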


    main

    public static void main(java.lang.String[] argv)
    Main method for testing this class.

    Parameters:
    argv - should contain arguments to the filter: use -h for help