de.zimek.proteinfeatures
Class ArffFileCreator

java.lang.Object
  extended by de.zimek.proteinfeatures.ArffFileCreator

public class ArffFileCreator
extends java.lang.Object

Main class for creating protein features in ARFF format. The features are created by the feature-extracting classes named in a given property file. The property file can specify the following properties:

 SCOPLEVEL
 CATHLEVEL
 ATTRIBUTES
 MINSUPPORT
 MVALUES
 AAINDICES
 MOTIF_MAP
 
SCOPLEVEL takes one of CLASS, FOLD, SUPERFAMILY, or FAMILY.
Alternatively, CATHLEVEL takes one of CLASS, ARCHITECTURE, TOPOLOGY, or HOMOLOGOUS_SUPERFAMILY.
ATTRIBUTES takes a comma-separated list of de.zimek.proteinfeatures.attributeAssigner.ArffAttributeAssigners.
MINSUPPORT specifies the minimum number of instances required for a class.
MVALUES takes a comma-separated list of m-values, one for each de.zimek.proteinfeatures.attributeAssigner.AutoCorrelationFunction in the specified list of de.zimek.proteinfeatures.attributeAssigner.ArffAttributeAssigners (in the same order).
AAINDICES takes a comma-separated list of aaIndices implementing de.zimek.proteinfeatures.aaindex.AAIndex, one for each AutoCorrelationFunction in the specified list of de.zimek.proteinfeatures.attributeAssigner.ArffAttributeAssigners (in the same order).
MOTIF_MAP takes a comma-separated list of names of files providing a motif map, one for each de.zimek.proteinfeatures.attributeAssigner.MotifAssigner in the specified list of de.zimek.proteinfeatures.attributeAssigner.ArffAttributeAssigners (in the same order).
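
As an illustration of the properties described above, the following sketch parses an example property file with java.util.Properties and splits a comma-separated value into its entries. It is a sketch under stated assumptions: the AAIndex name ExampleHydrophobicityIndex and the filename motifs.map are hypothetical placeholders, and the use of simple (unqualified) class names for ATTRIBUTES is assumed rather than documented here. The helper class PropertyFileSketch is not part of the library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertyFileSketch {

    // Example property file content. The AAIndex name and the motif map
    // filename are hypothetical placeholders, not names from the library.
    static final String EXAMPLE =
          "SCOPLEVEL=FOLD\n"
        + "ATTRIBUTES=AutoCorrelationFunction,MotifAssigner\n"
        + "MINSUPPORT=10\n"
        + "MVALUES=5\n"
        + "AAINDICES=ExampleHydrophobicityIndex\n"
        + "MOTIF_MAP=motifs.map\n";

    // Parses property file text the same way a file on disk would be parsed.
    static Properties load(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    // Splits a comma-separated property value into trimmed entries.
    // Returns an empty array if the property is missing or blank.
    static String[] list(Properties properties, String key) {
        String value = properties.getProperty(key, "").trim();
        return value.isEmpty() ? new String[0] : value.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        Properties properties = load(EXAMPLE);
        System.out.println(properties.getProperty("SCOPLEVEL"));
        for (String assigner : list(properties, "ATTRIBUTES")) {
            System.out.println(assigner);
        }
    }
}
```

In this example one AutoCorrelationFunction is listed, so MVALUES and AAINDICES each carry one entry, and one MotifAssigner is listed, so MOTIF_MAP carries one filename.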

Author:
Arthur Zimek

Field Summary
static java.lang.String AAINDEX_PACKAGE
          The name of the package for aaindex assigners.
private  ArffAttributeAssigner[] arffAttributeAssigner
          The ArffAttributeAssigners that are to create features.
static java.lang.String ATTRIBUTE_PROPERTY
          The property-name for attributes.
private  ID[] coveredClasses
          The IDs of covered classes.
static java.lang.String DEFAULT_PACKAGE
          The default package to search for specified classes.
static java.lang.String DEFAULT_PARSER_PACKAGE
          The default parser package.
private  int level
          The level of hierarchy.
private  int minSupport
          Default minimum support.
static java.lang.String MOTIF_PROPERTY
          The property-name for a motif map.
private  java.lang.String relation
          Name of the relation.
static java.lang.String[] SUB_PACKAGES
          The subpackages for group descriptors.
 
Constructor Summary
ArffFileCreator(java.util.Properties properties)
          Initializes the ArffFileCreator according to the settings in the given properties.
 
Method Summary
protected  java.lang.String concatenate(java.lang.String[] subclasses)
          Returns a String describing a set of the given array of class-names.
protected  Protein[] filterClasses(Protein[] proteins)
          Returns only the proteins whose class has at least minSupport instances.
protected  java.lang.String getCoveredClassesArff()
          Returns the covered classes as arff-formatted Attribute-String.
protected  java.lang.String getCoveredHierarchy()
          Returns the covered hierarchy as a single String.
 java.lang.String getDataLine(Protein protein)
          Returns a String providing the data entry line in ARFF format for the given protein, ending with the class at the currently set class level in SCOP or CATH notation, followed by a newline.
static void main(java.lang.String[] args)
          Main routine to create an arff file providing the features as specified in a given property file.
 void printArffFile(Protein[] proteins, java.io.PrintStream out)
          Prints an arff file for the given proteins according to the current settings to the given PrintStream.
 void setCoveredClasses(Protein[] proteins)
          Sets the covered classes according to the classes occurring in the given array of proteins.
 void setMinSupport(int minSupport)
          Sets the minSupport.
protected  java.lang.String subclasses(ID[] classes, int level)
          Returns a String describing the set of subclasses split at the given level.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ATTRIBUTE_PROPERTY

public static final java.lang.String ATTRIBUTE_PROPERTY
The property-name for attributes.

See Also:
Constant Field Values

MOTIF_PROPERTY

public static final java.lang.String MOTIF_PROPERTY
The property-name for a motif map.

See Also:
Constant Field Values

DEFAULT_PACKAGE

public static final java.lang.String DEFAULT_PACKAGE
The default package to search for specified classes.


SUB_PACKAGES

public static final java.lang.String[] SUB_PACKAGES
The subpackages for group descriptors.


DEFAULT_PARSER_PACKAGE

public static final java.lang.String DEFAULT_PARSER_PACKAGE
The default parser package.


AAINDEX_PACKAGE

public static final java.lang.String AAINDEX_PACKAGE
The name of the package for aaindex assigners.


arffAttributeAssigner

private ArffAttributeAssigner[] arffAttributeAssigner
The ArffAttributeAssigners that are to create features.


level

private int level
The level of hierarchy.


coveredClasses

private ID[] coveredClasses
The IDs of covered classes.


relation

private java.lang.String relation
Name of the relation.


minSupport

private int minSupport
Default minimum support.

Constructor Detail

ArffFileCreator

public ArffFileCreator(java.util.Properties properties)
                throws java.lang.IllegalArgumentException
Initializes the ArffFileCreator according to the settings in the given properties.

Parameters:
properties - the settings for this ArffFileCreator
Throws:
java.lang.IllegalArgumentException
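
The conditions under which the constructor throws IllegalArgumentException are not documented here. As one plausible illustration only, the following sketch validates the hierarchy level and rejects settings where neither or both of SCOPLEVEL and CATHLEVEL are given, or where the value is not one of the listed levels. The class LevelValidationSketch, the method levelDepth, and the mapping of levels to zero-based depths are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

final class LevelValidationSketch {

    // Level names as listed in the class description, ordered from the
    // coarsest to the finest level.
    static final List<String> SCOP_LEVELS =
        Arrays.asList("CLASS", "FOLD", "SUPERFAMILY", "FAMILY");
    static final List<String> CATH_LEVELS =
        Arrays.asList("CLASS", "ARCHITECTURE", "TOPOLOGY", "HOMOLOGOUS_SUPERFAMILY");

    // Returns the zero-based depth of the configured level (assumed encoding).
    // Throws IllegalArgumentException if neither or both of SCOPLEVEL and
    // CATHLEVEL are set, or if the value is not a known level name.
    static int levelDepth(Properties properties) {
        String scop = properties.getProperty("SCOPLEVEL");
        String cath = properties.getProperty("CATHLEVEL");
        if ((scop == null) == (cath == null)) {
            throw new IllegalArgumentException(
                "Specify exactly one of SCOPLEVEL or CATHLEVEL");
        }
        List<String> allowed = scop != null ? SCOP_LEVELS : CATH_LEVELS;
        String value = (scop != null ? scop : cath).trim();
        int depth = allowed.indexOf(value);
        if (depth < 0) {
            throw new IllegalArgumentException(
                "Unknown level '" + value + "', expected one of " + allowed);
        }
        return depth;
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("CATHLEVEL", "TOPOLOGY");
        System.out.println(levelDepth(properties));
    }
}
```
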
Method Detail

printArffFile

public void printArffFile(Protein[] proteins,
                          java.io.PrintStream out)
Prints an arff file for the given proteins according to the current settings to the given PrintStream.

Parameters:
proteins - the proteins to print an arff file for
out - the PrintStream where to print
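
For orientation, an ARFF file consists of a @relation line, one @attribute line per feature plus the class attribute, and a @data section with one line per instance. The following sketch writes that layout to a PrintStream. It does not reproduce the library's implementation: the class ArffLayoutSketch, its method names, the treatment of all features as numeric, and the example attribute names are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

final class ArffLayoutSketch {

    // Writes an ARFF header followed by the data section.
    // classAttribute is a complete "@attribute ..." line without a newline;
    // each entry of dataLines is expected to end with a newline already.
    static void print(String relation, String[] numericAttributes,
                      String classAttribute, String[] dataLines, PrintStream out) {
        out.print("@relation " + relation + "\n\n");
        for (String name : numericAttributes) {
            out.print("@attribute " + name + " numeric\n");
        }
        out.print(classAttribute + "\n\n");
        out.print("@data\n");
        for (String line : dataLines) {
            out.print(line);
        }
        out.flush();
    }

    // Convenience wrapper that renders the file into a String.
    static String render(String relation, String[] numericAttributes,
                         String classAttribute, String[] dataLines) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        print(relation, numericAttributes, classAttribute, dataLines,
              new PrintStream(buffer));
        return buffer.toString();
    }

    public static void main(String[] args) {
        print("proteins",
              new String[] {"f1", "f2"},
              "@attribute class {a.1,b.1}",
              new String[] {"0.5,1.25,a.1\n", "0.1,0.2,b.1\n"},
              System.out);
    }
}
```
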

setCoveredClasses

public void setCoveredClasses(Protein[] proteins)
Sets the covered classes according to the classes occurring in the given array of proteins.

Parameters:
proteins - the proteins whose classes are to be covered

filterClasses

protected Protein[] filterClasses(Protein[] proteins)
Returns only the proteins whose class has at least minSupport instances. This method adjusts coveredClasses, so call setCoveredClasses(Protein[] proteins) before calling it.

Parameters:
proteins - the proteins to filter
Returns:
Protein[]
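
The idea of filtering by minimum support can be sketched as follows: count the instances of each class, then keep only the instances whose class reaches the threshold. This is a minimal sketch, not the library's code. It uses plain class label Strings in place of Protein objects, and the class MinSupportSketch and its method filter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MinSupportSketch {

    // Keeps only the labels whose class occurs at least minSupport times.
    // The input order of the retained entries is preserved.
    static List<String> filter(List<String> classLabels, int minSupport) {
        // First pass: count the instances of each class.
        Map<String, Integer> counts = new HashMap<>();
        for (String label : classLabels) {
            counts.merge(label, 1, Integer::sum);
        }
        // Second pass: retain instances of sufficiently supported classes.
        List<String> kept = new ArrayList<>();
        for (String label : classLabels) {
            if (counts.get(label) >= minSupport) {
                kept.add(label);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("a.1", "a.1", "b.2", "a.1", "c.3", "c.3");
        System.out.println(filter(labels, 2));
    }
}
```

With a threshold of 2, the single instance of class b.2 in the example is dropped while the instances of a.1 and c.3 are kept.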

setMinSupport

public void setMinSupport(int minSupport)
Sets the minSupport.

Parameters:
minSupport - The minSupport to set

getCoveredHierarchy

protected java.lang.String getCoveredHierarchy()
Returns the covered hierarchy as a single String. Make sure to call setCoveredClasses(Protein[] proteins) before deriving the hierarchy.

Returns:
String

subclasses

protected java.lang.String subclasses(ID[] classes,
                                      int level)
Returns a String describing the set of subclasses split at the given level.

Parameters:
classes - the classes to be described as hierarchy
level - the desired depth of the hierarchy
Returns:
String description of the hierarchy of given classes for given level
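
The exact String format produced by this method is not specified here. To illustrate only the underlying idea of splitting classes at a level, the following sketch groups dot-separated identifiers in SCOP-style notation (for example a.1.2) by their prefix at a given depth. The class HierarchySplitSketch, its methods, and the Map-based result are assumptions for this example and do not reflect the library's ID type or output format.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class HierarchySplitSketch {

    // Returns the first `level` dot-separated components of an identifier,
    // or the whole identifier if it has fewer components.
    static String prefix(String id, int level) {
        String[] parts = id.split("\\.");
        int n = Math.min(level, parts.length);
        return String.join(".", Arrays.copyOf(parts, n));
    }

    // Groups identifiers by their prefix at the given level.
    // Sorted collections keep the result deterministic.
    static Map<String, Set<String>> groupAtLevel(String[] ids, int level) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String id : ids) {
            groups.computeIfAbsent(prefix(id, level), key -> new TreeSet<>()).add(id);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] ids = {"a.1.1", "a.1.2", "a.2.1", "b.1.1"};
        System.out.println(groupAtLevel(ids, 2));
    }
}
```
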

concatenate

protected java.lang.String concatenate(java.lang.String[] subclasses)
Returns a String describing a set of the given array of class-names.

Parameters:
subclasses - the class-names to be collected in a set
Returns:
String a description of the set of classes suitable as part of a hierarchy definition

getCoveredClassesArff

protected java.lang.String getCoveredClassesArff()
Returns the covered classes as arff-formatted Attribute-String. Make sure to invoke setCoveredClasses(Protein[] proteins) before invoking this method.

Returns:
String an arff formatted attribute String as class attribute
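
In ARFF, a nominal attribute is declared by listing its permitted values in braces. A class attribute built from the covered classes might therefore look like the output of the following sketch. The attribute name class, the helper ClassAttributeSketch, and the use of String labels in place of ID objects are assumptions; values containing spaces would need quoting, which this sketch does not handle.

```java
final class ClassAttributeSketch {

    // Builds an ARFF nominal attribute declaration from the class labels.
    // The attribute name "class" is an assumption for this example.
    static String classAttribute(String[] coveredClasses) {
        return "@attribute class {" + String.join(",", coveredClasses) + "}";
    }

    public static void main(String[] args) {
        System.out.println(classAttribute(new String[] {"a.1", "a.3", "b.1"}));
    }
}
```
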

getDataLine

public java.lang.String getDataLine(Protein protein)
Returns a String providing the data entry line in ARFF format for the given protein, ending with the class at the currently set class level in SCOP or CATH notation, followed by a newline.

Parameters:
protein - the protein to create the feature entry for
Returns:
String the feature line in arff format
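
A data line of this shape can be sketched as a comma-separated list of feature values with the class label as the last entry and a terminating newline. The sketch below assumes numeric features supplied as a double array and a class label supplied as a String; the class DataLineSketch and the method dataLine are hypothetical and stand in for the feature values that the configured ArffAttributeAssigners would produce.

```java
final class DataLineSketch {

    // Joins the feature values with commas, appends the class label as the
    // last entry, and terminates the line with a newline.
    static String dataLine(double[] features, String classLabel) {
        StringBuilder line = new StringBuilder();
        for (double feature : features) {
            line.append(feature).append(',');
        }
        return line.append(classLabel).append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(dataLine(new double[] {0.5, 1.25}, "a.1"));
    }
}
```
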

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Main routine to create an arff file providing the features as specified in a given property file.

Parameters:
args -
Throws:
java.lang.Exception