edu.isi.pegasus.planner.refiner
Class ReductionEngine

java.lang.Object
  extended by edu.isi.pegasus.planner.refiner.Engine
      extended by edu.isi.pegasus.planner.refiner.ReductionEngine
All Implemented Interfaces:
Refiner

public class ReductionEngine
extends Engine
implements Refiner

Reduction engine for Planner 2. Given an ADAG, it queries the Replica Catalog to determine which output files already exist there, and reduces the DAG on that basis.

Version:
$Revision: 3471 $
Author:
Karan Vahi, Gaurang Mehta

Field Summary
private  Vector mAddJobsDeleted
          the jobs which are deleted due to the application of the Reduction algorithm.
private  Vector mAllDeletedJobs
          all deleted jobs.
private  Set mFilesInRC
          the files whose locations are returned from the ReplicaCatalog
private  Vector mOrgDagRelations
          the dag relations of the original dag
private  Vector mOrgJobsInRC
          the jobs which are found to be in the Replica Catalog.
private  ADag mOriginalDag
          the original dag object which needs to be reduced on the basis of the results returned from the Replica Catalog
private  ADag mReducedDag
          the reduced dag object which is returned.
private  ADag mWorkflow
          The workflow object being worked upon.
private  XMLProducer mXMLStore
          The XML Producer object that records the actions.
 
Fields inherited from class edu.isi.pegasus.planner.refiner.Engine
mBag, mLogger, mLogMsg, mOutputPool, mPoolFile, mPOptions, mProps, mRLIUrl, mSiteStore, mTCFile, mTCHandle, mTCMode, REGISTRATION_UNIVERSE, TRANSFER_UNIVERSE
 
Fields inherited from interface edu.isi.pegasus.planner.refiner.Refiner
VERSION
 
Constructor Summary
ReductionEngine(ADag orgDag, PegasusBag bag)
          The constructor
 
Method Summary
private  DagInfo constructNewDagInfo(DagInfo dagInfo, Vector vDelJobs)
          Constructs a DagInfo object for the decomposed Dag on the basis of the jobs which are deleted from the DAG by the reduction algorithm. Note: the inputFiles and the outputFiles are copied as-is.
private  Vector constructNewSubInfos(Vector vSubInfos, Vector vDelJobs)
          constructs the Vector of subInfo objects corresponding to the reduced ADAG.
private  void firstPass(Vector vDelJobs)
          If a job is deleted, marks all the relations involving that job as deleted.
private  Vector getChildren(String node)
          Gets all the children of a particular node.
 List<Job> getDeletedJobs()
          This returns all the jobs deleted from the workflow after the reduction algorithm has run.
 List<Job> getDeletedLeafJobs()
          This returns all the deleted jobs that happen to be leaf nodes.
private  Vector getJobsInRC(Vector vSubInfos, Set filesInRC)
          This determines the jobs which are in the RC corresponding to the files found in the Replica Catalog.
private  Vector getNodeParents(String node)
          Gets all the parents of a particular node.
 ADag getWorkflow()
          Returns a reference to the workflow that is being refined by the refiner.
 XMLProducer getXMLProducer()
          Returns a reference to the XMLProducer, that generates the XML fragment capturing the actions of the refiner.
 ADag makeRedDagObject(ADag orgDag, Vector vDelJobs)
          Makes the reduced Dag object corresponding to the specified deleted jobs.
 ADag reduceDag(ReplicaCatalogBridge rcb)
          Reduces the workflow on the basis of the existence of LFNs in the replica catalog.
private  void secondPass()
          In the second pass we find all the parents of the nodes which have been found to be in the RC.
 
Methods inherited from class edu.isi.pegasus.planner.refiner.Engine
addVector, appendArrayList, loadProperties, printVector, stringInList, stringInPegVector, stringInVector, vectorToString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mOriginalDag

private ADag mOriginalDag
the original dag object which needs to be reduced on the basis of the results returned from the Replica Catalog


mOrgDagRelations

private Vector mOrgDagRelations
the dag relations of the original dag


mReducedDag

private ADag mReducedDag
the reduced dag object which is returned.


mOrgJobsInRC

private Vector mOrgJobsInRC
the jobs which are found to be in the Replica Catalog. These are the jobs whose output files are at some location in the Replica Catalog. This does not include the jobs which are deleted by applying the reduction algorithm.


mAddJobsDeleted

private Vector mAddJobsDeleted
the jobs which are deleted due to the application of the Reduction algorithm. These do not include the jobs whose output files are in the RC; they are the ones deleted by the cascade delete.


mAllDeletedJobs

private Vector mAllDeletedJobs
all deleted jobs. This is mOrgJobsInRC + mAddJobsDeleted.


mFilesInRC

private Set mFilesInRC
the files whose locations are returned from the ReplicaCatalog


mXMLStore

private XMLProducer mXMLStore
The XML Producer object that records the actions.


mWorkflow

private ADag mWorkflow
The workflow object being worked upon.

Constructor Detail

ReductionEngine

public ReductionEngine(ADag orgDag,
                       PegasusBag bag)
The constructor

Parameters:
orgDag - The original Dag object
bag - the bag of initialization objects.
Method Detail

getWorkflow

public ADag getWorkflow()
Returns a reference to the workflow that is being refined by the refiner.

Specified by:
getWorkflow in interface Refiner
Returns:
ADAG object.

getXMLProducer

public XMLProducer getXMLProducer()
Returns a reference to the XMLProducer, that generates the XML fragment capturing the actions of the refiner. This is used for provenance purposes.

Specified by:
getXMLProducer in interface Refiner
Returns:
XMLProducer

reduceDag

public ADag reduceDag(ReplicaCatalogBridge rcb)
Reduces the workflow on the basis of the existence of LFNs in the replica catalog. The existence of files is determined via the bridge.

Parameters:
rcb - instance of the replica catalog bridge.
Returns:
the reduced dag

getJobsInRC

private Vector getJobsInRC(Vector vSubInfos,
                           Set filesInRC)
This determines the jobs which are in the RC corresponding to the files found in the Replica Catalog. A job is said to be in the RC if all the output files for that job are found to be in the RC. A job in the RC can be removed from the Dag, and the Dag is reduced correspondingly.

Parameters:
vSubInfos - Vector of Job objects corresponding to all the jobs of an abstract Dag
filesInRC - Set of String objects corresponding to the logical filenames of files which are found to be in the Replica Catalog
Returns:
a Vector of jobNames (Strings)
See Also:
org.griphyn.cPlanner.classes.Job
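
The "all output files are in the RC" rule described above can be illustrated with plain collections. This is only a sketch under stated assumptions, not the Pegasus implementation: the class JobsInRcSketch, the method jobsInRC and the map of job names to output LFNs are hypothetical stand-ins for the Job vector, and the handling of jobs without output files is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the planner's job bookkeeping; not Pegasus code.
final class JobsInRcSketch {

    // Returns the names of the jobs whose output files are ALL present in filesInRC.
    // Assumption: a job without output files is never treated as "in the RC".
    static List<String> jobsInRC(Map<String, Set<String>> outputsByJob,
                                 Set<String> filesInRC) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : outputsByJob.entrySet()) {
            Set<String> outputs = entry.getValue();
            if (!outputs.isEmpty() && filesInRC.containsAll(outputs)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> outputsByJob = new LinkedHashMap<>();
        outputsByJob.put("preprocess", new HashSet<>(Arrays.asList("f.b1", "f.b2")));
        outputsByJob.put("analyze", new HashSet<>(Arrays.asList("f.c")));

        // Only the outputs of "preprocess" are registered in the catalog.
        Set<String> filesInRC = new HashSet<>(Arrays.asList("f.b1", "f.b2"));

        System.out.println(jobsInRC(outputsByJob, filesInRC)); // prints [preprocess]
    }
}
```

A job with only some of its outputs in the catalog is kept, since it would still have to run to produce the missing files.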

firstPass

private void firstPass(Vector vDelJobs)
If a job is deleted, marks all the relations involving that job as deleted.

Parameters:
vDelJobs - the vector containing the names of the deleted jobs whose relations we want to nullify
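
As an illustration of this marking step, the sketch below flags every parent-child edge that touches a deleted job. It is a simplified model, not the Pegasus code: the Relation class, its deleted flag and the method markDeleted are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the first pass; not Pegasus code.
final class FirstPassSketch {

    // A parent -> child edge that can be flagged as deleted.
    static final class Relation {
        final String parent;
        final String child;
        boolean deleted;

        Relation(String parent, String child) {
            this.parent = parent;
            this.child = child;
        }
    }

    // Flags every relation in which a deleted job appears as parent or as child.
    static void markDeleted(List<Relation> relations, Set<String> deletedJobs) {
        for (Relation r : relations) {
            if (deletedJobs.contains(r.parent) || deletedJobs.contains(r.child)) {
                r.deleted = true;
            }
        }
    }

    public static void main(String[] args) {
        List<Relation> relations = new ArrayList<>();
        relations.add(new Relation("a", "b"));
        relations.add(new Relation("b", "c"));
        relations.add(new Relation("c", "d"));

        Set<String> deleted = new HashSet<>(Arrays.asList("b"));
        markDeleted(relations, deleted);

        for (Relation r : relations) {
            System.out.println(r.parent + " -> " + r.child + " deleted=" + r.deleted);
        }
    }
}
```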

secondPass

private void secondPass()
In the second pass, we find all the parents of the nodes which have been found to be in the RC. For each such parent, we find its children, that is, the siblings of the deleted job. If all of those siblings are deleted, the parent can be deleted as well.
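
The cascade can be sketched as a worklist over the relations: a parent is deleted once every one of its children is deleted, and a newly deleted parent is then examined in turn. This is a minimal sketch under assumptions, not the Pegasus code. The class SecondPassSketch, the method cascade and the {parent, child} array encoding of a relation are hypothetical, and whether the real method repeats the check transitively in exactly this way is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the cascade delete; not Pegasus code.
final class SecondPassSketch {

    // edges: each element is {parent, child}. Returns jobsInRC plus all cascaded deletions.
    static Set<String> cascade(List<String[]> edges, Set<String> jobsInRC) {
        Map<String, Set<String>> parents = new HashMap<>();
        Map<String, Set<String>> children = new HashMap<>();
        for (String[] e : edges) {
            children.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            parents.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }

        Set<String> deleted = new LinkedHashSet<>(jobsInRC);
        Deque<String> queue = new ArrayDeque<>(jobsInRC);
        while (!queue.isEmpty()) {
            String job = queue.poll();
            for (String parent : parents.getOrDefault(job, Collections.<String>emptySet())) {
                // A parent can go only when all of its children are already deleted.
                if (!deleted.contains(parent) && deleted.containsAll(children.get(parent))) {
                    deleted.add(parent);
                    queue.add(parent);
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Diamond: a -> b, a -> c, b -> d, c -> d
        List<String[]> edges = Arrays.asList(
                new String[] {"a", "b"}, new String[] {"a", "c"},
                new String[] {"b", "d"}, new String[] {"c", "d"});

        Set<String> all = cascade(edges, new HashSet<>(Arrays.asList("d")));
        System.out.println(new TreeSet<>(all)); // prints [a, b, c, d]
    }
}
```

In the diamond above, finding only the output of d in the catalog removes the whole workflow, whereas finding only the output of b removes b alone, because its sibling c still needs a.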


getNodeParents

private Vector getNodeParents(String node)
Gets all the parents of a particular node.

Parameters:
node - the name of the job whose parents are to be found.
Returns:
Vector corresponding to the parents of the node.

getChildren

private Vector getChildren(String node)
Gets all the children of a particular node.

Parameters:
node - the name of the node whose children we want to find.
Returns:
Vector containing the children of the node.
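
Both lookups, getNodeParents and getChildren, amount to a scan over the parent-child relations. The sketch below shows the idea on a list of {parent, child} pairs. The class RelationLookupSketch and its method names are hypothetical, and the real methods may store and traverse relations differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical relation lookups; not Pegasus code.
final class RelationLookupSketch {

    // Collects the parent of every edge whose child is the given node.
    static List<String> parentsOf(List<String[]> edges, String node) {
        List<String> parents = new ArrayList<>();
        for (String[] e : edges) {
            if (e[1].equals(node)) {
                parents.add(e[0]);
            }
        }
        return parents;
    }

    // Collects the child of every edge whose parent is the given node.
    static List<String> childrenOf(List<String[]> edges, String node) {
        List<String> children = new ArrayList<>();
        for (String[] e : edges) {
            if (e[0].equals(node)) {
                children.add(e[1]);
            }
        }
        return children;
    }

    public static void main(String[] args) {
        List<String[]> edges = Arrays.asList(
                new String[] {"a", "c"}, new String[] {"b", "c"}, new String[] {"c", "d"});
        System.out.println(parentsOf(edges, "c"));  // prints [a, b]
        System.out.println(childrenOf(edges, "c")); // prints [d]
    }
}
```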

getDeletedJobs

public List<Job> getDeletedJobs()
This returns all the jobs deleted from the workflow after the reduction algorithm has run.

Returns:
List containing the Job objects of all the deleted jobs.

getDeletedLeafJobs

public List<Job> getDeletedLeafJobs()
This returns all the deleted jobs that happen to be leaf nodes. This entails that the output files of these jobs be transferred from the location returned by the Replica Catalog to the pool specified. This is a subset of mAllDeletedJobs. To determine the deleted leaf jobs, it refers to the original dag, not the reduced dag.

Returns:
List containing the Job objects of the deleted leaf jobs.
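
A sketch of the leaf selection, assuming a leaf is a job that never appears as a parent in the relations of the original dag. The class DeletedLeafSketch and the method deletedLeaves are hypothetical and work on job names rather than Job objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical leaf selection over the ORIGINAL dag relations; not Pegasus code.
final class DeletedLeafSketch {

    // originalEdges: each element is {parent, child} taken from the original dag.
    static List<String> deletedLeaves(List<String[]> originalEdges, List<String> deletedJobs) {
        Set<String> jobsWithChildren = new HashSet<>();
        for (String[] e : originalEdges) {
            jobsWithChildren.add(e[0]);
        }
        List<String> leaves = new ArrayList<>();
        for (String job : deletedJobs) {
            if (!jobsWithChildren.contains(job)) {
                leaves.add(job);
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        List<String[]> originalEdges = Arrays.asList(
                new String[] {"a", "b"}, new String[] {"b", "c"});
        // b and c were deleted; only c is a leaf in the original dag.
        System.out.println(deletedLeaves(originalEdges, Arrays.asList("b", "c"))); // prints [c]
    }
}
```

Checking against the original relations matters here: in the reduced dag the deleted jobs no longer appear at all, so leaf status could not be determined from it.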

makeRedDagObject

public ADag makeRedDagObject(ADag orgDag,
                             Vector vDelJobs)
Makes the reduced Dag object corresponding to the specified deleted jobs. Note: the inputFiles and the outputFiles are copied as-is. These change after reduction, but they are needed only to look up the RC, which has already been done.

Parameters:
orgDag - the original Dag
vDelJobs - the Vector containing the names of the jobs whose SubInfos and Relations we want to remove.
Returns:
the reduced dag, which does not have the deleted jobs
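
Building the reduced dag amounts to filtering both the jobs and the relations against the deleted names. The sketch below is a simplified model, not the Pegasus code: the Dag class here is a hypothetical holder of job names and {parent, child} pairs and is unrelated to the real ADag, and it returns a new object instead of modifying the original, which is a design choice of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical reduced-dag construction; not Pegasus code.
final class ReducedDagSketch {

    // Minimal dag holder: job names plus {parent, child} relations.
    static final class Dag {
        final List<String> jobs;
        final List<String[]> edges;

        Dag(List<String> jobs, List<String[]> edges) {
            this.jobs = jobs;
            this.edges = edges;
        }
    }

    // Returns a new Dag without the deleted jobs and without any relation touching them.
    static Dag reduce(Dag original, Set<String> deletedJobs) {
        List<String> jobs = new ArrayList<>();
        for (String job : original.jobs) {
            if (!deletedJobs.contains(job)) {
                jobs.add(job);
            }
        }
        List<String[]> edges = new ArrayList<>();
        for (String[] e : original.edges) {
            if (!deletedJobs.contains(e[0]) && !deletedJobs.contains(e[1])) {
                edges.add(e);
            }
        }
        return new Dag(jobs, edges);
    }

    public static void main(String[] args) {
        Dag original = new Dag(
                Arrays.asList("a", "b", "c"),
                Arrays.asList(new String[] {"a", "b"}, new String[] {"b", "c"}));
        Dag reduced = reduce(original, new HashSet<>(Arrays.asList("a")));
        System.out.println(reduced.jobs);         // prints [b, c]
        System.out.println(reduced.edges.size()); // prints 1
    }
}
```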

constructNewDagInfo

private DagInfo constructNewDagInfo(DagInfo dagInfo,
                                    Vector vDelJobs)
Constructs a DagInfo object for the decomposed Dag on the basis of the jobs which are deleted from the DAG by the reduction algorithm. Note: the inputFiles and the outputFiles are copied as-is. These change after reduction, but they are needed only to look up the RC, which has already been done.

Parameters:
dagInfo - the object which is reduced on the basis of vDelJobs
vDelJobs - Vector containing the names of the jobs which are to be deleted
Returns:
the DagInfo object corresponding to the Decomposed Dag

constructNewSubInfos

private Vector constructNewSubInfos(Vector vSubInfos,
                                    Vector vDelJobs)
Constructs the Vector of subInfo objects corresponding to the reduced ADAG. It also modifies the strargs to strip the markup from them and display correct paths to the filenames.

Parameters:
vSubInfos - the Job objects, including the jobs which are not needed after the execution of the reduction algorithm
vDelJobs - the jobs which are deleted by the reduction algorithm as their output files are in the Replica Catalog
Returns:
the Job objects except the ones for the deleted jobs


Copyright © 2011 The University of Southern California. All Rights Reserved.