edu.isi.pegasus.planner.refiner
Class DataReuseEngine

java.lang.Object
  extended by edu.isi.pegasus.planner.refiner.Engine
      extended by edu.isi.pegasus.planner.refiner.DataReuseEngine
All Implemented Interfaces:
Refiner

public class DataReuseEngine
extends Engine
implements Refiner

The data reuse engine reduces the workflow on the basis of existing output files of the workflow found in the Replica Catalog. The algorithm works in two passes.

In the first pass, we determine all the jobs whose output files exist in the Replica Catalog. An output file with the transfer flag set to false is treated as equivalent to a file existing in the Replica Catalog, if

  - the output file is not an input to any of the children of the job that produces it
  
In the second pass, we remove the jobs whose output files exist in the Replica Catalog and try to cascade the deletion upwards to the parent jobs. The workflow is traversed breadth-first, bottom up. A node is marked for deletion if:
  ( It is already marked for deletion in pass 1
      OR
      ( ALL of its children have been marked for deletion
        AND
        Node's output files have transfer flags set to false
      )
  )
 

Version:
$Revision: 3471 $
Author:
Karan Vahi

Nested Class Summary
 class DataReuseEngine.BooleanBag
          A bag implementation that can be used to hold a boolean value associated with the graph node
 
Field Summary
private  List<Job> mAllDeletedJobs
          List of all deleted jobs during workflow reduction.
private  ADag mWorkflow
          The workflow object being worked upon.
private  XMLProducer mXMLStore
          The XML Producer object that records the actions.
 
Fields inherited from class edu.isi.pegasus.planner.refiner.Engine
mBag, mLogger, mLogMsg, mOutputPool, mPoolFile, mPOptions, mProps, mRLIUrl, mSiteStore, mTCFile, mTCHandle, mTCMode, REGISTRATION_UNIVERSE, TRANSFER_UNIVERSE
 
Fields inherited from interface edu.isi.pegasus.planner.refiner.Refiner
VERSION
 
Constructor Summary
DataReuseEngine(ADag orgDag, PegasusBag bag)
          The constructor
 
Method Summary
protected  Graph cascadeDeletionUpwards(Graph workflow, List<GraphNode> originalJobsInRC)
          Cascade the deletion of the jobs upwards in the workflow.
 List<Job> getDeletedJobs()
          This returns all the jobs deleted from the workflow after the reduction algorithm has run.
 List<Job> getDeletedLeafJobs()
          This returns all the deleted jobs that happen to be leaf nodes.
private  List<GraphNode> getJobsInRC(Graph workflow, Set filesInRC)
          Returns all the jobs whose output files exist in the Replica Catalog.
 ADag getWorkflow()
          Returns a reference to the workflow that is being refined by the refiner.
 XMLProducer getXMLProducer()
          Returns a reference to the XMLProducer, that generates the XML fragment capturing the actions of the refiner.
 ADag reduceWorkflow(ADag workflow, ReplicaCatalogBridge rcb)
          Reduces the workflow on the basis of the existence of LFNs in the replica catalog.
 Graph reduceWorkflow(Graph workflow, ReplicaCatalogBridge rcb)
          Reduces the workflow on the basis of the existence of LFNs in the replica catalog.
protected  boolean transferOutput(GraphNode node)
          Returns whether a user wants output transferred for a node or not.
 
Methods inherited from class edu.isi.pegasus.planner.refiner.Engine
addVector, appendArrayList, loadProperties, printVector, stringInList, stringInPegVector, stringInVector, vectorToString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mAllDeletedJobs

private List<Job> mAllDeletedJobs
List of all deleted jobs during workflow reduction.


mXMLStore

private XMLProducer mXMLStore
The XML Producer object that records the actions.


mWorkflow

private ADag mWorkflow
The workflow object being worked upon.

Constructor Detail

DataReuseEngine

public DataReuseEngine(ADag orgDag,
                       PegasusBag bag)
The constructor

Parameters:
orgDag - The original Dag object
bag - the bag of initialization objects.
Method Detail

getWorkflow

public ADag getWorkflow()
Returns a reference to the workflow that is being refined by the refiner.

Specified by:
getWorkflow in interface Refiner
Returns:
the ADag object being refined.

getXMLProducer

public XMLProducer getXMLProducer()
Returns a reference to the XMLProducer that generates the XML fragment capturing the actions of the refiner. This is used for provenance purposes.

Specified by:
getXMLProducer in interface Refiner
Returns:
XMLProducer

reduceWorkflow

public ADag reduceWorkflow(ADag workflow,
                           ReplicaCatalogBridge rcb)
Reduces the workflow on the basis of the existence of LFNs in the replica catalog. The existence of files is determined via the bridge.

Parameters:
workflow - the workflow to be reduced.
rcb - instance of the replica catalog bridge.
Returns:
the reduced dag

reduceWorkflow

public Graph reduceWorkflow(Graph workflow,
                            ReplicaCatalogBridge rcb)
Reduces the workflow on the basis of the existence of LFNs in the replica catalog. The existence of files is determined via the bridge.

Parameters:
workflow - the workflow to be reduced.
rcb - instance of the replica catalog bridge.
Returns:
the reduced workflow. The input workflow object itself is reduced and returned.

getDeletedJobs

public List<Job> getDeletedJobs()
Returns all the jobs deleted from the workflow after the reduction algorithm has run.

Returns:
List containing the Job objects of all deleted jobs.

getDeletedLeafJobs

public List<Job> getDeletedLeafJobs()
Returns all the deleted jobs that happen to be leaf nodes. The output files of these jobs must be transferred from the location returned by the Replica Catalog to the pool specified. The result is a subset of mAllDeletedJobs. To determine the deleted leaf jobs, the method refers to the original DAG, not the reduced DAG.

Returns:
List containing the Job objects of the deleted leaf jobs.
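
The point about consulting the original DAG can be illustrated with a small sketch. It is not the Pegasus implementation: the class name, the method name, and the use of plain job-name strings with a child adjacency map are all assumptions made for illustration. A deleted job counts as a leaf only if it had no children before any reduction took place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; the real method works on Job objects and the original ADag.
class DeletedLeafSketch {

    /**
     * @param originalChildren job name -> children in the ORIGINAL (unreduced) workflow
     * @param allDeleted       names of all jobs removed by the reduction
     * @return the deleted jobs that had no children in the original workflow
     */
    static List<String> deletedLeaves(Map<String, List<String>> originalChildren,
                                      List<String> allDeleted) {
        List<String> leaves = new ArrayList<>();
        for (String job : allDeleted) {
            List<String> kids =
                originalChildren.getOrDefault(job, Collections.<String>emptyList());
            if (kids.isEmpty()) {
                leaves.add(job); // leaf in the original DAG, so its outputs need staging
            }
        }
        return leaves;
    }
}
```

Checking against the reduced workflow instead would be misleading, because a deleted parent whose children were also deleted would then look like a leaf.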

getJobsInRC

private List<GraphNode> getJobsInRC(Graph workflow,
                                    Set filesInRC)
Returns all the jobs whose output files exist in the Replica Catalog. An output file with the transfer flag set to false is treated as equivalent to a file in the Replica Catalog, provided the output file is not an input to any of the children of the job that produces it.

Parameters:
workflow - the workflow object
filesInRC - Set of String objects corresponding to the logical filenames of files that are found to be in the Replica Catalog.
Returns:
a List of GraphNodes with their Boolean bag value set to true.
See Also:
org.griphyn.cPlanner.classes.Job
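
A minimal sketch of the first-pass rule described for this method follows. The types `Node` and `OutFile`, and every method name, are hypothetical stand-ins rather than the Pegasus classes. The treatment of a job with no output files (never selected) is an assumption that the description above does not settle.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in types; the real code works on Graph, GraphNode and Job.
class PassOneSketch {

    static final class OutFile {
        final String lfn;
        final boolean transfer; // the transfer flag of the output file
        OutFile(String lfn, boolean transfer) { this.lfn = lfn; this.transfer = transfer; }
    }

    static final class Node {
        final String id;
        final List<OutFile> outputs = new ArrayList<>();
        final Set<String> inputs = new HashSet<>();
        final List<Node> children = new ArrayList<>();
        Node(String id) { this.id = id; }
    }

    /** True if any child of n lists lfn among its input files. */
    static boolean usedByChild(Node n, String lfn) {
        for (Node c : n.children) {
            if (c.inputs.contains(lfn)) return true;
        }
        return false;
    }

    /** True if every output of n is in the catalog or can be treated as if it were. */
    static boolean coveredByRC(Node n, Set<String> filesInRC) {
        if (n.outputs.isEmpty()) return false; // assumption: nothing to reuse
        for (OutFile f : n.outputs) {
            boolean inRC = filesInRC.contains(f.lfn);
            // transfer flag false AND no child consumes the file
            boolean treatedAsInRC = !f.transfer && !usedByChild(n, f.lfn);
            if (!inRC && !treatedAsInRC) return false;
        }
        return true;
    }

    /** Ids of all jobs selected by the first pass, in workflow order. */
    static List<String> jobsInRC(List<Node> workflow, Set<String> filesInRC) {
        List<String> result = new ArrayList<>();
        for (Node n : workflow) {
            if (coveredByRC(n, filesInRC)) result.add(n.id);
        }
        return result;
    }
}
```

In this sketch, a job whose only missing output is a non-transferred file that no child consumes is still selected, while a job whose non-transferred output feeds a child is not.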

cascadeDeletionUpwards

protected Graph cascadeDeletionUpwards(Graph workflow,
                                       List<GraphNode> originalJobsInRC)
Cascades the deletion of the jobs upwards in the workflow. The workflow is traversed breadth-first, bottom up. A node is marked for deletion if:
  ( It is already marked for deletion
      OR
      ( ALL of its children have been marked for deletion
        AND
        Node's output files have transfer flags set to false
      )
  )
 

Parameters:
workflow - the workflow to be reduced
originalJobsInRC - list of nodes found to be in the Replica Catalog.
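
The marking condition above can be sketched as follows. This is an illustration under stated assumptions, not the Pegasus code: `Node`, `link` and `cascade` are hypothetical names, and the `deleted` field plays the role of the boolean bag value. The sketch starts at the leaves and visits a node only after all of its children have been visited, so the "ALL children" test sees final values. Whether the actual traversal orders nodes this way is not stated in this page. Leaves that are not in the Replica Catalog are left unmarked, which is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in types; the real method works on Graph and GraphNode.
class CascadeSketch {

    static final class Node {
        final String id;
        final boolean transferOutput; // true if the user wants this node's outputs transferred
        final List<Node> parents = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        boolean deleted;              // stands in for the boolean bag value
        Node(String id, boolean transferOutput) {
            this.id = id;
            this.transferOutput = transferOutput;
        }
    }

    static void link(Node parent, Node child) {
        parent.children.add(child);
        child.parents.add(parent);
    }

    /** Returns the ids of all nodes marked for deletion after the cascade. */
    static Set<String> cascade(List<Node> workflow, List<Node> jobsInRC) {
        for (Node n : jobsInRC) n.deleted = true; // marks carried over from pass 1

        Map<Node, Integer> pending = new HashMap<>(); // children not yet visited
        Deque<Node> queue = new ArrayDeque<>();
        for (Node n : workflow) {
            pending.put(n, n.children.size());
            if (n.children.isEmpty()) queue.add(n); // start at the leaves
        }

        Set<String> deleted = new TreeSet<>();
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            if (!n.deleted && !n.children.isEmpty()) {
                boolean allChildrenDeleted = true;
                for (Node c : n.children) allChildrenDeleted &= c.deleted;
                // ALL children deleted AND outputs not wanted by the user
                n.deleted = allChildrenDeleted && !n.transferOutput;
            }
            if (n.deleted) deleted.add(n.id);
            for (Node p : n.parents) {
                // enqueue a parent once its last child has been visited
                if (pending.merge(p, -1, Integer::sum) == 0) queue.add(p);
            }
        }
        return deleted;
    }
}
```

For a chain a -> b -> c where only c is found in the Replica Catalog, b is removed when its outputs are not to be transferred, and the cascade stops at a if the user wants a's outputs.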

transferOutput

protected boolean transferOutput(GraphNode node)
Returns whether a user wants output transferred for a node or not. If no output files are associated with the node, true is returned.

Parameters:
node - the GraphNode
Returns:
boolean
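
One plausible reading of this contract is sketched below. The class name and the representation of a node's outputs as a list of transfer flags are hypothetical. Treating a single flagged file as enough to return true is an assumption, chosen to agree with the cascade rule that requires a node's output files to have their transfer flags set to false.

```java
import java.util.List;

// Hypothetical stand-in; the real method inspects the Job held by a GraphNode.
class TransferOutputSketch {

    /**
     * @param transferFlags one transfer flag per output file of the node
     * @return true if the node has no output files, or if at least one
     *         output file is flagged for transfer (assumed interpretation)
     */
    static boolean transferOutput(List<Boolean> transferFlags) {
        if (transferFlags.isEmpty()) {
            return true; // documented behaviour when no output files are associated
        }
        for (boolean flag : transferFlags) {
            if (flag) return true;
        }
        return false;
    }
}
```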


Copyright © 2011 The University of Southern California. All Rights Reserved.