lor.tasks package¶
Submodules¶
lor.tasks.fs module¶
Utility tasks for the local filesystem
-
class
lor.tasks.fs.
EnsureExistsOnLocalFilesystemTask
(*args, **kwargs)¶ Bases:
luigi.task.ExternalTask
-
description
= 'Ensure a file/dir exists on the local filesystem'¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list ofTarget
instances.- Implementation note
- If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
path
= <luigi.parameter.Parameter object>¶
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
lor.tasks.general module¶
General utility tasks
-
class
lor.tasks.general.
AlwaysRunsTask
(*args, **kwargs)¶ Bases:
luigi.task.ExternalTask
-
complete
()¶ If the task has any outputs, return
True
if all outputs exist. Otherwise, returnFalse
.However, you may freely override this method with custom logic.
-
description
= 'A trivial task that always runs successfully'¶
-
-
class
lor.tasks.general.
DictPluckTask
(*args, **kwargs)¶ Bases:
luigi.task.ExternalTask
-
description
= 'Pluck a single output from a task that produces a dict of outputs'¶
-
key
= <luigi.parameter.Parameter object>¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list ofTarget
instances.- Implementation note
- If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a Subclasses can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
upstream_task
= <luigi.parameter.Parameter object>¶
-
lor.tasks.hdfs module¶
Utility tasks for HDFS
-
class
lor.tasks.hdfs.
ClusterDeployTask
(*args, **kwargs)¶ Bases:
luigi.task.Task
-
description
= 'Deploy a local file/dir to the cluster'¶
-
destination
()¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list ofTarget
instances.- Implementation note
- If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
source
()¶
-
-
class
lor.tasks.hdfs.
DownloadFromHdfsTask
(*args, **kwargs)¶ Bases:
luigi.task.Task
-
description
= 'Move a file/dir from HDFS to the local filesystem'¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list ofTarget
instances.- Implementation note
- If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
output_path
= <luigi.parameter.Parameter object>¶
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a Subclasses can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
upstream_task
= <luigi.parameter.TaskParameter object>¶
-
-
class
lor.tasks.hdfs.
EnsureExistsOnHdfsTask
(*args, **kwargs)¶ Bases:
luigi.task.ExternalTask
-
description
= 'Ensure a file/dir exists on HDFS'¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list ofTarget
instances.- Implementation note
- If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
path
= <luigi.parameter.Parameter object>¶
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
-
class
lor.tasks.hdfs.
MoveToHdfsTask
(*args, **kwargs)¶ Bases:
luigi.task.Task
Move the output of a task (assuming it’s a LocalTarget) onto HDFS
-
cache_invalidator
= <luigi.parameter.Parameter object>¶
-
description
= 'Move the output of a task to HDFS'¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list ofTarget
instances.- Implementation note
- If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a Subclasses can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
upstream_task
= <luigi.parameter.TaskParameter object>¶
-
lor.tasks.tar module¶
-
class
lor.tasks.tar.
TarballTask
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
A task that puts another task’s output (assuming it outputs a FileTarget) into a tarball)
-
describe
= "Package a task's output into an uncompressed tarball."¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list ofTarget
instances.- Implementation note
- If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
output_path
= <luigi.parameter.Parameter object>¶
-
program_args
()¶ Override this method to map your task parameters to the program arguments
Returns: list to pass as args
tosubprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a Subclasses can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
upstream_task
= <luigi.parameter.TaskParameter object>¶
-