Scheduling of Data Processing Jobs

Data processing jobs often consist of large numbers of tasks that stress different resources and fluctuate in their resource demands. Optimized resource allocation, scheduling, and task placement can therefore improve resource utilization and reduce runtimes. Moreover, co-locating processing tasks with complementary resource demands on shared infrastructures can further increase resource utilization and job throughput. With our research, we aim to answer the following questions for different data processing workloads: Which resources should be allocated for a job and its tasks? Which job should run next when resources become available? Where should a specific task be placed in a given infrastructure? Should certain tasks be co-located on shared resources? To answer these questions, we use monitoring data, profiling runs, different performance models, as well as scoring and optimization methods.
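To make the placement and co-location questions concrete, the following is a minimal, hypothetical sketch of a score-based placement step: each candidate node is scored for a task, and the task is placed on the node whose remaining free resources are most balanced afterwards, which naturally co-locates CPU-heavy and I/O-heavy tasks. All class names, the two-resource model, and the scoring heuristic are illustrative assumptions, not the actual methods used in our research.

```python
# Illustrative sketch (assumed names and heuristic): greedy, score-based
# placement of tasks onto nodes, favoring co-location of tasks with
# complementary resource demands (here simplified to CPU and I/O).
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_free: float  # fraction of CPU capacity still available (0..1)
    io_free: float   # fraction of I/O bandwidth still available (0..1)
    tasks: List[str] = field(default_factory=list)


@dataclass
class Task:
    name: str
    cpu_demand: float
    io_demand: float


def fits(node: Node, task: Task) -> bool:
    """A node is a candidate only if both resource demands fit."""
    return node.cpu_free >= task.cpu_demand and node.io_free >= task.io_demand


def imbalance_after(node: Node, task: Task) -> float:
    """Imbalance of the node's residual free resources after placement.
    A small value means the task complements the tasks already placed
    there (e.g. an I/O-heavy task on a node whose CPU is already busy)."""
    residual_cpu = node.cpu_free - task.cpu_demand
    residual_io = node.io_free - task.io_demand
    return abs(residual_cpu - residual_io)


def place(task: Task, nodes: List[Node]) -> Optional[Node]:
    """Place the task on the best-scoring candidate node, or return None
    if no node currently has enough free resources (task must wait)."""
    candidates = [n for n in nodes if fits(n, task)]
    if not candidates:
        return None
    best = min(candidates, key=lambda n: imbalance_after(n, task))
    best.cpu_free -= task.cpu_demand
    best.io_free -= task.io_demand
    best.tasks.append(task.name)
    return best
```

For example, an I/O-heavy task is placed next to already-running CPU-heavy tasks rather than on an idle node, increasing utilization of the shared infrastructure. Real systems would derive the demand estimates from monitoring data or profiling runs rather than assuming them as constants.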

Ongoing Research

We currently work on multiple topics in this area: