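Min Yue (闵越), School of Computer Science, Shandong University: Scheduling for Grid Computing. Gong Bin (龚斌), School of Computer Science and Technology, Shandong University; Shandong High Performance Computing Center...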

Slide 1

Scheduling for Grid Computing

Slide 2

References
  Fangpeng Dong and Selim G. Akl: Scheduling Algorithms for Grid Computing: State of the Art and Open Problems
  Yanmin Zhu: A Survey on Grid Scheduling Systems
  Peter Gradwell: Overview of Grid Scheduling Systems
  Alain Andrieux et al.: Open Issues in Grid Scheduling
  Jia Yu and Rajkumar Buyya: A Taxonomy of Workflow Systems for Grid Computing

Slide 3

Grid
  [Figure labels: Internet, E-mail]
  Ian Foster, 1998: The Grid: Blueprint for a New Computing Infrastructure

Slide 4

The Definition of Grid
  A type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed, autonomous, and heterogeneous resources dynamically at runtime, depending on their availability, capability, performance, cost, and users' quality-of-service requirements.

Slide 5

Characteristics of Grid Computing
  Exploiting underutilized resources
  Distributed supercomputing capability
  Virtual organization for collaboration
  Resource balancing
  Reliability

Slide 6

Classes of Grid Computing
  By function: Computing Grid, Data Grid, Service Grid
  By size: IntraGrid, ExtraGrid, InterGrid

Slide 7

Traditional Parallel Scheduling Systems
  Systems:
    SMP
    Cluster
    CC-NUMA: SGI
  Scheduling systems: OpenPBS, LSF, SGE, LoadLeveler, Condor, etc.

Slide 8

Slide 9

Cluster Scheduling

Slide 10

The Assumptions Underlying Traditional Systems
  All resources reside within a single administrative domain.
  To provide a single system image, the scheduler controls all of the resources.
  The resource pool is invariant.
  Contention caused by incoming applications can be managed by the scheduler according to some policies, so its impact on the performance that the site can provide to each application can be predicted well.
  Computations and their data reside at the same site, or data staging is a highly predictable process, usually from a predetermined source to a predetermined destination, which can be viewed as a constant overhead.

Slide 11

Characteristics of Cluster Scheduling
  Homogeneity of resources and applications
  Dedicated resources
  Centralized scheduling architecture
  High-speed interconnection network
  Monotonic performance goal

Slide 12

Slide 13

The Terms of Grid Scheduling
  A task is an atomic unit to be scheduled by the scheduler and assigned to a resource.
  The properties of a task are parameters such as CPU/memory requirement, deadline, priority, etc.
  A job (or metatask, or application) is a set of atomic tasks that will be carried out on a set of resources. A job can have a recursive structure: jobs are composed of sub-jobs and/or tasks, and sub-jobs can themselves be decomposed further into atomic tasks.
  A resource is something that is required to carry out an operation, for example a processor for data processing, a data storage device, or a network link for data transport.
  A site (or node) is an autonomous entity composed of one or more resources.
  Task scheduling is the mapping of tasks to a selected group of resources, which may be distributed across multiple administrative domains.
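The terms above can be sketched as a small data model. This is only an illustration: the class and field names (`Task`, `Job`, `Resource`, `Site`, `cpu_cores`, `memory_mb`, and so on) are assumptions made for the example and do not come from the slides or from any Grid middleware.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    """Atomic unit of scheduling, with a few illustrative properties."""
    name: str
    cpu_cores: int = 1
    memory_mb: int = 512
    priority: int = 0


@dataclass
class Job:
    """A job holds tasks and/or sub-jobs, so the structure is recursive."""
    name: str
    parts: List[Union["Job", Task]] = field(default_factory=list)

    def atomic_tasks(self) -> List[Task]:
        """Decompose the job, depth first, into its atomic tasks."""
        tasks: List[Task] = []
        for part in self.parts:
            if isinstance(part, Job):
                tasks.extend(part.atomic_tasks())
            else:
                tasks.append(part)
        return tasks


@dataclass
class Resource:
    name: str
    kind: str  # e.g. "processor", "storage", "network link"


@dataclass
class Site:
    """Autonomous entity composed of one or more resources."""
    name: str
    resources: List[Resource] = field(default_factory=list)


# Hypothetical job: a sub-job nested between two atomic tasks.
render = Job("render", [Task("frame-1"), Task("frame-2")])
pipeline = Job("pipeline", [Task("fetch-input"), render, Task("publish")])
task_names = [t.name for t in pipeline.atomic_tasks()]
```

Flattening a job with `atomic_tasks()` yields the units a scheduler would actually map to resources; here `task_names` lists the four atomic tasks in depth-first order.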

Slide 14

Three Stages of the Scheduling Process
  Resource discovering and filtering
  Resource selecting and scheduling according to certain objectives
  Job submission
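A minimal sketch of the three stages follows. The resource records, the field names, and the "lowest load" objective are all assumptions chosen for illustration; a real scheduler would obtain this information from an information service and submit through middleware rather than return a string.

```python
from typing import Dict, List, Optional

# Hypothetical resource records, as an information service might return them.
RESOURCES: List[Dict] = [
    {"name": "site-a", "free_cpus": 16, "memory_gb": 64, "load": 0.70},
    {"name": "site-b", "free_cpus": 4, "memory_gb": 16, "load": 0.10},
    {"name": "site-c", "free_cpus": 32, "memory_gb": 128, "load": 0.35},
]


def discover(resources: List[Dict], min_cpus: int, min_memory_gb: int) -> List[Dict]:
    """Stage 1: keep only resources that meet the job's minimal requirements."""
    return [
        r for r in resources
        if r["free_cpus"] >= min_cpus and r["memory_gb"] >= min_memory_gb
    ]


def select(candidates: List[Dict]) -> Optional[Dict]:
    """Stage 2: choose by an objective; lowest load is an assumed objective."""
    return min(candidates, key=lambda r: r["load"]) if candidates else None


def submit(job_name: str, resource: Dict) -> str:
    """Stage 3: stand-in for job submission; only describes the mapping."""
    return f"{job_name} -> {resource['name']}"


candidates = discover(RESOURCES, min_cpus=8, min_memory_gb=32)
chosen = select(candidates)
receipt = submit("sim-job", chosen) if chosen else "no feasible resource"
```

In this example `site-b` has the lowest load but is filtered out in stage 1, so the job is mapped to `site-c`.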

Slide 15

Stages of SuperScheduling
  Resource Discovery
    Authorization filtering
    Application requirement definition
    Minimal requirement filtering
  System Selection
    Gathering information (query)
    Select the system(s) to run on
  Run Job
    (optional) Make an advance reservation
    Submit job to resources
    Preparation tasks
    Monitor progress (maybe go back to System Selection)
    Find out that the job (J) is done
    Completion tasks

Slide 16

Grid Scheduling Framework
  Application Model: extracts the characteristics of the applications to be scheduled.
  Resource Model: describes the characteristics of the underlying resources in Grid systems.
  Performance Model: responsible for predicting the behavior of a specific job on a specific computation resource.
  Scheduling Policy: responsible for deciding how applications should be executed and how resources should be utilized.

Slide 17

Applications Classification
  Batch vs. Interactive
  Real-time vs. Non-real-time
  Priority

Slide 18

Resources Classification
  Time-shared vs. Non-time-shared
  Dedicated vs. Non-dedicated
  Preemptive vs. Non-preemptive

Slide 19

Performance Estimation
  Simulation
  Analytical Modeling
  Historical Data
  On-line Learning
  Hybrid
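As one illustration of the "Historical Data" approach, the sketch below predicts a run time as the mean of previously recorded run times for the same application on the same resource. The class name `HistoryEstimator`, its methods, and the sample values are assumptions made for this example; the slides do not prescribe a particular estimator.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, List, Optional, Tuple


class HistoryEstimator:
    """Predicts run time from past runs of the same (application, resource) pair."""

    def __init__(self) -> None:
        self._runs: DefaultDict[Tuple[str, str], List[float]] = defaultdict(list)

    def record(self, app: str, resource: str, seconds: float) -> None:
        """Store one observed run time."""
        self._runs[(app, resource)].append(seconds)

    def estimate(self, app: str, resource: str) -> Optional[float]:
        """Return the mean of past run times, or None when there is no history."""
        runs = self._runs.get((app, resource))
        return mean(runs) if runs else None


estimator = HistoryEstimator()
for observed in (100.0, 120.0, 110.0):  # hypothetical past run times in seconds
    estimator.record("blast", "site-a", observed)

known = estimator.estimate("blast", "site-a")    # mean of the three runs
unknown = estimator.estimate("blast", "site-b")  # no history for this pair
```

Calling `record` after every finished run keeps the estimate current, which moves this sketch toward the on-line learning variant; a hybrid scheme could fall back to an analytical model when `estimate` returns `None`.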

Slide 20

Scheduling Policy
  Application-centric
    Execution Time: the time spent executing the job.
    Wait Time: the time spent waiting in the ready queue.
    Speedup: the ratio of the time spent executing the job on the original platform to the time spent executing it on the Grid.
    Turnaround Time: also called response time; the sum of waiting time and execution time.
    Job Slowdown: the ratio of a job's response time to its actual run time.
  System-centric
    Throughput: the number of jobs completed in one unit of time, such as per hour or per day.
    Utilization: the percentage of time a resource is busy.
    Flow Time: the flow time of a set of jobs is the sum of the completion times of all jobs.
    Average application performance.
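The definitions above can be checked on a small worked example. The trace below is hypothetical (three jobs on one resource, with the clock starting at 0), and "completion time" is taken to mean the finish timestamp, which is an assumption about how the flow time definition should be read.

```python
from typing import Dict, Tuple

# Hypothetical trace for one resource: job -> (submit, start, finish) times.
TRACE: Dict[str, Tuple[float, float, float]] = {
    "A": (0, 0, 10),
    "B": (0, 10, 15),
    "C": (5, 20, 30),
}


def job_metrics(submit: float, start: float, finish: float) -> Dict[str, float]:
    """Application-centric metrics for a single job."""
    wait = start - submit
    execution = finish - start
    turnaround = wait + execution  # also called response time
    return {
        "wait": wait,
        "execution": execution,
        "turnaround": turnaround,
        "slowdown": turnaround / execution,
    }


def system_metrics(trace: Dict[str, Tuple[float, float, float]]) -> Dict[str, float]:
    """System-centric metrics over the whole trace (clock assumed to start at 0)."""
    horizon = max(finish for _, _, finish in trace.values())
    busy = sum(finish - start for _, start, finish in trace.values())
    return {
        "throughput": len(trace) / horizon,  # jobs per time unit
        "utilization": busy / horizon,       # fraction of time the resource is busy
        "flow_time": sum(finish for _, _, finish in trace.values()),
    }


def speedup(original_time: float, grid_time: float) -> float:
    """Ratio of execution time on the original platform to time on the Grid."""
    return original_time / grid_time


per_job = {name: job_metrics(*times) for name, times in TRACE.items()}
system = system_metrics(TRACE)
```

For job C this gives a wait of 15, an execution time of 10, a turnaround time of 25, and a slowdown of 2.5. The resource is busy for 25 of 30 time units, and the flow time is 10 + 15 + 30 = 55.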

Slide 21

Scheduling Strategy
  Performance-driven
  Market-driven
  Trust-driven
    Security policy
    Accumulated reputation
    Self-defense capability
    Attack history
    Site vulnerability

Slide 22

A Logical Grid Scheduling Architecture
  Dashed lines: resource or application information flows
  Solid lines: task or task scheduling command flows

Slide 23

Grid Scheduler
  The Grid Scheduler (GS) receives applications from Grid users, selects feasible resources for these applications according to information acquired from the Grid Information Service module, and finally generates application-to-resource mappings based on certain objective functions and predicted resource performance.
  A GS usually cannot control Grid resources directly, but works like a broker or agent.
  Also known as: Metascheduler, SuperScheduler
  It is not an indispensable component of the Grid infrastructure.
  It is not included in the Globus Toolkit.
  In reality, multiple such schedulers might be deployed and organized into different structures (centralized, hierarchical, and decentralized) according to different concerns, such as performance or scalability.

Slide 24

Grid Information Service (GIS)
  To provide such information to Grid schedulers, GIS is responsible for collecting and predicting resource state information, such as CPU capacity, memory size, network bandwidth, software availability, and the load of a site in a particular period.
  GIS can answer queries for resource information or push information to subscribers.
  Globus: Monitoring and Discovery System (MDS)
  Application Profiling (AP) is used to extract the properties of applications.
  Analogical Benchmarking (AB) provides a measure of how well a resource can perform a given type of job.
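The two access modes mentioned above, answering queries and pushing information to subscribers, can be sketched with a toy in-memory service. The class `InfoService` and its methods are hypothetical names for this example only; this is not the MDS interface.

```python
from typing import Callable, Dict, List


class InfoService:
    """Toy information service: stores site state, answers queries, pushes updates."""

    def __init__(self) -> None:
        self._state: Dict[str, Dict] = {}
        self._subscribers: List[Callable[[str, Dict], None]] = []

    def subscribe(self, callback: Callable[[str, Dict], None]) -> None:
        """Register a callback that receives every future update (push mode)."""
        self._subscribers.append(callback)

    def report(self, site: str, info: Dict) -> None:
        """Called on behalf of a site to publish its current state."""
        self._state[site] = dict(info)
        for callback in self._subscribers:
            callback(site, dict(info))

    def query(self, predicate: Callable[[Dict], bool]) -> List[str]:
        """Return the names of sites whose state satisfies the predicate (pull mode)."""
        return sorted(s for s, info in self._state.items() if predicate(info))


gis = InfoService()
pushed: List[str] = []
gis.subscribe(lambda site, info: pushed.append(f"{site}:{info['load']}"))

gis.report("site-a", {"load": 0.8, "memory_gb": 64})  # hypothetical values
gis.report("site-b", {"load": 0.2, "memory_gb": 32})

lightly_loaded = gis.query(lambda info: info["load"] < 0.5)
```

A scheduler that polls would call `query` before each decision, while one that registers through `subscribe` sees every state change as it is reported.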

Slide 25

Launching and Monitoring (LM)
  Also called the Binder.
  Implements a finally determined schedule by submitting applications to the selected resources, staging input data and executables if necessary, and monitoring the execution of the applications.
  Globus: Grid Resource Allocation and Management (GRAM)

Slide 26

Local Resource Manager (LRM)
  Is mainly responsible for two jobs: local scheduling inside a resource domain, where not only jobs from external Grid users but also jobs from the domain's local users are executed, and reporting resource information to GIS.
  Local scheduling: OpenPBS, Condor, LSF, SGE, etc.
  Resource reporting: NWS (Network Weather Service), Hawkeye, Ganglia

Slide 27

Evaluation Criteria for Grid Scheduling Systems
  Application Performance Promotion
  System Performance Promotion
  Scheduling Efficiency
  Reliability
  Scalability
  Applicability to Application and Grid Environment

Slide 28

Scheduler Organization
  Centralized
  Decentralized
  Hierarchical

Slide 29

Centralized Scheduling

Slide 30

Decentralized Scheduling

Slide 31

Hierarchical Scheduling

Slide 32

Existing Grid Scheduling Systems
  Information Collection Systems
    MDS (Monitoring and Discovery System)
    NWS (Network Weather Service)
  Condor
  Condor-G
  AppLeS
  Nimrod-G
  GrADS
  etc.

Slide 33

Characteristics of Scheduling for Grid Computing
  Heterogeneity and Autonomy
    The scheduler does not have full control of the resources.
    It is hard to estimate the exact cost of executing a task on different sites.
    The scheduler is required to adapt to different local policies.
  Performance Dynamism
    Grid resources are not dedicated to a Grid application.
    Performance fluctuates, compared with traditional systems.
    Some methods: QoS negotiation, resource reservation, rescheduling.
  Resource Selection and Computation-Data Separation
    In traditional systems, the executable code of an application and its input/output data are usually at the same site, or the input sources and output destinations are determined before the application is submitted, so the cost of data staging can be neglected.
  Application Diversity
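To illustrate why computation-data separation matters for resource selection, the sketch below estimates completion time as queue wait plus data staging time plus compute time. The cost model, the site figures, and all names (`SiteInfo`, `estimated_completion`, and so on) are assumptions for this example, not a model taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteInfo:
    name: str
    speed: float           # work units processed per second
    bandwidth_mb_s: float  # transfer rate from the data source to this site
    queue_wait_s: float = 0.0


def compute_only(site: SiteInfo, work: float) -> float:
    """Cost model that neglects data staging, as in a traditional system."""
    return site.queue_wait_s + work / site.speed


def estimated_completion(site: SiteInfo, work: float, input_mb: float) -> float:
    """Assumed Grid cost model: queue wait + data staging + computation."""
    staging = input_mb / site.bandwidth_mb_s
    return site.queue_wait_s + staging + work / site.speed


# Hypothetical sites: one fast but far from the data, one slower but close to it.
sites: List[SiteInfo] = [
    SiteInfo("fast-remote", speed=10.0, bandwidth_mb_s=10.0),
    SiteInfo("slow-near-data", speed=4.0, bandwidth_mb_s=1000.0),
]
work, input_mb = 2000.0, 50000.0

naive_choice = min(sites, key=lambda s: compute_only(s, work)).name
grid_choice = min(sites, key=lambda s: estimated_completion(s, work, input_mb)).name
```

With these numbers the compute-only model prefers `fast-remote` (200 s against 500 s), but once staging is counted `fast-remote` needs 5200 s and `slow-near-data` only 550 s, so the choice flips.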

Slide 34

Grid Scheduling Algorithms The Complexity of
