A Document You Cannot Afford to Miss: Solaris Containers - Workload Management

Running multiple applications on a single computer system without a means to control how applications use system resources can lead to unpredictable service levels. By default, the Solaris OS treats every resource request with equal priority. If there is enough of the resource available the request is granted. If the demand for the resource exceeds the total capacity available, the Solaris OS adapts by restricting access to the resource. The action taken to restrict access depends on the type of resource. For example, should demand for CPU time exceed the CPU time available, the scheduler reacts by adjusting the priorities of processes in order to change the distribution of the CPU time. The scheduler operates on threads and has no concept of applications, let alone their relative importance from a business perspective. An unimportant CPU-bound application can victimize other, more important applications by placing high demand for CPU resources on the system.

Other resources, such as the total number of processes on the system, have a fixed upper bound. Once the limit is reached, no more of this resource can be used. A runaway process that keeps creating new processes can prevent new useful work from being started. Other than specifying the system-wide upper limit, there is no way to limit the number of processes that may be created by an application or a set of applications.

What is needed is a way to control resource usage based on workloads. A workload is an aggregation of all processes of an application, or group of applications, that makes sense from a business perspective. Instead of managing resource usage at the process level, it should be possible to manage resource usage at the workload level. This allows the implementation of policies such as "the Sales application shall be granted at least 30% of CPU resources" as part of a service level agreement. The Solaris OS resource management features make it possible to treat workloads in this way by:

- Restricting access to specific resources
- Offering resources to workloads on a preferential basis
- Isolating workloads from each other

The first step in managing resource usage by workloads is identifying or classifying the components, such as processes, that make up the workload. The next step is measuring the resource consumption of these workloads. Finally, by applying constraints on the use of resources the workloads can be controlled. The constraints applied follow from the policies defined for the workloads based on business requirements.

A possible policy could be that an important workload should always be granted a minimum amount of CPU time even on an overloaded system. Another policy could be that a workload is only granted access to the CPU if there are no other workloads requiring CPU resources.

Projects

The first step in managing resource usage involves identifying the workloads running on the system. Possible approaches include identifying workloads by username or process name. While simple, this poses a challenge when multiple instances of the same application are running on the system for different workloads, such as a sales application database and a marketing application database. Unless the database application provides a way to run the instances as different users, it is impossible to attribute resource usage to a specific workload based solely on userid. In addition, aggregating multiple related applications, such as the database servers, application servers and Web servers of one business application on one system, is not possible.

The Solaris OS provides a facility called projects to identify workloads. The project serves as an administrative tag used to group related work in a manner deemed useful by the system administrator. System administrators can, for example, create one project for the sales application and another project for the marketing application. By placing all processes related to the sales application in the sales project and the processes for the marketing application in the marketing project, the administrator can separate, and ultimately control, the workloads in a way that makes sense to the business.

A user that is a member of more than one project can run processes in multiple projects at the same time, making it possible for users to participate in several workloads simultaneously. All processes started by a process inherit the project of the parent process. As a result, switching to a new project in a startup script runs all child processes in the new project.

Using Projects to Define Workloads

In the example of the sales and marketing applications, the system administrator can create two new projects, one for the sales application and one for the marketing application. The application startup scripts must be modified to switch to the desired project as part of the application startup. The sales application startup script switches to the sales project, and the marketing application switches to the marketing project. This results in both applications running in different projects while still using the same userid. Adding another application, such as a Web server, to the sales application workload requires adding the Web server user to the sales project and modifying the Web server startup script to switch to the sales project. With the introduction of the Service Management Facility (SMF) in the Solaris 10 OS, administrators can assign the project in which to run the application or service through service properties in the SMF repository.

The Project Database

Projects are defined in the project database. The project database can be a local file or can reside in a name service such as NIS or LDAP. By putting the project database in NIS or LDAP, the project definitions can be shared across multiple systems. Each entry in the project database consists of the following fields:

- name, the name of the project
- id, the project's unique numerical ID
- comment, the description of the project
- user list, a list of users allowed in the project
- group list, a list of groups allowed in the project
- attributes, a list of project attributes, such as resource controls
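To make the field layout concrete, here is a minimal Python sketch that splits one project database line into the six fields listed above. It assumes the colon-separated, comma-delimited layout used by the local /etc/project file; the `ProjectEntry` class, the `parse_project_line` function and the sample sales entry are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProjectEntry:
    name: str
    projid: int
    comment: str
    users: list
    groups: list
    attributes: str


def parse_project_line(line):
    """Split one colon-separated project database line into its six fields."""
    # Split at most 5 times so a ':' inside the attributes field is preserved.
    name, projid, comment, users, groups, attributes = line.rstrip("\n").split(":", 5)
    return ProjectEntry(
        name=name,
        projid=int(projid),
        comment=comment,
        users=[u for u in users.split(",") if u],    # empty field -> empty list
        groups=[g for g in groups.split(",") if g],
        attributes=attributes,
    )


# Hypothetical entry for the sales project (not taken from a real system).
sample = "sales:100:Sales application:oracle,webservd::project.cpu-shares=(privileged,30,none)"
entry = parse_project_line(sample)
print(entry.name, entry.projid, entry.users, entry.groups)
```

The sketch does no validation; a malformed line with fewer than six fields raises a ValueError.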

A freshly installed system always contains a local project database /etc/project containing five standard projects:

- system, used for all system processes and daemons
- user.root, used for all processes run by root
- noproject, for processes specific to IP quality of service (IPQoS)
- default, for users not matching any other project (a catch-all project)
- group.staff, for all users in the group staff

A user or group can be a member of one or more projects. The user and group lists in the project database determine in which projects a user or group of users can execute processes. These lists can contain wildcards to allow for flexible definitions, such as all members of group staff excluding user bob. Users can switch to any project of which they are a member. Until the user changes the project in which to execute a process, all processes run in the user's default project. The user and group lists only define the project(s) in which a user or group is allowed to execute processes. They do not define a default project for the user or group. The default project for a user is determined by the system at login time. See the man page for getprojent(3PROJECT) for the exact algorithm used.

Commands

The following commands are available to administer projects:

Command        Description
projadd(1M)    Adds a new project to the local project database
projmod(1M)    Modifies a project entry in the local project database
projdel(1M)    Deletes a project entry from the local project database
projects(1)    Displays project membership for a user
newtask(1)     Switches to a project

Several standard Solaris OS commands include project-related options, and can be used to view or manipulate processes based on their project membership:

Command        Option
id(1M)         -p
ipcs(1)        -J
pgrep(1)       -J -T
pkill(1)       -J -T
poolbind(1M)   -i project
prctl(1)       -i project
priocntl(1)    -i project
prstat(1M)     -j -J -k -T
ps(1)          -o projid project taskid
useradd(1M)    -p

For example, the prstat -J command lists all processes and projects on the system and displays a per-project total. See the man pages for more information on these commands and the options related to projects.

Extended Accounting

Once workloads are identified and labeled using projects, the next step in managing resource usage involves measuring workload resource consumption. While current consumption can be measured using the prstat(1M) command to obtain a real-time snapshot of resource usage, prstat does not provide the capability to look at historical data.

The traditional accounting mechanism is process based and predates the introduction of projects. It is therefore unable to provide resource usage statistics based on workloads. The extended accounting facility allows collection of statistics at the process level, the task level or both. Accounting at the task level aggregates the resource usage of its member processes, thereby reducing the required disk space for accounting data. A task is a group of related processes executing in the same project as a result of a newtask(1) command. An accounting record is written at the completion of a process or task. Interim accounting records can be written for tasks, and can be used to provide accurate daily accounting for long running jobs that span multiple days.

Every process that runs in the system is associated with a project and a task. By labeling all resource usage records with the project for which the work was done, the extended accounting facility can provide data on the resource consumption of workloads. This data can be used for reporting, capacity planning or charge back schemes.
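As an illustration of why labeling records with a project is useful, the following Python sketch totals CPU time per project. It does not read real extended accounting files (that requires libexacct); the records are hypothetical tuples invented for the example, and `cpu_by_project` is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical accounting records: (project, task id, CPU seconds).
records = [
    ("sales", 101, 12.5),
    ("sales", 101, 7.5),
    ("marketing", 202, 4.0),
    ("sales", 103, 10.0),
]


def cpu_by_project(recs):
    """Sum CPU seconds per project label."""
    totals = defaultdict(float)
    for project, _task, cpu_seconds in recs:
        totals[project] += cpu_seconds
    return dict(totals)


print(cpu_by_project(records))
```

The same grouping could be keyed on the task id instead of the project to produce per-task totals for reporting or charge back.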

Unlike the traditional System V accounting mechanism that is based on fixed size, fixed semantic records, the extended accounting facility uses a flexible and extensible file format for accounting data. Files in this format can be read or written using the C language API provided by libexacct(3LIB). This API abstracts the accounting file and offers functions to read and write records and fields in the file without the need for knowledge of the physical layout. This makes it possible to add new record or field types to the file between releases, even during system operation, without impacting existing applications that use extended accounting files. A Perl interface for libexacct is available to ease the creation of custom reporting tools.

Commands

The following commands are available to administer the extended accounting facility.

Command        Description
acctadm(1M)    Configure extended accounting
wracct(1M)     Write extended accounting records for active processes and tasks

The Fair Share Scheduler

Running multiple workloads on the same system can lead to a situation where one workload monopolizes CPU resources and impacts other workloads. This may result in important workloads not receiving sufficient CPU resources to complete their work. It is desirable to have a mechanism by which system administrators can prioritize access to CPU resources based on the importance of the workload.

The policy of the default scheduler in the Solaris OS is to give every process relatively equal access to CPU resources. Since it has no knowledge of workloads, the default scheduler cannot prioritize CPU allocation based on workload importance. The Solaris OS offers an alternative scheduler that is aware of workloads and can prioritize CPU allocation with respect to workload importance.

CPU Shares

The Fair Share Scheduler (FSS) controls allocation of CPU resources using CPU shares. The importance of a workload is expressed by the number of shares the system administrator allocates to the project representing the workload. The Fair Share Scheduler ensures that CPU resources are distributed among active projects based on the number of shares assigned to each project (Figure 3-1).

A CPU share defines a relative entitlement of the CPU resources available to a project on the system. It is important to note that CPU shares are not the same as CPU percentages. Shares define the relative importance of projects with respect to other projects. If project A is deemed twice as important as project B, project A should be assigned twice as many shares as project B. The actual number of shares assigned is largely irrelevant: two shares for project A versus one share for project B yields the same results as 18 shares for project A versus nine shares for project B. In both cases, project A is entitled to twice the amount of CPU resources as project B. The importance of project A relative to project B can be increased by assigning more shares to project A while retaining the same number of shares for project B.

FIGURE 3-1 The Fair Share Scheduler ensures applications get the CPU resources to which they are entitled.

The Fair Share Scheduler calculates the proportion of CPU resources allocated to a project by dividing the shares for the project by the total number of shares of active projects. An active project is a project with at least one process using CPU resources. Shares for idle projects, that is, those without active processes, are not used in the calculations. For example, consider projects A, B and C with two, one and four shares respectively. If projects A, B and C are all active, the total is seven shares, so project A is entitled to 2/7, project B is entitled to 1/7, and project C is entitled to 4/7 of CPU resources. If project A is idle, the total is five shares, so project B is entitled to 1/5 of CPU resources, and project C is entitled to 4/5 of CPU resources (Figure 3-2). Note that even though the actual CPU entitlement for projects B and C increases, the proportion between projects B and C stays the same (1:4).

It is important to note that the Fair Share Scheduler only limits CPU usage if there is competition for CPU resources. If there is only one active project on the system, it can use 100% of CPU resources, regardless of the number of shares it holds. CPU cycles are never wasted. If a project does not use all the CPU resources it is entitled to because it has no work to do, the remaining CPU resources are distributed between other active projects.
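The share arithmetic described above can be sketched in a few lines of Python. This is only a model of the entitlement calculation, not of the scheduler itself; the function name `cpu_entitlements` is an illustrative choice, and the sketch assumes at least one active project holds a nonzero number of shares.

```python
from fractions import Fraction


def cpu_entitlements(shares, active):
    """Return each active project's fraction of CPU when projects compete."""
    # Only shares of active projects count toward the total.
    total = sum(shares[p] for p in active)
    return {p: Fraction(shares[p], total) for p in active}


shares = {"A": 2, "B": 1, "C": 4}

# All three projects active: entitlements are 2/7, 1/7 and 4/7.
print(cpu_entitlements(shares, ["A", "B", "C"]))

# Project A idle: entitlements become 1/5 and 4/5, still in a 1:4 ratio.
print(cpu_entitlements(shares, ["B", "C"]))
```

Using exact fractions rather than floating point keeps the 1:4 ratio between projects B and C visible in both cases.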

FIGURE 3-2 The Fair Share Scheduler distributes CPU resources among active projects based on the number of CPU shares

CPU Shares Configuration

CPU shares are configured through the project.cpu-shares resource control in the project database. Every project can be assigned a project.cpu-shares resource control. Projects without this resource control are assigned one share by the system. The system project is used for all system processes and daemons, and is special in that it has unlimited shares. Projects with zero shares assigned are only allowed to run when no other projects with nonzero shares are active.

Users can be a member of multiple projects and CPU usage is controlled by the number of shares of the project in which the user executes. As a result, a user can be entitled to different amounts of CPU resources at the same time. Note that a process can only be in one project at a time, so having different amounts of CPU resources at the same time means that processes owned by this user reside in different projects.

To place a CPU usage limit on a single user, create a project with the appropriate number of shares that contains only that user. This project should be the default project for this user, and the user should not be a member of any other projects to prevent the user from switching to another project.

The CPU shares can be adjusted dynamically using the prctl(1) command. These changes remain in effect only until the next system boot. To make the changes permanent, update the project.cpu-shares resource control in the project database.

Resource Controls

Resource usage of workloads can be controlled by placing bounds on resource usage. These bounds can be used to prevent a workload from over-consuming a particular resource and interfering with other workloads. The Solaris OS provides a resource controls facility to implement constraints on resource consumption. This facility is an extension of the traditional UNIX resource limit facility (rlimit). The rlimit facility can be used to set limits on the resource usage of processes, such as the maximum CPU time used, the maximum file size, the maximum core file size, and more. However, as the rlimit facility is process-based, its use for constraining workloads is rather limited. The resource controls facility in the Solaris OS extends process-based limits by adding resource limits at the task and project level. The number of resource limits that can be set is also expanded to give system administrators more control over resource consumption by processes, tasks and projects on the system.

Administering Resource Controls

Resource controls are configured through the project database. The last field of the project entry is used to set resource controls. A resource control in the project entry is a name-value pair. The name denotes the type of limit, while the value is a list of attributes for the control. Multiple resource controls can be added to a single project entry by separating the resource controls with a semicolon. The list of attributes for a resource control consists of a privilege level, a threshold, and an action.

The privilege level determines which users can modify the threshold value. Three privilege levels are provided:

- basic, the owner of the calling process can change the threshold
- privileged, only privileged (superuser) users can change the threshold
- system, the threshold is fixed for the lifetime of the operating system instance

Every resource control has at least a system value, which represents how much of the resource the current implementation of the operating system is able to provide. A resource control can have at most one basic value and any number of privileged values.

The action defines the steps to be taken when the threshold is exceeded. Three actions are possible:

- deny, deny resource requests for an amount that is greater than the threshold
- signal, send the specified signal to the process exceeding the threshold value
- none, perform no action when the threshold is exceeded
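To show how the name-value pairs, the semicolon separator, and the privilege/threshold/action attributes fit together, here is a small Python sketch that parses an attributes field. It assumes the `name=(privilege,threshold,action)` notation used in /etc/project and plain integer thresholds; the function name `parse_resource_controls` and the sample string are hypothetical.

```python
import re

# Matches one "(privilege,threshold,action)" group.
VALUE_RE = re.compile(r"\(([^,()]+),([^,()]+),([^()]+)\)")


def parse_resource_controls(attributes):
    """Map each control name to a list of (privilege, threshold, action) tuples."""
    controls = {}
    for item in attributes.split(";"):
        if "=" not in item:
            continue  # skip empty items and attributes that carry no value
        name, _, value = item.partition("=")
        controls[name.strip()] = [
            (priv.strip(), int(threshold), action.strip())
            for priv, threshold, action in VALUE_RE.findall(value)
        ]
    return controls


# Hypothetical attributes field holding two resource controls.
attrs = "task.max-lwps=(privileged,100,deny);project.cpu-shares=(privileged,10,none)"
print(parse_resource_controls(attrs))
```

The sketch treats everything after the threshold as a single action string, so it is a reading aid rather than a validator for the project database.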

 

Note: Changes made in the project database are only applied when a new process, task or project starts. Existing processes, tasks and projects do not see these changes. The prctl(1) and rctladm(1M) commands can be used to change resource controls on active entities.

Available Resource Controls

The following table identifies the resource controls available in the Solaris 10 OS.

Resource Control                  Description
process.max-port-events           Maximum allowable number of events per event port
process.max-msg-messages          Maximum number of messages on a message queue
process.max-msg-qbytes            Maximum number of bytes of messages on a message queue
process.max-sem-ops               Maximum number of semaphore operations allowed per semop call
process.max-sem-nsems             Maximum number of semaphores allowed per semaphore set
process.max-address-space         Maximum amount of address space available to this process
process.max-file-descriptor       Maximum file descriptor index available to this process
process.max-core-size             Maximum size of a core file created by this process
process.max-stack-size            Maximum stack memory segment available to this process
process.max-data-size             Maximum heap memory available to this process
process.max-file-size             Maximum file offset available for writing by this process
process.max-cpu-time              Maximum CPU time available to this process
task.max-cpu-time                 Maximum CPU time available to this task's processes
task.max-lwps                     Maximum number of LWPs simultaneously available to this task's processes
project.max-contracts             Maximum number of contracts allowed in a project
project.max-device-locked-memory  Total amount of locked memory allowed in a project
project.max-port-ids              Maximum allowable number of event ports
project.max-shm-memory            Total amount of shared memory allowed for a project
project.max-shm-ids               Maximum number of shared memory IDs allowed for a project
project.max-msg-ids               Maximum number of message queue IDs allowed for a project
project.max-sem-ids               Maximum number of semaphore IDs allowed for a project
project.max-crypto-memory         Total amount of kernel memory that can be used by libpkcs11 for hardware crypto acceleration
project.max-tasks                 Maximum number of tasks allowable in a project
project.max-lwps                  Maximum number of LWPs simultaneously available to a project
project.cpu-shares                Number of CPU shares granted to a project for use with the FSS
zone.max-lwps                     Maximum number of LWPs simultaneously available to this zone's processes
zone.cpu-shares                   Number of CPU shares granted to a zone for use with the FSS

Determining Thresholds

The resource consumption of processes is often unknown, so choosing a useful and safe threshold for a resource control can be a difficult task. Selecting an arbitrary threshold can lead to unexpected application failure modes. While some required information could be extracted from extended accounting information, there is a simpler way. The resource controls facility provides a global log action that sends a message to syslog when a threshold is exceeded.

First, a resource control with the threshold value to be verified must be set. The action should be set to none to ensure the resource is not denied if the threshold is exceeded. This allows the process to run unconstrained. Next, the global syslog action for the resource control must be enabled. When the application exceeds the threshold for that resource control, a message that the resource control threshold has been exceeded is logged to syslog. By changing the threshold until the warning no longer appears during normal use of the application, a reasonable setting for the resource control can be determined. After determining the value for the resource control, the action should be changed to deny, to ensure the threshold is enforced by the system.
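The trial-and-error procedure above amounts to a simple loop, which the following Python sketch models offline. It does not call any Solaris interface: the list of observed peaks stands in for the syslog warnings, the numbers are hypothetical, and `tune_threshold` is an illustrative name. The sketch assumes `step` is a positive number.

```python
def tune_threshold(observed_peaks, start, step):
    """Raise the threshold until no observed peak would trigger a log message."""
    threshold = start
    while any(peak > threshold for peak in observed_peaks):
        # With the action set to none the workload keeps running;
        # exceeding the threshold only produces a warning.
        threshold += step
    return threshold


# Hypothetical peak LWP counts seen during normal use of an application.
peaks = [120, 180, 150, 210]
print(tune_threshold(peaks, start=100, step=50))  # prints 250
```

On a real system each pass of the loop corresponds to running the application for a representative period and checking syslog before adjusting the threshold again.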

Commands

The following commands are available for administering resource controls. More information can be found in the man pages for each command.

Command        Description
prctl(1)       Get or set resource controls on a running process, task or project
rctladm(1M)    Display or modify global state of system resource controls

 

 