.\" tbl bof.t | ditroff -ms
.if n .nr LL 8i
.TL
Operating Systems for Supercomputers
.br
.ps -4
Birds of a Feather Session
.br
Supercomputing '89, Reno, NV.
.AU
Daniel J. Kopetzky\u\(dg\d
John Riganati\u\(dg\d
Dieter Fuss\u\(dd\d
.AI
\u\(dg\d Supercomputing Research Center
16100 Science Drive
Bowie, MD 20715

\u\(dd\d Lawrence Livermore National Laboratory
P. O. Box 808, L-669
Livermore, CA 94551
.AB
This paper summarizes the discussions held
in a birds-of-a-feather session on the topic
of future operating system support for supercomputers.
The group worked collectively to identify current
problem areas and to anticipate future ones.
.AE
.NH
Background
.PP
Advanced systems software development for supercomputers
in the 90's may be viewed as falling primarily into two areas
\(em distributed computing and parallel processing.
The driving forces for distributed computing come from a desire
to increase user productivity (for example, with a friendly workstation
environment), maximize efficient use of resources, and enhance resource sharing; the
driving forces for parallel processing come from a desire to increase
performance, especially for studying physical phenomena and other large-scale
calculations, or to decrease cost at constant performance.
.PP
Requirements for advanced systems software development for distributed computing and
parallel processing may be structured into at least four areas:
the operating system, the application run-time environment, tools and utilities, and
the command language interface. A "birds-of-a-feather" session held at
Supercomputing\ '89 in Reno, NV, on 16 November 1989 concentrated only on the
operating system.
.NH
Topics Presented
.PP
A set of four topic areas was presented.
End users require
.I
support
.R
for program creation, debugging, and tuning.
.I
Communication Management
.R
has become an important topic in a heterogeneous computing environment
populated with machines that range from supercomputers
to workstations.
.I
Processor Management
.R
encompasses the problems of choosing among the multiple computers that could
run a task.
.I
Storage Management
.R
covers how main memory is used, file system capabilities, and data archiving.
.PP
The group was asked to view these topics as a starting point
that they could amplify or extend with their own concerns.
.ne 3.5i
.NH
"...illities" That We Want
.PP
.mk a
.TS
doublebox;
 c fB
c.
Desired Capabilities
=
Ease of use
_
Performance
Manageability
_
Expandability
Flexibility
Interoperability
Maintainability
_
Portability
Standardization
_
Reliability
Recoverability
_
Security
.TE
.mk b
.sp |\nau
.if t .in 1.6i
.if n .in 2.5i
The list to the left indicates some of the capabilities
that are desired in a computing system.
Making systems easy to use is a quest for reducing the people costs
associated with running and programming at a supercomputing center.

Jobs are assigned to supercomputers because of the machines' performance.
Predicting and measuring the performance of tasks is necessary.
System managers should be able to create different levels of service.

We must be able to plan the growth of a center. That may take the
form of expanding an existing machine or adding new ones. A variety
of machines must be supported.

Programs may have useful lifetimes that exceed that of a machine.
Mechanisms and guidelines to increase portability are needed.
.in 0
.sp |\nbu
Long-running computations will encounter machine failures.
System support for minimizing lost work is needed.

A security policy must be provided to insulate users from each other.
Such protections need to be maintained while
permitting global access to a computing center.
Furthermore, some tasks may require a dynamic allocation of computing
resources from a distributed computing environment.
.ne 2.75i
.NH
User Support
.PP
.mk a
.TS
doublebox;
l.
Debugging
\h'2m'Multiprocess
\h'2m'Network
\h'2m'Heterogeneous
_
Performance
\h'2m'Prediction
\h'2m'Measurement
\h'2m'Production
_
Seamless Environment
.TE
.mk b
.sp |\nau
.if t .in 1.7i
.if n .in 2.5i
Multiprocess tasks add a new dimension for bugs to inhabit.
With the state of the computation spread across several processors,
cleanly stopping the task and achieving repeatability are both difficult.

Placing the component computations on processors separated by a network
adds another layer of complexity and diminishes the amount
of direct hardware control over a program.

Debugging in a heterogeneous system adds a further complication of trying
to map different hardware capabilities to a common view.
.sp |\nbu
.in 0

The performance of a program should be planned from its initial design.
Programmers need enough information to be able to predict the cost
of the system services that they use.

The actual performance of a code needs to be measured and those
results correlated with the predicted performance.

To ensure that programs continue to meet their expected
performance profile, measurement must continue throughout production
runs.
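
As a sketch of the kind of measurement hook involved, the fragment below
wraps one region of a program with wall-clock timing; the region, the
reporting format, and the use of the Unix gettimeofday() call are
illustrative assumptions, not any particular vendor's interface.
.DS
.ft CW
#include <stdio.h>
#include <sys/time.h>

/* Wall-clock seconds, used to bracket an instrumented region. */
static double seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, (struct timezone *)0);
    return tv.tv_sec + tv.tv_usec / 1.0e6;
}

int main(void)
{
    double t0 = seconds(), sum = 0.0;
    long i;

    for (i = 0; i < 10000000L; i++)   /* stand-in for real work */
        sum += (double)i;

    printf("region took %.3f s (sum=%g)\en", seconds() - t0, sum);
    return 0;
}
.ft
.DE
Comparing such measured figures against the predicted cost, run after run,
is what keeps a production code honest.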
.ne 3i
.NH
Communication Management
.PP
.mk a
.TS
doublebox;
l.
Virtual Resources
\h'2m'Naming
\h'2m'Disk
_
Intertask
\h'2m'On machine
\h'2m'Cross machine
\h'2m'To workstations
_
Gigabit links
\h'2m'Protocols
\h'2m'Global networks
.TE
.mk b
.sp |\nau
.if t .in 1.6i
.if n .in 2.5i
Computing centers will have multiple supercomputers.
High-speed networks are needed to transfer information between those
machines and from the machines to the users' front ends and workstations.

Peripheral devices can be shared across several machines. The NFS
disk-sharing protocol is one example of information sharing used in the
minicomputer world. Special attention needs to be given to problems caused
by interconnecting high-performance systems. Naming conventions
are needed to identify these distributed resources without requiring each
program to maintain explicit knowledge of their location.
.sp |\nbu
.in 0
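
One illustration of the idea, sketched below with invented names, is a
resolver that maps a global resource name to a host and local path, so
that programs name resources rather than machines; the table and its
entries are hypothetical.
.DS
.ft CW
#include <stdio.h>
#include <string.h>

/* Hypothetical map from global resource names to locations. */
struct binding { char *name, *host, *path; };

static struct binding table[] = {
    { "/center/crayA/scratch", "crayA", "/scratch"     },
    { "/center/archive",       "vault", "/mnt/archive" },
    { 0, 0, 0 }
};

/* Resolve a global name; callers never name a host directly. */
static struct binding *resolve(char *name)
{
    struct binding *b;
    for (b = table; b->name; b++)
        if (strcmp(b->name, name) == 0)
            return b;
    return 0;
}

int main(void)
{
    struct binding *b = resolve("/center/archive");
    if (b)
        printf("%s -> %s:%s\en", b->name, b->host, b->path);
    return 0;
}
.ft
.DE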

Standards for building groups of communicating parallel processes will allow
a single task to be spread across several processors. Different solutions to
this problem may be needed for the special cases of groups of processes running
in a shared-memory machine and of communication between machines made by
different manufacturers.
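
A minimal on-machine example of two communicating processes, using the
ordinary Unix pipe mechanism, appears below; a standard interface would
have to give cross-machine and cross-vendor process groups the same
send-and-receive shape.
.DS
.ft CW
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    char buf[32];

    pipe(fd);
    if (fork() == 0) {             /* child: one worker process */
        write(fd[1], "partial result", 15);
        _exit(0);
    }
    read(fd[0], buf, sizeof buf);  /* parent: collect the result */
    wait((int *)0);
    printf("parent received: %s\en", buf);
    return 0;
}
.ft
.DE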

Network protocols designed for high-speed networks may be needed to augment
standard protocols that were developed primarily for long-haul, high-error-rate
communication.
Wide-area regional network access to supercomputer centers is required by some
user communities.
.ne 4i
.NH
Processor Management
.PP
.mk a
.TS
doublebox;
l.
Authentication
\h'2m'Accounting
\h'2m'Access Control
_
Usage Control
\h'2m'Priority
\h'2m'Researcher's iface
\h'2m'Administrator's iface
_
Load Balancing
\h'2m'CPU Selection
\h'2m'Distributed syscalls
\h'2m'Process migration
\h'2m'Checkpoint/restart
_
Scheduling
\h'2m'Long running jobs
\h'2m'Batch
\h'2m'Interactive
.TE
.mk b
.sp |\nau
.if t .in 1.9i
.if n .in 2.5i
The high cost of supercomputing usually means that machines must be shared
across cost-accounting boundaries. Access control mechanisms are used to
constrain users to a subset of a center's resources.
These mechanisms must be implemented so that they do not interfere with
the goals of distributed computing.

Researchers need to be able to control their computing costs.
Priority schemes allow a trade-off between cost and completion time.
Whereas researchers probably want to conserve funds, administrators
want to minimize idle (unpaid-for) resources. This may lead to significant
differences between the resource control mechanisms for those two groups.

In a heterogeneous computing environment, some tasks may be able to execute
(with differing performance) on any of a variety of machines.
It was mentioned that a system which permits distribution of system calls
can facilitate load sharing.
Process migration allows tasks to adapt to a changing pattern of resource
availability.
.sp |\nbu
.in 0

The scheduling of work needs to be considered with a view of the entire
computing center. The presence of high-speed long-haul networks may even permit
load shifting from one center to another. A balance is needed between
machine-efficient batch jobs and interactive computation.
Finally, tasks that consume large amounts of wall-clock time, exceeding weeks
of supercomputer time, will require special scheduling consideration.
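
Checkpoint/restart, listed in the table above, is one mechanism that makes
such jobs schedulable. The sketch below saves and restores a single
application-level state record; a real facility would capture a full
process image, and the checkpoint file name is an invented placeholder.
.DS
.ft CW
#include <stdio.h>

/* Application-level state the checkpoint must capture. */
struct state { long step; double value; };

static void save(struct state *s)     /* checkpoint to disk */
{
    FILE *f = fopen("job.ckpt", "w");
    if (f) { fwrite(s, sizeof *s, 1, f); fclose(f); }
}

static void restore(struct state *s)  /* resume after a failure */
{
    FILE *f = fopen("job.ckpt", "r");
    if (f) { fread(s, sizeof *s, 1, f); fclose(f); }
}

int main(void)
{
    struct state s = { 0, 0.0 };

    restore(&s);                      /* pick up where we left off */
    for (; s.step < 1000000L; s.step++) {
        s.value += 1.0;
        if (s.step % 100000L == 0)    /* checkpoint periodically */
            save(&s);
    }
    printf("finished at step %ld, value %g\en", s.step, s.value);
    return 0;
}
.ft
.DE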
.ne 3i
.NH
Storage Management
.PP
.mk a
.TS
doublebox;
l.
Main Memory
\h'2m'Coordination
\h'2m'Swapping & Paging
_
I/O Scaling
\h'2m'Cache control
\h'2m'Intelligent disk
_
File System
\h'2m'RAID disk
\h'2m'Data compression
\h'2m'Beyond gigabytes
_
Archiving
.TE
.mk b
.sp |\nau
.if t .in 2i
.if n .in 2.3i
Supercomputers have enormous main memory systems, and supercomputer
applications seem to want all of that memory for themselves.
Memory-allocation algorithms, and system calls that treat
memory as a negotiable resource, are needed to grow beyond
the minicomputer environment, where a task can be moved to
secondary storage in a few seconds.

Over time, processor performance improvements have outstripped those
of storage systems. Complex I/O systems that grow with
increased system capabilities may be needed to preserve a balance between
computation and I/O.
.sp |\nbu
.in 0

Newer technology will force change in the ways that storage is used.
Data compression techniques have been used to increase the recording
capacity of disk drives. Perhaps these same techniques can extend
network and channel bandwidth by moving data in a compressed form.
Individual data sets will grow as computing capacity increases.
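
As an illustration of the compress-before-transmission idea, the fragment
below shrinks a buffer before it would be handed to a channel or network;
the zlib library used here is purely illustrative and not something the
session assumed.
.DS
.ft CW
#include <stdio.h>
#include <string.h>
#include <zlib.h>     /* illustrative; link with -lz */

int main(void)
{
    unsigned char src[4096], dst[8192];
    uLongf dlen = sizeof dst;

    memset(src, 'x', sizeof src);     /* stand-in for a data set */

    /* Compress before the data ever touches the channel. */
    if (compress(dst, &dlen, src, sizeof src) != Z_OK)
        return 1;

    printf("%lu bytes cross the link instead of %lu\en",
           (unsigned long)dlen, (unsigned long)sizeof src);
    return 0;
}
.ft
.DE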

Retaining data for years will require standards that outlive
the machines that originally generated the data.
.ne 3.5i
.NH
Resource Control and Administration
.PP
.mk a
.TS
doublebox;
l.
Brokering
\h'2m'CPU cycles
\h'2m'Processor type
\h'2m'Disk storage
\h'2m'Disk bandwidth
_
System Updates
\h'2m'Internal OS interfaces
\h'2m'Customization guides
\h'2m'Site configuration
_
Documentation
\h'2m'Vendor
\h'2m'User generated
.TE
.mk b
.sp |\nau
.if t .in 2i
.if n .in 2.7i
The identification of available excess resources and their assignment
to interested parties can be accomplished by a
software system called a resource broker. This lessens the need for individual
processors to become involved in strategic resource assignment.

Because supercomputing installations generally differ, portions of the system
need to be tailored to each site's requirements. Sometimes
this includes inserting extensions into the base system's service repertoire.
Guidelines for this type of modification are needed to ease the burden
of re-integration when the vendor updates the system software.

The presence of high-quality online documentation helps educate the user
community and should yield better utilization of existing software and
hardware resources. Attention needs to be given to making the same
quality of documentation delivery available to local authors.
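.in 0

As a sketch of the brokering idea, the fragment below matches a resource
request against a table of advertised idle capacity; the record layouts,
machine names, and single-resource (CPU cycles) view of the world are all
invented for illustration.
.DS
.ft CW
#include <stdio.h>

/* Hypothetical records a broker might keep. */
struct offer   { char *host; long idle_cycles; };
struct request { char *who;  long cycles; };

static struct offer pool[] = {
    { "crayA", 500 }, { "crayB", 2000 }, { 0, 0 }
};

/* Grant a request from the first machine with enough slack. */
static char *broker(struct request *r)
{
    struct offer *o;
    for (o = pool; o->host; o++)
        if (o->idle_cycles >= r->cycles) {
            o->idle_cycles -= r->cycles;
            return o->host;
        }
    return 0;   /* nothing available: queue or refuse */
}

int main(void)
{
    struct request r = { "qcd-run", 1200 };
    char *host = broker(&r);
    printf("%s -> %s\en", r.who, host ? host : "(queued)");
    return 0;
}
.ft
.DE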
