Services

Included in this document:
Improving the Measures of RAS | Standards | Processes | Technology

Measuring System Management Performance

Management of a mission-critical production computing environment is all about the goals of Reliability, Availability, and Serviceability (RAS). These have been the benchmarks for evaluating system management practices since the mainframe environments of the 1960s.

In the mainframe environment, the goals of RAS were addressed primarily by a single manufacturer dictating standards (both good and bad). The problem is far more complex in the open systems world, where there are many manufacturers and standards bodies, and it is further exacerbated by the sheer scale of the distributed environments we must manage. The flexibility of open systems requires careful management: we must devise a model for the flexible, disciplined management of distributed computing environments. To attain the goals of RAS, we must seek to maximise predictability.

The quest for predictability is fought on three fronts: Standards, Processes and Technology.

Standards
With Standards, we seek to improve predictability through consistency. By improving the conceptual integrity of a site, we reduce downtime due to trouble-shooting and entropy. We keep hosts, subsystems and software versions in sync in order to support a less diverse environment.

Processes
With Processes, we seek to improve predictability by ensuring that system maintenance activities follow known paths which include quality assurance steps such as peer review, impact analysis and deployment planning. This is predictability through planning.

Technology
With Technology, we seek to improve the manner in which we manage our environment. This ranges from simple tools that automate functions, and hence improve the consistency of results, to tools that reduce the management effort in real terms through a paradigm shift in the way we perform higher-level tasks.

So, where RAS are the measures (we can measure quantities such as Mean Time Between Failures and Application Availability), Standards, Processes and Technology (SPT) are the strategies by which we seek to improve those measures. Moreover, by allocating a cost to downtime, we can quantify, and hence evaluate, the effectiveness of our strategies.
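
As a rough illustration of how a cost can be attached to these measures, the sketch below uses the conventional steady-state availability formula, A = MTBF / (MTBF + MTTR), and converts the resulting expected downtime into money at a flat hourly rate. Mean Time To Repair (MTTR), the function names and the example figures are assumptions introduced for illustration; only the half-a-million-dollars-a-day figure echoes the text below, and it is used here purely as a hypothetical rate.

```python
"""Sketch: turning RAS measures into a downtime cost (illustrative figures only)."""

HOURS_PER_YEAR = 365 * 24  # 8760 hours; ignores leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up.

    Uses the conventional formula A = MTBF / (MTBF + MTTR).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_cost(mtbf_hours: float, mttr_hours: float,
                         cost_per_hour: float) -> float:
    """Expected yearly cost of unplanned downtime at a flat hourly rate."""
    downtime_hours = (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR
    return downtime_hours * cost_per_hour


if __name__ == "__main__":
    # Hypothetical site: a failure every 1000 hours, 4 hours to repair,
    # with downtime costed at $500,000 per 24-hour day.
    cost_per_hour = 500_000 / 24
    before = annual_downtime_cost(1000, 4, cost_per_hour)
    # Suppose better standards and processes halve the repair time.
    after = annual_downtime_cost(1000, 2, cost_per_hour)
    print(f"availability before: {availability(1000, 4):.4%}")
    print(f"annual downtime cost before: ${before:,.0f}")
    print(f"annual downtime cost after:  ${after:,.0f}")
    print(f"estimated saving: ${before - after:,.0f}")
```

Framed this way, a proposed improvement that lengthens the time between failures or shortens the time to repair can be weighed directly against what it costs to implement.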

The SysAdmin Group provides experience, methodologies and tools to address Standards, Processes and Technology, and hence improve the measures of Reliability, Availability, and Serviceability.

Improving the Measures of RAS

The measures of RAS are important ones. Downtime costs a company real money, and lots of it. There are the wages you have paid people to sit around waiting, subcontractors' fees and utility services, and that's all before you count the cost of slipped schedules, missed deadlines and missed opportunities.

    Some companies estimate this cost at half a million dollars a day.
In our modern, interconnected world, users won't put up with unexpected downtime due to a disk crash, and a company can lose not just productivity but business opportunities if its central computer or web site is down or inaccurate.

The way out is simple: systems management discipline. This means creating a Standard Operating Environment (SOE), controlling management processes, and implementing a formal systems management model.

At The SysAdmin Group, we provide the Standards, Processes and Technologies to implement a better way. We implement the infrastructure of the systems administration profession.

We can develop an SOE appropriate for your organisation, develop key processes such as Change Management, Production Acceptance and Problem Management, and train your existing staff in their use and benefits.

We also provide advanced toolkits which form the framework for controlling the functionality and performance of your hosts and network.

Standards

System administration, in essence, is the integration of diverse vendor products into a working business solution. Five corporations could all buy identical hardware and software, and yet each site will be entirely unique. System administration is the point where all the disparate needs and characteristics of the hardware, software, network, users, corporate policies and business environment collide.

System administration is all about intricacy - the complex interplay of large interrelated components. System Administrators deal with volumes of highly technical information, hundreds of products, thousands of commands, and the individual, changing needs of corporations, departments, users and customers.

There is a trade-off that corporations must face. The more diverse the environment they support, the greater the potential for personal productivity, but the complexity of the environment that must be supported becomes exponentially greater.

Standards, and in particular the definition of a workable Standard Operating Environment (SOE), are an essential part of controlling this complexity. A balance must be struck between individual freedom and the number of products supported.

On a smaller scale, the SOE should dictate things such as the standard filesystem layout of servers, user and host naming conventions, the location of shared information, and a myriad of other factors, all of which contribute to predictability through consistency.

Processes

There are three key processes that form the basis for the quality management of a modern production computing environment:

  1. Change Management (CM) Process.
    The Change Management process is intended to maximise the availability of the existing production environment. It forces the careful planning, peer review, post-change testing and user acceptance of any changes to the production environment.

    No matter how "stable" a production environment is in theory, in practice there is always ongoing change taking place that does not alter or augment the underlying functionality of the system. The CM process ensures the integrity of such changes by forcing careful planning and impact analysis of any proposed change. It guides support staff through a controlled learning exercise with respect to the potential impact of a change on one or more elements of the production environment.

  2. Production Acceptance (PA) Process.
    In contrast to the CM process, the Production Acceptance process is intended to guide support staff through a controlled learning exercise with respect to the introduction of a new element into the production environment.

    The PA process provides a framework for introducing change into the production environment in a manner which is controlled, predictable and auditable. This process seeks to ensure maximum availability of systems and maximum customer satisfaction with a minimum amount of ongoing intervention by support staff. This involves learning what it means to manage and support the new product, and then introducing it into production in a controlled manner.

  3. Problem Management (Helpdesk) Process.
    No matter how hard we try, and even with the help of the CM and PA processes, things will always go awry. When they do, it is often a user who notices the problem first.

    There needs to be a clearly defined process for accepting, handling and tracking all user complaints, requests and suggestions, so that they can be dealt with according to defined criteria and in a quality-assured manner.

The Problem Management (PM) process should not only ensure a timely response to user problems, but also provide management with valuable statistics on the progress and effectiveness of support staff and the satisfaction of the user base.
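
The tracking and reporting described above can be sketched as a small ticket log that records when each request was opened and first answered, checks the response against a target for its priority, and summarises the results for management. The priority names, response targets and the `Ticket` and `report` names are hypothetical assumptions chosen for illustration; a real site would define its own criteria, and a production helpdesk would track far more (ownership, resolution, escalation).

```python
"""Sketch: helpdesk ticket tracking with response-time statistics.

The priority levels and response targets below are hypothetical examples of
"defined criteria"; a real site would set its own.
"""
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional

# Hypothetical response targets per priority level.
RESPONSE_TARGETS = {
    "urgent": timedelta(minutes=30),
    "normal": timedelta(hours=4),
    "low": timedelta(days=2),
}


@dataclass
class Ticket:
    ticket_id: int
    priority: str                         # must be a key of RESPONSE_TARGETS
    opened: datetime
    responded: Optional[datetime] = None  # None while awaiting a first response

    def response_time(self) -> Optional[timedelta]:
        return None if self.responded is None else self.responded - self.opened

    def met_target(self) -> Optional[bool]:
        """True/False once answered; None while the ticket is still waiting."""
        rt = self.response_time()
        return None if rt is None else rt <= RESPONSE_TARGETS[self.priority]


def report(tickets: list[Ticket]) -> dict[str, dict[str, float]]:
    """Per-priority statistics for answered tickets: count, mean response, % on target.

    Unanswered tickets are excluded; priorities with no answered tickets are omitted.
    """
    stats: dict[str, dict[str, float]] = {}
    for priority in RESPONSE_TARGETS:
        answered = [t for t in tickets
                    if t.priority == priority and t.responded is not None]
        if not answered:
            continue
        minutes = [t.response_time().total_seconds() / 60 for t in answered]
        on_target = sum(1 for t in answered if t.met_target())
        stats[priority] = {
            "answered": len(answered),
            "mean_response_min": mean(minutes),
            "pct_on_target": 100.0 * on_target / len(answered),
        }
    return stats


if __name__ == "__main__":
    t0 = datetime(1998, 6, 1, 9, 0)
    tickets = [
        Ticket(1, "urgent", t0, t0 + timedelta(minutes=20)),
        Ticket(2, "urgent", t0, t0 + timedelta(minutes=50)),
        Ticket(3, "normal", t0, t0 + timedelta(hours=1)),
        Ticket(4, "low", t0),  # still awaiting a response
    ]
    for priority, summary in report(tickets).items():
        print(priority, summary)
```

Keeping the targets in one table makes the "defined criteria" explicit and easy to review, and the same records that drive day-to-day handling also yield the management statistics.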

Technology

Ultimately, system administration is about automation. System administrators do not want to spend their days solving the same problems over and over. They want to automate their role, and spend their efforts being proactive and extending the system's functionality rather than just firefighting.

It is important that such automation happens in a systematic, self-documenting fashion; otherwise the effort will have to be repeated, and the automation will merely be a transformation of the problem rather than a solution.

Automation, thus, must be thought of as the encapsulation of local system administration knowledge and procedures into tools.

Central to the successful automation of duties is a framework within which to capture this knowledge, so that the changing requirements of the environment will not cause this work to be wasted.

The SysAdmin Group supplies the GHOST Framework and Toolkit to assist you towards this goal.



©1998 The SysAdmin Group Pty Ltd. All Rights Reserved.