Executive Summary, Space Shuttle Independent Assessment Team, Report to Associate Administrator, Office of Space Flight

Section 1 Executive Summary

The Shuttle program is one of the most complex engineering activities undertaken anywhere in the world at the present time. The Space Shuttle Independent Assessment Team (SIAT) was chartered in September 1999 by NASA to provide an independent review of the Space Shuttle sub-systems and maintenance practices. From October through December 1999, the team, led by Dr. McDonald and composed of NASA, contractor, and DOD experts, reviewed NASA practices and Space Shuttle anomalies, as well as civilian and military aerospace experience.

In performing the review, the SIAT observed much of a very positive nature, not the least of which was the skill and dedication of the workforce. It is in the unfortunate nature of this type of review that the very positive elements are either not mentioned or not dwelt upon. This very complex program has undergone a massive change in structure in the last few years with the transition to a slimmed down, contractor-run operation, the Space Flight Operations Contract (SFOC). This has been accomplished with significant cost savings and without a major incident. Nevertheless, this report has identified significant problems that must be addressed to maintain an effective program. These problems are described in each of the Issues, Findings or Observations summarized below, and unless noted, appear to be systemic in nature and not confined to any one Shuttle sub-system or element. Specifics are given in the body of the report, along with recommendations to improve the present systems.

Issue 1

NASA must support the Space Shuttle Program with the resources and staffing necessary to prevent the erosion of flight-safety critical processes.

Human rated space transportation implies significant inherent risk. Over the course of the Shuttle Program, now nearing its 20th year, processes, procedures and training have continuously been improved and implemented to make the system safer. The SIAT has a major concern, reflected in nearly all of the subsequent "Issues", that this critical feature of the Shuttle Program is being eroded. Although the reasons for this erosion are varied, it appears to the SIAT that a major common factor among them is the reduction in allocated resources and appropriate staff that ensure these critical processes and procedures are being rigorously implemented and continually improved.

The SIAT feels strongly that workforce augmentation must be realized principally with NASA personnel rather than with contract personnel. The findings show that there are important technical areas that are staffed "one-deep". The SSP should assess not only the quantity of personnel needed to maintain and operate the Shuttle at anticipated future flight rates, but also the quality of the workforce required in terms of experience and special skills. In the recent fleet wiring investigation, work force skill shortages created the need to use Quality Assurance personnel inexperienced in wiring issues to perform critical inspections. Note that increasing the work force carries risk with it until the added work force acquires the necessary experience.

Issue 2

The past success of the Shuttle program does not preclude the existence of problems in processes and procedures that could be significantly improved.

The SIAT believes that another factor in the erosion referred to in Issue 1 is success-engendered safety optimism. The SIAT noted several examples of what could be termed an inappropriate level of comfort with certain apparently successful "acceptance of risk" decisions made by the program. One example was the number of flights with pinned liquid oxygen injectors flown without prior hot-fire testing that did not experience pin ejection before the STS-93 pin ejection rupture incident. These successful flights created a false sense of security that pinning an injector could be treated as a standard repair. There were 19 incidents of pin ejection that did not result in nozzle rupture prior to STS-93, and this created an environment that led to the acceptance of risk. Similarly, the wire damage that led to the short on STS-93 is suspected to have been caused 4 to 5 years prior to the flight. The SSP must rigorously guard against the tendency to accept risk solely because of prior success.

Issue 3

The SSP's risk management strategy and methods must be commensurate with the 'one strike and you are out' environment of Shuttle operations.

While the Shuttle has a very extensive Risk Management process, the SIAT was very concerned with what it perceived as Risk Management process erosion created by the desire to reduce costs. This is inappropriate in an area that the SIAT believes should be under continuous examination for improvement in effectiveness, with cost reduction being secondary. Specific SIAT findings address concerns such as: moving from NASA oversight to insight; increasing implementation of self-inspection; reducing Safety and Mission Assurance functions and personnel; managing risk by relying on system redundancy and abort modes; and the use of only rudimentary trending and qualitative risk assessment techniques. It seemed clear to the SIAT that oversight processes of considerable value, including Safety and Mission Assurance, and Quality Assurance, have been diluted or removed from the program. The SIAT feels strongly that NASA Safety and Mission Assurance should be restored to the process in its previous role of an independent oversight body, and not be simply a "safety auditor." The SIAT also believes that the Aerospace Safety Advisory Panel membership should turn over more frequently to ensure an independent perspective. Technologies of significant potential use for enhancing Shuttle safety are rapidly advancing and require expert representation on the Aerospace Safety Advisory Panel. While system redundancy is a very sound element of the program, it should not be relied upon as a primary risk management strategy; more consideration should be given to risk understanding, minimization and avoidance. It was noted by the SIAT that as a result of choices made during the original design, system redundancy had been compromised in 76 regions of the Orbiter (300+ different circuits, including 6 regions in which a loss of wiring integrity would shut down all three main engines). These were design choices made based on the technology and risk acceptance at that time. Some of these losses of redundancy may be unavoidable; others may not be. In either case, the program must thoroughly understand how loss of system redundancy impacts vehicle safety.

Issue 4

SSP maintenance and operations must recognize that the Shuttle is not an 'operational' vehicle in the usual meaning of the term.

Most aircraft are described as being "operational" after a very extensive flight test program involving hundreds of flights. The Space Shuttle fleet has only now achieved one hundred flights and clearly cannot be thought of as being "operational" in the usual sense. Extensive maintenance, major amounts of "touch labor" and a high degree of skill and expertise by significant numbers of technician and engineering staff will always be required to support Shuttle operations. Touch labor always creates a potential for collateral and inadvertent damage. In spite of the clear mandate from NASA that neither schedule nor cost should ever be allowed to compromise safety, the workforce has received a conflicting message due to the emphasis on achieving cost and staff reductions, and the pressures placed on increasing scheduled flights as a result of the Space Station. Findings of concern to the SIAT include: the increase in standard repairs and fair wear and tear allowances; the use of technician and engineering "pools" rather than specialties; a potential complacency in problem reporting and investigation; and the move toward structural repair manuals as used in the airline industry that allow technicians to decide and implement repairs without engineering oversight. The latter practice has been implicated in a number of incidents that have occurred outside of NASA (Managing the Risks of Organizational Accidents, Chapter 2, p. 21). When taken together these strategies have allowed a significant reduction in the workforce directly involved in Shuttle maintenance. When viewed as an experimental / developmental vehicle with a "one strike and you are out" philosophy, the actions above seem ill advised.

Issue 5

The SSP should adhere to a 'fly what you test / test what you fly' methodology.

While the "fly what you test / test what you fly" methodology was adopted by the Shuttle Program as a general operational philosophy, this issue arose specifically with the Space Shuttle Main Engine (SSME). For the SSME, fleet leader and hot-fire (green-run) testing are used very effectively to manage risk. However, the concept must be rigorously adhered to. Recent experience, for instance the pin ejection problem, has shown a breakdown of the process. An excellent concept, the fleet leader is also applicable to other systems, but its limitations must be clearly understood. In some cases (e.g., hydraulic testing, avionics, Auxiliary Power Unit) the SIAT believes that the testing is not sufficiently realistic to estimate safe life.

Issue 6

The SSP should systematically evaluate and eliminate all potential human single point failures.

In the past, the Shuttle Program had a very extensive Quality Assurance program. The reduction of the quality assurance activity ("second set of eyes") and of the Safety & Mission Assurance function ("independent, selective third set of eyes") increases the risk of human single point failures. The widespread elimination of Government Mandatory Inspection Points, even though the reductions were made predominantly when redundant inspections or tests existed, removed a layer of defense against maintenance errors. Human errors in judgment and in complying with reporting requirements (e.g., in or out of family) and procedures (e.g., identification of criticality level) can allow problems to go undetected, unreported or reported without sufficient accuracy and emphasis, with obvious attendant risk. Procedures and processes that rely predominantly on qualitative judgements should be redesigned to utilize quantitative measures wherever possible. The SIAT believes that NASA staff (including engineering staff) should be restored into the system for an independent assessment and correction of all potential single point failures (see also the concerns regarding the Safety and Mission Assurance function in Issue 3).

Issue 7

The SSP should work to minimize the turbulence in the work environment and its effects on the workforce.

Findings support the view that the significant number of changes experienced by the Shuttle Program in recent years have adversely affected workforce morale or diverted workforce attention. These include the change to the Space Flight Operations Contract, the reduction in staffing levels to meet Zero Based Review requirements, attrition through retirement, and numerous re-organizations. Ongoing turbulence from cyclically heavy workloads and continuous improvement initiatives (however beneficial) was also observed to stress the workforce. While the high level of workforce performance required by the Shuttle program has always created some level of workforce stress, the workforce perception is that this has increased significantly in the last few years. Specifically, the physical strain measured in the Marshall Space Flight Center workforce significantly exceeded the national norm, whereas the job stress components (e.g., responsibility levels, physical environment) were near normal levels. This typically indicates the workforce is internalizing chronic instability in the workplace. Similarly, feedback from small focus groups at Kennedy Space Center indicates unfavorable views of communication and other factors of the work environment. Clearly, from a health perspective, one would seek to reduce employee stress factors as much as possible. From a vehicle health perspective, stressed employees are more likely to make errors by being distracted while on the job, and to be absent from the job (along with their experience) as a result of health problems.

The SIAT believes that the findings reported here in the area of work force issues parallel those that were noted by the Aerospace Safety Advisory Panel. The SIAT is concerned that in spite of the Aerospace Safety Advisory Panel findings and recommendations, supported by the present review, these problems remain.

Issue 8

The size and complexity of the Shuttle system and of the NASA/contractor relationships place extreme importance on understanding, communication, and information handling.

In spite of NASA's clear mandate on the priority of safety, the nature of the contractual relationship promotes conflicting goals for the contractor (e.g., cost vs. safety). NASA must minimize such conflicts. To adequately manage such conflicts, NASA must completely understand the risk assumptions being made by the contractor workforce. Furthermore, the SIAT observed issues within the Program in the communication from supervisors downward to workers regarding priorities and changing work environments.

Communication of problems and concerns upward to the SSP from the "floor" also appeared to leave room for improvement. Information flow from outside the program (i.e., Titan program, Federal Aviation Administration, ATA, etc.) appeared to rely on individual initiative rather than formal process or program requirements. Deficiencies in problem and waiver tracking systems, "paper" communication of work orders, and FMEA/CIL revisions were also apparent. The program must revise, improve and institutionalize the entire program communication process; current program culture is too insular in this respect.

Additionally, major programs and enterprises within NASA must rigorously develop and communicate requirements and coordinate changes across organizations, particularly as one program relies upon another (e.g., re-supplying and refueling of International Space Station by Space Shuttle). While there is a joint Program Review Change Board (PRCB) to do this, for instance on Shuttle and Space Station, it was a concern of the SIAT that this communication was ineffective in certain areas.

Issue 9

Due to the limitations in time and resources, the SIAT could not investigate some Shuttle systems and/or processes in depth.

Follow-on efforts by some independent group may be required to examine these areas (e.g., other propulsion elements, such as the Reusable Solid Rocket Motor, Solid Rocket Booster, External Tank, Orbiter Maneuvering System, and Reaction Control System, and other wiring elements besides those in the Orbiter). This independent group should also review the SSP disposition of the SIAT findings and recommendations.

The Shuttle Upgrades program creates the opportunity to correct many of the observed deficiencies, e.g., the 76 areas of compromised redundancies (300+ circuits), and to incorporate design for maintainability and continuous improvement. However, without careful systems integration and prioritization, some of the deficiencies observed by the SIAT will be exacerbated, e.g., in wiring, hydraulics, software, and maintenance areas. Additionally, the elements of maintenance must be rigorously analyzed, including training, maintainability, spares support maintenance, and accessibility.

Return to Flight

The SIAT was asked by the SSP for its views on the return to flight of STS-103. The SIAT had earlier considered this question and had concluded that a suitable criterion would be that STS-103 should possess less risk than, for example, STS-93. In view of the extensive wiring investigation, repairs and inspections that had occurred, this condition appeared to have been satisfied. Furthermore, none of the main engines scheduled to fly have pinned Main Injector liquid oxygen posts. The SIAT did suggest that prior to the next flight the SSP make a quantitative assessment of the success of the visual wiring inspection process. In addition, the SIAT recommended that the SSP pay particular attention to inspecting the 76 areas of local loss of redundancy and carefully examine the OV102 being overhauled at Palmdale for wiring damage in areas that were inaccessible on OV103. Finally, the team suggested that the SSP review in detail the list of outstanding waivers and exceptions that have been granted for OV103. The SSP is in the process of following these specific recommendations and so far has not reported any findings that would cause the SIAT to change its views.

Shortly before completing this report, the SIAT was gratified to learn that NASA had taken steps to rectify a number of the adverse findings reported above. Of particular note was the strengthening of the NASA Quality Assurance function for the Shuttle at Kennedy Space Center. Upon completion of STS-103, the SIAT was pleased to learn that only two orbiter in-flight anomalies were experienced, a reduction from past trends (see Appendix 11).
