Recommendations: Space Shuttle Independent Assessment Team, Report to Associate Administrator, Office of Space Flight

Section 5: Recommendations

Category 1: Immediate
Prior to Return to Flight

1. The reliability of the wire visual inspection process should be quantified (success rate in locating wiring defects may be below 70% under ideal conditions).

2. Wiring on OV102 at Palmdale should be inspected for wiring damage in difficult-to-inspect regions. If any of the wires checked are determined to be especially vulnerable, they should be re-routed, protected, or replaced.

3. The 76 CRIT 1 areas should be reviewed to determine the risk of failure and ability to separate systems when considering wiring, connectors, electrical panels, and other electrical nexus points. Each area that violates system redundancy should require a program waiver that outlines risk and an approach for eliminating the condition. The analysis should assume arc propagation can occur and compromise the integrity of all affected circuits. Another concern is that over 20% of this wiring cannot be inspected due to limited access; these violation areas should, as a minimum, be inspected during heavy maintenance and ideally be corrected.

4. The SSP should review all waivers or deferred maintenance to verify that no compromise to safety or mission assurance has occurred.

Category 2: Short Term
Prior to making more than four more flights

1. NASA should expand existing data exchange and teaming efforts with other governmental agencies, especially concerning age effects.

2. A formal Aging and Surveillance Program should be instituted.

3. NASA and USA quality inspection and NASA engineers should review all CRIT 1 system repairs.

4. The failure of all CRIT 1 units should be fully investigated and corrected without waivers.

5. All testing of units must be minimized and documented as part of their total useful life. Similarly, maintenance operations must be fully documented.

6. The SIAT recommends comprehensive re-examination of maintenance and repair actions for adequate verification requirements (e.g., visual, proof test, or green run).

7. Human error management and development of safety metrics, e.g., Kennedy Space Center Shuttle Processing Human Factors team, should be supported aggressively and implemented program-wide.

8. Communications between the rank and file work force, supervisors, engineers and management should be improved.

9. NASA should expand on the Human Factors research initially accomplished by the SIAT and the Air Force Safety Center. This work should be accomplished through a cooperative effort including both NASA and AFSC. The data should be controlled to protect the privacy of those taking the questionnaires and participating in interviews. Since major failures are infrequent occurrences, NASA needs to include escapes and diving catches (see Appendix 3) in their human factors assessments.

10. Maintenance practices should be reviewed to identify and correct those that may lead to collateral damage.

11. Shuttle actuator soft goods should be adequately wetted to prevent downtime seepage.

12. Tank time and cycle data must be carefully logged to ensure safe life criteria are not exceeded.

13. Critical operations, especially those involving Self-Contained Atmospheric Protective Ensembles, must be staffed with technicians specifically experienced and properly trained with the operations.

14. Fleet Leader testing must be carefully scrutinized to ensure adequate simulation of operating conditions, applicability to multiple sub-systems, and complete documentation of results.

15. Vendor supplied training should be evaluated for all critical flight hardware.

16. The true mission impact of a second main engine pin failure (internal engine foreign object debris) during flight, similar to that which took place last July, should be determined.

17. The SSP should consider more frequent lot sample hot fire testing of the Solid Rocket Booster motor segments at full-scale size to improve reliability and safety and verify continued grain quality.

18. An independent review process, utilizing NASA and external domain experts, should be institutionalized.

19. NASA, USA, and the SSP element contractors should develop a Risk Management Plan and guidance for communicating risk as an integrated effort. This would flow SSP expectations for risk management down to working level engineers and technicians, and provide insight and references to activities conducted to manage risk.

20. The risk assessment matrix and Failure Modes and Effects Analysis should be updated based on flight failure experience, aging and maintenance history, and new information (e.g., wiring and hydraulics).

21. The SSP should revise the risk matrix for probable and infrequent likelihood for critical 1R** and 1R* severity to require a greater level of checkout and validation.

22. NASA Safety and Mission Assurance surveillance should be restored to the Shuttle Program as soon as possible.

23. The Safety & Mission Assurance role should include: mandatory participation on Prevention/Resolution Teams and in problem categorization, investigation of escapes and diving catches (see Appendix 3), and dissemination of lessons learned.

24. The SIAT believes that software systems (flight, ground, and test) deserve a thorough follow-on evaluation.

25. Due to time constraints, the SIAT only examined Orbiter wiring; many other systems associated with the Shuttle also have critical wiring. The findings and recommendations in this report are applicable to all Shuttle systems, but unique conditions may require additional actions.

26. During the inspection of wiring, several connector issues were also apparent. Loose connector backshells and wire strain relief that can potentially chafe wiring were noted. Under certain conditions loose backshells can compromise electrical bonding between shielding and structure. Movement of the backshell can also cause chafing between the wiring and strain relief. In either case, these are unacceptable conditions and should be eliminated by periodic inspection and connector design.

27. Arc track susceptibility of aged wiring and circuit protection devices that are sensitive to arcing should be evaluated.

28. The need to examine wiring in areas that are protected or where damage may be induced by physical wiring inspection should be evaluated. Wiring should be continuously evaluated by conducting extensive electrical verifications on systems. When wiring damage is found in an area previously not examined, the remaining Orbiters should also be inspected.

29. Wire aging characteristics should be evaluated, including hydrolysis damage, loss of mechanical properties, insulation notch propagation, and electrical degradation. Testing should be performed by an independent laboratory.

30. A database that continually evaluates wiring system redundancy for the current design, modifications, repairs, and upgrades should be maintained. System safety should evaluate the overall risk created by wiring failures.

31. NASA engineering should specifically participate in industry and government technology development groups related to wiring. The SAE AE-8 committees (specifically A and D) are excellent forums for identifying wiring issues.

32. Wiring subjected to hypergolic contamination should be replaced since high pH fluids are known to degrade polyimide type wire insulation.

33. The current quality assurance program should be augmented with additional experienced NASA personnel.

34. Technician/inspector certification should be conducted by specially trained instructors, with the appropriate domain expertise.

35. The SIAT recommends an evaluation of depot repair documentation be performed to determine if the transition process attained a necessary and sufficient set of vendors for each Line Replaceable Unit, Shop Replaceable Unit, and special test equipment.

36. Teamwork and team support should be enhanced to mitigate some of the negative effects of downsizing and transition to Shuttle Flight Operations Contract. Most immediately needed is the provision of relief from deficits in core competencies, with appropriate attention to the need for experience along with skill certification. Further development of the use of cross-training and other innovative approaches to providing on-the-job training in a timely way should be investigated.

37. Work teams should be supported through improved employee awareness of stresses and their effect on health and work. Workload and “overtime” pressures should be mitigated by more realistic planning and scheduling; a serious effort to preserve “quality of life” conditions should be made.

Category 3I: Intermediate term
Prior to January 1, 2001

1. Standard repairs on CRIT 1 components should be completely documented and entered in the Problem Resolution and Corrective Action system.

2. The criteria for and the tracking of standard repairs, fair wear and tear issues, and their respective FMEA/CIL’s should be re-examined.

3. The SIAT recommends comprehensive re-examination of maintenance and repair actions for adequate verification requirements (e.g., visual, proof test, or green run).

4. The avionics repair facility should be brought up to industry standards.

5. Selected areas of staffing need to be increased (e.g., the Aerospace Safety Advisory Panel advised 15 critical functional areas are currently staffed one deep).

6. The SIAT recommends that the SSP implement the Aerospace Safety Advisory Panel recommendations. Particular attention should be paid to recurring items.

7. The SIAT believes that Aerospace Safety Advisory Panel membership should turn over more frequently to ensure an independent perspective.

8. The root cause(s) for the decline in the number of problems being reported to the Problem Resolution and Corrective Action system should be determined, and corrective action should be taken if the decline is not legitimate.

9. The root cause(s) for the missing problem reports from the Problem Resolution and Corrective Action system concerning Main Injector liquid oxygen Pin ejection, and for inconsistencies of the data contained within the existing problem reports should be determined. Appropriate corrective action necessary to prevent recurrence should be taken.

10. A rigorous statistical analysis of the reliability of the problem reporting and tracking system should be performed.

11. Reporting requirements and processing and reporting procedures should be reviewed for ambiguities, conflicts, and omissions, and the audit or review of system implementation should be increased.

12. The SSP should revise the Problem Resolution and Corrective Action database to include integrated analysis capability and improved problem classification and coding. Also, improve system automation in data entry, trending, flagging of problem recurrence, and identifying similar problems across systems and sub-systems.

13. All critical data bases (e.g., waivers) need to be modernized, updated and made more user friendly.

14. There are a number of cryogenic fluid mechanical joints and hot-gas mechanical joints that represent potential risks that should therefore be examined in detail.

15. All internal Foreign Object Debris (e.g., pins) occurrences during the program should be listed, with pertinent data on date of occurrence, material, and mass. The internal Foreign Object Debris FMEA/CIL’s and history should be reviewed and the hazard categorized based on the worst possible consequence.

16. Any type of engine repair that involves hardware modification -- no matter how minor (such as liquid oxygen post pin deactivation) -- should be briefed as a technical issue to the program management team at each Flight Readiness Review. The criticality of a standard repair should not be less than basic design criticality, based on worst case consequences, and all failures of standard repairs should be documented and brought to the attention of the Material Review Board.

17. The design and the post Solid Rocket Booster recovery inspection and re-certification for flight should be looked at and analyzed in careful detail by follow-on independent reviews.

18. The inspection and proof-test logic to screen for flaws or cracks in the Super-Light-Weight Tank should be reviewed in light of the reversal in fracture-stress-against-flaw-size between room and cryogenic temperatures.

19. The SSP should explore the potential of adopting risk-based analyses and concepts for its critical manufacturing, assembly, and maintenance processes, and statistical and probabilistic analysis tools as part of the program plans and activities. Examples of these analyses and concepts are Process FMEA/CIL, Assembly Hazard Analysis, Reliability Centered Maintenance, and On Condition Maintenance.

20. Failure analysis and incident investigation should identify root cause and not be artificially limited to a sub-set of possible causes.

21. Software requirements generated by Shuttle system upgrades must be addressed.

22. Enhanced software tools should be considered for potential improvements in reliability and maintainability as systems are upgraded.

23. An assessment of using lower fatigue-crack-growth thresholds and their impact on fracture critical parts or components needs to be reviewed to establish life and verify the inspection intervals. Retardation and acceleration model(s) should be used to assess the type of crack-growth history under the Orbiter spectra.

24. Assessments of the impact of any new Orbiter flight loads on structural life should continue as responsibility for the Orbiter structure is transferred to the contractor.

25. The Orbiter Corrosion Control Review Board should consider incorporating the framework suggested by the Federal Aviation Administration for Corrosion Prevention and Control Plans of commercial airplane operators into their corrosion database to provide focus to the more serious occurrences of corrosion.

26. Hidden corrosion problems require a proactive inspection program with practical and reliable non-destructive evaluation techniques; at this point, this inspection is done on a randomized basis. An assessment of the impact of hidden (or inaccessible) corrosion and the repairs of identified corrosion on the integrity of the Orbiter structure should be made.

27. Current wire inspection and repair techniques should be evaluated to ensure that wire integrity is maintained over the life of the Shuttle vehicles. Several new inspection techniques are available that use optical, infrared, or electrical properties to locate insulation and conductor damage, and should be explored for use on the Shuttle.

28. All CRIT 2 circuits should be reviewed to determine to what extent redundancy has been compromised in wiring, connectors, electrical panels and other electrical nexus points. The primary concern is that single point failure sources may exist in the original design or have been created by system upgrades or modifications.

29. The Shuttle program should form a standing wiring team that can monitor wire integrity and take program-wide corrective actions. The team should include technicians, inspectors, and engineers, with both contractor and government members. The chair of the team should have direct accountability for the integrity of the Shuttle wiring. One area that should be evaluated is techniques that can detect an exposed conductor that has not yet developed into an electrical short.

30. The long term use of primarily polyimide wiring should be minimized, and wire insulation constructions that have improved properties should be evaluated and compared to the current wire insulation used on the Shuttle program. Alternate wire constructions should be considered for modifications/repairs/upgrades. There are several aerospace wire insulation constructions that can provide more balanced properties.

Category 3L: Long term
Prior to January 1, 2005

1. Where redundancy is used to mitigate risk, it should be fully and carefully implemented and verified. If it cannot be fully implemented due to design constraints, other methods of risk mitigation must be utilized.

2. Serious consideration should be given to replacing the hydrazine power unit with a safer and easier to maintain advanced electric auxiliary power unit for the Thrust Vector Control hydraulic unit.

3. Due to obsolescence, Shuttle Reaction Control System propellant valves and propellant flight-half couplings should be replaced with ones that are more tolerant of the oxidizer environment.

4. The Problem Resolution and Corrective Action system should be revised using state-of-the-art database design and information management techniques.

5. Inspection technique(s) for locating corrosion under the tiles and in inaccessible areas should be developed.

6. Consideration should be given to modifying the Shuttle internal hydraulic line routing to the mold line to permit efficient facility hydraulic hose connections.

7. Non-intrusive methods of reliably detecting wiring damage should be developed, including those areas not accessible to visual inspection.

8. Quantitative methods of risk assessment (likelihood of failure) should be developed.

9. Quantitative measures of safety (likelihood of error), including assessment surveying techniques, should be developed, e.g., Occupational Stress Inventory and MEDA.

10. Quantitative methods of risk assessment and safety (see above) need to be integrated to develop the ability to perform trade-off studies on the effect of new technology, aging, upgrades, process changes, etc., upon vehicle risk.
