An Overview of the Reliability Analysis Methods of Tunneling Equipment

Since failures cannot be prevented absolutely, reducing both the probability of failure in a system and its impact is very important for the operation of the whole system. A failure in a system or in one of its subsystems has negative consequences such as production stoppages, rising labor costs, and increasing maintenance costs. In recent years, reliability has come to be regarded as one of the most significant aspects of the quality of goods and services. In the past, reliability was mainly a concern of sensitive and complex industries such as the military, nuclear, and aerospace sectors, where a lack of reliability could cause irreparable damage to the entire system; today, it has become a universal concern. Tunneling equipment has grown in size and complexity, and a lack of reliability can therefore incur massive costs. Determining reliability in order to identify the components and subsystems with low reliability is therefore essential. The aim of this study is to review the methods used for the reliability analysis of tunneling equipment, including statistical analysis, failure mode and effects analysis, Markov, and fault tree methods. In addition, previous research on the reliability analysis of tunneling equipment is presented.


INTRODUCTION
The increasing rate of mechanization and automation of tunneling equipment in recent decades has made reliable operating systems increasingly important in this industry. These systems are composed of different, interconnected subsystems whose performance is mainly influenced by the availability, reliability, repairability, and capacity of the subsystems [1]. Engineers and technical managers in a modern society are responsible for planning, designing, building, and operating everything from the simplest products to the most sophisticated systems. System failures cause disruption at various levels and pose a threat to industries; products and systems are therefore expected to be reliable and safe. Reliability is a general engineering index used to evaluate confidence in the performance of engineering systems. It is widely used in all branches of science and technology, including aerospace engineering, military weapons engineering, telecommunications engineering, nuclear power plants, transportation networks, and power transmission networks [2].
Reliability is the probability that a system or subsystem operates under stated conditions for a specified time. It is therefore closely tied to the probability and frequency of failures in a system or subsystem [3]. This study provides a detailed overview of the reliability analysis methods used for tunneling and excavation equipment. The methods usually used to analyze the reliability of such equipment are described in detail, and previous research on the reliability analysis of tunneling equipment is discussed.

RELIABILITY
The most common definition of reliability is "the ability of a subsystem to perform a required task under the given conditions for a specified time interval" [4]. Because a subsystem can perform multiple tasks, its different tasks must first be identified; a separate reliability can then be calculated for each task. It is also necessary to identify the different foreseeable conditions and operating modes, as well as the periods of use and non-use of subsystems (systems, equipment, components, etc.), in the specification phase of system design [5].
Several units and terms are used to quantify reliability. Mean Time Between Failures (MTBF) is commonly used for repairable components, whereas Mean Time To Failure (MTTF) is used for non-repairable components [6]. Because reliability is expressed in terms of the probability of equipment failure, failure data are needed to determine equipment failure rates. The data most commonly needed to calculate reliability are the Time Between Failures (TBF) and the Time To Repair (TTR). Following the definitions above, the probabilistic behavior of an item's reliability can be summarized as follows [4]:

$$F(t) = \int_0^t f(u)\,du$$

where F(t) is the probability that the item fails in the interval from zero to t, and f(t) is the probability density function of the time between failures. Since reliability is the complement of F(t), it can be stated as:

$$R(t) = 1 - F(t) = \int_t^\infty f(u)\,du$$

The failure rate (hazard function) is defined as:

$$h(t) = \frac{f(t)}{R(t)}$$

The failure rate is an important function in reliability analysis because it describes how the instantaneous rate of failure of a surviving component changes over its lifetime. Plotted against time, h(t) often has the shape of a bathtub and is therefore referred to as the bathtub curve (Fig. 1).
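The relations between F(t), R(t), f(t), and h(t) can be checked numerically. The sketch below assumes a two-parameter Weibull life distribution purely for illustration (the shape parameter `beta` and scale parameter `eta` are hypothetical values, not taken from any study cited here); a shape below 1, equal to 1, and above 1 reproduces the decreasing, constant, and increasing portions of the bathtub curve.

```python
import math

# Two-parameter Weibull model, assumed here only as an illustrative life distribution.
def reliability(t: float, beta: float, eta: float) -> float:
    """R(t): probability of surviving beyond time t."""
    return math.exp(-((t / eta) ** beta))

def unreliability(t: float, beta: float, eta: float) -> float:
    """F(t) = 1 - R(t): probability of failing in [0, t]."""
    return 1.0 - reliability(t, beta, eta)

def density(t: float, beta: float, eta: float) -> float:
    """f(t) = dF/dt for the Weibull model (valid for t > 0)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0) * reliability(t, beta, eta)

def hazard(t: float, beta: float, eta: float) -> float:
    """h(t) = f(t) / R(t)."""
    return density(t, beta, eta) / reliability(t, beta, eta)

if __name__ == "__main__":
    eta = 100.0  # hypothetical scale parameter, operating hours
    for beta, phase in [(0.5, "early failures"), (1.0, "useful life"), (3.0, "wear-out")]:
        rates = [round(hazard(t, beta, eta), 5) for t in (10.0, 50.0, 150.0)]
        print(f"beta={beta} ({phase}): h at 10, 50, 150 h = {rates}")
```

With `beta = 1` the model reduces to the exponential distribution and h(t) stays at 1/`eta`, which corresponds to the flat bottom of the bathtub curve.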
In addition to these main functions, three terms are most commonly used to present failure data in reliability studies. These terms are shown in Table 1, and Fig. (2) illustrates the relationship between them.
Since MTBF is the sum of MTTF and the Mean Time To Repair (MTTR), MTBF can be used in place of MTTF for systems whose repair or replacement time is negligible. For repairable systems, however, MTTR is generally not negligible, and ignoring it introduces an error; only when MTTR is small can MTTF be used in place of MTBF [7].
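The size of the error made by substituting MTTF for MTBF can be estimated from a failure log. The following minimal sketch assumes MTBF = MTTF + MTTR and uses hypothetical uptimes and repair times for a single machine; the function name `summarize` is illustrative only.

```python
from statistics import mean

def summarize(uptimes_h, repair_times_h):
    """Return (MTTF, MTTR, MTBF) in hours, assuming MTBF = MTTF + MTTR."""
    mttf = mean(uptimes_h)       # mean operating time before each failure
    mttr = mean(repair_times_h)  # mean repair time after each failure
    return mttf, mttr, mttf + mttr

# Hypothetical log: operating hours before each failure and the repair time that followed.
uptimes = [120.0, 95.0, 150.0, 80.0, 105.0]
repairs = [4.0, 6.0, 3.5, 8.0, 3.5]

mttf, mttr, mtbf = summarize(uptimes, repairs)
rel_error = (mtbf - mttf) / mtbf  # relative error if MTTF is used in place of MTBF
print(f"MTTF={mttf:.1f} h, MTTR={mttr:.1f} h, MTBF={mtbf:.1f} h, error={rel_error:.1%}")
```

In this made-up log MTTR is about 4% of MTBF, so using MTTF instead of MTBF would understate the mean time between failures by roughly that amount.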

System Reliability
A mechanical or electrical system consists of a set of units or subsystems. To determine system reliability, it is necessary to study the reliability of each component as well as the structure of the system. Let $R_{sys}(t)$ denote the system reliability at time t, and assume that the system consists of n units with reliabilities $R_1, R_2, \ldots, R_n$. $R_{sys}$ then depends on $R_1, R_2, \ldots, R_n$ and on the system structure. The basic assumption in this section is that the constituent units of a system are independent [8]. Systems are generally divided into two categories, series and parallel systems, which are described in the following sections.

Series Systems with Independent Units
Suppose a system consists of two units, $c_1$ and $c_2$. The system is a series system if the failure of either unit causes the system to fail; in other words, the system operates only if both units operate. Fig. (3) shows a series system.
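The reliability of a series system is defined as Eq. (1):

$$R_{sys}(t) = R_1(t)\,R_2(t) \qquad (1)$$

For n independent units in series, this generalizes to $R_{sys}(t) = \prod_{i=1}^{n} R_i(t)$.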

Parallel Systems
Assume that a system consists of two units, $c_1$ and $c_2$. The system is a parallel system if it requires at least one of the units $c_1$ and $c_2$ to operate; in other words, the system fails only when both units fail. Fig. (4) shows a parallel system.
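The reliability of a parallel system is defined as Eq. (2):

$$R_{sys}(t) = 1 - \left[1 - R_1(t)\right]\left[1 - R_2(t)\right] \qquad (2)$$

For n independent units in parallel, $R_{sys}(t) = 1 - \prod_{i=1}^{n}\left[1 - R_i(t)\right]$.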
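The series and parallel rules above compose naturally, so a mixed structure can be evaluated by nesting them. The sketch below assumes independent units, as stated for this section; the unit reliabilities are hypothetical values at a fixed mission time.

```python
from math import prod

def series_reliability(unit_reliabilities):
    """Series rule for n independent units: every unit must work."""
    return prod(unit_reliabilities)

def parallel_reliability(unit_reliabilities):
    """Parallel rule, Eq. (2), for n independent units: the system fails only if all units fail."""
    return 1.0 - prod(1.0 - r for r in unit_reliabilities)

# Hypothetical unit reliabilities at a fixed mission time t.
r1, r2, r3 = 0.90, 0.80, 0.95
print(series_reliability([r1, r2]))                              # about 0.72
print(parallel_reliability([r1, r2]))                            # about 0.98
print(series_reliability([parallel_reliability([r1, r2]), r3]))  # about 0.931
```

The last line models c1 and c2 in parallel, with that pair in series with a third unit: the redundant pair is more reliable than either unit alone, but the series unit still caps the system reliability below 0.95.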

RELIABILITY ANALYSIS METHODS
Many methods have been developed for the reliability analysis of engineering systems. These methods are particularly useful for systems that are more complex than the standard series and parallel configurations [5]. The most important and common reliability analysis methods used in the tunneling and civil engineering industries are presented in the following sections.

Statistical Analysis Method
In this method, reliability is generally assessed using MTBF for repairable systems and MTTF for non-repairable systems [10-11]. Fig. (5) presents the methodology for the reliability modeling of failure data as a step-by-step flowchart for reliability analysis by the statistical analysis method.

Collecting Failure Data
Failure data are divided into two general categories: complete and censored data. After the data type has been determined, a database of the failure data, including the time of failure, the time of repair, the duration of the repair, and the cost of the repair, must be formed to perform the analysis. This is done in three stages [3]: first, the data are collected from different sources; second, the failures are arranged in order of occurrence; and third, the times between failures are calculated [12-13].
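The second and third stages can be sketched in a few lines. This example assumes a hypothetical log format of (subsystem, failure timestamp, repair duration in hours) and measures each TBF as the elapsed time between consecutive failure occurrences; studies that exclude repair time from TBF would subtract it instead.

```python
from datetime import datetime

# Hypothetical failure log assembled from several sources (stage 1), not yet sorted:
# (subsystem, failure timestamp, repair duration in hours)
failure_log = [
    ("hydraulic", datetime(2023, 3, 4, 14, 0), 5.0),
    ("electrical", datetime(2023, 3, 1, 8, 0), 2.0),
    ("mechanical", datetime(2023, 3, 2, 20, 30), 6.5),
]

def order_by_occurrence(log):
    """Stage 2: arrange failures in order of occurrence."""
    return sorted(log, key=lambda record: record[1])

def times_between_failures(log):
    """Stage 3: elapsed hours between consecutive failures (TBF)."""
    times = [record[1] for record in order_by_occurrence(log)]
    return [(later - earlier).total_seconds() / 3600.0 for earlier, later in zip(times, times[1:])]

def times_to_repair(log):
    """Repair durations (TTR) in the same chronological order."""
    return [record[2] for record in order_by_occurrence(log)]

print(times_between_failures(failure_log))  # [36.5, 41.5]
print(times_to_repair(failure_log))         # [2.0, 6.5, 5.0]
```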

Cumulative Frequency Analysis (Pareto Analysis)
In the conventional Pareto analysis method, the system is first divided into several appropriate subsystems [14-18]. After the subsystems of the equipment or machine in question have been specified, the failure occurrence data are used to plot the cumulative percentage of costs against the cumulative frequency of failures. If cost information is not available, another appropriate indicator, such as downtime, can be used. When only failure records are available, the failure frequency of each subsystem can be obtained and analyzed using Pareto charts to determine the most critical subsystem.
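The tabulation behind a Pareto chart can be sketched as follows. Each event carries a weight: 1 to count failures, or a cost or downtime figure when that indicator is used. The subsystem names and counts are hypothetical.

```python
from collections import Counter

def pareto_table(events):
    """events: iterable of (subsystem, weight); weight is 1 per failure, or cost/downtime."""
    totals = Counter()
    for subsystem, weight in events:
        totals[subsystem] += weight
    grand_total = sum(totals.values())
    rows, cumulative = [], 0.0
    for subsystem, value in totals.most_common():  # largest contributors first
        cumulative += value
        rows.append((subsystem, value, round(100.0 * cumulative / grand_total, 1)))
    return rows

# Hypothetical failure records for one machine (one entry per failure).
failures = ["mechanical"] * 12 + ["hydraulic"] * 5 + ["electrical"] * 2 + ["pneumatic"]
for row in pareto_table((name, 1) for name in failures):
    print(row)  # (subsystem, count, cumulative %)
```

With these made-up counts, the mechanical subsystem alone accounts for 60% of failures and would be flagged as the most critical subsystem.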

Evaluation of Independence and Identical Distribution
After the data have been collected, and before a distribution is fitted to them, the assumption that the failure data are independent and identically distributed should be examined. In statistics and probability, a sequence of random variables is called independent and identically distributed (IID) if all the variables have the same distribution and are mutually independent. Two common methods, the trend test and the serial correlation test, are used to check this assumption [3,19,20].

Trend Test
The trend test determines whether the distribution of failure data has changed significantly (deteriorated or improved) over the time interval. There are various ways to establish the presence or absence of a trend; Table 2 describes some of the methods most commonly used for civil engineering systems.
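As an illustration of the MIL-HDBK-189 test listed in Table 2, the sketch below assumes time-truncated data: failures at cumulative operating times $t_1, \ldots, t_n$ observed up to time T, with statistic $U = 2\sum_{i=1}^{n} \ln(T/t_i)$, which follows a chi-square distribution with 2n degrees of freedom when there is no trend. The exact critical-value convention used in Table 2 is not recoverable from the source, so this version uses a two-sided p-value. The failure times are hypothetical.

```python
import math

def chi2_sf_even(x: float, k: int) -> float:
    """P(X > x) for a chi-square variable with 2k degrees of freedom.

    Even degrees of freedom give a closed form (a Poisson sum), so no stats library is needed.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total

def mil_hdbk_189_test(failure_times, end_time, alpha=0.05):
    """Two-sided MIL-HDBK-189 trend test for time-truncated data (all failure times > 0)."""
    n = len(failure_times)
    u = 2.0 * sum(math.log(end_time / t) for t in failure_times)
    upper = chi2_sf_even(u, n)  # chi-square with 2n degrees of freedom if there is no trend
    p_value = 2.0 * min(upper, 1.0 - upper)
    if p_value >= alpha:
        verdict = "no significant trend"
    elif u < 2 * n:  # small U: failures bunch late, so the failure rate is increasing
        verdict = "deteriorating"
    else:            # large U: failures bunch early, so the failure rate is decreasing
        verdict = "improving"
    return u, p_value, verdict

# Hypothetical cumulative failure times (hours) over T = 1000 h of operation.
print(mil_hdbk_189_test([150.0, 420.0, 610.0, 780.0, 900.0, 970.0], end_time=1000.0))
```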

Serial Correlation
Autocorrelation (serial correlation) is defined as the correlation between members of a series of observations ordered in time (such as time series data) or in space (such as cross-sectional data). To check for serial correlation, the ith TBF (or TTR) is plotted against the (i-1)th TBF (or TTR). If the points are scattered randomly without a clear pattern, the data can be considered independent; if they lie along a line, the data are serially correlated [13,14,19].
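The plot described above can be complemented by a single number. This sketch computes the lag-1 pairs that are plotted, together with their Pearson correlation coefficient. A coefficient near zero is consistent with the random scatter expected of independent data, and values near ±1 correspond to points lying along a line. The TBF values are hypothetical, and this numeric summary is an addition to the graphical method rather than a replacement for it.

```python
import math

def lag1_pairs(values):
    """Points ((i-1)th value, ith value) used in the serial correlation plot."""
    return list(zip(values[:-1], values[1:]))

def lag1_correlation(values):
    """Pearson correlation of the lag-1 pairs."""
    pairs = lag1_pairs(values)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

tbf_hours = [36.5, 41.5, 12.0, 55.0, 28.0, 47.5, 19.0]  # hypothetical TBFs in chronological order
print(round(lag1_correlation(tbf_hours), 3))
```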

Statistical Distributions and Fitting Method
By fitting an appropriate statistical distribution to the failure data, statistical techniques provide insight into the variability of subsystems, system components, and maintenance methods. A broad range of statistical distributions is available to describe equipment life; these are generally divided into two categories, stable and unstable. The exponential, normal, lognormal, Weibull, gamma, and power-law model functions are the most commonly used in the reliability field [15,20]. Goodness-of-fit tests are used to validate the fitted functions and to select the best distribution among them; Table 3 shows the tests used for this purpose.
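A minimal sketch of the fit-and-compare step, assuming complete (uncensored) data. The exponential rate uses its maximum-likelihood estimate. The Weibull parameters come from median rank regression with Bernard's approximation, one common technique chosen here for simplicity and not necessarily the one used in the cited studies. The Kolmogorov-Smirnov distance D serves as the comparison measure. Choosing the smallest D is only a heuristic: a formal test would compare D against critical values from Table 3, adjusted for parameters estimated from the same data. The TBF list is hypothetical.

```python
import math

def bernard_ranks(n):
    """Median-rank approximation F_i = (i - 0.3) / (n + 0.4)."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def fit_exponential(times):
    """Maximum-likelihood failure rate for complete data: lambda = n / sum(t)."""
    return len(times) / sum(times)

def fit_weibull_mrr(times):
    """Median rank regression: ln(-ln(1 - F)) = beta*ln(t) - beta*ln(eta)."""
    t_sorted = sorted(times)
    xs = [math.log(t) for t in t_sorted]
    ys = [math.log(-math.log(1.0 - f)) for f in bernard_ranks(len(t_sorted))]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    eta = math.exp(mx - my / beta)
    return beta, eta

def ks_statistic(times, cdf):
    """Kolmogorov-Smirnov distance D between the empirical CDF and a fitted CDF."""
    t_sorted = sorted(times)
    n = len(t_sorted)
    return max(max(cdf(t) - i / n, (i + 1) / n - cdf(t)) for i, t in enumerate(t_sorted))

def compare_fits(times):
    lam = fit_exponential(times)
    beta, eta = fit_weibull_mrr(times)
    return {
        "exponential": ks_statistic(times, lambda t: 1.0 - math.exp(-lam * t)),
        "weibull": ks_statistic(times, lambda t: 1.0 - math.exp(-((t / eta) ** beta))),
    }

tbf_hours = [36.5, 41.5, 12.0, 55.0, 28.0, 47.5, 19.0, 63.0, 33.0]  # hypothetical
scores = compare_fits(tbf_hours)
print(scores, "-> smallest D:", min(scores, key=scores.get))
```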
Many studies have applied the statistical analysis method to the reliability analysis of tunneling and civil engineering equipment. The reliability, availability, and maintainability of an Earth Pressure Balance (EPB) tunnel boring machine were analyzed in [21,22]. To model the EPB reliability, the machine was divided into five separate subsystems in a series configuration: mechanical, electrical, hydraulic, pneumatic, and hydro. Based on the trend and serial correlation tests, renewal processes were applied to analyze all subsystems. After the reliability and maintainability functions had been calculated for all subsystems, the mechanical subsystem, which had the highest failure frequency, was found to have the lowest reliability and maintainability.
The reliability of a drum shearer machine was analyzed using failure data obtained from the Tabas Coal Mine [1]. For each subsystem, the most suitable modeling approach was selected from among the renewal process, the homogeneous Poisson process, and the non-homogeneous Poisson process, and the reliability of the subsystems was then evaluated. The results indicated that the most critical subsystem of the drum shearer machine is the water spray, whose reliability reaches zero before that of the other subsystems. The reliability plots of the drum shearer machine are shown in Fig. (6).
Fig. (6). Reliability plots of each subsystem of drum shearer [1].

Reliability-based maintenance planning of the hydraulic system of rotary drilling machines was performed in [23]. The data analysis showed that the times between failures of some machines follow two-parameter and three-parameter Weibull distributions, whereas those of other machines follow a lognormal distribution. According to the reliability plot of the hydraulic systems, the reliability-based preventive maintenance interval for a machine reliability level of 80% was 10 hours. In another study, a reliability and maintenance analysis of the pneumatic system of rotary drilling machines was performed [23]. After the pneumatic system reliability had been modeled, maintenance and repair schedules were presented for different reliability levels. The pneumatic system of the machine under study was divided into four subsystems: A, B, C, and D. The results showed that the reliability of the pneumatic system and of subsystems A and B reached 80% after about 7 hours of drilling, that of subsystem C after about 103 hours, and that of subsystem D after about 44 hours.
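A reliability-based preventive maintenance interval is the operating time at which R(t) falls to the chosen level, so for the distributions mentioned above it can be obtained by inverting R(t). The sketch below assumes a Weibull model, with the two-parameter case obtained when the location parameter `gamma` is 0, and a lognormal model with log-mean `mu` and log-standard deviation `sigma`. All parameter values are hypothetical and not taken from [23].

```python
import math
from statistics import NormalDist

def pm_interval(target_reliability, beta, eta, gamma=0.0):
    """Time at which a Weibull reliability R(t) = exp(-((t - gamma)/eta)**beta) drops to the target.

    gamma = 0 gives the two-parameter model; gamma > 0 is the failure-free (location) time.
    """
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target reliability must lie strictly between 0 and 1")
    return gamma + eta * (-math.log(target_reliability)) ** (1.0 / beta)

def pm_interval_lognormal(target_reliability, mu, sigma):
    """Time at which a lognormal R(t) = 1 - Phi((ln t - mu) / sigma) drops to the target."""
    z = NormalDist().inv_cdf(1.0 - target_reliability)
    return math.exp(mu + sigma * z)

# Hypothetical parameters, for illustration only.
print(round(pm_interval(0.80, beta=1.4, eta=60.0), 1))
print(round(pm_interval(0.80, beta=1.4, eta=60.0, gamma=5.0), 1))
print(round(pm_interval_lognormal(0.80, mu=3.5, sigma=0.8), 1))
```

Raising the target reliability shortens the interval, which is why studies report separate schedules for different reliability levels.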
The reliability and failure rates of the critical subsystems of a dragline machine were evaluated in [24]; the results showed that the bucket is the most critical subsystem of this machine. In addition to the research mentioned above, several other studies have used the statistical analysis method for equipment reliability analysis in tunneling and civil engineering [25-39].

Failure Mode and Effects Analysis Method (FMEA)
Failure Mode and Effects Analysis (FMEA) is an analytical technique, based on the principle of preventing failures before they occur, that is used to identify potential causes of failure. The technique focuses on improving safety by preventing failures. FMEA is used to anticipate problems and deficiencies during the design or development of processes and services in an organization [5,40]. One of the major differences between FMEA and other qualitative techniques is that FMEA is proactive rather than reactive. In many cases, corrective measures are defined and implemented only after a problem has been encountered; such measures are a reaction to what has already happened. In FMEA, by contrast, measures to eliminate potential problems or reduce their occurrence are defined and implemented by predicting those problems and calculating their risk. This preventive approach acts on what may happen in the future, and taking corrective measures in the early stages of product or process design requires much less cost and time. The purpose of FMEA is to identify everything that could cause a product or process to fail before the product reaches production or the process is ready for production.
Table 2. Common trend test methods (only the following rows are recoverable from the source).

Graphical [14]: Plot the cumulative failure times (TBFs) or repair times (TTRs) against the cumulative number of failures. A straight line indicates no trend in the data, whereas a convex or concave curve indicates a decreasing or an increasing failure rate, respectively. An increasing failure rate indicates the occurrence of premature failures.

MIL-HDBK-189 [16]: If $U < \chi^2_c$, the hypothesis H0 (the data have no trend) is rejected.

The Military Standard (MIL-STD-1629A), the non-automobile standard (SAE ARP5580), and the standard for the design, manufacture, and assembly of machinery (SAE J1739) describe how to implement failure mode and effects analysis. In many studies, the Risk Priority Number (RPN) is calculated and the safety status is then assessed. This method uses a risk matrix, developed from the severity, occurrence, and detection parameters, to prioritize risks. The RPN is determined from these parameters, with the Mean Time Between Failures (MTBF) used to rank the probability of occurrence. The severity parameter is rated according to the recommendations of the quality management system of the American automotive industry (QS9000) and SAE J1739. In this way, the risks are classified by their RPN, which is calculated as in Eq. (3):

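$$\mathrm{RPN} = S \times O \times D \qquad (3)$$

where S, O, and D are the severity, occurrence, and detection ratings, respectively.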
In the FMEA method, failure characteristics, including critical conditions, control probabilities, safety or severity, and the acceptable RPN, are used as decision criteria for correcting or preventing failures, and failure modes with higher RPNs are classified as critical [42].
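Ranking failure modes by RPN is straightforward once the three ratings are available. The sketch below assumes integer ratings on a 1-10 scale and an acceptable-RPN threshold of 100. Both the scale and the threshold are assumptions, and so are the failure modes and ratings, which are hypothetical examples for a tunneling machine.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int    # S rating, assumed 1-10 scale
    occurrence: int  # O rating, e.g. derived from MTBF bands
    detection: int   # D rating, assumed 1-10 scale

    @property
    def rpn(self) -> int:
        """Eq. (3): RPN = S x O x D."""
        return self.severity * self.occurrence * self.detection

def rank_by_rpn(modes, critical_threshold):
    """Sort failure modes by RPN (highest first) and flag those at or above the threshold."""
    ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
    return [(m.name, m.rpn, m.rpn >= critical_threshold) for m in ranked]

# Hypothetical failure modes and ratings.
modes = [
    FailureMode("hydraulic hose leak", severity=6, occurrence=5, detection=3),
    FailureMode("disc cutter wear", severity=7, occurrence=8, detection=4),
    FailureMode("sensor drift", severity=4, occurrence=3, detection=7),
]
for name, rpn, critical in rank_by_rpn(modes, critical_threshold=100):
    print(f"{name:20s} RPN={rpn:4d} {'CRITICAL' if critical else ''}")
```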
The reliability of a tunnel drilling machine was analyzed using the FMEA method [43]. For this purpose, 48 failure modes were considered for the main machine system and all its subsystems, and the effects of each failure were then determined. Finally, the necessary corrective actions were taken to prevent or reduce the failures. FMEA was also used to quantify the contribution of maintenance activities in offshore oil structures [44,45].
A risk assessment of road tunnel construction was conducted using FMEA [46]. The study found that instability of the working face is the most probable risk; other possible risks include groundwater inflow, the presence of karst bulbs, mixed tunnel faces, and face instability and squeezing.

Markov Method
For any given system, a Markov model consists of a list of the possible states of that system, the possible transition paths between those states, and the rate parameters of those transitions. In reliability analysis, transitions usually involve failures and repairs. In a graphical Markov model, each state is usually drawn as a bubble, with arrows indicating the transition paths between states. Fig. (7) shows the Markov states of a single component, which has only two states: healthy (0) and failed (1) [47,48].
In Fig. (7), λ is the rate parameter of the transition from state 0 to state 1, and $P_j(t)$ is the probability that the system is in state j at time t. If the device is known to be healthy at the initial time t = 0, the initial probabilities are $P_0(0) = 1$ and $P_1(0) = 0$ [49]. Thereafter, the probability of state 0 decreases at a constant rate λ: if the system is in state 0 at any given time, the probability of switching to state 1 during the next time increment dt is λ dt, as in Eq. (4).

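$$\Pr\{\text{state } 1 \text{ at } t + dt \mid \text{state } 0 \text{ at } t\} = \lambda\,dt \qquad (4)$$

Consequently, $P_0(t + dt) = P_0(t)\,(1 - \lambda\,dt)$, and with $P_0(0) = 1$ this gives $P_0(t) = e^{-\lambda t}$.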
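The behavior of the two-state model can be checked numerically. This sketch steps the state probabilities forward in small increments dt, moving a fraction λ dt of the probability in state 0 to state 1 at each step, and then compares the result with the closed-form solution $P_0(t) = e^{-\lambda t}$. The failure rate and time horizon are hypothetical.

```python
import math

def two_state_probabilities(lam, t_end, steps=100_000):
    """Step P0 and P1 forward with a small fixed increment dt (no repair transition)."""
    dt = t_end / steps
    p0, p1 = 1.0, 0.0  # component known to be healthy at t = 0
    for _ in range(steps):
        moved = lam * dt * p0  # probability mass leaving state 0 during dt
        p0, p1 = p0 - moved, p1 + moved
    return p0, p1

lam = 0.002  # hypothetical failure rate, failures per hour
p0, p1 = two_state_probabilities(lam, t_end=500.0)
print(round(p0, 4), round(math.exp(-lam * 500.0), 4))  # numerical vs closed form
```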
Let X(t) denote the random variable representing the state of the Markov process at time t. The probability $P_{ij}(t)$ of a transition from state i at time t = 0 to state j at time t is then defined as Eq. (5):

$$P_{ij}(t) = \Pr\{X(t) = j \mid X(0) = i\} \qquad (5)$$
Over the set of possible states j, the probabilities of a transition from i to any other state j, plus the probability of remaining in i, must sum to 1, that is, $\sum_{j} P_{ij}(t) = 1$. As shown in Fig. (8), X can also reach state j from state i by passing through a third state k.
Possible transitions between the different states of a Markov process can be described by a state transition diagram, as in Fig. (8). A probability matrix P(t) can be constructed by setting its elements equal to the corresponding transition probabilities; for example, the probability of a transition from i to j, $P_{ij}(t)$, is placed in row i and column j of P(t). For a process X with m+1 possible states, the probability matrix P(t) is:

$$P(t) = \begin{bmatrix} P_{00}(t) & P_{01}(t) & \cdots & P_{0m}(t) \\ P_{10}(t) & P_{11}(t) & \cdots & P_{1m}(t) \\ \vdots & \vdots & \ddots & \vdots \\ P_{m0}(t) & P_{m1}(t) & \cdots & P_{mm}(t) \end{bmatrix}$$

The Markov method was used to estimate the reliability of auxiliary ventilation systems in the construction of long tunnels [50]. In this study, the active and standby jet fans were modeled as a random process, and the probability of replacing any disabled jet fan with a standby fan was estimated using Markov chain theory. A Markov chain reliability analysis based on random process principles and supported by mathematical rules was also presented. A Markov chain is a special case of the Markov process that is used to study the short-term and long-term behavior of a random system.

The Markov method was also used in the reliability analysis of drilling operations in open-pit mines [51]. In this study, the failure and repair rates of all machines were calculated from the available data. Then, 16 possible operating states were defined, and the probability of the drilling fleet being in each state was calculated using Markov theory. The results showed that about 77 percent of all drilling machines were in operational condition; given 360 working days a year, this means that drilling operations were in a reliable condition for 278 days.
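Long-run state probabilities, such as the share of time a machine spends in an operational state, can be obtained by repeatedly applying a transition matrix. The sketch below assumes that P is the matrix of transition probabilities over one fixed time step (for example, one shift) and that the chain is time-homogeneous. It first checks that every row sums to 1, then iterates until the state probabilities stop changing. The two-state matrix is hypothetical; the same code accepts any number of states m+1.

```python
def check_stochastic(matrix, tol=1e-12):
    """Each row of P must be a probability distribution (non-negative, summing to 1)."""
    for row in matrix:
        if min(row) < 0.0 or abs(sum(row) - 1.0) > tol:
            raise ValueError("every row of the transition matrix must sum to 1")

def step(distribution, matrix):
    """One transition: new[j] = sum over i of old[i] * P[i][j]."""
    size = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(size)) for j in range(size)]

def long_run_distribution(matrix, start, tol=1e-12, max_iter=100_000):
    """Apply the transition matrix repeatedly until the state probabilities stop changing."""
    check_stochastic(matrix)
    current = list(start)
    for _ in range(max_iter):
        nxt = step(current, matrix)
        if max(abs(a - b) for a, b in zip(nxt, current)) < tol:
            return nxt
        current = nxt
    raise RuntimeError("no convergence; the chain may be periodic")

# Hypothetical per-shift probabilities: state 0 = operating, state 1 = under repair.
P = [[0.98, 0.02],   # operating -> operating / failed
     [0.30, 0.70]]   # under repair -> repaired / still under repair
print(long_run_distribution(P, start=[1.0, 0.0]))  # long-run share of shifts in each state
```

For this two-state case, the long-run probability of the operating state equals 0.30 / (0.02 + 0.30) = 0.9375, which the iteration reproduces.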
The reliability of an Earth Pressure Balance Tunnel Boring Machine (EPB-TBM) was analyzed using Markov modeling [52], and the failure and repair rates of its subsystems were determined. Fig. (9) shows the transition diagram for the EPB-TBM and its subsystems (b1, b2, b3, and b4) based on a reliability block diagram; in this figure, μ and λ are the repair and failure rates of the subsystems, respectively. The results showed that the availability of this machine was 61%, which increased to 70% with proper maintenance and planning.

Fig. (7). Markov states.

Fault Tree Analysis
Fault tree analysis is one of the common methods used to analyze the reliability of engineering systems. The method was developed at Bell Telephone Laboratories in the early 1960s to analyze the reliability and safety of a standby control system [5]. Fault tree analysis is a top-down method that uses a logical, graphical diagram to describe a failure and its causes [53].
A fault tree diagram represents the failures of the system, its subsystems, and its assemblies, using a set of symbols to show the relationships between the failures and their causes [8]. One benefit of this method is that, in addition to identifying all the intermediate and final events, it allows their probabilities of occurrence to be calculated. It can also be used to reconfigure a system to reduce its sensitivity and vulnerability [54]. Table 4 shows the steps for developing a fault tree.
Events in a fault tree are associated with probabilities. For example, a component failure may occur at a constant rate λ (a constant hazard function). In this simple case, the probability of failure depends on the rate λ and the time t, as in Eq. (6):

$$P(t) = 1 - e^{-\lambda t} \qquad (6)$$
A fault tree is often normalized to a given time interval, and the probability of an event depends on the relationship between the event's hazard function and this interval. Events are combined through gates, each of which performs a Boolean logic operation, and the probability of a gate's output event depends on the probabilities of its input events. An AND gate represents a combination of independent events: mathematically, it corresponds to the intersection of its input events, so the output probability of an AND gate with n inputs is given by Eq. (7):

$$P_{AND} = \prod_{i=1}^{n} P_i \qquad (7)$$

An OR gate, on the other hand, corresponds to the union of its input events, and for independent inputs its output probability is given by Eq. (8):

$$P_{OR} = 1 - \prod_{i=1}^{n} \left(1 - P_i\right) \qquad (8)$$

The reliability of a vertical drilling machine was analyzed using the fault tree method [55]. In this research, the components and subsystems of the vertical drilling machine were first arranged as a tree and then analyzed using the Boolean laws of fault trees. The reliability of the machine obtained with the fault tree analysis method was 0.53. The reliability of a hydraulic excavator system was also analyzed using a fault tree [56]. In this research, a reliability block diagram and a fault tree of the excavator system were developed, and an algorithm was presented to obtain the minimal cut sets and minimal path sets from the fault tree, as well as the reliability of the machine and its subsystems over time. Using Boolean and fuzzy laws, the power generation subsystem was found to have the highest reliability among the subsystems. The reliability of a conveyor system was also analyzed using compound data [57]. The proposed method estimates the probability of the top event through statistical analysis of recorded field failures; where past failure records do not exist, it uses a fuzzy set-theoretic evaluation based on expert judgment of the failure intervals. The results of the proposed method illustrate the practical role of expert experience in providing reliable information.
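Eqs. (6) to (8) are enough to evaluate a small fault tree by hand or in code. The sketch below assumes independent basic events with constant failure rates and a tree in which no basic event appears more than once; trees with repeated events need the minimal cut set approach mentioned above. The tree structure, rates, and mission time are all hypothetical.

```python
import math

def basic_event_probability(rate, t):
    """Eq. (6): failure probability by time t for a constant failure rate."""
    return 1.0 - math.exp(-rate * t)

def and_gate(input_probs):
    """Eq. (7): the output occurs only if all independent inputs occur."""
    return math.prod(input_probs)

def or_gate(input_probs):
    """Eq. (8): the output occurs if at least one independent input occurs."""
    return 1.0 - math.prod(1.0 - p for p in input_probs)

# Hypothetical tree: the machine stops if the hydraulic pump fails OR
# both the main motor AND the standby motor fail.
t = 1000.0  # mission time, hours
pump = basic_event_probability(1e-4, t)
main_motor = basic_event_probability(5e-4, t)
standby_motor = basic_event_probability(8e-4, t)
top_event = or_gate([pump, and_gate([main_motor, standby_motor])])
print(f"P(top event by {t:.0f} h) = {top_event:.3f}; reliability = {1 - top_event:.3f}")
```

The standby motor enters through an AND gate, so it reduces the top-event probability compared with the main motor alone.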
The potential risks of undesirable events in shield-driven tunnels were analyzed using fault tree analysis and the Analytic Hierarchy Process (AHP) [58]. The possible tunneling risks were classified into four groups: machine blockage or hold-up, mucking problems, cutter-related malfunctions, and segment defects. The risk analysis indicated that cutter-related risks could reduce the tunnel advance rate. In addition to the studies mentioned above, much other research has used the fault tree method for reliability and risk analysis [59-69].

CONCLUSION
Reliability, availability, and maintainability analysis should always be an integral part of civil engineering for the effective management and operation of project equipment. The main purpose of the present study is to review the available methods for analyzing the reliability, availability, and maintainability of tunneling systems and equipment. The methods reviewed in this research are used to analyze the reliability of tunneling machines and equipment, including excavators, shovels, LHD machines, conveyor transport systems, mechanized tunneling machines, and ventilation network equipment in tunnels and underground mines. Because these methods rely on actual data collected from ongoing projects, access to the data is limited in some cases, which hinders the use of these methods to determine equipment reliability. Nevertheless, these methods are effective in estimating the number of failures and thus in reducing breakdowns and, ultimately, the costs they incur.

FUNDING
None.