Failure modes after re-test

The Problem

Quality control and testing are mandatory before delivering a product or a service. In this case study, we consider manufacturing data recorded during testing. Every row is the result of one testing procedure. The tabular data may look like this:

timestamp	id	result	retests
13:32:10	#AA3	pass	1
13:32:20	#AA4	failmode1	1
13:32:30	#AA5	pass	1
13:32:50	#AA4	failmode1	2
13:33:20	#AA4	pass	3

From this tabular data, we may look at how many units passed on the first try, how many units failed with mode 2 on the first, second, … try, and so on. When a unit fails $K$ times, it is scrapped. In our demo, we take $K$ to be 4. Different failure modes depend on a number of factors. We may divide these into two main “categories”: the testing procedure and intrinsic problems of the unit.

The dataset may be modeled in various, sometimes extremely complicated ways that capture both the temporal and spatial dependences; we will keep those for later discussions. For now, we focus on simple summary statistics and fit different types of models, since sometimes we all need simple things! Another reason for using a summary table is the significant cost of retrieving and working with the full data.

The goal is to describe the summary statistics with some models and to assess the severity of individual failure modes. Namely, if a failure mode is intrinsic to the batch or units because of a manufacturing issue, many modules will keep failing with this mode without “recovering”. We therefore want to quantify this effect.

The summary table will thus look like this:

failure_mode	rank	count
SUCCESS	2	29769
FMODE_C	2	351
FMODE_C	3	201
FMODE_C	1	1038
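As a minimal sketch of how such a summary could be built from the raw log, the snippet below counts rows per (result, attempt) pair using the example rows from the first table. It assumes that `rank` in the summary corresponds to the `retests` column of the raw data and that `failure_mode` corresponds to `result`; the function name `summarize` is hypothetical.

```python
from collections import Counter

# Raw test log rows: (timestamp, unit id, result, attempt number).
# These are the example rows from the raw table at the top of the section.
raw_rows = [
    ("13:32:10", "#AA3", "pass", 1),
    ("13:32:20", "#AA4", "failmode1", 1),
    ("13:32:30", "#AA5", "pass", 1),
    ("13:32:50", "#AA4", "failmode1", 2),
    ("13:33:20", "#AA4", "pass", 3),
]


def summarize(rows):
    """Count rows per (result, attempt number) pair.

    Assumption: the summary 'rank' is the raw 'retests' value and the
    summary 'failure_mode' is the raw 'result' value.
    """
    counts = Counter((result, attempt) for _, _, result, attempt in rows)
    # Sort by result label, then by rank, for a stable and readable table.
    return [(mode, rank, n) for (mode, rank), n in sorted(counts.items())]


summary = summarize(raw_rows)
for mode, rank, n in summary:
    print(f"{mode}\t{rank}\t{n}")
```

Only the three summary columns are kept; timestamps and unit ids are dropped, which is exactly the loss of information the summary table implies.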

We can illustrate the difference between a “real” problem and a “random” one.

Evolution of the number of errors with the number of retests (retesting a unit means it did not pass the previous test). Mode B _decays_ quickly, but mode D does not, suggesting the latter is worth looking into.

First-order estimation

As we are working with a short summary table, it is clear we are losing a lot of information and relationships. Indeed, we dropped all of the spatio-temporal dependence. Our first central assumption is that each failure mode has a single probability that does not depend on the previous attempt. Namely, $$ \mathbb{P}(\text{fail} \in (M,r) \mid \text{fail} \in (M, r-1)) = \mathbb{P}(\text{fail} \in (M,r)) $$ which is almost never exactly true, because of the persistence of a failure mode, the temporal dependence and many other reasons. Yet this assumption is a good starting point, especially for discovering and learning about the processes.

This assumption implies a model in which the number of units reaching the $r$-th test decays geometrically. Let $C_{M,r}$ be the number of tests recorded with failure mode $M$ at rank $r$. Then our first-order decay model gives $$ C_{M,r} \sim C_{M, r-1}q_{M} $$ where $q_M$ is what we will call here the persistence factor of the mode $M$. It characterizes how well the tested units “recover” after failures. For any testing attempt we may write $$ C_{M,r} \sim C_{M,1} q_{M}^{r-1} $$

A small persistence means that after the first retest few units are left to be retested again, so the decay is quick.

The first estimate is therefore obtained by simply computing the ratio $$ \hat{q}_M \coloneqq \left(\frac{C_{M,K}}{C_{M,1}}\right)^{1/(K-1)} $$ which is a good first estimate for what we need: quantifying the persistence effect of a given mode.
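The following sketch evaluates this estimate on the `FMODE_C` rows of the summary table. The table excerpt only lists ranks 1 to 3, so the last available rank is used in place of $K$; that substitution and the function name `persistence_estimate` are assumptions made for illustration.

```python
def persistence_estimate(count_first, count_last, rank_last):
    """First-order estimate q_hat = (C_last / C_1) ** (1 / (rank_last - 1))."""
    if rank_last < 2:
        raise ValueError("need at least two ranks to estimate a decay")
    if count_first <= 0 or count_last < 0:
        raise ValueError("counts must be positive at rank 1 and non-negative after")
    return (count_last / count_first) ** (1.0 / (rank_last - 1))


# FMODE_C counts by rank, taken from the summary table excerpt.
fmode_c = {1: 1038, 2: 351, 3: 201}

# Assumption: use the last rank present in the excerpt instead of K.
last_rank = max(fmode_c)
q_hat = persistence_estimate(fmode_c[1], fmode_c[last_rank], last_rank)

# Rank-to-rank ratios show how well a single q describes the decay.
step_ratios = [fmode_c[r] / fmode_c[r - 1] for r in range(2, last_rank + 1)]

print(f"q_hat = {q_hat:.3f}")
print("rank-to-rank ratios:", [round(x, 3) for x in step_ratios])
```

Here the estimate comes out near 0.44, while the individual rank-to-rank ratios are roughly 0.34 and 0.57. The estimate is the geometric mean of those ratios, so it hides the fact that the decay slows down at higher ranks, which is one symptom of the constant-probability assumption being only approximate.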

Poisson model

The Poisson model is a well-known model for events or count data, which is why we are interested in using it here. In our setup, there are important notions of success/failure, number of retests, and so on. Intuitively, this points to Bernoulli, geometric and related distributions. To first order, however, we will treat the retest ranks simply as binned/histogram events.

Recall our discrete decay model. In this model, we expect the mean number of tests $\mu_{M,r}$ recorded for mode $M$ at rank $r$ to behave as $$\mu_{M,r} \sim A_{M} q_{M}^{r-1}$$ where $A_M$ plays the role of the rank-1 count $C_{M,1}$.

It is highly tempting to use a $\log$ transformation here. Taking the logarithm of both sides and expanding gives $$ \begin{aligned} \log[ \mu_{M,r}] &= \log[ A_{M} q_{M}^{r-1} ] \\ \log[ \mu_{M,r}] &= \log[A_M] + (r-1)\log[ q_M ] \end{aligned} $$

where we rename the quantities $\log A_M \mapsto \alpha_M$ and $\log q_M \mapsto \beta_M$, and obtain the expression $$ \log(\mu_{M,r}) = \alpha_M + \beta_M (r-1) $$

which looks exactly the same as the Poisson regression written in the GLM formalism!

Frequentist fit

Before proceeding with an (arguably) more complete Bayesian model, we will fit the Poisson GLM. The result for one of the decay modes is illustrated below.
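A minimal sketch of such a fit is given here, using only the standard library rather than a GLM package. It maximizes the Poisson log-likelihood of the log-linear model with intercept $\log A_M$ and slope $\log q_M$ by Newton-Raphson, on the `FMODE_C` counts from the summary table. The function name `fit_poisson_decay`, the starting point and the stopping rule are illustrative choices, not the author's implementation.

```python
import math


def fit_poisson_decay(ranks, counts, max_iter=100, tol=1e-10):
    """Fit log(mu_r) = intercept + slope * (r - 1) by Newton-Raphson.

    Returns (intercept, slope), i.e. estimates of (log A, log q).
    """
    x = [r - 1 for r in ranks]
    if len(set(x)) < 2:
        raise ValueError("need at least two distinct ranks")

    # Start from the flat model: constant mean, zero slope.
    intercept = math.log(sum(counts) / len(counts))
    slope = 0.0

    for _ in range(max_iter):
        mu = [math.exp(intercept + slope * xi) for xi in x]

        # Score (gradient of the Poisson log-likelihood).
        g_i = sum(y - m for y, m in zip(counts, mu))
        g_s = sum(xi * (y - m) for xi, y, m in zip(x, counts, mu))

        # Fisher information (negative Hessian), a 2x2 symmetric matrix.
        h_ii = sum(mu)
        h_is = sum(xi * m for xi, m in zip(x, mu))
        h_ss = sum(xi * xi * m for xi, m in zip(x, mu))
        det = h_ii * h_ss - h_is * h_is

        # Newton step: solve the 2x2 system in closed form.
        d_i = (h_ss * g_i - h_is * g_s) / det
        d_s = (h_ii * g_s - h_is * g_i) / det
        intercept += d_i
        slope += d_s

        if abs(d_i) + abs(d_s) < tol:
            break

    return intercept, slope


# FMODE_C counts by rank, taken from the summary table excerpt.
ranks = [1, 2, 3]
counts = [1038, 351, 201]

log_a, log_q = fit_poisson_decay(ranks, counts)
fitted = [math.exp(log_a + log_q * (r - 1)) for r in ranks]

print(f"A_hat = {math.exp(log_a):.1f}, q_hat = {math.exp(log_q):.3f}")
print("fitted means:", [round(m, 1) for m in fitted])
```

The fitted persistence is `exp(log_q)`. In practice a GLM routine from a statistics package would normally be used instead, since it also reports standard errors and deviance.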

After fitting a Poisson model, there are multiple ways of assessing the quality of the fit. One of the most common is a post-hoc check of the Poisson equidispersion assumption, the known property that the variance equals the mean. There are multiple ways of checking it.
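One common check, not necessarily the one used in the rest of this study, is the Pearson dispersion statistic: the Pearson chi-square divided by the residual degrees of freedom, which should be close to 1 under equidispersion. The sketch below assumes fitted means are already available. For illustration it uses a hypothetical geometric curve with $A = 1038$ and $q = 0.44$ as stand-in fitted values rather than the output of an actual GLM fit.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values near 1 are consistent with equidispersion; values well
    above 1 suggest overdispersion relative to the Poisson model.
    """
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("not enough observations for the number of parameters")
    chi2 = sum((y - m) ** 2 / m for y, m in zip(observed, fitted))
    return chi2 / dof


# FMODE_C counts by rank, taken from the summary table excerpt.
observed = [1038, 351, 201]

# Hypothetical stand-in for fitted means: A * q ** (r - 1) with
# A = 1038 and q = 0.44 (illustrative values, not a fitted model).
fitted = [1038 * 0.44 ** (r - 1) for r in (1, 2, 3)]

dispersion = pearson_dispersion(observed, fitted, n_params=2)
print(f"Pearson dispersion = {dispersion:.2f}")
```

With only three ranks and two parameters there is a single residual degree of freedom, so the statistic is a rough indicator at best. On these values it is far above 1, which points to the simple geometric decay being too rigid for this mode.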
