In this article the problem of estimating project completion time is addressed by breaking the project down into requirements and treating the completion time of each individual requirement as a random variable.

The software development process is essentially a transformation of requirements into code. Requirements are ideas expressed in human-readable form. They are meant to be read by developers, testers, and any other person participating in the project. Several problems are commonly associated with them: lack of structure, overlap, contradictions, etc. The requirements management process addresses these problems and is a critical part of a project's success.

There are methodologies for refining and managing requirements. They improve the project requirements by structuring them, improving their clarity, removing duplicates, and simplifying time estimation. These methodologies are not the subject of this work, so it is enough to assume that one of them has been applied and has produced requirements with an implementation time of about one day each.

Assuming that the requirements are implemented one by one, it is trivial to calculate how much time it takes to implement all of them: it is just the time to implement one requirement multiplied by the number of requirements. However, any software developer would agree that it is very hard, if possible at all, to elaborate the requirements to that point. There is always some uncertainty about how much time it will take to implement a specific requirement and whether it is detailed enough to start implementing it.

One way to make the model more realistic is to add randomness to it. The completion of a requirement is treated as an event, and all the requirements are assumed to be independent of each other. Such sequences of events are common in queueing theory and reliability theory, where the time to each event is modelled by the exponential distribution. In practical terms, some requirements are implemented earlier than expected and some take much longer. The probability density function for a single requirement with expected implementation time q is

$$f(t) = \frac{e^{-\frac{t}{q}}}{q}$$

Probability density of implementation time when expected implementation time is 1 day.
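The density above can be evaluated directly. The following is a minimal sketch in Python using only the standard library; the function names `exponential_pdf` and `exponential_cdf` are illustrative and do not come from the article. The cumulative distribution function \(1 - e^{-\frac{t}{q}}\) is added as a convenience for computing probabilities.

```python
import math


def exponential_pdf(t: float, q: float) -> float:
    """Density of the implementation time t for expected time q."""
    if t < 0:
        return 0.0
    return math.exp(-t / q) / q


def exponential_cdf(t: float, q: float) -> float:
    """Probability that a requirement is finished within time t."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-t / q)


q = 1.0  # expected implementation time, in days
print("P(done within the estimate)    =", exponential_cdf(q, q))
print("P(takes more than 2x estimate) =", 1.0 - exponential_cdf(2 * q, q))
```

Under this model a requirement is finished within its estimate with probability \(1 - e^{-1} \approx 0.63\), and it takes more than twice the estimate with probability \(e^{-2} \approx 0.14\).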

In the simplest case, when all n requirements are implemented sequentially and have the same estimated time q, the implementation time for the entire project is just the sum of n independent random variables. The result is an Erlang distribution. When more than one developer works in parallel, the times can simply be divided by the number of developers k, which gives an Erlang distribution with shape n and rate k/q:

$$f(t) = \frac{t^{n-1} k^n e^{-\frac{t k}{q}}}{q^n (n-1)!}$$

Probability density of project completion time when expected implementation time for one requirement is 1 day, 6 requirements, 2 developers.
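As a sanity check, the Erlang density can be evaluated numerically. The sketch below is an illustration rather than the article's own code: the names `project_pdf` and `integrate` are hypothetical, and the integral is approximated with a simple midpoint rule truncated at 40 days, where the tail is negligible for these parameters.

```python
import math


def project_pdf(t: float, n: int, q: float, k: int) -> float:
    """Erlang density of the project completion time (shape n, rate k / q)."""
    if t < 0:
        return 0.0
    rate = k / q
    return rate**n * t ** (n - 1) * math.exp(-rate * t) / math.factorial(n - 1)


def integrate(f, a: float, b: float, steps: int = 40_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


n, q, k = 6, 1.0, 2  # 6 requirements, 1 day each, 2 developers

total = integrate(lambda t: project_pdf(t, n, q, k), 0.0, 40.0)
mean = integrate(lambda t: t * project_pdf(t, n, q, k), 0.0, 40.0)
print("area under the density:", total)
print("expected completion time:", mean)
```

The area should be close to 1 and the expected completion time close to \(\frac{n q}{k} = 3\) days, which matches the naive estimate; the distribution adds information about how far the actual time may deviate from it.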

The probability density estimated from repeated sampling of the n random variables, where n is the number of requirements, is close to the density calculated from the analytical solution. To obtain a single sample, a completion time is drawn for every requirement and all the drawn times are summed. Repeating this calculation many times gives a set of samples distributed over the range (0, ∞). The probability density is then estimated with a histogram density estimator.

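Exponentially distributed random samples can be constructed from uniformly distributed random samples \(\{ x_1, x_2, \ldots \}\) on the interval [0, 1) by applying the following transformation, where \(\lambda\) is the rate parameter of the distribution (the reciprocal of the expected time):

$$ t(x)=-\frac{\ln(1-x)}{\lambda}, \quad x \in \{ x_1, x_2, \ldots \} $$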
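This transformation is known as inverse transform sampling. A minimal Python sketch follows; the function name `exponential_sample`, the seed, and the sample count are arbitrary choices made for illustration. `random.random()` returns values in [0, 1), so the argument of the logarithm is never zero.

```python
import math
import random


def exponential_sample(x: float, rate: float) -> float:
    """Map a uniform sample x in [0, 1) to an exponentially distributed time."""
    return -math.log(1.0 - x) / rate


rng = random.Random(42)  # fixed seed so the run is reproducible
rate = 1.0               # one requirement per day on average

samples = [exponential_sample(rng.random(), rate) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
print("sample mean:", sample_mean, "expected:", 1.0 / rate)
```

The sample mean should approach \(\frac{1}{\lambda}\) as the number of samples grows.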

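A single sample of the project completion time is then the sum of n such times, each drawn with rate \(\lambda = \frac{k}{q}\):

$$ s_j=\sum_{i=1}^{n} t(x_{i+j n}) = -\frac{q}{k}\sum_{i=1}^{n} \ln(1-x_{i+j n}) $$

where \(x_i\) is the i-th sample from the uniformly distributed random number generator, \(j = 0, 1, \ldots\) is the index of the project sample, q is the estimated implementation time for a single requirement, and k is the number of developers.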

PDFs calculated analytically (blue) and by simulation (red) for 6 requirements, 1 day per requirement, 2 developers.
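The simulation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to produce the figure: the function names, the seed, the number of samples (100,000), the bin width (0.25 days), and the cut-off at 12 days are all arbitrary choices.

```python
import math
import random


def erlang_pdf(t: float, n: int, q: float, k: int) -> float:
    """Analytical density of the project completion time."""
    rate = k / q
    return rate**n * t ** (n - 1) * math.exp(-rate * t) / math.factorial(n - 1)


def sample_completion_time(rng: random.Random, n: int, q: float, k: int) -> float:
    """One simulated project duration: sum of n exponential times with mean q / k."""
    return sum(-math.log(1.0 - rng.random()) * q / k for _ in range(n))


def histogram_density(samples, bin_width: float, t_max: float):
    """Histogram density estimate: bin count / (number of samples * bin width)."""
    n_bins = int(round(t_max / bin_width))
    counts = [0] * n_bins
    for s in samples:
        idx = int(s / bin_width)
        if idx < n_bins:  # samples beyond t_max are ignored
            counts[idx] += 1
    return [c / (len(samples) * bin_width) for c in counts]


rng = random.Random(0)
n, q, k = 6, 1.0, 2
bin_width = 0.25

samples = [sample_completion_time(rng, n, q, k) for _ in range(100_000)]
density = histogram_density(samples, bin_width, t_max=12.0)

# Compare the estimate with the analytical density at every fourth bin centre.
for i in range(0, len(density), 4):
    centre = (i + 0.5) * bin_width
    print(f"t={centre:5.2f}  simulated={density[i]:.4f}  "
          f"analytical={erlang_pdf(centre, n, q, k):.4f}")
```

With enough samples the two columns should agree to within a few thousandths; the remaining difference comes from sampling noise and from the finite bin width.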