Binomial Distribution Sample Confidence Intervals Estimation

1. Sampling and Medical Key Parameters Calculation


Tudor DRUGANa, Sorana BOLBOACĂa*, Lorentz JÄNTSCHIb,



a “Iuliu Haţieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania

b Technical University of Cluj-Napoca, Romania

* corresponding author,




The aim of the paper was to present the usefulness of the binomial distribution in studying of the contingency tables and the problems of approximation to normality of binomial distribution (the limits, advantages, and disadvantages). The classification of the medical keys parameters reported in medical literature and expressing them using the contingency table units based on their mathematical expressions restrict the discussion of the confidence intervals from 34 parameters to 9 mathematical expressions. The problem of obtaining different information starting with the computed confidence interval for a specified method, information like confidence intervals boundaries, percentages of the experimental errors, the standard deviation of the experimental errors and the deviation relative to significance level was solves through implementation in PHP programming language of original algorithms. The cases of expression, which contain two binomial variables, were separately treated. An original method of computing the confidence interval for the case of two-variable expression was proposed and implemented. The graphical representation of the expression of two binomial variables for which the variation domain of one of the variable depend on the other variable was a real problem because the most of the software used interpolation in graphical representation and the surface maps were quadratic instead of triangular. Based on an original algorithm, a module was implements in PHP in order to represent graphically the triangular surface plots. All the implementation described above was uses in computing the confidence intervals and estimating their performance for binomial distributions sample sizes and variable.



Confidence Interval; Binomial Distribution; Contingency Table; Medical Key Parameters; PHP Applications





The main aim of a medical research is to generate new knowledge which to be used in healthcare. Most of the medical studies generate some key parameters of interest in order to measure the effect size of a healthcare intervention. The effect size can be measure in a variety of way such as sensibility, specificity, overall accuracy and so on. Whatever the measure used are, some assessment must be made of the trustworthiness or robustness of the finding [1]. The finding of a medical study can provide a point estimation of effect and nowadays this point estimation had a confidence interval that allows us interpreting correctly the point estimation of an interest parameter.

The formal definition of a confidence interval is a confidence interval gives an estimated range of values that is likely to include an unknown population parameter, the estimated range being calculates from a given set of sample data. If independent sample are take repeatedly from same population, and a confidence interval calculated for each sample, then a certain percentage (confidence level) of the intervals will include the unknown population parameter. Confidence intervals are usually computes so that this percentage is 95%. However, we can produce 90%, 99%, 99.9% confidence intervals for an unknown parameter [2].

In medicine, we work with two types of variables: qualitative and quantitative, variables that can be classifies into theoretical distributions. Continuous variables follow a normal probability distribution, Laplace-Gauss distribution and discrete random variables follow a Binomial distribution [3].

            The aim of this series of article was to review the confidence intervals used for different medical key parameters and introduce some new methods in confidence intervals estimation. The aim of the first article was to present the basic concept of confidence intervals and the problem, which can occur in confidence intervals estimation.



Binomial Distribution and its Approximation to Normality


The normal distribution was first introduced by De Moivre in an unpublished memorandum later published as part of [4] in the context of approximating certain binomial distribution for large n. His results were extends by Laplace and are now called the Theorem of De Moivre-Laplace.

Confidence interval estimations for proportions using normal approximation have been commonly uses for analysis of simulation for a simple fact: the normal approximation was easier to use in practice comparing with other approximate estimators [5].

We decided to work in our study with binomial distribution and its approximation to normality. Let’s saw how binomial distribution and its approximation to normality works! If we had from a sample size (n) a variable (X, 0 ≤ X n) which follows a binomial distribution the probability of obtaining the Y value (0 ≤ Y n) from same sample is:


The mean and the variance for the binomial variable are:

                        ,                                                                (2)

The apparition probability of a normal variable Y (0 ≤ Yn) which has a mean M(n,X) and a standard deviation Var(n,X) is:


The approximation to normality for a binomial variable used the values of the mean and of the dispersion. Replacing the mean and the dispersion in binomial distribution formula, we obtain the next formula for 0 < X < n:


The approximation error of binomial distribution of a variable Y (0 ≤ Yn) with to normal distribution is:


Because the probability of Y decreasing with straggle of the expected value X, for mark out this digression we can adjust the expression above to zero for PB(n,X,Y) < 1/n (in this case the differences can not be remarked for a extraction from a binomial sample):


In figure 1 was represent the approximation error of a binomial distribution with a normal distribution (using the formula above) for X/n = 1/2 where n = 10, 30, 100 and 300 (Errc(n,n/2,Y)).



Figure 1. The approximation error of the binomial distribution with a normal distribution for X/n=1/2 where n=10, 30, 100 and 300.


If we chouse to represent the errors of the approximation of a binomial distribution with a normal distribution for X/n=1/10 where n=10, 30, 100 and 300 (Errc(n,n/10,Y)) we obtain the below graphical representation (figure 2).

Figure 2. The approximation error of the binomial distribution with a normal distribution for X/n=1/10 where n=10, 30, 100 and 300.


                The experimental errors for the 5/10 fraction (X=5, n=10) with normal approximation induce an underestimation for X = 3 and X = 7 with a sum of 7.8‰ and an overestimation for X = 4, X = 5 and X = 6 with a percent sum of 8.3‰ which compared with the choused significance level α = 5% represent less than 10%. However, the error variation shows us that the normal approximation induce a decreasing of the confidence interval for a proportion being considered  more frequently the values neighbor to the 5 than the extremities values.

            Graphical representation for the variation of the errors sum of positive estimation (for the values away to the measured values) and the variation of the error sum of negative estimation (for the values closed to the measured values) in logarithmical scale (n, and the errors sum) was presented in figure 3.


Figure 3. Logarithmical variation of the sum of positive (log(sp)) and negative(log(sm))   estimation errors depending on log(n) function for X/n = 1/2 fraction

                Looking at the graphical representation from the figure 3 it can be observe that the variation of the errors sum was almost linear and inverse proportional with n. For the proportion X/n = 1/2 and generally for medium proportion the normal approximation is good and its quality increase with increasing of sample size (n).  Moreover, to the surrounding of 1/2 the symmetry hypothesis of the confidence estimation induce by the normality approximation work perfectly. 

            Looking at the variation of the positive errors induced by the normal approximation for the proportion in form of X/n = 1/10 depending on n (see figure 4), it can be remarked that the abscissa of maximum point of deviation relative to binomial distribution are found at lower n in the vicinity of the extreme values. Thus, for n = 10, and X = 1 the error reach the maximum value of 0.108 at 0. The abscissa is moving towards X value with increasing of n (for n = 300, and X = 30 the error reach the maximum value of 0.0283 at 25).

For the variation of the negative errors induces by the normal approximation of the proportion of type X/n=1/10 depending on n, we can remark that the abscissa of maximum point of deviation depending on binomial distribution are found at lower n in the vicinity of the extreme values. Thus, for n = 10, and X = 3 the error reach the maximum value of 0.05 at 0. The abscissa is moving towards X value with increasing of n (for n = 300, and X = 30 the error reach the maximum value of 0.0260 at 34). More over, the approximation errors depending on X value is asymmetrical, and there was not any tendency for symmetry with increasing of n.


Figure 4. Logarithmic variation of the sum of values less than X (log(si)) and greater

 than X (log(ss)) depending on log(n) for X/n = 1/10 fraction


            Because the sum of errors for values smaller than X were always greater than the sum of errors for the values greater than X, the confidence interval obtained base don normality approximation it will be displaced to the right (there were generate more approximation errors on the left side).


Figure 5. The error of binomial distribution approximate with a normal distribution with

 X/n = 1/n fraction express through Errc(n,1,Y) function

 at n = 10, n = 30, n = 100 and n = 300


The error had a maximum value of 10.8% for n = 10 and X = 1 into point 0, value which was greater twice than the accepted significance level α = 5% (!). A maximum value of 2.6% were obtained twice consecutively for n = 30, X = 3 into the 1 and 2 points and a maximum value of 2.5% into the point 4, that was half reported to the significance level  α = 5% (!); which represent a sever deviation. For large n the sum of the approximation error there was a tendency to linearity inverse proportional with n.


Figure 6. The variation of the error estimation sum for X/n = 1/n fraction

            The graphical representations from the figure 5 and 6 reveal that for small proportions (X/n = 1/n) the sum of approximation errors were increasing with n. The sum of the approximation errors always exceeded the significance level (α = 5% (!)) for n > 50, which means that the real error threshold using normality hypothesis will exceeded 10%, the error being obtained in the favor of the greater values for the X.



Implementation of the Mathematical Calculation in MathCad


            The experimental described in the previous section was possible after a MathCad implementation. The results obtained running the MathCad program were imports in Microsoft Excel were the graphical representations were creates.

            In MathCad were implements the next functions:

·      n combination taking to k: was necessary a logarithmical expression because we were working with large sample size and for n = 300 we could not used  the classical formula C(n,k) = n!/((n-k)!k!)




·        The calculation of Y/n apparition into a binomial distribution by n sample size and X/n proportion:


·        The computing of the apparition of Y/n into a normal distribution sample by volume n, mean X/n and variance X(n-X)/n:


·        The differences between the binomial and normal distribution:


·        Initializations (needed for defining the series of values):

 (values which were consecutively changed with 10,1; 10,5; 30,1; 30,3; 30,15; 100,1; 100,10; 100,50; 300,1; 300,30; 300,150)

·        The computing of the series of corrected differences (graphically represented in figure 1, 2 and 5):


·        The construction of the series of the positive sum of probabilities differences until to X (graphically represented in figure 3)


·        The construction of the series of the positive sum of probabilities differences for the values less than X/n (graphically represented in figure 4)


 - display the values of the sum for each pair of data

·        The calculation of the differences probabilities for every parameters:

 display the values of the sum for each pair of data             (16)

Was also used a File Read/Write component (linked with dBNCY) in order to export the results into *.txt file from where the data were imported in Microsoft Excel where the graphical representation were drawing.



                        Generating a Binomial Sample using a PHP Program


In order to evaluate the confidence intervals for medical key parameters we decide to work with binomial samples and variables. In PHP was creates a module with functions that generates binomial distribution probabilities. Each by another, was implements functions for computing the binomial coefficient of n vs. k (bino0 function), the binomial probability for n, X, Y (bino1, bino2, and bino3 functions), drawing out a binomial sample of size n_es (bino4 function). Finally, array rearrangement of binomial sample values staring with position 0 and saving in the last element the value of rearrangement postponement are makes by bino5 function.

The used functions are:

function bino0($n,$k){ $rz = 1;

 if($k<$n-$k) $k=$n-$k;

 for ($j=1;$j<=$n-$k;$j++)

  $rz *= ($k+$j)/$j;

 return $rz; }

function bino1($n,$y,$x){ return pow($x/$n,$y)*pow(1-$x/$n,$n-$y)*bino0($n,$y); }

function bino2($n,$y,$x){

 if(!$x) if($y) return 0; else return 1;

 $nx=$n-$x; $ny=$n-$y;

 if(!$nx) if($ny) return 0; else return 1;



  $ret += log($x)*$y; 


   $ret += log($nx)*$ny;

   if($ny>$y) $y=$ny;


   for($k=2;$k<=$n-$y;$ret-=log($k++));  }

 } else $ret += log($nx)*$ny;

 return exp($ret-log($n)*$n); }

function bino3($n,$y,$x){

  if(!$x) if($y) return 0; else return 1;

 $x/=$n; $nx=1-$x; $ny=$n-$y;

 if(!$nx) if($ny) return 0; else return 1;



  $ret += log($x)*$y; 


   $ret += log($nx)*$ny;

   if($ny>$y) $y=$ny;


   for($k=2;$k<=$n-$y;$ret-=log($k++));  }

 } else $ret += log($nx)*$ny;

 return exp($ret); }

function bino4($n,$k,$n_es){


  if($j>15) $c[$j]=bino1($n,$j,$k); else $c[$j]=bino3($n,$j,$k);


  if(!$c[$j])unset($c[$j]); }

 return $c; }

function bino5($N,$k,$f_N){



 foreach($d as $v) $r[$j++]=$v;

 foreach($d as $i => $v) break;


 return $r; }


                Computing the confidence intervals for a binomial variable is time consuming. In this perspective, an essential problem was optimizing the computing time. We studied comparative the time needed for generating the binomial probabilities using the bino1 function, the bino2 function, and the bino3 function. We developed a module which to collect and display the execution time. In the appreciation of the execution time, we used the internal clock of the computer.

                The module that collects and displays the execution time is:




 for($x=0;$x<=$n;$x++)  for($y=0;$y<=$n;$y++)



 for($x=0;$x<=$n;$x++)  for($y=0;$y<=$n;$y++)



 for($x=0;$x<=$n;$x++)  for($y=0;$y<=$n;$y++)








                Table 1 contains the logarithmical values of the executions time for n: 3, 10, 30, 50, 100, 150, and 300 obtained by the bino1, bino2, and bino3 functions.


































Table 1. The execution time (seconds) express in logarithmical scale needed for generating binomial probabilities with implemented function


                The differences between executions time express in logarithmical scale for the functions bino1and bino3 and respectively for the functions bino2 and bino3 were uses in order to obtain the graphical representations from the figure 3. Looking at the execution time ratio for the bino1 and bino3 function (the Bino2-Bino3 curve, figure 3) we can remark that optimizing parameter with X/n instead of X and (n-X)/n instead of (n-X) bring an increasing performance; this increased performance was cancel with increasing of n (the logarithmical difference was almost 0 for n >50).

Looking at the execution time ratio for the bino2 and bino3 functions (the Bino2-Bino3 curve, figure 3) we can remark that with increasing of n the bino3 function obtained the best performance comparing with bino1 and bino2 (the differences between execution time express in logarithmical scale became less than 0 beginning with n ≈ 15). Based on above observations, was implements for the binomial probabilities estimation the bino4 function.


Figure 3. Binomial probabilities execution time for bino1, bino2, and bino3 functions



Medical Key Parameters on 2´2 Contingency Table


            For the 2´2 contingency table data and based on the type of the medical studies we can compute some key parameters used for interpreting the studies results. Let us suppose that we have a contingency table as the one in the table 2.



Result/Disease (Characteristic B)






Treatment / Risk factor / Diagnostic Test (Characteristic A)













Table 2. Hypothetical 2´2 contingency table

            In table 2, a denotes the real positive cases (those who have both characteristics), b the false positive cases (those who have just the characteristic A), c the false negative cases (those who have the characteristic B) and d the real negative cases (those who have neither characteristic).

            We classified the medical key parameters based on the mathematical expressions (see table 3). This classification was very useful in confidence intervals assessment for each function, and for implementing the function in PHP.

The PHP functions for the mathematical expression described in the table 3 had the next expressions:

function ci1($X,$n){

 return $X/$n; }

function ci2($X,$n){

 if($X==$n) return (float)"INF";

 return $X/($n-$X); }

function ci3($X,$n){

 if(2*$X==$n) return (float)"INF";

 return $X/(2*$X-$n); }

function ci4($X,$m,$Y,$n){

 if(($X==$m)||($Y==0)) return (float)"INF";

 return $X*($n-$Y)/$Y/($m-$X); }

function ci5($X,$m,$Y,$n){

 return $Y/$n-$X/$m; }

function ci6($X,$m,$Y,$n){

 return abs(ci5($X,$m,$Y,$n)); }

function ci7($X,$m,$Y,$n){


 if(!$val) return (float)"INF";

 return 1/$val; }

function ci8($X,$m,$Y,$n){

 if($Y==0) return (float)"INF";

 return ci1($X,$m)/ci1($Y,$n); }

function ci9($X,$m,$Y,$n){

 if($Y==0) return (float)"INF";

 return abs(1-ci1($X,$m)/ci1($Y,$n)); }



Mathematical expression and

PHP function

Name of the key parameter

Type of study


, ci1

Individual Risk on Exposure Group and

NonExposure Group

Risk factor or prognostic factor studies [6], [7]

Experimental and Control Event Rates

Absolute Risk in Treatment and Control Group

Therapy studies [8], [9]

Sensitivity; Specificity; Positive and Negative Predictive Value; False Positive and Negative Rate; Probability of Disease, Positive and Negative Test Wrong; Overall Accuracy; Probability of a Negative and Positive Test

Diagnostic studies [10], [11], [12]


, ci2

Post and Pre Test Odds

Diagnostic studies 7, 9, 10]


, ci3

Post Test Probability

Diagnostic studies [9, 10]


, ci4

Odds Ratio

Risk Factor studies [6, 7]


, ci5

Excess risk

Risk Factor studies [5]


, ci6

Absolute Risk Reduction and Increase,

Benefit Increase

Therapy studies 7, 9]


, ci7

Number Needed to Treat and Harm

Therapy studies [7, 9]


, ci8

Relative Risk

Studies of Risk Factor [5, 6]

Positive and Negative Likelihood Ratio

Diagnostic studies [7, 9, 10]


, ci9

Relative Risk Reduction, Increase and

Benefit Increase

Therapy studies [7, 9]

Table 3. Mathematical expressions and PHP functions for medical key parameters



Estimating Confidence Intervals using a PHP program


            In order to be able to check out if that the experimental numbers belong to confidence interval were implements in PHP two functions, one named out_ci_X for one-variable functions (depending on X) and the other, named out_ci_XY for two-variables functions (depending on X, Y). The functions had the expressions:

function out_ci_X(&$cif,$XX,$X,$n,$z,$a,$f,$g){


 if(!is_finite($ci_v[1])) return 0;

 if(!is_finite($ci_v[0])) return 0;


 if(!is_finite($ci_v[1])) return 0;

 if(!is_finite($ci_v[0])) return 0;


 if(!is_finite($fx)) return 0;

 return (($fx<$ci_v[0])||($fx>$ci_v[1]));}

function out_ci_XY(&$cif,$XX,$X,$YY,$Y,$m,$n,$z,$a,$f){


 if(!is_finite($ci_v[0])) return 0;

 if(!is_finite($ci_v[1])) return 0;


 if(!is_finite($fxy)) return 0;

 return (($fxy<$ci_v[0])||($fxy>$ci_v[1]));}

            The function est_ci_er compute the confidence interval boundaries, the experimental errors, the mean, and the standard deviation for the expression with one variable (X); the est_ci2_er was the function for the two-variable expressions (X, Y):

function est_ci_er($z,$a,$c_i,$g,$f,$ce){

 if($ce=="ci") est_X(z,a,$c_i,$f,$g);

 if($ce=="er") est_N(z,a,$c_i,$f,$g,1);

 if($ce=="me") est_N(z,a,$c_i,$f,$g,2);

 if($ce=="va") est_N(z,a,$c_i,$f,$g,3);

 if($ce=="mv") est_N(z,a,$c_i,$f,$g,6);}

function est_ci2_er($z,$a,$c_i,$f,$ce){

 if($ce=="ci") est_X2(z,a,$c_i,$f);

 if($ce=="er") est_N2(z,a,$c_i,$f);

 if($ce=="me") est_MN(z,a,$c_i,$f,2);

 if($ce=="va") est_MN(z,a,$c_i,$f,3);

 if($ce=="mv") est_MN(z,a,$c_i,$f,6);}

            The est_X compute all the confidence intervals for the one-variable functions (depending on X) while est_X2 compute the confidence intervals for two-variable functions (depending on X, Y):

function est_X($z,$a,$c_i,$f,$g){




  echo("\t".$c_i[$i]."U"); }




    $ci[$i] = ci($c_i[$i],$g,$X,$n,$z,$a);

   $fX = $f($X,$n);




    echo("\t".trim(sprintf("%6.3f",$ci[$k][1]))); }  }}}

function est_X2($z,$a,$c_i,$f){




  echo("\t".$c_i[$i]."U"); }


  $m = $n;




             $g = $c_i[$i];

     $ci[$i] = $g($X,$m,$Y,$n,$z,$a);    }

   $fXY = $f($X,$m,$Y,$n);




    echo("\t".trim(sprintf("%6.3f",$ci[$k][1])));   }  }  } }}

            The est_N function computed the estimation errors or means or/and standard deviations of the mean for the one-variable functions (depending on X); the est_N2 function computed the estimation errors for two-variable functions (depending on X, Y):

function est_N($z,$a,&$c_i,$cif,$g,$ce){

 if($ce==1)echo("N\tX\tf(x)");else echo("N");






   if($ce==1) echo("\r\n".$n."\t".$X."\t".trim(sprintf("%6.3f",$cif($X,$n))));

   for($i=0;$i<count($c_i);$i++){//pt fiecare ci



    foreach ($d as $XX => $nXX){//pt fiecare valoare posibila de esantion

     if(out_ci_X($c_i[$i],$XX,$X,$n,$z,$a,$cif,$g)) $ttt[$X][$i] += $nXX;

     $conv += $nXX;    }

    $ttt[$X][$i] *= 100/$conv;

    if($ce==1)echo("\t".trim(sprintf("%3.3f",$ttt[$X][$i]))); }  }

  if ($ce > 1) est_MV($ttt,count($c_i),$n,p,0,$ce,$a); }}

function est_N2($z,$a,&$c_i,$f){





  $m = $n;









     foreach ($dX as $XX => $nXX){

      foreach ($dY as $YY => $nYY){


        $ttt[$X][$Y][$i] += $nXX*$nYY;

       $conv += $nXX*$nYY;     }     }

     echo("\t".trim(sprintf("%3.3f",100*$ttt[$X][$Y][$i]/$conv)));     }   }  } }}

            The est_MN function computed the means and/or the standard deviations of the mean for the two-variable functions (depending on X, Y):

function est_MN($z,$a,&$c_i,$f,$ce){







   if($ce % 2 == 0) $md[$i] = 0;

   if($ce % 3 == 0) $va[$i] = 0;  }








     foreach ($dX as $XX => $nXX){

      foreach ($dY as $YY => $nYY){


        $ttt[$X][$Y][$i] += $nXX*$nYY;

       $conv += $nXX*$nYY;      }     }

     if($ce % 2 == 0) $md[$i] += 100*$ttt[$X][$Y][$i]/$conv;

     if($ce % 3 == 0) $va[$i] += pow(100*$ttt[$X][$Y][$i]/$conv-5,2);    }   }  }



   if($ce % 2 == 0)


   if($ce % 3 == 0)   

    echo("\t".trim(sprintf("%3.3f",sqrt($va[$i]/(($m-1)*($n-1)-1)))));  } } }}

The est_MV function computed the estimation errors means and/or standard deviations:

function est_MV(&$ttt,$n_c_i,$n,$p,$v,$ce,$a){


  $md[$i]=$v;  if($v)continue;


   for($X=1;$X<$p*$n;$X++) $md[$i]+=2*$ttt[$X][$i];

   if(is_int($p*$n)) $md[$i]+=$ttt[$p*$n][$i];


  } else {

   for($X=1;$X<$p*$n;$X++) $md[$i]+=$ttt[$X][$i];

   $md[$i]/=($n-1);  } }

 if($ce % 3 == 0)





    $ds[$i]+=2*pow($ttt[$X][$i]-$md[$i],2);   }

   if(is_int($p*$n)) $ds[$i]+=pow($ttt[$X][$i]-100*$a,2);//$md[$i],2);



  } else {


    $ds[$i]+=pow($ttt[$X][$i]-100*$a,2);//$md[$i],2);   }


   $ds[$i]=pow($ds[$i],0.5);  } }

 if($ce % 2 == 0){



   echo("\t".trim(sprintf("%3.3f",$md[$i]))); }

 if($ce % 3 == 0){



   echo("\t".trim(sprintf("%3.3f",$ds[$i]))); }}

The est_R2 function compute and display, for a 100 random sample size and binomial variables values the experimental errors using the imposed (theoretical) value of significance level for all confidence interval calculation methods given also as parameter:

function est_R2($z,$a,&$c_i,$f){





  $n = rand(4,1000);

  $m = rand(4,1000);

  $X = rand(1,$m-1);

  $Y = rand(1,$n-1);




  for($i=0;$i<count($c_i);$i++){//pt fiecare ci



   foreach ($dX as $XX => $nXX){

    foreach ($dY as $YY => $nYY){


      $ttt[$X][$Y][$i] += $nXX*$nYY;

     $conv += $nXX*$nYY;







            The est_C2 function compute and display for a fixed value of proportion X/n (usually the central point X/n = 1/2) the errors for a series of sample size (from N_min to N_max):

function est_C2($z,$a,&$c_i,$f){





  $m = $n;

  //$X = (int)(3*$m/4); $Y = (int)($n/4);

  $X = (int)($m/2); $Y = (int)($n/2);








   foreach ($dX as $XX => $nXX){

    foreach ($dY as $YY => $nYY){


      $ttt[$X][$Y][$i] += $nXX*$nYY;

     $conv += $nXX*$nYY;







            The ci function used in est_N, est_N2, est_X, and est_X2 functions compute the confidence interval for specified function with specified parameter and it was uses for the nine functions described in table 3. Corresponding to the functions described in table 3, were defines:

·      the I function for the expression X/n;

·      the R function was used for the expression X/(n-X);

·      the RD function was used for the expression X/(2X-n);

·      the R2 function for the expression X(n-Y)/Y/(m-X);

·      the D function for the expression Y/n-X/n;

·      the AD function for the expression |Y/n-X/n|;

·      the IAD function for the expression 1/|Y/n-X/n|;

·      the RP function for the expression Xn/Y/m;

·      the ARP function for the expression |1- Xn/Y/m|.

For the one-variable expressions (f(X)) the confidence intervals was compute using the next expressions:

·      For ci1 function:

o       Mathematical expression:                                        (17)

where CIL(X) and CIU(X) were the lower limit and respectively the upper limit of the confidence interval;

o       PHP function:

function I($f,$X,$n,$z,$a){

 return $f($X,$n,$z,$a);}

·      For ci2 function:

o       Mathematical expression:                           (18)

o       PHP function:

function R($f,$X,$n,$z,$a){

 $L = $f($X,$n,$z,$a);

 if ($L[1])

  if ($L[1]-1) $Xs = 1/(1/$L[1]-1); else $Xs = (float) "INF";

 else $Xs = 0;

 if ($L[0]-1)

  if ($L[0]) $Xi = 1/(1/$L[0]-1); else $Xi = 0;

 else $Xi = 0;

 return array ( $Xi , $Xs );}

·      For ci3 function; the expression for confidence interval is based on confidence intervals expression used for ci1:

o       Mathematical expression:                                     (19)

o       PHP function:

function RD($f,$X,$n,$z,$a){

 if($X==$n-$X) return array( -(float)"INF" , (float)"INF" );


 if($X==$n) return array ( 1/$q[1] , 1/$q[0] );

 if($q[0]==0) $tXs = 0; else $tXs = 1/(2-1/$q[0]);



 else if(!$q[1])



  $tXi = 1/(2-1/$q[1]);


  $tX= 1/(2-$n/$X);

  if(($tXi>$tX)||($tXs<$tX)) {



   $tXi = $tX * exp(exp(abs($tX)));

   $tXs = -$tX * exp(exp(abs($tX)));  } }

 return array( $tXi , $tXs );}

            The formulas for computing the confidence intervals for two-variable functions (depending on X, Y) can be express base on the formulas for confidence intervals of one-variable functions (depending on X):

·      For ci4 function:

o       Mathematical expression:      (20)

o       PHP function:

function R2($f,$X,$m,$Y,$n,$z,$am,$an){

 $ciX = R($f,$X,$m,$z,$am);

 $cinY = R($f,$n-$Y,$n,$z,$an);

 return array ( $ciX[0]*$cinY[0] , $ciX[1]*$cinY[1] );}

·      For ci5 function:

o       Mathematical expression:             (21)

o       PHP function:

function D($f,$X,$m,$Y,$n,$z,$am,$an){

 $ciX = $f($X,$m,$z,$am);

 $ciY = $f($Y,$n,$z,$an);

 return array ( $ciY[0]-$ciX[1] , $ciY[1]-$ciX[0] );}

·      For ci6 function:

o       Mathematical expression:


o       PHP function:

function AD($ci){


  $max =abs($ci[0]);

  if($max<abs($ci[1])) $max = abs($ci[1]);

  return array ( 0 , $max );


  if($ci[0]<0) {

            $ci[0] = abs($ci[0]);

            $ci[1] = abs($ci[1]);  }

  if($ci[0]<$ci[1]) return array( $ci[0] , $ci[1] );

  else return array( $ci[1] , $ci[0] ); }}

·      For ci7 function:

o       Mathematical expression:                        (23)

o       PHP function:

function IAD($CI){

 if($CI[0]) $Xs = 1/$CI[0]; else $Xs = (float)"INF";

 if($CI[0]) $Xi = 1/$CI[1]; else $Xi = (float)"INF";

 return array( $Xi , $Xs );}

·      For ci8 function:

o       Mathematical expression:                                         (24)

o       PHP function:

function RP($f,$X,$m,$Y,$n,$z,$am,$an){

 $ciX = $f($X,$m,$z,$am);

 $ciY = $f($Y,$n,$z,$an);

 $XYi = $ciX[0]/$ciY[1];

 if($ciY[0]) $XYs = $ciX[1]/$ciY[0]; else $XYs = (float)"INF";

 return array ( abs($XYi), $XYs );}

·      For ci9 function:

o       Mathematical expression:

o       PHP function:

function ARP($f,$X,$m,$Y,$n,$z,$a){

 $ci = $f($X,$m,$Y,$n,$z,$a);

 if(!is_finite($ci[1])) return array ( abs(1-$ci[0]) , (float)"INF" );

 $t1 = 1-$ci[0];

 $t2 = 1-$ci[1];


  $Xi = 0;


   return array( 0 , abs($t2) );


   return array( 0 , abs($t1) ); 



   return array ( abs($t1) , abs($t2) );


   return array ( abs($t2) , abs($t1) ); }}

            Were defined the normal distribution percentile (z), the error of the first kind or the significance level (a), the value for the maximum proportion for which the confidence intervals was estimated (p), the sample sizes start value (N_min) and the sample sizes stop value (N_max):






            A specific PHP program calls all these functions for a given n and display the estimation errors or confidence intervals limits at a specified significance level (α =5%) using est_ci_er or est_ci2_er functions. For example, in order to compute the confidence interval for a proportion (ci1 function) was call the est_ci_er function with the next parameters:


where $c_i was a matrix that memorize the name of the methods used in confidence intervals estimation and I was the name of the function defined above for the expression X/n.

            In the next articles, we will debate the confidence intervals methods for each function described in table 1 and their variability with sample size.

The standard deviation of the experimental error was computes using the next formula:


where StdDev(X) is standard deviation, Xi is the experimental errors for a given i (0 i n), M(X) is the arithmetic mean of the experimental errors and n is the sample size.

If we have a sample of n-1 elements (withdrawing the limits 0 and n from the 0..n array) with a known (or expected) mean ExpM(X), the deviation around α = 5% (imposed significance level) is giving by:


The (27) formula was used as a quantitative measure of deviation of errors relative to imposed error value (equal with 100α) when ExpM(X) = 100α.



Two-dimensional Functions


            We used the uni-dimensional (X, m) confidence intervals functions to compute the confidence intervals for bi-dimensional functions (X, m, Y, n) but were necessary to perform a probability correction. We consider that the confidence intervals boundaries of bi-dimensional functions were gives by the product of the limits from the uni-dimensional cases with a correction of the significance level based on error convolution product.

            Experimentally was infers that the best expression for the significance level for bi-dimensional function was:

                         or, with correction by m,                            (26)

where a was the uni-dimensional significance level and aU şi aU(m) were imposed significance level for uni-dimensional variable (X) with a volume m. Analogue can be obtained the expression for the uni-dimensional variable (Y) with a volume n, aU(n).



Random Sampling


            We consider 100 random values for n and respectively for m in the 4..1000 domain and X, respectively Y in domain 1..m-1, respectively 1..n-1. The error estimation of the confidence interval was performs for these samples and the results were graphically represented using MathCad program. Thus, we implement a module in MathCad that allowed us to perform these representations using:

·        The frequencies of experimental errors;

·        Dispersion relative to the imposed error value (α = 5%);

·        The mean and standard deviation of the experimental errors;

·        The theoretical errors distribution graphic based on Gauss distribution of means equal with five (significance level α = 5%) and standard deviation equal with a standard deviation of a volume sample size equal width 10 (double of α).

The corresponding equations are:

·                                                               (27)

·                                                                    (28)

·                                                             (29)

·                            (30)

·                                                                      (31)

·                                                                 (32)

·                                                                       (33)

·                                                               (34)

·                                                                    (35)

·                                                             (36)

·                                                                                                   (37)

·                                                                                               (38)

·                                                                                                 (39)

·                                                                                                        (40)

·                                                                                               (41)

·                                                                                           (42)



Graphical Representations


We developed a graphical module in PHP for graphical error map representations for bi-dimensional functions (X, m). The module used the confidence intervals computed by the program described above and performs (m, X, Z) graphical representations, where Z can be the value of the confidence intervals or the estimation errors (percentages) and perform a triangular graphical representation for Z (X on the horizontal axis; m on the vertical axis, descending order) using a color gradient. The module lines were:










function i_p(&$file,&$v){


 $image = @imagecreate($in, $in);

 $alb = imagecolorallocate($image,255,255,255);

 $negru = imagecolorallocate($image,0,0,0);

 $t_l = $in-imagefontwidth(2)*strlen($file)-10;

 imagestring($image, 2, $t_l,10, $file, $negru);

 $cl = s_c_t($image,tip);




    imagesetpixel($image,pct*$v[$k][1]+$i,pct*$v[$k][0]+$j,$cl[c($v[$k][2])]);   }  }



 //imageinterlace($image,1);imagejpeg($image); }

function c($z){

 if($z<=fmin) return 1;

 if($z>=fmax) return ncol-1;

 $cu = ($z-fmin)*(ncol-3)/(fmax-fmin);

 $cu = (int)$cu;

 return $cu + 1;}

function g_d($file,&$v){



 for($i=0;$i<count($r);$i++) $v[$i]=explode("\t",$r[$i]);}

function s_c_t(&$image,$tip){


  for ($i=0;$i<ncol;$i++){

   $cb=round($i * 255/(ncol-1));

   $tc[$i]=array( $cb , $cb , $cb );  } }


  $tc[0] = array( 255 , 0 , 0 );

  $yl = (ncol-1)/2; $yl = (int) $yl;



    $cb = round($i * 255/$yl);

    $tc[$i] = array( 255 , $cb , 0 );   }

  $tc[$yl] = array( 255 , 255 , 0 );



    $cb = round($i * 255/$yl);

    $tc[$yl+$i] = array( 255-$cb , 255-$cb , $cb );   }

  $tc[ncol-1] = array( 0 , 0 , 255 ); }


  $tc[0] = array( 255 , 0 , 0 );

  $yl = (ncol-1)/2; $yl = (int) $yl;



    $cb = round($i * 255/$yl);

    $tc[$i] = array( 255 , $cb , 0 );   }

  $tc[$yl] = array( 255 , 255 , 0 );



    $cb = round($i * 255/$yl);

    $tc[$yl+$i] = array( 255-$cb , 255 , 0 );   }

  $tc[ncol-1] = array( 0 , 255 , 0 ); }

 for ($k=0;$k<ncol;$k++)


  return $cl;}


            The module describe above used the pct (= 3) constant in order to represent a 3×3 pixeli square for each m, X, color(Z). The constant fmin was used to represent the values between the lower limit color (red color in our representations) and the middle limit color (yellow in our representations); the fmax constant was used to represent the maximum values which color were between upper limit color (blue) and middle limit color (yellow). All the values which not belong to the fmin , fmax] domain were represented with the lower limit color (if Z < fmin) or upper limit color (if Z > fmax). The number of shade (including the based color represented by red, yellow and blue) were defined by n_col constant (was chosen 21). The tip constant (was choose 1) define the couplet of the limit color and the corresponding shade using 3 couplet sets defined as follows: (White, Gray, Black), (Red, Yellow, Blue), and (Red, Yellow, Green).





            The approximation to normality of a binomial distribution had major disadvantages more obvious to the extremities of the proportion (X/n → 0 and X/n → 1).

            Comparing different probability distribution functions revealed significant differences performances with increasing of n. The best performance methods were the functions that apply first direct and then inverse the logarithmical expression of the binomial probability function. Defining the key parameters as mathematical functions depending on random variable and sample size restrict the number of inquisition of theirs confidence intervals (from 34 medical key parameters to 9 distinct mathematical expression) (see table 3).

            Working with PHP programming language offer a maximum flexibility in:

·      defining and using the medical key parameters;

·      defining and generating the binomial samples;

·      performing the experimental of computing the confidence intervals limits;

·      computing the experimental errors for each defined confidence interval methods.

For the medical key parameters, the confidence intervals expression of the parameters which depend on two random variable (X, Y) and two sample sizes (m, n) can be express based on the confidence intervals expressions for a proportion with changing the imposed value of probability.

MathCad program proved very useful in the situation were was necessary to perform complex integrals and to represent graphically the results, how was the case of the integrals based on expressions of the Gaussian function (f(x)=a∙exp((x-b)2/c2) [13]).

The tree dimensional graphical representations of the functions with two parameters (most of the medical key parameters), where the expression of the confidence intervals function of f(X,m), need a triangular grid (0 ≤ X n, were n vary on a specified domain - 4..103). There are a few software which to allow to represent correctly these kind of triangular plots, most of the software used some interpolation functions and the graphical representation are square plots (0 ≤ X ≤ 103; 4 ≤ n ≤ 103) instead of triangular plots (0 ≤ X n; 4 ≤ n ≤ 103). Thus was necessary to used PHP in order to create correct error maps.

All the aspects presented in this paper uses in the confidence intervals estimations for the key parameters used as measure the effect size for different healthcare intervention express as different medical key parameters (see further papers).





[1]. Huw D., What are confidence intervals?, What is…, Hayward Group Publication, 2003, 3, p. 1-9, available at:

[2]. Easton V.J., McColl H.J., Confidence intervals from Statistics Glossary v1.1, available at:

[3]. Bland M., Probability, chapter from An Introduction to Medical Statistics, Oxford Medical Publications, Oxford - New York - Tokyo, 1996, p. 86-99.

[4]. Abraham Moivre, The Doctrine of Chance: or The Method of Calculating the Probability of Events in Play, W. Pearforn, Second Edition, 1738.

[5]. K. Pawlikowski D.C., McNickle G.E., Coverage of Confidence Intervals in Sequential Steady-State Simulation, Simulation Practice and Theory, 1998, 6, p. 255-267.

[6]. Drugan T., Bolboacă S., Colosi H., Achimaş Cadariu A., Ţigan Ş., Dignifying of prognostic and Risc Factors Chapter in Statistical Inference of Medical Data, ALMA MATER Cluj-Napoca, 2003, p. 66-71 (in Romanian).

[7]. Sackett D.L., Straus S., Richardson W.S., Rosenberg W., Haynes R.B., Diagnosis and screening Chapter In Evidence-based medicine. How to practice and teach EBM, Second Ed., Edinburgh, Churchill Livingstone, 2000, p. 67-93.

[8]. Achimaş Cadariu A., Evidence Based Gastroenterology Chapter from Grigorescu M., Treaty of Gastroenterology, Medical National Publishing House, Bucharest, 2001, p. 629-659 (in Romanian).

[9]. Sackett D., Straus E.S., Richardson W.S., Rosenberg W., Haynes R.B., Therapy Chapter In: Evidence-based Medicine: How to Practice and Teach EBM, 2nd Ed., Edinburgh, Churchill Livingstone, 2000, p. 105-154.

[10]. Achimaş Cadariu A., Diagnostic Procedure Assessment Chapter from Achimaş Cadariu A. Scientific Medical Research Methodology, Medical University “Iuliu Haţieganu” Publishing House, Cluj-Napoca, 1999, p. 29-38 (in Romanian).

[11]. Hamm R. M., Clinical Decision Making Spreadesheet Calculator, University of Oklahoma Health Sciences Center, available at:

[12]. Epidemiology&Lab Statistics from Study Counts, EpiMax Table Calculator, Princeton, New Jersey, USA, available at:

[13]., Gaussian function, available at: function