Calculating d’, beta, c and Ad’ in Python and PHP

About d’

A central component of Signal Detection Theory is d’ – a measure of the ability to discriminate a signal from noise. The d’ is flanked by the parameters “beta” and c, which are measures of the criterion that the observer uses to discriminate between the two.

These measures can be calculated in every experiment where there is a signal (e.g. target trials) and noise (e.g. nontarget trials), and the observer (e.g. subject) is to indicate whether the signal was present or not. d’, beta and c are statistical measures in a model where noise and noise+signal are regarded as two “probability-of-detection” distributions on a “threshold-of-detection” continuum. d’ is basically a Z-score on how well the observer discriminates the two distributions, i.e. the number of standard deviations between the probability-of-response distributions for signal and noise for this given subject. c (the criterion) is the number of standard deviations from the midpoint between these two distributions, i.e. a measure on a continuum from “conservative” to “liberal”.

Every trial is scored as one of four possibilities:

  • noise + signal + response: HIT
  • noise + signal + no response: MISS
  • noise + response: FALSE ALARM (FA)
  • noise + no response: CORRECT REJECTION (CR)

It is often analyzed in terms of rates:

  • HIT RATE = HIT/SIGNALS (ratio of signals detected as signals)
  • FA RATE = FA/NON-SIGNALS (ratio of noise detected as signals)

You want to have a high hit rate and a low fa rate (hitrate > farate). Say you have conducted an experiment on a subject and calculated hitrate and farate. A signal-detection account of the subjects signal-detection-abilities can be one of the following

  • If hitrate=1 and farate=0, d’ is infinite, meaning that you signals and noise are totally separated, i.e. the distributions are non-overlapping. In that case, you have effectively discriminated signal from noise.
  • If hitrate > farate, d’ is positive, meaning that you are more likely to detect the information as signal when signal is present than when it is absent.
  • If hitrate=farate, d’=0, meaning that you are just as likely to detect signals and noise as signals, i.e. you were unable to discriminate the signal and noise and hence the probability-of-detection distributions overlap perfectly. This occurs when responses are perfectly random.
  • If hitrate < farate, d’ is negative, meaning that you have a wierd bias of primarily detecting noise as signals while ignoring the actual signals.
  • If hitrate = 0 and farate = 1, d’ is negatively infinite, meaning that you have completely confused noise and signal.

For a a more thorough introduction and guide to calculating d’/beta, “Calculation of signal detection theory measures” by Stanislaw & Todorov.

Calculating d’, beta, c and Ad’ in Python

Using the Percentile Point Function (ppf) from scipy:

from scipy.stats import norm
import math
Z = norm.ppf

def SDT(hits, misses, fas, crs):
    """ returns a dict with d-prime measures given hits, misses, false alarms, and correct rejections"""
    # Floors an ceilings are replaced by half hits and half FA's
    half_hit = 0.5 / (hits + misses)
    half_fa = 0.5 / (fas + crs)
 
    # Calculate hit_rate and avoid d' infinity
    hit_rate = hits / (hits + misses)
    if hit_rate == 1: 
        hit_rate = 1 - half_hit
    if hit_rate == 0: 
        hit_rate = half_hit
 
    # Calculate false alarm rate and avoid d' infinity
    fa_rate = fas / (fas + crs)
    if fa_rate == 1: 
        fa_rate = 1 - half_fa
    if fa_rate == 0: 
        fa_rate = half_fa
 
    # Return d', beta, c and Ad'
    out = {}
    out['d'] = Z(hit_rate) - Z(fa_rate)
    out['beta'] = math.exp((Z(fa_rate)**2 - Z(hit_rate)**2) / 2)
    out['c'] = -(Z(hit_rate) + Z(fa_rate)) / 2
    out['Ad'] = norm.cdf(out['d'] / math.sqrt(2))
    
    return(out)

Note the adjustment of rate=0 and rate=1, to prevent infinite values.

Calculating d’, beta, c and Ad’ in PHP

// Percentile Point Function from http://spechal.com/2009/11/15/php-function-to-emulate-excel-normsinv/
function Z($p){
	$a1 = -39.6968302866538; $a2 = 220.946098424521; $a3 = -275.928510446969;
	$a4 = 138.357751867269; $a5 = -30.6647980661472; $a6 = 2.50662827745924;
	$b1 = -54.4760987982241; $b2 = 161.585836858041; $b3 = -155.698979859887;
	$b4 = 66.8013118877197; $b5 = -13.2806815528857; $c1 = -7.78489400243029E-03;
	$c2 = -0.322396458041136; $c3 = -2.40075827716184; $c4 = -2.54973253934373;
	$c5 = 4.37466414146497; $c6 = 2.93816398269878; $d1 = 7.78469570904146E-03;
	$d2 = 0.32246712907004; $d3 = 2.445134137143; $d4 = 3.75440866190742;
	$p_low = 0.02425; $p_high = 1 - $p_low;
	$q = 0.0; $r = 0.0;
	if($p < 0 || $p > 1) throw new Exception("NormSInv: Argument out of range.");
	else if($p < $p_low){
		$q = pow(-2 * log($p), 2);
		$ppf = ((((($c1 * $q + $c2) * $q + $c3) * $q + $c4) * $q + $c5) * $q + $c6) / (((($d1 * $q + $d2) * $q + $d3) * $q + $d4) * $q + 1);
	} 
	else if($p <= $p_high){
		$q = $p - 0.5; $r = $q * $q;
		$ppf = ((((($a1 * $r + $a2) * $r + $a3) * $r + $a4) * $r + $a5) * $r + $a6) * $q / ((((($b1 * $r + $b2) * $r + $b3) * $r + $b4) * $r + $b5) * $r + 1);
	} 
	else {
		$q = pow(-2 * log(1 - $p), 2);
		$ppf = -((((($c1 * $q + $c2) * $q + $c3) * $q + $c4) * $q + $c5) * $q + $c6) / (((($d1 * $q + $d2) * $q + $d3) * $q + $d4) * $q + 1);
	}
	return $ppf;
}
 
// Credit to comment on http://php.net/manual/en/function.stats-stat-percentile.php
function erf($x) { 
        $pi = 3.1415927; 
        $a = (8*($pi - 3))/(3*$pi*(4 - $pi)); 
        $x2 = $x * $x; 
 
        $ax2 = $a * $x2; 
        $num = (4/$pi) + $ax2; 
        $denom = 1 + $ax2; 
 
        $inner = (-$x2)*$num/$denom; 
        $erf2 = 1 - exp($inner); 
 
        return sqrt($erf2); 
}
 
function cdf($n) { 
        if($n < 0) return (1 - erf($n / sqrt(2)))/2; 
        else return (1 + erf($n / sqrt(2)))/2; 
}
 
// Calculate d' and avoid infinity
function dPrime($hits, $misses, $fas, $crs) {
	$half_miss = 0.5/($hits+$misses);
	$half_fa = 0.5/($fas+$crs);
	$hitrate = $hits/($hits+$misses);
	$farate = $fas/($fas+$crs);
 
	// Avoid d' negative infinity
	if($hitrate < 0.025) {
		if($hitrate+$half_miss < 0.025) $hitrate = 0.025;
		else $hitrate = $half_miss;
	}
	if($farate < 0.025) {
		if($farate+$half_fa < 0.025) $farate = 0.025;
		else $farate = $half_fa;
	}	
 
	// Avoid d' infinity
	if($hitrate > 0.975) {
		if($hitrate-$half_miss > 0.975) $hitrate = 0.975;
		else $hitrate = 1-$half_miss;
	}
	if($farate > 0.975) {
		if($farate-$half_fa > 0.975) $farate = 0.975;
		else $farate = 1-$half_fa;
	}
 
	// Calculate d' using Percentile Point function
	$out = new stdClass;
	$out->d = Z($hitrate)-Z($farate);
	$out->beta = exp((pow(Z($farate),2) - pow(Z($hitrate),2))/2);
	$out->c = -(Z($hitrate)+Z($farate))/2;
	$out->Ad = cdf($out->d/sqrt(2));
	return $out;
}

Note that a rate-threshold of 0.025 is used because the PHP-ppf function above is only an estimation which fails violently if this threshold is exceeded.