Problem
The problem that the study was aiming to address:
The study evaluates whether basic Propensity Score Modeling (PSM) can reliably replicate results from Randomized Controlled Trials (RCTs) in criminal justice research. RCTs are considered the gold standard for causal inference but are often impractical in this field due to ethical and logistical challenges.
General impact to the system and/or public:
As PSM is increasingly used as a substitute for RCTs, its ability to recover unbiased causal estimates is critical to evidence-based criminal justice policy. Biased estimates could lead to policy failures or misallocated resources.
Research Questions:
- Can PSM accurately replicate the findings of RCTs in criminal justice research?
- How do different PSM techniques perform in reducing selection bias and approximating RCT results?
- Under what conditions are PSM techniques most effective?
Method and Analysis
Program evaluated or gaps addressed:
This study addresses a significant gap by systematically testing the reliability of PSM techniques on criminal justice data, an area that has received less attention than fields such as medicine and education.
Data and Sample Size:
- Ten criminal justice RCT datasets were retrieved from the National Archive of Criminal Justice Data.
- Sample sizes ranged from 351 to 1,469 cases per dataset.
- The number of covariates per dataset ranged from 33 to 131, and 104 outcomes were analyzed in total.
Analysis Used:
- Artificial selection bias was introduced into the treatment groups of the RCT datasets, so that treated and control cases were no longer comparable by design.
- Seven PSM methods were applied, including 1:1 matching (with and without a caliper), one-to-many matching, inverse probability of treatment weighting (IPTW), stratified weighting, and optimal pair matching.
- Meta-analyses compared the effect sizes (Cohen’s d) produced by each PSM method with those from the original RCTs.
- Moderator analyses examined the number of covariates, the degree of bias reduction, and the use of calipers.
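The replication design above can be sketched as a small simulation. This is a hypothetical illustration, not the study's code: the data are synthetic, and the bias mechanism (dropping treated cases with low values of one covariate), the hand-rolled logistic regression, and the 0.2-SD caliper on the logit propensity score are all assumptions chosen for clarity. The sketch compares the RCT effect size, the naive effect size after bias is introduced, and the effect size after greedy 1:1 caliper matching.

```python
import math
import random
import statistics


def cohens_d(treated, control):
    """Cohen's d using the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * statistics.variance(treated)
                  + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
    return (statistics.mean(treated) - statistics.mean(control)) / math.sqrt(pooled_var)


def fit_logit(X, t, lr=0.5, iters=300):
    """Logistic regression fitted by batch gradient ascent.

    Returns each case's linear predictor, i.e. the logit of its propensity score.
    """
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for xi, ti in zip(X, t):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            err = ti - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, x in enumerate(xi, start=1):
                grad[j] += err * x
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], xi)) for xi in X]


def match_1to1(logit_ps, t, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Treated cases are processed in index order; a pair is kept only if the
    logit-PS distance is within caliper_sd standard deviations of the logit PS.
    """
    caliper = caliper_sd * statistics.stdev(logit_ps)
    controls = {i for i, ti in enumerate(t) if ti == 0}
    treated = [i for i, ti in enumerate(t) if ti == 1]
    pairs = []
    for i in treated:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(logit_ps[c] - logit_ps[i]))
        if abs(logit_ps[j] - logit_ps[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


def run_replication(seed=1, n=600, effect=0.3):
    """Simulate one RCT, bias it artificially, then re-estimate the effect with PSM."""
    rng = random.Random(seed)
    # Hypothetical RCT: two standard-normal covariates and coin-flip assignment.
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    t = [int(rng.random() < 0.5) for _ in range(n)]
    y = [effect * ti + 0.5 * x1 + 0.5 * x2 + rng.gauss(0, 1) for (x1, x2), ti in zip(X, t)]
    d_rct = cohens_d([v for v, ti in zip(y, t) if ti], [v for v, ti in zip(y, t) if not ti])

    # Assumed bias mechanism: treated cases with low x1 are likely to be dropped.
    keep = [ti == 0 or rng.random() < 1 / (1 + math.exp(-1.5 * x[0])) for x, ti in zip(X, t)]
    Xb = [x for x, k in zip(X, keep) if k]
    tb = [ti for ti, k in zip(t, keep) if k]
    yb = [v for v, k in zip(y, keep) if k]
    treated = [i for i, ti in enumerate(tb) if ti]
    control = [i for i, ti in enumerate(tb) if not ti]

    pairs = match_1to1(fit_logit(Xb, tb), tb)
    m_t, m_c = [i for i, _ in pairs], [j for _, j in pairs]

    def mean_gap(idx_t, idx_c, values):
        return statistics.mean(values[i] for i in idx_t) - statistics.mean(values[j] for j in idx_c)

    x1 = [x[0] for x in Xb]
    return {
        "d_rct": d_rct,
        "d_naive": cohens_d([yb[i] for i in treated], [yb[j] for j in control]),
        "d_psm": cohens_d([yb[i] for i in m_t], [yb[j] for j in m_c]),
        "pairs": len(pairs),
        "x1_gap_before": mean_gap(treated, control, x1),
        "x1_gap_after": mean_gap(m_t, m_c, x1),
    }


result = run_replication(seed=1)
for key, value in result.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

The x1 gap fields show how much covariate imbalance the matching removed; comparing d_naive and d_psm against d_rct mirrors, in miniature, the comparison the meta-analyses performed across real datasets.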
Outcome
Key Findings:
Effectiveness of PSM Techniques:
- PSM replicated RCT results well in most cases, with strong correlations between effect sizes (r ≥ 0.90).
- Mean differences in effect sizes between PSM and RCT estimates were small (d = 0.026 to 0.076).
Performance Variability:
- 1:1 matching (with/without caliper) and stratified weighting yielded the closest approximations to RCT results.
- IPTW and one-to-many matching performed less reliably, producing more variable effect sizes.
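One common explanation for the greater variability of IPTW, offered here as an assumption rather than the study's own analysis, is that propensity scores near 0 or 1 produce very large weights, which shrink the effective sample size. The sketch below uses standard average-treatment-effect weights and Kish's effective sample size; the function names are hypothetical.

```python
def iptw_weights(ps, treated):
    """ATE-style weights: 1/e for treated cases and 1/(1 - e) for controls."""
    return [1 / e if t else 1 / (1 - e) for e, t in zip(ps, treated)]


def effective_sample_size(weights):
    """Kish's effective sample size, (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def iptw_difference(y, ps, treated):
    """Weighted treated outcome mean minus weighted control outcome mean."""
    w = iptw_weights(ps, treated)
    t_idx = [i for i, t in enumerate(treated) if t]
    c_idx = [i for i, t in enumerate(treated) if not t]
    return (weighted_mean([y[i] for i in t_idx], [w[i] for i in t_idx])
            - weighted_mean([y[i] for i in c_idx], [w[i] for i in c_idx]))


# Hypothetical propensity scores for six cases.
treated = [1, 0, 1, 0, 1, 0]
moderate = [0.50, 0.50, 0.40, 0.60, 0.55, 0.45]
extreme = [0.50, 0.50, 0.02, 0.60, 0.55, 0.98]  # one treated and one control near the bounds
for label, ps in [("moderate", moderate), ("extreme", extreme)]:
    print(label, round(effective_sample_size(iptw_weights(ps, treated)), 2))
```

With the extreme scores, two cases receive weights of about 50 and dominate the estimate, so the effective sample size drops sharply even though all six cases are still "in" the analysis. Matching methods discard such cases instead of up-weighting them.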
Challenges:
- At least 11% of PSM estimates fell outside the 95% confidence intervals of the corresponding RCT estimates, indicating a risk of overestimating treatment effects.
- The reliability of PSM was influenced by sample size, covariate quality, and bias reduction.
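The "outside the RCT confidence interval" criterion can be checked outcome by outcome as below. This sketch assumes a normal-approximation 95% interval built from the usual large-sample standard error of Cohen's d; the study's exact interval construction is not described in this summary, and the function names are hypothetical.

```python
import math


def d_standard_error(d, n_t, n_c):
    """Large-sample standard error of Cohen's d (common approximation)."""
    return math.sqrt((n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c)))


def outside_rct_ci(d_rct, n_t, n_c, d_psm, z=1.96):
    """True if the PSM estimate lies outside the RCT's (assumed) 95% interval."""
    half_width = z * d_standard_error(d_rct, n_t, n_c)
    return abs(d_psm - d_rct) > half_width


def share_outside(rows):
    """rows: iterable of (d_rct, n_t, n_c, d_psm) tuples, one per outcome."""
    flags = [outside_rct_ci(*row) for row in rows]
    return sum(flags) / len(flags)
```

For example, an RCT with 100 cases per arm and d = 0.2 has an interval of roughly -0.08 to 0.48 under this approximation, so a PSM estimate of 0.5 would be flagged while 0.3 would not. Because the interval narrows as samples grow, the same absolute discrepancy is more likely to be flagged in larger datasets.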
Implications or Recommendations:
Cautious Use of PSM:
- PSM is a viable tool when RCTs are impractical, but its limitations must be acknowledged.
Improving Accuracy:
- Employ robust bias reduction techniques and ensure covariates are theoretically relevant.
- Report balance metrics and sensitivity analyses to enhance transparency.
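A balance report of the kind recommended above often lists, for each covariate, the standardized mean difference (SMD) and the variance ratio after matching or weighting. This is a minimal sketch: the 0.1 SMD threshold is a widely used rule of thumb rather than a value taken from the study, and the covariate names in the usage lines are hypothetical.

```python
import statistics


def balance_row(x_treated, x_control, threshold=0.1):
    """SMD and variance ratio for one covariate in the matched or weighted sample."""
    vt, vc = statistics.variance(x_treated), statistics.variance(x_control)
    smd = (statistics.mean(x_treated) - statistics.mean(x_control)) / ((vt + vc) / 2) ** 0.5
    return {"smd": smd, "var_ratio": vt / vc, "balanced": abs(smd) <= threshold}


def balance_table(covariates):
    """covariates: dict mapping name -> (treated values, control values)."""
    return {name: balance_row(xt, xc) for name, (xt, xc) in covariates.items()}


# Hypothetical matched-sample values for two covariates.
table = balance_table({
    "age": ([30, 35, 40, 45], [31, 36, 39, 44]),
    "prior_arrests": ([2, 4, 6, 8], [1, 2, 3, 4]),
})
for name, row in table.items():
    print(name, round(row["smd"], 3), round(row["var_ratio"], 2), row["balanced"])
```

Reporting the variance ratio alongside the SMD matters because two groups can have identical means but very different spreads, which the SMD alone would not reveal.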
Policy and Research:
- Researchers and policymakers should critically assess PSM-based studies to avoid overconfidence in their findings.
- Consider alternative statistical techniques or multiple methods to validate results.
This study advocates for cautious optimism in applying PSM while emphasizing the need for methodological rigor in its implementation.