
How do health programs prove their new phone tools actually save lives?

How global health programs measure whether phone-based tools save lives, from proxy metrics to mortality endpoints, for researchers and grant-making bodies.

trycareview.com Research Team

A donor writes a check, a ministry signs a memorandum, and within a year thousands of community health workers are carrying smartphones into homesteads that have never seen a blood pressure cuff. The pitch is always the same: these tools will save lives. The harder question, and the one that decides whether a pilot becomes a national program, is how anyone proves it. Demonstrating global health technology impact is rarely as simple as counting downloads or screenings. Mortality is a rare event, attribution is contested, and the gap between a phone in a worker's hand and a child who survives a fever can span months and a dozen intervening decisions. This report examines the methods programs actually use to make that causal case, and where the evidence is strongest and weakest.

A 2023 multi-country survey of community health workers found that roughly 80 percent already engage with digital health tools and 84 percent are optimistic about their impact. Rigorous outcome evaluations linking those tools to survival, however, remain comparatively scarce.
Source: multi-country CHW survey, published in 2023

Measuring global health technology impact beyond the screening count

The central problem in measuring global health technology impact is that the thing programs most want to claim, lives saved, sits at the far end of a long causal chain. A phone tool might improve the speed of data collection, which improves triage accuracy, which improves referral timing, which improves treatment, which eventually reduces deaths. Each link can be measured, but each also leaks. Evaluators distinguish between three tiers of evidence: process metrics such as scans completed and forms submitted, intermediate outcomes such as referral completion and antenatal visit rates, and distal outcomes such as case fatality or all-cause mortality. The credibility of a claim depends heavily on which tier it rests on, and on the study design used to rule out alternative explanations.
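The leakage can be made concrete with simple arithmetic. The sketch below multiplies per-link success rates along a chain like the one just described. The stage names and every probability are hypothetical illustrations, not program data. The point is that a large absolute gain at one link shrinks in absolute terms by the time it reaches the end of the chain.

```python
# Minimal sketch: how an effect "leaks" along a multi-step causal chain.
# All stage names and probabilities below are hypothetical illustrations.

def end_to_end(rates):
    """Return the product of per-link success rates (share of cases clearing every link)."""
    total = 1.0
    for rate in rates.values():
        total *= rate
    return total

# Hypothetical baseline pathway for a sick child who needs facility care.
baseline = {
    "correctly_triaged": 0.60,
    "referred": 0.80,
    "reached_facility": 0.50,
    "treated_appropriately": 0.70,
}

# Suppose the phone tool raises triage accuracy from 60% to 90%
# but leaves every downstream link unchanged.
with_tool = dict(baseline, correctly_triaged=0.90)

before = end_to_end(baseline)
after = end_to_end(with_tool)
print(f"Cases clearing every link: {before:.3f} -> {after:.3f}")
print(f"Absolute gain at the end of the chain: {after - before:.3f}")
```

Under these assumed rates, a 30-point gain in triage accuracy becomes an 8.4-point gain at the end of the chain. If any downstream rate falls to zero, for example a clinic with no drugs, the end-to-end gain is zero however much triage improves.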

The World Health Organization's 2016 guide, "Monitoring and Evaluating Digital Health Interventions," remains the reference text here, and its 2022 follow-up, "Monitoring the implementation of digital health," extended the indicator frameworks toward maturity and equity. Both documents make the same uncomfortable point: most digital health projects report activity, not effect. A tool can be wildly adopted and still change nothing about who lives.

The table below maps the common evaluation approaches against what they can and cannot establish.

| Evaluation method | What it measures | Strength of causal claim | Typical cost and timeline | Best suited for |
|---|---|---|---|---|
| Activity dashboards | Scans, registrations, form completion | Very low (no comparison group) | Low, continuous | Operational monitoring |
| Pre-post comparison | Change in outcome before vs. after rollout | Low to moderate (confounded by secular trends) | Moderate, 1-2 years | Early signal detection |
| Interrupted time series | Shift in level or trend at the intervention point | Moderate (controls for the secular trend) | Moderate, 2-3 years | Settings with routine data |
| Stepped-wedge rollout | Outcomes as sites phase in | High (randomized phase-in order built into deployment) | High, 2-4 years | Phased national scale-up |
| Cluster randomized trial | Outcomes vs. randomized control clusters | Highest (minimizes confounding by design) | Very high, 3-5 years | Definitive impact claims |

No single row is correct for every program. A grant-making body funding a national scale-up gains little from a five-year trial that ends after the money is spent, while a research consortium making a mortality claim cannot lean on a dashboard.
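For the interrupted time series row, a common analysis is segmented regression: fit a baseline level and slope, then estimate a level change and a slope change at the rollout point. The sketch below solves ordinary least squares through the normal equations using only the Python standard library. The monthly series is synthetic, the outcome (a facility-delivery rate) is a hypothetical example, and the function names are illustrative. A real analysis would also need to handle autocorrelation and seasonality, which this sketch ignores.

```python
# Minimal segmented-regression sketch for an interrupted time series (ITS).
# Model: y = b0 + b1*t + b2*post + b3*(t - t0)*post
#   b2 = immediate level change at rollout, b3 = change in slope after rollout.
# Synthetic data only; ignores autocorrelation and seasonality.

def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def segmented_regression(y, t0):
    """Fit OLS coefficients [b0, b1, b2, b3] via the normal equations X'X b = X'y."""
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), post, (t - t0) * post])
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic monthly facility-delivery rate (%): 24 months before rollout, 24 after.
t0 = 24
y = [40 + 0.2 * t + (5 + 0.3 * (t - t0) if t >= t0 else 0) for t in range(48)]
b0, b1, b2, b3 = segmented_regression(y, t0)
print(f"level change at rollout: {b2:.2f}, slope change per month: {b3:.2f}")
```

Because the pre-rollout slope `b1` is estimated explicitly, a rise that was already under way before rollout is not credited to the tool. That is the advantage this design has over a simple pre-post comparison.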

The proxy problem and how programs work around it

Because mortality endpoints demand enormous sample sizes, most field programs build their case on validated proxies. The logic is defensible when the proxy has an established link to survival in the clinical literature. Common proxy chains include:

  • Referral completion rate as a proxy for treatment access, which is linked to survival for conditions such as severe malaria and obstetric emergencies.
  • Antenatal visit counts as a proxy for maternal and neonatal outcomes, given the well-documented dose-response between visit frequency and outcomes.
  • Time-to-triage as a proxy for early detection, particularly for childhood pneumonia and dehydration.
  • Hypertension and fever detection yield as a proxy for averted complications.

The weakness is that an improved proxy translates into saved lives only if every downstream link holds. A program can raise referral generation by 40 percent and save no lives if the receiving clinic has no drugs. This is why serious evaluations now measure the whole pathway rather than a single endpoint. It is also why contactless vitals data, captured through methods such as remote photoplethysmography (rPPG) in field deployments, is valued less for any single reading than for the auditable, time-stamped trail it leaves across the referral chain.
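In practice, measuring the whole pathway means joining time-stamped events for each patient and asking how far along the chain each case got, and how quickly. The sketch below does this for a small in-memory event log. The event names (`screened`, `referred`, `arrived`, `treated`), the record layout, and the patient IDs are hypothetical assumptions, not any particular system's schema.

```python
# Minimal sketch: auditing a referral pathway from a time-stamped event log.
# Event names, record layout, and IDs are hypothetical.
from collections import defaultdict
from datetime import datetime

STAGES = ["screened", "referred", "arrived", "treated"]

events = [
    ("p1", "screened", "2024-03-01T09:00"),
    ("p1", "referred", "2024-03-01T09:05"),
    ("p1", "arrived", "2024-03-02T14:00"),
    ("p1", "treated", "2024-03-02T15:30"),
    ("p2", "screened", "2024-03-01T10:00"),
    ("p2", "referred", "2024-03-01T10:10"),
    ("p3", "screened", "2024-03-01T11:00"),
]

# Group events by patient: {patient_id: {stage: timestamp}}
by_patient = defaultdict(dict)
for pid, stage, ts in events:
    by_patient[pid][stage] = datetime.fromisoformat(ts)

def funnel(records):
    """Count how many patients reached each stage of the pathway."""
    return {s: sum(1 for rec in records.values() if s in rec) for s in STAGES}

def hours_between(rec, start, end):
    """Elapsed hours between two stages, or None if either stage is missing."""
    if start in rec and end in rec:
        return (rec[end] - rec[start]).total_seconds() / 3600
    return None

print(funnel(by_patient))
for pid, rec in sorted(by_patient.items()):
    print(pid, "screen-to-treatment hours:", hours_between(rec, "screened", "treated"))
```

The funnel shows where cases drop out (here, one of the two referred patients never arrives), while the elapsed-time check gives an auditable measure of delay for the cases that do complete the chain.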

Industry applications

Grant-making bodies and value for money

Funders increasingly ask for cost-effectiveness expressed as cost per disability-adjusted life year (DALY) averted, not cost per person screened. This shifts the burden from adoption metrics toward modeled outcomes, usually built by combining program process data with effect sizes drawn from published trials. The quality of that model depends entirely on the quality of the program's own outcome data, which makes structured digital capture a precondition for any credible funding case.
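A stripped-down version of that modeling step multiplies program reach (from process data) by a baseline risk and an effect size (from the literature) to estimate disability-adjusted life years (DALYs) averted, then divides program cost by the result. Every number and parameter name below is a hypothetical placeholder. Real models add discounting, uncertainty ranges, and condition-specific disability weights.

```python
# Minimal sketch of a cost-per-DALY-averted calculation.
# All inputs are hypothetical placeholders, not program or trial data.

def dalys_averted(people_reached, baseline_risk, relative_risk_reduction, dalys_per_case):
    """Cases averted (reach x baseline risk x effect) multiplied by DALYs lost per case."""
    cases_averted = people_reached * baseline_risk * relative_risk_reduction
    return cases_averted * dalys_per_case

def cost_per_daly_averted(program_cost, dalys):
    """Incremental program cost divided by DALYs averted."""
    if dalys <= 0:
        raise ValueError("no DALYs averted; ratio is undefined")
    return program_cost / dalys

# Hypothetical inputs: reach from the program's own records,
# effect size from a published trial of a comparable intervention.
averted = dalys_averted(
    people_reached=50_000,
    baseline_risk=0.02,           # assumed annual risk of the target adverse outcome
    relative_risk_reduction=0.15, # assumed effect size from the literature
    dalys_per_case=3.0,           # assumed DALYs lost per case
)
print(f"DALYs averted: {averted:.0f}")
print(f"Cost per DALY averted: {cost_per_daly_averted(450_000, averted):,.0f}")
```

The dependence the paragraph describes shows up directly in the arithmetic: `people_reached` comes from the program's own records, so any weakness in that data propagates straight into the final ratio.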

Public health institutions and surveillance integration

Ministries of health evaluate phone tools partly on whether village-level data reaches national surveillance systems intact and on time. Here the impact claim concerns system performance (completeness, timeliness, and the ability to detect outbreaks earlier) rather than individual survival. A tool that cuts reporting latency from weeks to hours has measurable public health value even before any mortality analysis.
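Completeness and timeliness are usually computed as simple ratios over the reports a system expects to receive. The sketch below assumes a hypothetical set of village reporting units, a latency in hours for each report received, and a seven-day deadline. The unit names, latencies, and threshold are illustrative, not a ministry standard. Timeliness here uses all expected reports as the denominator; some frameworks use received reports instead.

```python
# Minimal sketch: completeness, timeliness, and median latency of surveillance reports.
# Unit names, latencies, and the 7-day deadline are hypothetical.
from statistics import median

expected_units = ["village_a", "village_b", "village_c", "village_d", "village_e"]

# Hours from end of reporting period to receipt nationally (absent = not received).
latency_hours = {
    "village_a": 6,
    "village_b": 30,
    "village_c": 200,
    "village_d": 12,
}

DEADLINE_HOURS = 7 * 24  # assumed reporting deadline

def reporting_metrics(expected, latencies, deadline):
    """Share of expected reports received, share received on time, and median latency."""
    received = [u for u in expected if u in latencies]
    on_time = [u for u in received if latencies[u] <= deadline]
    return {
        "completeness": len(received) / len(expected),
        "timeliness": len(on_time) / len(expected),
        "median_latency_hours": median(latencies[u] for u in received) if received else None,
    }

print(reporting_metrics(expected_units, latency_hours, DEADLINE_HOURS))
```

Tracking these three numbers before and after rollout gives a system-performance measure of the kind described above, available long before any mortality analysis is possible.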

Academic research partnerships

Researchers bring the designs, such as stepped-wedge and cluster-randomized trials, that produce defensible causal claims. The 2023 sickle cell disease trial combining community health worker support with mHealth, which reported improved health-related quality of life among young adults, illustrates the rigor required to publish an effect rather than an association. Field programs that embed researchers from the start tend to generate publishable, fundable evidence; those that bolt evaluation on at the end rarely do.
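In a randomized stepped-wedge design, every cluster eventually receives the tool; what is randomized is the order in which clusters cross over. The sketch below generates such a schedule with the Python standard library. The district names, the number of steps, and the fixed seed are hypothetical, and a real trial would use a pre-registered, independently generated allocation.

```python
# Minimal sketch: randomizing the crossover order for a stepped-wedge rollout.
# Cluster names, step count, and seed are hypothetical.
import random

def stepped_wedge_schedule(clusters, n_steps, seed):
    """Randomly assign clusters to crossover steps 1..n_steps, as evenly as possible.

    Returns {cluster: step}. Period 0 is an all-control baseline; a cluster is
    under the intervention in every period >= its step.
    """
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)
    return {c: 1 + (i * n_steps) // len(order) for i, c in enumerate(order)}

def exposure_matrix(schedule, n_steps):
    """Rows = clusters, columns = periods 0..n_steps; 1 = intervention, 0 = control."""
    return {c: [1 if period >= step else 0 for period in range(n_steps + 1)]
            for c, step in sorted(schedule.items())}

districts = [f"district_{k}" for k in range(1, 9)]
schedule = stepped_wedge_schedule(districts, n_steps=4, seed=2024)
for cluster, row in exposure_matrix(schedule, n_steps=4).items():
    print(cluster, row)
```

Because every cluster contributes both control and intervention periods, comparisons are made within and between clusters. The analysis must still adjust for calendar time, since later periods are more likely to be exposed.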

Current research and evidence

The published record is uneven but growing. A time-series analysis of mHealth use by community health workers in rural Malawi found measurable increases in maternity care utilization, an intermediate outcome with a strong link to maternal survival. The 2023 multi-country CHW survey documented that workers themselves attribute faster data collection and more frequent community contact to digital tools. A 2023 WHO study reported that digital tools improved health workers' performance, skills, and mental health. And a pilot randomized controlled trial of mHealth combined with community health worker support for antiretroviral therapy adherence demonstrated feasibility and acceptability, the necessary precursor to a powered effect study.

What remains thin is the direct mortality evidence. Most studies stop at utilization, adherence, or quality of life because death is rare, expensive to measure, and ethically complicated to randomize against. The honest conclusion from the current literature is that phone tools reliably improve the intermediate steps that precede survival, while the final link to mortality is usually inferred rather than directly observed. Programs that acknowledge this distinction are more credible than those that claim lives saved from a dashboard.
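A quick sample-size calculation shows why mortality endpoints are so expensive. The sketch below uses the standard normal-approximation formula for comparing two proportions, inflated by the design effect for cluster randomization, 1 + (m - 1) x ICC, where m is the cluster size and ICC the intraclass correlation. The mortality and referral rates, the cluster size, and the ICC are hypothetical inputs chosen only for illustration.

```python
# Minimal sketch: individuals per arm needed to detect a difference in two proportions,
# inflated by the design effect for cluster randomization.
# All inputs are hypothetical; real trials use pilot estimates and dedicated software.
import math
from statistics import NormalDist

def n_per_arm(p_control, p_intervention, alpha=0.05, power=0.80):
    """Normal-approximation sample size (unpooled variance) for a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_intervention * (1 - p_intervention)
    return (z_alpha + z_beta) ** 2 * variance / (p_control - p_intervention) ** 2

def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

# Hypothetical comparison: child mortality falling from 2.0% to 1.6%
# versus a proxy, referral completion, rising from 50% to 60%.
for label, p0, p1 in [("mortality", 0.020, 0.016), ("referral completion", 0.50, 0.60)]:
    n = n_per_arm(p0, p1) * design_effect(cluster_size=200, icc=0.01)
    print(f"{label}: about {math.ceil(n):,} individuals per arm")
```

Under these assumed inputs, the mortality endpoint needs roughly 45 times as many participants as the proxy. That ratio is the arithmetic behind most programs' reliance on validated intermediate outcomes.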

The future of global health technology impact measurement

Three shifts are likely to define the next several years. First, evaluation is moving from one-off studies toward continuous, embedded measurement, where the same digital system that delivers the intervention also captures the outcome data. Second, the WHO's 2023 update to the Global Digital Health Monitor folded in equity, gender, and person-centered indicators, signaling that impact will increasingly be judged on who benefits, not only on aggregate effect. Third, contactless measurement adds a layer of objective, sensor-derived data to records that were previously self-reported, which strengthens the auditability of any causal chain. The programs that win renewed funding will be those that treat evaluation as infrastructure rather than as a final-year report.

Frequently asked questions

Can a phone-based health program ever prove it saves lives?

Directly proving mortality reduction requires a large, often randomized study because death is statistically rare. Most programs instead build a defensible chain of evidence using validated proxies such as referral completion and antenatal visits, paired with study designs like stepped-wedge or interrupted time series that rule out competing explanations.

Why are screening counts not enough to show impact?

Counts measure activity, not effect. A program can complete millions of screenings and change no outcomes if referrals are never completed or clinics lack treatment. Credible global health technology impact claims require evidence that the activity changed an outcome relative to a comparison group or trend.

What study design gives the strongest causal evidence?

Cluster randomized trials give the highest confidence but are slow and expensive. Stepped-wedge designs are a practical compromise because randomization is built into a phased rollout, letting programs scale and evaluate at the same time.

How do funders judge value for money?

Funders increasingly use cost per disability-adjusted life year averted, combining a program's own process data with published effect sizes. This makes structured, high-quality outcome data a precondition for any strong funding case.

Circadify is working in this space, building contactless vitals capture designed to leave the kind of auditable, time-stamped data trail that rigorous impact evaluation depends on. Researchers and grant-making bodies exploring collaboration or reviewing field evidence can find related research and partnership material at circadify.com/blog.
