Cardiovascular disease remains a leading cause of global mortality. However, early detection often relies on subjective surveys or time-intensive lab results. In high-volume clinical environments or low-resource settings, every second counts.
For my submission to Byte 2 Beat: Hack4Health, I developed the Rapid Risk Indicator (RRI) set: seven verifiable "bytes" of health data that can be collected in a few minutes to support immediate clinical triage.
1. Problem Framing
The goal was to bridge the gap between complex diagnostics and rapid screening. I stripped away subjective "soft" indicators and focused on a "high-signal" approach that uses only objective, non-invasive data.
2. Methods & Data Strategy
I used the CDC’s Behavioral Risk Factor Surveillance System (BRFSS) dataset, which provides over 250,000 records.
The "7-Byte" Feature Set
To ensure clinical reliability, the model uses only these seven indicators:
- Demographics: Age, Sex
- Vascular History: Prior Stroke History
- Biometrics: BMI
- Lifestyle: Smoking Status, Alcohol Consumption
- Systemic Access: Healthcare Access
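As a rough illustration of how small this input is, the seven indicators can be held in a single record. This is only a sketch: the class name, field names, and encodings below are hypothetical and are not taken from the BRFSS codebook or from the original project.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class RapidRiskInput:
    """One patient's seven indicators (names and encodings are hypothetical)."""
    age_group: int          # ordinal age bracket, 1 = youngest
    sex: int                # binary code, 0 or 1
    prior_stroke: int       # 1 if the patient reports a prior stroke
    bmi: float              # body mass index
    smoker: int             # 1 if the patient smokes
    heavy_alcohol: int      # 1 if alcohol consumption is heavy
    healthcare_access: int  # 1 if the patient has healthcare access

    def to_vector(self):
        """Return the seven values as floats, in field order."""
        return [float(v) for v in astuple(self)]


# Made-up example patient, for illustration only.
patient = RapidRiskInput(
    age_group=9, sex=1, prior_stroke=0, bmi=27.5,
    smoker=1, heavy_alcohol=0, healthcare_access=1,
)
features = patient.to_vector()
```

A fixed field order like this keeps the feature vector consistent between training and screening.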
Modeling Strategy
I employed a logistic regression model with a 60/20/20 train-validate-test split.
- Class Imbalance: Given the 10% prevalence of heart disease in the dataset, I used balanced class weighting.
- Optimization: Balanced weighting penalizes errors on the minority (positive) class more heavily, which favors recall over precision. In a triage context, this reduces the cost of a missed diagnosis (False Negative) at the price of more false alarms.
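The split and the weighting can be sketched in plain Python. This is a minimal illustration on synthetic labels, not the original pipeline. The function names are made up, and the weight formula, n_samples / (n_classes * class_count), is the common "balanced" heuristic, assumed here to match what the project used.

```python
import random
from collections import Counter


def split_60_20_20(rows, seed=0):
    """Shuffle rows and split them into train / validation / test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = n * 60 // 100
    n_val = n * 20 // 100
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Synthetic labels with 10% positives, mirroring the stated prevalence.
labels = [1] * 100 + [0] * 900
train, val, test = split_60_20_20(labels)
weights = balanced_class_weights(labels)
```

With 10% positives, each positive example counts nine times as much as a negative one in the loss, which is what pushes the model toward higher recall.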
3. Statistical Validation
All seven indicators had p-values below 0.001, indicating that each is a statistically significant predictor in the fitted model. Age emerged as the primary driver of cardiovascular risk, with an Odds Ratio of 2.59.
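In logistic regression, an odds ratio is the exponential of the fitted coefficient, so the reported value can be mapped back to a coefficient. The sketch below shows that arithmetic. The helper names are hypothetical, and "one step" of Age means whatever unit the model used (a category or a standardized unit), which is an open assumption here.

```python
import math


def coefficient_from_odds_ratio(odds_ratio):
    """Logistic-regression coefficient implied by an odds ratio."""
    return math.log(odds_ratio)


def odds_multiplier(beta, delta):
    """Factor by which the odds change when the feature moves by delta."""
    return math.exp(beta * delta)


beta_age = coefficient_from_odds_ratio(2.59)  # roughly 0.95
one_step = odds_multiplier(beta_age, 1)       # recovers the odds ratio
two_steps = odds_multiplier(beta_age, 2)      # effects multiply: 2.59 squared
```

Odds ratios compound multiplicatively, so a two-step increase in Age multiplies the odds by about 6.7 and does not simply double the effect.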
4. Results & Evaluation
On the unseen test set (n = 50,736), the framework showed its value as a screening safety net:
- Sensitivity (Recall): 73% – The model successfully flagged nearly 3 out of every 4 cardiovascular events using only 7 non-invasive questions.
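- Negative Predictive Value (NPV): 96% – If the model classifies a patient as low-risk, there is a 96% probability that they do not have heart disease, given the prevalence in this dataset. Because NPV depends on prevalence, this figure supports deprioritizing low-risk patients in similar populations and does not transfer unchanged to higher-risk settings.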
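Both metrics come straight from confusion-matrix counts. In the sketch below, the counts are hypothetical, chosen only so that they reproduce the reported rates at 10% prevalence. They are not the project's actual confusion matrix.

```python
def screening_metrics(tp, fn, tn, fp):
    """Standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases that were cleared
        "npv": tn / (tn + fn),          # share of low-risk calls that were right
        "ppv": tp / (tp + fp),          # share of high-risk calls that were right
    }


# Hypothetical counts for 1,000 patients, 100 of whom have heart disease.
metrics = screening_metrics(tp=73, fn=27, tn=648, fp=252)
```

With these illustrative counts, sensitivity is 0.73 and NPV is 0.96, while most flagged patients are false alarms. That is the expected trade-off for a high-recall screen at low prevalence.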
Calibration: How Reliable are the Risks?
Calibration curve analysis showed that the model ranks patients by risk well, but its predicted probabilities are conservative and should not be read as absolute risk estimates. The model is therefore better suited to triage ranking than to absolute diagnosis.

The calibration curve compares predicted probabilities against actual cardiac event proportions.
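A calibration curve of this kind can be computed by binning predictions and comparing each bin's mean predicted probability with its observed event rate. The sketch below uses equal-width bins and made-up predictions. The function name and bin count are assumptions, not the project's actual code.

```python
def calibration_bins(probs, labels, n_bins=5):
    """Return (mean predicted, observed rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    curve = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        curve.append((mean_pred, observed, len(members)))
    return curve


# Made-up predictions and outcomes, for illustration only.
probs = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
curve = calibration_bins(probs, labels)
```

A well-calibrated model yields points close to the diagonal, where mean predicted probability equals the observed rate. Systematic gaps mean the scores are useful for ranking but not as absolute risks.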
5. Conclusion
The Rapid Risk Indicator framework demonstrates that a minimal "byte" of objective data can provide a high-sensitivity safety net for heart disease detection. By focusing on statistically significant hard indicators and high recall, it offers a tool that is fast, statistically grounded, and suited to rapid triage in real-world clinical settings.