Glen Larkin
Data Analysis & Machine Learning

Diabetic Patient Readmission Analysis

Exploratory data analysis and visualization of 69,570 hospital records to uncover patterns in diabetic patient readmissions, using Python, Pandas, and AWS SageMaker.

Python, Pandas, Matplotlib, Seaborn, AWS SageMaker, SageMaker Autopilot, Data Wrangler

Project Overview

Hospital readmissions are a major cost driver in healthcare, and they are especially common among diabetic patients. This project analyzes a real-world dataset of 69,570 diabetic patient encounters to identify which factors are most strongly associated with readmission and to lay the groundwork for predictive modeling.

The full pipeline spans raw data exploration, feature engineering with AWS SageMaker Data Wrangler, automated model training with SageMaker Autopilot, and batch inference, all tied together with custom Python visualizations.

Dataset at a Glance

Patient records: 69,570
Features: 16
Readmission rate: 40%
Avg. days in hospital: 4.3

Readmission Breakdown

Not readmitted: 41,613 (60%)
Readmitted after more than 30 days: 21,805 (31%)
Readmitted within 30 days: 6,152 (9%)
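As a quick check on the class balance, a breakdown like this can be reproduced with Pandas. This is a minimal sketch: the column name "readmitted" and the labels "NO", ">30", and "<30" are assumptions borrowed from the public UCI diabetes dataset's conventions, and the ten rows are made-up toy data, not project records.

```python
import pandas as pd

# Hypothetical toy rows; the "readmitted" column name and its labels are
# assumptions modeled on the UCI diabetes dataset, not the project's schema.
df = pd.DataFrame({
    "readmitted": ["NO", "NO", "NO", ">30", ">30", "<30", "NO", "NO", ">30", "NO"]
})

# Count and percentage of each outcome class
counts = df["readmitted"].value_counts()
shares = df["readmitted"].value_counts(normalize=True).mul(100).round(1)
breakdown = pd.DataFrame({"count": counts, "percent": shares})
print(breakdown)

# Overall readmission rate: any readmission, regardless of timing
readmission_rate = (df["readmitted"] != "NO").mean() * 100
print(f"Readmission rate: {readmission_rate:.1f}%")
```

Applied to the full counts above, the same arithmetic gives (21,805 + 6,152) / 69,570 = 27,957 / 69,570, or about 40.2%, consistent with the 40% headline rate.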

Key Findings

👴

Age is a strong predictor

Over 65% of patients were aged 65–85. Readmission rates climb steadily with age, peaking in the 75–85 bracket.
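One way to check an age trend like this is to bin ages into brackets and average a binary readmission flag within each bracket. A minimal sketch, assuming a numeric "age" column and a 0/1 "readmitted_flag" column (both hypothetical names); the bracket edges are chosen for illustration and the rows are invented toy data.

```python
import pandas as pd

# Hypothetical toy rows; "age" and "readmitted_flag" are assumed column names.
df = pd.DataFrame({
    "age":             [45, 52, 58, 66, 70, 72, 77, 80, 83, 90],
    "readmitted_flag": [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],
})

# Illustrative bracket edges; right=False makes each bin [low, high)
bins = [0, 55, 65, 75, 85, 120]
labels = ["<55", "55-65", "65-75", "75-85", "85+"]
df["age_bracket"] = pd.cut(df["age"], bins=bins, labels=labels, right=False)

# Readmission rate (mean of the 0/1 flag) and patient count per bracket
by_age = (
    df.groupby("age_bracket", observed=False)["readmitted_flag"]
      .agg(["mean", "size"])
)
print(by_age)
```

Reporting the count next to the rate matters here: a bracket with very few patients can show an extreme rate by chance.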

💊

Medication changes matter

45% of patients had their medication changed during the visit, and 75% were on diabetes medication. Both factors are correlated with readmission risk.
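Associations like these can be checked by comparing readmission rates across the values of each medication variable. The sketch below assumes hypothetical columns "change" ("Ch" or "No") and "diabetesMed" ("Yes" or "No"), modeled on the UCI dataset's coding, plus a 0/1 "readmitted_flag"; the rows are toy data.

```python
import pandas as pd

# Hypothetical toy rows; column names and codes are assumptions modeled on
# the UCI diabetes dataset, not the project's confirmed schema.
df = pd.DataFrame({
    "change":          ["Ch", "No", "Ch", "No", "Ch", "No", "No", "Ch"],
    "diabetesMed":     ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"],
    "readmitted_flag": [1, 0, 1, 0, 0, 0, 1, 1],
})

# Share of patients in each group
print(df["change"].value_counts(normalize=True))
print(df["diabetesMed"].value_counts(normalize=True))

# Readmission rate within each group: mean of the 0/1 flag
rate_by_change = df.groupby("change")["readmitted_flag"].mean()
rate_by_med = df.groupby("diabetesMed")["readmitted_flag"].mean()
print(rate_by_change)
print(rate_by_med)
```

A gap between group rates shows association only; it does not separate the effect of a medication change from the underlying severity that prompted it.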

🧪

Testing gaps are significant

82% of patients had no HbA1c test recorded, and 95% had no serum glucose test recorded, even though every patient in the dataset is diabetic.
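Missing-test shares like these come from counting rows with no recorded result. This sketch assumes hypothetical columns "A1Cresult" and "max_glu_serum", and that an untested row appears either as a missing value or as the literal string "None" (depending on the pandas version, "None" in a CSV may already be parsed as missing, so the helper checks both). The rows are toy data.

```python
import pandas as pd

# Hypothetical toy rows. Untested rows may appear as missing values or as the
# literal string "None", depending on how the CSV was parsed.
df = pd.DataFrame({
    "A1Cresult":     ["None", ">8", None, "None", "Norm"],
    "max_glu_serum": ["None", "None", None, ">200", "None"],
})

def share_untested(series: pd.Series) -> float:
    """Fraction of rows with no recorded test result (missing or "None")."""
    missing = series.isna() | (series == "None")
    return float(missing.mean())

for col in ["A1Cresult", "max_glu_serum"]:
    print(f"{col}: {share_untested(df[col]):.0%} untested")
```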

๐Ÿฅ

Prior admissions are telling

The number of prior inpatient visits is positively correlated with readmission: patients with a history of hospitalization are more likely to return.
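A simple way to quantify this relationship is the Pearson correlation between the visit count and a 0/1 readmission flag (with a binary variable this is the point-biserial correlation), alongside the readmission rate for each visit count. A minimal sketch on invented toy rows, with "number_inpatient" and "readmitted_flag" as assumed column names.

```python
import pandas as pd

# Hypothetical toy rows; "number_inpatient" (prior inpatient visits) and
# "readmitted_flag" (1 = readmitted) are assumed column names.
df = pd.DataFrame({
    "number_inpatient": [0, 0, 1, 0, 2, 3, 1, 0, 4, 2],
    "readmitted_flag":  [0, 0, 1, 0, 1, 1, 0, 0, 1, 1],
})

# Pearson r against a 0/1 target is the point-biserial correlation
r = df["number_inpatient"].corr(df["readmitted_flag"])
print(f"Pearson r = {r:.2f}")

# Readmission rate by prior-visit count; clip(upper=3) folds 3+ into one group
visits = df["number_inpatient"].clip(upper=3)
rate_by_visits = df.groupby(visits)["readmitted_flag"].mean()
print(rate_by_visits)
```

Capping the count keeps the rare high-visit patients from producing tiny, noisy groups.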

Tools & Pipeline

  1. Data Exploration
    Loaded and profiled 69,570 records with Pandas. Identified class imbalance, missing test data, and key demographic distributions.
  2. Feature Engineering: AWS SageMaker Data Wrangler
    Applied transformations, handled categorical encoding, and exported a clean feature set to S3 via a SageMaker Processing Job.
  3. AutoML: AWS SageMaker Autopilot
    Ran automated model selection and hyperparameter tuning. Autopilot generated candidate models and a full data exploration notebook.
  4. Inference & Visualization
    Ran batch inference on unlabeled data to obtain prediction probabilities. Built a six-panel visualization dashboard with Matplotlib and Seaborn.
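The Data Wrangler step's exact transforms are not listed here. As a local approximation only, the sketch below shows the kind of categorical encoding described, using Pandas: two-valued columns mapped to 0/1 and a multi-valued column one-hot encoded. Column names, category codes, and the choice to collapse the target to binary (readmitted vs. not) are all assumptions; the project may have kept the three readmission classes.

```python
import pandas as pd

# Hypothetical raw columns; names and category codes are assumptions.
raw = pd.DataFrame({
    "race":        ["Caucasian", "AfricanAmerican", "Caucasian", "Other"],
    "change":      ["Ch", "No", "No", "Ch"],
    "diabetesMed": ["Yes", "Yes", "No", "Yes"],
    "readmitted":  ["NO", "<30", ">30", "NO"],
})

features = raw.copy()
# Two-valued columns become 0/1 integers
features["change"] = (features["change"] == "Ch").astype(int)
features["diabetesMed"] = (features["diabetesMed"] == "Yes").astype(int)
# Assumed binary target: 1 if readmitted at any point, else 0
features["readmitted"] = (features["readmitted"] != "NO").astype(int)
# Multi-valued nominal column is one-hot encoded (one 0/1 column per category)
features = pd.get_dummies(features, columns=["race"], prefix="race", dtype=int)

print(features)
```

One-hot encoding suits nominal categories like race because it imposes no artificial order, unlike mapping categories to arbitrary integers.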

Visualization Dashboard

Six charts generated with Python (Matplotlib and Seaborn), covering readmission distribution, age trends, race breakdown, length of hospital stay, medication density, and feature correlations.
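A 2×3 grid of subplots is one straightforward way to lay out six panels like these. The sketch below uses plain Matplotlib so it runs without Seaborn; the panel contents, column names, and toy data are assumptions, not the project's actual code or figures.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; column names are assumptions, not the real schema.
df = pd.DataFrame({
    "readmitted":       ["NO", ">30", "<30", "NO", ">30", "NO"],
    "age":              [55, 68, 77, 81, 72, 64],
    "race":             ["Caucasian", "AfricanAmerican", "Caucasian",
                         "Other", "Caucasian", "Asian"],
    "time_in_hospital": [2, 5, 7, 3, 4, 1],
    "num_medications":  [10, 15, 22, 8, 18, 6],
    "number_inpatient": [0, 1, 3, 0, 2, 0],
})
df["readmitted_flag"] = (df["readmitted"] != "NO").astype(int)

fig, axes = plt.subplots(2, 3, figsize=(15, 8))
ax = axes.ravel()  # flatten the 2x3 grid so panels are indexed 0..5

# 1. Outcome class counts
df["readmitted"].value_counts().plot.bar(ax=ax[0], title="Readmission distribution")

# 2. Age distribution
ax[1].hist(df["age"], bins=5)
ax[1].set_title("Age distribution")

# 3. Patients per race category
df["race"].value_counts().plot.barh(ax=ax[2], title="Race breakdown")

# 4. Length of stay
ax[3].hist(df["time_in_hospital"], bins=5)
ax[3].set_title("Days in hospital")

# 5. Medication count as a normalized histogram (density=True)
ax[4].hist(df["num_medications"], bins=5, density=True)
ax[4].set_title("Medication density")

# 6. Pearson correlation matrix of the numeric columns
cols = ["age", "time_in_hospital", "num_medications",
        "number_inpatient", "readmitted_flag"]
corr = df[cols].corr()
im = ax[5].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax[5].set_xticks(range(len(cols)))
ax[5].set_xticklabels(cols, rotation=45, ha="right")
ax[5].set_yticks(range(len(cols)))
ax[5].set_yticklabels(cols)
ax[5].set_title("Feature correlations")
fig.colorbar(im, ax=ax[5])

fig.tight_layout()
# fig.savefig("dashboard.png", dpi=150)  # hypothetical output path
```

With Seaborn available, panels such as the correlation matrix would typically be drawn with its heatmap function instead of raw imshow calls.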

Figure: Diabetic Readmission Visualization Dashboard

Interested in this project?

Let's connect on LinkedIn or check out more of my work on GitHub.