TargetRWE Technical Product Owner May 2025 – Nov 2025

Standing up a patient chart viewer from zero to MVP

Real-world evidence (RWE) platform leveraging EMR, claims-linked longitudinal data, and health registries to power pharmaceutical research across a portfolio of liver-disease indications.

-45%

Manual clinical-review time via LLM-powered extraction, accelerating trial startup by ~3 weeks.

12 + 8

Feasibility studies and prospective trials delivered with 100% on-time pharma package delivery.

MVP shipped

Patient Chart Viewer live with stakeholder buy-in; cohort identification moved from batch SAS refreshes to real-time Sigma dashboards.

Context

The setup

TargetRWE builds real-world evidence infrastructure for pharmaceutical sponsors running studies across a portfolio of liver-disease indications. The data estate spans EMR charts and health registries across a research network of tier-1 academic medical centers, with patient records tokenized and linked to external claims datasets via industry-standard vendors (Datavant-style DOD linking) for longitudinal cohort studies and NLP model validation at scale. Standards alignment: FHIR, OMOP, DICOM.

When I joined, the Patient Chart Viewer didn't exist. Data scientists and clinical analysts worked through unstructured EMR notes with ad-hoc SQL and spreadsheets, with no unified workflow for ground truthing, annotation review, or cohort definition. Sigma dashboards on Snowflake (ODS + EDW) required manual SAS refreshes before a cohort could be materialized.

My role sat at the intersection of engineering, data science, and medical affairs: translating clinical hypotheses into productized data pipelines and acting as the risk manager of credibility for what reached pharma sponsors. Most of my contribution was upstream of code: scoping schemas, writing validation specs, and defining the acceptance criteria that engineering and DS implemented.

Problem

What was broken

Clinical NLP models needed structured ground truth at scale, specifically for liver-disease endpoints (biopsy findings, alcohol-use mentions, fibrosis progression, medication history), but there was no operator-friendly interface to label unstructured EMR records.

Cohort identification was hypothesis-driven, but the tooling was batch-oriented. A data scientist filed a SAS request, waited hours, reviewed results in a spreadsheet, then iterated, turning what should have been a 10-minute hypothesis test into a multi-day loop.

Linked EMR + claims data needed data-quality contracts and an AI model build/validate process before pharma sponsors would trust downstream analytics.

A distributed engineering + data science team across US time zones needed clear requirements, acceptance criteria, and a rhythm for standups, grooming, and retrospectives.

Approach

What I did

0→1 RWE product definition. Drove requirements gathering across clinical informatics, biostatistics, data science, and engineering to scope the MVP. Owned the technical architecture, the data model for linked EMR + claims, the viewer components, and the annotation interactions, and presented the design to pharma-facing stakeholders for buy-in.
LLM operations for clinical data extraction. Architected LLM-powered pipelines for extracting structured data from unstructured EMR notes (liver biopsy findings, alcohol-use mentions, medication history, lab values). Cut manual review time ~45% and accelerated trial startup by ~3 weeks on average across studies.
AI/ML ground truthing workflows. Designed annotation guidelines and labeling workflows (Encord) for clinical NLP models processing unstructured liver-disease EMR data. Defined inter-annotator agreement thresholds and review loops for biopsy, alcohol-use, and medication-history extraction feeding downstream NLP validation.
Multi-site ingestion + PII-safe landing zone. Owned the S3 → ODS → Snowflake EDW pipeline with Presidio-based PII redaction, automated QC, duplicate-patient quarantine logic, and cross-site person-domain matching. Aligned with healthcare standards: FHIR, OMOP, and DICOM.
Snowflake + Sigma + SAS cohort pipeline. Built Sigma BI dashboards on Snowflake ODS and EDW, automating SAS refreshes so analysts could run hypothesis-driven cohort queries in real time instead of in batch. Tokenized patient records and linked them to external claims datasets (Datavant-style DOD linking) for longitudinal views.
Data quality + AI model validation. Codified data quality contracts for the EMR + claims pipeline and the AI model build/validate process, with traceability from ingest → curation → model output, plus QC sanity checks and unit-test gating before release.
Agile rhythm for a distributed team. Led cross-functional standups, retrospectives, and backlog grooming for engineering + data science, translating clinical hypotheses into user stories, acceptance criteria, and release trains.
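
To make the data quality contract idea concrete, here is a minimal sketch of a contract expressed as named checks with tolerated failure rates, where a batch is released only if every check stays within tolerance. Everything in it is an assumption for illustration: the field names (patient_token, site_id, lab_value), the tolerances, and the helper names are hypothetical, not the production rules.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical record shape: one dict per patient-lab observation.
Row = dict


@dataclass(frozen=True)
class Check:
    name: str
    predicate: Callable[[Row], bool]  # True when the row satisfies the rule
    max_failure_rate: float           # tolerated share of failing rows


def evaluate(rows: list[Row], checks: list[Check]) -> dict[str, float]:
    """Return the observed failure rate for each check."""
    total = len(rows)
    rates = {}
    for check in checks:
        failures = sum(1 for row in rows if not check.predicate(row))
        rates[check.name] = failures / total if total else 0.0
    return rates


def release_allowed(rows: list[Row], checks: list[Check]) -> bool:
    """Gate a batch: an empty batch or any check over tolerance blocks release."""
    if not rows:
        return False
    rates = evaluate(rows, checks)
    return all(rates[c.name] <= c.max_failure_rate for c in checks)


# Illustrative contract; the tolerances are assumptions.
CHECKS = [
    Check("patient_token_present", lambda r: bool(r.get("patient_token")), 0.0),
    Check("site_id_present", lambda r: bool(r.get("site_id")), 0.0),
    Check("lab_value_present", lambda r: r.get("lab_value") is not None, 0.05),
]

SAMPLE_ROWS = [
    {"patient_token": "t1", "site_id": "A", "lab_value": 1.2},
    {"patient_token": "t2", "site_id": "A", "lab_value": None},
    {"patient_token": "t3", "site_id": "B", "lab_value": 0.8},
    {"patient_token": "t4", "site_id": "B", "lab_value": 1.0},
]

if __name__ == "__main__":
    print(evaluate(SAMPLE_ROWS, CHECKS))
    # False: lab_value_present fails 25% of rows, above the 5% tolerance
    print(release_allowed(SAMPLE_ROWS, CHECKS))
```

Keeping each rule as data (a name, a predicate, a tolerance) rather than as inline code means the same contract can be reviewed by medical affairs and run as a unit-test gate by engineering.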

Outcome

What moved

-45%

Manual clinical-review time via LLM-powered extraction, accelerating trial startup by ~3 weeks.

12 + 8

Feasibility studies and prospective trials delivered with 100% on-time pharma package delivery.

MVP shipped

Patient Chart Viewer live with stakeholder buy-in; cohort identification moved from batch SAS refreshes to real-time Sigma dashboards.

NLP-ready

Annotation guidelines adopted as the ground-truth standard for downstream clinical NLP model validation.
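
The inter-annotator agreement thresholds defined for the labeling workflows can be pictured as a simple gate on an agreement statistic. The sketch below assumes Cohen's kappa for two annotators and a threshold of 0.8; both are assumptions for illustration, as are the label values and function names. The actual statistic and thresholds used in the project are not stated here.

```python
from collections import Counter

# Hypothetical threshold; the project's real value is not stated.
KAPPA_THRESHOLD = 0.8


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def needs_adjudication(labels_a: list[str], labels_b: list[str]) -> bool:
    """Route a batch to a review loop when agreement is below the threshold."""
    return cohens_kappa(labels_a, labels_b) < KAPPA_THRESHOLD


# Illustrative labels for an alcohol-use mention task.
annotator_a = ["present", "present", "absent", "absent",
               "present", "absent", "absent", "present"]
annotator_b = ["present", "absent", "absent", "absent",
               "present", "absent", "present", "present"]

if __name__ == "__main__":
    print(cohens_kappa(annotator_a, annotator_b))        # 0.5
    print(needs_adjudication(annotator_a, annotator_b))  # True
```

Raw percent agreement here is 75%, but kappa is 0.5 once chance agreement is removed, which is why a chance-corrected statistic is the more conservative choice for a ground-truth gate.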

Stack

Built with

Snowflake (ODS + EDW), Sigma BI, SAS, LLM operations, Clinical NLP, Encord annotation, Presidio PII redaction, Datavant-style tokenization, FHIR / OMOP / DICOM, Agile / Scrum

Reflections

What I learned

The risks in RWE don't shout, they whisper. They show up as protocol drift (eligibility quietly stretched), incomplete provenance (data pulled from multiple systems without clear traceability), PHI creep (identifiers slipping back into datasets despite de-identification), unchecked assumptions ("everyone knows how this variable is collected" until you find out they don't), and stakeholder misalignment (sponsors expecting trial-like rigor while operational teams adapt to real-world flexibility). Most of my job here was making those whispers visible early and turning each into a validation rule, schema decision, or phased delivery plan.
The trade-off I wrestled with most. Speed versus rigor on clinical-variable standardization. Engineering wanted raw processor outputs in the warehouse fast; medical affairs wanted full standardization first. I usually landed on phased delivery: promote the safe categorical layer now, standardize the noisier quantitative layer next cycle, and flag low-confidence outputs explicitly. That kept sponsor timelines intact without spending data scientists' trust.
The detective work that mattered. When sponsor analysts flagged missing values on a standard clinical lab, the kind that should appear on every chemistry panel, the investigation meant comparing ODS to EDW patient-by-patient, breaking discrepancies out by site and by cycle date, and finding a promotion-logic bug well before the issue climbed the sponsor escalation curve. The pattern was clear enough that I started treating "lab missingness investigations" as a recurring product surface with its own query schema, sampling protocol, and known root-cause taxonomy.
What I'd do differently. I underestimated how much of the job was translation between engineering's idea of "done," a data scientist's idea of "usable," and a sponsor's idea of "defensible." If I did this again, I'd build the validation-criteria template on day one, before the pipeline rather than after.
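
The ODS-to-EDW comparison described in the lab missingness investigation can be sketched as follows. This is an illustrative reconstruction under assumptions, not the queries actually used: the row shape (patient_token, site_id, cycle_date, lab_value) and the function name are hypothetical, and the real comparison would run against Snowflake rather than in-memory lists.

```python
from collections import defaultdict


def missing_after_promotion(ods_rows: list[dict], edw_rows: list[dict]) -> dict:
    """Find lab values present in ODS but missing in EDW.

    Returns a mapping of (site_id, cycle_date) -> list of affected patient tokens.
    """
    # Index EDW values by patient and cycle so each ODS row can be looked up.
    edw_values = {
        (row["patient_token"], row["cycle_date"]): row["lab_value"]
        for row in edw_rows
    }
    dropped = defaultdict(list)
    for row in ods_rows:
        if row["lab_value"] is None:
            continue  # missing at source, so not a promotion problem
        key = (row["patient_token"], row["cycle_date"])
        # A None result covers both an absent EDW row and a null EDW value.
        if edw_values.get(key) is None:
            dropped[(row["site_id"], row["cycle_date"])].append(row["patient_token"])
    return dict(dropped)


# Illustrative data only.
ODS = [
    {"patient_token": "p1", "site_id": "A", "cycle_date": "2025-06", "lab_value": 1.2},
    {"patient_token": "p2", "site_id": "A", "cycle_date": "2025-06", "lab_value": 0.9},
    {"patient_token": "p3", "site_id": "B", "cycle_date": "2025-06", "lab_value": None},
    {"patient_token": "p4", "site_id": "B", "cycle_date": "2025-07", "lab_value": 1.1},
]
EDW = [
    {"patient_token": "p1", "site_id": "A", "cycle_date": "2025-06", "lab_value": 1.2},
    {"patient_token": "p2", "site_id": "A", "cycle_date": "2025-06", "lab_value": None},
    {"patient_token": "p3", "site_id": "B", "cycle_date": "2025-06", "lab_value": None},
]

if __name__ == "__main__":
    for (site, cycle), patients in sorted(missing_after_promotion(ODS, EDW).items()):
        print(site, cycle, patients)
```

Separating values that were already missing in ODS from values lost during promotion is what points the investigation at promotion logic rather than at source data, and grouping by site and cycle date shows whether the loss is localized.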