23 APR 2026

rahulmnavneeth

pipeline

Every tracker is a composition of stages:

Source ──► Detector ──► Associator ──► Filter ──► Smoother ──► Sink
frames     per-frame    bind           recursive  (optional    OSC / MIDI
           detections   detections     estimate   offline)     / shm / CLAP
                        to tracks

Source — camera or video file. Emits Frame values with a host-monotonic timestamp.

Detector — converts a frame into zero-or-more measurements (points, boxes, landmarks, masks). Typically a neural net invoked through the runtime inference layer. Stateless across frames.

Associator — given the current measurements and the current set of tracked instances, decides which measurement belongs to which track. Typically Hungarian assignment over a cost built from IoU, Mahalanobis distance, or feature similarity. Returns (measurement, track_id) pairs plus lists of newly born and dead tracks.
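
As a rough illustration of the association step, the sketch below gates candidate pairs by IoU and matches them greedily, best overlap first. This is a deliberately simpler stand-in for Hungarian assignment, which finds the globally optimal pairing. The names BBox, Association, associate, and min_iou are invented for this example and are not taken from the codebase. Unmatched tracks are only reported here; a real tracker would usually wait for several misses before declaring a track dead.

```rust
/// Axis-aligned box. Hypothetical type for this sketch.
#[derive(Clone, Copy, Debug)]
pub struct BBox {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

impl BBox {
    fn area(&self) -> f32 {
        (self.x1 - self.x0).max(0.0) * (self.y1 - self.y0).max(0.0)
    }
}

/// Intersection over union, in [0, 1].
pub fn iou(a: &BBox, b: &BBox) -> f32 {
    let w = (a.x1.min(b.x1) - a.x0.max(b.x0)).max(0.0);
    let h = (a.y1.min(b.y1) - a.y0.max(b.y0)).max(0.0);
    let inter = w * h;
    let union = a.area() + b.area() - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Result of one association step (hypothetical shape).
#[derive(Debug, Default)]
pub struct Association {
    /// (measurement index, track_id) pairs.
    pub matches: Vec<(usize, u64)>,
    /// Measurement indices with no track: candidates for new tracks.
    pub births: Vec<usize>,
    /// Track ids with no measurement this frame: candidates for death.
    pub unmatched_tracks: Vec<u64>,
}

/// Greedy IoU matching. NOT Hungarian: it takes the best remaining pair
/// each time, which can be suboptimal when boxes overlap heavily.
pub fn associate(measurements: &[BBox], tracks: &[(u64, BBox)], min_iou: f32) -> Association {
    // Collect every candidate pair that clears the IoU gate.
    let mut pairs: Vec<(f32, usize, usize)> = Vec::new();
    for (m, mb) in measurements.iter().enumerate() {
        for (t, (_, tb)) in tracks.iter().enumerate() {
            let score = iou(mb, tb);
            if score >= min_iou {
                pairs.push((score, m, t));
            }
        }
    }
    // Highest overlap first.
    pairs.sort_by(|a, b| b.0.total_cmp(&a.0));

    let mut m_used = vec![false; measurements.len()];
    let mut t_used = vec![false; tracks.len()];
    let mut out = Association::default();
    for (_, m, t) in pairs {
        if !m_used[m] && !t_used[t] {
            m_used[m] = true;
            t_used[t] = true;
            out.matches.push((m, tracks[t].0));
        }
    }
    out.births = (0..measurements.len()).filter(|&m| !m_used[m]).collect();
    out.unmatched_tracks = tracks
        .iter()
        .enumerate()
        .filter(|(t, _)| !t_used[*t])
        .map(|(_, (id, _))| *id)
        .collect();
    out
}
```

Swapping IoU for Mahalanobis distance or feature similarity only changes how the score is computed; the gating and matching logic stays the same.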

Filter — recursive estimator. For vector states, a standard Kalman filter; for group states (SE(3), SL(3), etc.), the EKF-on-manifold in uify-core::filters::ekf_manifold. Produces the online sample stream.
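
For intuition, the vector-state case reduces in one dimension to a few lines. The sketch below is a scalar Kalman filter with a random-walk motion model (the state is assumed constant apart from process noise). Kalman1D and its fields are invented for illustration and are not part of uify-core; the manifold EKF is not shown here.

```rust
/// Scalar Kalman filter with a random-walk model: x_k = x_{k-1} + noise.
/// Hypothetical type for this sketch.
#[derive(Clone, Copy, Debug)]
pub struct Kalman1D {
    /// Current state estimate.
    pub x: f64,
    /// Variance of the estimate.
    pub p: f64,
    /// Process noise variance (how fast the true state may drift).
    pub q: f64,
    /// Measurement noise variance.
    pub r: f64,
}

impl Kalman1D {
    pub fn new(x0: f64, p0: f64, q: f64, r: f64) -> Self {
        Kalman1D { x: x0, p: p0, q, r }
    }

    /// Predict: the state is carried over, the uncertainty grows by q.
    pub fn predict(&mut self) {
        self.p += self.q;
    }

    /// Update with measurement z. Returns the Kalman gain that was applied.
    pub fn update(&mut self, z: f64) -> f64 {
        let k = self.p / (self.p + self.r);
        self.x += k * (z - self.x);
        self.p *= 1.0 - k;
        k
    }

    /// One recursive step: predict, then update. Returns the new estimate.
    pub fn step(&mut self, z: f64) -> f64 {
        self.predict();
        self.update(z);
        self.x
    }
}
```

Each call to step consumes one measurement and yields one estimate, which is what makes the filter suitable for the online stream: it never needs to look at future frames.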

Smoother — optional backward pass (RTS on the manifold) used offline for post-processing, rotoscope refinement, and dataset cleanup.
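
The backward pass can be illustrated in the same scalar, random-walk setting. Note that this is a plain vector-space Rauch-Tung-Striebel smoother, not the manifold version named above; rts_smooth and its parameters are hypothetical, and the sketch assumes p0 > 0 and r > 0 so that no variance collapses to zero.

```rust
/// Scalar RTS smoother for a random-walk model (x_k = x_{k-1} + noise).
/// Returns one (mean, variance) pair per measurement.
/// Assumes p0 > 0 and r > 0. Hypothetical helper for this sketch.
pub fn rts_smooth(zs: &[f64], x0: f64, p0: f64, q: f64, r: f64) -> Vec<(f64, f64)> {
    // Forward pass: ordinary Kalman filter, keeping every filtered estimate.
    let mut filtered: Vec<(f64, f64)> = Vec::with_capacity(zs.len());
    let (mut x, mut p) = (x0, p0);
    for &z in zs {
        p += q; // predict
        let k = p / (p + r); // gain
        x += k * (z - x); // update mean
        p *= 1.0 - k; // update variance
        filtered.push((x, p));
    }

    // Backward pass: the last filtered estimate is already the smoothed one;
    // walk backwards and correct each earlier estimate using its successor.
    let mut smoothed = filtered.clone();
    for k in (0..zs.len().saturating_sub(1)).rev() {
        let (xf, pf) = filtered[k];
        let p_pred = pf + q; // predicted variance at k + 1
        let c = pf / p_pred; // smoother gain
        let (xs_next, ps_next) = smoothed[k + 1];
        // For a random walk, the predicted mean at k + 1 equals xf.
        smoothed[k] = (xf + c * (xs_next - xf), pf + c * c * (ps_next - p_pred));
    }
    smoothed
}
```

Because every smoothed estimate depends on later measurements, this pass only makes sense once the whole sequence is available, which is why it is an offline stage.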

Sink — emits Sample<G, C> values to consumers via a transport or directly into the CLAP plugin's ring buffer.

Swappability

Each stage is a trait. Swapping, say, the detector from MediaPipe Hands to a distilled custom model does not touch the filter or the sink. Swapping the filter from an EKF to a UKF does not touch the detector. This is the payoff for the uniform Sample<G, C> contract at every stage boundary.
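
A minimal sketch of the idea, under heavy assumptions: the trait names, method signatures, Frame, and the toy stages below are all invented for illustration and do not reflect the actual uify traits or the real Sample<G, C> type. The associator is omitted, so every measurement feeds a single filter.

```rust
/// Hypothetical frame type for this sketch.
pub struct Frame {
    pub pixels: Vec<u8>,
}

/// Stateless per-frame detector (hypothetical trait shape).
pub trait Detector {
    type Measurement;
    fn detect(&self, frame: &Frame) -> Vec<Self::Measurement>;
}

/// Recursive estimator over measurements of type M (hypothetical trait shape).
pub trait Filter<M> {
    type Output;
    fn step(&mut self, measurement: &M) -> Self::Output;
}

/// The pipeline is generic over its stages, so either can be swapped alone.
pub struct Pipeline<D, F> {
    detector: D,
    filter: F,
}

impl<D: Detector, F: Filter<D::Measurement>> Pipeline<D, F> {
    pub fn new(detector: D, filter: F) -> Self {
        Pipeline { detector, filter }
    }

    pub fn process(&mut self, frame: &Frame) -> Vec<F::Output> {
        let measurements = self.detector.detect(frame);
        let filter = &mut self.filter;
        measurements.iter().map(|m| filter.step(m)).collect()
    }
}

/// Toy detector: mean pixel value, or nothing for an empty frame.
pub struct MeanDetector;
impl Detector for MeanDetector {
    type Measurement = f64;
    fn detect(&self, frame: &Frame) -> Vec<f64> {
        if frame.pixels.is_empty() {
            return Vec::new();
        }
        let sum: f64 = frame.pixels.iter().map(|&p| p as f64).sum();
        vec![sum / frame.pixels.len() as f64]
    }
}

/// Toy detector: maximum pixel value, or nothing for an empty frame.
pub struct MaxDetector;
impl Detector for MaxDetector {
    type Measurement = f64;
    fn detect(&self, frame: &Frame) -> Vec<f64> {
        frame.pixels.iter().max().map(|&p| p as f64).into_iter().collect()
    }
}

/// Toy filter: forwards measurements unchanged.
pub struct PassThrough;
impl Filter<f64> for PassThrough {
    type Output = f64;
    fn step(&mut self, m: &f64) -> f64 {
        *m
    }
}

/// Toy filter: exponential moving average, seeded by the first measurement.
pub struct Ema {
    alpha: f64,
    state: Option<f64>,
}
impl Ema {
    pub fn new(alpha: f64) -> Self {
        Ema { alpha, state: None }
    }
}
impl Filter<f64> for Ema {
    type Output = f64;
    fn step(&mut self, m: &f64) -> f64 {
        let next = match self.state {
            None => *m,
            Some(s) => s + self.alpha * (*m - s),
        };
        self.state = Some(next);
        next
    }
}
```

Pipeline::new(MeanDetector, PassThrough) and Pipeline::new(MaxDetector, Ema::new(0.5)) both compile against the same process method: changing one type argument never forces a change in the other stage, because the only coupling is the measurement type at the boundary.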

Where stages live