The context
Garment quality control on the factory floor is still done by hand. An inspector lays each piece on a table, runs a tape measure against a spec sheet, and makes a call. The work is slow, the results vary from operator to operator, borderline pieces get judged inconsistently, and there is no clean audit trail or shareable proof for buyers. Photo-based shortcuts do not fix it either, because a flat pixel measurement changes with camera distance and angle. We set out to replace that manual step with a depth-aware measurement engine wrapped in factory-grade workflows: configure the standard once from a golden sample, then inspect every production piece against it in seconds.
The problem, precisely
Manual measurement is slow and subjective. Inspectors measure each garment by hand against a spec sheet. Results drift between operators, borderline calls are inconsistent, and a high-volume line cannot keep up without sacrificing rigour.
Pixel measurements lie. A distance measured in a flat 2D photo depends on how far away the camera sits and at what angle, so the same garment reads differently from shot to shot. True-to-life centimetre accuracy needs real-world 3D geometry, not pixels.
No defensible standard or audit trail. Without a captured reference standard, every inspection is a fresh judgement call. There is no objective record of why a piece passed or failed, and no easy way to share a verifiable report with a buyer or auditor.
CV output does not speak the factory's language. A measurement model emits opaque keys and a raw graph of distances. Each factory has its own named Points of Measure and tolerances, so the engine's output has to be mapped to business meaning without retraining the model per customer.
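The distance-and-angle problem with pixel measurements follows directly from the standard pinhole camera model. As a sketch that ignores lens distortion (notation ours: f is the focal length in pixels, L the real length of a garment edge, Z its distance from the camera, θ its tilt out of the image plane), the projected pixel length is:

```latex
\ell_{\mathrm{px}} = \frac{f\,L}{Z},
\qquad
\ell_{\mathrm{px}}\!\left(\tfrac{Z}{2}\right) = 2\,\ell_{\mathrm{px}}(Z),
\qquad
\ell_{\mathrm{px}}(\theta) \approx \frac{f\,L\cos\theta}{Z} \quad (L \ll Z)
```

Moving the camera from 80 cm to 40 cm doubles the pixel length of the same 50 cm chest, and tilting the garment 30° out of the image plane shrinks it by roughly 13% (cos 30° ≈ 0.866). Only knowing Z for each point removes both effects.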
What we built ✓ verified in code
Depth-aware measurement engine
A FastAPI service drives an Intel RealSense depth camera, captures aligned colour and depth frames, and back-projects each detected 2D garment landmark into a real-world 3D point using the depth map and camera intrinsics. Every measurement is the Euclidean distance between two such 3D points, expressed in centimetres, so unlike a pixel count it does not grow or shrink as the camera moves closer to or farther from the garment.
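The back-projection step can be sketched with the undistorted pinhole model. This is an illustration, not the service's code: the `Intrinsics` container and function names are hypothetical, though the field names mirror RealSense's `rs2_intrinsics`. In practice pyrealsense2's `rs2_deproject_pixel_to_point` performs the same conversion and also accounts for the camera's distortion model.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (hypothetical container, RealSense-style field names)."""
    fx: float   # focal length along x, in pixels
    fy: float   # focal length along y, in pixels
    ppx: float  # principal point x, in pixels
    ppy: float  # principal point y, in pixels


def deproject(intr: Intrinsics, u: float, v: float, depth_m: float) -> Point3D:
    """Back-project pixel (u, v) with its aligned depth (metres) to a 3D point in metres.

    Assumes an undistorted image: no lens-distortion correction is applied.
    """
    x = (u - intr.ppx) / intr.fx * depth_m
    y = (v - intr.ppy) / intr.fy * depth_m
    return (x, y, depth_m)


def distance_cm(p: Point3D, q: Point3D) -> float:
    """Straight-line (Euclidean) distance between two 3D points, in centimetres."""
    return math.dist(p, q) * 100.0


if __name__ == "__main__":
    intr = Intrinsics(fx=600.0, fy=600.0, ppx=320.0, ppy=240.0)
    # The same 32 cm span seen from 0.8 m (240 px wide) and from 1.6 m (120 px wide).
    near = distance_cm(deproject(intr, 200, 240, 0.8), deproject(intr, 440, 240, 0.8))
    far = distance_cm(deproject(intr, 260, 240, 1.6), deproject(intr, 380, 240, 1.6))
    print(round(near, 2), round(far, 2))  # 32.0 32.0
```

The pixel span halves when the camera distance doubles, yet both readings come out at 32 cm, which is the property the depth path relies on.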
Automated CV model chain
A four-stage pipeline locates measurement points automatically across nine garment types: a fine-tuned TinyViT classifier identifies the garment, BiRefNet segments it from the background, an HRNet pose network detects the landmarks, and the chain then surfaces a complete graph of landmark-to-landmark distances. Emitting every pairwise distance, rather than a fixed list of named measurements, keeps the vision layer decoupled from any single factory's measurement vocabulary.
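One way to picture the orchestration is as a chain of pluggable stages ending in a complete distance graph. This is a minimal sketch under stated assumptions: the stage callables stand in for the TinyViT, BiRefNet, and HRNet models, and `run_chain`, `distance_graph`, and the `"a-b"` key format are hypothetical names, not the project's actual API.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


@dataclass
class ChainResult:
    garment_type: str
    landmarks: Dict[str, Point3D]
    distances_cm: Dict[str, float]  # complete graph: one entry per landmark pair


def distance_graph(landmarks: Dict[str, Point3D]) -> Dict[str, float]:
    """Every unordered landmark pair, keyed 'a-b' (names sorted), distance in cm."""
    return {
        f"{a}-{b}": math.dist(landmarks[a], landmarks[b]) * 100.0
        for a, b in itertools.combinations(sorted(landmarks), 2)
    }


def run_chain(
    image: Any,
    depth: Any,
    classify: Callable[[Any], str],
    segment: Callable[[Any], Any],
    detect: Callable[[Any, Any, str], Dict[str, Point2D]],
    lift: Callable[[float, float, Any], Point3D],
) -> ChainResult:
    garment_type = classify(image)                 # stand-in for the classifier stage
    mask = segment(image)                          # stand-in for the segmentation stage
    keypoints = detect(image, mask, garment_type)  # stand-in for landmark detection: {name: (u, v)}
    landmarks = {n: lift(u, v, depth) for n, (u, v) in keypoints.items()}  # back-projection
    return ChainResult(garment_type, landmarks, distance_graph(landmarks))
```

With n landmarks the graph has n(n-1)/2 entries. Nothing in it carries factory-specific names; turning keys into Points of Measure is left to the backend.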
Django QC system of record
A Django REST backend models the apparel catalog, Points of Measure, golden-sample seeds, per-SKU tolerances, and configurable quality rules. It maps the raw model keys to named POMs, compares each production garment against its seed within tolerance, tallies failures by priority, and returns a PASS, FAIL, or REVIEW verdict per garment and per batch.
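The compare-and-verdict logic can be sketched in plain Python, independent of Django. The percentage tolerance check follows the description above; the verdict rule itself is an assumption for illustration (too many CRITICAL or MEDIUM failures fails the piece, any tolerated CRITICAL or MEDIUM failure goes to REVIEW, INFO failures are reported only). `PomSpec`, `QualityRule`, and `evaluate_garment` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Priority(Enum):
    CRITICAL = "CRITICAL"
    MEDIUM = "MEDIUM"
    INFO = "INFO"


@dataclass(frozen=True)
class PomSpec:
    seed_cm: float        # value locked in from the golden sample
    tolerance_pct: float  # allowed deviation as a percentage of the seed
    priority: Priority


@dataclass(frozen=True)
class QualityRule:
    """Hypothetical per-SKU rule: how many failures of each priority are tolerated."""
    max_critical: int = 0
    max_medium: int = 2


@dataclass(frozen=True)
class PomResult:
    pom: str
    measured_cm: float
    deviation_pct: float
    passed: bool
    priority: Priority


def map_to_poms(raw: Dict[str, float], key_map: Dict[str, str]) -> Dict[str, float]:
    """Rename raw model keys to POM codes; unmapped keys are ignored."""
    return {key_map[k]: v for k, v in raw.items() if k in key_map}


def evaluate_garment(
    raw: Dict[str, float],
    key_map: Dict[str, str],
    specs: Dict[str, PomSpec],
    rule: QualityRule,
) -> Tuple[str, List[PomResult]]:
    measured = map_to_poms(raw, key_map)
    results: List[PomResult] = []
    for pom, spec in specs.items():
        if pom not in measured:
            raise ValueError(f"no measurement mapped to POM {pom!r}")
        value = measured[pom]
        deviation = abs(value - spec.seed_cm) / spec.seed_cm * 100.0
        results.append(PomResult(pom, value, deviation, deviation <= spec.tolerance_pct, spec.priority))

    # Tally failures by priority; INFO failures are reported but never change the verdict.
    failed = {p: 0 for p in Priority}
    for r in results:
        if not r.passed:
            failed[r.priority] += 1

    if failed[Priority.CRITICAL] > rule.max_critical or failed[Priority.MEDIUM] > rule.max_medium:
        verdict = "FAIL"
    elif failed[Priority.CRITICAL] > 0 or failed[Priority.MEDIUM] > 0:
        verdict = "REVIEW"  # within the allowance but not clean: route to a supervisor
    else:
        verdict = "PASS"
    return verdict, results
```

Returning the per-POM results alongside the verdict is what makes the breakdown auditable: each failure carries its measured value, deviation, and priority.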
Role-based factory workflows
Operator-facing apps, designed for non-technical floor staff, cover three roles: Admin sets up the standard and captures the golden sample, QC Operator runs line inspections, and Supervisor adjudicates borderline REVIEW cases with mandatory remarks. Every inspection gets a tokenised public link that can be shared as a QR code, and activity is recorded in an immutable log.
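The role checks, share tokens, and append-only log can be illustrated in a few lines. This is an in-memory sketch only; the permission names, `ActivityLog`, `new_share_link`, and the base URL are hypothetical, and a production system would enforce immutability at the database level rather than in a Python class.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List, Tuple


class Role(Enum):
    ADMIN = "admin"
    QC_OPERATOR = "qc_operator"
    SUPERVISOR = "supervisor"


# Hypothetical permission map; action names are illustrative.
PERMISSIONS = {
    Role.ADMIN: {"configure_standard", "capture_golden_sample"},
    Role.QC_OPERATOR: {"run_inspection"},
    Role.SUPERVISOR: {"adjudicate_review"},
}


def require(role: Role, action: str) -> None:
    """Raise if the role is not allowed to perform the action."""
    if action not in PERMISSIONS[role]:
        raise PermissionError(f"{role.value} may not {action}")


@dataclass(frozen=True)
class LogEntry:  # frozen: an entry cannot be modified after creation
    at: datetime
    actor: str
    action: str
    detail: str


class ActivityLog:
    """Append-only interface: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def append(self, actor: str, action: str, detail: str = "") -> LogEntry:
        entry = LogEntry(datetime.now(timezone.utc), actor, action, detail)
        self._entries.append(entry)
        return entry

    @property
    def entries(self) -> Tuple[LogEntry, ...]:
        return tuple(self._entries)  # read-only snapshot


def new_share_link(base_url: str = "https://qc.example.com/r/") -> Tuple[str, str]:
    """Unguessable public token (32 random bytes, URL-safe) and the link that embeds it."""
    token = secrets.token_urlsafe(32)
    return token, base_url + token
```

The link itself is what gets encoded into the QR code; because the token comes from a cryptographically secure source, sharing one report does not reveal the links to others.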
Zero-hardware single-photo path
A separate, lighter estimator uses OpenAI's GPT-4o-mini vision model to estimate garment dimensions, with a confidence score, from a single ordinary photo that includes a reference object for scale. It offers an alternative that needs no special hardware, suited to lower-fidelity use cases.
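The reference object works because of a simple ratio, sketched below. This shows the underlying principle only, not how the vision model computes its estimate; the function names and the credit-card example are illustrative assumptions.

```python
def cm_per_pixel(ref_length_cm: float, ref_length_px: float) -> float:
    """Scale factor from a reference object of known size measured in the same photo."""
    if ref_length_px <= 0:
        raise ValueError("reference length in pixels must be positive")
    return ref_length_cm / ref_length_px


def estimate_cm(length_px: float, scale_cm_per_px: float) -> float:
    """Convert a pixel length to centimetres using the reference scale.

    Only valid if the garment lies flat in roughly the same plane as the reference
    object and that plane faces the camera; depth or tilt differences reintroduce error.
    """
    return length_px * scale_cm_per_px


# A credit card (ID-1 format, 85.60 mm wide) spans 214 px in the photo.
scale = cm_per_pixel(8.56, 214)
chest_cm = estimate_cm(1250, scale)  # approximately 50 cm
```

The same-plane assumption is exactly where the pixel problem described earlier creeps back in, which is why this path is the lower-fidelity option next to the depth camera.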
How it works
1. Configure the standard once. An admin sets up the master data and maps the model's measurement keys to the factory's named Points of Measure per category, then captures the golden sample for a style and size. The system measures it, maps the values to POM codes, and locks it in as the reference seed.
2. Capture and measure on the line. A QC operator selects a style with a locked seed and captures the production garment. The depth camera grabs aligned colour and depth frames, and the model chain classifies, segments, and detects landmarks in a single orchestrated pass.
3. Back-project to true centimetres. Each 2D landmark is converted into a real-world 3D point using the aligned depth value and camera intrinsics, and every measurement becomes the true 3D distance in centimetres rather than a distance-sensitive pixel estimate.
4. Compare and verdict. The backend compares each measured POM against the seed within its configured percentage tolerance, weights failures by CRITICAL, MEDIUM, or INFO priority, applies the SKU's quality rule, and returns PASS, FAIL, or REVIEW with a full per-POM breakdown.
5. Route, review, and share. Borderline REVIEW cases route to a supervisor for an accept or reject decision with mandatory remarks. Batches close automatically once the target quantity is reached, and any inspection can be shared through a tokenised public report link.
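The routing rules in the final step, mandatory remarks on REVIEW decisions and automatic batch closure, can be sketched as small state checks. The names `Inspection`, `adjudicate`, and `Batch`, and the mapping of accept to PASS and reject to FAIL, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

VERDICTS = {"PASS", "FAIL", "REVIEW"}


@dataclass
class Inspection:
    garment_id: str
    verdict: str                 # engine verdict: "PASS", "FAIL" or "REVIEW"
    final: Optional[str] = None  # PASS/FAIL carry over; REVIEW waits for a supervisor
    remarks: str = ""

    def __post_init__(self) -> None:
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict {self.verdict!r}")
        if self.verdict != "REVIEW":
            self.final = self.verdict


def adjudicate(insp: Inspection, accept: bool, remarks: str) -> None:
    """Supervisor decision on a REVIEW case; remarks are mandatory."""
    if insp.verdict != "REVIEW":
        raise ValueError("only REVIEW cases are adjudicated")
    if not remarks.strip():
        raise ValueError("remarks are mandatory")
    insp.final = "PASS" if accept else "FAIL"
    insp.remarks = remarks.strip()


@dataclass
class Batch:
    target_qty: int
    inspections: List[Inspection] = field(default_factory=list)
    closed: bool = False

    def record(self, insp: Inspection) -> None:
        if self.closed:
            raise RuntimeError("batch is closed")
        self.inspections.append(insp)
        if len(self.inspections) >= self.target_qty:
            self.closed = True  # auto-close once the target quantity is reached

    def pending_review(self) -> List[Inspection]:
        """Inspections still waiting for a supervisor decision."""
        return [i for i in self.inspections if i.final is None]
```

Keeping the engine verdict and the final decision as separate fields preserves both the automated call and the human override for the audit trail.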
The outcome
The result is a repeatable, objective, auditable QC gate that turns a subjective manual step into a configurable, data-backed PASS, FAIL, or REVIEW decision. The standard is learned once from a golden sample, measurements are taken in true centimetres using depth and camera intrinsics, and verdicts are weighted by per-POM priority with automatic supervisor routing for borderline cases. The system is built for non-technical floor staff and offers two measurement modalities: a high-fidelity depth-camera path and a single-photo path that needs no special hardware. The measurement and inspection engine is implemented and ready to connect to the live camera rig.
