Inspiration

Vietnam's logistics sector loses an estimated USD 8–12 billion annually to operational inefficiencies, with total logistics costs consuming 16–18% of GDP, nearly double the global benchmark. Yet when we spoke with logistics operators, the problem they described wasn't a lack of data. They had spreadsheets, ERP exports, and CSV files everywhere. What they lacked was the ability to look at that data and answer one simple question: which step, right now, is abnormal, and how bad is it? Delays were only discovered after they had already become SLA breaches or demurrage invoices. Root causes were invisible.

The insight that drove Vyn was this: the data to detect bottlenecks already exists inside every logistics event log. What was missing was an automated intelligence layer that could read it without requiring labeled data, complex integration, or a data science team to operate it.

What it does

Vyn is an AI-powered Logistics Process Intelligence Platform that transforms raw event-log CSV data into actionable bottleneck alerts across three operational domains: Trucking Delivery, Warehouse Fulfillment, and Import Customs Clearance.

Users upload a CSV file. Vyn's AI-assisted schema mapping engine automatically aligns the column structure to its canonical schema, then runs two parallel inference streams. Stream A applies Isolation Forest to detect step-level process anomalies and compute a domain-specific risk score. Stream B runs role-specific classifiers (Driver AI, Fleet AI, and Ops AI) to assess behavioral risk at the entity level. The two streams converge into a single score, final_risk_score = 0.7 × process_risk + 0.3 × entity_risk, which is surfaced to the user as a plain-English risk dashboard with highlighted bottleneck steps, Z-score deviations, and prioritized entity flags. No labeled training data is required, and no ERP integration is needed.
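To make the fusion step concrete, here is a minimal sketch in Python, assuming both streams already emit scores normalized to the same 0–100 scale; the function name and example values are illustrative, not Vyn's actual code:

```python
def fuse_risk(process_risk: float, entity_risk: float) -> float:
    """Combine the two inference streams into the final risk score.

    Weights follow the 70/30 split described above; both inputs are
    assumed to be normalized to the same 0-100 scale.
    """
    return 0.7 * process_risk + 0.3 * entity_risk

# Example: a case with process anomaly score 42 and entity
# (Driver/Fleet/Ops) score 10 fuses to 32.4.
print(fuse_risk(42, 10))  # 32.4
```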
How we built it

The stack is a modular, three-layer pipeline. The frontend is built with React, TypeScript, and Tailwind CSS, using React Flow with node virtualization to render process diagrams across large datasets without frame drops. The backend runs on FastAPI with Pydantic for schema validation, connected to MongoDB for flexible document storage of process baselines, anomaly records, and entity profiles. The analytics engine uses Python with scikit-learn's Isolation Forest (n_estimators=200) and NumPy for feature engineering. The feature engineering pipeline encodes each execution case as a 6-feature vector (total duration, max step duration, standard deviation, top Z-score, mean Z-score, and step count), normalized with StandardScaler before inference; a sketch of this pipeline appears below. The schema mapping engine uses a keyword-plus-fuzzy-matching algorithm with confidence scoring to automatically align user CSV columns, validated with zero schema errors across 17,000 rows. The final risk score fuses process and entity outputs at a 70/30 ratio, calibrated to reflect signal confidence rather than arbitrary weighting.

Challenges we ran into

No labeled training data. Vietnamese SME logistics operators have zero historical anomaly annotations, so supervised models were infeasible. We chose Isolation Forest precisely because it is unsupervised, runs in O(n log n), and handles non-Gaussian step-duration distributions, but tuning the contamination parameter without ground truth required building a synthetic calibration dataset aligned to published Vietnamese logistics benchmarks.

Inconsistent column naming. Vietnamese 3PL exports use divergent field aliases: "act", "status", and "step_code" all map to the same concept, while "shipment_no" and "order_id" both refer to case_id. A rigid schema contract would have rejected real-world inputs. We built a fuzzy matching engine with a keyword dictionary and Levenshtein distance scoring (a sketch follows below), which resolved all naming inconsistencies across 17,000 rows.

Interpretability gap. Isolation Forest outputs a raw score in [−0.5, +0.5] that logistics managers cannot act on. We built a dual-layer output: a normalized Risk Score from 0 to 100 derived from the duration/P95 ratio, paired with a plain-English bottleneck label per case, so that every flag comes with a human-readable explanation.
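To give a feel for the dual-layer output, here is a minimal sketch of mapping a case's worst duration/P95 ratio onto a 0–100 Risk Score. The anchor points (a step exactly at its P95 baseline maps to 50; 2× the baseline saturates at 100) are our illustrative assumptions, not Vyn's exact calibration:

```python
import numpy as np

def risk_score_0_100(step_durations: np.ndarray, baseline_p95: np.ndarray) -> float:
    """Map the worst duration/P95 ratio of a case onto a 0-100 score.

    A ratio of 1.0 (exactly at the 95th-percentile baseline) maps to 50,
    and 2x the baseline saturates at 100. These anchors are illustrative,
    not the calibrated values used in production.
    """
    ratios = step_durations / baseline_p95
    return float(np.clip(ratios.max() * 50.0, 0.0, 100.0))

# Example: a case whose slowest step runs at 1.8x its P95 baseline scores 90.
print(risk_score_0_100(np.array([0.9, 1.8]), np.array([1.0, 1.0])))  # 90.0
```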
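Here is a sketch of the keyword-plus-fuzzy column mapping. The alias dictionary is a tiny illustrative subset, and Python's stdlib SequenceMatcher similarity stands in for the Levenshtein scoring used in the real engine:

```python
from difflib import SequenceMatcher

# Illustrative subset; the real keyword dictionary covers far more aliases.
CANONICAL_ALIASES = {
    "case_id":   ["case_id", "shipment_no", "order_id"],
    "activity":  ["activity", "act", "status", "step_code"],
    "timestamp": ["timestamp", "event_time", "ts"],
}

def map_column(user_column: str, threshold: float = 0.75):
    """Return (canonical_field, confidence) for one raw CSV header.

    Exact alias hits score 1.0; otherwise the closest alias by string
    similarity wins, provided it clears the confidence threshold.
    """
    name = user_column.strip().lower()
    best_field, best_score = None, 0.0
    for field, aliases in CANONICAL_ALIASES.items():
        for alias in aliases:
            score = 1.0 if name == alias else SequenceMatcher(None, name, alias).ratio()
            if score > best_score:
                best_field, best_score = field, score
    return (best_field, best_score) if best_score >= threshold else (None, best_score)

print(map_column("Shipment_No"))  # ('case_id', 1.0)
print(map_column("step code"))    # ('activity', ~0.89)
```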
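Finally, a sketch of the anomaly-detection pipeline itself, assuming per-case step durations and per-step baselines are already computed. The synthetic data and the contamination value are placeholders for the benchmark-calibrated setting described above:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

def case_features(step_durations, baseline_mean, baseline_std):
    """Encode one execution case as the 6-feature vector described above."""
    z = (step_durations - baseline_mean) / baseline_std
    return np.array([
        step_durations.sum(),  # total duration
        step_durations.max(),  # max step duration
        step_durations.std(),  # standard deviation of step durations
        z.max(),               # top Z-score
        z.mean(),              # mean Z-score
        len(step_durations),   # step count
    ])

# Synthetic stand-in for 750 encoded cases (n_cases x 6 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(750, 6))

X_scaled = StandardScaler().fit_transform(X)

# contamination is the assumed share of anomalous cases; with no labels,
# it was tuned against a synthetic benchmark-aligned dataset. The 0.05
# here is a placeholder, not the calibrated value.
forest = IsolationForest(n_estimators=200, contamination=0.05, random_state=0)
forest.fit(X_scaled)
raw_scores = forest.decision_function(X_scaled)  # lower = more anomalous
```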
Visualization performance. React Flow with 250+ nodes caused frame drops below 30 fps on mid-range hardware. We solved this with node virtualization (only visible nodes are mounted to the DOM) combined with TanStack Query for server-state caching, achieving smooth rendering up to 10,000 events per view.

Accomplishments that we're proud of

Across 750 validated cases, the Isolation Forest pipeline achieved a 3–4× risk-score separation between anomalous and normal entities (anomalous cases averaged 40–42, normal cases averaged 9–10) without a single labeled training example. Customs Clearance reached a 35.6% anomaly rate, with the inspection cluster (STEP_006–008) accounting for 34.8% of all bottlenecks. Trucking detected a 31.6% cycle-time differential; Warehouse detected the largest proportional gap at +32.1%. Driver AI achieved a recall of 0.908, meaning 90.8% of genuinely high-risk drivers are flagged before incidents escalate, which we consider the most operationally meaningful metric in the system. Ops AI identified low-rated operational contexts that accumulate 52% more detention hours on average, giving dispatch managers a direct, quantified prioritization signal.

We are also proud that the entire system deploys from a CSV file, with no ERP integration and no data science background required to operate it, which is exactly the constraint faced by the SMEs we built this for.