Problem we dealt with: Cargo irregularities and discrepancies occur during the transportation of goods, leading to disruptions such as damaged or lost shipments, delivery delays, incorrect documentation, regulatory non-compliance, and logistical challenges. Factors like mishandling, inadequate packaging, poor tracking, or unexpected events contribute to these irregularities. When they occur, the consequences can be severe, leaving customers and other parties dissatisfied and undermining the quality and transparency of the process. This dissatisfaction raises the risk that customers will switch to another party or carrier in the future, which means lost profit, lost transparency, and lost overall satisfaction. Addressing cargo irregularities is also crucial for the sustainability of the cargo industry, as every extra step taken to rectify a problem carries an environmental cost.

Solution: Our solution combines machine learning with ONErecord: we read shipment data from ONErecord and send the predicted results back to it. The ONErecord platform aims to create a transparent cargo world for all stakeholders, and we want to take this transparency to another level. By adopting this technology, we can anticipate and address potential troubles and irregularities in the shipment delivery process. Our solution removes the ambiguity about whether irregularities or discrepancies will occur on the journey from point A to point B. The resulting reduction in effort and time spent resolving issues benefits all parties in the supply chain, as each one knows the probability of an irregularity and can take proactive action accordingly.

The implementation of ONErecord brings an unprecedented level of transparency, resulting in heightened customer satisfaction. Additionally, by preventing unnecessary steps and repetitions, our solution indirectly contributes to the sustainable delivery of shipments, aligning with the cargo world's objectives. Through the effective management of cargo irregularities, organizations can ensure the smooth and reliable movement of goods, minimize financial losses, and uphold their reputation within the industry.

Model:

Problem 1: Predicting Irregularity in the Next Status of Shipments: A Binary Classification Approach

Predicting the occurrence of irregularities in shipment statuses is a pivotal task within the supply chain management domain. The ability to foresee potential anomalies in this realm not only contributes to enhanced operational efficiency but also bolsters customer satisfaction rates. This document details our methodology to predict these irregularities using machine learning, specifically through binary classification - the determination of whether an irregularity will occur or not.

Exploratory Data Analysis (EDA)

Our initial stage involved performing an extensive Exploratory Data Analysis (EDA) to identify crucial features influencing the predictive outcomes. This phase helped us understand our data, unearth trends and patterns, and discern relationships among features. It also facilitated the early detection of potential issues that could hinder the effectiveness of our predictive model.

Feature Engineering

Post-EDA, we conducted feature engineering to create new attributes that could potentially enhance the model's predictive capabilities. A significant feature we engineered was the 'time difference between sequences'. This attribute provided valuable temporal insights into our dataset, contributing an additional dimension to our prediction model.
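The 'time difference between sequences' feature can be sketched as follows. This is a minimal illustration with hypothetical column names (`awb`, `status`, `event_time`), not the project's actual schema:

```python
import pandas as pd

# Hypothetical status-event log: one row per event per shipment (AWB).
events = pd.DataFrame({
    "awb": ["020-111", "020-111", "020-111", "020-222", "020-222"],
    "status": ["RCS", "DEP", "ARR", "RCS", "DEP"],
    "event_time": pd.to_datetime([
        "2023-05-01 08:00", "2023-05-01 14:00", "2023-05-02 02:00",
        "2023-05-01 09:30", "2023-05-01 20:30",
    ]),
})

# 'Time difference between sequences': hours elapsed since the previous
# status event of the same shipment (NaN for the first event of each AWB).
events = events.sort_values(["awb", "event_time"])
events["hours_since_prev"] = (
    events.groupby("awb")["event_time"].diff().dt.total_seconds() / 3600
)
```

An unusually long gap between consecutive events is exactly the kind of temporal signal a model can associate with irregularities.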

Data Processing with Spark Databricks and MLflow

We utilized Spark Databricks for data processing, while also enabling collaborative and interactive analytics. Additionally, MLflow helped us track experiments to record and compare parameters and results. This combination of tools expedited our data processing workflows and enhanced model management.

Data Preparation

Our data preparation involved one-hot encoding categorical variables to ensure that our machine learning algorithms could process our data effectively. Additionally, we implemented a strategy for imputing missing values to avoid potential bias and inaccuracies in our model.
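A minimal sketch of these two preparation steps, using toy columns (`carrier`, `weight_kg`) rather than the real feature set:

```python
import pandas as pd

# Toy shipment features with a missing category and a missing numeric value.
df = pd.DataFrame({
    "carrier": ["LH", "LH", "OS", None],
    "weight_kg": [120.0, None, 80.0, 95.0],
})

# Imputation: fill missing categories with a sentinel, numerics with the median.
df["carrier"] = df["carrier"].fillna("UNKNOWN")
df["weight_kg"] = df["weight_kg"].fillna(df["weight_kg"].median())

# One-hot encoding: expand the categorical column into indicator columns.
df = pd.get_dummies(df, columns=["carrier"], prefix="carrier")
```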

Model Training and Validation

Once our data was appropriately prepared, we split it into training, testing, and validation sets to ensure our model's robustness and its ability to generalize on unseen data.
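One common way to produce such a three-way split (the 60/20/20 proportions here are illustrative, not necessarily what we used):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.random.RandomState(0).randint(0, 2, 100)

# First carve off a held-out test set, then split the rest into train/validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 0.25 of the remaining 80% -> 60/20/20 train/val/test overall.
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)
```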

Algorithm Selection and Hyperparameter Tuning

Our approach tested various algorithms, including XGBoost and Logistic Regression. Both were selected for their effectiveness in binary classification tasks, and their performance was assessed with different hyperparameters. This step helped us identify the best combination of model and hyperparameters.
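The tuning step can be sketched with scikit-learn's grid search. Synthetic data stands in for the shipment features, the parameter grid is illustrative, and Logistic Regression is shown here; the same pattern applies to XGBoost:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the shipment feature matrix and irregularity labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Cross-validated search over hyperparameters, scored on recall
# (the metric we prioritize for this task).
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="recall",
    cv=5,
)
search.fit(X, y)
best = search.best_params_
```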

Model Evaluation

Given the nature of our task - predicting irregularities - we prioritized maximizing recall. Recall is a valuable metric when the cost of false negatives (not identifying an irregularity when there is one) is high. While this might lead to some false positives (predicting an irregularity when there isn't one), in our case we deemed it acceptable to err on the side of caution.
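The trade-off reads concretely in a toy example (the labels below are made up purely to illustrate the metric):

```python
from sklearn.metrics import precision_score, recall_score

# 1 = irregularity. The model flags all 4 true irregularities (no false
# negatives) at the cost of 2 false positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]

recall = recall_score(y_true, y_pred)        # every irregularity caught
precision = precision_score(y_true, y_pred)  # some over-flagging tolerated
```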

Score Function Preparation

After creating the model, we prepared a score function that takes raw data as input and outputs the probability of an irregularity in the next status event. This probability is then attached to the input file, creating a comprehensive dataset that combines original data with insights on potential irregularities. This approach allows us to flag possible anomalies proactively, significantly improving the efficiency and reliability of the supply chain.
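The shape of such a score function might look like the following. This is a simplified sketch with hypothetical feature names; the real pipeline would apply the same feature engineering and encoding used at training time before predicting:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def score(raw: pd.DataFrame, model, feature_cols) -> pd.DataFrame:
    """Attach P(irregularity in next status event) to the raw input records."""
    out = raw.copy()
    out["irregularity_proba"] = model.predict_proba(raw[feature_cols])[:, 1]
    return out

# Hypothetical trained model on two engineered features.
train = pd.DataFrame({"hours_since_prev": [2, 30, 4, 48],
                      "n_prev_events": [1, 5, 2, 6]})
labels = [0, 1, 0, 1]
model = LogisticRegression().fit(train, labels)

scored = score(train, model, ["hours_since_prev", "n_prev_events"])
```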

Problem 2: Estimating Delivery-to-POD Duration: A Regression Analysis

In supply chain management, time prediction is critical. Accurate estimates allow for better planning and allocation of resources, improved customer satisfaction, and overall efficiency. This document elaborates on our project aimed at estimating the duration, in hours, between the delivery (DLV) status of a shipment and the proof of delivery (POD). We framed this as a regression problem, with the DLV-to-POD duration as the target variable.
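Deriving that target variable from an event log can be sketched as follows (hypothetical column names, same toy schema as the classification example):

```python
import pandas as pd

# Event log containing delivery (DLV) and proof-of-delivery (POD) timestamps.
events = pd.DataFrame({
    "awb": ["020-111", "020-111", "020-222", "020-222"],
    "status": ["DLV", "POD", "DLV", "POD"],
    "event_time": pd.to_datetime([
        "2023-05-02 10:00", "2023-05-02 16:00",
        "2023-05-03 09:00", "2023-05-04 09:00",
    ]),
})

# Target variable: hours between DLV and POD for each shipment.
wide = events.pivot(index="awb", columns="status", values="event_time")
wide["dlv_to_pod_hours"] = (wide["POD"] - wide["DLV"]).dt.total_seconds() / 3600
```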

Problem and Challenges

The key challenge in our endeavor was the limited data available. In the domain of machine learning, data is the fuel that powers our models. Limited data can restrict the model's ability to learn and generate accurate predictions.

Exploratory Data Analysis (EDA)

Our first approach towards this problem was to conduct an Exploratory Data Analysis (EDA). It helped us understand our data better and identify crucial features that could influence the duration between DLV and POD. EDA also allowed us to spot any potential outliers or patterns that could affect our model's performance.

Feature Engineering

Post-EDA, we undertook feature engineering to improve our model's predictive accuracy. We created new features, such as the time difference between sequences, that could provide valuable insights into our data. These engineered features are often more beneficial for our predictive models than the raw data.

Data Processing using Spark Databricks and MLflow

Our data processing tasks were executed using Spark Databricks, an excellent tool for handling large datasets and enabling interactive analytics. In parallel, we employed MLflow for experiment tracking to record and compare different parameters and results.

Data Preparation

The data preparation stage consisted of one-hot encoding categorical variables to convert them into a format that could be understood by our machine learning algorithms. Moreover, we dealt with missing values through imputation techniques, preventing possible bias and inaccuracies in our model due to incomplete data.

Model Training and Validation

With our data suitably prepared, we divided it into training, testing, and validation sets. This step is critical to ensure our model's robustness and its capability to generalize on unseen data.

Algorithm Selection and Hyperparameter Tuning

We utilized regression algorithms such as XGBoost for our problem, given its success in solving regression tasks. To maximize the algorithm's efficiency, we fine-tuned different hyperparameters. This process helped us identify the optimal combination of parameters that provided the best predictive performance.

Model Evaluation

The model's performance was evaluated using the R2 score, a common metric for regression tasks. The R2 score, or the coefficient of determination, indicates how well our model's predictions fit the actual data. A higher R2 score signifies a better model fit.
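Training and evaluation for this regression task can be sketched end to end. Synthetic data stands in for the duration dataset, and scikit-learn's gradient-boosted regressor plays the role of XGBoost here:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the DLV-to-POD duration data.
X, y = make_regression(n_samples=400, n_features=6, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(max_depth=3, n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# R2 (coefficient of determination): 1.0 is a perfect fit, 0.0 means the
# model does no better than always predicting the mean duration.
r2 = r2_score(y_test, model.predict(X_test))
```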

Application: We used AppSheet to create our application - Shipment in Nutshell.

What is AppSheet?

AppSheet is a free, user-friendly app development platform that lets teams, including business departments, create custom mobile and web applications without extensive coding knowledge. It offers a no-code or low-code approach, allowing users to build powerful apps by leveraging their existing data sources, such as spreadsheets or databases. With AppSheet, users can visually design the interface and functionality of their applications through a web-based interface. The platform supports a wide range of features, including data synchronization, user authentication, workflow automation, and interactive forms. It also provides various templates and sample apps to help users get started quickly.

Welcome to Shipment in Nutshell

This is our Shipment in Nutshell app. It currently has two pages: Irregularity Prediction and Event Tracking.

Page - Irregularity Prediction: Here you see an overview of predicted irregularities for status events, sorted by AWB number. If you click on one, you see the details of that shipment and event. For the irregularity prediction we created a new event, called "DIS", which shows its probability based on the last status event.

Page - Event Tracking: In this event tracking board you see, for each AWB number, the shipment in a nutshell, including:
- Tracking event
- Tracking event description
- Irregularity prediction
- Planned DateTime
- Actual DateTime
- Estimated DateTime (for the event proof of delivery, POD)

In this app you can filter by certain criteria. If you click on a status, you get more information about that shipment, for example the departure and arrival locations or the commodity. In the detailed shipment overview you can upload shipment documents, and you can also review documents that have already been uploaded.

Limitations: Due to the limited time during this hackathon, this application is a proof of concept.

Login details: For AppSheet you need a Google Account. Take a look at our application: https://www.appsheet.com/start/3757a841-4cca-43fe-8b36-07b423855c8f User: lhonerecordcrew@gmail.com Password: onerecord2023
