Project Story: Predicting Airfare Using Machine Learning

Overview

This project aims to predict airfares between two cities using a neural network implemented in PyTorch. The dataset combines airline fare data with geographical, demographic, and market-related features that influence pricing. The objective is to train a robust model, assess its performance on unseen data, and draw insights from the trained model.

Data Preparation

  1. Data Loading and Initial Cleaning:

    • The dataset was loaded from a pickle file (final_features.pkl), containing essential features such as year, city market IDs, passenger counts, and fare details.
    • Rows with missing values were dropped to ensure clean data for analysis and training.
  2. Outlier Removal:

    • Outliers in the fare column were identified with the Interquartile Range (IQR) method and removed, so that a small number of extreme fares would not dominate the squared-error loss during training.
  3. Normalization:

    • Features were normalized with scikit-learn's MinMaxScaler, which rescales each numerical feature to a common range ([0, 1] by default). Keeping inputs on a similar scale matters for neural networks because it helps gradient-based training converge.
  4. Train-Test Split:

    • The dataset was divided into training and testing sets (80-20 split) to evaluate the model’s performance on unseen data.
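The four preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses a synthetic NumPy array in place of final_features.pkl, assumes the fare sits in the last column, uses the conventional 1.5 × IQR fences (the multiplier is not stated in the write-up), and writes the min-max scaling by hand to mirror what MinMaxScaler computes. The sketch also splits before scaling and fits the scaling on the training rows only, which differs from the order listed above; that is a choice of this example to keep test-set statistics out of the scaler.

```python
import numpy as np

def iqr_mask(values, k=1.5):
    """Boolean mask of values inside [Q1 - k*IQR, Q3 + k*IQR] (k is an assumed default)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)

def min_max_scale(train, test):
    """Scale columns to [0, 1] using the training set's min and max only."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (train - lo) / span, (test - lo) / span, lo, span

def prepare(data, fare_col=-1, test_frac=0.2, seed=0):
    # 1. Drop rows that contain any missing value.
    data = data[~np.isnan(data).any(axis=1)]
    # 2. Remove rows whose fare falls outside the IQR fences.
    data = data[iqr_mask(data[:, fare_col])]
    # 3. Shuffle and split 80-20.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_test = int(round(test_frac * len(data)))
    test, train = data[idx[:n_test]], data[idx[n_test:]]
    # 4. Min-max scale, fitting on the training rows.
    return min_max_scale(train, test)

# Synthetic stand-in for the real feature table (hypothetical values).
rng = np.random.default_rng(42)
raw = rng.normal(loc=200.0, scale=50.0, size=(1000, 3))
raw[0, 2] = 5000.0   # an extreme fare that the IQR step should drop
raw[1, 0] = np.nan   # a missing value that the cleaning step should drop

train_s, test_s, lo, span = prepare(raw)
print(train_s.shape, test_s.shape)
```

The returned lo and span are kept so that scaled predictions can later be mapped back to fare units.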

Model Development

  1. Neural Network Architecture:

    • A feedforward neural network with three hidden layers was created. Each hidden layer used the ReLU activation function to introduce non-linearity into the model.
    • The output layer produced a single value representing the predicted fare.
  2. Loss Function and Optimizer:

    • Mean Squared Error (MSE) was used as the loss function, a standard choice for regression, and the Adam optimizer updated the weights using adaptive per-parameter learning rates.
  3. Training the Model:

    • The model was trained for 4,000 epochs, with the loss printed every 100 epochs to monitor convergence.
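A network and training loop of the kind described above might look like the sketch below. The hidden-layer widths (64, 32, 16), the learning rate, the full-batch updates, and the synthetic data are all assumptions made for illustration; the write-up does not specify them. Only the three ReLU hidden layers, the single-value output, the MSE loss, the Adam optimizer, and the 4,000-epoch / log-every-100 schedule come from the description.

```python
import torch
from torch import nn

def build_model(n_features, hidden=(64, 32, 16)):
    """Feedforward net: three ReLU hidden layers (assumed widths), one output."""
    layers = []
    width = n_features
    for h in hidden:
        layers += [nn.Linear(width, h), nn.ReLU()]
        width = h
    layers.append(nn.Linear(width, 1))  # single predicted (scaled) fare
    return nn.Sequential(*layers)

def train(model, X, y, epochs=4000, log_every=100, lr=1e-3):
    """Full-batch training with MSE loss and Adam; returns the per-epoch loss."""
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for epoch in range(1, epochs + 1):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
        history.append(loss.item())
        if epoch % log_every == 0:
            print(f"epoch {epoch:5d}  loss {loss.item():.6f}")
    return history

# Synthetic stand-in data: 5 scaled features, target shaped (N, 1) to match the output.
torch.manual_seed(0)
X = torch.rand(256, 5)
y = X.mean(dim=1, keepdim=True)

model = build_model(n_features=5)
# A short run for demonstration; the project itself reports 4,000 epochs.
history = train(model, X, y, epochs=300, log_every=100)
```

Keeping the target as a column tensor of shape (N, 1) matters here: a flat (N,) target would broadcast against the (N, 1) output inside the loss and silently compute the wrong quantity.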

Model Evaluation

  1. Prediction and Rescaling:

    • After training, predictions were made on both training and testing datasets. These predictions were then rescaled back to the original fare values for interpretability.
  2. Performance Metrics:

    • Various regression metrics were calculated to evaluate model performance:
      • Mean Absolute Error (MAE)
      • Mean Squared Error (MSE)
      • Root Mean Squared Error (RMSE)
      • R² score
      • Standard Deviation of Residuals
    • These metrics provided insights into how well the model performed on training and testing data.
  3. Visualizations:

    • Scatter plots were generated to compare actual fare values against predicted values, allowing for visual assessment of model accuracy.
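    • A regression formula derived from the model’s weights and biases was displayed to indicate how features contribute to fare predictions. Because the hidden layers use ReLU, the network is piecewise linear rather than a single linear equation, so such a formula is best read as an approximation.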
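The rescaling step and the listed metrics can be written out in a few lines. The sketch below is illustrative only: the fare range (100 to 500) and the four sample predictions are made-up numbers, and the helper names inverse_min_max and regression_report are hypothetical. The inverse transform simply undoes min-max scaling, and each metric follows its standard definition.

```python
import numpy as np

def inverse_min_max(scaled, lo, hi):
    """Map values from [0, 1] back to the original [lo, hi] range."""
    return scaled * (hi - lo) + lo

def regression_report(y_true, y_pred):
    """MAE, MSE, RMSE, R^2 and the standard deviation of the residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MAE": float(np.mean(np.abs(residuals))),
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "R2": 1.0 - ss_res / ss_tot,
        "residual_std": float(np.std(residuals)),
    }

# Made-up example: fares were scaled from the range [100, 500].
fare_min, fare_max = 100.0, 500.0
y_true = np.array([100.0, 200.0, 300.0, 400.0])
pred_scaled = np.array([0.05, 0.20, 0.50, 0.80])  # model outputs in scaled units

y_pred = inverse_min_max(pred_scaled, fare_min, fare_max)
report = regression_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name:>12}: {value:.4f}")
```

Running the same report on both the training and the testing predictions makes the gap between the two visible, which is a quick check for overfitting.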

Insights and Future Work

  • The model successfully captured the relationships between the input features and airfare, as indicated by favorable R² scores and low error metrics.
  • The findings suggest that certain features, such as city popularity and market demand, significantly influence fare prices.
  • Future work may involve:
    • Incorporating additional features (e.g., seasonal factors or economic indicators).
    • Exploring more complex models, such as recurrent neural networks, to capture temporal patterns in airfare data.
    • Expanding the dataset to include more diverse routes and markets for improved generalization.

Conclusion

This project demonstrates the potential of machine learning for fare prediction in the airline industry. The developed model serves as a foundation that can be refined and expanded to support dynamic pricing strategies and to give customers better insight into airfare trends.
