Used sources

Causes

  • weather
  • airspace
  • strikes
  • environmental
  • airport closure
  • national
  • large events
  • political conflicts
  • military operations
  • other

Used languages

  • Python
  • Go
  • JS (Node, React & all that jazz)

Technology

The collected text is tokenized and then classified using the bag-of-words approach. A random forest classifier is trained to pick out messages that report events likely to cause flight delays. Ground-truth labels for training the ensemble were collected manually during the event using custom tooling. In total the classifier was trained on about 2000 samples, and we believe that better accuracy could be achieved with a larger dataset. A nice property of random forests is their interpretability. We can, for example, ask the classifier which words have the highest impact on the results. In our case the top-10 words are

  • eruption
  • winds
  • storm
  • life-threatening
  • have
  • homes
  • imminently
  • people
  • smoke
  • prepares
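
To make the bag-of-words step above more concrete, here is a minimal sketch in plain Python. It is not our actual pipeline: the tokenizer rule, the function names (`tokenize`, `build_vocabulary`, `vectorize`) and the two sample messages are assumptions made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def build_vocabulary(messages):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({tok for msg in messages for tok in tokenize(msg)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return the bag-of-words count vector for one message."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical messages, invented for this example only.
messages = [
    "Volcano eruption sends smoke over the airport",
    "Life-threatening storm: strong winds, winds expected tonight",
]
vocabulary = build_vocabulary(messages)
vectors = [vectorize(m, vocabulary) for m in messages]
```

Each message becomes a vector of word counts, with word order discarded. Vectors like these, together with the manually collected labels, are what a random forest would be trained on. In scikit-learn, for instance, a fitted `RandomForestClassifier` exposes a `feature_importances_` attribute with one score per vocabulary column, which is one way to obtain a word ranking like the one above.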