Wind energy prediction

Train/loss of Long Short-Term Memory Model on dataset 1
Train/loss of Long Short-Term Memory Model on dataset 2

What it does

We created a Python notebook that cleans up the data and uses a simple Keras deep learning model to predict the wind power 6.5 hrs into the future. It requires the last 500 mins of wind measurements, plus the forecasts from 500 mins before the target time up to the target time.

For example, if the time was 11:00, our model would predict the wind power at 17:30. To do so it would use the hourly forecasts between 09:10 (500 mins before 17:30) and 17:30. So that's the 9am forecast, the 10am forecast, etc. It also uses the last 500 mins of wind measurements. So that's all the wind speed measurements since 02:40 (11:00 - 500 mins = 02:40).
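The window arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function name `input_windows` and the dictionary keys are made up for illustration, and the date is arbitrary. Only the 6.5 hr horizon and the 500 min lookback come from the description above.

```python
from datetime import datetime, timedelta

HORIZON = timedelta(hours=6, minutes=30)  # prediction horizon from the text
LOOKBACK = timedelta(minutes=500)         # history window from the text


def input_windows(now):
    """Return the target time and the start of each input window."""
    target = now + HORIZON
    return {
        "target": target,
        "forecasts_from": target - LOOKBACK,  # forecasts run up to the target
        "measurements_from": now - LOOKBACK,  # measurements run up to now
    }


windows = input_windows(datetime(2020, 1, 1, 11, 0))  # the date is arbitrary
for name, moment in windows.items():
    print(name, moment.strftime("%H:%M"))
```

For 11:00 this prints 17:30 as the target, 09:10 as the start of the forecast window and 02:40 as the start of the measurement window, matching the times above.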

It only attempts to predict the first 6.5 hrs of dataset 3, because predicting further requires more wind measurements, which were not released for the challenge but would be available in a real scenario.

How we built it

We used pandas for data loading and cleaning. We normalised the data, which meant that the difference between wind farms A and B became less important. We trained on dataset 1 and validated on dataset 2. Normally we would split the two datasets instead and fine-tune on dataset 2, but the performance on the training data was very similar to the performance on dataset 2, suggesting there isn't really much difference between the two wind farms as far as the model is concerned.
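To illustrate why normalising can make the gap between two farms matter less, here is a minimal sketch that standardises each farm's data to zero mean and unit variance. The exact normalisation in the notebook may differ; the column names, the values and the `standardise` helper are all made up for the example.

```python
import pandas as pd


def standardise(frame):
    """Scale every column to zero mean and unit variance."""
    return (frame - frame.mean()) / frame.std()


# Made-up values: farm B produces roughly ten times the power of farm A.
farm_a = pd.DataFrame({"wind_speed": [4.0, 6.0, 8.0], "power": [10.0, 20.0, 30.0]})
farm_b = pd.DataFrame({"wind_speed": [5.0, 7.0, 9.0], "power": [100.0, 200.0, 300.0]})

scaled_a = standardise(farm_a)
scaled_b = standardise(farm_b)
print(scaled_a["power"].tolist())
print(scaled_b["power"].tolist())
```

After scaling, both power columns cover the same range even though the raw outputs differ by a factor of ten, so a model sees the two farms on a comparable footing.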

We interpolated missing data linearly; polynomial interpolation could also be tried.
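A minimal sketch of linear gap filling with pandas is shown here. The wind speeds and the 10-minute sampling interval are invented for the example and are not taken from the challenge data.

```python
import numpy as np
import pandas as pd

# Made-up wind speeds at a hypothetical 10-minute interval, with two gaps.
index = pd.date_range("2020-01-01 02:40", periods=6, freq="10min")
wind_speed = pd.Series([5.0, np.nan, 7.0, 8.0, np.nan, 12.0], index=index)

# Each missing value becomes the midpoint of its two neighbours.
filled = wind_speed.interpolate(method="linear")
print(filled.tolist())
```

Swapping in `method="polynomial"` with an `order` argument would be the polynomial variant; pandas relies on SciPy for that method.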

We used a simple 3-layer fully connected neural network. An LSTM or a TCN could also work, but there wasn't a large amount of data, so a simple network seemed most appropriate.
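A network of this kind could look roughly like the sketch below. It is not the notebook's actual model: the layer widths, activations, optimiser, loss and input size are all assumptions, and "3-layer" is read here as three dense layers including the output layer.

```python
from tensorflow import keras


def build_model(n_features):
    """Three fully connected layers ending in one wind power output."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),  # width is an assumption
        keras.layers.Dense(32, activation="relu"),  # width is an assumption
        keras.layers.Dense(1),                      # predicted (normalised) power
    ])
    model.compile(optimizer="adam", loss="mse")     # assumed optimiser and loss
    return model


model = build_model(n_features=60)  # 60 is a placeholder input size
model.summary()
```

The input would be the flattened measurement and forecast windows, so `n_features` depends on how many samples fall inside each 500 min window.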

We have included the graph of the first 6.5 hrs of predictions. Our model doesn't predict further than 6.5 hrs into the future and so cannot process more of dataset 3.