Inspiration
We were inspired by the use of GANs to generate realistic images in computer vision. We thought we might also be able to generate data similar to an original dataset and use it for data augmentation.
What it does
We used our GAN to generate 3,000 synthetic samples and concatenated them with the original 1,000 training samples. With those 4,000 samples, we fitted a logistic regression and a support vector machine. The accuracy of the SVM increased from 0.61 to 0.628, while the accuracy of the logistic regression dropped by 0.02.
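The augmentation step itself is simple. Here is a minimal scikit-learn sketch of it; the array names are hypothetical, and how labels get assigned to the synthetic rows is an assumption not covered above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical names: X_train/y_train are the original 1,000 rows (12 features
# each), X_synth/y_synth are the 3,000 rows sampled from the trained generator
# (how they were labelled is an assumption), X_test/y_test is held-out data.
X_aug = np.concatenate([X_train, X_synth], axis=0)  # 4,000 x 12
y_aug = np.concatenate([y_train, y_synth], axis=0)

for model in (LogisticRegression(max_iter=1000), SVC()):
    model.fit(X_aug, y_aug)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, acc)
```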
How we built it
We built a GAN consisting of a fully connected generator, which takes a noise vector of length 100 and outputs a vector of length 12 matching the shape of a row of the original training data, and a discriminator, which takes an input of length 12 and outputs a single real-vs-fake score. We then trained the two networks separately to keep the model from collapsing and to make sure both learned something during each epoch.
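A minimal PyTorch sketch of this architecture follows. Only the input and output widths (100, 12, and 1) reflect our setup; the hidden sizes and activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a length-100 noise vector to a length-12 synthetic data row."""
    def __init__(self, noise_dim=100, data_dim=12, hidden=64):
        super().__init__()
        # Hidden width is an illustrative choice, not our tuned value.
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, data_dim),
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores a length-12 row: output near 1 = looks real, near 0 = fake."""
    def __init__(self, data_dim=12, hidden=32):
        super().__init__()
        # Kept small on purpose -- we found an overly strong discriminator
        # stopped the generator from learning (see Challenges below).
        self.net = nn.Sequential(
            nn.Linear(data_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)
```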
Challenges we ran into
Training the GAN was tricky. We first tried to train the two networks simultaneously, but the generator was not learning anything, so we increased the complexity and depth of both nets. The results were still not representative enough of the population. We eventually figured out that the discriminator was probably too strong: the generator stopped learning and output the same result every epoch. So we shrank the discriminator and started training the two networks separately. This was the right approach: we began to generate data similar to the real samples from random noise, and after careful training we observed D(x) and D(G(z)) both converging toward 0.5. At that point we stopped training and fitted the logistic regression and SVM.
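In code, the separate-update loop looked roughly like the sketch below, reusing the `Generator` and `Discriminator` classes from above. The optimizer, learning rate, and epoch count are illustrative assumptions, and `real_data` is a placeholder for our 1,000 real rows.

```python
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()
real_data = torch.randn(1000, 12)  # placeholder for the real training rows

for epoch in range(500):
    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    z = torch.randn(real_data.size(0), 100)
    fake = G(z).detach()           # don't backprop into G on this step
    d_real, d_fake = D(real_data), D(fake)
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: push D(G(z)) toward 1, i.e. fool the discriminator.
    z = torch.randn(real_data.size(0), 100)
    d_gen = D(G(z))
    loss_g = bce(d_gen, torch.ones_like(d_gen))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # When D(x) and D(G(z)) both hover near 0.5, D can no longer tell real
    # from generated data -- the stopping signal we used.
    if epoch % 50 == 0:
        print(f"epoch {epoch}: D(x)={d_real.mean().item():.3f} "
              f"D(G(z))={d_fake.mean().item():.3f}")
```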
Accomplishments that we're proud of
We are proud that our accuracy went up, even if only a little. It showed that this approach to bootstrapping is viable and could be improved in the future. At one point during training we were frustrated by the lowered accuracy and questioned ourselves and our approach, but we kept training the GAN and ended up with a fairly good result.
What we learned
We learned that when training a GAN, it is better to train the two nets separately. This is much more flexible than training them simultaneously, because we can give the generator and discriminator different numbers of update steps, which helps keep the performance of the two nets balanced.
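To make that flexibility concrete, here is a hypothetical refactor of the loop above: wrapping each update in its own function turns the update ratio into an explicit knob (the ratio shown is illustrative, not our tuned value).

```python
def d_step():
    # One discriminator update (same logic as in the loop above).
    z = torch.randn(real_data.size(0), 100)
    fake = G(z).detach()
    d_real, d_fake = D(real_data), D(fake)
    loss = bce(d_real, torch.ones_like(d_real)) + \
           bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss.backward(); opt_d.step()

def g_step():
    # One generator update (same logic as in the loop above).
    z = torch.randn(real_data.size(0), 100)
    d_gen = D(G(z))
    loss = bce(d_gen, torch.ones_like(d_gen))
    opt_g.zero_grad(); loss.backward(); opt_g.step()

D_STEPS, G_STEPS = 1, 2  # illustrative ratio: extra steps for the weaker net
for _ in range(500):
    for _ in range(D_STEPS):
        d_step()
    for _ in range(G_STEPS):
        g_step()
```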
What's next for Using GAN as a way of bootstrapping
For the next step, we think it would be a good idea to experiment with different GAN architectures; trying out different architectures can help us find the best solution to this kind of problem. It would also be helpful to test more traditional statistical models on different datasets, for example, linear regression on a dataset that calls for regression analysis. That way we can see whether different models behave differently under our way of bootstrapping.
Built With
- python
- pytorch
- scikit-learn
- singlestore