Data grabbed from the BBC
Inspired by Google's research on text summarization with TensorFlow, we use a dataset of BBC news articles to train our own network to do the summarization job!
What it does
It takes a news report from the BBC website and gives you a short summary of the whole report.
How we built it
We adapted Google's architecture in 2 days, replacing the original dataset with BBC news. We grabbed BBC news descriptions and full text from the Internet with a Python crawler script, which gave us a large collection of article/description pairs. We then pre-processed the data with MATLAB scripts and fed it into our RNN.
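A minimal sketch of the extraction step of such a crawler, using only the Python standard library. The tag and attribute names below are illustrative assumptions, not the exact BBC markup; in the real crawler the HTML would come from fetching each article URL.

```python
from html.parser import HTMLParser

class ArticleExtractor(HTMLParser):
    """Pulls a headline and short description out of a news page.

    NOTE: the tag/attribute names here are illustrative assumptions,
    not the exact BBC markup.
    """
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        # Many news pages expose a one-line summary via a meta tag.
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# Demo on an inline page; a live crawler would instead read the HTML
# from urllib.request.urlopen(article_url).
html = (
    '<html><head><title>Example headline</title>'
    '<meta name="description" content="One-sentence summary."></head>'
    '<body><p>Full article text...</p></body></html>'
)
parser = ArticleExtractor()
parser.feed(html)
print(parser.title)        # Example headline
print(parser.description)  # One-sentence summary.
```

Each (description, full text) pair harvested this way becomes one training example: the full text is the model input and the description is the target summary.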
Challenges we ran into
@Bad decoder output, still not fixed
@Improving the training efficiency of the neural network
@Starting the project so late that time was very tight
@The HTTP port on our AWS instance was not available
@Having to write a crawler to collect data because no ready-made dataset was available, which left us with only a small amount of training data
Accomplishments that we're proud of
This is the first time the whole team has actually trained a network and made it work with our own dataset collected from the web! We found it very interesting not only to teach the computer to learn something, but also to learn about RNNs ourselves!
What we learned
@The sequence-to-sequence learning model
@TensorFlow and TensorBoard
@Web crawler for BBC news
@ROUGE evaluation for summary
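As a refresher on the last point: ROUGE-1 recall is the fraction of unigrams in the reference summary that also appear in the generated summary. A self-contained sketch (library-free; real evaluations typically use an official ROUGE implementation):

```python
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    """ROUGE-1 recall: overlapping unigrams / unigrams in the reference."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each reference word counts at most as often
    # as it appears in the candidate summary.
    overlap = sum(min(count, cand_counts[word])
                  for word, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge1_recall(reference, candidate))  # 5/6 ≈ 0.833
```

Full ROUGE also reports precision, F1, and higher-order n-gram variants (ROUGE-2, ROUGE-L), but recall on unigrams is the core idea.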
What's next for BBCNewsSummarize
The dataset is still too small, and limited time kept us from training the net for long. Growing the dataset and training longer are the obvious next steps.