I wanted to learn more about the big data and machine learning tools available, and during this hackathon I saw that Microsoft Azure offers web services with a wide range of VMs. The project is NLP text classification on a data set from the popular social network Twitter.
What it does
The main goal for me in this hackathon was to learn a broad range of web services and tools. Having the opportunity to play with different services offered by different companies helps me understand the strengths and weaknesses of the products.
How I built it
For this hack, I went all in on the Microsoft stack with Azure.
First: I set up a few VMs to see how the security, mobility, and setup work.
Second: I loaded Visual Studio 2015 on the VM and locally, as well as Web Express, as my IDEs for this hackathon.
Third: I selected a few web app images from the gallery and tried out different Python frameworks (Django, Flask).
Fourth: I dove into Azure ML and played around with model training, testing, and prediction in its drag-and-drop workbench. It is visually appealing, but each tweak was time consuming since I am not familiar with the models.
Fifth: Finally, I wanted to integrate the ML model into the web app to demo how to make predictions from a trained model.
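The fifth step, calling a published model from the web app, can be sketched roughly as below. This is only a minimal sketch, not the code from the hack: the URL, the API key, the column name tweet_text, and the table-style request body are all assumptions, and the real values and schema should be copied from the sample code shown for your own deployed service.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values of your own deployed service.
SCORING_URL = "https://example.invalid/score"
API_KEY = "YOUR-API-KEY"


def build_payload(tweet_text):
    """Wrap one tweet in a table-style request body (assumed schema)."""
    return {
        "Inputs": {
            "input1": {
                "ColumnNames": ["tweet_text"],  # assumed column name
                "Values": [[tweet_text]],
            }
        },
        "GlobalParameters": {},
    }


def build_request(tweet_text, url=SCORING_URL, api_key=API_KEY):
    """Build the HTTP POST request without sending it."""
    body = json.dumps(build_payload(tweet_text)).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def predict(tweet_text):
    """Send the tweet to the scoring service and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(tweet_text), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A Django or Flask view would call predict() with the text the user typed and render the returned JSON. Keeping build_request() separate makes it possible to check the payload without touching the network.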
Challenges I ran into
Lots of setup challenges and security issues (installations, SSH keys, API keys). I found Visual Studio a bit less flexible in managing classes and the file system, and the installations are big. Azure ML was definitely very confusing, with so many combinations of models and transformations. Many prebuilt models come with optimized results, but without much description to help beginners learn or apply them. I do like the free public datasets and the visualizations at each step, which help guide the setup.
Accomplishments that I'm proud of
Integrated much of the web app with Azure ML. Learned a lot about Visual Studio setup, sync, and file hierarchy.
What I learned
Python frameworks (Django, Flask), Visual Studio, Azure services
What's next for Microsoft Azure ML on Tweets
The model I used was reduced because running a live prediction with the full model would take too long for users. The optimized, rebuilt version took 7 minutes to test; my reduced model takes 10-15 seconds, so accuracy might not be at its best.
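One way to make this speed versus accuracy trade-off explicit is to time each candidate model against a latency budget and serve the most accurate one that fits. The sketch below is only an illustration under stated assumptions: full_model and reduced_model are stand-in functions, not the Azure ML models from this hack, and the budget and sleep time are made-up numbers.

```python
import time


def measure_latency(predict, sample):
    """Return the wall-clock seconds one prediction takes."""
    start = time.perf_counter()
    predict(sample)
    return time.perf_counter() - start


def pick_model(candidates, sample, budget_seconds):
    """Return the name of the first candidate that answers within the budget.

    candidates is a list of (name, predict) pairs ordered from most to
    least accurate. Returns None if no candidate is fast enough.
    """
    for name, predict in candidates:
        if measure_latency(predict, sample) <= budget_seconds:
            return name
    return None


# Stand-in models (hypothetical): the full one is slow, the reduced one is fast.
def full_model(text):
    time.sleep(0.05)  # simulates a slow prediction
    return "some_label"


def reduced_model(text):
    return "some_label"


chosen = pick_model(
    [("full", full_model), ("reduced", reduced_model)],
    "an example tweet",
    budget_seconds=0.02,
)
print(chosen)
```

With a real service, the budget would be whatever wait is acceptable for a live demo, and the measurement should be repeated a few times since a single timing can be noisy.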