Synopsis
We wanted to make a website that categorized high school student opportunities scraped from various NYC websites. To increase scalability and make full use of the cloud, we built a more complex backend than we strictly needed. We did not finish.
Inspiration
Kyle has a CS internship interview on Monday with one of the city-backed technology organizations, and he only found out about it because he went to an in-person hackathon where someone happened to know about the program and told him, and that person had probably heard about it from someone else. We wanted to fix that problem: many smaller student opportunities fall into obscurity, sometimes on purpose, across the city's hundreds or thousands of web pages. Our goal was to highlight student opportunities that wouldn't otherwise be found organically on the internet, for example on sites that bury their one relevant opportunity at the bottom of a page.
What it was supposed to do
Our MVP was meant to be a simple website with tagged categories: clicking a category link would hit the Flask backend, which would query our database and return the relevant opportunities. The intended flow was that you would find our site, get interested in what was there, want to look at a specific category, click the link, and view the opportunities. Since we couldn't get the database working with the backend, we instead wrote a web-scraping/organizing script by hand for one specific webpage that turns it into JSON. The plan was to put that JSON file on the Apache + Flask web server itself, so the site wouldn't have to communicate with a separate database server.
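A minimal sketch of what that Flask endpoint could have looked like, assuming the scraped data lives in a local opportunities.json file keyed by category (the filename, layout, and route name here are illustrative, not what we actually shipped):

# Minimal sketch of the planned Flask endpoint: serve opportunities for a
# category straight from a local JSON file instead of querying MySQL.
import json

from flask import Flask, abort, jsonify

app = Flask(__name__)

# Assumed layout: {"internships": {...}, "summer-programs": {...}, ...}
with open("opportunities.json", encoding="utf-8") as f:
    OPPORTUNITIES = json.load(f)

@app.route("/category/<name>")
def category(name):
    # Look up the requested category; 404 if we never scraped anything for it.
    if name not in OPPORTUNITIES:
        abort(404)
    return jsonify(OPPORTUNITIES[name])

if __name__ == "__main__":
    app.run(debug=True)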
How We Tried to Build It
Our full dream plan is sketched out on a Jamboard. Our MVP plan, which we did not accomplish, is a significantly cut-down version: just a front end and a back end, with the data put into a JSON file. The web-scraping and parsing algorithm that Sahib wrote works fine, though we only managed to write it for one website, NYC's Student Opportunities page, and it doesn't yet collect all of the information or organize it properly. A sample of its output is below, followed by a sketch of the scraping approach.
{
"NYCDOE Minecraft Net Zero Challenge": {
"Deadline": "May 27, 2022",
"Event": "April 11-May 27, 202"
},
"Acing Your Resume and Interview Questions": {
"Deadline": "June 7, 2022",
"Event": "June 7, 2022"
},
"YVote ChangeMakers Summer Institute\u00a0": {
"Deadline": "June 6, 2022",
"Event": "July 12-Aug 16, 2022"
},
"Literacy Unbound Summer Institute": {
"Deadline": "June 3, 2022",
"Event": "July 18 - 30, 2022"
},
"First Tech Fund 2022-2023 Fellowship": {
"Deadline": "June 1, 2022",
"Event": "Ongoing"
},
"See MCC Theater's Production of Soft": {
"Deadline": "June 19, 2022",
"Event": "May 12-June 19, 2022"
},
"ONE Lab Summer Studio 2022": {
"Deadline": "Ongoing ",
"Event": "June 20-July 28, 202"
},
"Next Gen Democracy Camp": {
"Deadline": "May 27, 2022",
"Event": "June 27- July 1, 202"
}
}
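For reference, the scraping step was roughly in this spirit: fetch the page, pull each opportunity's title, deadline, and event dates, strip the zero-width characters that kept breaking the parse, and dump everything to JSON. The URL, selectors, and class names below are placeholders, not the exact ones Sahib's script uses.

# Rough sketch of the scrape-and-parse step (markup selectors are
# placeholders; the real NYC Student Opportunities page needs its own).
import json
import re

import requests
from bs4 import BeautifulSoup

# Example URL, assumed for illustration.
URL = "https://www.schools.nyc.gov/learning/student-opportunities"

# Zero-width characters and BOMs that showed up in the scraped text.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\ufeff]")

def clean(text):
    return ZERO_WIDTH.sub("", text).strip()

def scrape():
    soup = BeautifulSoup(requests.get(URL, timeout=30).text, "html.parser")
    opportunities = {}
    # Hypothetical markup: one <div class="opportunity"> per listing.
    for item in soup.select("div.opportunity"):
        title = clean(item.select_one("h3").get_text())
        deadline = clean(item.select_one(".deadline").get_text())
        event = clean(item.select_one(".event-date").get_text())
        opportunities[title] = {"Deadline": deadline, "Event": event}
    return opportunities

if __name__ == "__main__":
    with open("opportunities.json", "w", encoding="utf-8") as f:
        json.dump(scrape(), f, indent=2)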
Challenges we ran into
The front-end development for the website had some issues, and we didn't get to the subpage development or to making it work with Flask. The Python web scraping was just annoying: there were things like zero-width characters, and the links didn't scrape properly. The backend development was slow and minimal. The MySQL database was new to me, so I had to set it up, download MySQL Workbench, and figure out how to use it. I also didn't know how to insert JSON files into the database. The Ubuntu server was managed through the CLI by default, so I had trouble doing things in general. I had to figure out Apache and Flask: independently they worked fine, but together there were issues. I also never got Flask working to actually do what we wanted.
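For the "how do I get a JSON file into the database" problem specifically, the usual answer is to parse the file in Python and issue parameterized INSERTs. Here is a sketch using mysql-connector-python; the table name, columns, and credentials are all made-up placeholders, not our actual setup.

# Sketch of loading the scraped JSON into MySQL.
import json

import mysql.connector

# Credentials and database name are placeholders.
conn = mysql.connector.connect(
    host="localhost",
    user="summerintech",
    password="secret",
    database="opportunities_db",
)
cur = conn.cursor()
cur.execute(
    "CREATE TABLE IF NOT EXISTS opportunities ("
    "  name VARCHAR(255) PRIMARY KEY,"
    "  deadline VARCHAR(64),"
    "  event VARCHAR(64))"
)

with open("opportunities.json", encoding="utf-8") as f:
    data = json.load(f)

for name, fields in data.items():
    # Parameterized REPLACE avoids quoting issues and makes reruns idempotent.
    cur.execute(
        "REPLACE INTO opportunities (name, deadline, event) VALUES (%s, %s, %s)",
        (name, fields.get("Deadline"), fields.get("Event")),
    )

conn.commit()
conn.close()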
Accomplishments that we're proud of
Front-End (Nihal): Working with other people, collaborating, and making a site that involves Flask.
Back-End (Kyle): Getting Apache, Flask, and the database to work (the caveat being: separately).
Python Web Scraping (Sahib and Shahriar):
What we learned
Front-End: Collaborative skills and making a web app with a database.
Back-End: Learned how the back ends of websites work and how complicated they can be. Spinning up servers is easy; configuring them is the hard part. Thank you, DigitalOcean.
Python Web Scraping: Learned the basics of web scraping and regex, with the workshop being a very helpful resource. Also learned about JSON files and how to read and write them.
What's next for SummerInTech
The next step would be making our plan actually work. With more time and skill, I think this project is finishable.