Inspiration

I was inspired by the innovative tools that I was learning about at the most recent AWS Summit here in New York. (Check out the presentations on SlideShare and YouTube if you haven't already.)

I knew that I wanted to use AWS Lambda to run "Event-Driven Code in the Cloud," as they put it. When they showed us how simple it was to create an HTTP endpoint for your Lambda functions, the first thing that sprang to mind was a scraper that would take Devpost pages and return JSON.

How it works

It's a little quirky to work with: you'll need to make POST requests to retrieve your data as JSON, because the new API Gateway/Lambda combination doesn't seem to have any way to support URL parameters yet. Here is the example curl command I would execute to scrape my portfolio:

$ curl -X POST -H "Content-Type: application/json" -d '{"screen_name":"MGerrior"}' https://iii3mdppm7.execute-api.us-east-1.amazonaws.com/prod/UserPortfolioEndpoint

Obviously you will want to replace my screen name with your screen name to obtain data related to you. It's not built as a crawler yet, so it will only return information that is visible when you first view your portfolio on Devpost.

The API uses Amazon's new API Gateway, which provides built-in caching and throttling out of the box. The API endpoint is then hooked up to a Lambda function that is executed when a request is received. The Lambda function scrapes the page, formats the result as a JSON response, and returns it to the API Gateway, which subsequently returns it to you.

Challenges I ran into

One of the challenges that I ran into was finding resources for learning how to work with these tools. They're both fairly new services (API Gateway was just announced at the summit a week ago), so finding resources online outside of the official docs was not easy. I also ran into some issues with the Grunt tasks I was using from grunt-aws-lambda, since they were producing empty zip files for me to upload to Lambda, but I eventually just zipped the files myself.

Accomplishments that I'm proud of

I'm pretty proud that I was able to get this up and running in just a few hours as part of Dev Thursday here at Devpost.

What I learned

I learned that I hate writing Java. As of right now, AWS Lambda mainly supports two runtime environments: Node.js and Java. I couldn't figure out how to get npm modules loaded in Lambda (save yourself some time and just read this), so I decided I would just use Java instead and take advantage of all the built-in libraries, or so I thought. After struggling with Eclipse, and then struggling to install the AWS SDK for Eclipse (something something Data Tools Platform?), I switched back to Node.js and bundled the npm modules as a ZIP file thanks to the guide above.

What's next for Unofficial Devpost API

I want to figure out how to add CORS headers to the responses from the API Gateway so that developers on Devpost can integrate this data into third-party sites, such as their portfolio, by doing something as simple as:

$.post("https://iii3mdppm7.execute-api.us-east-1.amazonaws.com/prod/UserPortfolioEndpoint", {screen_name: "MGerrior"}, function(data) {
  // Add your projects to the DOM
});

If you try that right now, the request will fail: the browser's same-origin policy keeps your script from reading the response, since the Access-Control-Allow-Origin header is not set on it.
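To make the failure concrete, here is a deliberately simplified sketch of the check a browser applies to a cross-origin response. The function name `corsAllows` and the example origin are hypothetical, and real CORS handling also involves preflight requests, credentials, and more, so treat this as an illustration of the idea rather than the browser's actual algorithm.

```javascript
// Simplified model of the browser's check on a cross-origin response.
// `responseHeaders` is assumed to be an object keyed by lowercase header name.
function corsAllows(pageOrigin, responseHeaders) {
  const allowed = responseHeaders["access-control-allow-origin"];
  // The response is readable only if the header is a wildcard
  // or names the requesting page's origin exactly.
  return allowed === "*" || allowed === pageOrigin;
}

// Hypothetical origin of a third-party portfolio site.
const origin = "https://portfolio.example.com";

console.log(corsAllows(origin, {}));                                     // false
console.log(corsAllows(origin, { "access-control-allow-origin": "*" })); // true
```

With no header on the response, the check fails, which matches the behavior described above.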

Also, it wouldn't hurt for this to have more endpoints, or to just build an actual API for Devpost.


Updates

Matthew Gerrior posted an update

Requests to the UserPortfolioEndpoint ... endpoint ... now have CORS headers so that you can $.getJSON("https://iii3mdppm7.execute-api.us-east-1.amazonaws.com/prod/UserPortfolioEndpoint/MGerrior") from any page on the web!


Matthew Gerrior posted an update

Like most AWS products, it took wading through the complex UI for quite some time, but I finally figured out how to accept URL parameters and pass those on to the underlying Lambda function that is powering the API endpoint. What does this mean for the API? No more POST requests just to get a resource! Both endpoints are set up to accept usernames and project slugs in the URL like so:

$ curl https://iii3mdppm7.execute-api.us-east-1.amazonaws.com/prod/UserPortfolioEndpoint/MGerrior
$ curl https://iii3mdppm7.execute-api.us-east-1.amazonaws.com/prod/ProjectEndpoint/unofficial-challengepost-api


Matthew Gerrior posted an update

Looks like the new UserPortfolioEndpoint is slightly broken after the rebrand and markup changes, but such is the life of a scraper. I will update this as soon as possible. So far, the only thing that seems affected is project titles on the user portfolio.


Matthew Gerrior posted an update

Developers, start your linters. @nealrs kindly pointed out to me that the Unofficial ChallengePost API wasn't actually returning valid JSON, and was instead returning a string full of escaped JSON, which wasn't very useful. I guess that's what I get for staring at JSON responses in a terminal and thinking that they look close enough to JSON for me, instead of trying to actually parse the responses. Either way, I've updated both endpoints to return valid JSON, at least, valid according to JSONLint.

What does this mean if you've already integrated with this API? Absolutely nothing, unless you wrote some sort of hackish work-around to make up for the malformed JSON this was returning. In that case, you can just rip all that out and work with real JSON now.

In case anyone is wondering, the issue was caused by me returning JSON.stringify(object) from the Lambda function. Apparently Lambda is smart enough to stringify whatever you pass it, so the result was being encoded twice. You should just pass an actual JavaScript object to the success function.
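The double-encoding effect is easy to reproduce outside of Lambda. In this sketch, `lambdaSerialize` is a hypothetical stand-in for the serialization step Lambda applies to the handler's result, and the `project` object is illustrative data rather than a real API response.

```javascript
// Hypothetical stand-in for the serialization Lambda applies to the handler's result.
function lambdaSerialize(result) {
  return JSON.stringify(result);
}

// Illustrative data, not a real API response.
const project = { slug: "unofficial-challengepost-api" };

// Bug: stringifying first means the response gets encoded twice.
const doubleEncoded = lambdaSerialize(JSON.stringify(project));
// Fix: hand over the plain object and let it be serialized once.
const singleEncoded = lambdaSerialize(project);

console.log(doubleEncoded); // "{\"slug\":\"unofficial-challengepost-api\"}"
console.log(singleEncoded); // {"slug":"unofficial-challengepost-api"}

// Parsing the double-encoded response yields a string, not an object.
console.log(typeof JSON.parse(doubleEncoded)); // string
console.log(typeof JSON.parse(singleEncoded)); // object
```

A client would have had to call JSON.parse twice on the old responses, which is the kind of work-around that can now be removed.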


Matthew Gerrior posted an update

Are you ever on ChallengePost thinking to yourself, "I'd love to learn more about this project, but JSON is my native language and I just can't find the structured data I need that is strewn all about this page"? Well, I've got news for you! I've added a new endpoint to the Unofficial ChallengePost API that returns project details in JSON format. Here's the curl command you can use to test it out for this project (I know, so meta):

$ curl -X POST -H "Content-Type: application/json" -d '{"slug":"unofficial-challengepost-api"}' https://iii3mdppm7.execute-api.us-east-1.amazonaws.com/prod/ProjectEndpoint

Let me know if there are any bugs, since I tried banging this out in like two hours.
