Did a very simple first pass today in Python. If you check the repo, you'll find tag.py and db.py, which are the main script and a small tag database, respectively.

If you run python tag.py nextpocket, the script will:

  1. Pull all the project details from the Devpost API and concatenate them into one string.
  2. Loop through every phrase in the database (e.g. Pocket, javascript, js) and run a regex search for it in the project details.
  3. For every match, pull the "cleaned" tag for that phrase (JavaScript, instead of js or javascript).
  4. Return all unique tags.

This is the entire "extraction engine" and yes, it's just plain regex with a trailing word boundary:

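import re

# db: the tag entries, each with a 'phrase' and a 'tag' key
# text: the concatenated project details
stags = []
for t in db:
  # phrase followed by a word boundary, so "js" does not match "json"
  reg = t['phrase'] + r'\b'
  if re.search(reg, text, re.IGNORECASE):
    stags.append(str(t['tag']))
stags = list(set(stags))  # drop duplicate tags

print("Suggested tags: \n")
print(stags)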

Is it a little too simplistic? Probably, but it works for the two or three cases I've tried so far. It'll need work as I build out the database.
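
One direction that work could take, as a rough sketch rather than what tag.py does today: escape each phrase so that characters like the dot in node.js or the plus signs in c++ are matched literally instead of being read as regex syntax, and require a non-word character (or the edge of the string) on both sides so that js does not fire inside nodejs. The function name suggest_tags and the sample entries are made up for illustration; only the phrase/tag keys come from the snippet above.

```python
import re

def suggest_tags(text, db):
    """Return the sorted, de-duplicated tags whose phrase occurs in text."""
    found = set()
    for entry in db:
        # re.escape makes '.', '+', etc. literal. The lookarounds stand in
        # for \b, which cannot match between '+' and a space or period.
        pattern = r'(?<!\w)' + re.escape(entry['phrase']) + r'(?!\w)'
        if re.search(pattern, text, re.IGNORECASE):
            found.add(entry['tag'])
    return sorted(found)

# Hypothetical entries, not the real contents of db.py.
sample_db = [
    {'phrase': 'js', 'tag': 'JavaScript'},
    {'phrase': 'javascript', 'tag': 'JavaScript'},
    {'phrase': 'node.js', 'tag': 'Node.js'},
    {'phrase': 'nodejs', 'tag': 'Node.js'},
    {'phrase': 'c++', 'tag': 'C++'},
]

print(suggest_tags("Built with nodejs and a little C++.", sample_db))
# prints ['C++', 'Node.js']
```

With only a trailing \b, the js entry would also match the end of nodejs and add a JavaScript tag. Whether that is wanted is a judgment call for the database, not something the regex can decide.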

And speaking of the database, you'll notice that I listed things like StackOverflow, Stack Overflow, and SO as separate entries. I don't see a way around this. It'll be the same for things like node.js & nodejs & node.
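
If keeping one row per spelling gets tedious, one possible layout (an assumption, not how db.py is organised) is to group the spellings under each tag and expand them into phrase/tag rows at load time. Every spelling still has to be listed, so this doesn't make the problem go away; it only keeps each tag's variants in one place. ALIASES and build_db are hypothetical names.

```python
# Hypothetical layout: one canonical tag mapped to every spelling to look for.
ALIASES = {
    'Stack Overflow': ['StackOverflow', 'Stack Overflow', 'SO'],
    'Node.js': ['node.js', 'nodejs', 'node'],
}

def build_db(aliases):
    """Expand {tag: [phrases]} into flat phrase/tag rows for the search loop."""
    return [
        {'phrase': phrase, 'tag': tag}
        for tag, phrases in aliases.items()
        for phrase in phrases
    ]

db = build_db(ALIASES)
print(len(db))  # prints 6
```

Short spellings like SO deserve some care: with re.IGNORECASE they also match the ordinary word "so", so they may need a case-sensitive search or have to be left out.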

Frankly, I don't see why we can't do this online, in the project submission form. Seems like a slam dunk to me.
