Did a very simple first pass today in Python. If you check the repo, you'll find tag.py and db.py, which are the main script and a small tag database.
If you run python tag.py nextpocket, the script will:
- Pull all the project details from the Devpost API and concatenate them into one string.
- Return all the unique tags whose phrases match that string.
This is the entire "extraction engine" and yes, it's just plain regex with a word boundary:
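    import re

    # db is the tag list from db.py; text is the concatenated project string
    stags = []
    for t in db:
        # the phrase must end at a word boundary
        reg = t['phrase'] + r'\b'
        if re.search(reg, text, re.IGNORECASE):
            stags.append(str(t['tag']))
    # drop duplicates
    stags = list(set(stags))
    print("Suggested tags: \n")
    print(stags)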
Is it a little too simplistic? Probably, but it works for the two or three cases I've tried so far. It'll need work as I build out the database.
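One place where the simple version is likely to need work: with only a trailing word boundary, a short phrase such as SO also matches the tail of a longer word like "also". Here is a minimal sketch of a slightly stricter matcher, not the script in the repo. The DB list, its tag values, and the suggest_tags name are made up for illustration; only the phrase/tag keys come from the snippet above.

```python
import re

# Hypothetical stand-in for db.py: each entry maps a phrase to a tag.
DB = [
    {"phrase": "Stack Overflow", "tag": "stackoverflow"},
    {"phrase": "SO", "tag": "stackoverflow"},
    {"phrase": "C++", "tag": "cpp"},
]


def suggest_tags(text, db=DB):
    """Return the sorted, de-duplicated tags whose phrase occurs in text."""
    found = set()
    for entry in db:
        # re.escape keeps characters like '+' literal. The lookarounds
        # require that the phrase is not glued to a neighbouring word
        # character on either side.
        pattern = r"(?<!\w)" + re.escape(entry["phrase"]) + r"(?!\w)"
        if re.search(pattern, text, re.IGNORECASE):
            found.add(entry["tag"])
    return sorted(found)


print(suggest_tags("I also wrote it in C++"))
print(suggest_tags("Asked on SO and Stack Overflow chat"))
```

Because matching is still case-insensitive, the ordinary word "so" on its own would still trigger the SO entry. Short abbreviations probably need a case-sensitive flag per entry, or some other special handling.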
And speaking of the database, you'll notice that I listed things like Stack Overflow and SO as separate entries. I don't see a way around this. It'll be the same for things like
Frankly, I don't see why we can't do this online, in the project submission form. Seems like a slam dunk to me.