Inspiration
MLH Local Hack Day (LHD)
What it does
It uses natural language processing to help determine whether a website belongs to Kali Linux, by counting the most frequent words on the page.
How we built it
NLP in Python, using NLTK for stopword removal and frequency counts, and BeautifulSoup for HTML parsing.
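import urllib.request

import nltk
from bs4 import BeautifulSoup
from nltk.corpus import stopwords

# Fetch the stopword list (does nothing if it is already downloaded)
nltk.download('stopwords', quiet=True)

# Download the page and extract its visible text.
# separator=' ' keeps words from adjacent HTML elements from fusing together.
response = urllib.request.urlopen('https://www.kali.org/blog/')
html = response.read()
soup = BeautifulSoup(html, 'html5lib')
text = soup.get_text(separator=' ', strip=True)

# Split the text into whitespace-separated tokens
tokens = text.split()
print(tokens)

# Drop English stopwords; build the set once and compare case-insensitively
sr = set(stopwords.words('english'))
clean_tokens = [token for token in tokens if token.lower() not in sr]

# Count and print how often each remaining token occurs
freq = nltk.FreqDist(clean_tokens)
for key, val in freq.items():
    print(str(key) + ':' + str(val))

# Plot the 20 most frequent tokens
freq.plot(20, cumulative=False)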
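The script above stops at a frequency plot, so the yes/no decision is left to the reader. One possible way to turn token counts into a decision is sketched below. This is only an illustration under stated assumptions: the function name looks_like_kali, the keyword set, and the top_n cutoff are hypothetical and do not come from the original project. It uses collections.Counter from the standard library so it runs without a network connection or NLTK.

```python
from collections import Counter

# Hypothetical keyword set; not part of the original project.
KALI_KEYWORDS = {"kali", "linux"}


def looks_like_kali(tokens, keywords=KALI_KEYWORDS, top_n=20):
    """Return True if every keyword is among the top_n most frequent tokens."""
    # Count tokens case-insensitively
    counts = Counter(token.lower() for token in tokens)
    # Collect the top_n most frequent distinct tokens
    top_tokens = {token for token, _ in counts.most_common(top_n)}
    # The page "looks like Kali" only if all keywords appear in that set
    return keywords <= top_tokens


sample = "Kali Linux release notes Kali Linux tools update Kali".split()
print(looks_like_kali(sample))
```

In the script above, clean_tokens could be passed in place of sample. The keyword set and cutoff would need tuning against real pages before the result could be trusted.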
What we learned
Natural Language Processing