This project grew out of a research direction that was never fully pursued during my undergraduate research experience in the machine learning group at the Summer Undergraduate Math Research at Yale (SUMRY) program. I wanted to verify whether the idea has merit, and cranking it out over a hackathon seemed like a good way to finally tackle the experiment.
What it does
Dig2vec is an embedding method that transforms directed graphs (or networks) into numerical representations. It runs node2vec on the graph and again on a copy with every edge reversed, so that the otherwise hidden in-degree edges are explored as well.
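To make the idea concrete, here is a minimal sketch of why the reversed graph matters. It is not the project's code: the helper names are hypothetical, and a uniform random walk stands in for node2vec's biased walk (which additionally weights steps by its return and in-out parameters). A walk that starts at a sink node goes nowhere on the original graph, but on the reversed graph it travels back along the incoming edges.

```python
import random

def reverse_edges(edges):
    """Flip every directed edge (u, v) into (v, u)."""
    return [(v, u) for u, v in edges]

def build_adjacency(edges):
    """Map each node to the list of its out-neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, [])  # keep sink nodes as keys too
    return adjacency

def random_walk(adjacency, start, length, rng):
    """Uniform walk that follows edge direction (a simplification of
    node2vec's biased walk); stops early at a node with no out-edges."""
    walk = [start]
    while len(walk) < length:
        neighbours = adjacency[walk[-1]]
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk

# Toy directed graph (assumed for illustration): a -> b -> c and d -> c.
edges = [("a", "b"), ("b", "c"), ("d", "c")]
forward = build_adjacency(edges)
backward = build_adjacency(reverse_edges(edges))

rng = random.Random(0)
forward_walk = random_walk(forward, "c", 3, rng)    # "c" is a sink here
backward_walk = random_walk(backward, "c", 3, rng)  # in-edges of "c" are now out-edges
```

On the original graph the walk from `c` never leaves `c`, while the walk on the reversed graph reaches `b` or `d`. With networkx, the reversed copy of a `DiGraph` can be obtained with its `reverse()` method instead of the hand-written helper.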
How we built it
In its entirety, I planned to use Modal to host a Flask server that takes text input from a user on a website. The pipeline would then clean any stray non-alphanumeric characters using OpenAI's Edit API endpoint (which is still in alpha), split the corpus into sentences on any remaining punctuation with a regex, and turn those sentences into a directed graph with networkx by considering the n-grams of each sentence. Next, it would perform a node2vec traversal over the graph, reverse the edges, and perform another node2vec traversal over the reversed graph. Finally, it would aggregate that information into a 2D visualization, extract keywords and key phrases, perform clustering on that 2D plot using InterSystems IntegratedML, and perform diagnostics.
However, I did not get to the clustering with InterSystems IntegratedML, the diagnostics, or much of the frontend work.
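The text-to-graph step can be sketched as follows. This is an illustration under stated assumptions rather than the code from the hackathon: the function names, the punctuation set in the regex, and the rule for turning n-grams into edges (an edge from the first word of each n-gram to every later word in it, so that n = 2 links each word to the next) are all hypothetical choices.

```python
import re

def split_sentences(text):
    """Split on runs of sentence punctuation and drop empty pieces."""
    parts = re.split(r"[.!?;]+", text)
    return [part.strip() for part in parts if part.strip()]

def ngram_edges(sentence, n=2):
    """Directed edges from the first word of each n-gram to each later word in it.

    Assumed construction: with n=2 this links every word to the word after it.
    """
    words = sentence.lower().split()
    edges = []
    for i in range(len(words) - n + 1):
        gram = words[i:i + n]
        edges.extend((gram[0], later) for later in gram[1:])
    return edges

text = "The cat sat. The cat ran!"
sentences = split_sentences(text)

edges = []
for sentence in sentences:
    edges.extend(ngram_edges(sentence))
```

The resulting edge list can be loaded into a networkx `DiGraph` with `add_edges_from`. Repeated edges such as `("the", "cat")` collapse into one edge there, so keeping counts as edge weights would be a separate design decision.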
Challenges we ran into
The main challenges were finding a team, managing packages within Modal, and editing code with OpenAI's Edit API endpoint when I could not find any documentation for it in Python.
Accomplishments that we're proud of
I am proud of having worked through a good chunk of this experiment, which may yet lead to a result worth sharing.
What we learned
I learned a lot about software tools used in industry, as well as the startup attitude. To optimize for the challenge prizes, the hackathon pushed me to cast as wide a net as possible for my project concept. This, in turn, forced me to strategize: I made a plan of the tools, APIs, and credits that I would need and could incorporate in order to ultimately fulfill my idea.
What's next for dig2vec: From node2vec and Back
As for future work, I am interested in performing diagnostics on how faithfully the 2D visualization represents the graph. Is it precise and accurate? Does it perform better than simply running node2vec? As recently as June 16, 2022, researchers released a paper titled "On the Surprising Behaviour of node2vec" demonstrating that node2vec had poor quality and was unstable. Moreover, what is the difference in performance between spatial methods that incorporate random walks across the graph, like node2vec, and spectral methods like Fanuel et al.'s "Magnetic Eigenmaps for the Visualization of Directed Networks"? Can we improve the performance of node2vec's random walk strategy on directed graphs by incorporating ShortWalk?