nlag2

Natural Language ASG Generator (Version 2)

About

This project automatically generates answer set grammars (ASGs) that encode natural language text.

In brief, the process is as follows:

The text corpus is tokenized and split into sentences. Each token is assigned a Part-of-Speech (POS) tag and a lemma. We use Stanford's CoreNLP for this step.
A simplified head-driven phrase structure grammar (HPSG) is generated. This step uses Zhou and Zhao's approach.
Working backwards from the HPSG, a rule-based approach that follows the dependency structure of the text is used to generate production rules for an ASG.
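
As an illustration of what the first step hands to the later ones, here is a minimal sketch that turns a CoreNLP JSON annotation into (word, POS tag, lemma) tokens. The `Token` type and the `tokens_from_corenlp` helper are hypothetical names for this example, not part of the project, and the sample annotation is hand-written and abridged to the three fields used here.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One token with the two annotations used by the later steps."""
    word: str
    pos: str
    lemma: str


def tokens_from_corenlp(annotation: dict) -> List[List[Token]]:
    """Return one list of tokens per sentence from a CoreNLP JSON annotation."""
    return [
        [Token(t["word"], t["pos"], t["lemma"]) for t in sentence["tokens"]]
        for sentence in annotation["sentences"]
    ]


# Hand-written sample in the shape of CoreNLP's JSON output (abridged).
sample = {
    "sentences": [
        {"tokens": [
            {"word": "It", "pos": "PRP", "lemma": "it"},
            {"word": "was", "pos": "VBD", "lemma": "be"},
            {"word": "hungry", "pos": "JJ", "lemma": "hungry"},
            {"word": ".", "pos": ".", "lemma": "."},
        ]}
    ]
}

for sentence in tokens_from_corenlp(sample):
    print(sentence)
```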

Motivation

In brief, this work is motivated by two main factors:

Answer set grammars are context-sensitive grammars that can encode both the syntax (the CFG part) and the semantics (the ASP annotations) of a language. For natural language, this gives a formalism that captures both a structural and a logical form of a text, which we believe will be useful for natural language understanding tasks.
Answer set grammar induction can be translated into an ILASP task, allowing one to learn the semantic constraints of a grammar given its syntax. In a future project, we aim to learn some semantic constraints of natural language and apply them, for example, to difficult coreference problems that require commonsense reasoning.

Example

For the text "The fish ate the worm. It was hungry", we generate the ASG:

start -> S {
  event(E, S, O) :- event(E, S, O)@1.
  event(S, M) :- modifier(S, M)@1.
  valid :- event(eat, fish, worm)@1.
  valid :- modifier(it, hungry)@1.
  :- not valid.
}
S -> NP VP . {
  event(E, S, O) :- nominal(S)@1, partial_event(E, O)@2.
  modifier(S, M) :- nominal(S)@1, partial_modifier(M)@2.
  valid :- nominal(fish)@1, partial_event(eat, worm)@2.
  valid :- nominal(it)@1, partial_modifier(hungry)@2.
  :- not valid.
}
NP -> DT NN {
  nominal(N) :- lemma(N)@2.
  valid :- lemma(the)@1, lemma(fish)@2.
  valid :- lemma(the)@1, lemma(worm)@2.
  :- not valid.
}
VP -> VBD NP {
  partial_event(E, N) :- lemma(E)@1, nominal(N)@2.
  valid :- lemma(eat)@1, nominal(worm)@2.
  :- not valid.
}
NP -> PRP {
  nominal(N) :- lemma(N)@1.
  valid :- lemma(it)@1.
  :- not valid.
}
VP -> VBD ADJP {
  partial_modifier(M) :- lemma(M)@2.
  valid :- lemma(be)@1, lemma(hungry)@2.
  :- not valid.
}
ADJP -> JJ {
  lemma(X) :- lemma(X)@1.
  valid :- lemma(hungry)@1.
  :- not valid.
}

DT -> "The" { lemma(the). }
NN -> "fish" { lemma(fish). }
VBD -> "ate" { lemma(eat). }
DT -> "the" { lemma(the). }
NN -> "worm" { lemma(worm). }
PRP -> "It" { lemma(it). }
VBD -> "was" { lemma(be). }
JJ -> "hungry" { lemma(hungry). }
. -> "."
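
The terminal rules at the end of the grammar above follow directly from the token annotations, so they can be sketched in a few lines of Python. This is only an illustration of the pattern, not the project's actual generator: `terminal_rules` is a hypothetical helper, and leaving punctuation without an annotation is an assumption based on the `. -> "."` rule above.

```python
def terminal_rules(tokens):
    """Build one terminal production per distinct (POS tag, word) pair.

    `tokens` is an iterable of (word, pos, lemma) triples.
    """
    rules = []
    seen = set()
    for word, pos, lemma in tokens:
        if (pos, word) in seen:
            continue  # the same word with the same tag needs only one rule
        seen.add((pos, word))
        if lemma.isalpha():
            rules.append(f'{pos} -> "{word}" {{ lemma({lemma.lower()}). }}')
        else:
            # Assumption: punctuation carries no annotation, as in `. -> "."`.
            rules.append(f'{pos} -> "{word}"')
    return rules


example_tokens = [
    ("The", "DT", "the"), ("fish", "NN", "fish"), ("ate", "VBD", "eat"),
    ("the", "DT", "the"), ("worm", "NN", "worm"), (".", ".", "."),
    ("It", "PRP", "it"), ("was", "VBD", "be"), ("hungry", "JJ", "hungry"),
    (".", ".", "."),
]

print("\n".join(terminal_rules(example_tokens)))
```

Rules are emitted in order of first appearance, so the ordering differs from the listing above even though the set of rules is the same.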

Setup

Clone the repository recursively.
Set up the two submodules.
Download the models and set the model locations.
Set up the pipenv environment.
Choose your text and run the driver program.

Development

PyCharm is recommended. To speed up development, leave the CoreNLP server running (this avoids having to spin it up every time you need to test something).

java -mx4g -cp "lib/CoreNLP/*:lib/CoreNLP/lib/*:lib/CoreNLP/liblocal/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,ner" -port 9000 -timeout 10000
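
With the server left running, a quick way to check that it responds is to send it a sentence over HTTP. The sketch below uses only the Python standard library and assumes the server is reachable at `http://localhost:9000`, matching the `-port 9000` flag above. The `request_url` and `annotate` helpers are hypothetical names for this example and are not part of the project.

```python
import json
import urllib.parse
import urllib.request


def request_url(base_url, annotators):
    """Build the server URL, passing the annotator list as a JSON `properties` parameter."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return f"{base_url}/?{query}"


def annotate(text, base_url="http://localhost:9000",
             annotators="tokenize,ssplit,pos,lemma"):
    """POST `text` to a running CoreNLP server and return the parsed JSON reply."""
    request = urllib.request.Request(
        request_url(base_url, annotators), data=text.encode("utf-8")
    )
    with urllib.request.urlopen(request, timeout=15) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        result = annotate("The fish ate the worm. It was hungry.")
    except (OSError, ValueError) as error:
        # Raised when nothing is listening on the port or the reply is not JSON.
        print(f"CoreNLP server not usable: {error}")
    else:
        print(f"Server replied with {len(result['sentences'])} sentence(s).")
```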

Built With

python

Updates

Jordan Spooner started this project — Jan 19, 2020 06:41 AM EST
