# nlag2

*Natural Language ASG Generator (Version 2)*

### About

This project automatically generates answer set grammars for encoding natural language.

In brief, the process is as follows:

- The text corpus is tokenized and split into sentences. Each token is assigned a Part-of-Speech (POS) tag and a lemma. We use Stanford's CoreNLP for this step.
- A simplified head-driven phrase structure grammar (HPSG) is generated. This step uses Zhou and Zhao's approach.
- Working backwards from the HPSG, a rule-based approach that follows the dependency structure of the text generates the production rules of an ASG.
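The final step bottoms out in a terminal layer that maps each token to an ASG production annotated with its lemma (as in the example below). The following sketch shows that mapping in isolation; the `(token, POS tag, lemma)` triples are hypothetical stand-ins for what the CoreNLP annotation step would return, not actual output of this project's code.

```python
# Sketch: emitting the terminal layer of an ASG from tagged, lemmatized tokens.
# The input triples are hypothetical stand-ins for CoreNLP output.

def terminal_rules(triples):
    """Map each (token, tag, lemma) triple to an ASG terminal production."""
    rules = []
    for token, tag, lemma in triples:
        if tag == ".":  # punctuation carries no lemma annotation
            rules.append(f'{tag} -> "{token}"')
        else:
            rules.append(f'{tag} -> "{token}" {{ lemma({lemma}). }}')
    return rules

tagged = [("The", "DT", "the"), ("fish", "NN", "fish"),
          ("ate", "VBD", "eat"), (".", ".", ".")]
for rule in terminal_rules(tagged):
    print(rule)   # e.g. DT -> "The" { lemma(the). }
```

Non-terminal productions are derived separately from the HPSG structure; only the leaves of the grammar come directly from the token stream like this.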

#### Motivation

In brief, this work is motivated by two main factors:

- Answer set grammars are context-sensitive grammars which can encode both the syntax (the CFG part) and the semantics (the ASP annotations) of a language. For natural language, this is a powerful formalism which encodes both a structural *and* a logical form for text, which we believe will be useful for natural language understanding tasks.
- Answer set grammar induction can be translated into an ILASP task, allowing one to *learn* the semantic constraints of a grammar, given its syntax. In a future project, we aim to learn some semantic constraints of natural language and apply this, for example, to difficult coreference problems which require commonsense reasoning.

#### Example

For the text "The fish ate the worm. It was hungry", we generate the ASG:

```
start -> S {
event(E, S, O) :- event(E, S, O)@1.
modifier(S, M) :- modifier(S, M)@1.
valid :- event(eat, fish, worm)@1.
valid :- modifier(it, hungry)@1.
:- not valid.
}
S -> NP VP . {
event(E, S, O) :- nominal(S)@1, partial_event(E, O)@2.
modifier(S, M) :- nominal(S)@1, partial_modifier(M)@2.
valid :- nominal(fish)@1, partial_event(eat, worm)@2.
valid :- nominal(it)@1, partial_modifier(hungry)@2.
:- not valid.
}
NP -> DT NN {
nominal(N) :- lemma(N)@2.
valid :- lemma(the)@1, lemma(fish)@2.
valid :- lemma(the)@1, lemma(worm)@2.
:- not valid.
}
VP -> VBD NP {
partial_event(E, N) :- lemma(E)@1, nominal(N)@2.
valid :- lemma(eat)@1, nominal(worm)@2.
:- not valid.
}
NP -> PRP {
nominal(N) :- lemma(N)@1.
valid :- lemma(it)@1.
:- not valid.
}
VP -> VBD ADJP {
partial_modifier(M) :- lemma(M)@2.
valid :- lemma(be)@1, lemma(hungry)@2.
:- not valid.
}
ADJP -> JJ {
lemma(X) :- lemma(X)@1.
valid :- lemma(hungry)@1.
:- not valid.
}
DT -> "The" { lemma(the). }
NN -> "fish" { lemma(fish). }
VBD -> "ate" { lemma(eat). }
DT -> "the" { lemma(the). }
NN -> "worm" { lemma(worm). }
PRP -> "It" { lemma(it). }
VBD -> "was" { lemma(be). }
JJ -> "hungry" { lemma(hungry). }
. -> "."
```

### Setup

- Clone the repository recursively.
- Set up the two submodules.
- Download the models and set the model locations.
- Set up the pipenv environment.
- Choose your text and run the driver program.
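The steps above might look roughly like the following. This is a hypothetical walkthrough, not commands taken from the project: `<repo-url>` and `driver.py` are placeholders, so adapt them to the actual repository and driver script.

```shell
# Hypothetical setup walkthrough; <repo-url> and driver.py are placeholders.
git clone --recursive <repo-url> nlag2
cd nlag2
git submodule update --init --recursive   # make sure both submodules are checked out
pipenv install                            # create the pipenv environment
pipenv run python driver.py input.txt     # run the driver on your chosen text
```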

#### Development

PyCharm is recommended. To speed up development, leave the CoreNLP server running (this avoids having to spin it up every time you need to test something).

```
java -mx4g -cp "lib/CoreNLP/*:lib/CoreNLP/lib/*:lib/CoreNLP/liblocal/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,ner" -port 9000 -timeout 10000
```
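With the server running, annotations can be requested over CoreNLP's HTTP API by POSTing raw text with the annotation properties encoded as a JSON query parameter. The sketch below is an assumed client, not code from this project; the host and port match the server command above, and the guarded block at the bottom only works if that server is actually reachable.

```python
# Minimal CoreNLP HTTP client sketch (stdlib only); assumes the server
# started above is listening on localhost:9000.
import json
import urllib.parse
import urllib.request

def corenlp_url(host="http://localhost:9000",
                annotators="tokenize,ssplit,pos,lemma"):
    """Build the server URL with annotation properties as a query parameter."""
    props = {"annotators": annotators, "outputFormat": "json"}
    return host + "/?properties=" + urllib.parse.quote(json.dumps(props))

def annotate(text, url=None):
    """POST raw text to the server and return the parsed JSON annotation."""
    req = urllib.request.Request(url or corenlp_url(),
                                 data=text.encode("utf-8"))
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Requires the CoreNLP server from the command above to be running.
    doc = annotate("The fish ate the worm.")
    for tok in doc["sentences"][0]["tokens"]:
        print(tok["word"], tok["pos"], tok["lemma"])
```

Keeping the server resident means each such request skips the model-loading cost, which is what makes the iterate-and-test loop fast.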
