Thoughts on implementation of Phase 2

I have had this maddening, nagging feeling that I have done this before. I feel like I should know how to parse a "sentence" according to a "grammar"; like this is just compiler class all over again. And yet itís not.

The difference springs from the ideas that: this grammar is far more complex than that of Floyd or C++; words in this grammar may perform functions of many parts of speech; that the lexicon of this grammar may not easily be encoded in a lex file; that any such lex file would not really simplify the remaining computations. And so, the parsing of the English discourse must be performed by an intelligence. Thus, we use PROLOG.

However, after the sentences are parsed, they must be understood. We move from syntax to semantics. In compiler construction, the "front end" is built automatically, but we hand-code the "back end". But we need the front end to give us a structure on which we can work. Yacc kindly builds us a tree. We then iterate through that tree assigning meanings to the parts of the grammar. A for loop is always encoded in one of a few ways. Another basic statement, such as a call is encoded in another well-defined way.

But if lex cannot parse this language and feed the data to yacc to be encoded in a parse tree, how are we to proceed? One answer would be to make the PROLOG program build the parse tree through which we iterate. That would probably be the best for this specialized application, but it requires that I enter into the bowels of PROLOG programming and learn to build a tree. I donít believe Dr. Dougherty covers PROLOG in that depth, and Iím not sure the language supports dealing with these complex structures. A somewhat more pragmatic alternative it to have the PROLOG program parse the English sentences and output them in a format which lex and yacc can then process.

Lex and yacc should be able to process sentences in the format Dr. Dougherty suggests. That format would leave the simple sentence, "Girls put books in houses", expanded thus: (s (np (n girls)) (vp (v put) (np (n books)) (pp ( in) (np (n houses)) ))). I would need to do some serious work to see if lex and yacc can process this kind of "grammar", but at least the hardest work would have been done already.

Another point this brings up, however, is that I really donít care about girls putting books in houses. I care about instructions to the computer. That is to say, I am interested in a narrower, and different subset of the English language than Dr. Dougherty is. He states simply that his work is restricted to declarative and interrogative sentences (173). I am interested primarily in imperative sentences, though I need to process declarations and questions as well. One may need to give information before issuing a command, or one may express a command as a request for information.

A third approach to the problem of parsing a sentence in PROLOG and putting it in a tree structure is to simply convert the tree structure that the PROLOG program indicates by its output. An open parenthesis indicates the beginning of a level, and a closing parenthesis indicates its end. Writing a simple converter in C++ would be fairly simple. That should permit me to input English into the PROLOG parser and receive a tree structure through which I may easily iterate.

Following this approach, I should be able to use Dr. Doughertyís methods to parse English sentences and be ready to write a "back end" to make sense of the output.