Independent Study Report: November 13, 1999

By Daniel Wilson

This week I solved the problem with recursion in Prolog. I also downloaded a large lexicon and translated into a Prolog program. Unfortunately, it was too large too compile about million entries. I also made some progress developing a grammar.

The recursion error stemmed from the fact that non-procedural Prolog programs are compiled into very procedural machine code. The fragment of code I will use as an example would parse a phrase like "the executable files". Logically, there is no difference between line 1 and line 2 in Figure 1. However, procedurally there is a big difference. In line 1 the program picks a determiner, then looks for a noun phrase. If it tried a noun first, that could come out all right. But if it recurs immediately, it goes into infinite recursion. In line 2, Prolog first breaks the phrase apart, then attempts to find each part in the respective part-of-speech list. The possible ways for append to break it are: ([],[the, executable, files]), ([the],[executable, files]), ([the, executable], [files]), ([the, executable, files], []). After breaking up the list of words one way, the program checks to see if the first part of the list is a determiner, then to see if the second part of the list is a noun phrase.

COMMON: np(X) :-n(X).

That is, X is an np if X is an n.

LINE 1: np(X) :- det(A),np(B),append(A,B,X).

That is, X is an np if there exists a det A and an np B such that B appends to A to form X

LINE 2: np(X) :-append(A,B,X), det(A),np(B).

That is, X is an np if there exist an A and a B such that B appends to A to form X and A is a det and B is an np.

Figure 1.

I downloaded a large lexicon, but it was too large for Gnu Prolog to compile. I have a few options. Probably I will just create a lexicon of a few dozen words by hand. That would be sufficient for demonstration purposes. In the future I could add words as necessary. Another option would be to try to locate a narrower lexicon and translate it into Prolog. A third option would be to find a different way to store the words. A large database might be in order. The C++ module could query the DB with the particular words used in the command. Then it could write the results of the query as Prolog facts and query the Prolog part of the program. A fourth option is to write the entire project in C++, again using a database. That would require me to write all of the search and backtrack stuff which Prolog provides in C++. It is not an attractive option. A fifth option is to get a large body of text, a corpus, -- perhaps issues of PC Magazine -- and filter all words from the lexicon which do not appear in that corpus. That would be a little difficult, but could work.

The problem is that million entries is way too large a lexicon to implement as a Prolog program. I will find a way to circumvent the problem this next week. Then I will push on to the semantic processing. By the end of the month, I should have it all wrapped up in a C++ module.