Analysing Key Phrases in Sentences

In my last post I talked about word sense disambiguation, which in this project means determining which grammatical tag should be assigned to each word in a sentence given its position and context.

In this update I hope to give more details about my project: what's going on and how it's progressing.

Analysing sentences

The first task is to take a sentence and break it down into its component parts: words. Each word is taken one by one and assigned one of the following grammar tags: "noun", "verb", "adjective", "adverb", "pronoun", "determiner", "particle", "preposition", "conjunction", "interjection", "classifier".

Tags are assigned on a first sweep using a probabilistic part-of-speech tagger and are then disambiguated using the methods discussed in [1].
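As a rough illustration of what a probabilistic first sweep can look like, here is a minimal Python sketch of a unigram tagger that gives each word the tag it was most often seen with. This is not the tagger used in the project: the training data, the function names and the "noun" fallback for unknown words are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training data; a real tagger would learn from a large corpus.
TRAINING = [
    [("the", "determiner"), ("dog", "noun"), ("runs", "verb")],
    [("the", "determiner"), ("runs", "noun"), ("were", "verb"), ("long", "adjective")],
    [("she", "pronoun"), ("runs", "verb"), ("quickly", "adverb")],
]


def train_unigram_tagger(tagged_sentences):
    """Count how often each (lower-cased) word carries each tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts


def first_sweep(sentence, counts, default_tag="noun"):
    """Split on whitespace and give each word its most frequent tag.

    Words never seen in training fall back to default_tag (an assumption).
    """
    tagged = []
    for word in sentence.split():
        seen = counts.get(word.lower())
        tag = seen.most_common(1)[0][0] if seen else default_tag
        tagged.append((word, tag))
    return tagged


MODEL = train_unigram_tagger(TRAINING)
print(first_sweep("The dog runs", MODEL))
```

A word such as "runs" can be a noun or a verb, and a first sweep like this one always picks the more frequent tag regardless of context. That is exactly the kind of ambiguity the second, disambiguation pass is meant to resolve.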

Initially the plan was to implement a Brill tagger for part-of-speech tagging, and work on this had started. After some additional research, I found that Apple's Foundation framework already provides a class, NSLinguisticTagger, usable from Objective-C, that does the job of part-of-speech tagging. This made implementing [1] far simpler.

Breaking a sentence into Noun Phrases

Once linguistic tagging has been completed, the sentence must be broken up into noun phrases.

What is a noun phrase?

A noun phrase (NP) is a phrase which has a noun (or indefinite pronoun) as its head word, or which performs the same grammatical function as such a phrase[2].

Why are they important?

Noun phrases generally act as the subjects or objects of verbs in a sentence. It is therefore the noun phrases that are used to track the subject of a conversation in speech and to identify the topic of conversation.

Noun phrases are the key to following the context of a conversation between discrete sentences and messages.

Known issues

Due to time constraints, an assumption was made that all words in a sentence that are not verbs are part of an NP. This is an over-simplification, as there also exist verb phrases. However, it should be a close enough approximation to continue with the project.
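The simplification above can be sketched in a few lines of Python: every run of consecutive non-verb words becomes one candidate noun phrase, and verbs act as the boundaries. The function name and the (word, tag) tuple format are assumptions for illustration, not the project's actual code.

```python
def chunk_noun_phrases(tagged_words):
    """Group consecutive non-verb words into candidate noun phrases.

    tagged_words is a list of (word, tag) pairs; verbs are dropped and
    act as separators between phrases.
    """
    phrases = []
    current = []
    for word, tag in tagged_words:
        if tag == "verb":
            # A verb closes the phrase collected so far, if there is one.
            if current:
                phrases.append(current)
                current = []
        else:
            current.append(word)
    if current:
        phrases.append(current)
    return phrases


tagged = [
    ("The", "determiner"), ("old", "adjective"), ("dog", "noun"),
    ("chased", "verb"),
    ("a", "determiner"), ("cat", "noun"),
]
print(chunk_noun_phrases(tagged))
```

The cost of the approximation shows up quickly: in "She ran quickly" the adverb "quickly" ends up as its own "noun phrase", even though it really belongs to the verb phrase.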

Proper nouns or pronouns are used as the head word of each noun phrase, and for the most part the head word is what people would call the "subject" of a sentence.

Classifying the subject of a sentence

I have now reached the point where the task is to correctly classify the subject of a sentence. The subject will be the noun phrase(s) that are "verbing", that is, performing the action of the verb. To achieve this I plan to try both a neural network and a heuristic algorithm.
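To make the heuristic option concrete, here is one very simple rule sketched in Python: treat the words directly before the first verb as the subject candidate. This is only a hypothetical baseline, not the algorithm planned for the project, and it assumes simple active English sentences; the function name and input format are made up for the example.

```python
def guess_subject(tagged_words):
    """Return the words that precede the first verb as the subject candidate.

    tagged_words is a list of (word, tag) pairs. An empty list is returned
    when the sentence has no verb or when the verb comes first.
    """
    phrase = []
    for word, tag in tagged_words:
        if tag == "verb":
            return phrase
        phrase.append(word)
    # No verb was found, so nothing is "verbing".
    return []


sentence = [
    ("The", "determiner"), ("dog", "noun"),
    ("chased", "verb"),
    ("a", "determiner"), ("cat", "noun"),
]
print(guess_subject(sentence))
```

A rule like this will get passive sentences, questions and imperatives wrong, which is one reason to compare it against a learned classifier.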

My next blog post shall be on classifying noun phrases and identifying the subject of a sentence.


[1] Roberts, P.J.; Mitchell, R.; Ruiz, V., "Using triangulation to identify word senses," 2010 IEEE 9th International Conference on Cybernetic Intelligent Systems (CIS), pp. 1-6, 1-2 Sept. 2010. (Available online at

[2] For definitions and discussions of the noun phrase that point to the presence of a head noun, see for instance Crystal (1997:264), Lockwood (2002:3), and Radford (2004: 14, 348).