Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free-form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech. The intent is extracted in an iterative manner such that a
first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. The multi-stage intent extraction approach may have more than two iterations. By way of example only, the user intent extracting step may further determine a sub-class of the sub-class of the first class after a third iteration, such that the first class, the sub-class of the first class, and the sub-class of the sub-class of the first class are hierarchically indicative of the intent of the user.
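
By way of illustration only, the iterative extraction described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the taxonomy contents (TAXONOMY), the function names (classify, extract_intent), and the keyword-overlap scoring are all hypothetical stand-ins for whatever classifier an actual embodiment would use. Each iteration selects one label among the children of the previously selected label, so the returned list reads as first class, sub-class, sub-class of the sub-class.

```python
from typing import Dict, List, Optional, Set

# Hypothetical taxonomy (not from the source): each label has keywords
# and child labels. The top level stands in for the "first class".
TAXONOMY: Dict[str, dict] = {
    "radio": {
        "keywords": {"radio", "station", "tune"},
        "children": {
            "set_station": {
                "keywords": {"tune", "station"},
                "children": {
                    "fm": {"keywords": {"fm"}, "children": {}},
                    "am": {"keywords": {"am"}, "children": {}},
                },
            },
            "volume": {"keywords": {"volume", "louder", "quieter"}, "children": {}},
        },
    },
    "climate": {
        "keywords": {"temperature", "warmer", "cooler", "heat"},
        "children": {
            "raise_temp": {"keywords": {"warmer", "heat"}, "children": {}},
            "lower_temp": {"keywords": {"cooler"}, "children": {}},
        },
    },
}


def classify(words: Set[str], candidates: Dict[str, dict]) -> Optional[str]:
    """Pick the candidate label sharing the most keywords with the utterance.

    Keyword overlap is an assumed placeholder for a real classifier.
    Returns None when no candidate matches any word.
    """
    best_label, best_score = None, 0
    for label, node in candidates.items():
        score = len(words & node["keywords"])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def extract_intent(decoded_speech: str,
                   taxonomy: Dict[str, dict],
                   max_iterations: int = 3) -> List[str]:
    """Return the hierarchical intent as [first class, sub-class, ...]."""
    words = set(decoded_speech.lower().split())
    path: List[str] = []
    candidates = taxonomy
    for _ in range(max_iterations):
        if not candidates:
            break  # reached a leaf of the taxonomy
        label = classify(words, candidates)
        if label is None:
            break  # nothing at this level matches the utterance
        path.append(label)
        # The next iteration only considers sub-classes of the chosen label.
        candidates = candidates[label]["children"]
    return path


if __name__ == "__main__":
    print(extract_intent("please tune the radio to an fm station", TAXONOMY))
    print(extract_intent("make it warmer", TAXONOMY))
```

Restricting each iteration to the children of the label chosen in the previous iteration is what makes the result hierarchical: a two-iteration run yields a class and a sub-class, and a third iteration refines the sub-class further when the taxonomy has that depth.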