A computer system and methods are disclosed for automatically discovering topics, building a hierarchical topic structure, and tagging and categorizing the contents of a document or other natural language content. The disclosed methods include steps for obtaining terms that best represent the topics in a text content, and for building a hierarchical representation of topics at different levels, of topic-comment relationships, and of folder-subfolder structures. The methods further include obtaining, identifying, and selecting terms that represent different degrees of informational importance based on the grammatical roles, parts of speech, and semantic attributes associated with the terms, and using the selected terms to represent topics in the document, to automatically tag the document, to rank search results, and to build a category structure.
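As an illustration only, the term selection and hierarchy building summarized above can be sketched as follows. The sketch assumes the text has already been annotated with a part of speech and a grammatical role for each term. The weight tables, the function names (`score_terms`, `select_tags`, `build_hierarchy`), and the rule of grouping terms that share a sentence are hypothetical choices made for this example and are not taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical weights (assumptions for this sketch, not from the disclosure).
ROLE_WEIGHTS = {"subject": 3.0, "object": 2.0, "modifier": 1.0}
POS_WEIGHTS = {"noun": 1.0, "verb": 0.5, "adjective": 0.25}


def score_terms(sentences):
    """Sum role_weight * pos_weight over every occurrence of each term."""
    scores = defaultdict(float)
    for sentence in sentences:
        for term, pos, role in sentence:
            scores[term] += ROLE_WEIGHTS.get(role, 0.0) * POS_WEIGHTS.get(pos, 0.0)
    return dict(scores)


def select_tags(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def build_hierarchy(sentences, topics, scores):
    """Map each topic to the positively scored terms that share a sentence with it."""
    tree = {topic: set() for topic in topics}
    for sentence in sentences:
        terms = {term for term, _, _ in sentence}
        for topic in terms.intersection(tree):
            tree[topic] |= {t for t in terms if t != topic and scores.get(t, 0.0) > 0}
    return {topic: sorted(children) for topic, children in tree.items()}


# Pre-annotated input: (term, part of speech, grammatical role) per sentence.
sentences = [
    [("camera", "noun", "subject"), ("takes", "verb", "predicate"), ("pictures", "noun", "object")],
    [("camera", "noun", "subject"), ("has", "verb", "predicate"), ("battery", "noun", "object")],
    [("battery", "noun", "subject"), ("lasts", "verb", "predicate"), ("long", "adjective", "modifier")],
]

scores = score_terms(sentences)
tags = select_tags(scores, 2)                   # ['camera', 'battery']
tree = build_hierarchy(sentences, tags, scores)
# {'camera': ['battery', 'pictures'], 'battery': ['camera', 'long']}
```

In this sketch the selected tags act as top-level topics (or folders), and the terms grouped under each tag act as comments or subfolders. The same scores could also be reused to rank documents in search results, although that step is not shown here.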