Christ University Institutional Repository

Classification and Retrieval of Research Papers: A Semantic Hierarchical Approach

Mirza Nazura, Abdulkarim (2010) Classification and Retrieval of Research Papers: A Semantic Hierarchical Approach. Other thesis, Christ University.


Abstract

"Classification and Retrieval of Research Papers: A Semantic Hierarchical Approach" demonstrates an effective and efficient technique for classifying research documents pertaining to Computer Science. The explosion in the number of documents and research publications in electronic form, together with the need to perform a semantic search to retrieve them, has been the incentive for this research. The popularity and widespread use of electronic documents and publications have necessitated the development of an efficient document archival and retrieval mechanism. The objective of this thesis is to categorize journal papers by assigning them relevant and meaningful classes, predicting the latent concept or topic of research based on the relevant terms, and assigning the appropriate classification labels. This thesis takes a semantic approach and applies text mining techniques in a hierarchical manner in order to classify the documents. The use of a Domain Specific Lexicon (DSL), a lexicon containing domain-specific terms, adds a semantic dimension to classification and document retrieval. The Concept Prediction based on Term Relevance (CPTR) technique demonstrates a semantic model for assigning concepts or topics to papers. This thesis proposes a conceptual framework for organizing and classifying research papers pertaining to Computer Science. The efficacy of the proposed concepts is demonstrated with the help of classification experiments. These experiments reveal that the DSL technique of training works efficiently when categorization is based on keywords. The CPTR technique, on the other hand, shows very high accuracy when the classification is based on the contents of the document. Both techniques lend a semantic dimension to classification. Narrowing down the scope of search at each level of the hierarchy enables time-efficient retrieval of and access to the goal documents.
The hierarchical interface for Document Retrieval enables retrieval of the target documents by gradually restricting the scope of search at each level of the hierarchy. This work comprises two main components: (1) the Framework for Hierarchical Classification and (2) the Hierarchical Interface for Document Retrieval. Two distinct techniques for classification are proposed in this thesis: (1) the use of a Domain Specific Lexicon (DSL), which is comparable to a Domain Specific Ontology, and (2) the Concept Prediction based on Term Relevance (CPTR) technique. These techniques lend a semantic dimension to classification.

Keywords: Text Mining, Classification, Document Retrieval, Hierarchical, Domain Specific Lexicon (DSL), Probabilistic Latent Semantic Analysis (PLSA), Concept Prediction based on Term Relevance (CPTR)
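
The abstract describes keyword-based classification against a domain lexicon, with the scope narrowed at each level of a hierarchy. The following is a minimal sketch of that general idea only, not the implementation used in the thesis. The lexicon contents, the category names, the two-level depth, the overlap-count scoring, and the function name `classify` are all hypothetical and chosen purely for illustration.

```python
# Hypothetical two-level lexicon: area -> topic -> domain-specific terms.
# These entries are illustrative and are NOT taken from the thesis.
LEXICON = {
    "Data Mining": {
        "Text Mining": {"classification", "corpus", "tokenization", "lexicon"},
        "Clustering": {"k-means", "centroid", "dendrogram"},
    },
    "Networks": {
        "Routing": {"router", "latency", "packet"},
        "Security": {"encryption", "firewall", "intrusion"},
    },
}


def overlap(keywords, terms):
    """Count how many of the paper's keywords appear in a set of lexicon terms."""
    return sum(1 for keyword in keywords if keyword in terms)


def classify(keywords):
    """Assign (area, topic) labels by matching keywords level by level.

    Assumed scoring: simple overlap count. Returns None if no keyword
    matches any lexicon term.
    """
    normalized = [keyword.lower() for keyword in keywords]

    # Level 1: score each area against the union of all its topics' terms.
    area_scores = {
        area: overlap(normalized, set().union(*topics.values()))
        for area, topics in LEXICON.items()
    }
    best_area = max(area_scores, key=area_scores.get)
    if area_scores[best_area] == 0:
        return None

    # Level 2: the search is restricted to topics inside the chosen area.
    topic_scores = {
        topic: overlap(normalized, terms)
        for topic, terms in LEXICON[best_area].items()
    }
    best_topic = max(topic_scores, key=topic_scores.get)
    return best_area, best_topic


if __name__ == "__main__":
    print(classify(["Lexicon", "corpus", "packet"]))
    print(classify(["firewall", "encryption"]))
```

Because the second level only considers topics under the area selected at the first level, each step examines a smaller part of the lexicon, which is the narrowing effect the abstract attributes to the hierarchical approach.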

Item Type: Thesis (Other)
Subjects: Thesis > MPhil > Computer Science
ID Code: 5962
Deposited By: Knowledge Center Christ University
Deposited On: 29 Jan 2014 20:05
Last Modified: 29 Jan 2014 20:05
