Ontology Case Study
Text document clustering is the technique of grouping documents according to their similarity. It is widely used in digital library environments. Ontologies play an increasingly important role in knowledge management and the Semantic Web.

For document clustering, there are two types of classification approaches:
1) Supervised: in supervised classification, a set of predefined classes is given.
2) Unsupervised: in unsupervised classification, no set of predefined classes is provided. This is also known as clustering.

Clustering approaches can also be classified by the information they use:
1) Text-based: depends on the content of the documents.
2) Link-based: depends on the link structure of the pages.
3) Hybrid: depends on both the content and the links.

In [1], a multi-viewpoint-based similarity (MVS) measure is discussed for document clustering. In this method, the similarity between texts is assessed from multiple viewpoints. The similarity between two documents $d_i$ and $d_j$ inside a cluster $S_r$, as seen from a point $d_h$ outside that cluster, is the product of the cosine of the angle between $d_i$ and $d_j$ viewed from $d_h$ and the Euclidean distances from $d_h$ to these two documents. MVS averages this quantity over all such viewpoints:

$$
\mathrm{MVS}(d_i, d_j \mid d_i, d_j \in S_r) = \frac{1}{n - n_r} \sum_{d_h \in S \setminus S_r} (d_i - d_h)^{T} (d_j - d_h) = \frac{1}{n - n_r} \sum_{d_h \in S \setminus S_r} \cos(d_i - d_h,\, d_j - d_h)\, \lVert d_i - d_h \rVert\, \lVert d_j - d_h \rVert
$$

where $S$ is the set of all $n$ documents and $n_r$ is the number of documents in $S_r$.

Two criterion functions are proposed for document clustering:
• Internal criterion function: this optimization function is defined over the documents within each cluster and does not take into account documents assigned to other clusters.
• External criterion function: this optimization function is based on how different the clusters are from each other.
They concluded […]

[…] preprocessing of the document is done.
• In feature extraction, the vector containing the preprocessed data is used to extract the features of the document.
  This is done by comparing the vector with keywords from the ontologies of the different research areas.
• They used a Self-Organizing Map (SOM) neural network for clustering. The created ontology and the feature vectors are passed to the network for training, after which it identifies the corresponding research domain.
• In the training and testing phase, the feature vectors of the research projects are given as input to train the SOM network. The trained network is then tested with the feature vectors of new proposals/articles to obtain the class each proposal/article belongs to.
This approach is user-friendly and less time-consuming, since an article can be categorized, and its result obtained, as soon as it is submitted.
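The MVS measure from [1] described earlier can be sketched in a few lines of Python. This is a minimal illustration, assuming each document is already a plain list of floats (for example TF-IDF weights) and that `outside` holds every document of $S \setminus S_r$, so its length is $n - n_r$. The names `mvs` and `mvs_cosine_form` are hypothetical; the second function evaluates the cosine-times-distances form of the same equation, so both should return the same value.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sub(a: Vector, b: Vector) -> List[float]:
    """Component-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def dot(a: Vector, b: Vector) -> float:
    """Inner product a^T b."""
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    """Euclidean length of a."""
    return math.sqrt(dot(a, a))


def mvs(d_i: Vector, d_j: Vector, outside: List[Vector]) -> float:
    """Mean of (d_i - d_h)^T (d_j - d_h) over viewpoints d_h outside S_r."""
    if not outside:
        raise ValueError("need at least one viewpoint outside the cluster")
    return sum(dot(sub(d_i, d_h), sub(d_j, d_h)) for d_h in outside) / len(outside)


def mvs_cosine_form(d_i: Vector, d_j: Vector, outside: List[Vector]) -> float:
    """Same quantity written as cos(angle) * ||d_i - d_h|| * ||d_j - d_h||."""
    if not outside:
        raise ValueError("need at least one viewpoint outside the cluster")
    total = 0.0
    for d_h in outside:
        a, b = sub(d_i, d_h), sub(d_j, d_h)
        na, nb = norm(a), norm(b)
        if na == 0.0 or nb == 0.0:
            continue  # viewpoint coincides with a document; the dot form adds 0 too
        cos = dot(a, b) / (na * nb)
        total += cos * na * nb
    return total / len(outside)


# Hypothetical toy vectors: d_i, d_j in S_r, two viewpoints outside it.
d_i, d_j = [1.0, 0.0], [0.8, 0.6]
outside = [[0.0, 1.0], [0.6, 0.8]]
print(mvs(d_i, d_j, outside))
print(mvs_cosine_form(d_i, d_j, outside))
```

With a single viewpoint at the origin and unit-length document vectors, the expression reduces to ordinary cosine similarity; averaging over many outside viewpoints is what makes the measure "multi-viewpoint".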
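The essay does not say exactly how the preprocessed document vector is compared with the ontology keywords. One simple possibility, sketched here purely as an assumption, is to lowercase and tokenize the text, drop stopwords, and count how often each ontology keyword occurs. The toy `ONTOLOGY`, the `STOPWORDS` set and all function names are hypothetical, and only single-word keywords are matched.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy ontology: research area -> single-word keywords.
ONTOLOGY: Dict[str, Set[str]] = {
    "data mining": {"clustering", "classification", "association"},
    "networks": {"routing", "protocol", "wireless"},
}

# A tiny illustrative stopword list; real systems use a much larger one.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def preprocess(text: str) -> List[str]:
    """Lowercase, split into alphabetic tokens and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def keyword_index(ontology: Dict[str, Set[str]]) -> List[str]:
    """Fixed, sorted ordering of every keyword across all research areas."""
    return sorted({kw for kws in ontology.values() for kw in kws})


def feature_vector(text: str, ontology: Dict[str, Set[str]]) -> List[int]:
    """Count how often each ontology keyword occurs in the document."""
    tokens = preprocess(text)
    return [tokens.count(kw) for kw in keyword_index(ontology)]


def domain_scores(text: str, ontology: Dict[str, Set[str]]) -> Dict[str, int]:
    """Total keyword hits per research area."""
    tokens = preprocess(text)
    return {area: sum(tokens.count(kw) for kw in kws) for area, kws in ontology.items()}


doc = "Clustering and classification of wireless data; clustering again."
print(feature_vector(doc, ONTOLOGY))  # [0, 1, 2, 0, 0, 1]
print(domain_scores(doc, ONTOLOGY))  # {'data mining': 3, 'networks': 1}
```

The fixed keyword ordering matters: every document must map to a vector of the same length and layout before it can be fed to a clustering network such as a SOM.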
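The SOM training and testing phases described above can be illustrated with a minimal pure-Python Kohonen map. This is a sketch under several assumptions not stated in the essay: a 2×2 grid, a Gaussian neighbourhood, a linearly decaying learning rate and radius, nodes labeled by majority vote of the training projects mapped to them, and toy four-dimensional feature vectors for two hypothetical domains. Unlike the described system, it trains on the feature vectors only rather than also passing the ontology to the network; `SOM`, `label_nodes` and `classify` are hypothetical names.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Node = Tuple[int, int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SOM:
    """Minimal rectangular self-organizing map (hypothetical helper)."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded for reproducible runs
        self.rows, self.cols = rows, cols
        # One weight vector per grid node, initialised uniformly in [0, 1).
        self.weights: Dict[Node, List[float]] = {
            (r, c): [rng.random() for _ in range(dim)]
            for r in range(rows)
            for c in range(cols)
        }

    def bmu(self, x: Sequence[float]) -> Node:
        """Best-matching unit: the node whose weights are closest to x."""
        return min(self.weights, key=lambda n: sq_dist(self.weights[n], x))

    def train(self, data: List[Sequence[float]], epochs: int = 50,
              lr0: float = 0.5, sigma0: Optional[float] = None) -> None:
        """Online training with linearly decaying learning rate and radius."""
        if sigma0 is None:
            sigma0 = max(self.rows, self.cols) / 2
        total = epochs * len(data)
        step = 0
        for _ in range(epochs):
            for x in data:
                frac = step / total
                lr = lr0 * (1 - frac)
                sigma = max(sigma0 * (1 - frac), 0.5)
                br, bc = self.bmu(x)
                for (r, c), w in self.weights.items():
                    # Gaussian neighbourhood: nodes near the BMU move more.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * sigma ** 2))
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
                step += 1


def label_nodes(som: SOM, samples: List[Tuple[Sequence[float], str]]) -> Dict[Node, str]:
    """Label each node with the majority domain of the samples mapped to it."""
    votes: Dict[Node, Counter] = defaultdict(Counter)
    for x, domain in samples:
        votes[som.bmu(x)][domain] += 1
    return {node: c.most_common(1)[0][0] for node, c in votes.items()}


def classify(som: SOM, labels: Dict[Node, str], x: Sequence[float]) -> str:
    """Return the domain of the labeled node whose weights are closest to x."""
    node = min(labels, key=lambda n: sq_dist(som.weights[n], x))
    return labels[node]


# Toy feature vectors of existing research projects in two hypothetical domains.
training = [
    ([3, 2, 0, 0], "data mining"),
    ([2, 3, 0, 0], "data mining"),
    ([3, 3, 1, 0], "data mining"),
    ([0, 0, 3, 2], "networks"),
    ([0, 1, 2, 3], "networks"),
    ([0, 0, 3, 3], "networks"),
]
som = SOM(rows=2, cols=2, dim=4)
som.train([x for x, _ in training])  # training phase
labels = label_nodes(som, training)
print(classify(som, labels, [2, 2, 0, 0]))  # testing phase: a new proposal
```

A new proposal is assigned the domain of the closest labeled node; restricting the search to labeled nodes avoids returning a node that no training project mapped to.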