Medline Document Clustering with Semi-Supervised Spectral Clustering Algorithm

IJEECC Front Page

Three types of information are used to cluster biomedical documents: local content (LC), global content (GC), and MeSH semantics (MS). Previous methods clustered documents using only one or two of these types of information, with constraint-based and distance-based algorithms. The proposed system instead uses a semi-supervised spectral clustering algorithm, which makes the most of noisy constraints to improve clustering performance. The results are highly promising.
Keywords: Biomedical text mining, document clustering, semi-supervised clustering, spectral clustering
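
The general idea of constrained spectral clustering over several similarity sources can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy similarity matrices, the equal-weight averaging of LC, GC, and MS, the function names, and the simple scheme that overwrites affinities for must-link and cannot-link pairs. It also handles only the two-cluster case, using the sign of the Fiedler vector of the normalized Laplacian.

```python
import numpy as np


def block_similarity(within, across, sizes=(3, 3)):
    """Toy similarity matrix (hypothetical data): two groups of documents."""
    n = sum(sizes)
    sim = np.full((n, n), across, dtype=float)
    start = 0
    for size in sizes:
        sim[start:start + size, start:start + size] = within
        start += size
    np.fill_diagonal(sim, 1.0)
    return sim


def combine_similarities(sims, weights=None):
    """Weighted average of several document-by-document similarity matrices."""
    sims = [np.asarray(s, dtype=float) for s in sims]
    if weights is None:
        weights = [1.0 / len(sims)] * len(sims)  # assumption: equal weights
    return sum(w * s for w, s in zip(weights, sims))


def apply_constraints(affinity, must_link=(), cannot_link=()):
    """Return a copy of the affinity with pairwise constraints imposed.

    Assumption: must-link pairs get affinity 1 and cannot-link pairs get 0.
    """
    constrained = affinity.copy()
    for i, j in must_link:
        constrained[i, j] = constrained[j, i] = 1.0
    for i, j in cannot_link:
        constrained[i, j] = constrained[j, i] = 0.0
    return constrained


def spectral_bipartition(affinity):
    """Split documents into two clusters by the sign of the Fiedler vector."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    _, eigenvectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigenvectors[:, 1] * d_inv_sqrt    # second-smallest eigenvector
    return (fiedler > 0).astype(int)


# Hypothetical similarities for six documents from the three sources.
lc = block_similarity(within=0.8, across=0.2)  # local content
gc = block_similarity(within=0.7, across=0.3)  # global content
ms = block_similarity(within=0.9, across=0.1)  # MeSH semantics

affinity = combine_similarities([lc, gc, ms])
affinity = apply_constraints(affinity, must_link=[(0, 1)], cannot_link=[(2, 3)])
labels = spectral_bipartition(affinity)
print(labels)
```

On this toy input the first three documents should fall into one cluster and the last three into the other; which cluster is labeled 0 or 1 depends on the sign of the eigenvector and is arbitrary. Handling more than two clusters or constraints that may be wrong would need a different formulation than this hard overwrite.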

