On unification of processing methods of the structured information

A.Yu. Doroshenko, L.G. Molotievskiy

Abstract


This article examines the problem of searching for and collecting information from DOM models of the same type. It proposes a mechanism for collecting structured information from relevant sources, which makes it possible to build universal analyzers of the data from those sources.

Problems in programming 2011; 4: 56-62



