A Brief Survey of Web Data Extraction Tools (continued)


[1] ABASCAL, R., AND SANCHEZ, J. A. X-tract: Structure extraction from botanical textual descriptions. In Proceedings of the String Processing & Information Retrieval Symposium and International Workshop on Groupware, SPIRE/CRIWG (Cancun, Mexico, 1999), pp. 2-7.
[2] ABITEBOUL, S. Querying semi-structured data. In Database Theory - ICDT'97, 6th International Conference, Delphi, Greece, January 8-10, 1997, Proceedings (1997), F. N. Afrati and P. Kolaitis, Eds., vol. 1186 of Lecture Notes in Computer Science, Springer, pp. 1-18.
[3] ADELBERG, B. NoDoSE - A tool for semi-automatically extracting structured and semistructured data from text documents. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Seattle, WA, 1998), pp. 283-294.
[4] AROCENA, G. O., AND MENDELZON, A. O. WebOQL: Restructuring documents, databases, and webs. In Proceedings of the 14th International Conference on Data Engineering (Orlando, FL, 1998), pp. 24-33.
[5] BAUMGARTNER, R., FLESCA, S., AND GOTTLOB, G. Visual Web information extraction with Lixto. In Proceedings of the 26th International Conference on Very Large Data Bases (Rome, Italy, 2001), pp. 119-128.
[6] BRAY, T., PAOLI, J., AND SPERBERG-McQUEEN, M. Extensible markup language (XML) 1.0. http://www.w3.org/TR/REC-xml.
[7] BRIN, S., MOTWANI, R., PAGE, L., AND WINOGRAD, T. What can you do with a Web in your pocket? Data Engineering Bulletin 21, 2 (1998), 37-47.
[8] CALIFF, M. E., AND MOONEY, R. J. Relational Learning of Pattern-Match Rules for Information Extraction. In Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence (Orlando, FL, 1999), pp. 328-334.
[9] CRESCENZI, V., AND MECCA, G. Grammars have exceptions. Information Systems 23, 8 (1998), 539-565.
[10] CRESCENZI, V., MECCA, G., AND MERIALDO, P. RoadRunner: Towards automatic data extraction from large Web sites. In Proceedings of the 26th International Conference on Very Large Data Bases (Rome, Italy, 2001), pp. 109-118.
[11] EMBLEY, D. W., CAMPBELL, D. M., JIANG, Y. S., LIDDLE, S. W., KAI NG, Y., QUASS, D., AND SMITH, R. D. Conceptual-model-based data extraction from multiple-record Web pages. Data and Knowledge Engineering 31, 3 (1999), 227-251.
[12] EMBLEY, D. W., JIANG, Y. S., AND NG, Y.-K. Record-boundary discovery in Web documents. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Philadelphia, PA, 1999), pp. 467-478.
[13] FLORESCU, D., LEVY, A. Y., AND MENDELZON, A. O. Database techniques for the World-Wide Web: A survey. SIGMOD Record 27, 3 (1998), 59-74.
[14] FREITAG, D. Machine Learning for Information Extraction in Informal Domains. Machine Learning 39, 2/3 (2000), 169-202.
[15] GOLGHER, P. B., DA SILVA, A. S., LAENDER, A. H. F., AND RIBEIRO-NETO, B. A. Bootstrapping for Example-Based Data Extraction. In Proceedings of the 2001 ACM CIKM International Conference on Information and Knowledge Management (Atlanta, GA, 2001), pp. 371-378.
[16] HAMMER, J., GARCIA-MOLINA, H., NESTOROV, S., YERNENI, R., BREUNIG, M., AND VASSALOS, V. Template-based wrappers in the TSIMMIS system. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Tucson, AZ, 1997), pp. 532-535.
[17] HAMMER, J., McHUGH, J., AND GARCIA-MOLINA, H. Semistructured data: The TSIMMIS experience. In Proceedings of the First East-European Symposium on Advances in Databases and Information Systems (St. Petersburg, Russia, 1997), pp. 1-8.
[18] HSU, C.-N., AND DUNG, M.-T. Generating finite-state transducers for semi-structured data extraction from the Web. Information Systems 23, 8 (1998), 521-538.
[19] HUCK, G., FANKHAUSER, P., ABERER, K., AND NEUHOLD, E. J. Jedi: Extracting and synthesizing information from the Web. In Proceedings of the 3rd IFCIS International Conference on Cooperative Information Systems (New York City, NY, 1998), pp. 32-43.
[20] KUSHMERICK, N. Wrapper induction: Efficiency and expressiveness. Artificial Intelligence Journal 118, 1-2 (2000), 15-68.
[21] LAENDER, A. H. F., RIBEIRO-NETO, B., AND DA SILVA, A. S. DEByE - Data Extraction By Example. Data and Knowledge Engineering 40, 2 (2002), 121-154.
[22] LAENDER, A. H. F., RIBEIRO-NETO, B., DA SILVA, A. S., AND SILVA, E. S. Representing Web Data as Complex Objects. In Electronic Commerce and Web Technologies, K. Bauknecht, S. K. Madria, and G. Pernul, Eds. Springer, Berlin, 2000, pp. 216-228.
[23] LIU, L., PU, C., AND HAN, W. XWRAP: An XML-enabled wrapper construction system for Web information sources. In Proceedings of the 16th International Conference on Data Engineering (San Diego, CA, 2000), pp. 611-621.
[24] LUDASCHER, B., HIMMERODER, R., LAUSEN, G., MAY, W., AND SCHLEPPHORST, C. Managing semistructured data with FLORID: A deductive object-oriented perspective. Information Systems 23, 8 (1998), 589-613.
[25] MECCA, G., ATZENI, P., MASCI, A., MERIALDO, P., AND SINDONI, G. The Araneus Web-Base Management System. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Seattle, WA, 1998), pp. 544-546.
[26] MUSLEA, I. RISE: Repository of online information sources used in information extraction tasks. http://www.isi.edu/~muslea/RISE/
[27] MUSLEA, I. Extraction Patterns for Information Extraction Tasks: A Survey. In Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction (Orlando, FL, 1999), pp. 1-6.
[28] MUSLEA, I., MINTON, S., AND KNOBLOCK, C. Hierarchical wrapper induction for semi-structured information sources. Autonomous Agents and Multi-Agent Systems 4, 1/2 (2001), 93-114.
[29] PAPAKONSTANTINOU, Y., GARCIA-MOLINA, H., AND WIDOM, J. Object Exchange Across Heterogeneous Information Sources. In Proceedings of the 11th International Conference on Data Engineering (Taipei, Taiwan, 1995), pp. 251-260.
[30] RIBEIRO-NETO, B., LAENDER, A. H. F., AND DA SILVA, A. S. Extracting semi-structured data through examples. In Proceedings of the 1999 ACM CIKM International Conference on Information and Knowledge Management (Kansas City, MO, 1999), pp. 94-101.
[31] SAHUGUET, A., AND AZAVANT, F. Building intelligent Web applications using lightweight wrappers. Data and Knowledge Engineering 36, 3 (2001), 283-316.
[32] SODERLAND, S. Learning information extraction rules for semi-structured and free text. Machine Learning 34, 1-3 (1999), 233-272.
[33] TEIXEIRA, J. S. A Comparative Study of Approaches for Semistructured Data Extraction. Master's thesis, Department of Computer Science, Federal University of Minas Gerais, Brazil, 2001. In Portuguese.
[34] WORLD WIDE WEB CONSORTIUM. W3C. The Document Object Model. http://www.w3.org/DOM.



