dc.contributor.author: Rowe, Neil C.
dc.date.accessioned: 2013-09-18T17:45:15Z
dc.date.available: 2013-09-18T17:45:15Z
dc.date.issued: 2004
dc.identifier.uri: http://hdl.handle.net/10945/36457
dc.description: This article is to appear in Web Mining: Applications and Techniques, ed. A. Scime, 2004. [en_US]
dc.description.abstract: We survey research on using captions in data mining from the Web. Captions are text that describes some other information (typically multimedia). Since text is considerably easier to index and manipulate than non-text (it is usually smaller and less ambiguous), a good strategy for accessing non-text is to index its captions. However, captions on the Web are often not obvious, because there are few standards for them. Consequently, captions can reside within paragraphs near a media reference, in the clickable text or display text for the media, in the names of media files, in headings or titles on the page, and in explicit references arbitrarily far from the media. We discuss the range of possible syntactic clues (such as HTML tags) and semantic clues (such as the meanings of particular words). We discuss how to quantify the strength of these clues and combine their information to arrive at a consensus. We then discuss the problem of mapping information in captions to information in media objects. Although this mapping is hard, classes of mapping schemes can be distinguished, and a segmentation of the media can be matched to a parse of the caption by constraint-satisfaction methods. Active work is addressing how to learn the clues for mapping automatically from examples. [en_US]
dc.publisher: Monterey, California. Naval Postgraduate School [en_US]
dc.rights: This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States. [en_US]
dc.title: Exploiting Captions for Web Data Mining by Neil C. Rowe [en_US]
dc.type: Article [en_US]

