Absolute bounds on set intersection and union sizes from distribution information

Authors
Rowe, Neil C.
Subjects
databases, query processing, statistical computing, statistical inequalities, sets, Boolean algebra, estimation.
Advisors
Date of Issue
1985-09
Date
Publisher
Monterey, California. Naval Postgraduate School
Language
eng
Abstract
Estimation of set intersection and union sizes is important for access method selection for a database. Absolute bounds on sizes are often much easier to compute than size estimates, requiring no distributional or independence assumptions, and can answer many of the same needs. We present a large compendium of quick closed-form bounds on set intersection and union sizes, each applying to a different situation; they can be expressed as rules, and managed by rule-based or 'knowledge-base' architecture. These methods use general-purpose statistics precomputed on the data, and exploit homomorphisms (onto mappings) of the data items onto distributions that can be more easily analyzed. Our methods can be used anytime, but tend to work best when there are strong or complex correlations in the data. This circumstance is poorly addressed by the standard methods of independence-assumption and distributional-assumption estimates, and hence our methods fill a need
Type
Technical Report
Description
Series/Report No
Department
Computer Science
Organization
Computer Science (CS)
Graduate School of Operational and Information Sciences (GSOIS)
Identifiers
NPS Report Number
NPS-52-85-014
Sponsors
Foundation Research Program of the Naval Postgraduate School with funds provided by the Chief of Naval Research, Arlington, VA
Funder
61152N; RR000-01-10, N0001484WR41001
Format
Citation
Distribution Statement
Approved for public release; distribution is unlimited.
Rights