Load Balancing of Parallelized Information Filters
Loading...
Authors
Rowe, Neil C.
Zaky, Amr
Subjects
information filtering
data parallelism
load balancing
information retrieval
conjunctions
optimality
Monte Carlo methods
data parallelism
load balancing
information retrieval
conjunctions
optimality
Monte Carlo methods
Advisors
Date of Issue
2002
Date
March / April 2002
Publisher
Monterey, California. Naval Postgraduate School
Language
Abstract
We investigate data-parallel implementation of a set of "informationfilters" used to rule out uninteresting data from a database or
datastream. We develop an analytic model for the costs and advantages ofload rebalancing of the parallel filtering processes, as well as
a quick heuristic for its desirability. Our model uses binomial models of the filter processes, and fits key parameters to results of
extensive simulations. Experiments confirm our model. Rebalancing should pay off whenever processor communications costs are
high. Further experiments showed it can also pay off even with low communications costs for 16-64 processes and 1-10 data items per
processor; then imbalances can increase processing time by up to 52% in representative cases, and rebalancing can increase it by 78%,
so our quick predictive model can be valuable. Results also show our proposed heuristic rebalancing criterion gives close to optimal
balancing. We also extend our model to handle variations in filter processing time per data item.
Type
Conference Paper
Description
This paper appeared in IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2 (March/April 2002), 456-461.
Series/Report No
Department
Computer Science (CS)
Organization
Identifiers
NPS Report Number
Sponsors
sponsored by DARPA as part of the I3 Project under AO 8939, and by the U. S. Naval Postgraduate School
Funding
funds provided by the Chief for Naval Operations
Format
Citation
Distribution Statement
Approved for public release; distribution is unlimited.
