Active Learning and Effort Estimation: Finding the Essential Content of Software Effort Estimation Data
Authors
Kocaguneli, Ekrem
Menzies, Tim
Keung, Jacky
Cok, David
Madachy, Ray
Subjects
Software cost estimation
active learning
analogy
k-NN
Advisors
Date of Issue
2013-08
Date
Publisher
IEEE
Language
Abstract
Background: Do we always need complex methods for software effort estimation (SEE)? Aim: To characterize the essential content of SEE data, i.e., the least number of features and instances required to capture the information within SEE data. If the essential content is very small, then 1) the contained information must be very brief and 2) the value added by complex learning schemes must be minimal. Method: Our QUICK method computes the Euclidean distance between rows (instances) and columns (features) of SEE data, then prunes synonyms (similar features) and outliers (distant instances), then assesses the reduced data by comparing predictions from 1) a simple learner using the reduced data and 2) a state-of-the-art learner (CART) using all data. Performance is measured using hold-out experiments and expressed in terms of mean and median MRE, MAR, PRED(25), MBRE, MIBRE, or MMER. Results: For 18 datasets, QUICK pruned 69 to 96 percent of the training data (median = 89 percent). k = 1 nearest neighbor predictions (in the reduced data) performed as well as CART's predictions (using all data). Conclusion: The essential content of some SEE datasets is very small. Complex estimation methods may be overelaborate for such datasets and can be simplified. We offer QUICK as an example of such a simpler SEE method.
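The pipeline outlined in the abstract (distances over columns and rows, pruning of similar features and distant instances, then a k = 1 nearest neighbor estimate) can be illustrated with a rough sketch. This is not the authors' QUICK implementation: the abstract does not give the pruning rules, so the greedy distance threshold for synonyms (`min_dist`), the median-based cutoff for outliers (`factor`), all function names, and the toy data are assumptions made only for illustration. Features are assumed to be already normalized to [0, 1]. MRE and PRED(25) follow their usual definitions.

```python
import math
from statistics import median

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prune_synonyms(rows, min_dist):
    """Indices of features to keep. Assumed rule: drop a feature (column)
    if it lies within min_dist of a feature that was already kept."""
    cols = list(zip(*rows))  # transpose: one tuple per feature
    kept = []
    for i, col in enumerate(cols):
        if all(euclidean(col, cols[j]) > min_dist for j in kept):
            kept.append(i)
    return kept

def prune_outliers(rows, factor):
    """Indices of instances to keep. Assumed rule: drop an instance whose mean
    distance to the others exceeds factor * median of those means.
    Needs at least two rows."""
    mean_dist = [
        sum(euclidean(r, o) for k, o in enumerate(rows) if k != i) / (len(rows) - 1)
        for i, r in enumerate(rows)
    ]
    cutoff = factor * median(mean_dist)
    return [i for i, d in enumerate(mean_dist) if d <= cutoff]

def predict_1nn(train_rows, train_efforts, query):
    """Effort of the single nearest training instance (k = 1)."""
    nearest = min(range(len(train_rows)),
                  key=lambda i: euclidean(train_rows[i], query))
    return train_efforts[nearest]

def estimate(rows, efforts, query, min_dist=0.05, factor=2.0):
    """Prune features, then instances, then estimate with 1-NN."""
    feats = prune_synonyms(rows, min_dist)
    reduced = [[r[j] for j in feats] for r in rows]
    keep = prune_outliers(reduced, factor)
    train = [reduced[i] for i in keep]
    train_efforts = [efforts[i] for i in keep]
    return predict_1nn(train, train_efforts, [query[j] for j in feats]), feats, keep

def mre(actual, predicted):
    """Magnitude of relative error: |actual - predicted| / actual."""
    return abs(actual - predicted) / actual

def pred(actuals, predictions, level=25):
    """PRED(level): percentage of estimates with MRE <= level percent."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level / 100)
    return 100.0 * hits / len(actuals)

# Made-up toy data: feature 1 duplicates feature 0; the last row is far away.
ROWS = [
    [0.10, 0.10, 0.20],
    [0.15, 0.15, 0.25],
    [0.20, 0.20, 0.30],
    [0.25, 0.25, 0.20],
    [1.00, 1.00, 1.00],
]
EFFORTS = [10, 12, 15, 14, 90]

if __name__ == "__main__":
    guess, feats, keep = estimate(ROWS, EFFORTS, [0.16, 0.16, 0.26])
    print("kept features:", feats, "kept instances:", keep, "estimate:", guess)
    print("MRE against a hypothetical actual effort of 13:", mre(13, guess))
```

On this toy table the duplicated feature and the distant instance are removed before the nearest neighbor lookup, which is the kind of reduction the abstract reports at much larger scale. Both thresholds would need tuning on real data.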
Type
Description
The article of record as published may be found at http://dx.doi.org/10.1109/TSE.2012.88.
Series/Report No
Department
Organization
Naval Postgraduate School (U.S.)
Identifiers
NPS Report Number
Sponsors
US National Science Foundation
US Army Research Laboratory
Funder
US National Science Foundation CCF: 1017330
US National Science Foundation CCF: 1017263
W911QX-10-C-0066
Format
14 p.
Citation
Kocaguneli, Ekrem, et al. "Active learning and effort estimation: Finding the essential content of software effort estimation data." IEEE Transactions on Software Engineering 39.8 (2013): 1040-1053.
Distribution Statement
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.