Active Learning and Effort Estimation: Finding the Essential Content of Software Effort Estimation Data

dc.contributor.author: Kocaguneli, Ekrem
dc.contributor.author: Menzies, Tim
dc.contributor.author: Keung, Jacky
dc.contributor.author: Cok, David
dc.contributor.author: Madachy, Ray
dc.contributor.corporate: Naval Postgraduate School (U.S.)
dc.date.accessioned: 2018-09-06T20:12:54Z
dc.date.available: 2018-09-06T20:12:54Z
dc.date.issued: 2013-08
dc.description: The article of record as published may be found at http://dx.doi.org/10.1109/TSE.2012.88.
dc.description.abstract: Background: Do we always need complex methods for software effort estimation (SEE)? Aim: To characterize the essential content of SEE data, i.e., the least number of features and instances required to capture the information within SEE data. If the essential content is very small, then 1) the contained information must be very brief and 2) the value added of complex learning schemes must be minimal. Method: Our QUICK method computes the euclidean distance between rows (instances) and columns (features) of SEE data, then prunes synonyms (similar features) and outliers (distant instances), then assesses the reduced data by comparing predictions from 1) a simple learner using the reduced data and 2) a state-of-the-art learner (CART) using all data. Performance is measured using hold-out experiments and expressed in terms of mean and median MRE, MAR, PRED(25), MBRE, MIBRE, or MMER. Results: For 18 datasets, QUICK pruned 69 to 96 percent of the training data (median = 89 percent). K = 1 nearest neighbor predictions (in the reduced data) performed as well as CART’s predictions (using all data). Conclusion: The essential content of some SEE datasets is very small. Complex estimation methods may be overelaborate for such datasets and can be simplified. We offer QUICK as an example of such a simpler SEE method.
dc.description.funder: US National Science Foundation CCF: 1017330
dc.description.funder: US National Science Foundation CCF: 1017263
dc.description.funder: W911QX-10-C-0066
dc.description.sponsorship: US National Science Foundation
dc.description.sponsorship: US Army Research Laboratory
dc.format.extent: 14 p.
dc.identifier.citation: Kocaguneli, Ekrem, et al. "Active learning and effort estimation: Finding the essential content of software effort estimation data." IEEE Transactions on Software Engineering 39.8 (2013): 1040-1053.
dc.identifier.uri: https://hdl.handle.net/10945/59879
dc.publisher: IEEE
dc.rights: This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
dc.subject.author: Software cost estimation
dc.subject.author: active learning
dc.subject.author: analogy
dc.subject.author: k-NN
dc.title: Active Learning and Effort Estimation: Finding the Essential Content of Software Effort Estimation Data
dspace.entity.type: Publication
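
Note on the accuracy measures: three of the measures named in the abstract can be illustrated with a short sketch. The Python below assumes the conventional definitions of MRE (absolute residual divided by the actual effort), MAR (mean absolute residual), and PRED(25) (share of estimates whose MRE is at most 0.25). The function names and the sample effort values are hypothetical and are not taken from the paper.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for one project (assumed definition)."""
    return abs(actual - predicted) / actual


def summarize(actuals, predictions):
    """Summarize estimation accuracy over paired actual/predicted efforts."""
    residuals = [abs(a - p) for a, p in zip(actuals, predictions)]
    mres = [mre(a, p) for a, p in zip(actuals, predictions)]
    return {
        "mean_mre": mean(mres),
        "median_mre": median(mres),
        # MAR: mean of the absolute residuals
        "mar": mean(residuals),
        # PRED(25): fraction of estimates within 25 percent of the actual value
        "pred25": sum(m <= 0.25 for m in mres) / len(mres),
    }


if __name__ == "__main__":
    # Hypothetical effort values, for illustration only.
    actuals = [100.0, 200.0, 400.0, 50.0]
    predictions = [110.0, 150.0, 400.0, 100.0]
    print(summarize(actuals, predictions))
```

Lower MRE and MAR values and higher PRED(25) values indicate better estimates.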
Files
Original bundle
Name: Madachy_Active Learning_2013-08.pdf
Size: 2.25 MB
Format: Adobe Portable Document Format