Accelerating EM for large databases
Oh la la
Your session has expired but don’t worry, your message
has been saved.Please log in and we’ll bring you back
to this page. You’ll just need to click “Send”.
Your evaluation is of great value to our authors and readers. Many thanks for your time.
When you're done, click "publish"
Only blue fields are mandatory.
Your mailing list is currently empty.
It will build up as you send messages
and links to your peers.
besides you has access to this list.
Enter the e-mail addresses of your recipients in the box below. Note: Peer Evaluation will NOT store these email addresses log in
Your message has been sent.
Full text for this article was not available? Send a request to the author(s)
: Accelerating EM for large databases
Abstract : The EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data. However, the EM algorithm often requires signi cant computational resources and has been dismissed as impractical for large databases. We presenttwo approaches that signi cantly reduce the computational cost of applying the EM algorithm to databases with a large number of cases, including databases with large dimensionality. Both approaches are based on partial E-steps for which we can use the results of Neal and Hinton (1998) to obtain the standard convergence guarantees of EM. The rst approach is a version of the incremental EM, described in Neal and Hinton (1998), which cycles through data cases in blocks. The number of cases in each block dramatically e ects the e ciency of the algorithm. We provide a method for selecting a near optimal block size. The second approach, which we call lazy EM, will, at scheduled iterations, evaluate the signi cance of each data case and then proceed for several iterations actively using only the signi cant cases. We demonstrate that both methods can signi cantly reduce computational costs through their application to high-dimensional real-world and synthetic mixture modeling problems for large databases. Keywords: Expectation Maximization Algorithm, incremental EM, lazy EM, online EM, data blocking, mixture models, clustering.
Leave a comment
This contribution has not been reviewed yet. review?