Multi-label learning is a challenging problem that has received significant attention in the last few years. Ensemble methods have proven to be an effective tool in traditional classification and are also broadly used in multi-label classification. The increasing computational complexity of algorithms that process large-scale, high-dimensional datasets, together with the ever-growing generation of data, requires new methods that scale effectively and efficiently. MapReduce is a distributed computing framework that offers a robust paradigm for handling big data on a cluster of nodes. This seminar will outline the functionality of ensembles in multi-label classification, focusing on the characteristics that can be optimized through parallel and/or distributed computing, specifically with Spark. The advantages and disadvantages of the different approaches will be highlighted. The experimental analysis shows the performance and speedup of multiple implementations on benchmark datasets characterized by different numbers of instances, attributes, and labels, and by different imbalance levels.

Biography: Jorge Gonzalez is a PhD student at Virginia Commonwealth University (VCU), USA. He received his M.Sc. and B.Sc. degrees in Computer Science from Universidad Carlos III de Madrid in 2015 and 2013, respectively. He previously worked at SAP (Germany), Indra (Spain), and NextLimit (Spain). His research interests include multi-label learning, ensemble methods, and high-performance computing.

Sponsor(s)
Engineering: Computer Science
Audience
All (Open to the public)