Friday, June 16, 2017

Speaker: Mikhail BELKIN (Ohio State University)

From 11:00 to 12:15, in room 11 at ENSAE: 3 avenue Pierre Larousse, Malakoff (Tram T3: "Porte de Vanves", or Métro 13: "Porte de Vanves" or "Malakoff Plateau de Vanves")

"Subtle but not malicious? The (high) computational cost of non-smoothness in learning from big data"

What can we learn from big data? First, more data allows us to estimate the probabilities of uncertain outcomes more precisely. Second, data provides better coverage, which allows functions to be approximated more precisely. I will argue that the second point is key to understanding the recent success of large-scale machine learning. A useful way of thinking about this issue is that principal component regression/classification needs many more components, perhaps almost as many as there are data points. It turns out that fundamental computational barriers prevent some standard techniques (e.g., kernel methods) from utilizing sufficiently many principal components on large datasets. These computational limitations result in over-regularization and a failure to benefit from big data. I will discuss the nature of these barriers and how they can be overcome. In particular, I will present a simple kernel algorithm (EigenPro) that demonstrates significant and consistent improvements over the state of the art on large datasets.

 

Based on joint work with Siyuan Ma.

 

This seminar is organized by:

 

Alexandre TSYBAKOV (Laboratoire de Statistique-CREST)

Cristina BUTUCEA (Laboratoire de Statistique-CREST)