June 24, 2012

BJMT Alkalmazott Matematikai Konferencia 2012

I recently attended the BJMT Alkalmazott Matematikai Konferencia 2012 (an applied mathematics conference), thanks to WebLib for making it possible. I heard many interesting talks, and I hope to find time soon to write up my impressions. You can read my abstract here and also view the accompanying slides.

Title: The Data Deluge – Adatáradat
Abstract: "Data is the new oil," as the saying goes. Recent developments in IT have made it possible to collect, store, and analyze large amounts of data. Halevy, Norvig, and Pereira argue [1] that, given a large enough data set, naive algorithms outperform highly sophisticated ones. On the other hand, Bender and Good [2] suggest that we have to review our theories about language in light of the unprecedented amount of available empirical data. This approach parallels the so-called probabilistic linguistics research program [3]. Using the Internet as a source of data is exciting and challenging. Information is usually encoded in text files, and we have to employ natural language processing techniques to extract it. To cope with the sheer size of today’s data sets, we have to adapt our algorithms to modern parallel and distributed processing systems.

[1] Alon Halevy, Peter Norvig, and Fernando Pereira: The Unreasonable Effectiveness of Data, IEEE Intelligent Systems, March/April, 2009
[2] Emily M. Bender and Jeff Good: A Grand Challenge for Linguistics: Scaling Up and Integrating Models, white paper contributed to NSF’s SBE 2020 initiative, 2010. http://www.nsf.gov/sbe/sbe_2020/submission_detail.cfm?upld_id=81 (accessed 06.06.2012)
[3] Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.): Probabilistic Linguistics, MIT Press, 2003

