Okko Räsänen


machine learning

language acquisition

speech processing

neuroscience

context-aware computing

multimodal processing

articulatory modeling

perception & psychoacoustics

Journal articles and book chapters

Kakouros, S., Räsänen, O. & Alku, P. (in press). Comparison of spectral tilt measures for sentence prominence in speech — effects of dimensionality and adverse noise conditions. Speech Communication, accepted for publication (.pdf).

Räsänen, O., Kakouros, S. & Soderstrom, M. (2018). Is infant-directed speech interesting because it is surprising? — Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition, 178, 193–206 (.pdf).

Kakouros, S., Salminen, N. & Räsänen, O. (2018). Making predictable unpredictable with style — Behavioral and electrophysiological evidence for the critical role of prosodic expectations in the perception of prominence in speech. Neuropsychologia, 109, 181–199 (.pdf).

Räsänen, O., Doyle, G., & Frank, M. C. (2018). Pre-linguistic segmentation of speech into syllable-like units. Cognition, 171, 130–150 (.pdf) (syllabifier algorithm).

Rasilo H. & Räsänen O. (2017). An online model of vowel imitation learning. Speech Communication, 86, 1–23 (.pdf) (web).

Kakouros S. & Räsänen O. (2016). Perception of sentence stress in speech correlates with the temporal unpredictability of prosodic features. Cognitive Science, 40, 1739–1774 (.pdf).

Räsänen O. & Saarinen J. P. (2016). Sequence prediction with sparse distributed hyperdimensional coding applied to the analysis of mobile phone use patterns. IEEE Transactions on Neural Networks and Learning Systems, 27, 1878–1889 (.pdf) (web).

Kakouros S. & Räsänen O. (2016). 3PRO - An unsupervised method for the automatic detection of sentence prominence in speech. Speech Communication, 82, 67–84 (.pdf) (web).

Koolen N., Dereymaeker A., Räsänen O., Jansen K., Vervisch J., Matic V., Naulaers G., De Vos M., Van Huffel S., & Vanhatalo S. (2016). Early development of synchrony in cortical activations in the human. Neuroscience, 322, 298–307 (web).

Räsänen O. & Rasilo H. (2015). A joint model of word segmentation and meaning acquisition through cross-situational learning. Psychological Review, 122(4), 792–829 (.pdf).

Pohjalainen J., Räsänen O. & Kadioglu S. (2015). Feature selection methods and their combinations in high-dimensional classification of speaker likability, intelligibility and personality traits. Computer Speech and Language, 29, 145–171 (web) (Feature selection algorithms for MATLAB).

Koolen N., Dereymaeker A., Räsänen O., Jansen K., Vervisch J., Matic V., De Vos M., Van Huffel S., Naulaers G. & Vanhatalo S. (2014). Interhemispheric synchrony in neonatal EEG revisited: Activation Synchrony Index as a promising classifier. Frontiers in Human Neuroscience, 8:1030, doi: 10.3389/fnhum.2014.01030 (web).

Räsänen O. & Kakouros S. (2014). Modeling dependencies in multiple parallel data streams with hyperdimensional computing. IEEE Signal Processing Letters, 21, 899–903 (web) (.pdf).

Räsänen O. & Laine U. K. (2013). Time-frequency integration characteristics of hearing are optimized for perception of speech-like acoustic patterns. The Journal of the Acoustical Society of America, 134, 407–419 (web).

Rasilo H., Räsänen O. & Laine U. K. (2013). Feedback and imitation by a caregiver guides a virtual infant to learn native phonemes and the skill of speech inversion. Speech Communication, 55, 909–931 (web).

Räsänen O., Metsäranta M. & Vanhatalo S. (2013). Development of a novel robust measure for interhemispheric synchrony in the neonatal EEG: Activation Synchrony Index (ASI). NeuroImage, 69, 256–266 (web).

Räsänen O. (2012). Computational modeling of phonetic and lexical learning in early language acquisition: existing models and future directions. Speech Communication, 54, 975–997 (.pdf) (web).

Räsänen O. & Laine U. K. (2012). A method for noise-robust context-aware pattern discovery and recognition from categorical sequences. Pattern Recognition, 45, 606–616 (web) (.pdf).

Räsänen O. (2011). A computational model of word segmentation from continuous speech using transitional probabilities of atomic acoustic events. Cognition, 120, 149–176 (web) (.pdf).

Räsänen O., Laine U. K. & Altosaar T. (2011). Blind segmentation of speech using non-linear filtering methods. In Ipsic I. (Ed.): Speech Technologies, InTech Publishing. (.pdf).

Papers in peer-reviewed conference proceedings

Räsänen, O., Seshadri, S. & Casillas, M. (2018). Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions. Proc. Interspeech-2018, Hyderabad, India. (.pdf).

Airaksinen, M., Juvela, L., Räsänen, O. & Alku, P. (2018). Time-regularized Linear Prediction for Noise-robust Extraction of the Spectral Envelope of Speech. Proc. Interspeech-2018, Hyderabad, India. (.pdf).

Räsänen, O., Kakouros, S. & Soderstrom, M. (2017). Connecting stimulus-driven attention to the properties of infant-directed speech – Is exaggerated intonation also more surprising? Proceedings of the 39th Annual Conference of the Cognitive Science Society, London, UK, pp. 998–1003 (.pdf) (MATLAB scripts at GitHub).

Seshadri, S., Remes, U., & Räsänen, O. (2017). Comparison of Non-parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing. Proc. Interspeech-2017, Stockholm, Sweden, pp. 2744–2748 (.pdf).

Kakouros, S., Räsänen, O. & Alku, P. (2017). Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions. Proc. Interspeech-2017, Stockholm, Sweden, pp. 3211–3215 (.pdf).

Ramirez Lopez, A., Seshadri, S., Juvela, L., Räsänen, O. & Alku, P. (2017). Speaking style conversion from normal to Lombard speech using a glottal vocoder and Bayesian GMMs. Proc. Interspeech-2017, Stockholm, Sweden, pp. 1363–1367 (.pdf).

Räsänen, O. (2017). Language is Not About Language: Towards Formalizing the Role of Extra-Linguistic Factors in Human and Machine Language Acquisition and Communication. Proc. Workshop on Grounding Language Acquisition (GLU-2017), Stockholm, Sweden, pp. 37–41 (.pdf).

Michel, P., Räsänen, O., Thiolliere, R., & Dupoux, E. (2017). Blind phoneme segmentation with temporal prediction errors. Proc. ACL SRW-2017, Vancouver, Canada (arXiv.org).

Seshadri, S., Remes, U. & Räsänen, O. (2017). Dirichlet process mixture models for clustering i-vector data. Proc. ICASSP-2017, New Orleans, LA, pp. 5470–5474 (.pdf).

Räsänen O., Nagamine T. & Mesgarani N. (2016). Analyzing distributional learning of phonemic categories in unsupervised deep neural networks. Proceedings of the 38th Annual Conference of the Cognitive Science Society, Philadelphia, PA, pp. 1757–1762 (.pdf).

Kakouros S. & Räsänen O. (2016). Statistical Learning of Prosodic Patterns and Reversal of Perceptual Cues for Sentence Prominence. Proceedings of the 38th Annual Conference of the Cognitive Science Society, Philadelphia, PA, pp. 2489–2494 (.pdf).

Kakouros S., Pelemans J., Verwimp L., Wambacq P. & Räsänen O. (2016). Analyzing the contribution of top-down lexical and bottom-up acoustic cues in the detection of sentence prominence. Proc. Interspeech-2016, San Francisco, CA, pp. 1074–1078 (.pdf).

Räsänen O., Doyle G. & Frank M. C. (2015). Unsupervised word discovery from speech using automatic segmentation into syllable-like units. Proc. Interspeech-2015, Dresden, Germany, pp. 3204–3208 (.pdf).

Rasilo H. & Räsänen O. (2015). Weakly-supervised word learning is improved by an active online algorithm. Proc. Interspeech-2015, Dresden, Germany, pp. 1561–1565 (.pdf).

Kakouros S. & Räsänen O. (2015). Automatic Detection of Sentence Prominence in Speech Using Predictability of Word-level Acoustic Features. Proc. Interspeech-2015, Dresden, Germany, pp. 568–572 (.pdf).

Koolen N., Dereymaeker A., Räsänen O., Jansen K., Vervisch J., Matic V., De Vos M., Naulaers G., Van Huffel S. & Vanhatalo S. (2015). Data-driven metric representing the maturation of preterm EEG. Proc. 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Milan, Italy, pp. 1492–1495 (.pdf).

Räsänen O. & Rasilo H. (2015). Cross-situational cues are relevant for early word segmentation. Proc. 37th Annual Conference of the Cognitive Science Society, Pasadena, California, pp. 1949–1954 (.pdf).

Räsänen O. (2015). Generating Hyperdimensional Distributed Representations from Continuous-Valued Multivariate Sensory Input. Proc. 37th Annual Conference of the Cognitive Science Society, Pasadena, California, pp. 1943–1948 (.pdf).

Rasilo H. & Räsänen O. (2015). Computational evidence for effects of memory decay, familiarity preference and mutual exclusivity in cross-situational learning. Proc. 37th Annual Conference of the Cognitive Science Society, Pasadena, California, pp. 1955–1960 (.pdf).

Kakouros S. & Räsänen O. (2015). Analyzing the Predictability of Lexeme-specific Prosodic Features as a Cue to Sentence Prominence. Proc. 37th Annual Conference of the Cognitive Science Society, Pasadena, California, pp. 1039–1044 (.pdf).

Kakouros S. & Räsänen O. (2014). Perception of Sentence Stress in English Infant Directed Speech. Proc. Interspeech-2014, Singapore, pp. 1821–1825 (.pdf).

Räsänen O. (2014). Basic cuts revisited: Temporal segmentation of speech into phone-like units with statistical learning at a pre-linguistic level. Proc. 36th Annual Conference of the Cognitive Science Society, Quebec, Canada, pp. 2817–2822 (.pdf).

Kakouros S. & Räsänen O. (2014). Statistical Unpredictability of F0 Trajectories as a Cue to Sentence Stress. Proc. 36th Annual Conference of the Cognitive Science Society, Quebec, Canada, pp. 1246–1251 (.pdf).

Räsänen O. & Pohjalainen J. (2013). Random subset feature selection in automatic recognition of developmental disorders, affective states, and level of conflict from speech. Proc. Interspeech-2013, Lyon, France, pp. 210–214 (.pdf).

Knuuttila J., Räsänen O. & Laine U. K. (2013). Automatic self-supervised learning of associations between speech and text. Proc. Interspeech-2013, Lyon, France, pp. 465–469 (.pdf).

Rasilo H., Räsänen O. & de Boer B. (2013). Virtual infant's online acquisition of vowel categories and their mapping between dissimilar bodies. Proc. Workshop on Speech Production in Automatic Speech Recognition, Lyon, France (.pdf).

Kakouros S., Räsänen O. & Laine U. (2013). Attention based temporal filtering of sensory signals for data redundancy reduction. Proc. ICASSP-2013, Vancouver, Canada, pp. 3188–3192 (.pdf).

Räsänen O. (2012). Average Spectrotemporal Structure of Continuous Speech Matches with the Frequency Resolution of Human Hearing. Proc. Interspeech-2012, Portland, Oregon (.pdf).

Räsänen O. (2012). Non-auditory cognitive capabilities in computational modeling of early language acquisition. Proc. Interspeech-2012, Portland, Oregon (.pdf).

Räsänen O., Rasilo H. & Laine U. K. (2012). Modeling spoken language acquisition with a generic cognitive architecture for associative learning. Proc. Interspeech-2012, Portland, Oregon (.pdf).

Pohjalainen J., Kadioglu S. & Räsänen O. (2012). Feature Selection for Speaker Traits. Proc. Interspeech-2012, Portland, Oregon (.pdf).

Räsänen O. & Rasilo H. (2012). Acoustic analysis supports the existence of a single distributional learning mechanism in structural rule learning from an artificial language. Proc. 34th Annual Conference of the Cognitive Science Society (CogSci2012), Sapporo, Japan, pp. 887–892 (.pdf).

Räsänen O. (2012). Context induced merging of synonymous word models in computational modeling of early language acquisition. Proc. ICASSP-2012, Kyoto, Japan, pp. 5037–5040 (.pdf).

Räsänen O. (2012). Hierarchical unsupervised discovery of user context from multivariate sensory data. Proc. ICASSP-2012, Kyoto, Japan, pp. 2105–2108 (.pdf).

Räsänen O., Leppänen J., Laine U. K. & Saarinen J. (2011). Comparison of Classifiers in Audio and Acceleration Based Context Classification in Mobile Phones. Proc. EUSIPCO-11, Barcelona, Spain, pp. 946–950 (.pdf).

Rasilo H., Laine U. K., Räsänen O. & Altosaar T. (2011). Method for speech inversion with large scale statistical evaluation. Proc. Interspeech-11, Florence, Italy, pp. 2693–2696 (.pdf).

Rasilo H., Laine U. K. & Räsänen O. (2010). Estimation studies of vocal tract shape trajectory using a variable length and lossy Kelly-Lochbaum model. Proc. Interspeech-10, Chiba, Japan, pp. 2414–2417 (.pdf).

Räsänen O. (2010). Fully Unsupervised Word Learning from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events. Proc. Interspeech-10, Chiba, Japan, pp. 2922–2925 (.pdf).

ten Bosch L., Räsänen O., Driesen J., Aimetti G., Altosaar T. & Boves L. (2009). Do Multiple Caregivers Speed up Language Acquisition? Proc. Interspeech-09, Brighton, England, pp. 704–707 (.pdf).

Aimetti G., Moore R., ten Bosch L., Räsänen O. & Laine U. K. (2009). Discovering Keywords from Cross-Modal Input: Ecological vs. Engineering Methods for Enhancing Acoustic Repetitions. Proc. Interspeech-09, Brighton, England, pp. 1171–1174 (.pdf).

Räsänen O., Laine U. K. & Altosaar T. (2009). A noise robust method for pattern discovery in quantized time series: the concept matrix approach. Proc. Interspeech-09, Brighton, England, pp. 3035–3038 (.pdf).

Räsänen O., Laine U. K. & Altosaar T. (2009). An Improved Speech Segmentation Quality Measure: the R-value. Proc. Interspeech-09, Brighton, England, pp. 1851–1854 (.pdf).

Räsänen O., Laine U. K. & Altosaar T. (2009). Self-learning Vector Quantization for Pattern Discovery from Speech. Proc. Interspeech-09, Brighton, England, pp. 852–855 (.pdf).

Räsänen O. & Driesen J. (2009). A comparison and combination of segmental and fixed-frame signal representations in NMF-based word recognition. Proc. 17th Nordic Conference on Computational Linguistics (NODALIDA-09), Odense, Denmark (.pdf).

Räsänen O., Laine U. K. & Altosaar T. (2008). Computational language acquisition by statistical bottom-up processing. Proc. Interspeech-08, Brisbane, Australia, pp. 1980–1983 (.pdf).

Other publications

Räsänen O., "Studies on unsupervised and weakly supervised methods in computational modeling of early language acquisition", Doctoral thesis, Aalto University, School of Electrical Engineering, 2013 (.pdf).

Räsänen O., Laine U. K. & Saarinen J., "Automatic Learning of a Topology of Associations from Multiple Data Streams", A technical white paper, 2012 (.pdf).

Laine U. K. & Räsänen O., "Indirect estimation of formant frequencies through mean spectral variance with application to automatic gender recognition", In Proc. 6th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA), Firenze, Italy, 2009 (.pdf).

ten Bosch L., Boves L. & Räsänen O., "Learning meaningful units from multimodal input - the effect of interaction strategies", Proc. Workshop on Child, Computer and Interaction 2009 (WOCCI), Boston, MA, United States, 2009 (.pdf).

Räsänen O., "A Review of Missing-Feature Methods in Automatic Speech Recognition", in Palomäki K. J., Remes U. & Kurimo M. (Eds.), Studies on noise robust automatic speech recognition. Technical Report TKK-ICS-R19, Helsinki University of Technology, Dept. ICS, Finland, 2009.

Räsänen O., Altosaar T. & Laine U. K., "Comparison of prosodic features in Swedish and Finnish IDS/ADS speech", Proc. Nordic Prosody X, Helsinki, Finland, 2008 (.pdf).

Räsänen O., "Speech Segmentation and Clustering Methods for a New Speech Recognition Architecture", M.Sc. Thesis, Helsinki University of Technology, 2007 (.pdf).

Other stuff (working papers/reports/patents)

Laine U. K., Räsänen O., Fagerlund S., Altosaar T., Aimetti G. & Henter G. (2009). PD module with self-directed search, derived segmental quality measures, full integration of CMM. ACORNS project deliverable.

Laine U. K., Räsänen O., Altosaar T., Driesen J., Aimetti G. & Henter G. (2008). Methods for enhanced pattern discovery in speech processing. ACORNS project deliverable, 2008 (.pdf).

Laine U. K. & Räsänen O. (2013). Method for pattern discovery and recognition. US Patent 8560469 B2, filed 2008, approved Oct. 2013.

Laine U. K. & Räsänen O. (2007). Audiosignaalin segmentointi automaattisesti ilman opetusta (Automatic, unsupervised segmentation of audio signals). Patent no. FI120223.

Contact: firstname.surname@aalto.fi