PROGRAM
Overview of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025) Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan and Lasitha Randunu Chandrakantha Uyangodage | |
Monday, January 20, 2025 | |
08:45–09:00 | Opening Remarks |
09:00–10:00 | Invited Talk: Jose Camacho-Collados (Cardiff University) |
10:00–10:30 | Session 1: Language Modelling |
10:00–10:15 | Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect Guokan Shang, Hadi Abdine, Yousef Khoubrane, Amr Mohamed, Yassine ABBAHADDOU, Sofiane Ennadir, Imane Momayiz, Xuguang Ren, Eric Moulines, Preslav Nakov, Michalis Vazirgiannis and Eric Xing |
10:15–10:30 | Empowering Persian LLMs for Instruction Following: A Novel Dataset and Training Approach Hojjat Mokhtarabadi, Ziba Zamani, Abbas Maazallahi and Mohammad Hossein Manshaei |
10:30–11:00 | Coffee Break |
11:00–12:00 | Poster Session 1: Language Model Applications / Sentiment Analysis / Machine Translation |
BnSentMix: A Diverse Bengali-English Code-Mixed Dataset for Sentiment Analysis Sadia Alam, Md Farhan Ishmam, Navid Hasin Alvee, Md Shahnewaz Siddique, Md Azam Hossain and Abu Raihan Mostofa Kamal | |
Using Language Models for assessment of users’ satisfaction with their partner in Persian Zahra Habibzadeh and Masoud Asadpour | |
Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing Atharva Mutsaddi and Aditya Prashant Choudhary | |
Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in Hausa Language Using AfriBERTa Sani Abdullahi Sani, Shamsuddeen Hassan Muhammad and Devon Jarvis | |
Automated Collection of Evaluation Dataset for Semantic Search in Low-Resource Domain Language Anastasia Zhukova, Christian E. Matt and Bela Gipp | |
Filipino Benchmarks for Measuring Sexist and Homophobic Bias in Multilingual Language Models from Southeast Asia Lance Calvin Lim Gamboa and Mark Lee | |
Does Machine Translation Impact Offensive Language Identification? The Case of Indo-Aryan Languages Alphaeus Dmonte, Shrey Satapara, Rehab Alsudais, Tharindu Ranasinghe and Marcos Zampieri | |
Exploiting Word Sense Disambiguation in Large Language Models for Machine Translation Van-Hien Tran, Raj Dabre, Hour Kaing, Haiyue Song, Hideki Tanaka and Masao Utiyama | |
Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for Ancient Greek Maciej Rapacz and Aleksander Smywiński-Pohl | |
Language verY Rare for All Ibrahim Merad, Amos Wolf, Ziad Mazzawi and Yannick Léo | |
Improving LLM Abilities in Idiomatic Translation Sundesh Donthi, Maximilian Spencer, Om B. Patel, Joon Young Doh, Eid Rodan, Kevin Zhu and Sean O’Brien | |
12:00–13:00 | Session 2: Language Model Applications |
12:00–12:15 | A Comparative Study of Static and Contextual Embeddings for Analyzing Semantic Changes in Medieval Latin Charters Yifan Liu, Gelila Tilahun, Xinxiang Gao, Qianfeng Wen and Michael Gervers |
12:15–12:30 | From Arabic Text to Puzzles: LLM-Driven Development of Arabic Educational Crosswords Kamyar Zeinalipour, Moahmmad Saad, Marco Maggini and Marco Gori |
12:30–12:45 | Bridging Literacy Gaps in African Informal Business Management with Low-Resource Conversational Agents Maimouna Ouattara, Abdoul Kader Kaboré, Jacques Klein and Tegawendé F. Bissyandé |
12:45–13:00 | Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias Jayanta Sadhu, Maneesha Rani Saha and Rifat Shahriyar |
13:00–14:00 | Lunch Break |
14:00–15:00 | Poster Session 2: Language Modelling / Linguistic Insights, Parsing and Semantic Tagging with Language Models |
Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation Jan Christian Blaise Cruz | |
Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models Sina Bagheri Nezhad, Ameeta Agrawal and Rhitabrat Pokharel | |
BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context Alexis Matzopoulos, Charl Hendriks, Hishaam Mahomed and Francois Meyer | |
Mapping Cross-Lingual Sentence Representations for Low-Resource Language Pairs Using Pre-trained Language Models Tsegaye Misikir Tashu and Andreea Ioana Tudor | |
How to age BERT Well: Continuous Training for Historical Language Adaptation Anika Harju and Rob van der Goot | |
Exploiting Task Reversibility of DRS Parsing and Generation: Challenges and Insights from a Multi-lingual Perspective Muhammad Saad Amin, Luca Anselma and Alessandro Mazzei | |
BBPOS: BERT-based Part-of-Speech Tagging for Uzbek Latofat Bobojonova, Arofat Akhundjanova, Phil Sidney Ostheimer and Sophie Fellenz | |
When Every Token Counts: Optimal Segmentation for Low-Resource Language Models Vikrant Dewangan, Bharath Raj S, Garvit Suri and Raghav Sonavane | |
IsiZulu noun classification based on replicating the ensemble approach for Runyankore Zola Mahlaza, C. Maria Keet, Imaan Sayed and Alexander Van Der Leek | |
Recent Advancements and Challenges of Turkic Central Asian Language Processing Yana Veitsman and Mareike Hartmann | |
15:00–15:30 | Session 3: Language Models for Question Answering |
15:00–15:15 | CaLQuest.PT: Towards the Collection and Evaluation of Natural Causal Ladder Questions in Portuguese for AI Agents Uriel Anderson Lasheras and Vladia Pinheiro |
15:15–15:30 | PersianMCQ-Instruct: A Comprehensive Resource for Generating Multiple-Choice Questions in Persian Kamyar Zeinalipour, Neda Jamshidi, Fahimeh Akbari, Marco Maggini, Monica Bianchini and Marco Gori |
15:30–16:00 | Coffee Break |
16:00–17:00 | Session 4: Language Modelling and Evaluation |
16:00–16:15 | Stop Jostling: Adaptive Negative Sampling Reduces the Marginalization of Low-Resource Language Tokens by Cross-Entropy Loss Galim Turumtaev |
16:15–16:30 | Towards Inclusive Arabic LLMs: A Culturally Aligned Benchmark in Arabic Large Language Model Evaluation Omer Nacar, Serry Taiseer Sibaee, Samar Ahmed, Safa Ben Atitallah, Adel Ammar, Yasser Alhabashi, Abdulrahman S. Al-Batati, Arwa Alsehibani, Nour Qandos, Omar Elshehy, Mohamed Abdelkader and Anis Koubaa |
16:30–16:45 | Controlled Evaluation of Syntactic Knowledge in Multilingual Language Models Daria Kryvosheieva and Roger Levy |
16:45–17:00 | Evaluating Large Language Models for In-Context Learning of Linguistic Patterns In Unseen Low Resource Languages Hongpu Zhu, Yuqi Liang, Wenjing Xu and Hongzhi Xu |
17:00–17:30 | Session 5: Machine Translation with Language Models |
17:00–17:15 | Next-Level Cantonese-to-Mandarin Translation: Fine-Tuning and Post-Processing with LLMs Yuqian Dai, Chun Fai Chan, Ying Ki Wong and Tsz Ho Pun |
17:15–17:30 | When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan and Shenbin Qian |
17:30–18:00 | Awards and Closing Remarks |