The First Workshop on Language Models for Low-Resource Languages

PROGRAM

 Overview of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025)
Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan and Lasitha Randunu Chandrakantha Uyangodage

Monday, January 20, 2025

08:45–09:00 Opening Remarks
 09:00–10:00 Invited Talk: Jose Camacho-Collados (Cardiff University)
 10:00–10:30 Session 1: Language Modelling
10:00–10:15 Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect
Guokan Shang, Hadi Abdine, Yousef Khoubrane, Amr Mohamed, Yassine ABBAHADDOU, Sofiane Ennadir, Imane Momayiz, Xuguang Ren, Eric Moulines, Preslav Nakov, Michalis Vazirgiannis and Eric Xing
10:15–10:30 Empowering Persian LLMs for Instruction Following: A Novel Dataset and Training Approach
Hojjat Mokhtarabadi, Ziba Zamani, Abbas Maazallahi and Mohammad Hossein Manshaei
10:30–11:00 Coffee Break
 11:00–12:00 Poster Session 1: Language Model Applications/ Sentiment Analysis/ Machine Translation
 BnSentMix: A Diverse Bengali-English Code-Mixed Dataset for Sentiment Analysis
Sadia Alam, Md Farhan Ishmam, Navid Hasin Alvee, Md Shahnewaz Siddique, Md Azam Hossain and Abu Raihan Mostofa Kamal
 Using Language Models for assessment of users’ satisfaction with their partner in Persian
Zahra Habibzadeh and Masoud Asadpour
 Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing
Atharva Mutsaddi and Aditya Prashant Choudhary
 Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in Hausa Language Using AfriBERTa
Sani Abdullahi Sani, Shamsuddeen Hassan Muhammad and Devon Jarvis
 Automated Collection of Evaluation Dataset for Semantic Search in Low-Resource Domain Language
Anastasia Zhukova, Christian E. Matt and Bela Gipp
 Filipino Benchmarks for Measuring Sexist and Homophobic Bias in Multilingual Language Models from Southeast Asia
Lance Calvin Lim Gamboa and Mark Lee
 Does Machine Translation Impact Offensive Language Identification? The Case of Indo-Aryan Languages
Alphaeus Dmonte, Shrey Satapara, Rehab Alsudais, Tharindu Ranasinghe and Marcos Zampieri
 Exploiting Word Sense Disambiguation in Large Language Models for Machine Translation
Van-Hien Tran, Raj Dabre, Hour Kaing, Haiyue Song, Hideki Tanaka and Masao Utiyama
 Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for Ancient Greek
Maciej Rapacz and Aleksander Smywiński-Pohl
 Language verY Rare for All
Ibrahim Merad, Amos Wolf, Ziad Mazzawi and Yannick Léo
 Improving LLM Abilities in Idiomatic Translation
Sundesh Donthi, Maximilian Spencer, Om B. Patel, Joon Young Doh, Eid Rodan, Kevin Zhu and Sean O’Brien
 12:00–13:00 Session 2: Language Model Applications
12:00–12:15 A Comparative Study of Static and Contextual Embeddings for Analyzing Semantic Changes in Medieval Latin Charters
Yifan Liu, Gelila Tilahun, Xinxiang Gao, Qianfeng Wen and Michael Gervers
12:15–12:30 From Arabic Text to Puzzles: LLM-Driven Development of Arabic Educational Crosswords
Kamyar Zeinalipour, Moahmmad Saad, Marco Maggini and Marco Gori
12:30–12:45 Bridging Literacy Gaps in African Informal Business Management with Low-Resource Conversational Agents
Maimouna Ouattara, Abdoul Kader Kaboré, Jacques Klein and Tegawendé F. Bissyandé
12:45–13:00 Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias
Jayanta Sadhu, Maneesha Rani Saha and Rifat Shahriyar
13:00–14:00 Lunch Break
 14:00–15:00 Poster Session 2: Language Modelling/ Linguistic Insights, Parsing and Semantic Tagging with Language Models
 Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation
Jan Christian Blaise Cruz
 Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models
Sina Bagheri Nezhad, Ameeta Agrawal and Rhitabrat Pokharel
 BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context
Alexis Matzopoulos, Charl Hendriks, Hishaam Mahomed and Francois Meyer
 Mapping Cross-Lingual Sentence Representations for Low-Resource Language Pairs Using Pre-trained Language Models
Tsegaye Misikir Tashu and Andreea Ioana Tudor
 How to age BERT Well: Continuous Training for Historical Language Adaptation
Anika Harju and Rob van der Goot
 Exploiting Task Reversibility of DRS Parsing and Generation: Challenges and Insights from a Multi-lingual Perspective
Muhammad Saad Amin, Luca Anselma and Alessandro Mazzei
 BBPOS: BERT-based Part-of-Speech Tagging for Uzbek
Latofat Bobojonova, Arofat Akhundjanova, Phil Sidney Ostheimer and Sophie Fellenz
 When Every Token Counts: Optimal Segmentation for Low-Resource Language Models
Vikrant Dewangan, Bharath Raj S, Garvit Suri and Raghav Sonavane
 IsiZulu noun classification based on replicating the ensemble approach for Runyankore
Zola Mahlaza, C. Maria Keet, Imaan Sayed and Alexander Van Der Leek
 Recent Advancements and Challenges of Turkic Central Asian Language Processing
Yana Veitsman and Mareike Hartmann
 15:00–15:30 Session 3: Language Models for Question Answering
15:00–15:15 CaLQuest.PT: Towards the Collection and Evaluation of Natural Causal Ladder Questions in Portuguese for AI Agents
Uriel Anderson Lasheras and Vladia Pinheiro
15:15–15:30 PersianMCQ-Instruct: A Comprehensive Resource for Generating Multiple-Choice Questions in Persian
Kamyar Zeinalipour, Neda Jamshidi, Fahimeh Akbari, Marco Maggini, Monica Bianchini and Marco Gori
15:30–16:00 Coffee Break
 16:00–17:00 Session 4: Language Modelling and Evaluation
16:00–16:15 Stop Jostling: Adaptive Negative Sampling Reduces the Marginalization of Low-Resource Language Tokens by Cross-Entropy Loss
Galim Turumtaev
16:15–16:30 Towards Inclusive Arabic LLMs: A Culturally Aligned Benchmark in Arabic Large Language Model Evaluation
Omer Nacar, Serry Taiseer Sibaee, Samar Ahmed, Safa Ben Atitallah, Adel Ammar, Yasser Alhabashi, Abdulrahman S. Al-Batati, Arwa Alsehibani, Nour Qandos, Omar Elshehy, Mohamed Abdelkader and Anis Koubaa
16:30–16:45 Controlled Evaluation of Syntactic Knowledge in Multilingual Language Models
Daria Kryvosheieva and Roger Levy
16:45–17:00 Evaluating Large Language Models for In-Context Learning of Linguistic Patterns In Unseen Low Resource Languages
Hongpu Zhu, Yuqi Liang, Wenjing Xu and Hongzhi Xu
 17:00–17:30 Session 5: Machine Translation with Language Models
17:00–17:15 Next-Level Cantonese-to-Mandarin Translation: Fine-Tuning and Post-Processing with LLMs
Yuqian Dai, Chun Fai Chan, Ying Ki Wong and Tsz Ho Pun
17:15–17:30 When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages
Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan and Shenbin Qian
17:30–18:00 Awards and Closing Remarks