The 31st International Conference on Computational Linguistics

PROGRAM

 STAND-Guard: A Small Task-Adaptive Content Moderation Model
Minjia Wang, Pingping Lin, Siqi Cai, Shengnan An, Shengjie Ma, Zeqi Lin, Congrui Huang and Bixiong Xu
 Query-LIFE: Query-aware Language Image Fusion Embedding for E-Commerce Relevance
Hai Zhu, Yuankai Guo, Ronggang Dou and Kai Liu
 Improving Tool Retrieval by Leveraging Large Language Models for Query Generation
Mohammad Kachuee, Sarthak Ahuja, Vaibhav Kumar, Puyang Xu and Xiaohu Liu
 Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems
Rafael Teixeira de Lima, Shubham Gupta, Cesar Berrospi Ramis, Lokesh Mishra, Michele Dolfi, Peter Staar and Panagiotis Vagenas
 RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Linguistic Classifiers
David Farr, Nico Manzonelli, Iain Cruickshank and Jevin West
 Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking
Harsha Vardhan Khurdula, Basem Rizk and Indus Khaitan
 PDC & DM-SFT: A Road for LLM SQL Bug-Fix Enhancing
Yiwen Duan, Yonghong Yu, Xiaoming Zhao, Yichang Wu and Wenbo Liu
 Multilingual Continual Learning using Attention Distillation
Sanjay Agrawal, Deep Nayak and Vivek Varadarajan Sembium
 FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding
Amit Agarwal, Srikant Panda and Kulbhushan Pachauri
 OKG: On-the-Fly Keyword Generation in Sponsored Search Advertising
Zhao Wang, Briti Gangopadhyay, Mengjie Zhao and Shingo Takamatsu
 Best Practices for Distilling Large Language Models into BERT for Web Search Ranking
Dezhi Ye, Junwei Hu, Jiabin Fan, Bowen Tian, Jie Liu, Haijin Liang and Jin Ma
 Rationale-Guided Distillation for E-Commerce Relevance Classification: Bridging Large Language Models and Lightweight Cross-Encoders
Sanjay Agrawal, Faizan Ahemad and Vivek Varadarajan Sembium
 Automated Clinical Data Extraction with Knowledge Conditioned LLMs
Diya Li, Asim Kadav, Aijing Gao, Rui Li and Richard Bourgon
 Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale?
Seyed Amin Tabatabaei, Sarah Fancher, Michael Parsons and Arian Askari
 EDAR: A pipeline for Emotion and Dialogue Act Recognition
Elie Dina, Rania Ayachi Kibech and Miguel Couceiro
 No Size Fits All: The Perils and Pitfalls of Leveraging LLMs Vary with Company Size
Ashok Urlana, Charaka Vinayak Kumar, Bala Mallikarjunarao Garlapati, Ajeet Kumar Singh and Rahul Mishra
 Predicting Fine-tuned Performance on Larger Datasets Before Creating Them
Toshiki Kuramoto and Jun Suzuki
 A Recipe For Building a Compliant Real Estate Chatbot
Navid Madani, Anusha Bagalkotkar, Supriya Anand, Gabriel Arnson, Rohini K. Srihari and Kenneth Joseph
 Geo-Spatially Informed Models for Geocoding Unstructured Addresses
Uddeshya Singh, Devanapalli Ravi Shankar, Gowtham Bellala and Vikas Goel
 Resource-Efficient Anonymization of Textual Data via Knowledge Distillation from Large Language Models
Tobias Deußer, Max Hahnbück, Tobias Uelwer, Cong Zhao, Christian Bauckhage and Rafet Sifa
 Fine-Tuning Medium-Scale LLMs for Joint Intent Classification and Slot Filling: A Data-Efficient and Cost-Effective Solution for SMEs
Maia Aguirre, Ariane Méndez, Arantza del Pozo, Maria Ines Torres and Manuel Torralbo
 Enhancing Large Language Models for Scientific Multimodal Summarization with Multimodal Output
Zusheng Tan, Xinyi Zhong, Jing-Yu Ji, Wei Jiang and Billy Chiu
 "Stupid robot, I want to speak to a human!" User Frustration Detection in Task-Oriented Dialog Systems
Mireia Hernandez Caralt, Ivan Sekulic, Filip Carevic, Nghia Khau, Diana Nicoleta Popa, Bruna Guedes, Victor Guimaraes, Zeyu Yang, Andre Manso, Meghana Reddy, Paolo Rosso and Roland Mathis
 LLM Evaluate: An Industry-Focused Evaluation Tool for Large Language Models
Harsh Saini, Md Tahmid Rahman Laskar, Cheng Chen, Elham Mohammadi and David Rossouw
 Enhancing Future Link Prediction in Quantum Computing Semantic Networks through LLM-Initiated Node Features
Gilchan Park, Paul Baity, Byung-Jun Yoon and Adolfy Hoisie
 Page Stream Segmentation with LLMs: Challenges and Applications in Insurance Document Automation
Hunter Heidenreich, Ratish Dalvi, Nikhil Verma and Yosheb Getachew
 Graph-Augmented Open-Domain Multi-Document Summarization
Xiaoping Shen and Yekun Chai
 Improve Speech Translation Through Text Rewrite
Jing Wu, Shushu Wang, Kai Fan, Wei Luo, Minpeng Liao and Zhongqiang Huang
 CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding
Johannes Kirmayr, Lukas Stappen, Phillip Schneider, Florian Matthes and Elisabeth Andre
 XTR meets ColBERTv2: Adding ColBERTv2 Optimizations to XTR
Riyaz Ahmad Bhat and Jaydeep Sen
 sDPO: Don’t Use Your Data All at Once
Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim and Chanjun Park
 Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI
Yuya Asano, Sabit Hassan, Paras Sharma, Anthony B. Sicilia, Katherine Atwell, Diane Litman and Malihe Alikhani
 Federated Retrieval Augmented Generation for Multi-Product Question Answering
Parshin Shojaee, Sai Sree Harsha, Dan Luo, Akash Maharaj, Tong Yu and Yunyao Li
 Luna: A Lightweight Evaluation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost
Masha Belyi, Robert Friel, Shuai Shao and Atindriyo Sanyal
 Seeing Beyond: Enhancing Visual Question Answering with Multi-Modal Retrieval
Boqi Chen, Anuj Khare, Gaurav Kumar, Arjun Akula and Pradyumna Narayana
 AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering
Yungeng Liu, Zan Chen, Yuguang Wang and Yiqing Shen
 Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud
Chengyu Wang, Yuanhao Yue, Jun Huang and Peng Wang
 Where do LLMs Encode the Knowledge to Assess the Ambiguity?
Hancheol Park and Geonmin Kim
 On the effective transfer of knowledge from English to Hindi Wikipedia
Paramita Das, Amartya Roy, Ritabrata Chakraborty and Animesh Mukherjee
 BackMATH: Towards Backward Reasoning for Solving Math Problems Step by Step
Shaowei Zhang and Deyi Xiong
 Deploying Multi-task Online Server with Large Language Model
Yincen Qu, Hengyue Liu, Kun Wang, Xiangying Dai, Xiaoou Lu, Hui Zhou and Chao Ma
 LLM-Friendly Knowledge Representation for Customer Support
Hanchen Su, Wei Luo, Yashar Mehdad, Wei Han, Elaine Liu, Wayne Zhang, Mia Zhao and Joy Zhang
 Leveraging Multilingual Models for Robust Grammatical Error Correction Across Low-Resource Languages
Divesh Ramesh Kubal and Apurva Shrikant Nagvenkar
 A Simple yet Efficient Prompt Compression Method for Text Classification Data Annotation Using LLM
Yiran Xie, Debin Xiao, Ping Wang and Shuming Liu
 AMAN: Agent for Mentoring and Assisting Newbies in MMORPG
Jeehyun Lee, Seung-Moo Yang and Won Ik Cho
 KARRIEREWEGE: A large scale Career Path Prediction Dataset
Elena Senger, Yuri Campbell, Rob van der Goot and Barbara Plank
 Transforming Code Understanding: Clustering-Based Retrieval for Improved Summarization in Domain-Specific Languages
Baban Gain, Dibyanayan Bandyopadhyay, Samrat Mukherjee, Aryan Sahoo, Saswati Dana, Palanivel Kodeswaran, Sayandeep Sen, Asif Ekbal and Dinesh Garg
 Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator
Frederic Thomas Kirstein, Terry Lima Ruas and Bela Gipp
 Learning to Rewrite Negation Queries in Product Search
Mengtian Guo, Mutasem Al-Darabsah, Choon Hui Teo, Jonathan May, Tarun Agarwal and Rahul Bhagat
 LAW: Legal Agentic Workflows for Custody and Fund Services Contracts
William Watson, Nicole Cho, Nishan Srishankar, Zhen Zeng, Lucas Cecchi, Daniel Scott, Suchetha Siddagangappa, Rachneet Kaur, Tucker Balch and Manuela Veloso
 UR2N: Unified Retriever and ReraNker
Riyaz Ahmad Bhat, Jaydeep Sen, Rudra Murthy and Vignesh P
 An Automatic Method to Estimate Correctness of RAG
Chi Zhang, Vivek V. Datla, Aditya Shrivastava, Alfy Samuel, Zhiqi Huang, Anoop Kumar and Daben Liu
 DaCoM: Strategies to Construct Domain-specific Low-resource Language Machine Translation Dataset
Junghoon Kang, Keunjoo Tak, Joungsu Choi, Myunghyun Kim, Junyoung Jang and Youjin Kang
 ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
Ahmed Masry, Megh Thakkar, Aayush Bajaj, Aaryaman Kartha, Enamul Hoque and Shafiq Joty
 LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Akshara Prabhakar, Yuanzhi Li, Karthik Narasimhan, Sham Kakade, Eran Malach and Samy Jelassi
 Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code
Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T. Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Barbosa Junior, Aleksandr Drozd, Jordan Clive, Kshitij Gupta, Liangyu Chen, Qi Sun, Ken Tsui, Nour Moustafa-Fahmy, Nicolo Monti, Tai Dang, Ziyang Luo, Tien-Tung Bui, Roberto Navigli, Virendra Mehta, Matthew Blumberg, Victor May, Hiep Nguyen and Sampo Pyysalo
 UCTG: A Unified Controllable Text Generation Framework for Query Auto-Completion
Zhipeng Li, Shuang Zheng, Jiaping Xiao, Xianneng Li and Lei Wang
 Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings
Aaron Zheng, Mansi Rana and Andreas Stolcke
 Zero-shot Slot Filling in the Age of LLMs for Dialogue Systems
Mansi Rana, Kadri Hacioglu, Sindhuja Gopalan and Maragathamani Boothalingam
 LionGuard: A Contextualized Moderation Classifier to Tackle Localized Unsafe Content
Jessica Foo and Shaun Khoo
 REVerSum: A Multi-staged Retrieval-Augmented Generation Method to Enhance Wikipedia Tail Biographies through Personal Narratives
Sayantan Adak, Pauras Mangesh Meher, Paramita Das and Animesh Mukherjee
 RE-FIN: Retrieval-based Enrichment for Financial data
Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica and Filippo Pallucchini
 SCV: Light and Effective Multi-Vector Retrieval with Sequence Compressive Vectors
Cheoneum Park, Seohyeong Jeong, Minsang Kim, KyungTae Lim and Yong-Hun Lee
 Efficient Vocabulary Reduction for Small Language Models
Yuta Nozaki, Dai Nakashima, Ryo Sato and Naoki Asaba
 Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting
Zeyuan Chen, Haiyan Wu, Kaixin Wu, Wei Chen, Mingjie Zhong, Jia Xu, Zhongyi Liu and Wei Zhang
 LLM ContextBridge: A Hybrid Approach for Intent and Dialogue Understanding in IVSR
Changwoo Chun, Daniel Rim and Juhee Park
 Neural Document Segmentation Using Weighted Sliding Windows with Transformer Encoders
Saeed Abbasi, Aijun An, Heidar Davoudi, Ron Di Carlantonio and Gary Farmaner
 RecStream: Graph-aware Stream Management for Concurrent Recommendation Model Online Serving
Shuxi Guo, Qi Qi, Haifeng Sun, Jianxin Liao and Jingyu Wang
 Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models
Wenting Tan, Dongxiao Chen, Jieting Xue, Zihao Wang and Taijie Chen