BERT-VBD: Vietnamese Multi-Document Summarization Framework
Published in CITA 2024: The 13th Conference on Information Technology and its Applications, 2024
Abstract
Multi-document summarization (MDS) remains a challenging task in natural language processing, particularly for low-resource languages like Vietnamese. This paper introduces BERT-VBD, a novel framework specifically designed for Vietnamese multi-document summarization that leverages the power of pre-trained language models.
Our approach integrates a BERT model fine-tuned for Vietnamese with an innovative document representation technique and an extractive summarization method. The BERT-VBD framework processes multiple Vietnamese documents by first encoding their semantic content with the adapted language model and then applying a modified ranking algorithm to identify and extract the most informative sentences across documents (a brief illustrative sketch of this kind of pipeline follows the abstract).
Experimental results on Vietnamese news article datasets demonstrate that BERT-VBD outperforms traditional extractive summarization methods and general-purpose language models not specifically adapted for Vietnamese. The framework shows significant improvements in ROUGE scores and human evaluation metrics, highlighting the effectiveness of our approach for Vietnamese language summarization. Our work contributes to advancing natural language processing capabilities for Vietnamese and provides a foundation for developing more sophisticated summarization systems for low-resource languages.
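The sketch below illustrates the general shape of the pipeline described in the abstract: sentences drawn from several Vietnamese documents are embedded with a pretrained BERT-style encoder and ranked against the cluster centroid to build an extractive summary. It is a minimal illustration only; the model choice (`vinai/phobert-base`), mean pooling, period-based sentence splitting, and centroid cosine ranking are assumptions made for this example and are not the components used in the BERT-VBD paper.

```python
# Illustrative sketch: extractive multi-document summarization with
# BERT sentence embeddings and centroid-based ranking.
# NOTE: the model name, pooling strategy, sentence splitting, and ranking
# heuristic below are assumptions for illustration, not the paper's method.

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "vinai/phobert-base"  # assumed Vietnamese BERT; swap as needed

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed_sentences(sentences):
    """Mean-pool the last hidden states to get one vector per sentence."""
    enc = tokenizer(sentences, padding=True, truncation=True,
                    max_length=256, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc).last_hidden_state          # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1)        # (batch, seq, 1)
    summed = (out * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return summed / counts                            # (batch, dim)

def summarize(documents, num_sentences=5):
    """Rank all sentences from all documents against the cluster centroid
    and return the top-scoring ones as an extractive summary."""
    sentences = [s.strip() for doc in documents
                 for s in doc.split(".") if s.strip()]
    emb = torch.nn.functional.normalize(embed_sentences(sentences), dim=1)
    centroid = torch.nn.functional.normalize(emb.mean(dim=0, keepdim=True), dim=1)
    scores = (emb @ centroid.T).squeeze(1)            # cosine similarity
    top = scores.topk(min(num_sentences, len(sentences))).indices.sort().values
    return ". ".join(sentences[i] for i in top) + "."

if __name__ == "__main__":
    # Two tiny toy Vietnamese documents reporting on the same event.
    docs = ["Văn bản thứ nhất. Nội dung liên quan đến sự kiện.",
            "Văn bản thứ hai. Cùng đưa tin về sự kiện đó."]
    print(summarize(docs, num_sentences=2))
```

In practice, a multi-document setting usually also needs a redundancy control step (e.g., an MMR-style penalty) so that near-duplicate sentences from different articles are not all selected; that refinement is omitted here to keep the sketch short.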
Keywords
Natural Language Processing, Multi-Document Summarization, BERT, Vietnamese Language Processing, Extractive Summarization
Recommended citation: Vuong Tuan-Cuong, Trang Mai Xuan, Luong Thien Van. (2024). "BERT-VBD: Vietnamese Multi-Document Summarization Framework." CITA 2024: The 13th Conference on Information Technology and its Applications, pp. 1798-1804.