HIBERT: Document Level Pre-training Of Hierarchical Bidirectional Transformers For Document Summarization

Xingxing Zhang, Furu Wei, Ming Zhou · Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics · 2019

Neural extractive summarization models usually employ a hierarchical encoder for document encoding, and they are trained with sentence-level labels that are created heuristically by rule-based methods. Training the hierarchical encoder with these inaccurate labels is challenging. Inspired by recent work on pre-training transformer sentence encoders (Devlin et al., 2018), we propose HIBERT (short for HIerarchical Bidirectional Encoder Representations from Transformers) for document encoding, along with a method to pre-train it on unlabeled data. Applying the pre-trained HIBERT to our summarization model outperforms a randomly initialized counterpart by 1.25 ROUGE on the CNN/Dailymail dataset and by 2.0 ROUGE on a version of the New York Times dataset, achieving state-of-the-art performance on both datasets.
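The abstract only names the architecture, not its details, but the general shape of a hierarchical document encoder for extractive summarization is a sentence-level Transformer over tokens followed by a document-level Transformer over sentence vectors, with a per-sentence extraction score on top. The sketch below is a rough PyTorch illustration under that assumption; class names, pooling choice, and hyperparameters are illustrative, not the authors' implementation, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

class HierarchicalDocEncoder(nn.Module):
    """Illustrative two-level encoder: a sentence-level Transformer encodes
    each sentence's tokens, a document-level Transformer encodes the
    resulting sentence vectors, and a linear head scores each sentence
    for extraction. Hyperparameters are placeholders."""

    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        sent_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        doc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_layers)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_layers)
        self.classifier = nn.Linear(d_model, 1)  # per-sentence extraction logit

    def forward(self, doc_tokens):
        # doc_tokens: (num_sents, max_sent_len) token ids for one document
        tok_emb = self.embed(doc_tokens)                      # (S, L, d)
        tok_repr = self.sent_encoder(tok_emb)                 # (S, L, d)
        sent_vecs = tok_repr[:, 0, :]                         # first-token pooling -> (S, d)
        doc_repr = self.doc_encoder(sent_vecs.unsqueeze(0))   # (1, S, d)
        return self.classifier(doc_repr).squeeze(-1)          # (1, S) sentence scores
```

In this reading, pre-training would learn the two encoders from unlabeled documents, after which the scoring head is trained on the (noisy) sentence-level extraction labels the abstract describes.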
