Single or double: On number of classifier layers for small language models

  • Indexed in: CPCI-S
Authors: Unal, Muhammed Cihat; Zaval, Mounes; Gerek, Aydin
Corresponding author: Unal, MC
Author affiliation: Huawei Turkey, Ctr Res & Dev, Istanbul, Turkiye.
Corresponding affiliation: Huawei Turkey, Ctr Res & Dev, Istanbul, Turkiye.
Language: English
Keywords: BERT, Distill-Bert, AlBert, TinyBert, Text classification, MLP
Proceedings: 2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU
ISSN: 2165-0608
Year: 2023
Conference: 31st IEEE Conference on Signal Processing and Communications Applications (SIU)
Conference dates: JUL 05-08, 2023
Conference location: Istanbul Tech Univ, Ayazaga Campus, Istanbul, TURKEY
Abstract: Pretrained transformer language models such as BERT have become very popular among NLP practitioners, and as such have been applied to a variety of tasks. BERT's strength derives from producing contextual vector representations for text, which are then traditionally used with a linear classifier (a single fully connected layer). One of the less well-known NLP tasks to which BERT has been applied is address parsing, which deals with breaking an address into its components, such as street name or door number. In an earlier study on address parsing, it was observed that replacing the traditional single-layer linear classifier with a double-layer MLP (Multi-Layer Perceptron...
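
The abstract contrasts the conventional single linear classifier on top of BERT's contextual representations with a two-layer MLP head. The following is a minimal illustrative sketch of that contrast (not code from the paper); the model name, hidden width, and use of the [CLS] pooled vector are assumptions for the example.

```python
# Illustrative sketch only: a BERT encoder with either a single linear
# classifier or a two-layer MLP head, as contrasted in the abstract.
# "bert-base-uncased" and mlp_hidden=768 are assumed, not from the paper.
import torch.nn as nn
from transformers import AutoModel


class BertWithClassifierHead(nn.Module):
    def __init__(self, num_labels: int, double_layer: bool = False,
                 model_name: str = "bert-base-uncased", mlp_hidden: int = 768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        if double_layer:
            # Double-layer head: two fully connected layers with a nonlinearity.
            self.head = nn.Sequential(
                nn.Linear(hidden, mlp_hidden),
                nn.ReLU(),
                nn.Linear(mlp_hidden, num_labels),
            )
        else:
            # Traditional single fully connected (linear) classifier.
            self.head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Sequence-level classification from the [CLS] position; for a
        # token-level task such as address parsing, the head would instead
        # be applied to every position of out.last_hidden_state.
        cls = out.last_hidden_state[:, 0]
        return self.head(cls)
```

The same encoder can be paired with either head, so the comparison in the paper reduces to swapping the classifier module while keeping the pretrained backbone fixed.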
