Single or double: On number of classifier layers for small language models

  • Indexed in: CPCI-S
Authors: Unal, Muhammed Cihat; Zaval, Mounes; Gerek, Aydin
Corresponding author: Unal, MC
Author affiliation: Huawei Turkey, Ctr Res & Dev, Istanbul, Turkiye.
Corresponding affiliation: Huawei Turkey, Ctr Res & Dev, Istanbul, Turkiye.
Language: English
Keywords: BERT, Distill-Bert, AlBert, TinyBert, Text classification, MLP
Proceedings: 2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU
ISSN: 2165-0608
Year: 2023
Conference: 31st IEEE Conference on Signal Processing and Communications Applications (SIU)
Conference dates: JUL 05-08, 2023
Conference location: Istanbul Tech Univ, Ayazaga Campus, Istanbul, TURKEY
Abstract: Pretrained transformer language models such as BERT have become very popular among NLP practitioners, and as such have been applied to a variety of tasks. BERT's strength derives from producing contextual vector representations for text, which are then traditionally used with a linear classifier (a single fully connected layer). One of the less well-known NLP tasks to which BERT has been applied is address parsing, which deals with breaking an address into its components, such as street name or door number. In an earlier study on address parsing, it was observed that replacing the traditional single-layer linear classifier with a double-layer MLP (Multi-Layer Perceptron...
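
The abstract contrasts the conventional single linear classifier on top of BERT's contextual representations with a two-layer MLP head. The following is a minimal illustrative sketch of that contrast (not code from the paper); the model name, hidden width, and use of the [CLS] pooled vector are assumptions for the example.

```python
# Illustrative sketch only: a BERT encoder with either a single linear
# classifier or a two-layer MLP head, as contrasted in the abstract.
# "bert-base-uncased" and mlp_hidden=768 are assumed, not from the paper.
import torch.nn as nn
from transformers import AutoModel


class BertWithClassifierHead(nn.Module):
    def __init__(self, num_labels: int, double_layer: bool = False,
                 model_name: str = "bert-base-uncased", mlp_hidden: int = 768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        if double_layer:
            # Double-layer head: two fully connected layers with a nonlinearity.
            self.head = nn.Sequential(
                nn.Linear(hidden, mlp_hidden),
                nn.ReLU(),
                nn.Linear(mlp_hidden, num_labels),
            )
        else:
            # Traditional single fully connected (linear) classifier.
            self.head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Sequence-level classification from the [CLS] position; for a
        # token-level task such as address parsing, the head would instead
        # be applied to every position of out.last_hidden_state.
        cls = out.last_hidden_state[:, 0]
        return self.head(cls)
```

The same encoder can be paired with either head, so the comparison in the paper reduces to swapping the classifier module while keeping the pretrained backbone fixed.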
