Author Profiling Using Semantic and Syntactic Features
Notebook for PAN at CLEF 2019
Document identifier: oai:DiVA.org:ltu-76936
Keyword: Natural Sciences,
Computer and Information Sciences,
Computer Sciences,
Naturvetenskap,
Data- och informationsvetenskap,
Datavetenskap (datalogi),
Maskininlärning,
Machine Learning
Publication year: 2019
Abstract: In this paper we present an approach for the PAN 2019 Author Profiling challenge. The task here is to detect Twitter bots and also to classify the gender of human Twitter users as male or female, based on a hundred select tweets from their profile. Focusing on feature engineering, we explore the semantic categories present in tweets. We combine these semantic features with part of speech tags and other stylistic features – e.g. character floodings and the use of capital letters – for our eventual feature set. We have experimented with different machine learning techniques, including ensemble techniques, and found AdaBoost to be the most successful (attaining an F1-score of 0.99 on the development set). Using this technique, we achieved an accuracy score of 89.17% for English language tweets in the bot detection subtask.
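The stylistic features named in the abstract (character floodings and the use of capital letters) can be illustrated with a minimal sketch. This record does not give the paper's actual feature definitions, so everything here is an assumption made for illustration: a "flooding" is taken to be any character repeated three or more times in a row, capital-letter use is taken to be the share of uppercase letters among alphabetic characters, and the function names are hypothetical.

```python
import re

# Assumption: a "flooding" is one character repeated three or more times in a row.
FLOODING_RE = re.compile(r"(.)\1{2,}")


def count_floodings(text: str) -> int:
    """Count runs of three or more identical consecutive characters."""
    return sum(1 for _ in FLOODING_RE.finditer(text))


def capital_ratio(text: str) -> float:
    """Share of alphabetic characters that are uppercase (0.0 if there are none)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


def stylistic_features(tweets: list[str]) -> dict[str, float]:
    """Average both per-tweet features over one user's tweets."""
    n = len(tweets)
    if n == 0:
        return {"floodings_per_tweet": 0.0, "capital_ratio": 0.0}
    return {
        "floodings_per_tweet": sum(count_floodings(t) for t in tweets) / n,
        "capital_ratio": sum(capital_ratio(t) for t in tweets) / n,
    }
```

A per-user dictionary of this kind could be joined with semantic and part-of-speech features before being passed to a classifier such as AdaBoost; how the authors actually combined their features is not described in this record.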
Authors
György Kovács
Luleå tekniska universitet; EISLAB; MTA-SZTE Research Group on Artificial Intelligence, Szeged, Hungary
Vanda Balogh
Institute of Informatics, University of Szeged, Szeged, Hungary
Purvnashi Mehta
MindGarage, Kaiserslautern, Germany
Kumar Shridhar
MindGarage, Kaiserslautern, Germany
Pedro Alonso
Luleå tekniska universitet; EISLAB
Marcus Liwicki
Luleå tekniska universitet; EISLAB
Record metadata
header:
identifier: oai:DiVA.org:ltu-76936
datestamp: 2021-04-19T12:56:03Z
setSpec: SwePub-ltu
metadata:
mods:
@attributes:
version: 3.7
recordInfo:
recordContentSource: ltu
recordCreationDate: 2019-11-28
identifier: http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-76936
titleInfo:
@attributes:
lang: eng
title: Author Profiling Using Semantic and Syntactic Features
subTitle: Notebook for PAN at CLEF 2019
abstract: In this paper we present an approach for the PAN 2019 Author Profiling challenge. The task here is to detect Twitter bots and also to classify the gender of human Twitter users as male or female, based on a hundred select tweets from their profile. Focusing on feature engineering, we explore the semantic categories present in tweets. We combine these semantic features with part of speech tags and other stylistic features – e.g. character floodings and the use of capital letters – for our eventual feature set. We have experimented with different machine learning techniques, including ensemble techniques, and found AdaBoost to be the most successful (attaining an F1-score of 0.99 on the development set). Using this technique, we achieved an accuracy score of 89.17% for English language tweets in the bot detection subtask.
subject:
@attributes:
lang: eng
authority: uka.se
topic:
Natural Sciences
Computer and Information Sciences
Computer Sciences
@attributes:
lang: swe
authority: uka.se
topic:
Naturvetenskap
Data- och informationsvetenskap
Datavetenskap (datalogi)
@attributes:
lang: swe
authority: ltu
topic: Maskininlärning
genre: Research subject
@attributes:
lang: eng
authority: ltu
topic: Machine Learning
genre: Research subject
language:
languageTerm: eng
genre:
conference/other
ref
note:
Published
6
name:
@attributes:
type: personal
authority: ltu
namePart:
Kovács
György
1984-
role:
roleTerm: aut
affiliation:
Luleå tekniska universitet
EISLAB
MTA-SZTE Research Group on Artificial Intelligence, Szeged, Hungary
nameIdentifier:
gyokov
0000-0002-0546-116X
@attributes:
type: personal
namePart:
Balogh
Vanda
role:
roleTerm: aut
affiliation: Institute of Informatics, University of Szeged, Szeged, Hungary
@attributes:
type: personal
namePart:
Mehta
Purvnashi
role:
roleTerm: aut
affiliation: MindGarage, Kaiserslautern, Germany
@attributes:
type: personal
namePart:
Shridhar
Kumar
role:
roleTerm: aut
affiliation: MindGarage, Kaiserslautern, Germany
@attributes:
type: personal
authority: ltu
namePart:
Alonso
Pedro
1986-
role:
roleTerm: aut
affiliation:
Luleå tekniska universitet
EISLAB
nameIdentifier:
pedalo
0000-0002-6785-4356
@attributes:
type: personal
authority: ltu
namePart:
Liwicki
Marcus
role:
roleTerm: aut
affiliation:
Luleå tekniska universitet
EISLAB
nameIdentifier:
marliw
0000-0003-4029-6574
originInfo:
dateIssued: 2019
publisher: RWTH Aachen University
relatedItem:
@attributes:
type: host
titleInfo:
title: CLEF 2019 Working Notes
subTitle: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum
@attributes:
type: series
titleInfo:
title: CEUR Workshop Proceedings
partNumber: 2380
identifier: 1613-0073
location:
url: http://ceur-ws.org/Vol-2380/
physicalDescription:
form: print
typeOfResource: text