WP1's primary goal is to gather textual data from online debates on social media platforms such as Reddit and Twitter, as well as from public political datasets and niche community forums. English is the chosen language due to resource availability and project time constraints. Data collection methods, either automated (API-based) or semi-automated, will be established, adhering to legal and privacy considerations. The collected data will undergo cleaning, annotation, and segmentation for research purposes and will be made publicly accessible for further studies. The initial data collection will centre on contentious topics, including public health (specifically COVID-19 and associated conspiracy theories) and climate change. This data will be the starting point for the development of models and annotation schemes for ethos, pathos, and reframing in WP2.
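The cleaning and topic-focused filtering step can be illustrated with a minimal sketch. The record fields, keyword list, and cleaning rules below are invented for illustration and do not reflect any actual platform API schema or the project's real pipeline; stripping URLs and @-mentions stands in for the privacy-preserving preprocessing mentioned above.

```python
import re

# Hypothetical post records as they might arrive from a platform API;
# the field names are illustrative, not a real Reddit/Twitter schema.
RAW_POSTS = [
    {"id": "t1", "text": "Vaccines cause 5G? See http://example.com @bob", "lang": "en"},
    {"id": "t2", "text": "Climate change is accelerating, say scientists.", "lang": "en"},
    {"id": "t3", "text": "Zmiany klimatu przyspieszaja.", "lang": "pl"},
]

# Toy keyword filter for the contentious topics named in WP1
TOPIC_KEYWORDS = {"covid", "vaccine", "climate"}

def clean(text):
    """Strip URLs and @-mentions for privacy, collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"@\w+", "", text)
    return re.sub(r"\s+", " ", text).strip()

def collect(posts):
    """Keep English posts matching a contentious topic, cleaned."""
    out = []
    for p in posts:
        if p["lang"] != "en":
            continue  # English only, per the project's scope
        cleaned = clean(p["text"])
        if any(k in cleaned.lower() for k in TOPIC_KEYWORDS):
            out.append({"id": p["id"], "text": cleaned})
    return out
```

In a real deployment the `RAW_POSTS` list would be replaced by paginated API responses, with rate limiting and terms-of-service compliance handled at that layer.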
WP2 builds upon WP1 data to provide reliable annotations of online discussions for further extensive mining and analytics. It employs a dual theory-driven and corpus-driven methodology. The analysis extends the concept of argumentative patterns, originally developed in Argumentation Theory by Frans van Eemeren, and applies it to large-scale online discussions. INCEpTION, a human-in-the-loop annotation tool, is used with machine learning recommenders based on pre-trained language models such as BERT and GPT-3, improving efficiency while retaining human quality control.
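The human-in-the-loop pattern can be sketched as a simple suggest-and-correct loop. This is not INCEpTION's actual API: the keyword rule stands in for a pre-trained language model recommender, and the data is invented; the point is only that annotator corrections are retained as supervision for the next recommender iteration.

```python
def suggest(sentence):
    """Stand-in recommender: a real setup would query a pre-trained
    language model; here a toy keyword rule plays that role."""
    return "pathos" if "outrage" in sentence.lower() else "ethos"

def annotate(sentences, oracle):
    """The annotator (oracle) reviews each suggestion; disagreements
    are collected as training data for the next recommender round."""
    gold, corrections = [], []
    for s in sentences:
        proposal = suggest(s)
        label = oracle(s)
        if label != proposal:
            corrections.append((s, label))
        gold.append((s, label))
    return gold, corrections
```

The design choice to log only disagreements keeps the retraining set small while still steering the recommender toward the guideline interpretation the human annotators converge on.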
In WP2, an agile corpus creation method is used to generate the corpus and annotation guidelines. This iterative process begins with a corpus query, followed by guideline development, annotation, and evaluation. Early-stage analysis detects design errors, annotation issues, and guideline inadequacies. The second iteration revisits the corpus query, informed by the analysis of the first, addressing any low inter-annotator agreement through guideline redesign and re-annotation. The process continues until a satisfactory level of annotation reliability is achieved.
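A standard way to quantify the inter-annotator agreement that gates each iteration is Cohen's kappa, which corrects raw agreement for chance. The sketch below is a self-contained implementation for two annotators; the ethos/pathos labels are toy data, and the project's actual reliability threshold is not specified here.

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators labelling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(ann_a) == len(ann_b) and ann_a
    n = len(ann_a)
    p_obs = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    labels = set(freq_a) | set(freq_b)
    p_exp = sum(freq_a[l] * freq_b[l] for l in labels) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Two annotators labelling rhetorical moves (invented examples)
a = ["ethos", "pathos", "ethos", "ethos", "pathos", "none"]
b = ["ethos", "pathos", "pathos", "ethos", "pathos", "none"]
kappa = cohens_kappa(a, b)
```

A kappa below the level the team deems satisfactory would trigger the guideline redesign and re-annotation step described above.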
We mine implicit patterns in social media text that signal misbehaviour, such as the dissemination of fake news and hate speech, including the interpretation of ‘cant’ (also known as doublespeak). Recent techniques rely on neural network models applied to text features pre-trained with a contextual language model. Social media mining poses additional challenges when dealing with ill-formed and implicit language. Moreover, state-of-the-art models are good at detecting explicit content, but implicit content is much more challenging. The goal is to discover latent patterns in the data and to uncover sentence meaning from other contextual data, such as community characteristics and temporal information.
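The idea of letting context disambiguate implicit language can be shown with a deliberately simple scoring sketch. The cue lists, weights, and feature names are all invented for illustration; a real system would use a neural model over contextual embeddings, but the structural point survives: explicit cues score on their own, while implicit cues (candidate ‘cant’) only contribute in proportion to contextual evidence such as a community prior.

```python
# Invented cue lists and weights, purely for illustration
EXPLICIT_CUES = {"idiots", "hate"}       # surface-level signals
DOGWHISTLES = {"globalists", "they"}     # candidate 'cant'/doublespeak

def score(post, community_toxicity, night_post):
    """Higher score = more likely misbehaviour. Implicit cues are
    ambiguous, so they are weighted by the community prior, and a
    temporal feature adds a small contextual signal."""
    words = set(post.lower().split())
    s = 2.0 * len(words & EXPLICIT_CUES)          # explicit content
    s += len(words & DOGWHISTLES) * community_toxicity
    if night_post:
        s += 0.5                                  # illustrative weight
    return s
```

The same post can thus score very differently depending on where and when it appears, which is exactly the behaviour the paragraph argues explicit-content detectors lack.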
Building upon Argument Analytics technology (Lawrence et al 2016, see publication), this work package aims to establish a new methodology of trust analytics that allows for sense-making of the patterns of communication behaviour on social media that lead to the polarisation of society (Gajewska et al 2023, see publication). We focus on rhetorical strategies of ethos (appeals to the speaker's character) and pathos (appeals to emotions), such as types of emotional load, ethotic attacks, and appeals to authorities. Discussions in online fora often boil down to who should be trusted, i.e. who is a ‘good guy’, and who should be distrusted, i.e. who is a ‘bad guy’; what to follow and who wants to use us; and whose camp we belong to. Using trust metrics such as intellectual humility and interpersonal civility, trust analytics will determine statistical patterns and trends in the data, revealing, e.g., the frequency of personal attacks on wisdom, virtue, and goodwill in debates on climate change, or trends in shifting to emotionally loaded frames in conspiracy theories on vaccination.
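The frequency analysis mentioned at the end of the paragraph can be sketched over toy annotated data. The record format and examples below are invented; only the attack targets (wisdom, virtue, goodwill) come from the text.

```python
from collections import Counter

# Invented annotated records; the real corpus would come from WP1/WP2
ANNOTATIONS = [
    {"topic": "climate", "move": "ethos_attack", "target": "wisdom"},
    {"topic": "climate", "move": "ethos_attack", "target": "goodwill"},
    {"topic": "climate", "move": "pathos_appeal", "target": None},
    {"topic": "vaccination", "move": "ethos_attack", "target": "wisdom"},
]

def attack_frequencies(annotations, topic):
    """Relative frequency of each ethos-attack target within a topic,
    e.g. how often debaters attack an opponent's wisdom vs goodwill."""
    attacks = [a["target"] for a in annotations
               if a["topic"] == topic and a["move"] == "ethos_attack"]
    total = len(attacks)
    return {t: c / total for t, c in Counter(attacks).items()}
```

Tracking the same frequencies over time windows would yield the trend analyses (e.g. shifts toward emotionally loaded frames) the paragraph describes.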
We will develop an agent-based model and implement it in a multi-agent system (MAS) to examine the impact of individual rhetorical strategies on the spread of fake news and the impact of hate speech on communities. Experiments and simulations will be executed over the evolution of models representing societies of artificial agents, in which different assumptions and configurations discovered in the previous stages of the project will be tested. The dynamics of the evolution of the MAS will be observed in order to explain each effect and to build an XAI module for recommender systems.
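A minimal agent-based sketch of fake-news spread illustrates the kind of simulation meant here. All parameters (population size, credulity, number of initial sharers) are invented stand-ins for the configurations the project would derive from its earlier stages, not actual project values.

```python
import random

def simulate(n_agents=100, sharers=5, credulity=0.3, steps=10, seed=7):
    """Toy agent-based model: at each step, every agent who holds the
    fake story shows it to one random member of the population, who
    adopts it with probability `credulity`. Returns the final count
    of agents holding the story."""
    rng = random.Random(seed)          # fixed seed: reproducible runs
    infected = set(range(sharers))     # initial sharers
    for _ in range(steps):
        for _agent in list(infected):
            neighbour = rng.randrange(n_agents)
            if neighbour not in infected and rng.random() < credulity:
                infected.add(neighbour)
    return len(infected)
```

Sweeping `credulity` or the rhetorical-strategy parameters it would stand for, and comparing final spread across configurations, is the experimental pattern described above; the observed dynamics then feed the explanatory (XAI) layer.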
WP6 aims to develop technologies for mixed-initiative debates between an artificial agent and a user (cf. Lawrence et al 2012, see publication) that will allow interventions against the effects of misbehaviour such as hate speech and polarisation. This work package will design two dialogue protocols which formalise the Socratic method of collective (guided) critical thinking. Building upon the elenctic and maieutic methods, a player S (Student) will be guided by another player T (Teacher, Socrates), who asks specific questions (according to these methods) so that, in answering, (i) S is able to diagnose the misbehaviour of hate speech by steadily identifying and eliminating hypotheses that lead to contradictions; and (ii) S is able to go through a therapy in order to understand the strategies used in misbehaviour and to identify the manipulation behind it, by critically examining, e.g., emotional appeals (pathos) and appeals to authorities (ethos).
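The elenctic elimination of hypotheses can be sketched as a tiny commitment store. This is not the project's actual protocol: the `not:` string encoding of contradiction and the class interface are invented to show the core mechanic, namely that S's answers commit S to propositions that rule out contradictory hypotheses.

```python
class ElencticDialogue:
    """Toy elenctic protocol: T's questions elicit answers from S;
    each answer becomes a commitment, and any hypothesis that
    contradicts a commitment is eliminated."""

    def __init__(self, hypotheses):
        self.hypotheses = set(hypotheses)   # S's candidate beliefs
        self.commitments = set()            # propositions S asserted

    def answer(self, proposition):
        """S asserts a proposition; hypotheses it contradicts are
        dropped. A hypothesis 'not:p' contradicts commitment 'p'
        (a toy encoding of negation)."""
        self.commitments.add(proposition)
        self.hypotheses -= {h for h in self.hypotheses
                            if h == "not:" + proposition}
        return self.hypotheses
```

In the maieutic variant, T's questions would instead be chosen to lead S toward articulating the manipulation strategy (e.g. an unjustified appeal to authority) rather than toward a contradiction.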