ImageArg-Shared-Task-2023

Co-located with The 10th ArgMining Workshop at EMNLP 2023 in Singapore.

View Project Code on GitHub ImageArg/ImageArg-Shared-Task

Important Announcement:
[10/01] Full corpus is now available here.
[09/01] Paper submission due date is extended to 9/2 PM (AoE).
[08/01] Shared task leaderboard is out!
[07/19] Submission (predicted results) deadline is extended to 07/26 (AoE).
[07/17] Task submission (predicted results) portal is open here.
[07/08] Test data is out here.
[07/03] Update data-downloading scripts. Please pull the new code.

The First Shared Task in Multimodal Argument Mining

There has been a recent surge of interest in developing methods and corpora to improve and evaluate persuasiveness in natural language applications. However, these efforts have mainly focused on the textual modality, neglecting the influence of other modalities. Liu et al. introduced a new multimodal dataset called ImageArg. This dataset includes persuasive tweets along with associated images, aiming to identify the image's stance towards the tweet and determine its persuasiveness score concerning a specific topic. The ImageArg dataset is a significant step towards advancing multimodal persuasive text analysis and opens up avenues for exploring the persuasive impact of images in social media. To further this goal, we designed this shared task, which utilizes the ImageArg dataset to advance multimodal persuasiveness techniques.

Participants are welcome to submit system description papers for the shared task. Accepted papers will be published in the proceedings of The 10th ArgMining Workshop. To participate, please fill in this registration form (Closed) and feel free to join ImageArg Slack for conversations.


1. The ImageArg Shared Task

The ImageArg dataset is composed of tweets (images and text) on controversial topics, namely gun control and abortion. The dataset contains two dimensions of annotation, Argumentative Stance (AS) and Image Persuasiveness (IP), each of which addresses a distinct research question: 1) AS: does the tweet have an argumentative stance? 2) IP: does the tweet image make the tweet more persuasive?

The ImageArg Shared Task is divided into two subtasks. Participants can choose Subtask A, Subtask B, or both. Please be aware that some tweet content may be upsetting or triggering. Details about AS and IP are given in the paper.

Subtask A: Argumentative Stance (AS) Classification

Given a tweet composed of text and an image, predict whether the tweet Supports or Opposes the given topic. This is a binary classification task. Two examples are shown below.

Example of the AS classification

The left tweet expresses a strong stance in support of gun control by pointing to a House bill that would require background checks for all gun sales. The right tweet opposes gun control because it favors self-defense.

Subtask A (AS Classification) Leaderboard

Rank Team Name Attempt Score
1 KnowComp 4 0.8647    ★
2 KnowComp 5 0.8571
3 KnowComp 1 0.8528
4 Semantists 4 0.8506    ★
5 Semantists 3 0.8462
6 Semantists 5 0.8417
7 KnowComp 2 0.8365
8 Semantists 1 0.8365
9 Semantists 2 0.8365
10 KnowComp 3 0.8346
11 Mohammad Soltani 2 0.8273    ★
12 Pitt Pixels 2 0.8168    ★
13 Mohammad Soltani 1 0.8142
14 Mohammad Soltani 4 0.8093
15 GC-HUNTER 2 0.8049    ★
16 Mohammad Soltani 3 0.8000
17 Pitt Pixels 1 0.7910
18 Mohammad Soltani 5 0.7782
19 GC-HUNTER 1 0.7766
20 IUST 1 0.7754    ★
21 IUST 2 0.7752
22 Pitt Pixels 4 0.7710
23 Pitt Pixels 5 0.7415
24 KPAS 1 0.7097    ★
25 ACT-CS 4 0.6325    ★
26 ACT-CS 3 0.6178
27 ACT-CS 2 0.6116
28 ACT-CS 1 0.5863
29 IUST 3 0.5680
30 Pitt Pixels 3 0.5285
31 feeds* 1 0.4418    ★
Note: Attempt denotes submission attempt number; Score denotes F1-score. The best submission attempt from each team is labeled with a star (★) and will be used for final ranking. *Team feeds submitted predictions for only one topic by the submission deadline so only partial results are evaluated.
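The starring rule in the note above (the best attempt per team is used for final ranking) can be sketched as follows. This is only an illustrative sketch, not the organizers' ranking script: the function name `best_attempt_per_team`, the row layout, and the tie-breaking behavior (the first-listed attempt wins a tie) are assumptions. The sample rows are copied from the Subtask A leaderboard.

```python
from typing import Dict, List, Tuple

# One leaderboard row: (team name, attempt number, F1-score).
Row = Tuple[str, int, float]

# A few rows taken from the Subtask A leaderboard above.
SAMPLE_ROWS: List[Row] = [
    ("KnowComp", 4, 0.8647),
    ("KnowComp", 5, 0.8571),
    ("Semantists", 4, 0.8506),
    ("Semantists", 3, 0.8462),
    ("Mohammad Soltani", 2, 0.8273),
]


def best_attempt_per_team(rows: List[Row]) -> List[Row]:
    """Keep each team's highest-scoring attempt, sorted by score (descending).

    Assumption: on an exact tie within a team, the attempt seen first is kept.
    """
    best: Dict[str, Row] = {}
    for team, attempt, score in rows:
        if team not in best or score > best[team][2]:
            best[team] = (team, attempt, score)
    return sorted(best.values(), key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for rank, (team, attempt, score) in enumerate(best_attempt_per_team(SAMPLE_ROWS), 1):
        print(rank, team, attempt, f"{score:.4f}")
```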

Subtask B: Image Persuasiveness (IP) Classification

Given a tweet composed of text and an image, predict whether the image makes the tweet text more Persuasive or Not. This is a binary classification task. Two examples are shown below.

Example of the IP classification

The image in the left tweet is not even relevant to the gun control topic. It does not improve the persuasiveness of the tweet, which argues for focusing on mental health instead of gun restrictions. The image in the right tweet makes the tweet text (and its stance) more persuasive because it provides strong evidence: statistics on the murder rate in major U.S. cities, attributed to restrictive gun control laws that keep citizens from easily arming themselves.

Subtask B (IP Classification) Leaderboard

Rank Team Name Attempt Score
1 feeds 1 0.5561    ★
2 KPAS 1 0.5417    ★
3 feeds 2 0.5392
4 Mohammad Soltani 5 0.5281    ★
5 Semantists 1 0.5045    ★
6 ACT-CS 1 0.5000    ★
7 Mohammad Soltani 1 0.4875
8 Mohammad Soltani 4 0.4778
9 Mohammad Soltani 3 0.4762
10 Semantists 5 0.4659
11 IUST 1 0.4609    ★
12 Mohammad Soltani 2 0.4545
13 ACT-CS 4 0.4432
14 ACT-CS 3 0.4348
15 Semantists 4 0.4222
16 Semantists 2 0.4141
17 KnowComp 1 0.3922    ★
18 GC-HUNTER 1 0.3832    ★
19 ACT-CS 2 0.3125
20 Semantists 3 0.2838
21 Pitt Pixels 1 0.1217    ★
Note: Attempt denotes submission attempt number; Score denotes F1-score. The best submission attempt from each team is labeled with a star (★) and will be used for final ranking.


2. Accepted Papers

The following system description papers submitted by the shared task participants have been peer-reviewed and accepted to the 10th ArgMining Workshop, co-located with EMNLP 2023. The full proceedings are here.


3. Dataset and Shared Task Submission

The dataset may only be used for scientific or research purposes. Any other use is explicitly prohibited. The datasets must not be redistributed or shared in part or in full with any third party, per the Twitter Developer Policy. Please redirect interested parties to this website.

All the tweets are crawled directly from Twitter. The organizers are aware that some tweets may no longer be available when participants start downloading (e.g., a tweet may have been deleted by its author). The organizers will regularly monitor the dataset and provide data patches that replace invalid tweets with newly annotated ones. Participants are required to fill out the Google Form (Closed) in order to receive data patches and shared task updates.

Participants are allowed to extend only the training set with further (synthetic) samples. However, participants who do so must describe the algorithm that extends the training set in their system description paper. This algorithm must be automatically executable without any human interaction (hence, without further manual annotation or manual user feedback).

Shared Task Evaluation: F1-score of participating teams will be used for ranking, but participants are free to include other metrics (e.g., AUC) in the system description paper submissions.
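For reference, the F1-score of a binary classifier can be computed as in the sketch below. This is not the official scorer: the function name `binary_f1`, the integer label encoding, and the choice of which label counts as the positive class are all assumptions made for illustration. How scores are aggregated across topics is not specified here and is not modeled.

```python
from typing import Sequence


def binary_f1(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1-score for the `positive` class: harmonic mean of precision and recall.

    Assumption: labels are encoded as integers and `positive` marks the class
    of interest (e.g. "Supports" or "Persuasive").
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold_labels = [1, 1, 0, 0]
    predicted = [1, 0, 1, 0]
    print(f"F1 = {binary_f1(gold_labels, predicted):.4f}")
```

In this toy run there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5.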

Shared Task Submission: Each team may make up to 5 submissions from different approaches (systems) per subtask. Participants may withdraw a submission at any time before the final deadline by contacting the organizers.


4. Call for Papers (System Description Papers)

The ImageArg Shared Task invites the submission of system description papers from all the teams that have a successful submission to the leaderboard. Accepted papers will be published in the proceedings of The 10th ArgMining Workshop.

By default, we only accept short papers (at most 4 pages, excluding references and optional appendices). References do not count toward the 4-page limit, and appendices have no page limit. Authors will have one extra page to address reviewers' comments in their camera-ready versions. Please note that all papers will be treated equally in the workshop proceedings. Authors are expected to adhere to the ethical code set out in the ACL Code of Ethics. Submissions that violate any of the policies will be rejected without review.

At least one author of each paper is required to serve as a reviewer and fill out this Reviewer Form. Reviewing will be double-blind.

Structure of a system description paper could look as follows:

Please cite the following two papers:

@inproceedings{liu-etal-2022-imagearg,
    title = "{I}mage{A}rg: A Multi-modal Tweet Dataset for Image Persuasiveness Mining",
    author = "Liu, Zhexiong  and Guo, Meiqi  and Dai, Yue  and Litman, Diane",
    booktitle = "Proceedings of the 9th Workshop on Argument Mining",
    month = oct,
    year = "2022",
    address = "Online and in Gyeongju, Republic of Korea",
    publisher = "International Conference on Computational Linguistics",
    url = "https://aclanthology.org/2022.argmining-1.1",
    pages = "1--18"
    }
@inproceedings{liu-etal-2023-overview,
    title = "Overview of {I}mage{A}rg-2023: The First Shared Task in Multimodal Argument Mining",
    author = "Liu, Zhexiong  and Elaraby, Mohamed  and Zhong, Yang  and Litman, Diane",
    booktitle = "Proceedings of the 10th Workshop on Argument Mining",
    month = Dec,
    year = "2023",
    address = "Online and in Singapore",
    publisher = "Association for Computational Linguistics"
    }

Paper Format: EMNLP 2023 style sheets.

Paper Submission: Please select ImageArg Shared Task from the Submission Categories dropdown located in the middle of the Submission Form; otherwise, the organizers may not be able to receive your paper.


5. Timeline


6. Terms and Conditions

By participating in this task you agree to these terms and conditions. If, however, one or more of these conditions is a concern for you, email us, and we will consider if an exception can be made.


7. Organizing Committee


8. Program Committee