Persian Named Entity Recognition

Table of contents

  1. PEYMA
  2. ARMAN

PEYMA

The PEYMA dataset contains 7,145 sentences and 302,530 tokens in total, of which 41,148 tokens are tagged with one of seven classes.

  1. Organization
  2. Money
  3. Location
  4. Date
  5. Time
  6. Person
  7. Percent

Label          Count
Organization   16964
Money           2037
Location        8782
Date            4259
Time             732
Person          7675
Percent          699
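
The per-class counts above can be cross-checked against the totals quoted for PEYMA. The following is a minimal sketch that only uses the numbers from this section; the variable names are illustrative and not part of any dataset tooling.

```python
# Per-class tagged-token counts copied from the PEYMA table above.
peyma_counts = {
    "Organization": 16964,
    "Money": 2037,
    "Location": 8782,
    "Date": 4259,
    "Time": 732,
    "Person": 7675,
    "Percent": 699,
}

TOTAL_TOKENS = 302_530  # total token count reported for PEYMA

# The class counts should add up to the reported number of tagged tokens.
tagged = sum(peyma_counts.values())
print(f"tagged tokens: {tagged}")  # 41148
print(f"share of all tokens: {tagged / TOTAL_TOKENS:.1%}")  # 13.6%

# Class distribution among tagged tokens, largest class first.
for label, count in sorted(peyma_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<13}{count:>6}  {count / tagged:.1%}")
```

The skew toward Organization and the small Time and Percent classes are worth keeping in mind when reading per-class evaluation scores.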

Download

You can download the dataset from here

Cite

Please cite the following paper in your publication if you are using this dataset in your research:

@article{shahshahani2018peyma,
    title={PEYMA: A Tagged Corpus for Persian Named Entities},
    author={Mahsa Sadat Shahshahani and Mahdi Mohseni and Azadeh Shakery and Heshaam Faili},
    year=2018,
    journal={ArXiv},
    volume={abs/1801.09936}
}

ARMAN

The ARMAN dataset contains 7,682 sentences and 250,015 tokens, tagged with six classes.

  1. Organization
  2. Location
  3. Facility
  4. Event
  5. Product
  6. Person

Label          Count
Organization   30108
Location       12924
Facility        4458
Event           7557
Product         4389
Person         15645
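
Per-class counts like these can be recomputed from an annotated file. The sketch below assumes a layout that this section does not specify: one token and its tag per line, separated by whitespace, blank lines between sentences, and IOB-style tags such as `B-org`, `I-org`, and `O`. The function name `count_tagged_tokens` and the sample tokens are hypothetical; check the downloaded files for the actual format and tag names before relying on it.

```python
from collections import Counter


def count_tagged_tokens(lines):
    """Count tokens per entity class, skipping tokens tagged 'O' (outside)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # blank line: assumed sentence boundary
            continue
        # Assumed layout: "<token> <tag>", with the tag as the last field.
        _token, tag = line.rsplit(maxsplit=1)
        if tag == "O":
            continue
        # Drop an assumed IOB prefix ("B-" or "I-") to get the class name.
        label = tag.split("-", 1)[-1]
        counts[label] += 1
    return counts


# Tiny made-up sample in the assumed format (placeholder tokens, not real data).
sample = """\
w1 B-org
w2 I-org
w3 O

w4 B-loc
w5 O
""".splitlines()

print(count_tagged_tokens(sample))
```

To run it on a real file, pass an open file handle instead of `sample`, since the function only needs an iterable of lines.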

Download

You can download the dataset from here

Cite

Please cite the following paper in your publication if you are using this dataset in your research:

@inproceedings{poostchi2018bilstm,
    title={BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: the First Entity-Annotated Persian Dataset},
    author={Hanieh Poostchi and Ehsan Zare Borzeshi and Massimo Piccardi},
    year=2018,
    booktitle={LREC}
}