Persian Named Entity Recognition
PEYMA
The PEYMA dataset contains 7,145 sentences with a total of 302,530 tokens, of which 41,148 tokens are tagged with one of seven classes:
- Organization
- Money
- Location
- Date
- Time
- Person
- Percent
Label | Count |
---|---|
Organization | 16964 |
Money | 2037 |
Location | 8782 |
Date | 4259 |
Time | 732 |
Person | 7675 |
Percent | 699 |
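As a quick consistency check, the per-class counts in the table above can be summed and compared with the 41,148 tagged tokens quoted in the description. The sketch below only re-uses numbers from this page; the names `peyma_counts`, `TOTAL_TOKENS`, and `class_shares` are illustrative and not part of the dataset or any library.

```python
# Per-class counts copied from the PEYMA table above.
peyma_counts = {
    "Organization": 16964,
    "Money": 2037,
    "Location": 8782,
    "Date": 4259,
    "Time": 732,
    "Person": 7675,
    "Percent": 699,
}

# Total number of tokens in the corpus, as stated in the description.
TOTAL_TOKENS = 302_530


def class_shares(counts):
    """Return each class's fraction of all tagged tokens."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


tagged_total = sum(peyma_counts.values())
tagged_share = tagged_total / TOTAL_TOKENS

print(tagged_total)            # 41148, matching the description
print(f"{tagged_share:.1%}")   # 13.6% of all tokens carry an entity tag

# Print the classes from most to least frequent.
for label, share in sorted(class_shares(peyma_counts).items(), key=lambda kv: -kv[1]):
    print(f"{label:<13}{share:.1%}")
```

The shares show how unbalanced the classes are: Organization alone accounts for roughly 41% of the tagged tokens, while Time and Percent each account for under 2%.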
Download
You can download the dataset from here
Cite
If you use this dataset in your research, please cite the following paper:
@article{shahshahani2018peyma,
title={PEYMA: A Tagged Corpus for Persian Named Entities},
author={Mahsa Sadat Shahshahani and Mahdi Mohseni and Azadeh Shakery and Heshaam Faili},
year=2018,
journal={ArXiv},
volume={abs/1801.09936}
}
ARMAN
The ARMAN dataset contains 7,682 sentences with 250,015 tokens, tagged with six classes:
- Organization
- Location
- Facility
- Event
- Product
- Person
Label | Count |
---|---|
Organization | 30108 |
Location | 12924 |
Facility | 4458 |
Event | 7557 |
Product | 4389 |
Person | 15645 |
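A per-class table like the one above can be recomputed from a tagged corpus by counting every token whose tag is not the "outside" tag. The sketch below is a minimal illustration that makes two assumptions this page does not state: that sentences are available as lists of `(token, tag)` pairs, and that tags follow an IOB-style scheme (`B-`/`I-` prefixes, `O` for untagged tokens). The tag names and the sample sentence are hypothetical and are not taken from the dataset files.

```python
from collections import Counter


def count_tagged_tokens(sentences):
    """Count tokens per entity class, skipping the "O" (outside) tag.

    `sentences` is an iterable of sentences, each a list of (token, tag)
    pairs. The IOB-style tag format is an assumption of this sketch.
    """
    counts = Counter()
    for sentence in sentences:
        for _token, tag in sentence:
            if tag == "O":
                continue
            # Strip a "B-" / "I-" prefix to get the bare class name;
            # fall back to the whole tag if it has no prefix.
            _, _, label = tag.partition("-")
            counts[label or tag] += 1
    return counts


# Hypothetical sample sentence with made-up tag names.
sample = [
    [
        ("Ali", "B-PER"),
        ("visited", "O"),
        ("Azadi", "B-FAC"),
        ("Tower", "I-FAC"),
        ("in", "O"),
        ("Tehran", "B-LOC"),
    ]
]

for label, n in count_tagged_tokens(sample).most_common():
    print(label, n)
```

Counting tokens rather than entity spans means a two-token entity such as the sample's facility contributes 2 to its class. To count entities instead, increment only on tags that begin a span.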
Download
You can download the dataset from here
Cite
If you use this dataset in your research, please cite the following paper:
@inproceedings{poostchi2018bilstm,
title={BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: the First Entity-Annotated Persian Dataset},
author={Hanieh Poostchi and Ehsan Zare Borzeshi and Massimo Piccardi},
year=2018,
booktitle={LREC}
}