Persian Named Entity Recognition
PEYMA
The PEYMA dataset includes 7,145 sentences with a total of 302,530 tokens, of which 41,148 tokens are tagged with seven different classes:
- Organization
- Money
- Location
- Date
- Time
- Person
- Percent
| Label | Count |
|---|---|
| Organization | 16964 |
| Money | 2037 |
| Location | 8782 |
| Date | 4259 |
| Time | 732 |
| Person | 7675 |
| Percent | 699 |
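As a quick sanity check, the per-label counts in the table can be summed and compared against the 41,148 tagged tokens stated above. The sketch below only uses the numbers from this README; the names `peyma_counts` and `label_shares` are illustrative and not part of any dataset tooling.

```python
# Tagged-token counts copied from the PEYMA table above.
peyma_counts = {
    "Organization": 16964,
    "Money": 2037,
    "Location": 8782,
    "Date": 4259,
    "Time": 732,
    "Person": 7675,
    "Percent": 699,
}

# Total number of tokens in the dataset, as stated in the description.
total_tokens = 302_530


def label_shares(counts):
    """Return each label's fraction of all tagged tokens."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


total_tagged = sum(peyma_counts.values())
shares = label_shares(peyma_counts)

# Print labels from most to least frequent.
for label, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<12} {peyma_counts[label]:>6} {share:6.1%}")

print(f"tagged tokens: {total_tagged} of {total_tokens} "
      f"({total_tagged / total_tokens:.1%})")
```

The distribution is strongly skewed toward Organization, which is worth keeping in mind when reporting per-class scores.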
Download
You can download the dataset from here
Cite
Please cite the following paper in your publication if you are using this dataset in your research:
@article{shahshahani2018peyma,
title={PEYMA: A Tagged Corpus for Persian Named Entities},
author={Mahsa Sadat Shahshahani and Mahdi Mohseni and Azadeh Shakery and Heshaam Faili},
year=2018,
journal={ArXiv},
volume={abs/1801.09936}
}
ARMAN
The ARMAN dataset holds 7,682 sentences with 250,015 tokens tagged over six different classes:
- Organization
- Location
- Facility
- Event
- Product
- Person
| Label | Count |
|---|---|
| Organization | 30108 |
| Location | 12924 |
| Facility | 4458 |
| Event | 7557 |
| Product | 4389 |
| Person | 15645 |
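When the two corpora are used together, it helps to know which classes they have in common. The sketch below compares the label sets listed in this README; the variable names are illustrative, and how the non-shared classes should be handled (dropped, mapped, or kept) is a modeling decision this README does not prescribe.

```python
# Class names copied from the PEYMA list above.
peyma_labels = {
    "Organization", "Money", "Location", "Date", "Time", "Person", "Percent",
}

# Per-label counts copied from the ARMAN table above.
arman_counts = {
    "Organization": 30108,
    "Location": 12924,
    "Facility": 4458,
    "Event": 7557,
    "Product": 4389,
    "Person": 15645,
}
arman_labels = set(arman_counts)

# Classes annotated in both datasets, and classes unique to each one.
shared = peyma_labels & arman_labels
only_peyma = peyma_labels - arman_labels
only_arman = arman_labels - peyma_labels

print("shared:    ", sorted(shared))
print("PEYMA only:", sorted(only_peyma))
print("ARMAN only:", sorted(only_arman))
print("ARMAN table total:", sum(arman_counts.values()))
```

Only Organization, Location, and Person appear in both label sets, so a model trained on one corpus can be evaluated on the other only for those three classes without extra label mapping.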
Download
You can download the dataset from here
Cite
Please cite the following paper in your publication if you are using this dataset in your research:
@inproceedings{poostchi2018bilstm,
title={BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: the First Entity-Annotated Persian Dataset},
author={Hanieh Poostchi and Ehsan Zare Borzeshi and Massimo Piccardi},
year=2018,
booktitle={LREC}
}