Persian Named Entity Recognition

PEYMA
ARMAN

PEYMA

The PEYMA dataset includes 7,145 sentences with a total of 302,530 tokens, of which 41,148 tokens are tagged with seven different classes.

Organization
Money
Location
Date
Time
Person
Percent

Label	Count
Organization	16964
Money	2037
Location	8782
Date	4259
Time	732
Person	7675
Percent	699
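
As a quick consistency check, the per-class counts in the table can be summed and compared with the totals quoted above. The sketch below is only illustrative: the variable names are made up for this example, and the numbers are copied from this README, not computed from the dataset files.

```python
# Per-class counts copied from the PEYMA table in this README.
peyma_counts = {
    "Organization": 16964,
    "Money": 2037,
    "Location": 8782,
    "Date": 4259,
    "Time": 732,
    "Person": 7675,
    "Percent": 699,
}

TOTAL_TOKENS = 302_530   # total tokens reported for PEYMA
TAGGED_TOKENS = 41_148   # tagged tokens reported for PEYMA

tagged_sum = sum(peyma_counts.values())
# The per-class counts should add up to the reported tagged-token total.
assert tagged_sum == TAGGED_TOKENS

tagged_ratio = tagged_sum / TOTAL_TOKENS

# Print classes from most to least frequent, with their share of tagged tokens.
for label, count in sorted(peyma_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<12} {count:>6} {count / tagged_sum:6.1%}")
print(f"Tagged share of all tokens: {tagged_ratio:.1%}")
```

The counts add up to 41,148, which suggests the table reports tagged tokens per class. Roughly 13.6% of all tokens carry an entity label, and Organization is the largest class.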

Download

You can download the dataset from here.
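
Once the files are downloaded, statistics like the ones above can be recomputed with a small counter. This README does not describe the file layout, so the sketch below assumes a CoNLL-style format: one token and its tag per line, separated by whitespace, with a blank line between sentences, `O` for untagged tokens, and an optional `B`/`I` prefix on entity tags. The function name `count_labels`, the tag names, and the sample lines are all hypothetical; adjust them to the actual files.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_labels(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (sentence count, token count, tagged tokens per class).

    Assumes one "token tag" pair per line and blank lines between sentences.
    """
    sentences = 0
    tokens = 0
    per_class: Counter = Counter()
    in_sentence = False

    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True

        tag = line.split()[-1]  # assumed: the tag is the last column
        tokens += 1
        if tag != "O":
            # Drop an assumed B/I prefix such as "B_ORG" or "I-ORG".
            label = tag[2:] if tag[:2] in {"B_", "I_", "B-", "I-"} else tag
            per_class[label] += 1

    return sentences, tokens, per_class


# Hypothetical sample in the assumed format (not taken from the dataset).
sample = [
    "Ali B_PER",
    "visited O",
    "Tehran B_LOC",
    "",
    "Bank B_ORG",
    "Melli I_ORG",
    "opened O",
]

n_sentences, n_tokens, label_counts = count_labels(sample)
print(n_sentences, n_tokens, dict(label_counts))
```

For a real file, pass an open file handle instead of the sample list, for example `count_labels(open(path, encoding="utf-8"))`, where `path` points to wherever the downloaded data lives.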

Cite

Please cite the following paper in your publication if you are using this dataset in your research:

@article{shahshahani2018peyma,
    title={PEYMA: A Tagged Corpus for Persian Named Entities},
    author={Mahsa Sadat Shahshahani and Mahdi Mohseni and Azadeh Shakery and Heshaam Faili},
    year=2018,
    journal={ArXiv},
    volume={abs/1801.09936}
}

ARMAN

The ARMAN dataset holds 7,682 sentences with 250,015 tokens, tagged over six different classes.

Organization
Location
Facility
Event
Product
Person

Label	Count
Organization	30108
Location	12924
Facility	4458
Event	7557
Product	4389
Person	15645
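
Anyone planning to train on both corpora may want to know which classes they share. The following sketch compares the two label inventories listed in this README and computes each ARMAN class's share of the table total. The variable names are illustrative, and the total is derived from the table here, not a figure reported by the dataset authors.

```python
# Per-class counts copied from the ARMAN table in this README.
arman_counts = {
    "Organization": 30108,
    "Location": 12924,
    "Facility": 4458,
    "Event": 7557,
    "Product": 4389,
    "Person": 15645,
}

# Class names copied from the PEYMA section of this README.
peyma_classes = {
    "Organization", "Money", "Location", "Date", "Time", "Person", "Percent",
}

# Sum of the table rows (derived here, not reported in the README).
arman_total = sum(arman_counts.values())
arman_shares = {label: count / arman_total for label, count in arman_counts.items()}

shared = sorted(set(arman_counts) & peyma_classes)       # classes in both datasets
arman_only = sorted(set(arman_counts) - peyma_classes)   # classes unique to ARMAN
peyma_only = sorted(peyma_classes - set(arman_counts))   # classes unique to PEYMA

for label, share in sorted(arman_shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<12} {arman_counts[label]:>6} {share:6.1%}")
print("Shared:", shared)
print("ARMAN only:", arman_only)
print("PEYMA only:", peyma_only)
```

Only Organization, Location, and Person appear in both label sets, so a model trained on the combined data would need either a merged ten-class scheme or a restriction to those three classes.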