Persian Text Classification
Digikala Magazine (DigiMag)
This dataset contains 8,515 articles scraped from Digikala Online Magazine, spread over seven classes:
- Video Games
- Shopping Guide
- Health Beauty
- Science Technology
- General
- Art Cinema
- Books Literature
Label | Articles |
---|---|
Video Games | 1967 |
Shopping Guide | 125 |
Health Beauty | 1610 |
Science Technology | 2772 |
General | 120 |
Art Cinema | 1667 |
Books Literature | 254 |
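The counts above are heavily imbalanced (for example, 120 General articles against 2,772 Science Technology articles), which often matters when training a classifier. As a minimal sketch, the snippet below derives inverse-frequency class weights from the table. The weighting formula is a common heuristic chosen here for illustration; it is not something the dataset authors prescribe, and the names `DIGIMAG_COUNTS` and `inverse_frequency_weights` are hypothetical.

```python
# Article counts per class, copied from the DigiMag table.
DIGIMAG_COUNTS = {
    "Video Games": 1967,
    "Shopping Guide": 125,
    "Health Beauty": 1610,
    "Science Technology": 2772,
    "General": 120,
    "Art Cinema": 1667,
    "Books Literature": 254,
}


def inverse_frequency_weights(counts):
    """Return total / (num_classes * count) for every class.

    This is an assumed, commonly used heuristic: rare classes get a
    weight above 1 and frequent classes a weight below 1.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


weights = inverse_frequency_weights(DIGIMAG_COUNTS)

# Print classes from the lowest weight (most frequent) to the highest (rarest).
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label:20s} {weight:.2f}")
```

Weights like these could be passed to a weighted loss function; whether that helps depends on the model and the evaluation metric.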
Download
You can download the dataset from here.
Cite
Please cite the following paper in your publication if you are using this dataset in your research:
@article{ParsBERT,
title={ParsBERT: Transformer-based Model for Persian Language Understanding},
author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
journal={ArXiv},
year={2020},
volume={abs/2005.12515}
}
Persian News
This dataset contains 16,438 news articles scraped from the websites of several online news agencies, spread over eight classes:
- Social
- Economic
- International
- Political
- Science Technology
- Cultural Art
- Sport
- Medical
Label | Articles |
---|---|
Social | 2170 |
Economic | 1564 |
International | 1975 |
Political | 2269 |
Science Technology | 2436 |
Cultural Art | 2558 |
Sport | 1381 |
Medical | 2085 |
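For reference when evaluating a classifier on this dataset, the sketch below builds a label-to-id mapping and computes the accuracy of a trivial baseline that always predicts the most frequent class. The alphabetical id assignment is an assumption made for illustration; the distributed files may encode labels differently. The names `PERSIAN_NEWS_COUNTS`, `label2id`, and `id2label` are hypothetical.

```python
# Article counts per class, copied from the Persian News table.
PERSIAN_NEWS_COUNTS = {
    "Social": 2170,
    "Economic": 1564,
    "International": 1975,
    "Political": 2269,
    "Science Technology": 2436,
    "Cultural Art": 2558,
    "Sport": 1381,
    "Medical": 2085,
}

# Assumed encoding: ids follow alphabetical label order.
label2id = {label: i for i, label in enumerate(sorted(PERSIAN_NEWS_COUNTS))}
id2label = {i: label for label, i in label2id.items()}

total = sum(PERSIAN_NEWS_COUNTS.values())

# A classifier that always predicts the most frequent class is correct
# on exactly that class's share of the full dataset.
majority_label = max(PERSIAN_NEWS_COUNTS, key=PERSIAN_NEWS_COUNTS.get)
baseline_accuracy = PERSIAN_NEWS_COUNTS[majority_label] / total

print(f"Total articles: {total}")
print(f"Majority class: {majority_label}")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

Because the eight classes are fairly evenly sized, this baseline is low, so plain accuracy is a more informative metric here than on a strongly imbalanced dataset.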
Download
You can download the dataset from here.
Cite
Please cite the following paper in your publication if you are using this dataset in your research:
@article{ParsBERT,
title={ParsBERT: Transformer-based Model for Persian Language Understanding},
author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
journal={ArXiv},
year={2020},
volume={abs/2005.12515}
}