Persian Text Classification
Digikala Magazine (DigiMag)
This dataset contains 8,515 articles scraped from Digikala Online Magazine, divided into seven classes:
- Video Games
- Shopping Guide
- Health Beauty
- Science Technology
- General
- Art Cinema
- Books Literature
| Label | # Articles |
|---|---|
| Video Games | 1967 |
| Shopping Guide | 125 |
| Health Beauty | 1610 |
| Science Technology | 2772 |
| General | 120 |
| Art Cinema | 1667 |
| Books Literature | 254 |
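The counts above are heavily skewed: the two smallest classes together make up under 3% of the articles. The following is a minimal sketch, not part of the original dataset tooling, that turns the table into per-class shares and inverse-frequency class weights using the common `n_samples / (n_classes * count)` heuristic. The names `DIGIMAG_COUNTS`, `class_shares`, and `balanced_class_weights` are illustrative assumptions.

```python
# Article counts per class, copied from the table above.
DIGIMAG_COUNTS = {
    "Video Games": 1967,
    "Shopping Guide": 125,
    "Health Beauty": 1610,
    "Science Technology": 2772,
    "General": 120,
    "Art Cinema": 1667,
    "Books Literature": 254,
}


def class_shares(counts):
    """Return the fraction of all articles that falls in each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


shares = class_shares(DIGIMAG_COUNTS)
weights = balanced_class_weights(DIGIMAG_COUNTS)
for label in DIGIMAG_COUNTS:
    print(f"{label:20s} share={shares[label]:.3f} weight={weights[label]:.2f}")
```

Weights like these could be passed to a weighted loss during training. Whether that helps on this dataset is an open question that the source does not address.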
Download
You can download the dataset from here
Cite
If you use this dataset in your research, please cite the following paper:
@article{ParsBERT,
title={ParsBERT: Transformer-based Model for Persian Language Understanding},
author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
journal={ArXiv},
year={2020},
volume={abs/2005.12515}
}
Persian News
This dataset contains 16,438 news articles scraped from the websites of several online news agencies, divided into eight classes:
- Social
- Economic
- International
- Political
- Science Technology
- Cultural Art
- Sport
- Medical
| Label | # Articles |
|---|---|
| Social | 2170 |
| Economic | 1564 |
| International | 1975 |
| Political | 2269 |
| Science Technology | 2436 |
| Cultural Art | 2558 |
| Sport | 1381 |
| Medical | 2085 |
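No train/test split is described here, so anyone evaluating a classifier on this data has to create one. The sketch below shows one way to do a stratified split that keeps the class proportions from the table in both parts. It assumes only that the labels are available as a list of strings, because the file format of the download is not specified. `NEWS_COUNTS` and `stratified_split` are hypothetical names, and the label list is synthesized from the table purely for demonstration.

```python
import random
from collections import defaultdict

# Article counts per class, copied from the table above.
NEWS_COUNTS = {
    "Social": 2170,
    "Economic": 1564,
    "International": 1975,
    "Political": 2269,
    "Science Technology": 2436,
    "Cultural Art": 2558,
    "Sport": 1381,
    "Medical": 2085,
}


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test so each class keeps its proportion."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, test = [], []
    for label in sorted(by_label):  # sorted for reproducibility
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Stand-in label list built from the counts; real labels would come
# from the downloaded files.
labels = [label for label, n in NEWS_COUNTS.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the split reproducible, which matters when results are compared across models.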
Download
You can download the dataset from here
Cite
If you use this dataset in your research, please cite the following paper:
@article{ParsBERT,
title={ParsBERT: Transformer-based Model for Persian Language Understanding},
author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
journal={ArXiv},
year={2020},
volume={abs/2005.12515}
}