Persian Text Classification

Table of contents

  1. Digikala Magazine (DigiMag)
  2. Persian News

Digikala Magazine (DigiMag)

This dataset contains 8,515 articles scraped from Digikala Online Magazine, spread over seven classes.

  1. Video Games
  2. Shopping Guide
  3. Health Beauty
  4. Science Technology
  5. General
  6. Art Cinema
  7. Books Literature

Label                 Articles
Video Games           1967
Shopping Guide        125
Health Beauty         1610
Science Technology    2772
General               120
Art Cinema            1667
Books Literature      254
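
The label counts above are strongly imbalanced: Science Technology has 2,772 articles while General has only 120. One common way to compensate when training a classifier is inverse-frequency class weighting. The sketch below is only an illustration and is not part of the dataset or the ParsBERT paper; the helper name `class_weights` is hypothetical, and the "balanced" formula `total / (n_classes * count)` is an assumed choice, not something the authors prescribe.

```python
# Article counts per label, copied from the DigiMag table above.
DIGIMAG_COUNTS = {
    "Video Games": 1967,
    "Shopping Guide": 125,
    "Health Beauty": 1610,
    "Science Technology": 2772,
    "General": 120,
    "Art Cinema": 1667,
    "Books Literature": 254,
}


def class_weights(counts):
    """Balanced inverse-frequency weights: total / (n_classes * count).

    Hypothetical helper for illustration; rare labels get larger weights.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    weights = class_weights(DIGIMAG_COUNTS)
    # Print labels from the most common (smallest weight) to the rarest.
    for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
        print(f"{label:<20} {weight:.2f}")
```

Weights of this kind can usually be passed to a weighted loss function; whether that helps on this dataset would need to be checked experimentally.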

Download

You can download the dataset from here

Cite

Please cite the following paper in your publication if you are using this dataset in your research:

@article{ParsBERT,
    title={ParsBERT: Transformer-based Model for Persian Language Understanding},
    author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
    journal={ArXiv},
    year={2020},
    volume={abs/2005.12515}
}

Persian News

This dataset contains news articles scraped from the websites of several online news agencies. It comprises 16,438 articles spread over eight classes.

  1. Social
  2. Economic
  3. International
  4. Political
  5. Science Technology
  6. Cultural Art
  7. Sport
  8. Medical

Label                 Articles
Social                2170
Economic              1564
International         1975
Political             2269
Science Technology    2436
Cultural Art          2558
Sport                 1381
Medical               2085
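
The label counts above also give a quick sanity check for any classifier trained on this dataset: a model that always predicts the most frequent label already reaches that label's share of the data (about 15.6% here, for Cultural Art). The sketch below computes per-label shares and this majority-class baseline from the counts alone. The function names `label_shares` and `majority_baseline` are hypothetical, and the baseline assumes evaluation on data with the same label distribution as the full dataset.

```python
# Article counts per label, copied from the Persian News table above.
PERSIAN_NEWS_COUNTS = {
    "Social": 2170,
    "Economic": 1564,
    "International": 1975,
    "Political": 2269,
    "Science Technology": 2436,
    "Cultural Art": 2558,
    "Sport": 1381,
    "Medical": 2085,
}


def label_shares(counts):
    """Return each label's fraction of the total number of articles."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline(counts):
    """Return the most frequent label and the accuracy of always predicting it."""
    label = max(counts, key=counts.get)
    return label, counts[label] / sum(counts.values())


if __name__ == "__main__":
    for label, share in label_shares(PERSIAN_NEWS_COUNTS).items():
        print(f"{label:<20} {share:.1%}")
    label, accuracy = majority_baseline(PERSIAN_NEWS_COUNTS)
    print(f"Majority baseline: always predict {label!r} ({accuracy:.1%})")
```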

Download

You can download the dataset from here

Cite

Please cite the following paper in your publication if you are using this dataset in your research:

@article{ParsBERT,
    title={ParsBERT: Transformer-based Model for Persian Language Understanding},
    author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
    journal={ArXiv},
    year={2020},
    volume={abs/2005.12515}
}