

The datasets are organized by type and subtype. They can be searched, deposited and accessed through interfaces such as Open API, and are hosted on open data portals. Datasets from various governmental bodies are presented in List of open government data sites.

These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less intuitively, the availability of high-quality training datasets. High-quality labeled training datasets for supervised and semi-supervised machine learning algorithms are usually difficult and expensive to produce because of the large amount of time needed to label the data. Although they do not need to be labeled, high-quality datasets for unsupervised learning can also be difficult and costly to produce. Many organizations, including governments, publish and share their datasets. The datasets are classified by license as open data or non-open data.
