| Data source | Description | Data access | Further information |
|---|---|---|---|
| Registry of Open Data on Amazon Web Services | Usage examples for all datasets listed in the Registry of Open Data on AWS | Further Information | |
| arXiv Bulk Data Access | Open access to 1,467,637 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, Statistics, Electrical Engineering and Systems Science, and Economics | Further Information | |
| Australian Bureau of Statistics | Population data from the Australian Bureau of Statistics (ABS) | API | ABS.Stat web services user guide |
| Australian Data Archive | National repository for digital research data | Direct download and API | |
| Australian Government's open data | Find, explore and reuse Australia's public data | Further Information | |
| Australian Institute of Health and Welfare | Data on health and welfare issues in Australia | Further information | |
| BioMed Central | Scholarly articles in STM (Science, Technology, and Medicine) fields from peer-reviewed open access journals | Direct FTP download, web harvest, or API | Using BioMed Central’s open access full-text corpus for text mining research |
| BOM data feeds | Weather data from the Bureau of Meteorology | Direct FTP download | BOM weather data services |
| Canada Open Data | Search open data that is relevant to Canadians, learn how to work with datasets, and see what people have done with open data across the country | Open Government Portal | |
| CIA factbook | The World Factbook, the most popular publication of the United States Central Intelligence Agency | Downloading Instructions | |
| Crossref Text and Data Mining Services | Crossref metadata helps researchers access published content for text and data mining and enables publishers to provide it | Further Information | |
| Data.gov.au | Public datasets from Australian government agencies | API | CKAN API guide |
| DataCite | DataCite gathers metadata for each DOI assigned to an object. The metadata is used for a large index of research data that can be queried directly to find data, obtain stats and explore connections. All the metadata is free to access and review. | https://search.datacite.org/ | |
| DBpedia | DBpedia is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. | Further Information | |
| Facebook Graph API | Social media data from an online social networking service | API with access token | Using the graph API |
| Figshare | Open access scientific data | | Monash figshare |
| Google Public Data Explorer | Public Data | | Google Dataset Search |
| HathiTrust Datasets | Public domain works from full text digital library | Online computational environment, direct download, or API | |
| IMF eLibrary data | International Monetary Fund eLibrary data | API | Data services |
| Pew Research Center | Pew Research Center regularly makes available the full datasets that underlie most of its reports. | How to access Pew Research Center survey data | |
| PDB Protein Data Bank | Since 1971, the Protein Data Bank archive (PDB) has served as the single repository of information about the 3D structures of proteins, nucleic acids, and complex assemblies | Archive download instructions | |
| PLOS | Open access content and metrics from PLOS journals | API | Text and Data Mining at PLOS |
| PubMed Central Open Access Subset | Open access subset of full text archive of biomedical and life sciences journal literature at the US National Institutes of Health’s National Library of Medicine | | |
| re3data | A global registry of research data repositories | FAQ | |
| Research Data Australia | Data discovery service of the Australian Research Data Commons (ARDC) | | |
| Trove | Digital content from Australian libraries, hosted by the National Library of Australia | API with key | Building with Trove |
| Twitter | Social media data from an online social networking service | API with key and access token or Intersect’s Twitter Scraper | API overview |
| US Government’s Open data | Includes data from the Library of Congress, Census Bureau, etc., as well as tools and resources to conduct research, develop web and mobile applications, design data visualizations, and more. | API and various | https://www.data.gov/ |
| US Health open data | Includes National Institutes of Health, National Library of Medicine, and data on a range of topics, like environmental health, medical devices, Medicare & Medicaid, social services, community health, mental health, and substance abuse. | https://healthdata.gov/ | |
| Wikidata | Structured data from Wikipedia and other open knowledge bases | Direct download or API | Wikidata: data access |
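
Several sources in the table are reached through an API. As a rough illustration, the sketch below shows how a search against the CKAN API listed for Data.gov.au might look. The base URL, the helper names (`package_search_url`, `dataset_titles`, `search`) and the default of five rows are assumptions made for this example, so check them against the CKAN API guide before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the data.gov.au CKAN instance (not stated in the table);
# confirm it against the CKAN API guide.
CKAN_BASE = "https://data.gov.au/data/api/3/action"


def package_search_url(query, rows=5, base=CKAN_BASE):
    """Build a CKAN package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base}/package_search?{params}"


def dataset_titles(response):
    """Extract dataset titles from a decoded package_search JSON response."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [package["title"] for package in response["result"]["results"]]


def search(query, rows=5):
    """Hypothetical convenience wrapper: fetch results and return their titles."""
    with urlopen(package_search_url(query, rows)) as resp:
        return dataset_titles(json.load(resp))
```

Calling `search("population")` would issue a live request, so the URL builder and the parser are kept separate and can be checked without network access.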