Before diving into data analysis, you need to have data to analyze. Many people feel like this is a Catch-22 — you can't become a data analyst unless you have experience working with data, but you can't get experience working with data unless you are an analyst with access to company data.
Luckily, there are tons of online resources that you can use to find and download datasets for analysis. Throughout this blog post, we will touch on 11 websites that you can use to find datasets. Let's get right into it!
Kaggle is an online community for data scientists. Using Kaggle, you can find tons of datasets, and even publish your own. Aside from hosting datasets, Kaggle lets you enter competitions to practice your data science skills and even earn some money if you win. Kaggle is a great platform to maximize learning, as you can collaborate with other users and view past projects and code to learn best practices.
With over 300,000 datasets available, Data.gov can be used to explore datasets covering a wide variety of topics. You don't even need to create an account to download a dataset. Datasets are well documented and relatively easy to search for as well.
If you have a specific topic in mind, Google's Dataset Search can be extremely useful. Dataset Search is like Google's normal search engine, except it is limited to data only. If you know what you are looking for, this is a great option as you will get aggregated results from across the web. Each search result provides a summary of what the data is, where it's from, and when it was last updated.
FiveThirtyEight is a politics and sports news website that provides interesting data analysis and access to their underlying datasets. Although they have data on a wide variety of topics, if you are interested in analyzing sports or polls, you'll surely find some great datasets here.
The UCI Machine Learning Repository hosts over 500 datasets for the ML community. These datasets are quite clean and categorized by data types, tasks, and attribute types. If you are looking for datasets for ML projects, this is a great place to start. You don't need to sign up or create an account, and all datasets are completely free.
Google BigQuery has many datasets that have been made available to the public. Google pays for the storage of these datasets, but you will be charged for the queries (though the first 1 TB per month is free). One of the most popular datasets available is the COVID-19 Public Dataset, something with which many of us have interacted. For this dataset, all queries are free.
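To make the pricing rule above concrete, here is a minimal sketch of how you might estimate a month's query bill. The function name and the `price_per_tb` parameter are hypothetical, and no real price is assumed here; check Google's current pricing page for the actual rate and for how it measures a terabyte.

```python
def estimate_query_cost(scanned_tb, price_per_tb, free_tb=1.0):
    """Rough monthly estimate for on-demand queries.

    scanned_tb:   TB scanned by each billable query this month
                  (leave out queries against datasets that are free to query).
    price_per_tb: hypothetical price per TB; look up the current rate.
    free_tb:      monthly free allowance (1 TB, as described above).
    """
    total = sum(scanned_tb)
    # Only usage beyond the free allowance is charged.
    billable = max(0.0, total - free_tb)
    return billable * price_per_tb


# Two queries scanning 0.5 TB and 1.0 TB, at a made-up price of 4.0 per TB.
print(estimate_query_cost([0.5, 1.0], price_per_tb=4.0))
```

If the total stays under the free allowance, the estimate is simply zero.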
BuzzFeed News publishes data, analyses, libraries, tools, and guides to its public GitHub repo. This is a great resource to see how a newsroom performs its data analysis.
NASA aggregates tons of earth data from different sources and makes it available to the public. If you are looking for datasets related to climate, vegetation mapping, or pretty much anything related to our planet, this is a great place to look. If you are looking for some out-of-this-world (i.e. space) data, check out NASA's Planetary Data System.
DataHub has data on many topics, but they are focused mostly on economic and financial data. If you are looking to analyze things like inflation or wealth inequality, DataHub is a great place to start. Data is regularly updated so that you can perform relevant analyses.
Quandl is a repository for financial data. As is sometimes the case with financial data, not all datasets on Quandl are free; some need to be purchased. When searching for datasets, you can apply lots of filters to find exactly what you are looking for.
The Awesome Public Datasets repo on GitHub links to hundreds of datasets across a wide variety of topics. Datasets are sorted by category so that you can easily find what you're looking for.
Although we only listed 11 resources, you'll be able to find lots more if you keep digging. These are just some of our favorites. With these resources, you should be able to find plenty of data to kickstart projects and derive some great insights.
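Once you have downloaded a dataset, a quick first look helps you decide whether it is worth a deeper analysis. The sketch below assumes the data is a CSV file and uses only Python's standard library. The `summarize_csv` helper and the tiny sample are made up for illustration and are not taken from any of the sites above.

```python
import csv
import io


def summarize_csv(text):
    """Return column names, row count, and empty-cell counts for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    row_count = 0
    missing = {name: 0 for name in columns}
    for row in reader:
        row_count += 1
        for name in columns:
            value = row.get(name)
            # Treat absent or blank cells as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"columns": columns, "rows": row_count, "missing": missing}


# Made-up sample; with a real download, read the file's text instead.
sample = "city,year,population\nSpringfield,2020,1000\nShelbyville,2021,\n"
summary = summarize_csv(sample)
print(summary)
```

For a real file, you could pass in the result of `open(path, newline="").read()`, where `path` points to your downloaded CSV.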