How to find datasets for R programming?
Working with small amounts of data is a great way to get started and understand the basics of R programming. But the real strength of R is the processing of large datasets, and as beginners we rarely get to work with data that is this large and varied.
To keep learning, we've selected 3 websites where you can find large, high-quality datasets to experiment with in R.
Kaggle, the reference for data science
The website offers thousands of datasets and notebooks, available for free for your research. These datasets can be updated, and the community is very active in discussing how to process them. Many of the datasets are about science, but you can also find data about industry, sport or the economy.
What makes Kaggle special? It sets challenges for its members, each built around a data analytics problem: create a machine learning model to predict the survivors of the Titanic, predict the selling price of a property or identify handwritten digits.
There are many problems to tackle, and it's a great resource for keeping your knowledge up to date.
Go to the website: https://www.kaggle.com/
Data.gov: American open data
Open databases are, of course, very useful for data science. With data.gov, you have access to a huge catalog of thousands of datasets on various topics: agriculture, education, energy, health…
Disadvantage: the data formats are sometimes inconsistent, which can slow down your data processing. But this project is a really interesting way to discover a lot of data from the American administration.
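When a file's format is inconsistent, a quick inspection right after loading can save time later. The snippet below is only a minimal sketch: the file content is a made-up sample written to a temporary file (it is not real data.gov data), and the column names and missing-value markers are assumptions chosen for illustration.

```r
# Made-up sample standing in for a downloaded CSV file (not real data)
raw_path <- tempfile(fileext = ".csv")
writeLines(c(
  "state,year,value",
  "AA,2020,10.5",
  "BB,2020,N/A",
  "CC,2021,"
), raw_path)

# Treat several markers as missing values so the column stays numeric
records <- read.csv(
  raw_path,
  na.strings = c("", "NA", "N/A"),
  stringsAsFactors = FALSE
)

# Check the column types and count the missing values per column
str(records)
colSums(is.na(records))
```

If a column that should be numeric shows up as character in the output of str(), it usually means an unexpected marker is still hiding in the file.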
Go to the website: https://www.data.gov
Data.gouv.fr: open data in France
The website data.gouv.fr provides thousands of datasets from the French administration, but also from citizens and associations. The topics vary: demography, communication, sport, real estate…
The file formats are generally easy to process: the data are often in formats like JSON, CSV or XLS.
Advantage: the website showcases reuses of these datasets, such as applications or case studies. It's a good way to find concrete uses of open data projects.
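As an illustration, here is a minimal sketch of loading a CSV file in base R. It assumes the file uses a semicolon as separator and a comma as decimal mark, which is common for French CSV exports but should be checked for each file. The sample content and column names are made up for the example and do not come from data.gouv.fr.

```r
# Made-up sample standing in for a downloaded CSV file (not real data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "commune;population;superficie_km2",
  "Ville A;1000;12,5",
  "Ville B;2500;30,25"
), csv_path)

# read.csv2() expects ';' as separator and ',' as decimal mark
communes <- read.csv2(csv_path, stringsAsFactors = FALSE)

# Inspect the structure, then compute a simple statistic
str(communes)
mean(communes$population)
```

For a file that uses commas as separators and points as decimal marks, read.csv() is the equivalent call.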
Go to the website: https://www.data.gouv.fr/fr/
Have fun with these datasets and the power of data processing in R!