Machine Learning Practitioner

We are on your path to a practice in Machine Learning and Artificial Intelligence.
Consider MLP your resource and information site for news, tools, and techniques to maintain your skillset.

Data

We see data. There are many different ways to visualize or crunch data.

Share some datasets, munging techniques, scraping, structuring, storing and retrieving

Data is the fuel for our practice. Share yours and we will share ours, naturally. Some of these you probably already have or have used. As we acquire or become familiar with more, we will post them here. You will also see a visual pertaining to the dataset that is consistent with its structure and typical use cases.

Most Up-voted Kaggle Datasets

Kagglers are very active in data science.  Here are the top-10 datasets as voted by the community. We’ll try to keep this updated on a regular basis.  If you do something cool with one of these, let us know.

Date range for ratings: 10/1/19 – 10/12/19

 

#1 | Credit Card Fraud Dataset

Banks and credit card companies need a smart way to distinguish authorized transactions from fraudulent ones.

The dataset contains anonymized credit card transactions made by European cardholders in 2013, gathered for research on data mining and fraud detection.

Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) collaborated to collect and analyze the data.

Download Credit Card Dataset

#4 | European Soccer Database

This is a monster of a dataset. Acquiring it required a sophisticated crawler to collect more than 25,000 matches and 10,000 players across 11 European countries.

It spans the 2008 to 2016 seasons, with player attributes sourced from the EA Sports FIFA video games.

It also includes betting odds from 10 providers.

Download the European Soccer Database

#7 | FIFA 19 complete player dataset

This is similar to the 2008-2016 FIFA player data, except that it covers only 2019 stats.

Download the FIFA 19 dataset

#10 | Suicide Rates 1985 to 2016

Suicide prevention has been an art more than a science up to now. Finding the signals that relate to increased suicide rates has been statistically challenging. This data compares socio-economic info with suicide rates by year and country.

The dataset spans 1985 to 2016 and has 234 Kaggle kernels dedicated to it.

Download the Suicide Rates Dataset

#2 | Heart Disease UCI

This dataset from Cleveland uses 14 of the original 76 attributes and has become the primary machine learning database for heart disease. The target is an integer grade for the presence of heart disease, from 0 (not present) to 4 (definitely present).
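Many analyses reduce the graded target to a yes/no label before modeling. Here is a minimal sketch of that step, assuming the target arrives as integers from 0 to 4 as described above. The function name binarize_target is our own, not part of the dataset or any library.

```python
from typing import Iterable, List


def binarize_target(grades: Iterable[int]) -> List[int]:
    """Collapse grades 0-4 into 0 (not present) or 1 (present).

    Assumption: any grade from 1 to 4 counts as "disease present".
    """
    labels = []
    for grade in grades:
        if not 0 <= grade <= 4:
            raise ValueError(f"grade out of range 0-4: {grade}")
        labels.append(0 if grade == 0 else 1)
    return labels


if __name__ == "__main__":
    print(binarize_target([0, 2, 4, 0, 1]))  # prints [0, 1, 1, 0, 1]
```

Treating every nonzero grade as "present" loses the severity information, so keep the original column around if you want to model the grade itself later.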

Download Heart Disease Dataset

#5 | Wine Reviews

The wine dataset strives to identify wines as a sommelier would. The data comes from winemag.com reviews.

The objective of the acquirer was to create a model that can identify the variety, winery, and location of a wine based on a description.

For text-based prediction, the dataset offers a rich corpus for modeling wine identification the way a taster would, only without actually tasting the wine.
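A typical first step in text-based prediction is turning each description into word counts (a bag of words). The sketch below uses only the Python standard library. The function name bag_of_words and the sample sentence are ours, made up for illustration and not taken from the dataset.

```python
import re
from collections import Counter


def bag_of_words(description: str) -> Counter:
    """Lowercase the text, split it into words, and count each word."""
    # Keep runs of letters and apostrophes; drop punctuation and digits.
    tokens = re.findall(r"[a-z']+", description.lower())
    return Counter(tokens)


if __name__ == "__main__":
    # Hypothetical review text, for illustration only.
    counts = bag_of_words("Ripe cherry, ripe plum.")
    print(counts["ripe"])  # prints 2
```

Word counts like these can feed a simple classifier for variety or region. A real project would likely also remove very common words and weight the rare ones more heavily.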

Download the wine reviews dataset

#8 | Global Terrorism Database

The Global Terrorism Database records terrorist attacks from 1970 to the present. This dataset includes information through 2017.

Maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland, the database contains 180,000 attacks with details on location, tactics, perpetrators, targets, and outcomes.

Download the Global Terrorism Database

#3 | Google Play Store Apps

Google Play Store pages load dynamically with jQuery, which makes them hard to scrape and this dataset hard to come by. Each app occupies one row containing category, rating, size, version, installs, and the like.

This dataset allows the analyst to extract market insights for product developers targeting the Android market.
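Scraped columns often arrive as display text rather than numbers. As a sketch, assume the installs column holds strings such as "10,000+". That format is our assumption about the file and is not stated above, and the helper name parse_installs is hypothetical.

```python
from typing import Optional


def parse_installs(raw: str) -> Optional[int]:
    """Convert text like "10,000+" to the integer 10000.

    Returns None when the text is not a plain count, so bad rows can be
    filtered out instead of crashing the analysis.
    """
    cleaned = raw.strip().rstrip("+").replace(",", "")
    return int(cleaned) if cleaned.isdigit() else None


if __name__ == "__main__":
    print(parse_installs("10,000+"))  # prints 10000
    print(parse_installs("unknown"))  # prints None
```

Note that a trailing "+" means "at least this many", so the parsed value is a lower bound and not an exact count.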

Download Google Play Store dataset

#6 | TMDB 5000 Movie Dataset

Having been asked to remove the IMDb dataset that was previously posted on Kaggle, the acquirer turned to TMDb, which does have an open API.

From this dataset, one could model the success or rating of a film based on information about the crew, cast, budget, revenue and popularity. There are 20 features to make predictions from.
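"Success" needs a concrete definition before it can be modeled. One possible choice, shown here only as a sketch, is return on investment computed from the budget and revenue fields. The function name and the decision to skip films with no recorded budget are our assumptions.

```python
from typing import Optional


def return_on_investment(budget: float, revenue: float) -> Optional[float]:
    """Return (revenue - budget) / budget, or None if budget is not positive.

    Assumption: a zero or negative budget means the value is missing.
    """
    if budget <= 0:
        return None
    return (revenue - budget) / budget


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(return_on_investment(100.0, 250.0))  # prints 1.5
```

A value of 1.5 means the film earned back its budget plus 150 percent. Rating and popularity are other reasonable targets in the same dataset.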

Download the movie dataset

#9 | Trending YouTube Statistics

The YouTube Trending Video Statistics represents a slice of pop-culture writ live online.

Using the API, Mitchell J captured the number of views, shares, comments, and likes daily in order to determine how videos are trending.

One can also examine or use the code that was written to gather the data.

There are 487 kernels using the dataset.

Download the YouTube Trending Video Stats dataset

Data Bank

A collection of datasets, category, description and links to save some time searching. These are datasets that have been used by beginners and pro practitioners.

We’ll also share machine-learning models that are reloadable into your projects, notebooks and scripts. 

If you know of an interesting dataset that you or an associate has worked with and it is open-sourced, please let us know and we will add it to the collection.

Well-worn datasets used for learning

 

Name | Link | Description | Size | Mode
Titanic | Titanic Dataset | Titanic: Machine Learning from Disaster. Predicting survival rates on the Titanic liner sinking. | 891 Observations, 12 Features | Probability
Iris | Iris Dataset | 3 different species of the Iris flower. Predict species from the size of different parts of the flower. | 150 Observations, 4 Features | Numerical, with categorical classification target
Breast Cancer Wisconsin (Diagnostic) | Breast Cancer Dataset | Biopsy data used to characterize cellular state | 569 Observations, 32 Features | Classification
NYC AirBnB | NYC AirBnB Dataset | Listing activity and metrics in NYC for 2019. | 49,000 Observations, 16 Features | Numerical and Categorical Data; Price, Geo
Github Awesome Datasets Repo | Awesome Datasets on Github | Topic-centric, high-quality public data sources | Hundreds of datasets | All kinds
Fer2013 | Facial Emotion Recognition Challenge Dataset | 30,000+ emotion-labeled face images. Part of the Kaggle challenge in Representation Learning: Facial Expression Recognition Challenge | 92 MB | Image arrays
Trump Photos | Trump Photos | A set of jpg images captured of Trump's face | All different sizes | Deep Learning Emotion Detection
16 Trump Photos for Emotion Project | trump_16_faces2.zip | 16 cropped, color Trump facial expressions for the emotion-detection project | All different sizes | Deep Learning projects
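
The observation and feature counts listed above are easy to check once a file is downloaded. Here is a minimal sketch using the Python standard library, assuming the data is a CSV with one header row and one row per observation. The function name dataset_shape and the sample text are ours.

```python
import csv
import io
from typing import Tuple


def dataset_shape(csv_text: str) -> Tuple[int, int]:
    """Return (observations, columns) for CSV text with a header row."""
    # Skip blank lines so a trailing newline does not count as a row.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return (0, 0)
    return (len(rows) - 1, len(rows[0]))


if __name__ == "__main__":
    # Hypothetical three-column sample, for illustration only.
    sample = "length,width,species\n5.1,3.5,a\n4.9,3.0,b\n"
    print(dataset_shape(sample))  # prints (2, 3)
```

The column count includes the target column and any ID column, so it may differ by one or two from the feature count quoted for a dataset.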

Machine Learning Models To Share

 

Name | Link | Description | Category
Emotion Detection CNN | Facial Emotion Detection CNN Model | Priya Dwivedi's Face_and_emotion_detection model built as a deep-learning CNN | Facial rec, emotion rec, deep learning