BigML Education - Datasets

Datasets are the fundamental building block for your BigML workflows. Learn how to filter, sample, add new fields, or split a dataset into training and test datasets.

Published in: Data & Analytics

  1. BigML Education: Datasets (June 2017)
  2. BigML Education Program: Datasets. In This Video: Introduction • Typical workflow: 1-click creation • Purpose of datasets in BigML • Exploration • Pre-flight check • Basic Features • Other ways to create datasets • Train/Test split • More Exploration • Advanced Features • Filtering • Feature engineering with Flatline
  3. Sources Introduction
  4. What is a Dataset? • Datasets are the fundamental building blocks • Models, Clusters, etc. derive from datasets • Sources can only become datasets • Data exploration / Pre-flight check • Missing/Errors • Summary statistics • Non-preferred fields • Default objective for 1-click actions
  5. Datasets Basic Features
  6. Dataset Features • Immutable - “dataset/5943226f01440401bf0003bd” • Creating Datasets • From a source • From a dataset: sampling, training/test • From a batch output • Dynamic scatterplot
  7. Datasets Advanced Features
  8. Advanced Configuration • Dataset Filtering • Feature Engineering
  9. [Diagram: Loan Status] Values: Charged Off, Current, Default, Fully Paid, In Grace, Late (16-30), Late (31-120) • Filter: Open (Current, In Grace, Late (16-30), Late (31-120)) vs. Closed (Charged Off, Default, Fully Paid) • Engineer: new Quality field with values Good and Bad
  10. Summary • Dataset Purpose • Fundamental building block • Pre-flight check: counts, histograms, scatterplot • Creating dataset • From source: 1-click and sampling • Training / Test split • From batch output • From dataset: sampling, filtering, new features
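
The filter, feature-engineering and training/test steps named in the slides can be illustrated with a minimal sketch in plain Python. It does not use BigML's API or Flatline. The rows are made up, the field names `loan_status` and `quality` are hypothetical, and the mapping of Fully Paid to "good" and of Charged Off and Default to "bad" is an assumption read from the Loan Status diagram, not something the slides state outright. The 80/20 split ratio and the seed are also illustrative choices.

```python
import random

# Status values taken from the Loan Status slide; the grouping into
# "open" loans is our reading of the diagram.
OPEN_STATUSES = {"Current", "In Grace", "Late (16-30)", "Late (31-120)"}


def closed_loans(rows):
    """Filter: keep only loans that have reached a final (closed) status."""
    return [r for r in rows if r["loan_status"] not in OPEN_STATUSES]


def add_quality(rows):
    """Engineer: add a 'quality' field (assumed: Fully Paid is good, else bad)."""
    return [
        {**r, "quality": "good" if r["loan_status"] == "Fully Paid" else "bad"}
        for r in rows
    ]


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Split: shuffle a copy with a fixed seed, then cut at train_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up data: 70 rows cycling through the seven status values.
statuses = ["Charged Off", "Current", "Default", "Fully Paid",
            "In Grace", "Late (16-30)", "Late (31-120)"]
rows = [{"id": i, "loan_status": statuses[i % len(statuses)]} for i in range(70)]

labelled = add_quality(closed_loans(rows))
train, test = train_test_split(labelled)
print(len(labelled), len(train), len(test))  # 30 24 6
```

Each helper returns a new list rather than modifying its input, which loosely mirrors the slides' point that datasets are immutable and that new datasets are derived from existing ones.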
