BigML.io is a RESTful API for creating and managing BigML resources programmatically. These slides explain how to create, retrieve, update, and delete BigML Sources, Datasets, Models, and Predictions.
1. BigML.io: The BigML API
October 12, 2012
BigML Inc BigML.io: The BigML API October 12, 2012 1 / 66
2. 1 Introduction
2 BigML Resources
3 Sources
4 Datasets
5 Models
6 Predictions
7 Evaluations
8 Bindings
9 Final Remarks
3. BigML.io: Base URL
Base URL
https://bigml.io
A RESTful API for creating and managing BigML resources
programmatically.
All accesses are performed over HTTPS.
4. BigML.io: Development Mode
Dev Mode
https://bigml.io/dev/
No credits are charged.
Each resource is limited to 1 MB, but the number of resources is unlimited.
5. BigML.io: Version
Version
https://bigml.io/andromeda/
The first version of BigML.io is named andromeda.
If you omit the version name in your API requests, you will get access to
the latest API version.
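As a minimal sketch of how the version name fits into a request URL, the helper below assembles a resource URL with or without the version segment. The function name `resource_url` is hypothetical (it is not part of BigML.io or its bindings), and the code only builds strings without contacting the API.

```python
BASE_URL = "https://bigml.io"


def resource_url(resource, version=None):
    """Return the URL for a resource type, optionally pinned to a version name."""
    parts = [BASE_URL]
    if version:
        # Pinning the version keeps requests on a known API release.
        parts.append(version)
    parts.append(resource)
    return "/".join(parts)


print(resource_url("source"))                       # latest API version
print(resource_url("source", version="andromeda"))  # pinned to andromeda
```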
6. BigML.io: Authentication
Authentication
BIGML_USERNAME=alfred
BIGML_API_KEY=62270d2ad14eba4e349432e80d749342de5550a4
BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY"
All accesses to BigML.io need to be authenticated.
Authentication is performed by including your username and your BigML API
key in every request.
If you use an environment variable (e.g. BIGML_AUTH) you can keep your
credentials out of your source code.
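The same idea can be sketched in Python: read the credentials from the environment and build the authentication string used in the query part of each request. The helper name `auth_query` is hypothetical, and the example passes an explicit mapping so that it runs without touching the real environment.

```python
import os


def auth_query(environ=os.environ):
    """Build the 'username=...;api_key=...' string from environment variables."""
    try:
        username = environ["BIGML_USERNAME"]
        api_key = environ["BIGML_API_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None
    return f"username={username};api_key={api_key}"


# Example with an explicit mapping (values copied from the slide).
example_env = {
    "BIGML_USERNAME": "alfred",
    "BIGML_API_KEY": "62270d2ad14eba4e349432e80d749342de5550a4",
}
print(auth_query(example_env))
```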
7. BigML Resources
Source      A source is a file containing the raw data that you want to
            use to create a predictive model.
Dataset     A dataset is a structured version of a data source where
            each column has been assigned a type.
Model       A model is created using a dataset as input, selecting which
            fields to use as input and which field will be the objective.
Prediction  A prediction is created using a model and the new instance
            that you want to classify as input.
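The four resources form a chain: each one is created from the identifier of the previous one. A rough sketch of the JSON bodies involved, using the argument names and example identifiers that appear on later slides, is shown below. The helper functions are hypothetical and nothing is sent to the API.

```python
import json


def dataset_payload(source_id):
    """Body for creating a dataset from a source."""
    return {"source": source_id}


def model_payload(dataset_id):
    """Body for creating a model from a dataset."""
    return {"dataset": dataset_id}


def prediction_payload(model_id, input_data):
    """Body for creating a prediction from a model and a new instance."""
    return {"model": model_id, "input_data": input_data}


# Identifiers below are the example ids used elsewhere in these slides.
steps = [
    ("dataset", dataset_payload("source/4f665b8103ce8920bb000006")),
    ("model", model_payload("dataset/4f66a80803ce8940c5000006")),
    ("prediction", prediction_payload("model/4f67c0ee03ce89c74a000006", {"000001": 3})),
]
for resource, payload in steps:
    print(resource, json.dumps(payload))
```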
8. BigML.io: Source
Create a New Source
sepal length,sepal width,petal length,petal width,species
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
5.8,2.7,5.1,1.9,Iris-virginica
A source is the raw data that you want to use to create a predictive
model.
A source is usually a (big) file in tabular format.
Each column in the file represents a feature or field.
By default, the last column represents the class or objective field.
The file may have a first row or header with a name for each field.
9. BigML.io: Source
Source Base URL
https://bigml.io/source
Sources can be created from several kinds of data:
Local files
Remote data accessed via HTTP or HTTPS
Files in S3 buckets
Blobs in Windows Azure storage
Inline data contained in the source creation request
Data must be in tabular format, cannot be bigger than 64 GB, and
can be compressed (.Z or .gz, but not .zip).
10. BigML.io: Creating a Source using a local file
Creating a Source
curl "https://bigml.io/source?$BIGML_AUTH" -F file=@iris.csv
The file must be attached to the POST request as a file upload.
The Content-Type in your HTTP request must be
multipart/form-data, as specified by RFC 2388.
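To make the multipart/form-data requirement concrete, here is a minimal sketch that encodes a single file the way `curl -F file=@iris.csv` would. The helper `multipart_file_body` is hypothetical and uses only the standard library; a real client would more likely rely on an HTTP library to do this encoding.

```python
import uuid


def multipart_file_body(field, filename, content):
    """Encode one file as a multipart/form-data body.

    Returns a (content_type_header, body_bytes) pair.
    """
    boundary = uuid.uuid4().hex  # random separator between parts
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", head + content + tail


csv_bytes = (
    b"sepal length,sepal width,petal length,petal width,species\n"
    b"5.1,3.5,1.4,0.2,Iris-setosa\n"
)
content_type, body = multipart_file_body("file", "iris.csv", csv_bytes)
print(content_type)
print(len(body), "bytes")
```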
11. BigML.io: Creating a Source using a remote URL
Creating a Remote Source
curl "https://bigml.io/source?$BIGML_AUTH" \
     -X POST \
     -H "content-type: application/json" \
     -d '{"remote": "https://static.bigml.com/csv/iris.csv"}'
The Content-Type in your HTTP request must be application/json.
URLs can be HTTP or HTTPS with realm authentication, public or
private Amazon S3, or Windows Azure files.
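The request above can be sketched with Python's standard library as follows. The helper `json_request` is hypothetical, and `YOUR_API_KEY` is a placeholder. The request object is only built, not sent, so the example has no side effects.

```python
import json
import urllib.request


def json_request(url, payload, method="POST"):
    """Build (but do not send) a request with a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


req = json_request(
    "https://bigml.io/source?username=alfred;api_key=YOUR_API_KEY",
    {"remote": "https://static.bigml.com/csv/iris.csv"},
)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen would actually send it.
```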
12. BigML.io: Creating a Source using inline data
Creating an Inline Source
curl "https://bigml.io/source?$BIGML_AUTH" \
     -X POST \
     -H "content-type: application/json" \
     -d '{"data": "a,b,c,d\n1,2,3,4\n5,6,7,8"}'
The Content-Type in your HTTP request must be application/json.
Source data is included in the JSON body as a string with key
“data”.
Maximum size of inline sources is 10MB.
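Because the rows of an inline source are separated by newlines, the JSON body must carry them as the two-character escape `\n`. Letting a JSON encoder produce the body avoids escaping mistakes. The sketch below does that and checks the 10 MB limit. The helper name is hypothetical, the limit is assumed to mean 10 * 2**20 bytes, and values containing commas or quotes are not handled.

```python
import json

MAX_INLINE_BYTES = 10 * 1024 * 1024  # 10 MB, assuming 1 MB = 2**20 bytes


def inline_source_body(rows):
    """Serialize rows of simple values as the JSON body of an inline source."""
    data = "\n".join(",".join(str(value) for value in row) for row in rows)
    if len(data.encode("utf-8")) > MAX_INLINE_BYTES:
        raise ValueError("inline data exceeds the 10 MB limit")
    return json.dumps({"data": data})


body = inline_source_body([["a", "b", "c", "d"], [1, 2, 3, 4], [5, 6, 7, 8]])
print(body)
```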
14. BigML.io: Source Arguments
Required (one of)  Type                 Description
file               multipart form data  File.
remote             String               URL of the remote source.
data               String               Inline data in tabular format.
Optional           Type                 Description
category           Integer              The category that best describes the data.
description        String               A description of the source of up to 8192 characters.
name               String               The name you want to give to the new source.
private            Boolean              Whether you want your source to be private or not.
source_parser      Object               Set of parameters to parse the source.
tags               List                 A list of strings that help classify and index your source.
Table: Source Arguments
15. BigML.io: Creating a Source with args
Creating a Source with args
curl "https://bigml.io/source?$BIGML_AUTH" \
     -X POST \
     -H "content-type: application/json" \
     -d '{"remote": "https://static.bigml.com/csv/iris.csv", "name": "iris"}'
17. BigML.io: Updating a Source
Updating a Source
curl "https://bigml.io/source/4f64191d03ce89860a000000?$BIGML_AUTH" \
     -X PUT \
     -H 'content-type: application/json' \
     -d '{"name": "a new name", "source_parser": {"locale": "es-ES"}}'
18. BigML.io: Deleting a Source
Deleting a Source
curl "https://bigml.io/source/4f603fe203ce89bb2d000000?$BIGML_AUTH" \
     -X DELETE
Response: HTTP/1.1 204 NO CONTENT
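A deletion can be sketched the same way: build a DELETE request for the resource URL and treat a 204 status as success. Both helper names are hypothetical, and the request is only constructed, never sent.

```python
import urllib.request


def delete_request(resource_id, auth):
    """Build (but do not send) a DELETE request for a resource such as 'source/<id>'."""
    url = f"https://bigml.io/{resource_id}?{auth}"
    return urllib.request.Request(url, method="DELETE")


def was_deleted(status_code):
    """A successful deletion answers 204 NO CONTENT, with no response body."""
    return status_code == 204


req = delete_request("source/4f603fe203ce89bb2d000000", "username=alfred;api_key=YOUR_API_KEY")
print(req.get_method(), req.full_url)
```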
19. BigML.io: Retrieving a Source
Retrieving a Source via BigML.io
curl "https://bigml.io/source/4eee50b90a590f7d5c000008?$BIGML_AUTH"
Visualizing a Source via BigML.com
https://bigml.com/dashboard/source/4eee50b90a590f7d5c000008
20. BigML.io: Source Properties
property               type      filterable  sortable  updatable
category               Integer   yes         yes       yes
code                   Integer   no          no        no
content_type           String    yes         yes       no
created                Datetime  yes         yes       no
credits                Float     yes         yes       no
description            String    yes         yes       yes
fields                 Object    no          no        no
file_name              String    yes         yes       no
md5                    String    no          no        no
name                   String    yes         yes       yes
number_of_datasets     Integer   yes         yes       no
number_of_models       Integer   yes         yes       no
number_of_predictions  Integer   yes         yes       no
private                Boolean   yes         yes       yes
resource               String    no          no        no
rows                   Integer   yes         yes       no
size                   Integer   yes         yes       no
source                 String    yes         yes       no
source_status          String    yes         yes       no
status                 Object    no          no        no
tags                   List      yes         yes       yes
updated                Datetime  yes         yes       no
Table: Source Properties
21. BigML.io: Listing Sources
Listing Sources
curl "https://bigml.io/source?limit=10;offset=10;$BIGML_AUTH"
limit   Specifies the number of sources to retrieve. Must be less
        than or equal to 200.
offset  The position in the full source list at which the retrieved
        list starts.
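A small sketch of building the listing query while enforcing the documented upper bound on `limit` might look like this. The helper `listing_query` is hypothetical. It rejects negative values as a sanity check of its own, which is not something the slides specify.

```python
MAX_LIMIT = 200  # upper bound on "limit" stated above


def listing_query(limit, offset, auth):
    """Return the query string for a listing request."""
    if limit > MAX_LIMIT:
        raise ValueError(f"limit must be less than or equal to {MAX_LIMIT}")
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must not be negative")
    return f"limit={limit};offset={offset};{auth}"


query = listing_query(10, 10, "username=alfred;api_key=YOUR_API_KEY")
print("https://bigml.io/source?" + query)
```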
23. BigML.io: Filtering Sources
Retrieving sources bigger than 1 MB
curl "https://bigml.io/source?size__gt=1048576;$BIGML_AUTH"
Filter  Description
lt      Less than
lte     Less than or equal to
gt      Greater than
gte     Greater than or equal to
Table: Filtering Arguments
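The filter suffixes in the table can be combined with any filterable property. The sketch below builds such a parameter. The helper name is hypothetical, and the separator between property and suffix is assumed to be a double underscore, as in the `created__gt` example on the prediction filtering slide.

```python
FILTER_SUFFIXES = {"lt", "lte", "gt", "gte"}  # suffixes listed in the table


def filter_param(field, suffix, value):
    """Return a query parameter such as 'size__gt=1048576'."""
    if suffix not in FILTER_SUFFIXES:
        raise ValueError(f"unknown filter suffix: {suffix}")
    # Assumption: property and suffix are joined by a double underscore.
    return f"{field}__{suffix}={value}"


one_megabyte = 1024 * 1024
print(filter_param("size", "gt", one_megabyte))
```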
24. BigML.io: Sorting Sources
Sorting sources by size
curl "https://bigml.io/source?order_by=-size;$BIGML_AUTH"
order_by  Specifies the order of the sources to retrieve. Must be one
          of the sortable fields. If you prefix the field name with “-”,
          the order will be descending.
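As an illustration, the following sketch builds the `order_by` parameter and rejects fields that are not sortable. The helper name is hypothetical. The set of sortable fields is transcribed from the Source Properties table, with underscores assumed in multi-word names.

```python
# Properties marked "sortable: yes" in the Source Properties table.
SORTABLE_SOURCE_FIELDS = {
    "category", "content_type", "created", "credits", "description",
    "file_name", "name", "number_of_datasets", "number_of_models",
    "number_of_predictions", "private", "rows", "size", "source",
    "source_status", "tags", "updated",
}


def order_by_param(field, descending=False):
    """Return 'order_by=<field>', with a '-' prefix for descending order."""
    if field not in SORTABLE_SOURCE_FIELDS:
        raise ValueError(f"{field} is not a sortable source property")
    prefix = "-" if descending else ""
    return f"order_by={prefix}{field}"


print(order_by_param("size", descending=True))
```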
25. BigML.io: Dataset
Dataset Base URL
https://bigml.io/dataset
A dataset is a structured version of a source where each field has
been processed and serialized according to its type.
A field can be numeric or categorical.
Support for datetime and text fields is planned.
26. BigML.io: Create a New Dataset
Create a New Dataset
curl "https://bigml.io/andromeda/dataset?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"source": "source/4ee5761c80e1c664f1000000"}'
28. BigML.io: Dataset Arguments
Required     Type     Description
source       String   Valid source/id
Optional     Type     Description
category     Integer  The category that best describes the dataset.
description  String   A description of the dataset of up to 8192 characters.
fields       Object   The fields that you want to use to create the dataset.
name         String   Name of the dataset.
private      Boolean  Whether you want your dataset to be private or not.
size         Integer  Maximum number of bytes to process.
tags         List     A list of strings that help classify and index your dataset.
Table: Dataset Arguments
29. BigML.io: Creating a Dataset with args
Creating a Dataset with args
curl "https://bigml.io/dataset?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"source": "source/4ee5761c80e1c664f1000000", "name": "my dataset"}'
30. BigML.io: Updating a Dataset
Updating a Dataset
curl "https://bigml.io/dataset/4f66a0b903ce8940c5000000?$BIGML_AUTH" \
     -X PUT \
     -H 'content-type: application/json' \
     -d '{"name": "a new name"}'
31. BigML.io: Deleting a Dataset
Deleting a Dataset
curl "https://bigml.io/dataset/4f66a0b903ce8940c5000000?$BIGML_AUTH" \
     -X DELETE
Response: HTTP/1.1 204 NO CONTENT
32. BigML.io: Retrieving a Dataset
Retrieving a Dataset via BigML.io
curl "https://bigml.io/dataset/4f66a0b903ce8940c5000000?$BIGML_AUTH"
Retrieving a Dataset via BigML.com
https://bigml.com/dashboard/dataset/4f66a0b903ce8940c5000000
33. BigML.io: Dataset Properties
property               type      filterable  sortable  updatable
category               Integer   yes         yes       yes
code                   Integer   no          no        no
columns                Integer   yes         yes       no
created                Datetime  yes         yes       no
credits                Float     yes         yes       no
description            String    yes         yes       yes
fields                 Object    no          no        no
locale                 String    no          no        no
name                   String    yes         yes       yes
number_of_models       Integer   yes         yes       no
number_of_predictions  Integer   yes         yes       no
private                Boolean   yes         yes       yes
resource               String    no          no        no
rows                   Integer   yes         yes       no
size                   Integer   yes         yes       no
source                 String    yes         yes       no
source_status          Boolean   yes         yes       no
status                 Object    no          no        no
tags                   List      yes         yes       yes
updated                Datetime  yes         yes       no
Table: Dataset Properties
34. BigML.io: Listing Datasets
Listing Datasets
curl "https://bigml.io/dataset?limit=10;offset=10;$BIGML_AUTH"
limit The total number of datasets to retrieve (≤ 200).
offset The offset at which the dataset listing will start.
36. BigML.io: Filtering Datasets
Retrieving datasets bigger than 1 MB
curl "https://bigml.io/dataset?size__gt=1048576;$BIGML_AUTH"
Filter  Description
lt      Less than
lte     Less than or equal to
gt      Greater than
gte     Greater than or equal to
Table: Filtering Arguments
37. BigML.io: Sorting Datasets
Sorting datasets by size
curl "https://bigml.io/dataset?order_by=-size;$BIGML_AUTH"
order_by  Specifies the order of the datasets to retrieve. Must be one
          of the sortable fields. If you prefix the field name with “-”,
          they will be given in descending order.
38. BigML.io: Model
Model Base URL
https://bigml.io/model
A model is a tree-like representation of your dataset with
predictive power.
You can create a model by selecting which fields from your dataset
you want to use as input fields (or predictors) and which field you
want to predict (the objective field).
39. BigML.io: Create a New Model
Create a New Model
curl "https://bigml.io/model?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"dataset": "dataset/4f66a80803ce8940c5000006"}'
40. New Model
{
  "category": 0,
  "code": 201,
  "columns": 5,
  "created": "2012-05-25T07:13:07.243623",
  "credits": 0.03515625,
  "dataset": "dataset/4f66a80803ce8940c5000006",
  "dataset_status": true,
  "description": "",
  "holdout": 0.0,
  "input_fields": [],
  "locale": "en_US",
  "max_columns": 5,
  "max_rows": 150,
  "name": "iris' dataset model",
  "number_of_predictions": 0,
  "objective_fields": [],
  "private": true,
  "range": [1, 150],
  "resource": "model/4f67c0ee03ce89c74a000006",
  "rows": 150,
  "size": 4608,
  "source": "source/4f665b8103ce8920bb000006",
  "source_status": true,
  "status": {
    "code": 1,
    "message": "The model is being processed and will be created soon"
  },
  "tags": [],
  "updated": "2012-05-25T07:13:07.243658"
}
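A client will typically pull a few fields out of a response like this one: the new resource identifier and the nested status. The sketch below parses an abridged copy of the response. The helper `summarize` is hypothetical, and the meaning of status codes other than the one shown is not assumed.

```python
import json

# Abridged copy of the response shown on this slide.
response_text = """
{
  "code": 201,
  "resource": "model/4f67c0ee03ce89c74a000006",
  "status": {
    "code": 1,
    "message": "The model is being processed and will be created soon"
  }
}
"""


def summarize(text):
    """Extract the resource id and status information from a response body."""
    doc = json.loads(text)
    status = doc.get("status", {})
    return {
        "resource": doc["resource"],
        "code": doc["code"],
        "status_code": status.get("code"),
        "status_message": status.get("message"),
    }


summary = summarize(response_text)
print(summary["resource"], "-", summary["status_message"])
```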
41. BigML.io: Model Arguments
Required          Type     Description
dataset           String   Valid dataset/id
Optional          Type     Description
category          Integer  The category that best describes the model.
description       String   A description of the model of up to 8192 characters.
input_fields      List     The fields that you want to use to create the model.
name              String   Name of the model.
objective_fields  List     The field that you want to predict.
private           Boolean  Whether you want your model to be private or not.
range             List     The range of successive instances to build the model.
tags              List     A list of strings that help classify your model.
Table: Model Arguments
42. BigML.io: Creating a Model with args
Creating a Model with args
curl "https://bigml.io/andromeda/model?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"dataset": "dataset/4f66a80803ce8940c5000006", "input_fields": ["000001", "000003"]}'
43. BigML.io: Updating a Model
Updating a Model
curl "https://bigml.io/model/4f67c0ee03ce89c74a000006?$BIGML_AUTH" \
     -X PUT \
     -H 'content-type: application/json' \
     -d '{"name": "a new name"}'
44. BigML.io: Deleting a Model
Deleting a Model
curl "https://bigml.io/model/4f67c0ee03ce89c74a000006?$BIGML_AUTH" \
     -X DELETE
Response: HTTP/1.1 204 NO CONTENT
45. BigML.io: Retrieving a Model
Retrieving a Model via BigML.io
curl "https://bigml.io/model/4f66a80803ce8940c5000006?$BIGML_AUTH"
Retrieving a Model via BigML.com
https://bigml.com/dashboard/model/4f66a80803ce8940c5000006
46. BigML.io: Model Properties
property               type      filterable  sortable  updatable
category               Integer   yes         yes       yes
code                   Integer   no          no        no
columns                Integer   yes         yes       no
created                Datetime  yes         yes       no
credits                Float     yes         yes       no
dataset                String    yes         yes       no
dataset_status         Boolean   yes         yes       no
description            String    yes         yes       yes
input_fields           Object    no          no        no
locale                 String    no          no        no
max_columns            Integer   yes         yes       no
max_rows               Integer   yes         yes       no
model                  Object    no          no        no
name                   String    yes         yes       yes
number_of_predictions  Integer   yes         yes       no
objective_fields       List      no          no        no
private                Boolean   yes         yes       yes
range                  List      no          no        no
resource               String    no          no        no
size                   Integer   yes         yes       no
statistical_pruning    Boolean   yes         yes       no
status                 Object    no          no        no
tags                   List      yes         yes       yes
updated                Datetime  yes         yes       no
Table: Model Properties
47. BigML.io: Listing Models
Listing Models
curl "https://bigml.io/model?limit=10;offset=10;$BIGML_AUTH"
limit The number of models to retrieve (≤ 200).
offset The offset at which the model listing will start off.
49. BigML.io: Filtering Models
Retrieving models bigger than 1 MB
curl "https://bigml.io/model?size__gt=1048576;$BIGML_AUTH"
Filter  Description
lt      Less than
lte     Less than or equal to
gt      Greater than
gte     Greater than or equal to
Table: Filtering Arguments
50. BigML.io: Sorting Models
Sorting models by size
curl "https://bigml.io/model?order_by=-size;$BIGML_AUTH"
order_by  Specifies the order of the models to retrieve. Must be one
          of the sortable fields. If you prefix the field name with “-”,
          they will be given in descending order.
51. BigML.io: Prediction
Prediction Base URL
https://bigml.io/prediction
A prediction is created using a model/id and the properties of the
new instance (input data) for which you wish to create a prediction.
To create a new prediction, BigML.io will automatically navigate the
corresponding model to find the leaf node that best classifies the
new instance.
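To illustrate what "navigating the model to a leaf" means, here is a toy decision-tree traversal. It is only a sketch of the general technique: the `Node` structure, the thresholds, and the field assignments are made up for illustration and do not reflect how BigML.io stores or evaluates models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    output: str                      # prediction if navigation stops here
    field_id: Optional[str] = None   # field tested at this node (None for a leaf)
    threshold: float = 0.0           # split point for the tested field
    left: Optional["Node"] = None    # branch taken when value <= threshold
    right: Optional["Node"] = None   # branch taken when value > threshold


def predict(node, input_data):
    """Walk from the root to the deepest node the input data allows."""
    while node.field_id is not None:
        value = input_data.get(node.field_id)
        if value is None:
            break  # missing field: answer with the current node (a simplification)
        node = node.left if value <= node.threshold else node.right
    return node.output


# Toy tree with invented thresholds.
tree = Node(
    "Iris-setosa", field_id="000002", threshold=2.45,
    left=Node("Iris-setosa"),
    right=Node(
        "Iris-versicolor", field_id="000003", threshold=1.75,
        left=Node("Iris-versicolor"),
        right=Node("Iris-virginica"),
    ),
)
print(predict(tree, {"000002": 4.7, "000003": 1.4}))
```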
52. BigML.io: Create a New Prediction
Create a New Prediction
curl "https://bigml.io/prediction?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"model": "model/4f67c0ee03ce89c74a000006",
          "input_data": {"000001": 3}}'
54. BigML.io: Prediction Arguments
Required     Type     Description
model        String   Valid model/id.
input_data   Object   Field id/value pairs representing the instance.
Optional     Type     Description
category     Integer  The category that best describes the prediction.
description  String   A description of the prediction of up to 8192 characters.
name         String   Name of the prediction.
private      Boolean  Whether you want your prediction to be private or not.
tags         List     A list of strings that help classify and index your prediction.
Table: Prediction Arguments
55. BigML.io: Creating a Prediction with args
Creating a Prediction with args
curl "https://bigml.io/andromeda/prediction?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"input_data": {"000001": 3},
          "model": "model/4f67c0ee03ce89c74a000006",
          "name": "my prediction"}'
56. BigML.io: Updating a Prediction
Updating a Prediction
curl "https://bigml.io/prediction/4f6a014b03ce89584500000f?$BIGML_AUTH" \
     -X PUT \
     -H 'content-type: application/json' \
     -d '{"name": "a new name"}'
57. BigML.io: Deleting a Prediction
Deleting a Prediction
curl "https://bigml.io/prediction/4f6a014b03ce89584500000f?$BIGML_AUTH" \
     -X DELETE
Response: HTTP/1.1 204 NO CONTENT
58. BigML.io: Retrieving a Prediction
Retrieving a Prediction via BigML.io
curl "https://bigml.io/prediction/4f6a014b03ce89584500000f?$BIGML_AUTH"
Retrieving a Prediction via BigML.com
https://bigml.com/dashboard/prediction/4f6a014b03ce89584500000f
59. BigML.io: Prediction Properties
property          type      filterable  sortable  updatable
category          Integer   yes         yes       yes
code              Integer   no          no        no
created           Datetime  yes         yes       no
credits           Float     yes         yes       no
dataset           String    yes         yes       no
dataset_status    Boolean   yes         yes       no
description       String    yes         yes       yes
fields            Object    no          no        no
input_data        Object    no          no        no
locale            String    no          no        no
model             String    yes         yes       no
model_status      Boolean   yes         yes       no
name              String    yes         yes       yes
objective_fields  List      yes         yes       no
prediction        Object    yes         yes       no
prediction_path   Object    no          no        no
private           Boolean   yes         yes       yes
resource          String    no          no        no
source            String    yes         yes       no
source_status     Boolean   yes         yes       no
status            Object    no          no        no
tags              List      yes         yes       yes
updated           Datetime  yes         yes       no
Table: Prediction Properties
60. BigML.io: Listing Predictions
Listing Predictions
curl "https://bigml.io/prediction?limit=10;offset=10;$BIGML_AUTH"
limit The number of predictions to retrieve (≤ 200).
offset The offset at which the prediction listing will start off.
62. BigML.io: Filtering Predictions
Retrieving predictions created after January 12, 2012
curl "https://bigml.io/prediction?created__gt=2012-01-12;$BIGML_AUTH"
Filter  Description
lt      Less than
lte     Less than or equal to
gt      Greater than
gte     Greater than or equal to
Table: Filtering Arguments
63. BigML.io: Sorting Predictions
Sorting predictions by name
curl "https://bigml.io/prediction?order_by=-name;$BIGML_AUTH"
order_by  Specifies the order of the predictions to retrieve. Must be
          one of the sortable fields. If you prefix the field name with
          “-”, they will be given in descending order.
64. BigML.io: Evaluation
Evaluation Base URL
https://bigml.io/evaluation
An evaluation automatically measures how well a model predicts the
objective field on a pre-labeled test set.
An evaluation is created using the model/id of the model under
evaluation and the dataset/id of the test set.
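By analogy with the other resources, the body of an evaluation request could be sketched as below. The argument names `model` and `dataset` are assumptions, since the slides do not list the evaluation arguments. The helper is hypothetical and only validates and assembles the payload.

```python
import json


def evaluation_payload(model_id, dataset_id):
    """Assemble the body for an evaluation (argument names are assumed)."""
    for value, prefix in ((model_id, "model/"), (dataset_id, "dataset/")):
        if not value.startswith(prefix):
            raise ValueError(f"expected an identifier starting with {prefix!r}")
    return {"model": model_id, "dataset": dataset_id}


payload = evaluation_payload(
    "model/4f67c0ee03ce89c74a000006",    # model under evaluation
    "dataset/4f66a80803ce8940c5000006",  # dataset used as the test set
)
print(json.dumps(payload))
```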
65. BigML.io: Public Bindings
Bash https://github.com/bigmlcom/bigml-bash
Python https://github.com/bigmlcom/python
R https://github.com/bigmlcom/bigml-r
iOS https://github.com/fgarcialainez/ML4iOS
Java https://github.com/javinp/bigml-java
Ruby http://vigosan.github.com/big_ml/
66. BigML.io: Final Remarks
dev mode  Remember to include /dev in your request URLs to avoid
          credit charges.
version   Remember to include the current version name, /andromeda,
          in your request URLs so that future versions of the BigML
          API do not interfere with your application.