QMiner
BLAŽ FORTUNA, JAN RUPNIK
Overview
QMiner is a data analytics platform for processing large-scale real-time
streams containing structured and unstructured data
◦ Connecting storage, indexing and analytics: direct conversions from storage
to feature vectors and back
◦ Native support for unstructured (text, graphs) and streaming (time series,
text streams) data
◦ Fast prototyping: from data, to models, to web-service APIs
Open-sourced under AGPL
◦ http://qminer.ijs.si/
◦ https://github.com/qminer/qminer
2014-06-11 HTTP://QMINER.IJS.SI/ 2
Architecture
[Diagram: QMiner Server comprising Storage, Index, Feature Extractors, (Stream) Aggregates, Analytics, and JavaScript API]
Storage and Index layer
Simple storage system
◦ Requires predefined schema
Implemented search indexes:
◦ Inverted Index for indexing discrete values and text
◦ Geospatial Index for indexing geographic locations
◦ B-tree for indexing linearly ordered data types (to be included)
◦ Local Proximity Hashing used to answer nearest neighbour queries on
high-dimensional data such as sparse vectors (to be included)
NoSQL-like Query language:
◦ JSON-based, similar to the MongoDB and Freebase query languages
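As a rough illustration of the inverted index idea, the sketch below maps each token to the set of record ids that contain it and then takes the union of two lookups, similar in spirit to an $or over two indexed keys. It is plain JavaScript rather than QMiner code: buildInvertedIndex, lookup, and the sample records are hypothetical, and the tokenizer is a deliberately naive split on non-alphanumeric characters.

```javascript
// Minimal inverted-index sketch (hypothetical helper names, not the QMiner API).
// Maps token -> Set of record ids whose field contains that token.
function buildInvertedIndex(records, field) {
    var index = new Map();
    records.forEach(function (rec, id) {
        // Naive tokenizer: lowercase, split on anything that is not a letter or digit.
        var tokens = String(rec[field]).toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
        tokens.forEach(function (token) {
            if (!index.has(token)) { index.set(token, new Set()); }
            index.get(token).add(id);
        });
    });
    return index;
}

// Return the sorted record ids that contain the token (empty array when unseen).
function lookup(index, token) {
    var ids = index.get(token.toLowerCase());
    return ids ? Array.from(ids).sort(function (a, b) { return a - b; }) : [];
}

// Made-up sample records, for illustration only.
var movies = [
    { Title: "Lost Highway", Plot: "A musician receives strange tapes." },
    { Title: "Every Day", Plot: "A man feels lost in his routine." },
    { Title: "Found", Plot: "Nothing is missing here." }
];
var titleIndex = buildInvertedIndex(movies, "Title");
var plotIndex = buildInvertedIndex(movies, "Plot");

// Union of both lookups: records with "lost" in the Title or in the Plot.
var hits = Array.from(new Set(lookup(titleIndex, "lost").concat(lookup(plotIndex, "lost"))));
hits.sort(function (a, b) { return a - b; });
console.log(hits);
```

A real index would also store positions or frequencies and persist to disk; the sketch only shows the token-to-records mapping that makes such lookups cheap.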
Example schema definition
{
  "name": "Movies",
  "fields": [
    { "name": "Title", "type": "string" },
    { "name": "Plot", "type": "string", "store": "cache" },
    { "name": "Year", "type": "int" },
    { "name": "Rating", "type": "float" },
    { "name": "Genres", "type": "string_v", "codebook": true }
  ],
  "joins": [
    { "name": "Actor", "type": "index", "store": "People", "inverse": "ActedIn" },
    { "name": "Director", "type": "field", "store": "People", "inverse": "Directed" }
  ],
  "keys": [
    { "field": "Title", "type": "value" },
    { "field": "Title", "name": "TitleTxt", "type": "text", "vocabulary": "voc_01" },
    { "field": "Plot", "type": "text", "vocabulary": "voc_01" },
    { "field": "Genres", "type": "value" }
  ]
}
https://github.com/qminer/qminer/wiki/Store-definition
Query Language
Selectors over indexed keys
◦ { $from: "Movies", $or: [{ Title: "lost" }, { Plot: "lost" }]}
Probabilistic joins
◦ { $join: { $name: "Actor",
$query: { $from: "Movies", Genres: "Horror"}}}
Aggregates over results
◦ { name: "Plot", type: "keywords", field: "Plot" }
◦ { name: "Rating", type: "histogram", field: "Rating" }
◦ { name: "Genres", type: "count", field: "Genres" }
https://github.com/qminer/qminer/wiki/Query-Language
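To make the "count" and "histogram" aggregates more concrete, here is a minimal sketch of what such aggregates could compute over a result set. The functions countAggregate and histogramAggregate, the bin layout, and the sample records are assumptions made for illustration; QMiner evaluates aggregates inside its own query engine and may bin values differently.

```javascript
// Sketch of batch "count" and "histogram" aggregates over a record set
// (hypothetical helpers, not the QMiner implementation).
function countAggregate(records, field) {
    var counts = {};
    records.forEach(function (rec) {
        // A field may hold one value or an array of values (e.g. Genres).
        [].concat(rec[field]).forEach(function (value) {
            counts[value] = (counts[value] || 0) + 1;
        });
    });
    return counts;
}

function histogramAggregate(records, field, min, max, bins) {
    var width = (max - min) / bins;
    var hist = new Array(bins).fill(0);
    records.forEach(function (rec) {
        // Clamp so that out-of-range values and a value equal to max land in an edge bin.
        var bin = Math.min(bins - 1, Math.max(0, Math.floor((rec[field] - min) / width)));
        hist[bin] += 1;
    });
    return hist;
}

// Made-up result set, for illustration only.
var results = [
    { Rating: 5.6, Genres: ["Comedy", "Drama"] },
    { Rating: 7.9, Genres: ["Drama"] },
    { Rating: 3.2, Genres: ["Horror"] }
];
var genreCounts = countAggregate(results, "Genres");
var ratingHist = histogramAggregate(results, "Rating", 0, 10, 5);
console.log(genreCounts, ratingHist);
```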
Example: Twitter search “beer”
drinking, day, tonight, time,
good, night, lol, mate, lovely,
haha, christmas, work, home, ll,
nice, yeah, food, back, today, feel,
curry, wine, football, pint, opener,
watch
beer, perfect, cheers, yolo,
merrychristmas, fb, christmas, photo,
camrgb, bliss, coyi, decent, lad,
nightclubfails, coyg, superbowl,
suffolk, buzzing, curry, vodka,
becauseican, hangoverinthemorning
Example: Twitter search “hangover”
cure, day, feeling, drink,
night, good, work, year,
today, morning, haha, worst,
love, tomorrow, time,
christmas, bad, wake, food,
bed, drunk
hangover, winning, happynewyear,
perfect, food, nye, notfair,
toooldforthisshit, dedication, sick,
fucked, badtimes, backtobed,
goodnight, yay, ouch, beer, fresh, dying,
bed, death
Aggregators
Batch mode
◦ Work on static record sets and produce a one-time result
◦ Accessible via query language
Streaming mode (Stream Aggregators)
◦ Updated in real time as new data is added to the storage layer
◦ Can be composed into pipelines
Integrated stream aggregators:
◦ Time series indicators (MA, EMA, double EMA, …)
◦ Resampling of input stream
◦ Merging of two or more input streams
◦ Delay
◦ …
[Diagram: stream aggregate pipeline with Store, Tick, MA, EMA, and dEMA nodes]
https://github.com/qminer/qminer/wiki/Stream-Aggregates
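The time series indicators can be pictured as small objects that are updated one value at a time and can feed each other. The sketch below is plain JavaScript under stated assumptions: MovingAverage and ExponentialMovingAverage are hypothetical classes, the EMA uses a fixed smoothing factor rather than a time interval, and the double EMA uses the common definition 2 * EMA(x) - EMA(EMA(x)), which may differ from what QMiner computes.

```javascript
// Streaming moving-average sketch (hypothetical classes, not QMiner's aggregates).
// Each aggregate is updated one value at a time and exposes its current state.
function MovingAverage(windowSize) {
    this.window = [];
    this.sum = 0;
    this.windowSize = windowSize;
    this.value = 0;
}
MovingAverage.prototype.update = function (x) {
    this.window.push(x);
    this.sum += x;
    // Drop the oldest value once the window is full.
    if (this.window.length > this.windowSize) { this.sum -= this.window.shift(); }
    this.value = this.sum / this.window.length;
    return this.value;
};

function ExponentialMovingAverage(alpha) {
    this.alpha = alpha;   // smoothing factor in (0, 1]
    this.value = null;    // null until the first value arrives
}
ExponentialMovingAverage.prototype.update = function (x) {
    // Seed with the first value, then blend: ema = alpha * x + (1 - alpha) * ema.
    this.value = (this.value === null) ? x : this.alpha * x + (1 - this.alpha) * this.value;
    return this.value;
};

// Pipeline: each new value updates MA and EMA; the EMA output feeds a second EMA.
var ma = new MovingAverage(3);
var ema = new ExponentialMovingAverage(0.5);
var emaOfEma = new ExponentialMovingAverage(0.5);
var dema = null;
[1, 2, 3, 4].forEach(function (x) {
    ma.update(x);
    var e = ema.update(x);
    dema = 2 * e - emaOfEma.update(e);
});
console.log(ma.value, ema.value, dema);
```

Keeping only a running state per aggregate is what lets such indicators stay current without rescanning the store.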
Feature Extractors
Mappings from data records to (sparse) feature vectors
◦ Defined using declarative language
◦ Work on stream data
Built-in functionality for extraction of features:
◦ Numeric, Categorical, Multinomial, Bag-of-Words, Join, Pair
◦ Includes all Glib text-processing machinery (stemmer, stop-words, hashing)
https://github.com/qminer/qminer/wiki/Feature-Extractors
Example
Feature extractors:
◦ { type: "text", source: "Movies", field: "Title" }
◦ { type: "text", source: "Movies", field: "Plot" }
◦ { type: "multinomial", source: "Movies", field: "Genres" }
◦ { type: "join", source: { store: "Movies", join: "Actor" }}
[Diagram: feature vector split into Title, Body, Genres, and Actors blocks]
{
  "Title": "Every Day",
  "Plot": "This day really isn't all that different than...",
  "Year": 2010,
  "Rating": 5.6,
  "Genres": [ "Comedy", "Drama" ],
  "Director": { "Name": "Levine Richard (III)", "Gender": "Male" },
  "Actor": [ { "Name": "Beetem Chris", "Gender": "Male" }, ... ]
}
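One way to picture how a record like this becomes a sparse feature vector: each extractor contributes named features, and a shared dictionary assigns every name a vector index. The sketch below covers only a bag-of-words extractor over Title and a multinomial extractor over Genres. FeatureSpace, indexFor, and extract are hypothetical names, and the unweighted counts stand in for whatever weighting QMiner actually applies.

```javascript
// Sketch of mapping a record to a sparse feature vector (hypothetical helpers).
// The feature space is a growing dictionary from feature name to vector index.
function FeatureSpace() {
    this.dictionary = new Map();
}
FeatureSpace.prototype.indexFor = function (name) {
    if (!this.dictionary.has(name)) { this.dictionary.set(name, this.dictionary.size); }
    return this.dictionary.get(name);
};
// Bag-of-words over the Title field plus multinomial over the Genres field.
FeatureSpace.prototype.extract = function (rec) {
    var self = this;
    var sparse = new Map(); // feature index -> value
    function add(name) {
        var idx = self.indexFor(name);
        sparse.set(idx, (sparse.get(idx) || 0) + 1);
    }
    rec.Title.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean).forEach(function (word) {
        add("Title:" + word);
    });
    rec.Genres.forEach(function (genre) { add("Genres:" + genre); });
    // Return [index, value] pairs sorted by index.
    return Array.from(sparse.entries()).sort(function (a, b) { return a[0] - b[0]; });
};

var space = new FeatureSpace();
var vec = space.extract({ Title: "Every Day", Genres: ["Comedy", "Drama"] });
console.log(vec, space.dictionary.size);
```

Prefixing feature names with the field keeps the Title and Genres blocks of the vector separate, which mirrors the block layout of the combined feature vector.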
Analytics – Linear Algebra
◦ Wraps parts of the C++ linalg library. Most functions can benefit from
high-performance libraries such as Intel MKL or OpenBLAS.
◦ Computationally light parts and glue scripts can be implemented directly in
JS (examples: conjugate gradient, counting nonzero elements in sparse
matrices)
◦ Five main classes: la (linear algebra), full vectors and matrices, and sparse
vectors and matrices.
◦ Supported functionality includes constructing elements in various ways,
computing linear combinations, multiplication, transposition, norm
computations, ...
◦ Some important building blocks are also exposed: large-scale SVD
(dense, sparse), solving linear systems (LU decomposition for dense systems,
conjugate gradient for symmetric positive definite matrices)
Analytics – Learning
Works on top of extracted features
Implemented Techniques:
◦ Classification:
  ◦ SVM (batch)
  ◦ Perceptron (updates)
  ◦ Hoeffding trees (updates)
  ◦ Active learning (uncertainty sampling + SVM)
◦ Regression:
  ◦ SVMR (batch)
  ◦ Ridge regression (batch)
  ◦ Ridge regression (updates)
◦ Clustering:
  ◦ k-means (batch)
  ◦ Lloyd algorithm (updates)
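The "(updates)" entries presumably denote models that learn incrementally, one example at a time, in contrast to the batch ones. As a minimal sketch of that style, here is a textbook online perceptron on dense arrays. The Perceptron class and the toy stream are hypothetical and unrelated to QMiner's implementation.

```javascript
// Online perceptron sketch (hypothetical class, dense vectors as plain arrays).
function Perceptron(dim) {
    this.weights = new Array(dim).fill(0);
    this.bias = 0;
}
Perceptron.prototype.score = function (x) {
    var s = this.bias;
    for (var i = 0; i < x.length; i++) { s += this.weights[i] * x[i]; }
    return s;
};
Perceptron.prototype.predict = function (x) {
    return this.score(x) > 0 ? 1 : -1;
};
// Update on a single labelled example (label is +1 or -1); returns true on a mistake.
Perceptron.prototype.update = function (x, label) {
    if (label * this.score(x) > 0) { return false; }
    for (var i = 0; i < x.length; i++) { this.weights[i] += label * x[i]; }
    this.bias += label;
    return true;
};

// Feed a small linearly separable set several times, as if examples arrived one by one.
var stream = [
    { x: [2, 1], y: 1 }, { x: [1, 3], y: 1 },
    { x: [-1, -2], y: -1 }, { x: [-3, -1], y: -1 }
];
var model = new Perceptron(2);
for (var pass = 0; pass < 5; pass++) {
    stream.forEach(function (ex) { model.update(ex.x, ex.y); });
}
var predictions = stream.map(function (ex) { return model.predict(ex.x); });
console.log(predictions);
```

Because each update touches only the current example, the model can be kept current on a stream without storing past records.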
JavaScript API
Major functionality exposed via JavaScript API
◦ Using Google V8 JavaScript engine
◦ Current status: more than 20 objects and 300 functions
Exposed APIs
◦ Data layer – storage, indexing, retrieval
◦ Linear algebra – full and sparse vector and matrix, matrix operations
◦ Learning algorithms – supervised, unsupervised, active learning
◦ Stream aggregates – definition, access to real-time values
◦ Input/Output – file system, web services (easy RESTful APIs)
Documentation:
◦ https://github.com/qminer/qminer/wiki/JavaScript
Installation
Installation:
◦ git clone https://github.com/qminer/qminer.git
◦ cd qminer
◦ make lib
◦ make
◦ ./test/javascript/test.sh
Main build results (qminer/build):
◦ qm - QMiner executable
◦ *.js – QMiner JavaScript support functions
◦ gui/ - administration GUI
◦ lib/ - available JavaScript libraries (can be included using 'require')
Environment variable:
◦ QMINER_HOME=($QMINER)/build
Quick start
Configure:
◦ qm config -port=8080
Initialize storage according to provided schema:
◦ qm create -def=schema.def
Start QMiner:
◦ qm start
◦ qm start -noserver
◦ qm start -rdonly
Stop QMiner:
◦ qm stop
Documentation
Home
Quick Start
◦ Linux Installation
◦ Windows Installation
Example
JavaScript API
Store Definition
Query Language
Stream Aggregates
Feature Extractors
Configuration
Restore and Failover
Example – Movies.js
// Import analytics module
var analytics = require("analytics.js");
// Get the Movies store and load the dataset into it
var Movies = qm.store("Movies");
qm.load.jsonFile(Movies, "./sandbox/movies/movies.json");
// Declare the features we will use to build genre classification models
var genreFeatures = [
    { type: "text", source: "Movies", field: "Title" },
    { type: "text", source: "Movies", field: "Plot" },
    { type: "join", source: { store: "Movies", join: "Actor" } },
    { type: "join", source: { store: "Movies", join: "Director" } }
];
// Create a model for the Genres field, using all the movies as the training set
var genreModel = analytics.newBatchModel(Movies.recs,
    genreFeatures, Movies.field("Genres"));
// Predict genres of a new movie
var newMovie = qm.store("Movies").newRec({...});
var result = genreModel.predict(newMovie);
http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/movies.html
Example – TimeSeries.js
[Diagram: Raw store, Resampler, and Resampled store with Tick, EMA 1m, EMA 10m, and Delay aggregates]
http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/timeseries.html
Raw store:
Time                     Value
2012-01-08T22:00:18.623  1.26957
2012-01-08T22:00:18.950  1.26952
2012-01-08T22:00:19.310  1.26953
…                        …
Resampled store:
Time                 Value
2012-01-08T22:00:18  1.26957
2012-01-08T22:00:28  1.26947
2012-01-08T22:00:38  1.26956
…                    …
Stream aggregate outputs:
EMA1m    EMA10m
0.00000  0.000000
0.00000  0.000000
0.19490  0.020984
…        …
Example – TimeSeries.js
// Initialize resampler from Raw to Resampled store. This results
// in an equally spaced time series with a 10-second interval.
Raw.addStreamAggr({ name: "Resample10second", type: "resampler",
    outStore: "Resampled", timestamp: "Time",
    fields: [ { name: "Value", interpolator: "previous" } ],
    createStore: false, interval: 10 * 1000
});
// Initialize stream aggregates on Resampled store for computing
// 1 minute and 10 minute exponential moving averages.
Resampled.addStreamAggr({ name: "tick", type: "timeSeriesTick",
    timestamp: "Time", value: "Value" });
Resampled.addStreamAggr({ name: "ema1m", type: "ema",
    inAggr: "tick", emaType: "previous", interval: 60000, initWindow: 10000 });
Resampled.addStreamAggr({ name: "ema10m", type: "ema",
    inAggr: "tick", emaType: "previous", interval: 600000, initWindow: 10000
});
// Buffer for keeping track of the record from 1 minute ago
Resampled.addStreamAggr({ name: "delay", type: "recordBuffer", size: 6 });
http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/timeseries.html
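To show what resampling with a "previous" interpolator amounts to, the sketch below emits one point per interval and carries forward the most recent raw value. It is a simplified stand-alone function, not the resampler aggregate: resamplePrevious is a hypothetical name, timestamps are plain millisecond offsets, and the input points are illustrative.

```javascript
// Resampling sketch with "previous" interpolation (hypothetical function).
// points: [{ time: ms, value: number }] sorted by time; interval in ms.
function resamplePrevious(points, interval) {
    var out = [];
    if (points.length === 0) { return out; }
    var next = points[0].time;                 // first output timestamp
    var last = points[points.length - 1].time; // stop once past the last input
    var i = 0;
    while (next <= last) {
        // Advance to the latest input point at or before the output timestamp.
        while (i + 1 < points.length && points[i + 1].time <= next) { i++; }
        out.push({ time: next, value: points[i].value });
        next += interval;
    }
    return out;
}

// Illustrative raw ticks at irregular times.
var raw = [
    { time: 0, value: 1.26957 },
    { time: 4000, value: 1.26952 },
    { time: 12000, value: 1.26953 },
    { time: 21000, value: 1.26950 }
];
var resampled = resamplePrevious(raw, 10 * 1000);
console.log(resampled);
```

With a 10-second interval, six consecutive resampled records span one minute, which is consistent with the record buffer of size 6 used for the delay.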
Example – TimeSeries.js
// Declare features from the resampled time series
var ftrSpace = analytics.newFeatureSpace([
    { type: "numeric", source: "Resampled", field: "Value" },
    { type: "numeric", source: "Resampled", field: "Ema1" },
    { type: "numeric", source: "Resampled", field: "Ema2" },
    { type: "multinomial", source: "Resampled", field: "Time", datetime: true }
]);
// Initialize linear regression model.
var linreg = analytics.newRecLinReg({ dim: ftrSpace.dim, forgetFact: 0.9999 });
// Register a trigger on the Resampled store
Resampled.addTrigger({ onAdd: function (val) {
        // Get the latest value for EMAs
        val.Ema1 = Resampled.getStreamAggr("ema1m").EMA;
        val.Ema2 = Resampled.getStreamAggr("ema10m").EMA;
        // Get the id of the record from a minute ago.
        var trainRecId = Resampled.getStreamAggr("delay").last;
        // Update the model, once we have at least 1 minute worth of data
        linreg.learn(ftrSpace.ftrVec(Resampled[trainRecId]), val.Value);
    }
});
http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/timeseries.html
Example – linalg.js - CG
la.conjgrad = function (A, b, x) {
    var r = b.minus(A.multiply(x));
    var p = la.newVec(r); // clone
    var rsold = r.inner(r);
    for (var i = 0; i < 2 * x.length; i++) {
        var Ap = A.multiply(p);
        var alpha = rsold / Ap.inner(p);
        x = x.plus(p.multiply(alpha));
        r = r.minus(Ap.multiply(alpha));
        var rsnew = r.inner(r);
        console.say("resid = " + rsnew);
        if (Math.sqrt(rsnew) < 1e-6) {
            break;
        }
        p = r.plus(p.multiply(rsnew / rsold));
        rsold = rsnew;
    }
    return x;
};
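For readers without a QMiner installation, the same iteration can be tried on plain arrays. The version below is a sketch that mirrors the update steps above using hypothetical helpers (dot, matVec, axpy, conjGrad) instead of the la vector and matrix classes. It assumes A is symmetric positive definite and checks the residual at the top of the loop rather than with a break.

```javascript
// Conjugate gradient on plain arrays (hypothetical helpers, not the la module).
// A must be symmetric positive definite.
function dot(u, v) { return u.reduce(function (s, ui, i) { return s + ui * v[i]; }, 0); }
function matVec(A, v) { return A.map(function (row) { return dot(row, v); }); }
// Returns y + a * x as a new array.
function axpy(a, x, y) { return y.map(function (yi, i) { return yi + a * x[i]; }); }

function conjGrad(A, b, x0, tol) {
    var x = x0.slice();
    var r = axpy(-1, matVec(A, x), b);   // r = b - A x
    var p = r.slice();
    var rsold = dot(r, r);
    for (var i = 0; i < 2 * x.length && Math.sqrt(rsold) >= tol; i++) {
        var Ap = matVec(A, p);
        var alpha = rsold / dot(p, Ap);
        x = axpy(alpha, p, x);           // x = x + alpha p
        r = axpy(-alpha, Ap, r);         // r = r - alpha A p
        var rsnew = dot(r, r);
        p = axpy(rsnew / rsold, p, r);   // p = r + (rsnew / rsold) p
        rsold = rsnew;
    }
    return x;
}

// Small 2x2 system: 4x + y = 1, x + 3y = 2, with exact solution (1/11, 7/11).
var solution = conjGrad([[4, 1], [1, 3]], [1, 2], [0, 0], 1e-10);
console.log(solution);
```

In exact arithmetic the method converges in at most n steps for an n-by-n system, so the 2 * x.length cap leaves headroom for rounding error.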
Example – Twitter.js – AL
// Load tweets from a file (toy example)
var tweetsFile = "./sandbox/twitter/toytweets.txt";
var Tweets = qm.store("Tweets");
qm.load.jsonFile(Tweets, tweetsFile);
// Select all tweets
var recSet = Tweets.recs;
// Active learning settings: start svm when 2 positive and 2 negative examples are provided
var nPos = 2; var nNeg = 2; //active learning query mode
// Initial query for "relevant" documents
var relevantQuery = "nice bad";
// Create feature space
var ftrSpace = analytics.newFeatureSpace([
{ type: "text", source: "Tweets", field: "Text" },
]);
// Builds a new feature space
ftrSpace.updateRecords(recSet);
// Constructs the active learner
var AL = new analytics.activeLearner(ftrSpace, "Text", recSet, nPos, nNeg, relevantQuery);
// Starts the active learner (use the keyword stop to quit)
AL.selectQuestion();
// Save the model
AL.saveSvmModel(fs.openWrite('./sandbox/twitter/svmFilter.bin'));
http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/twitter.html
Example – Twitter.js : filtering
// Load the model from disk
var fin = fs.openRead("./sandbox/twitter/svmFilter.bin");
var svmFilter = analytics.loadSvmModel(fin);
// Filter relevant records: records are dropped unless svmFilter predicts a positive value
recSet.filter(function (rec) { return svmFilter.predict(ftrSpace.ftrSpVec(rec)) > 0; });
// Filter the record set by time
// Clone the record set twice
var recSet1 = recSet.clone();
var recSet2 = recSet.clone();
// Set the cutoff date
var tm = time.parse("2011-08-01T00:05:06");
// Get a record set with tweets older than tm
recSet1.filter(function (rec) { return rec.Date.timestamp < tm.timestamp; });
// Get a record set with tweets newer than tm
recSet2.filter(function (rec) { return rec.Date.timestamp > tm.timestamp; });
http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/twitter.html
Usage
Applications:
◦ Event registry
◦ Event Type classification
◦ News recommendation
◦ Web audience segmentation
Projects:
◦ XLike
◦ Sophocles
◦ SMER+
◦ Mobis
◦ ProaSense
◦ Symphony
Thank you!
https://github.com/qminer/qminer
http://qminer.ijs.si/

RadioAdProWritingCinderellabyButleri.pdf
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 

QMiner - Data analytics platform for processing large-scale real-time streams containing structured and unstructured data

  • 2. Overview
    QMiner is a data analytics platform for processing of large-scale real-time streams containing structured and unstructured data
    ◦ Connecting storage, indexing and analytics: direct conversions from storage to feature vectors and back
    ◦ Native support for unstructured (text, graphs) and streaming (time series, text streams) data
    ◦ Fast prototyping from data, to models, to web-service APIs
    Open-sourced under AGPL
    ◦ http://qminer.ijs.si/
    ◦ https://github.com/qminer/qminer
  • 3. Architecture
    Diagram of the QMiner Server components: Storage, Index, Feature Extractors, (Stream) Aggregates, Analytics, JavaScript API
  • 4. Storage and Index layer
    Simple storage system
    ◦ Requires predefined schema
    Implemented search indices:
    ◦ Inverted Index for indexing discrete values and text
    ◦ Geospatial Index for indexing geographic locations
    ◦ B-tree for indexing linearly ordered data types (to be included)
    ◦ Local Proximity Hashing used to answer nearest neighbour queries on high-dimensional data such as sparse vectors (to be included)
    NoSQL-like query language:
    ◦ MongoDB and Freebase JSON-like query languages
  • 5. Example schema definition
    {
      "name": "Movies",
      "fields": [
        { "name": "Title", "type": "string" },
        { "name": "Plot", "type": "string", "store" : "cache" },
        { "name": "Year", "type": "int" },
        { "name": "Rating", "type": "float" },
        { "name": "Genres", "type": "string_v", "codebook" : true }
      ],
      "joins": [
        { "name": "Actor", "type": "index", "store": "People", "inverse" : "ActedIn" },
        { "name": "Director", "type": "field", "store": "People", "inverse" : "Directed" }
      ],
      "keys": [
        { "field": "Title", "type": "value" },
        { "field": "Title", "name": "TitleTxt", "type": "text", "vocabulary" : "voc_01" },
        { "field": "Plot", "type": "text", "vocabulary" : "voc_01" },
        { "field": "Genres", "type": "value" }
      ]
    }
    https://github.com/qminer/qminer/wiki/Store-definition
  • 6. Query Language
    Selectors over indexed keys
    ◦ { $from: "Movies", $or: [{ Title: "lost" }, { Plot: "lost" }]}
    Probabilistic joins
    ◦ { $join: { $name: "Actor", $query: { $from: "Movies", Genres: "Horror"}}}
    Aggregates over results
    ◦ { name: "Plot", type: "keywords", field: "Plot" }
    ◦ { name: "Rating", type: "histogram", field: "Rating" }
    ◦ { name: "Genres", type: "count", field: "Genres" }
    https://github.com/qminer/qminer/wiki/Query-Language
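To make the `$or` selector above concrete, here is a minimal sketch in plain JavaScript of how such a query could be answered from an inverted index. It is not QMiner's implementation: the helper names `buildIndex` and `evalOr`, the tokenisation rule, and the toy records are all assumptions made for illustration.

```javascript
// Hypothetical toy records; not part of the QMiner distribution.
const movies = [
  { Title: "Lost Highway", Plot: "A musician is accused of murder" },
  { Title: "Paradise", Plot: "A lost child finds a way home" },
  { Title: "Every Day", Plot: "A family copes with change" }
];

// Build an inverted index: field -> lowercased token -> Set of record ids.
function buildIndex(records, fields) {
  const index = {};
  for (const field of fields) {
    index[field] = new Map();
    records.forEach((rec, id) => {
      for (const token of String(rec[field]).toLowerCase().split(/\W+/)) {
        if (!token) continue;
        if (!index[field].has(token)) index[field].set(token, new Set());
        index[field].get(token).add(id);
      }
    });
  }
  return index;
}

// Evaluate { $or: [{ Field: "term" }, ...] } as a union of posting lists.
function evalOr(index, clauses) {
  const hits = new Set();
  for (const clause of clauses) {
    for (const [field, term] of Object.entries(clause)) {
      // Unindexed fields and unknown terms simply contribute no hits.
      const postings = (index[field] || new Map()).get(term.toLowerCase()) || [];
      for (const id of postings) hits.add(id);
    }
  }
  return [...hits].sort((a, b) => a - b);
}

const index = buildIndex(movies, ["Title", "Plot"]);
const result = evalOr(index, [{ Title: "lost" }, { Plot: "lost" }]);
console.log(result.map((id) => movies[id].Title));
```

The union of posting lists is the natural reading of `$or`; an `$and` selector would intersect them instead.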
  • 7. Example: Twitter search “beer”
    ◦ drinking, day, tonight, time, good, night, lol, mate, lovely, haha, christmas, work, home, ll, nice, yeah, food, back, today, feel, curry, wine, football, pint, opener, watch
    ◦ beer, perfect, cheers, yolo, merrychristmas, fb, christmas, photo, camrgb, bliss, coyi, decent, lad, nightclubfails, coyg, superbowl, suffolk, buzzing, curry, vodka, becauseican, hangoverinthemorning
  • 8. Example: Twitter search “hangover”
    ◦ cure, day, feeling, drink, night, good, work, year, today, morning, haha, worst, love, tomorrow, time, christmas, bad, wake, food, bed, drunk
    ◦ hangover, winning, happynewyear, perfect, food, nye, notfair, toooldforthisshit, dedication, sick, fucked, badtimes, backtobed, goodnight, yay, ouch, beer, fresh, dying, bed, death
  • 9. Aggregators
    Batch mode
    ◦ Work on static record sets and produce a one-time result
    ◦ Accessible via the query language
    Streaming mode (Stream Aggregators)
    ◦ Updated in real time as new data is added to the storage layer
    ◦ Can be composed into pipelines
    Integrated stream aggregators:
    ◦ Time series indicators (MA, EMA, double EMA, …)
    ◦ Resampling of input stream
    ◦ Merging of two or more input streams
    ◦ Delay
    ◦ …
    Pipeline diagram labels: Store, Tick, MA, EMA, dEMA
    https://github.com/qminer/qminer/wiki/Stream-Aggregates
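As an illustration of a streaming aggregate that is updated on every new record, the following plain-JavaScript sketch keeps an exponential moving average over irregularly spaced ticks. The function `makeEma` and its decay rule `exp(-dt / interval)` are assumptions chosen for this example; QMiner's own EMA aggregate may weight samples differently.

```javascript
// Minimal time-aware exponential moving average (EMA).
function makeEma(intervalMs) {
  let value = null;    // current EMA, null until the first tick arrives
  let lastTime = null; // timestamp of the previous tick in milliseconds
  return {
    onTick(timeMs, x) {
      if (value === null) {
        value = x; // initialise with the first observation
      } else {
        // The weight of the old EMA decays with the time elapsed since the last tick.
        const decay = Math.exp(-(timeMs - lastTime) / intervalMs);
        value = decay * value + (1 - decay) * x;
      }
      lastTime = timeMs;
      return value;
    },
    get() { return value; }
  };
}

// A small pipeline: every tick feeds both a 1-minute and a 10-minute EMA.
const ema1m = makeEma(60 * 1000);
const ema10m = makeEma(10 * 60 * 1000);
const ticks = [[0, 1.26957], [10000, 1.26947], [20000, 1.26956]];
for (const [t, v] of ticks) {
  ema1m.onTick(t, v);
  ema10m.onTick(t, v);
}
console.log(ema1m.get(), ema10m.get());
```

Because each aggregate only stores its last value and timestamp, memory use stays constant no matter how long the stream runs.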
  • 10. Feature Extractors
    Mappings from data records to (sparse) feature vectors
    ◦ Defined using a declarative language
    ◦ Work on stream data
    Built-in functionality for extraction of features:
    ◦ Numeric, Categorical, Multinomial, Bag-of-Words, Join, Pair
    ◦ Include all Glib text processing machinery (stemmer, stop-words, hashing)
    https://github.com/qminer/qminer/wiki/Feature-Extractors
  • 11. Example
    Feature extractors:
    ◦ { type: "text", source: "Movies", field: "Title" }
    ◦ { type: "text", source: "Movies", field: "Plot" }
    ◦ { type: "multinomial", source: "Movies", field: "Genres" }
    ◦ { type: "join", source: { store: "Movies", join: "Actor" }}
    Diagram labels: Title, Body, Genres, Actors
    Example record:
    {
      "Title": "Every Day",
      "Plot": "This day really isn't all that different than...",
      "Year": 2010,
      "Rating": 5.6,
      "Genres": [ "Comedy", "Drama" ],
      "Director": { "Name": "Levine Richard (III)", "Gender": "Male" },
      "Actor": [ { "Name": "Beetem Chris", "Gender": "Male" }, ... ]
    }
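The mapping from a record to a sparse feature vector can be sketched in a few lines of plain JavaScript. This is a standalone illustration, not the QMiner feature space: `makeFeatureSpace` is a hypothetical helper, only the "text" and "multinomial" types are handled, and stemming, stop-words and hashing are left out.

```javascript
// Sketch of a feature space mapping records to sparse vectors.
function makeFeatureSpace(extractors) {
  const dims = new Map(); // "field:token" -> column index
  const keysOf = (rec) => {
    const keys = [];
    for (const ex of extractors) {
      const raw = rec[ex.field];
      const tokens = ex.type === "text"
        ? String(raw).toLowerCase().split(/\W+/).filter(Boolean)
        : raw; // "multinomial": the field already holds an array of labels
      for (const tok of tokens) keys.push(ex.field + ":" + tok);
    }
    return keys;
  };
  return {
    // Grow the vocabulary from a set of records.
    update(records) {
      for (const rec of records)
        for (const key of keysOf(rec))
          if (!dims.has(key)) dims.set(key, dims.size);
    },
    // Sparse vector as sorted [index, count] pairs; unseen tokens are skipped.
    ftrSpVec(rec) {
      const counts = new Map();
      for (const key of keysOf(rec)) {
        if (!dims.has(key)) continue;
        const idx = dims.get(key);
        counts.set(idx, (counts.get(idx) || 0) + 1);
      }
      return [...counts.entries()].sort((a, b) => a[0] - b[0]);
    },
    get dim() { return dims.size; }
  };
}

const ftrSpace = makeFeatureSpace([
  { type: "text", field: "Title" },
  { type: "multinomial", field: "Genres" }
]);
const movie = { Title: "Every Day", Genres: ["Comedy", "Drama"] };
ftrSpace.update([movie]);
console.log(ftrSpace.dim, ftrSpace.ftrSpVec(movie));
```

Each extractor contributes its own block of columns, which is why text from different fields never collides even when the words are identical.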
  • 12. Analytics – Linear Algebra
    ◦ Wraps parts of the C++ linalg library. Most functions can benefit from high-performance libraries such as Intel MKL or OpenBLAS.
    ◦ Computationally light parts and glue scripts can be implemented directly in JS (examples: conjugate gradient, counting the nonzero elements of sparse matrices)
    ◦ Five main classes: la (linear algebra), full (dense) vectors and matrices, and sparse vectors and matrices.
    ◦ Supported functionality enables constructing elements in various ways, computing linear combinations, multiplication, transposition, norm computations, ...
    ◦ Some important building blocks are also exposed: large-scale SVD (dense, sparse) and solving linear systems (LU decomposition for dense systems, conjugate gradient for symmetric positive definite matrices)
  • 13. Analytics – Learning
    Works on top of extracted features
    Implemented techniques:
    ◦ Classification:
      ◦ SVM (batch)
      ◦ Perceptron (updates)
      ◦ Hoeffding trees (updates)
      ◦ Active learning (uncertainty sampling + SVM)
    ◦ Regression:
      ◦ SVMR (batch)
      ◦ Ridge regression (batch)
      ◦ Ridge regression (updates)
    ◦ Clustering:
      ◦ k-means (batch)
      ◦ Lloyd algorithm (updates)
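The difference between "batch" and "updates" in the list above is easiest to see on the perceptron, which changes its weights one example at a time. The sketch below is a textbook online perceptron in plain JavaScript; `makePerceptron`, the learning rate and the toy stream are assumptions for illustration and do not reflect QMiner's internal code.

```javascript
// Online perceptron: the weight vector is updated one example at a time.
function makePerceptron(dim, learnRate = 1.0) {
  const w = new Array(dim).fill(0);
  let bias = 0;
  const score = (x) => x.reduce((s, xi, i) => s + w[i] * xi, bias);
  return {
    predict: (x) => (score(x) > 0 ? 1 : -1),
    // y must be +1 or -1; weights change only when the example is misclassified.
    update(x, y) {
      if (y * score(x) <= 0) {
        for (let i = 0; i < dim; i++) w[i] += learnRate * y * x[i];
        bias += learnRate * y;
      }
    }
  };
}

// Toy linearly separable stream: the label is +1 when the first feature dominates.
const stream = [
  [[2, 0], 1], [[0, 2], -1], [[3, 1], 1], [[1, 3], -1]
];
const model = makePerceptron(2);
for (let pass = 0; pass < 10; pass++)
  for (const [x, y] of stream) model.update(x, y);
console.log(stream.map(([x]) => model.predict(x)));
```

A batch learner such as an SVM would instead need the whole training set before producing a model, which is why the update-style learners suit streaming stores.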
  • 14. JavaScript API
    Major functionality exposed via JavaScript API
    ◦ Using Google V8 JavaScript engine
    ◦ Current status: more than 20 objects and 300 functions
    Exposed APIs
    ◦ Data layer: storage, indexing, retrieval
    ◦ Linear algebra: full and sparse vector and matrix, matrix operations
    ◦ Learning algorithms: supervised, unsupervised, active learning
    ◦ Stream aggregates: definition, access to real-time values
    ◦ Input/Output: file system, web services (easy RESTful APIs)
    Documentation:
    ◦ https://github.com/qminer/qminer/wiki/JavaScript
  • 15. Installation
    Installation:
    ◦ git clone https://github.com/qminer/qminer.git
    ◦ cd qminer
    ◦ make lib
    ◦ make
    ◦ ./test/javascript/test.sh
    Main build results (qminer/build):
    ◦ qm: QMiner executable
    ◦ *.js: QMiner JavaScript support functions
    ◦ gui/: administration GUI
    ◦ lib/: available JavaScript libraries (can be included using 'require')
    Environment variable:
    ◦ QMINER_HOME=($QMINER)/build
  • 16. Quick start
    Configure:
    ◦ qm config -port=8080
    Initialize storage according to the provided schema:
    ◦ qm create -def=schema.def
    Start QMiner:
    ◦ qm start
    ◦ qm start -noserver
    ◦ qm start -rdonly
    Stop QMiner:
    ◦ qm stop
  • 17. Documentation
    ◦ Home
    ◦ Quick Start
      ◦ Linux Installation
      ◦ Windows Installation
    ◦ Example
    ◦ JavaScript API
    ◦ Store Definition
    ◦ Query Language
    ◦ Stream Aggregates
    ◦ Feature Extractors
    ◦ Configuration
    ◦ Restore and Failover
  • 18. Example – Movies.js
    // Import analytics module
    var analytics = require("analytics.js");
    // Get a handle to the Movies store (the slide uses Movies without declaring it)
    var Movies = qm.store("Movies");
    // Load the dataset
    qm.load.jsonFile(Movies, "./sandbox/movies/movies.json");
    // Declare the features we will use to build genre classification models
    var genreFeatures = [
        { type: "text", source: "Movies", field: "Title" },
        { type: "text", source: "Movies", field: "Plot" },
        { type: "join", source: { store: "Movies", join: "Actor" } },
        { type: "join", source: { store: "Movies", join: "Director" } }
    ];
    // Create a model for the Genres field, using all the movies as the training set
    var genreModel = analytics.newBatchModel(Movies.recs, genreFeatures, Movies.field("Genres"));
    // Predict genres of a new movie
    var newMovie = qm.store("Movies").newRec({...});
    var result = genreModel.predict(newMovie);
    http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/movies.html
  • 19. Example – TimeSeries.js
    Pipeline diagram labels: Raw store, Resampler, Resampled store, Tick, EMA 1m, EMA 10m, Delay
    Raw store:
      Time                      Value
      2012-01-08T22:00:18.623   1.26957
      2012-01-08T22:00:18.950   1.26952
      2012-01-08T22:00:19.310   1.26953
      …                         …
    Resampled store:
      Time                  Value
      2012-01-08T22:00:18   1.26957
      2012-01-08T22:00:28   1.26947
      2012-01-08T22:00:38   1.26956
      …                     …
    EMA1m: 0.00000, 0.00000, 0.19490, …
    EMA10m: 0.000000, 0.000000, 0.020984, …
    http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/timeseries.html
  • 20. Example – TimeSeries.js
    // Initialize resampler from Raw to Resampled store. This results
    // in an equally spaced time series with a 10 second interval.
    Raw.addStreamAggr({
        name: "Resample10second", type: "resampler",
        outStore: "Resampled", timestamp: "Time",
        fields: [ { name: "Value", interpolator: "previous" } ],
        createStore: false, interval: 10 * 1000
    });
    // Initialize stream aggregates on Resampled store for computing
    // 1 minute and 10 minute exponential moving averages.
    Resampled.addStreamAggr({
        name: "tick", type: "timeSeriesTick",
        timestamp: "Time", value: "Value"
    });
    Resampled.addStreamAggr({
        name: "ema1m", type: "ema", inAggr: "tick",
        emaType: "previous", interval: 60000, initWindow: 10000
    });
    Resampled.addStreamAggr({
        name: "ema10m", type: "ema", inAggr: "tick",
        emaType: "previous", interval: 600000, initWindow: 10000
    });
    // Buffer for keeping track of the record from 1 minute ago
    Resampled.addStreamAggr({ name: "delay", type: "recordBuffer", size: 6 });
    http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/timeseries.html
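What the "previous" interpolator does during resampling can be shown without QMiner. The sketch below, in plain JavaScript, places an irregular series onto a fixed grid by carrying the latest raw value forward. The function `resamplePrevious`, the choice to start the grid at the first raw timestamp, and the sample data are assumptions for illustration only.

```javascript
// Resample an irregular series onto a fixed grid using "previous" interpolation:
// each grid point takes the value of the latest raw sample at or before it.
function resamplePrevious(samples, intervalMs) {
  if (samples.length === 0) return [];
  const out = [];
  const end = samples[samples.length - 1].time;
  let i = 0; // index of the latest raw sample with time <= current grid point
  for (let t = samples[0].time; t <= end; t += intervalMs) {
    while (i + 1 < samples.length && samples[i + 1].time <= t) i++;
    out.push({ time: t, value: samples[i].value });
  }
  return out;
}

// Hypothetical raw ticks with timestamps in milliseconds.
const raw = [
  { time: 0, value: 1.0 },
  { time: 4000, value: 2.0 },
  { time: 12000, value: 3.0 },
  { time: 31000, value: 4.0 }
];
console.log(resamplePrevious(raw, 10 * 1000));
```

With a 10 second grid, six consecutive resampled records span one minute, which matches the buffer size of 6 used for the delay aggregate in the slide.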
  • 21. Example – TimeSeries.js
    // Declare features from the resampled time series
    var ftrSpace = analytics.newFeatureSpace([
        { type: "numeric", source: "Resampled", field: "Value" },
        { type: "numeric", source: "Resampled", field: "Ema1" },
        { type: "numeric", source: "Resampled", field: "Ema2" },
        { type: "multinomial", source: "Resampled", field: "Time", datetime: true }
    ]);
    // Initialize linear regression model
    var linreg = analytics.newRecLinReg({ dim: ftrSpace.dim, forgetFact: 0.9999 });
    // Register a trigger on the Resampled store
    Resampled.addTrigger({
        onAdd: function (val) {
            // Get the latest values of the EMAs
            val.Ema1 = Resampled.getStreamAggr("ema1m").EMA;
            val.Ema2 = Resampled.getStreamAggr("ema10m").EMA;
            // Get the id of the record from a minute ago
            var trainRecId = Resampled.getStreamAggr("delay").last;
            // Update the model, once we have at least 1 minute worth of data
            linreg.learn(ftrSpace.ftrVec(Resampled[trainRecId]), val.Value);
        }
    });
    http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/timeseries.html
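A linear regression that learns one record at a time with a forgetting factor is commonly implemented as recursive least squares (RLS). The following plain-JavaScript sketch shows that technique under stated assumptions: `makeRecLinReg` and the `regFact` initialisation are hypothetical names, and whether QMiner's `newRecLinReg` uses exactly this update is not confirmed by the slides.

```javascript
// Recursive least squares (RLS) with a forgetting factor.
function makeRecLinReg(dim, forgetFact = 1.0, regFact = 1e6) {
  const w = new Array(dim).fill(0);
  // P approximates the inverse of the weighted feature covariance matrix.
  const P = Array.from({ length: dim }, (_, i) =>
    Array.from({ length: dim }, (_, j) => (i === j ? regFact : 0)));
  const dot = (a, b) => a.reduce((s, ai, i) => s + ai * b[i], 0);
  return {
    predict: (x) => dot(w, x),
    learn(x, y) {
      const Px = P.map((row) => dot(row, x));  // P * x
      const denom = forgetFact + dot(x, Px);   // lambda + x' P x
      const gain = Px.map((v) => v / denom);   // gain vector
      const err = y - dot(w, x);               // prediction error before the update
      for (let i = 0; i < dim; i++) w[i] += gain[i] * err;
      // P <- (P - gain * x' P) / lambda; P is symmetric, so x' P equals (P x)'.
      for (let i = 0; i < dim; i++)
        for (let j = 0; j < dim; j++)
          P[i][j] = (P[i][j] - gain[i] * Px[j]) / forgetFact;
    },
    weights: () => w.slice()
  };
}

// Learn y = 2*x1 - 3*x2 + 1 from a deterministic stream (last feature is a constant bias).
const linreg = makeRecLinReg(3, 0.9999);
for (let k = 0; k < 200; k++) {
  const x = [Math.sin(k), Math.cos(2 * k), 1];
  linreg.learn(x, 2 * x[0] - 3 * x[1] + 1);
}
console.log(linreg.weights());
```

A forgetting factor slightly below 1 down-weights old records geometrically, so the model can follow slow drift in the series while still averaging over many samples.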
  • 22. Example – linalg.js - CG
    la.conjgrad = function (A, b, x) {
        var r = b.minus(A.multiply(x));
        var p = la.newVec(r); // clone
        var rsold = r.inner(r);
        for (var i = 0; i < 2 * x.length; i++) {
            var Ap = A.multiply(p);
            var alpha = rsold / Ap.inner(p);
            x = x.plus(p.multiply(alpha));
            r = r.minus(Ap.multiply(alpha));
            var rsnew = r.inner(r);
            console.say("resid = " + rsnew);
            if (Math.sqrt(rsnew) < 1e-6) {
                break;
            }
            p = r.plus(p.multiply(rsnew / rsold));
            rsold = rsnew;
        }
        return x;
    }
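To try the conjugate gradient iteration outside QMiner, the same steps can be written over plain arrays. This is a sketch under stated assumptions: the helpers `dot`, `matVec`, `axpy` and the function `conjGrad` are names introduced here, and the 2x2 system is a made-up symmetric positive definite example.

```javascript
// Dense helpers on plain arrays.
const dot = (a, b) => a.reduce((s, ai, i) => s + ai * b[i], 0);
const matVec = (A, v) => A.map((row) => dot(row, v));
const axpy = (a, x, y) => x.map((xi, i) => a * xi + y[i]); // a*x + y

// Conjugate gradient for a symmetric positive definite matrix A.
function conjGrad(A, b, x0, tol = 1e-10) {
  let x = x0.slice();
  let r = axpy(-1, matVec(A, x), b); // r = b - A x
  let p = r.slice();
  let rsold = dot(r, r);
  // Stop once the residual norm drops below tol or after 2n iterations.
  for (let i = 0; i < 2 * x.length && Math.sqrt(rsold) >= tol; i++) {
    const Ap = matVec(A, p);
    const alpha = rsold / dot(Ap, p);
    x = axpy(alpha, p, x);
    r = axpy(-alpha, Ap, r);
    const rsnew = dot(r, r);
    p = axpy(rsnew / rsold, p, r);
    rsold = rsnew;
  }
  return x;
}

// Solve 4x + y = 1, x + 3y = 2; the exact solution is x = 1/11, y = 7/11.
const A = [[4, 1], [1, 3]];
const b = [1, 2];
console.log(conjGrad(A, b, [0, 0]));
```

Checking the residual at the top of the loop, rather than after the update, also avoids a division by zero when the starting point already solves the system.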
  • 23. Example – Twitter.js – AL
    // Load tweets from a file (toy example)
    var tweetsFile = "./sandbox/twitter/toytweets.txt";
    var Tweets = qm.store("Tweets");
    qm.load.jsonFile(Tweets, tweetsFile);
    // Select all tweets
    var recSet = Tweets.recs;
    // Active learning settings: start SVM when 2 positive and 2 negative examples are provided
    var nPos = 2; var nNeg = 2; // active learning query mode
    // Initial query for "relevant" documents
    var relevantQuery = "nice bad";
    // Create feature space
    var ftrSpace = analytics.newFeatureSpace([
        { type: "text", source: "Tweets", field: "Text" },
    ]);
    // Builds a new feature space
    ftrSpace.updateRecords(recSet);
    // Constructs the active learner
    var AL = new analytics.activeLearner(ftrSpace, "Text", recSet, nPos, nNeg, relevantQuery);
    // Starts the active learner (use the keyword stop to quit)
    AL.selectQuestion();
    // Save the model
    AL.saveSvmModel(fs.openWrite('./sandbox/twitter/svmFilter.bin'));
    http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/twitter.html
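The uncertainty sampling step behind an active learner can be sketched independently of QMiner: given a linear decision function, ask the user about the unlabeled example that lies closest to the decision boundary. The names `decision` and `selectQuestion` below, the weights and the candidate pool are assumptions for illustration, not the `analytics.activeLearner` internals.

```javascript
// Linear decision function f(x) = w . x + b.
const decision = (w, b, x) => x.reduce((s, xi, i) => s + w[i] * xi, b);

// Uncertainty sampling: pick the unlabeled example closest to the decision boundary.
function selectQuestion(w, b, unlabeled) {
  let best = -1;
  let bestMargin = Infinity;
  unlabeled.forEach((x, i) => {
    const margin = Math.abs(decision(w, b, x));
    if (margin < bestMargin) { bestMargin = margin; best = i; }
  });
  return best; // -1 when there is nothing left to label
}

// Hypothetical model weights and candidate pool.
const w = [1, -1];
const pool = [[3, 0], [1, 0.9], [0, 4]];
console.log(selectQuestion(w, 0, pool));
```

Labeling the most ambiguous example first tends to move the boundary more than labeling one the model is already sure about, which is why few labels are needed to get a usable filter.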
  • 24. Example – Twitter.js : filtering
    // Load the model from disk
    var fin = fs.openRead("./sandbox/twitter/svmFilter.bin");
    var svmFilter = analytics.loadSvmModel(fin);
    // Filter relevant records: records are dropped if svmFilter predicts a negative value
    recSet.filter(function (rec) { return svmFilter.predict(ftrSpace.ftrSpVec(rec)) > 0; });
    // Filter the record set by time
    // Clone the record set two times
    var recSet1 = recSet.clone();
    var recSet2 = recSet.clone();
    // Set the cutoff date
    var tm = time.parse("2011-08-01T00:05:06");
    // Get a record set with tweets older than tm
    recSet1.filter(function (rec) { return rec.Date.timestamp < tm.timestamp; });
    // Get a record set with tweets newer than tm
    recSet2.filter(function (rec) { return rec.Date.timestamp > tm.timestamp; });
    http://htmlpreview.github.io/?https://raw.github.com/qminer/qminer/master/docjs/twitter.html
  • 25. Usage
    Applications:
    ◦ Event registry
    ◦ Event Type classification
    ◦ News recommendation
    ◦ Web audience segmentation
    Projects:
    ◦ XLike
    ◦ Sophocles
    ◦ SMER+
    ◦ Mobis
    ◦ ProaSense
    ◦ Symphony
  • 26. Thank you!
    ◦ https://github.com/qminer/qminer
    ◦ http://qminer.ijs.si/