Traditional Big Data is done on data you have: you load the data into a repository and run map-reduce or similar calculations over it. However, certain industries need to perform complex operations on data they might not have. Data you can acquire, data that can be shared with you, and data that you can model are all types of data you may not have but may need to integrate instantly into a complex analysis. The problem is that you may not even know you need this data until deep into the execution stack at runtime. This talk discusses a new functional language paradigm for dealing naturally with data you don’t have, and for making all data first-class citizens regardless of whether you have it or not. We will also give a demo of a project written in Scala that deals with exactly this issue.
4. Information Classification: GENERAL
THERE ARE FOUR SOURCES OF DATA
• Data I have (traditional “Big Data”)
• Data I can model
• Data I can acquire
• Data someone else can acquire or model
MSCI PLATFORM – A NEXT GENERATION LEAP
[Diagram: the two paradigms side by side]
Traditional Big Data “Data you Have” Paradigm: Big Data Repository (Hadoop / Cloudera etc.), Slice/Dice
NEW Big Data Paradigm: Beon, New Front End, Calculation and Data Services, On Demand Data, Expressions, The Morning Load, Virtual fields, Dynamic new data
WHAT IS A COMPLEX QUESTION VERSUS A SPECIFIC QUESTION?
Specific questions can be hard, for example:
• What happens to sea level if the temperature goes up 1.5 degrees by 2035?
• What properties are on the beach and over x meters above sea level in Marbella?
• What are the biggest real estate bargains in a portfolio?
Complex questions are combinations of specific questions:
• What should I buy if I believe that temperatures are going to rise 1.5 degrees by 2035 and I only want property that will still be on the beach but at least 1 meter above sea level in 2035?
HOW TO ANSWER A COMPLEX QUESTION
To answer a complex question, you need something that can evaluate this:
Let Portfolio = All the houses in Marbella
safeHouses = Filter (SeaLevel >= 1.0 + seaLevelRise(1.5 °C)) Portfolio
BestBargains = BargainFinder safeHouses
It does this by calling the services below for certain calculations.
• Platform: executes the question above, filtering, etc.
• Marbella Houses: house database
• Planet Simulator: sea level rise
• Bargain Finder
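The composition above can be sketched in Python as follows. This is only an illustration, not the platform's actual API: the `House` record, the three service functions, the sample houses, and the 0.4 m per degree figure are all hypothetical stand-ins for the Marbella Houses, Planet Simulator, and Bargain Finder services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class House:
    name: str
    elevation_m: float       # current height above sea level, in meters
    price: float
    estimated_value: float


def marbella_houses():
    """Stand-in for the 'Marbella Houses' service (made-up data)."""
    return [
        House("Casa A", 0.8, 900_000, 1_000_000),
        House("Casa B", 2.5, 700_000, 1_000_000),
        House("Casa C", 3.0, 950_000, 1_000_000),
    ]


def sea_level_rise(delta_celsius):
    """Stand-in for the 'Planet Simulator' service; the coefficient is invented."""
    return 0.4 * delta_celsius


def bargain_finder(houses):
    """Stand-in for the 'Bargain Finder' service: biggest discount to estimated value first."""
    return sorted(houses, key=lambda h: h.estimated_value - h.price, reverse=True)


# The complex question is a composition of the three specific ones.
portfolio = marbella_houses()
threshold_m = 1.0 + sea_level_rise(1.5)
safe_houses = [h for h in portfolio if h.elevation_m >= threshold_m]
best_bargains = bargain_finder(safe_houses)

print([h.name for h in best_bargains])
```

The point of the sketch is that the caller never needs to hold the sea level data itself; it only names the service that can produce it.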
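$\mathrm{map}\ x \quad \text{where } x : \Gamma \to \mathbb{R}$
This is standard map reduce: $\Gamma$ is your class, object, structure, or whatever thing you are mapping over.
Just for simplicity, let's assume we only care about real numbers (we could equally have tuples, strings, dictionaries, or any valid type).
First things first, we need a context:
$\mathrm{map}\ x \quad \text{where } x : \{\Gamma, \mathbb{C}\} \to \{\mathbb{R}, \mathbb{C}\}$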
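As a rough illustration of the difference between the two signatures, the sketch below maps a plain function and a context-aware function over the same records. The `Context` fields and the position records are assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Context:
    as_of: date
    currency: str


def plain_value(position):
    # x : Gamma -> R
    return position["quantity"] * position["price"]


def value_in_context(position, ctx):
    # x : {Gamma, C} -> {R, C}; the result carries the context it was computed in
    return position["quantity"] * position["price"], ctx


positions = [{"quantity": 2, "price": 10.0}, {"quantity": 1, "price": 20.0}]
ctx = Context(as_of=date(2019, 5, 15), currency="USD")

plain = list(map(plain_value, positions))
tagged = [value_in_context(p, ctx) for p in positions]
```

In the second form every number stays attached to the context it was produced in, so it cannot silently be combined with a number from a different context.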
Yesterday: my portfolio is worth $43, or €35.83.
Today: my portfolio is worth $40, or €36.36.
Result in dollars: I lost $3, and $3 / 1.1 = €2.73 lost.
Result in euros: I made €0.53.
The two results contradict each other because “I lost $3” is a lie. You DID NOT LOSE $3.
The answer is “I have made or lost ($40 in today's context - $43 in yesterday's context)”.
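The arithmetic can be checked with a small sketch in which each value keeps the exchange rate of the context it was observed in. The `Valued` class is hypothetical, and yesterday's rate of 1.2 dollars per euro is not stated on the slide; it is inferred from $43 being worth about €35.83.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Valued:
    amount_usd: float
    usd_per_eur: float   # exchange rate in the context where the value was observed

    def in_eur(self):
        return self.amount_usd / self.usd_per_eur


yesterday = Valued(43.0, 1.2)   # rate inferred, see above
today = Valued(40.0, 1.1)

# Naive: subtract the dollar amounts first, then convert at today's rate.
naive_eur = (today.amount_usd - yesterday.amount_usd) / today.usd_per_eur

# Contextual: convert each value in its own context, then subtract.
contextual_eur = today.in_eur() - yesterday.in_eur()

print(round(naive_eur, 2))        # -2.73
print(round(contextual_eur, 2))   # 0.53
```

The two numbers differ in sign because the naive version drops yesterday's context before doing the subtraction.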
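Now we also toss in some services:
$\mathrm{map}\ x \quad \text{where } x : \{\Gamma, \mathbb{C}\} \to \{\mathbb{R}, \mathbb{C}\}$
becomes
$\mathrm{map}\ x \quad \text{where } x : \{\Gamma, S, \mathbb{C}\} \to \{\mathbb{R}, \mathbb{C}\}$
where $S = \{S_1, S_2, \ldots, S_n\}$ are our services.
But what are our services? This is a functional language conference, so we use functions to access services:
$\text{let } F = \{F_{i,j}\} \text{ for all } i, j, \text{ with } F_{i,j} : \{\Gamma, S_1, S_2, \ldots, S_i, \mathbb{C}\} \to \{\mathbb{R}, \mathbb{C}\}$
$\mathrm{map}\ x \quad \text{where } x : \{\Gamma, F, \mathbb{C}\} \to \{\mathbb{R}, \mathbb{C}\}$
So new services can leverage old services.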
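A minimal sketch of services exposed as functions, where a newer service is written in terms of older ones. The service names, the record layout, and the dictionary-based context are all assumptions for illustration only.

```python
def fx_rate(item, ctx):
    """Hypothetical older service: dollars per euro in the given context."""
    return ctx["usd_per_eur"], ctx


def price_usd(item, ctx):
    """Hypothetical older service: the item's price in dollars."""
    return item["price_usd"], ctx


def price_eur(item, ctx):
    """A newer service that leverages the two older ones."""
    usd, ctx = price_usd(item, ctx)
    rate, ctx = fx_rate(item, ctx)
    return usd / rate, ctx


# F: the function library through which services are reached.
F = {"fx_rate": fx_rate, "price_usd": price_usd, "price_eur": price_eur}


def report(item, services, ctx):
    # x : {Gamma, F, C} -> {R, C}
    value, ctx = services["price_eur"](item, ctx)
    return round(value, 2), ctx


def map_with_services(x, items, services, ctx):
    return [x(item, services, ctx) for item in items]


ctx = {"usd_per_eur": 1.1}
results = map_with_services(report, [{"price_usd": 40.0}], F, ctx)
print(results)
```

Because every service has the same shape, taking an item and a context and returning a value and a context, composing them needs no special machinery.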
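$\mathrm{map}\ x \quad \text{where } x : \{\Gamma, F, \mathbb{C}\} \to \{\mathbb{R}, \mathbb{C}\}$
$z : \bigoplus_{i = 1 \ldots m} \{\Gamma, F, \mathbb{C}\} \to \{\mathbb{R}, \mathbb{C}\}_{k = 1 \ldots n}$
Data You Have, Data You Can Acquire, Data You Can Model
Obvious extensions…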
$\mathrm{map}\ x \quad \text{where } x : \{\Gamma, F, \mathbb{C}\} \to \{\mathbb{R}, \mathbb{C}\}$
$\{\Gamma, F, \mathbb{C}\}$ is an abstract object space, but there are others.
Example (Customer Space):
• Γ = Customer Records
• F = purchasesOfWine(tenor)
• ℂ = Date
Example (Country Space):
• Γ = CountryList + wineSales
• F = weather(), totalWineSales(tenor)
• ℂ = Date, weather
TRANSFORM: Customer Space → Country Space
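Customer Space, $\{\Gamma_1, F_1, \mathbb{C}_1\}$:
Customer | Location | Wine purchasesOfWine(tenor)
Bob | Spain | 1/1/2019 – 3 btl; 15/3/2019 – 2 btl
Mary | France | 15/1/2019 – 2 btl
Juan | Spain | 12/5/2019 – 6 btl
Edward | England | 13/4/2019 – 8 btl
TRANSFORM $\mathcal{T}_1$
Country Space, $\{\Gamma_2, F_2, \mathbb{C}_2\}$:
Country | Purchases totalWineSales(tenor) | Weather()
Spain | 11 bottles |
France | 2 bottles |
England | 8 bottles |
$\{\Gamma_2, F_2, \mathbb{C}_2\} = \mathcal{T}_1 \circ \{\Gamma_1, F_1, \mathbb{C}_1\}$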
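The transform between the two spaces can be sketched as a plain aggregation. The record layout and the function name `to_country_space` are assumptions; the sketch ignores the tenor argument and the Weather() column and only reproduces the bottle totals from the tables.

```python
from collections import defaultdict

# Customer space: one record per customer, with (date, bottles) purchases.
customers = [
    {"customer": "Bob", "location": "Spain",
     "purchases": [("1/1/2019", 3), ("15/3/2019", 2)]},
    {"customer": "Mary", "location": "France",
     "purchases": [("15/1/2019", 2)]},
    {"customer": "Juan", "location": "Spain",
     "purchases": [("12/5/2019", 6)]},
    {"customer": "Edward", "location": "England",
     "purchases": [("13/4/2019", 8)]},
]


def to_country_space(customer_rows):
    """Re-key customer records by country and total the bottles purchased."""
    totals = defaultdict(int)
    for row in customer_rows:
        totals[row["location"]] += sum(bottles for _, bottles in row["purchases"])
    return dict(totals)


country_totals = to_country_space(customers)
print(country_totals)   # {'Spain': 11, 'France': 2, 'England': 8}
```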
rootVFP
|> scenario (asOf(15-05-2019))
|> Load "position" (filter("MSCI USA – Daily"))
|> filter (instrument.ESG.WomenOnBoard = true)
THIS IS NOT AN IMPERATIVE ORDERING!
Result, Companies with Women on Board:
• MSCI
• IBM
• Apple
The same query under a time series scenario:
|> scenario (timeseries(Date(1,1,2019), Date(15,5,2019)))
Result, Companies with Women on Board:
• 1/1/2019 – (list of companies)
• 2/1/2019 – (list of companies)
• 3/1/2019 – (list of companies)
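One way to read "not an imperative ordering" is that the pipeline describes a question, and the scenario only chooses the context or contexts in which that question is evaluated. The sketch below illustrates that reading only; it is not the Beon implementation, and the per-date company lists, `run`, `as_of`, and `timeseries` are all invented for the example.

```python
from datetime import date, timedelta

# Hypothetical data: which companies pass the filter on which date.
WOMEN_ON_BOARD = {
    date(2019, 1, 1): ["MSCI", "IBM"],
    date(2019, 1, 2): ["MSCI", "IBM", "Apple"],
    date(2019, 1, 3): ["MSCI", "IBM", "Apple"],
}


def companies_with_women_on_board(day):
    """The question itself: defined once, evaluated in whatever context it is given."""
    return WOMEN_ON_BOARD.get(day, [])


def as_of(day):
    """Scenario with a single context date."""
    return [day]


def timeseries(start, end):
    """Scenario with one context per calendar day from start to end inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def run(query, scenario):
    """Evaluate the same query once per context in the scenario."""
    return {day: query(day) for day in scenario}


single = run(companies_with_women_on_board, as_of(date(2019, 1, 2)))
series = run(companies_with_women_on_board,
             timeseries(date(2019, 1, 1), date(2019, 1, 3)))
```

Swapping `as_of` for `timeseries` changes the shape of the answer without touching the query, which is the behaviour the two result lists above show.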
MSCI BEON – A NEW PARADIGM
Framework based on the Beon Engine
Service API layer:
• Process X: “I’m Process X and I can provide x”
• Process Y: “I’m Process Y and I can provide y”
• Process S, Process T, Process C
Functions Library:
• x -> ProcessX
• y -> ProcessY
• s -> ProcessS
• t -> ProcessT
• c -> ProcessC
Beon Engine:
• a = x + y
• b = s / t
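The relationship between the three layers can be sketched as two lookup tables and a resolver. The name-to-process mapping and the two expressions come from the slide; the process bodies, their return values, and the `evaluate` function are hypothetical.

```python
# Functions library: which process provides which name (as on the slide).
PROVIDERS = {
    "x": "ProcessX", "y": "ProcessY", "s": "ProcessS",
    "t": "ProcessT", "c": "ProcessC",
}

# Hypothetical processes behind the service API layer, returning fixed values.
PROCESSES = {
    "ProcessX": lambda: 2.0,
    "ProcessY": lambda: 3.0,
    "ProcessS": lambda: 10.0,
    "ProcessT": lambda: 4.0,
    "ProcessC": lambda: 7.0,
}

# Expressions known to the engine: result name -> (input names, function).
EXPRESSIONS = {
    "a": (("x", "y"), lambda x, y: x + y),
    "b": (("s", "t"), lambda s, t: s / t),
}


def evaluate(name):
    """Resolve a name: compute it from an expression, or ask its provider."""
    if name in EXPRESSIONS:
        inputs, fn = EXPRESSIONS[name]
        return fn(*(evaluate(i) for i in inputs))
    return PROCESSES[PROVIDERS[name]]()


print(evaluate("a"), evaluate("b"))   # 5.0 2.5
```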
MSCI BEON – A NEW PARADIGM
Everything starts with a question …
[Diagram: the Service API layer, Functions Library, and Beon Engine as before, with a ResultSpec request arriving through the Query API.]
MSCI BEON – A NEW PARADIGM
The question is then expanded, compiled into byte code, and then parametrized with a context …
[Diagram: the ResultSpec request passes from the Query API to the Compiler and the Execution Engine, together with a Context. The request is drawn as a series of graphs with node labels a, s, w, d, t, m, o, u, c, h, p.]
MSCI BEON – A NEW PARADIGM
Then executed against the various data services. Results are then recombined and presented back.
[Diagram: the same pipeline of Query API, Compiler, Execution Engine, and Context, now marked “Processing …”.]
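The stages described on these slides (expand the question, execute against the data services with a context, recombine the results) can be sketched as three small functions. This is a simplified illustration under stated assumptions: there is no byte code compilation here, and the service bodies, the `scale` field of the context, and the function names are invented for the example.

```python
# Expressions from the slides: a = x + y, b = s / t.
EXPRESSIONS = {
    "a": (("x", "y"), lambda x, y: x + y),
    "b": (("s", "t"), lambda s, t: s / t),
}


def expand(requested):
    """Expand a request into the set of leaf inputs that services must provide."""
    leaves = set()
    for name in requested:
        if name in EXPRESSIONS:
            leaves |= expand(EXPRESSIONS[name][0])
        else:
            leaves.add(name)
    return leaves


def execute(leaves, services, context):
    """Call each required service exactly once, passing the context."""
    return {name: services[name](context) for name in sorted(leaves)}


def recombine(requested, values):
    """Rebuild the requested results from the values the services returned."""
    def resolve(name):
        if name in EXPRESSIONS:
            inputs, fn = EXPRESSIONS[name]
            return fn(*(resolve(i) for i in inputs))
        return values[name]
    return {name: resolve(name) for name in requested}


calls = []


def make_service(name, base):
    """Hypothetical data service that records when it is called."""
    def service(context):
        calls.append(name)
        return base * context["scale"]
    return service


services = {
    "x": make_service("x", 2.0), "y": make_service("y", 3.0),
    "s": make_service("s", 10.0), "t": make_service("t", 4.0),
    "c": make_service("c", 7.0),
}

request = ["a"]
leaves = expand(request)
values = execute(leaves, services, {"scale": 1.0})
result = recombine(request, values)
print(result, calls)   # {'a': 5.0} ['x', 'y']
```

Only the services needed for the request are contacted, which is the property the expansion step is meant to provide in this sketch.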