Payments to grow your world
Unlocking AI: Navigating Open Source vs. Commercial Frontiers
Raphaël Semeteys
Head of DevRel, Senior Architect at Worldline
March 16th
Centrul Regional de Afaceri, Timișoara
We design payments technology that powers the growth of millions of businesses around the world.
• 7000+ engineers in over 40 countries
• Managing 43+ billion transactions per year
• €250M spent in R&D every year
• Handling 150+ payment methods
The early days of LLMs
From rule-based and simpler statistical models to LLMs
2010s: Word embeddings such as Word2Vec and GloVe
2017-2018: “Attention Is All You Need”, Transformers, BERT
2020s: Generative AI, ChatGPT, responsibility concerns
GenAI is having its Linux Moment
• Just like open source and the Internet, but much faster!
• Dynamics between collaborative openness and commercial ownership
• Need for clarity on licenses
Labs & Universities
Individuals
Enterprises
Commodities
Defining Openness of an LLM
Pre-training Dataset
Fine-tuning Dataset
Reward Model
Model
Data Processing Code
Defining Openness of an LLM
Components scored: Model (weights), Pre-training Dataset, Fine-tuning Dataset, Reward Model, Data Processing Code
Score, Level: Description
0, Closed: No access to any public information, data or asset
1, Published research only: Research paper(s) published but with no more information, data or asset
2, Restricted access: Access to asset is possible only with special agreement (commercial, research…)
3, Open with limitations: Access and reuse of asset is possible but with certain limitations on usage
4, Totally open: Access and reuse of asset is possible without restriction on usage (e.g. open source license)
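As a minimal sketch, the 0 to 4 scale and the five scored components can be written down as a small data structure. The class names, field names and the `least_open` helper are assumptions made for illustration, and the example scores are made up rather than taken from any model in this deck.

```python
from dataclasses import dataclass, fields
from enum import IntEnum


class Openness(IntEnum):
    """The five levels of the scale, ordered from least to most open."""
    CLOSED = 0
    PUBLISHED_RESEARCH_ONLY = 1
    RESTRICTED_ACCESS = 2
    OPEN_WITH_LIMITATIONS = 3
    TOTALLY_OPEN = 4


@dataclass(frozen=True)
class Scorecard:
    """One score per component of an LLM release (field names are hypothetical)."""
    model: Openness
    pretraining_dataset: Openness
    finetuning_dataset: Openness
    reward_model: Openness
    data_processing_code: Openness

    def least_open(self) -> Openness:
        """Return the lowest score, i.e. the most restrictive component."""
        return min(getattr(self, f.name) for f in fields(self))


# Made-up scores for illustration only, not taken from any real model.
example = Scorecard(
    model=Openness.OPEN_WITH_LIMITATIONS,
    pretraining_dataset=Openness.TOTALLY_OPEN,
    finetuning_dataset=Openness.RESTRICTED_ACCESS,
    reward_model=Openness.OPEN_WITH_LIMITATIONS,
    data_processing_code=Openness.TOTALLY_OPEN,
)
print(example.least_open().name)  # prints RESTRICTED_ACCESS
```

Using an ordered enum keeps comparisons such as `Openness.CLOSED < Openness.TOTALLY_OPEN` meaningful, which is what makes "most restrictive component" easy to compute.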
Market-Leading Player: OpenAI
Deviation from original vision of research transparency & openness
Non/For-profit (US)
Component: Score (Level description)
Model: 4 (Totally open) for GPT-1 & 2 → 0 (Closed) for GPT-3 & 4 and ChatGPT
Dataset: 1 (Published research only)
Code: 1 (Published research only)
GPT-3 & 4, ChatGPT: research paper only
No training of other commercial LLMs
You may not: […] Use Output to
develop models that compete with
OpenAI.
Market-Leading Player: Google
Transition from open research to a pragmatic approach
Enterprise (US)
Component: Score (Level description)
Model: 4 (Totally open) → 1 (Published research only) → 3 (Open with limitations)
Dataset: 2 (Restricted access) → 1 (Published research only) → 1 (Published research only)
Code: 4 (Totally open) → 0 (Closed) → 4 (Toolchain available)
You may not use nor allow others to use Gemma or
Model Derivatives to: [illegal activities, unlicensed
practice of a profession, abuse, security bypass and
promotion of hatred, abuse, violence, monitoring
people without consent,
misinformation/defamation, automated decisions
concerning human rights and well-being, etc.]
Responsible AI contradicts Open Source Definition
Market-Leading Player: Meta
Journey to openness
Enterprise (US)
Component: Score (Level description), starting from RoBERTa
Model: 4 (Totally open) → 3 (Open with limitations)
Dataset: 3 (Open with limitations) → 1 (Published research only)
Code: 4 (Totally open) → 1 (Published research only)
Restriction on usage: license for platforms with 700+ M users
Additional Commercial Terms. If, on the Llama 2 version release date,
the monthly active users of the products or services made available by or
for Licensee, or Licensee’s affiliates, is greater than 700 million monthly
active users in the preceding calendar month, you must request a license
from Meta, which Meta may grant to you in its sole discretion, and you
are not authorized to exercise any of the rights under this Agreement
unless or until Meta otherwise expressly grants you such rights.
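The threshold in the clause above can be sketched as a simple check. This is a simplified reading for illustration only: the function name and constant are hypothetical, and the clause itself also covers affiliates and ties the measurement to the Llama 2 release date.

```python
# Threshold quoted in the clause: "greater than 700 million monthly active users".
LLAMA2_MAU_THRESHOLD = 700_000_000


def must_request_meta_license(monthly_active_users: int) -> bool:
    """Return True when the user count is strictly above the quoted threshold.

    Assumption: the caller passes the monthly active users of the preceding
    calendar month, measured on the Llama 2 version release date.
    """
    return monthly_active_users > LLAMA2_MAU_THRESHOLD


print(must_request_meta_license(5_000_000))    # prints False
print(must_request_meta_license(900_000_000))  # prints True
```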
Llama’s offspring: Alpaca and Vicuna
Fine-tuned models from Llama 2 by universities
Research (US)
Component: Score (Level description)
Model: 3 (Open with limitations)
Pre-training Dataset: 1 (Published research only)
Fine-tuning Dataset: 2 (Research use only)
Code: 4 (Under Apache 2 license)
Restrictions from both Llama 2 and OpenAI (ShareGPT)
Collaborative foundational LLMs
EleutherAI GPT-J, Non-profit (US)
• Model: 4 (Access and reuse without restriction)
• Dataset: 3 (Open with limitations)
• Code: 4 (Completely open)
Falcon, Research (UAE)
• Model: 3 (Open with limitations)
• Dataset: 4 (Access and reuse without restriction)
• Code: 1 (General instructions)
BLOOM, Research (EU)
• Model: 3 (Open RAIL license)
• Dataset: 3 (Open with limitations)
• Code: 4 (Completely open)
OpenLLaMA, Research (US)
• Model: 4 (Access and reuse without restriction)
• Dataset: 4 (Access and reuse without restriction)
• Code: 1 (Just examples)
Mistral, Enterprise (FR)
• Model: 4 (Access and reuse without restriction)
• Dataset: 0 (No public information or access)
• Code: 4 (Completely open)
Dataset fuzziness: please refer to the specific license depending on the subset you use
Notion of responsible usage
This license is, in part, based on the Apache License Version 2.0, with a
series of modifications. The contribution of the Apache License 2.0 to
the framing of this document is acknowledged. Please read this license
carefully, as it is different to other ‘open access’ licenses you may have
encountered previously. Use of Falcon180B for hosted services may
require a separate license.
Collaborative fine-tuned LLMs
Dolly, Enterprise (US)
• Model: 4 (Based on GPT-J)
• Pre-training Dataset: 3 (Based on GPT-J)
• Fine-tuning Dataset: 4 (Access and reuse without restriction)
• Reward model: 0 (No public information available)
• Code: 4 (Open source)
BLOOMChat, Enterprise (US)
• Model: 3 (Based on BLOOM)
• Pre-training Dataset: 3 (Based on BLOOM)
• Fine-tuning Dataset: 4 (Dolly and LAION)
• Reward model: 0 (No public information available)
• Code: 3 (OpenRAIL)
Zephyr, Enterprise (US)
• Model: 4 (Based on Mistral)
• Pre-training Dataset: 0 (Based on Mistral)
• Fine-tuning Dataset: 2 (Research use only, OpenAI)
• Reward model: 3 (Paper and code examples)
• Code: 3 (Example code available)
LLM360, Consortium (UAE/US)
• Model: 4 (Open source)
• Pre-training Dataset: 4 (RedPajama, Falcon, StarCoder)
• Fine-tuning Dataset: 2 (Research use only, OpenAI)
• Reward model: 0 (No public information available)
• Code: 4 (Open source)
Impact of foundational model or pre-training datasets
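One way to picture this impact is a sketch in which a fine-tuned model cannot score higher than the foundation it builds on. The foundation scores below are the ones shown in this deck; the minimum rule, the function name and the `own_release_score` parameter are assumptions made for illustration, not the scoring method used by the author.

```python
# Model and pre-training dataset scores of three foundational models,
# copied from the slides (0 = closed, 4 = totally open).
FOUNDATION_SCORES = {
    "GPT-J": {"model": 4, "pretraining_dataset": 3},
    "BLOOM": {"model": 3, "pretraining_dataset": 3},
    "Mistral": {"model": 4, "pretraining_dataset": 0},
}


def inherited_scores(foundation: str, own_release_score: int = 4) -> dict[str, int]:
    """Cap each component by the foundation's score (assumed rule: take the minimum)."""
    base = FOUNDATION_SCORES[foundation]
    return {name: min(score, own_release_score) for name, score in base.items()}


# Fine-tuned models and the foundations they are based on, as listed in the slides.
for fine_tuned, foundation in [("Dolly", "GPT-J"), ("BLOOMChat", "BLOOM"), ("Zephyr", "Mistral")]:
    print(fine_tuned, inherited_scores(foundation))
```

With the default `own_release_score` of 4, the results match the Model and Pre-training Dataset scores listed for Dolly, BLOOMChat and Zephyr.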
BLOOMChat Use Restrictions
l. To provide medical advice and medical results interpretation; or
m. To generate or disseminate information for the purpose to be used
for administration of justice, law enforcement, immigration or asylum
processes, such as predicting an individual will commit fraud/crime
commitment.
Collaboration platform: Hugging Face
Enabler for collaboration and reuse
• Startup and ecosystem dedicated to democratizing AI
• Open source Transformers library
• LLM leaderboard: upload and assess models
• The “GitHub of AI”
• Collaborative space for exploring, sharing and experimenting with AI
• Hosts thousands of models, datasets, and demo applications
Hosting and resource paradigms
Closed models are centralized and resource-consuming
Big players invest billions (Microsoft/OpenAI, AWS/Anthropic)
CSPs selling shovels in the AI gold rush
Source: numind.ai
Hosting and resource paradigms
• Democratizing AI Computing
• Quantization, AI Chips
• Run models locally, in containers
• Emergence of smaller models for edge and mobile
• Small/Tiny Language Models: Gemini Nano, Microsoft Phi-2, Huawei TinyBERT
• Domain Specific Language Models: BloombergGPT, BioMistral, Harvey (law)
• Mixture of models: Mixtral 8x7B, OpenMoE → Mixture of licenses?
Key takeaways
• Hyper-centralization leads to black boxes and closed solutions
• Openness
• Fosters collaboration and fuels community-driven innovation
• Enables inclusivity
• Just like open source software, beware of licenses and restrictions
• GenAI’s innovation continually reshapes the landscape
Thank you
Raphaël Semeteys - Worldline
@RaphaelSemeteys
https://blog.worldline.tech
https://dev.to/raphiki
Check the two-part article co-written with Luxin Zhang
Want to shape
how the world
pays & gets paid?
Explore our jobs in tech:
careers.worldline.com

I LOVE Tech 2024 - Unlocking AI: Navigating Open Source vs. Commercial Frontiers

  • 1. Payments to grow your world Unlocking AI: Navigating Open Source vs. Commercial Frontiers Raphaël Semeteys Head of DevRel, Senior Architect at Worldline March 16th Centrul Regional de Afaceri, Timișoara
  • 2. We design payments technology that powers the growth of millions of businesses around the world. 7000+ engineers in over 40 countries Managing 43+ billion transactions per year €250M spent in R&D every year Handling 150+ payment methods
  • 3. The early days of LLMs From rule-based and simpler statistical models to LLMs 2010’s 2020’s 2017-2018 Word embeddings such as Word2Vec and GloVe “Attention is All You Need" Transformers, BERT Generative AI, ChatGPT responsibility concerns
  • 4. GenAI is having its Linux Moment • Just like open source and the Internet, but much faster! • Dynamics between collaborative openness and commercial ownership • Need for clarity on licenses. Actors: Labs & Universities, Individuals, Enterprises, Commodities
  • 5. Defining Openness of an LLM. Components: Pre-training Dataset, Fine-tuning Dataset, Reward Model, Model, Data Processing Code
  • 6. Defining Openness of an LLM
      Components scored: Model (weights), Pre-training Dataset, Fine-tuning Dataset, Reward model, Data Processing Code
      • Score 0, Closed: No access to any public information, data or asset
      • Score 1, Published research only: Research paper(s) published but with no more information, data or asset
      • Score 2, Restricted access: Access to asset is possible only with special agreement (commercial, research…)
      • Score 3, Open with limitations: Access and reuse of asset is possible but with certain limitations on usage
      • Score 4, Totally open: Access and reuse of asset is possible without restriction on usage (e.g. open source license)
  • 7. Market-Leading Player: OpenAI. Deviation from original vision of research transparency & openness. Non/For-profit (US).
      Component scores (score, level description), GPT-1 & 2 → GPT-3 & 4, ChatGPT:
      • Model: 4 Totally open → 0 Closed
      • Dataset: 1 Published research only
      • Code: 1 Published research only
      Annotation: research paper only
  • 8. Market-Leading Player: OpenAI. No training of other commercial LLMs: “You may not: […] Use Output to develop models that compete with OpenAI.”
  • 9. Market-Leading Player: Google. Transition from open research to a pragmatic approach. Enterprise (US).
      Component scores (score, level description), in the order shown across the two arrows:
      • Model: 4 Totally open → 1 Published research only → 3 Open with limitations
      • Dataset: 2 Restricted access → 1 Published research only → 1 Published research only
      • Code: 4 Totally open → 0 Closed → 4 Toolchain available
  • 10. Market-Leading Player: Google. You may not use nor allow others to use Gemma or Model Derivatives to: [illegal activities, unlicensed practice of a profession, abuse, security bypass and promotion of hatred, abuse, violence, monitoring people without consent, misinformation/defamation, automating decisions concerning human rights and well-being, etc.] Responsible AI contradicts the Open Source Definition.
  • 11. Market-Leading Player: Meta. Journey to openness. Enterprise (US).
      Component scores (score, level description), starting from RoBERTa:
      • Model: 4 Totally open → 3 Open with limitations
      • Dataset: 3 Open with limitations → 1 Published research only
      • Code: 4 Totally open → 1 Published research only
  • 12. Market-Leading Player: Meta. Restriction on usage: license for platforms with 700M+ users. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
  • 13. Llama's offspring: Alpaca and Vicuna. Models fine-tuned from Llama 2 by universities. Research (US).
      Component scores (score, level description):
      • Model: 3 Open with limitations
      • Pre-training Dataset: 1 Published research only
      • Fine-tuning Dataset: 2 Research use only
      • Code: 4 Under Apache 2 license
      Restrictions from both Llama 2 and OpenAI (ShareGPT)
  • 14. Collaborative foundational LLMs
      Scores per model (score, level description):
      • EleutherAI GPT-J, Non-profit (US): Model 4 Access and reuse without restriction; Dataset 3 Open with limitations; Code 4 Completely open
      • Falcon, Research (UAE): Model 3 Open with limitations; Dataset 4 Access and reuse without restriction; Code 1 General instructions
      • BLOOM, Research (EU): Model 3 Open RAIL license; Dataset 3 Open with limitations; Code 4 Completely open
      • OpenLLaMa, Research (US): Model 4 Access and reuse without restriction; Dataset 4 Access and reuse without restriction; Code 1 Just examples
      • Mistral, Enterprise (FR): Model 4 Access and reuse without restriction; Dataset 0 No public information or access; Code 4 Completely open
      Annotations: Dataset fuzziness (please refer to the specific license depending on the subset you use). Notion of responsible usage.
  • 15. Collaborative foundational LLMs. This license is, in part, based on the Apache License Version 2.0, with a series of modifications. The contribution of the Apache License 2.0 to the framing of this document is acknowledged. Please read this license carefully, as it is different to other ‘open access’ licenses you may have encountered previously. Use of Falcon180B for hosted services may require a separate license.
  • 16. Collaborative fine-tuned LLMs
      Scores per model (score, level description):
      • Dolly, Enterprise (US): Model 4 Based on GPT-J; Pre-training Dataset 3 Based on GPT-J; Fine-tuning Dataset 4 Access and reuse without restriction; Reward model 0 No public information available; Code 4 Open source
      • BLOOMChat, Enterprise (US): Model 3 Based on BLOOM; Pre-training Dataset 3 Based on BLOOM; Fine-tuning Dataset 4 Dolly and LAION; Reward model 0 No public information available; Code 3 OpenRAIL
      • Zephyr, Enterprise (US): Model 4 Based on Mistral; Pre-training Dataset 0 Based on Mistral; Fine-tuning Dataset 2 Research use only (OpenAI); Reward model 3 Paper and code examples; Code 3 Example code available
      • LLM360, Consortium (UAE/US): Model 4 Open source; Pre-training Dataset 4 RedPajama, Falcon, StarCoder; Fine-tuning Dataset 2 Research use only (OpenAI); Reward model 0 No public information available; Code 4 Open source
      Annotation: Impact of foundational model or pre-training datasets
  • 17. Collaborative fine-tuned LLMs. BLOOMChat Use Restrictions: l. To provide medical advice and medical results interpretation; or m. To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment.
  • 18. Collaboration platform: Hugging Face. Enabler for collaboration and reuse • Startup and ecosystem dedicated to democratizing AI • Open source Transformers library • LLM leaderboard: upload and assess models • The “GitHub of AI” • Collaborative space for exploring, sharing and experimenting with AI • Hosts thousands of models, datasets, and demo applications
  • 19. Hosting and resource paradigms. Closed models are centralized and resource-consuming. Big players invest billions (Microsoft/OpenAI, AWS/Anthropic). CSPs are selling shovels in the AI gold rush. Source: numind.ai
  • 20. Hosting and resource paradigms • Democratizing AI Computing • Quantization, AI Chips • Run models locally, in containers • Emergence of smaller models for edge and mobile • Small/Tiny Language Models: Gemini Nano, Microsoft Phi-2, Huawei TinyBERT • Domain-Specific Language Models: BloombergGPT, BioMistral, Harvey (law) • Mixture of models: Mixtral 8x7B, OpenMoE → Mixture of licenses?
  • 21. Key takeaways • Hyper-centralization leads to black boxes and closed solutions • Openness fosters collaboration, fuels community-driven innovation, and enables inclusivity • Just like with open source software, beware of licenses and restrictions • GenAI’s innovation continually reshapes the landscape
  • 22. Thank you Raphaël Semeteys - Worldline @RaphaelSemeteys https://blog.worldline.tech https://dev.to/raphiki Check the two-part article co-written with Luxin Zhang
  • 23. Want to shape how the world pays & gets paid? Explore our jobs in tech: careers.worldline.com
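
The 0 to 4 openness scale from slide 6, applied to the Model, Dataset and Code rows of slide 14, can be captured in a small data structure. The Python sketch below is illustrative only: the names `Openness`, `Scorecard`, `card` and `weakest_component` are assumptions introduced here and do not come from the talk; the numeric scores are copied from slide 14.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Tuple


class Openness(IntEnum):
    """Openness levels, numbered as on slide 6."""
    CLOSED = 0
    PUBLISHED_RESEARCH_ONLY = 1
    RESTRICTED_ACCESS = 2
    OPEN_WITH_LIMITATIONS = 3
    TOTALLY_OPEN = 4


@dataclass(frozen=True)
class Scorecard:
    """Hypothetical container for one model's per-component scores."""
    name: str
    model: Openness
    dataset: Openness
    code: Openness

    def scores(self) -> Dict[str, Openness]:
        return {"model": self.model, "dataset": self.dataset, "code": self.code}

    def weakest_component(self) -> Tuple[str, Openness]:
        """Return the least open component (the first one wins on ties)."""
        scores = self.scores()
        component = min(scores, key=lambda k: scores[k])
        return component, scores[component]

    def totally_open(self) -> bool:
        """True only if every component is at level 4."""
        return all(s == Openness.TOTALLY_OPEN for s in self.scores().values())


def card(name: str, model: int, dataset: int, code: int) -> Scorecard:
    """Build a Scorecard from raw 0-4 integers (raises ValueError outside 0-4)."""
    return Scorecard(name, Openness(model), Openness(dataset), Openness(code))


# Scores as listed on slide 14 (Model, Dataset, Code).
SLIDE_14 = [
    card("GPT-J", 4, 3, 4),
    card("Falcon", 3, 4, 1),
    card("BLOOM", 3, 3, 4),
    card("OpenLLaMa", 4, 4, 1),
    card("Mistral", 4, 0, 4),
]

if __name__ == "__main__":
    for c in SLIDE_14:
        component, level = c.weakest_component()
        print(f"{c.name}: weakest component is {component} ({level.value}, {level.name})")
```

On these scores, none of the five foundational models reaches level 4 on every component, which is consistent with the takeaway to beware of licenses and restrictions.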