From OpenAI
to Open Source AI
Navigating Between Commercial Ownership and Collaborative Openness
https://stateofopencon.com/ #stateofopencon #soocon24 #openuk
https://hachyderm.io/@openuk
Raphaël Semeteys (and Luxin Zhang) - Worldline
Introduction
Raphaël Semeteys
• Open source since 1997, professionally since 2004
• Yoga Teacher, Creator of the QSOS method
• Head of DevRel at Worldline
• 7000+ engineers in over 40 countries
• Managing 43+ billion transactions per year
• €250M spent in R&D every year
• Handling 150+ payment methods
We design payments technology that powers the growth
of millions of businesses around the world
The early days of LLMs
From rule-based and simpler statistical models to LLMs
• 2010s: Word embeddings such as Word2Vec and GloVe
• 2017-2018: “Attention Is All You Need”, Transformers, BERT
• 2020s: Generative AI, ChatGPT, responsibility concerns
GenAI is having its Linux Moment
• Just like open source and the Internet, but much faster!
• Dynamics between collaborative openness and commercial ownership
• Need for clarity on licenses
Labs & Universities, Individuals, Enterprises, Commodities
Defining Openness of an LLM
• Pre-training Dataset
• Fine-tuning Dataset
• Reward Model
• Model
• Data Processing Code
Defining Openness of an LLM
Components scored: Model (weights), Pre-training Dataset, Fine-tuning Dataset, Reward Model, Data Processing Code
Score, level and description:
• 0, Closed: No access to any public information, data or asset
• 1, Published research only: Research paper(s) published but with no more information, data or asset
• 2, Restricted access: Access to asset is possible only with special agreement (commercial, research…)
• 3, Open with limitations: Access and reuse of asset is possible but with certain limitations on usage (e.g. Open RAIL)
• 4, Totally open: Access and reuse of asset is possible without restriction (e.g. open source license)
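The five-level scale above maps naturally onto a small data structure. The following is a minimal Python sketch, not something shown in the talk: the names `Openness`, `Scorecard` and `describe`, and the example model with its scores, are hypothetical and only illustrate how one score per component could be recorded.

```python
from dataclasses import dataclass
from enum import IntEnum


class Openness(IntEnum):
    """The five levels of the scale, from 0 (closed) to 4 (totally open)."""
    CLOSED = 0
    PUBLISHED_RESEARCH_ONLY = 1
    RESTRICTED_ACCESS = 2
    OPEN_WITH_LIMITATIONS = 3
    TOTALLY_OPEN = 4


@dataclass(frozen=True)
class Scorecard:
    """One score per component of a model (component names are free text)."""
    name: str
    scores: dict  # component name -> Openness

    def describe(self) -> list:
        # One "component: score (LEVEL)" line per scored component.
        return [
            f"{component}: {level.value} ({level.name})"
            for component, level in self.scores.items()
        ]


# Hypothetical model and scores, for illustration only.
example = Scorecard(
    name="hypothetical-model",
    scores={
        "model": Openness.OPEN_WITH_LIMITATIONS,
        "pre-training dataset": Openness.PUBLISHED_RESEARCH_ONLY,
        "code": Openness.TOTALLY_OPEN,
    },
)

for line in example.describe():
    print(line)
```

Using an `IntEnum` keeps the levels comparable, so two assets can be ranked with ordinary `<` and `>`.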
Market-Leading Player: OpenAI
Deviation from original vision of research transparency & openness
Non/For-profit (US)
GPT-1 & 2 (component: score, level description)
• Model: 4, Totally open
• Dataset: 1, Published research only
• Code: 1, Published research only
GPT-3 & 4, ChatGPT
• 0, Closed: research paper only
• No training of other commercial LLMs
You may not: […] Use Output to develop
models that compete with OpenAI.
Market-Leading Player: Google
Transition from open research to proprietary commercial approach
Enterprise (US)
BERT (component: score, level description)
• Model: 4, Totally open
• Dataset: 2, Restricted access
• Code: 4, Totally open
PaLM 2 & Gemini
• Model: 1, Published research only
• Dataset: 1, Published research only
• Code: 0, Closed
Market-Leading Player: Meta
Journey to openness
Enterprise (US)
RoBERTa (component: score, level description)
• Model: 4, Totally open
• Dataset: 3, Open with limitations
• Code: 4, Totally open
Llama 2
• Model: 3, Open with limitations
• Dataset: 1, Published research only
• Code: 1, Published research only
Restriction on usage: license for platforms with 700+ M users
Additional Commercial Terms. If, on the Llama 2 version release date, the
monthly active users of the products or services made available by or for
Licensee, or Licensee’s affiliates, is greater than 700 million monthly active
users in the preceding calendar month, you must request a license from
Meta, which Meta may grant to you in its sole discretion, and you are not
authorized to exercise any of the rights under this Agreement unless or
until Meta otherwise expressly grants you such rights.
Llama offspring: Alpaca and Vicuna
Fine-tuned models from Llama 2 by universities
Research (US)
Component: score, level description
• Model: 3, Open with limitations
• Pre-training Dataset: 1, Published research only
• Fine-tuning Dataset: 2, Research use only
• Code: 4, Under Apache 2 license
Restrictions from both Llama 2 and OpenAI (ShareGPT)
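One way to picture how restrictions accumulate along a model's lineage is to collect them recursively from every upstream asset. This is a simplified Python sketch under stated assumptions: the `Asset` class, the lineage shown and the restriction labels are illustrative paraphrases of the terms quoted in this deck, not a legal analysis or a tool used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A model or dataset with its own restrictions and its upstream assets."""
    name: str
    restrictions: frozenset = frozenset()
    derived_from: tuple = ()

    def all_restrictions(self) -> frozenset:
        # Own restrictions plus everything inherited from upstream assets.
        collected = set(self.restrictions)
        for parent in self.derived_from:
            collected |= parent.all_restrictions()
        return frozenset(collected)


# Labels below paraphrase terms quoted in the slides (assumed wording).
llama2 = Asset(
    "Llama 2",
    frozenset({"Llama 2 license: 700M+ monthly active users clause"}),
)
sharegpt = Asset(
    "ShareGPT conversations",
    frozenset({"OpenAI terms: no use of output to develop competing models"}),
)
# Simplified lineage: base model plus fine-tuning data.
vicuna = Asset("Vicuna", derived_from=(llama2, sharegpt))

for restriction in sorted(vicuna.all_restrictions()):
    print(restriction)
```

Even though the fine-tuned asset declares no restrictions of its own here, it ends up carrying both upstream ones.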
Collaborative foundational LLMs
Dataset fuzziness: please refer to the specific license depending on the subset you use
Notion of responsible usage
EleutherAI GPT-J, Non-profit (US)
• Model: 4, Access and reuse without restriction
• Dataset: 3, Open with limitations
• Code: 4, Completely open
Falcon, Research (UAE)
• Model: 3, Open with limitations
• Dataset: 4, Access and reuse without restriction
• Code: 1, General instructions
BLOOM, Research (EU)
• Model: 3, Open RAIL license
• Dataset: 3, Open with limitations
• Code: 4, Completely open
OpenLLaMa, Research (US)
• Model: 4, Access and reuse without restriction
• Dataset: 4, Access and reuse without restriction
• Code: 1, Just examples
Mistral/Mixtral, Enterprise (FR)
• Model: 4, Access and reuse without restriction
• Dataset: 0, No public information or access
• Code: 4, Completely open
This license is, in part, based on the Apache License Version 2.0,
with a series of modifications. The contribution of the Apache
License 2.0 to the framing of this document is acknowledged.
Please read this license carefully, as it is different to other ‘open
access’ licenses you may have encountered previously. Use of
Falcon180B for hosted services may require a separate license.
Collaborative fine-tuned LLMs
Impact of foundational model or pre-training datasets
Dolly, Enterprise (US)
• Model: 4, Based on GPT-J
• Pre-training Dataset: 3, Based on GPT-J
• Fine-tuning Dataset: 4, Access and reuse without restriction
• Reward model: 0, No public information available
• Code: 4, Open source
BLOOMChat, Enterprise (US)
• Model: 3, Based on BLOOM
• Pre-training Dataset: 3, Based on BLOOM
• Fine-tuning Dataset: 4, Dolly and LAION
• Reward model: 0, No public information available
• Code: 3, OpenRAIL
Zephyr, Enterprise (US)
• Model: 4, Based on Mistral
• Pre-training Dataset: 0, Based on Mistral
• Fine-tuning Dataset: 2, Research use only (OpenAI)
• Reward model: 3, Paper and code examples
• Code: 3, Example code available
LLM360, Consortium (UAE/US)
• Model: 4, Open source
• Pre-training Dataset: 4, RedPajama, Falcon, StarCoder
• Fine-tuning Dataset: 2, Research use only (OpenAI)
• Reward model: 0, No public information available
• Code: 4, Open source
BLOOMChat Use Restrictions
l. To provide medical advice and medical results interpretation; or
m. To generate or disseminate information for the purpose to be used for
administration of justice, law enforcement, immigration or asylum processes,
such as predicting an individual will commit fraud/crime
commitment.
Collaboration platform: Hugging Face
• Startup and ecosystem dedicated to democratizing AI
• Open source Transformers library
• LLM leaderboard: upload and assess models
• The “GitHub of AI”
• Collaborative space for exploring, sharing and experimenting with AI
• Hosts thousands of models, datasets, and demo applications
Enabler for collaboration and reuse
Hosting and resource paradigms
• Big players invest billions (Microsoft/OpenAI, AWS/Anthropic)
• Cloud service providers (CSPs) selling shovels in the AI gold rush
Source: numind.ai
Closed models are centralized and resource-consuming
Hosting and resource paradigms
• Democratizing AI Computing
• Quantization, AI Chips
• Run models locally, in containers
• Emergence of smaller models for edge and mobile
• Small/Tiny Language Models: Gemini nano, Microsoft Phi-2, Huawei TinyBERT
• Domain Specific Language Models: BloombergGPT, Harvey (law)
• Mixture of models: Mixtral 8x7B, OpenMoE → Mixture of licenses?
Key takeaways
• Hyper-centralization leads to black boxes and closed solutions
• Openness
• Fosters collaboration and fuels community-driven innovation
• Enables inclusivity
• Just like with open source software, beware of licenses and restrictions
• AI's democratization continually reshapes the landscape
Thank you
Raphaël Semeteys - Worldline
@RaphaelSemeteys
https://dev.to/raphiki
Check the two-part article co-written with Luxin Zhang
Image credits
• Opensource, Internet & GenAI evolution image generated with DALL-E
• Robot evolution from Freepik
• LLMs’ #parameters evolution from numind.ai
• Shovels in Gold rush image generated with DALL-E
• Logos from official websites
• Coffee cups from Freepik
