ISCC Foundation Presentation from DDEX MRT Summit 16 Nov 2023
- 1. ISCC – a solution to some challenges
presented by generative AI?
Sebastian Posth, ISCC Foundation
Musical Works Data and Rights Standards Implementation
Seminar
17th November 2023, Arlington
DDEX 2023
© 2023 CC-BY-SA Sebastian Posth
- 2. SEBASTIAN POSTH
● Co-initiator of the International Standard Content Code (ISCC)
● Co-founder and member of the board of directors of ISCC Foundation (NL)
● Convenor of ISO/DIS 24138 on ISCC
● Background in publishing, digital distribution and data analytics (Bertelsmann, et al.)
● Entrepreneur and consultant on digital innovation projects in the media industries
● Building Liccium (liccium.com) and CreatorCredentials.com
- 3. ISCC – INTRODUCTION
● ISCC originated in the German book market, addressing a number of inefficiencies in the digital supply chain:
○ Manual identifier management, file naming conventions, missing metadata, issues with updates, versioning, duplicate content, etc.
● Growing relevance of a decentralised media environment (web, platforms, user-generated content)
● The ISCC is an open system for the decentralised identification of digital media content of all media types and formats (text, image, video, and audio)
- 4. ISO STAGE CODES
[Figure: ISO stage codes timeline with a "We are here" marker. Goal: publication in Q1/2024]
- 6. THE DNA OF YOUR DIGITAL CONTENT
© 2023 CC-BY-SA Titusz Pan
ISCC:KADV5PDFXBL7HGBXFFW64KVNP6UGTUZC2CJTDBKMFYTTZPLQQVX22FI
● Meta-Code AAAV5PDFXBL7HGBX: metadata similarity
● Content-Code EAASS3POFKWX7KDJ: content similarity
● Data-Code GAA5GIWQSMYYKTBO: data similarity
● Instance-Code IAASOPF5OCCW7LIV: data integrity
Meta-Code, Content-Code and Data-Code are similarity-preserving hashes (SIM hash). The Instance-Code is an integrity-verifying checksum (crypto hash).
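The contrast between a similarity-preserving hash and a cryptographic checksum can be sketched in a few lines. This is a generic, minimal SimHash over word tokens, not the algorithm the ISCC uses for any of its units; the function names, the sample texts and the choice of BLAKE2b and SHA-256 from the standard library are all assumptions made for illustration.

```python
import hashlib


def feature_hash(token: str) -> int:
    """Map a token to a 64-bit integer (BLAKE2b is only a convenient stdlib choice)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash64(text: str) -> int:
    """Generic 64-bit SimHash: each distinct token votes +1/-1 on every bit position."""
    votes = [0] * 64
    for token in set(text.lower().split()):
        h = feature_hash(token)
        for bit in range(64):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if votes[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


original = (
    "the international standard content code identifies digital media assets "
    "such as text image audio and video files by deriving short codes directly "
    "from their content without any central registry or database lookup"
)
edited = original.replace("short", "compact")  # a one-word edit
unrelated = (
    "a recipe for sourdough bread needs flour water salt patience plus a warm "
    "kitchen where the starter can ferment slowly overnight before baking in a "
    "hot oven until the crust turns golden"
)

# Similarity-preserving hash: a small edit should flip only a few bits.
near = hamming(simhash64(original), simhash64(edited))
far = hamming(simhash64(original), simhash64(unrelated))
print("bits differing after one-word edit:", near)
print("bits differing for unrelated text:", far)

# Cryptographic checksum: any edit yields a completely different digest.
print(hashlib.sha256(original.encode()).hexdigest()[:16])
print(hashlib.sha256(edited.encode()).hexdigest()[:16])
```

The similarity hash is useful for matching near-duplicates, while the checksum only answers whether two inputs are bit-for-bit identical.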
- 8. WHAT IS THE ISCC (NOT)
© 2023 CC-BY-SA Sebastian Posth/Titusz Pan
ISCC | ACR
An open identifier standard proposal | Services require proprietary software
Short identifier string | Not an identifier but a content identification system
Can be generated by anyone with access to content for all media types and formats | Often optimised for specific media types
ISCC has near-duplicate content-matching capabilities (lightweight fingerprints) | Developed for content matching, can match and compare assets and small chunks in great detail
Can be easily implemented in existing applications, which ensures interoperability across entities | Detailed fingerprints not interoperable
- 9. WHAT IS THE ISCC (NOT)
© 2023 CC-BY-SA Titusz Pan
● ISWC (Work): T-034.524.680-1
● ISRC (Recording): US-S1Z-20-00001, US-S1Z-20-00002, US-S1Z-20-00003
● ISCC (File): RMMXPR2HGBYNXE5T…, RMM362DSFHZPYS2S…, RMM6DDMSNP5NFDY2…
ISWC identifies the work, ISRC the recording event, ISCC the media asset.
- 11. FAKE 😉
© 2023 CC-BY-SA Sebastian Posth https://twitter.com/spsth/status/1718591647087251519
KECX4FAQRQG2VJ3YS3G5JUWUZDMJNE7GIT67RT27H3AAFVV4NDGABUI
- 12. EXAMPLE FAKE NEWS
JPG File | Components | JPG File | Similarity
KEC2LO3CU6XL6ZVXS3G5LUWUZDEJNBWUBDZPXNR542DDKYVPG372D2A | ISCC | KECX4FAQRQG2VJ3YS3G5JUWUZDMJNE7GIT67RT27H3AAFVV4NDGABUI |
AAA2LO3CU6XL6ZVX | Meta-Code | AAAX4FAQRQG2VJ3Y | 41%
EEAZNTOV2LKMRSEW | Content-Code | EEAZNTOU2LKMRWEW | 94% (2 bits of difference of the 64 bit hash)
GAAYNVAI6L53MPPG | Data-Code | GAAZHZSE7X4M6XZ6 | 27%
IAAYMNLCV43P7IPI | Instance-Code | IAA4AAWWXRUMYAGR | Unique
Fake declaration
Original declaration
- 13. EXAMPLE FAKE NEWS
The clustering and matching of near-duplicate content is possible with access to the ISCC codes alone!
KEC37M6L6YX645BORDR3KWVUJSLJ5AU7SVSE46E56YV33YW4HQKU5PY KEC3XJ7PYYXUM5KORG2ZMGVUNSL6DSI6GT2UZMTBMYOUULIGBBFXHYQ
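The "2 bits of difference of the 64 bit hash" reported for the two Content-Codes can be checked from the codes alone. The sketch below assumes that each 16-character unit is standard base32 and consists of a 2-byte header followed by the 64-bit hash; that layout is an assumption here, and the helper names are hypothetical.

```python
import base64

HEADER_BYTES = 2  # assumption: 16 base32 chars = 80 bits = 16-bit header + 64-bit hash
BODY_BYTES = 8


def unit_body(unit: str) -> bytes:
    """Decode one ISCC unit code and return its 64-bit hash without the header."""
    raw = base64.b32decode(unit)
    if len(raw) != HEADER_BYTES + BODY_BYTES:
        raise ValueError(f"expected a 16-character unit code, got {unit!r}")
    return raw[HEADER_BYTES:]


def bit_difference(unit_a: str, unit_b: str) -> int:
    """Count differing bits (Hamming distance) between the hashes of two units."""
    a = int.from_bytes(unit_body(unit_a), "big")
    b = int.from_bytes(unit_body(unit_b), "big")
    return bin(a ^ b).count("1")


# Content-Codes of the two JPG files from the example
print(bit_difference("EEAZNTOV2LKMRSEW", "EEAZNTOU2LKMRWEW"))  # → 2
```

No access to the image files is needed: the comparison runs on the published codes only.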
- 14. SUPPORTED MEDIA TYPES/FORMATS
● TEXT: doc, docx, xls, xlsx, pptx, epub, mobi, ibooks, html, xhtml, odt, pdf, rtf, txt, xml, json, md
● IMAGE: gif, jpg, png, tif, bmp, psd, eps, webp
● AUDIO: aif, flac, mp3, opus, ogg, wav
● VIDEO: 3gp, 3g2, asf, avi, drc, flv, f4v, flu, gif, h264, mpg, mp4, mkv, mov, ogv, rm, swf, webm, wmv
[Diagram: Digital Media Asset → Algorithm </> → Content Codes]
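As a small illustration, the format lists on this slide can be turned into a lookup that tells which media type(s) a file name falls under. The table below copies the extensions exactly as listed; the function name and the file names are hypothetical, and a real implementation would more likely inspect the file content rather than trust the extension.

```python
from pathlib import PurePath

# Extensions copied from the slide; "gif" appears under both IMAGE and VIDEO.
SUPPORTED = {
    "text": "doc docx xls xlsx pptx epub mobi ibooks html xhtml odt pdf rtf txt xml json md".split(),
    "image": "gif jpg png tif bmp psd eps webp".split(),
    "audio": "aif flac mp3 opus ogg wav".split(),
    "video": "3gp 3g2 asf avi drc flv f4v flu gif h264 mpg mp4 mkv mov ogv rm swf webm wmv".split(),
}


def media_types_for(filename: str) -> list:
    """Return every listed media type whose extensions include this file's suffix."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    return [kind for kind, exts in SUPPORTED.items() if ext in exts]


print(media_types_for("cover.JPG"))    # → ['image']
print(media_types_for("clip.gif"))     # → ['image', 'video']
print(media_types_for("archive.zip"))  # → []
```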
- 15. MAIN INNOVATIONS
● All users or machines with access to the content can generate the ISCC from the media file, without the need for centralised databases or registries
● With the ISCC, users or machines can confirm the integrity of a media file or recognise and match near-duplicate content
● Recognition is possible even when content has been altered or manipulated, or when embedded metadata, watermarks or steganographic data have been stripped from the content!
- 16. PUBLIC DECLARATIONS
Public ISCC declarations allow for the persistent binding of metadata, rights and other information to the media asset:
● Sector-specific product and title metadata, e.g. IPTC photo metadata, ONIX, DDEX, etc.
● Rights and licensing offerings
● Usage statistics, reporting data
● Opt-out for TDM and the use of content as AI training data
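One way to picture such a declaration is as a small record keyed by the ISCC. The sketch below is purely illustrative: the class, its field names and the sample values are hypothetical and do not reflect any published ISCC declaration schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Declaration:
    """Hypothetical record binding information to a media asset via its ISCC."""
    iscc: str                                      # identifier derived from the content
    declarer: str                                  # who makes the declaration (hypothetical field)
    metadata: dict = field(default_factory=dict)   # e.g. sector-specific title metadata
    license_offer: Optional[str] = None            # rights and licensing offering
    tdm_opt_out: bool = False                      # opt-out for TDM / AI training


decl = Declaration(
    iscc="ISCC:KADV5PDFXBL7HGBXFFW64KVNP6UGTUZC2CJTDBKMFYTTZPLQQVX22FI",
    declarer="example-rightsholder",
    metadata={"title": "Example title"},
    tdm_opt_out=True,
)

# Serialise the record so it could be published or exchanged.
payload = json.dumps(asdict(decl), indent=2)
print(payload)
```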
- 17. (1) ISCC + OPT-OUT
● Creators and rightsholders can inseparably bind a machine-readable opt-out declaration to their content to prevent it from being used as AI training data (Article 4, EU DSM Directive on Copyright)
● Providers of AI applications can derive the legal restrictions from the ISCC, and thus respect the requirements set out by the rightsholders
[Graphic: AI Training Opt-out]
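On the side of an AI provider, respecting such opt-outs could look roughly like the following sketch. It assumes the provider has already generated an ISCC for each candidate file and obtained the set of ISCCs with a declared opt-out; the function name, file names and data structures are hypothetical, and only exact ISCC equality is checked.

```python
def allowed_for_training(candidates: dict, opted_out: set) -> list:
    """Return the files whose ISCC has no declared opt-out.

    candidates: mapping of file name -> ISCC generated from that file
    opted_out:  ISCCs for which an opt-out declaration was found
    """
    return [name for name, iscc in candidates.items() if iscc not in opted_out]


# Hypothetical crawl result, reusing ISCC codes shown in the examples
candidates = {
    "photo_a.jpg": "KEC2LO3CU6XL6ZVXS3G5LUWUZDEJNBWUBDZPXNR542DDKYVPG372D2A",
    "photo_b.jpg": "KEC37M6L6YX645BORDR3KWVUJSLJ5AU7SVSE46E56YV33YW4HQKU5PY",
}
opted_out = {"KEC2LO3CU6XL6ZVXS3G5LUWUZDEJNBWUBDZPXNR542DDKYVPG372D2A"}

print(allowed_for_training(candidates, opted_out))  # → ['photo_b.jpg']
```

Catching altered copies of opted-out content would additionally require comparing the similarity-preserving units of the codes rather than the full strings.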
- 19. (2) ISCC FOR INPUT TRANSPARENCY
● With the help of the ISCC, providers of AI systems can provide lists of copyright-protected works that are or were used for training their models
● This will allow future EU regulatory requirements to be met (an obligation may arise under Art. 28b 4c of the revised EU AI Act)
- 20. (3) ISCC FOR OUTPUT TRANSPARENCY
● AI system providers can publicly declare AI-generated content
● This will increase the trustworthiness of the digital media landscape
● At the same time, AI system providers can prevent AI-generated output from being used to train the LLM/base models (Model Collapse, AI Entropy)
Gonzalo Martinez Ruiz De Arcaute, via https://spectrum.ieee.org/ai-collapse
- 21. DISCUSSION OF USE CASES
● AI opt-out
● Anti-piracy
● Anti-counterfeit
● Reporting of sales or shares
● Using MRT + ISCC
● Using ISCC in DDEX metadata
● etc.
- 22. ISCC FOUNDATION
Your support will make a difference!
ISCC Foundation is a purpose-driven non-profit organisation, dedicated to developing and promoting open source technology for decentralised, digital content identification.
You can support our goals by:
● donating
● sponsoring development
● testing the ISCC
● promoting the ISCC system
https://core.iscc.codes
Please contact:
https://iscc.foundation
Sebastian Posth
posth@iscc.foundation
Titusz Pan
tp@iscc.foundation