SlideShare a Scribd company logo
1 of 13
Overview of Character Encoding Duy Lam – Dec 2010
Agenda Character Encoding Unicode Encoding problem 2
Character encoding
Definition Character Encoding (character set - charset, character map or code page) is a system to specify: Set of codes (natural numbers or electrical pulses) that represents for characters How to persist characters (such as “hello”) onto disk as a sequence of bytes 4
Common Encodings 5
Unicode
Life was perfect 7 a = 01100001  ấ = ???????? ä = ???????? a ?ä
Unicode Unicode is a computing industry standard to map every known character to a number (code point) Unicode is one character set that can be encoded several different ways. Common Unicode encoding methods (Unicode Transformation Format and Universal Character Set): UTF-8 (one to four bytes): maximized compatibility with ASCII UTF-16 (UCS-2): variable-width encoding (one or two 16-bit code unit) UTF-32 (UCS-4): fixed-width encoding 8
Unicode mapping table Unicode charts 9
Encoding problem
Application Missing understanding 11 UTF-16 encoding UTF-8 encoding UTF-8 encoding
Demo
End

More Related Content

What's hot

ASCII data formats
ASCII data formatsASCII data formats
ASCII data formatsikram niaz
 
Fundamentals of Programming Constructs.pptx
Fundamentals of  Programming Constructs.pptxFundamentals of  Programming Constructs.pptx
Fundamentals of Programming Constructs.pptxvijayapraba1
 
Digital Logic & Design (DLD) presentation
Digital Logic & Design (DLD) presentationDigital Logic & Design (DLD) presentation
Digital Logic & Design (DLD) presentationfoyez ahammad
 
Asynchronous Data Transfer.pptx
Asynchronous Data Transfer.pptxAsynchronous Data Transfer.pptx
Asynchronous Data Transfer.pptxArunaDevi63
 
Lecture ascii and ebcdic codes
Lecture ascii and ebcdic codesLecture ascii and ebcdic codes
Lecture ascii and ebcdic codesYazdan Yousafzai
 
Subnet Masks
Subnet MasksSubnet Masks
Subnet Masksswascher
 
Assembly Language and microprocessor
Assembly Language and microprocessorAssembly Language and microprocessor
Assembly Language and microprocessorKhaled Sany
 
String functions and operations
String functions and operations String functions and operations
String functions and operations Mudasir Syed
 
Digital System Design Basics
Digital System Design BasicsDigital System Design Basics
Digital System Design Basicsanishgoel
 
Assembly level language
Assembly level languageAssembly level language
Assembly level languagePDFSHARE
 

What's hot (20)

ipv4 (internet protocol version 4)
  ipv4 (internet protocol version 4)     ipv4 (internet protocol version 4)
ipv4 (internet protocol version 4)
 
ASCII data formats
ASCII data formatsASCII data formats
ASCII data formats
 
Fundamentals of Programming Constructs.pptx
Fundamentals of  Programming Constructs.pptxFundamentals of  Programming Constructs.pptx
Fundamentals of Programming Constructs.pptx
 
06 floating point
06 floating point06 floating point
06 floating point
 
Ascii codes
Ascii codesAscii codes
Ascii codes
 
Ppt of socket
Ppt of socketPpt of socket
Ppt of socket
 
Digital Logic & Design (DLD) presentation
Digital Logic & Design (DLD) presentationDigital Logic & Design (DLD) presentation
Digital Logic & Design (DLD) presentation
 
IP Routing
IP RoutingIP Routing
IP Routing
 
Asynchronous Data Transfer.pptx
Asynchronous Data Transfer.pptxAsynchronous Data Transfer.pptx
Asynchronous Data Transfer.pptx
 
Lecture ascii and ebcdic codes
Lecture ascii and ebcdic codesLecture ascii and ebcdic codes
Lecture ascii and ebcdic codes
 
Rs 232 interface
Rs 232 interfaceRs 232 interface
Rs 232 interface
 
Array Of Pointers
Array Of PointersArray Of Pointers
Array Of Pointers
 
Subnet Masks
Subnet MasksSubnet Masks
Subnet Masks
 
Assembly Language and microprocessor
Assembly Language and microprocessorAssembly Language and microprocessor
Assembly Language and microprocessor
 
Lecture 37
Lecture 37Lecture 37
Lecture 37
 
String functions and operations
String functions and operations String functions and operations
String functions and operations
 
Unicode
UnicodeUnicode
Unicode
 
Ipv4 presentation
Ipv4 presentationIpv4 presentation
Ipv4 presentation
 
Digital System Design Basics
Digital System Design BasicsDigital System Design Basics
Digital System Design Basics
 
Assembly level language
Assembly level languageAssembly level language
Assembly level language
 

Viewers also liked

Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014
Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014
Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014Duy Lâm
 
KMS TechCon 2014 - Interesting in JavaScript
KMS TechCon 2014 - Interesting in JavaScriptKMS TechCon 2014 - Interesting in JavaScript
KMS TechCon 2014 - Interesting in JavaScriptDuy Lâm
 
Amazon Web Services
Amazon Web ServicesAmazon Web Services
Amazon Web ServicesDuy Lâm
 
Refactoring group 1 - chapter 3,4,6
Refactoring   group 1 - chapter 3,4,6Refactoring   group 1 - chapter 3,4,6
Refactoring group 1 - chapter 3,4,6Duy Lâm
 
Advantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architectureAdvantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architectureDuy Lâm
 

Viewers also liked (6)

Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014
Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014
Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014
 
Mocha
Mocha Mocha
Mocha
 
KMS TechCon 2014 - Interesting in JavaScript
KMS TechCon 2014 - Interesting in JavaScriptKMS TechCon 2014 - Interesting in JavaScript
KMS TechCon 2014 - Interesting in JavaScript
 
Amazon Web Services
Amazon Web ServicesAmazon Web Services
Amazon Web Services
 
Refactoring group 1 - chapter 3,4,6
Refactoring   group 1 - chapter 3,4,6Refactoring   group 1 - chapter 3,4,6
Refactoring group 1 - chapter 3,4,6
 
Advantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architectureAdvantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architecture
 

Similar to Overview of character encoding

Comprehasive Exam - IT
Comprehasive Exam - ITComprehasive Exam - IT
Comprehasive Exam - ITguest6ddfb98
 
Data encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeData encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeUlf Mattsson
 
Camomile : A Unicode library for OCaml
Camomile : A Unicode library for OCamlCamomile : A Unicode library for OCaml
Camomile : A Unicode library for OCamlYamagata Yoriyuki
 
Abap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesAbap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesMilind Patil
 
Character encoding and unicode format
Character encoding and unicode formatCharacter encoding and unicode format
Character encoding and unicode formatAdityaSharma1452
 
Data Representation in Computers
Data Representation in ComputersData Representation in Computers
Data Representation in ComputersCBAKhan
 
Encodings - Ruby 1.8 and Ruby 1.9
Encodings - Ruby 1.8 and Ruby 1.9Encodings - Ruby 1.8 and Ruby 1.9
Encodings - Ruby 1.8 and Ruby 1.9Dimelo R&D Team
 
Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...Ulf Mattsson
 
Unicode and character sets
Unicode and character setsUnicode and character sets
Unicode and character setsrenchenyu
 
Computer Systems Organization
Computer Systems OrganizationComputer Systems Organization
Computer Systems OrganizationLiEdo
 
Unicode - Hacking The International Character System
Unicode - Hacking The International Character SystemUnicode - Hacking The International Character System
Unicode - Hacking The International Character SystemWebsecurify
 
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...InSync2011
 

Similar to Overview of character encoding (20)

Comprehasive Exam - IT
Comprehasive Exam - ITComprehasive Exam - IT
Comprehasive Exam - IT
 
Data encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeData encryption and tokenization for international unicode
Data encryption and tokenization for international unicode
 
Camomile : A Unicode library for OCaml
Camomile : A Unicode library for OCamlCamomile : A Unicode library for OCaml
Camomile : A Unicode library for OCaml
 
Abap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesAbap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfiles
 
Character encoding and unicode format
Character encoding and unicode formatCharacter encoding and unicode format
Character encoding and unicode format
 
Data Representation in Computers
Data Representation in ComputersData Representation in Computers
Data Representation in Computers
 
Notes on a Standard: Unicode
Notes on a Standard: UnicodeNotes on a Standard: Unicode
Notes on a Standard: Unicode
 
Character Sets
Character SetsCharacter Sets
Character Sets
 
Uncdtalk
UncdtalkUncdtalk
Uncdtalk
 
Journey of Bsdconv
Journey of BsdconvJourney of Bsdconv
Journey of Bsdconv
 
Unicode basics in python
Unicode basics in pythonUnicode basics in python
Unicode basics in python
 
Encodings - Ruby 1.8 and Ruby 1.9
Encodings - Ruby 1.8 and Ruby 1.9Encodings - Ruby 1.8 and Ruby 1.9
Encodings - Ruby 1.8 and Ruby 1.9
 
Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...
 
Unicode and character sets
Unicode and character setsUnicode and character sets
Unicode and character sets
 
Io
IoIo
Io
 
Computer Systems Organization
Computer Systems OrganizationComputer Systems Organization
Computer Systems Organization
 
Unicode - Hacking The International Character System
Unicode - Hacking The International Character SystemUnicode - Hacking The International Character System
Unicode - Hacking The International Character System
 
Unit iv
Unit ivUnit iv
Unit iv
 
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...
 
Unicode 101
Unicode 101Unicode 101
Unicode 101
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Overview of character encoding

Editor's Notes

  1. ASCII encoding, which specifies how to store English characters in a single byte each (taking up the space in 0-127, leaving 128-255 empty)Microsoft code page is used in pre-Windows NT systems (Windows NT is a family of operating systems produced by Microsoft, the first version of which was released in July 1993. NT was the first fully 32-bit version of Windows, (Windows 3.1x and Windows 9x, were 16-bit/32-bit hybrids). Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Home Server, Windows Server 2008 and Windows 7 are based on Windows NT, although they are not branded as Windows NT)Reference:http://www.nadcomm.com/fiveunit/fiveunits.htm
  2. In the beginning of computer age, ASCII covers everything you would find on an English keyboard: letters in upper and lower case, numbers, and some common symbols. There was even some room left in the 128 character ASCII mapping for some control character sequences. But the entire world can't quite get by on just these characters. Need a encoding system to help encode characters in languages There are many characters in different existing Chinese, Japanese and Korean (CJK) character sets actually represent the same character. Need an effort to identify them
  3. A character has to be stored in computer as some number. Unicode tries to unify characters from different encodings that represent the same character. For instance, the A in ASCII, the A in ISO-8859-1, and the A in the Japanese encoding SHIFT-JIS all map to the same Unicode character.A character set and a character encoding aren't necessarily the same thing. Unicode is one character set, and has multiple character encodingsThe UTF-8 is most efficient for Strings containing mostly ASCII characters (inWestern countries). UTF-8 and UTF-16 are approximately equivalent for Strings containing mostly characters outside ASCII but inside the the BMP (characters for almost all modern languages, and a large number of special characters). For Strings containing mostly characters outside the BMP, UTF-8, UTF-16, and UTF-32 are approximately equivalent.
  4. Go to Start Menu > Accessories > System Tools > Character MapsUse this tool show characters and their number in Unicode charts