Sanitizing HTML 5 with Perl 5

  1. HTML5::Sanitizer: Sanitizing HTML 5 with Perl 5. Uwe Voelker, XING AG, August 16th 2011.
  2. Outline: 1 Introduction, 2 HTML parser choice, 3 HTML5::Sanitizer internals, 4 HTML5::Sanitizer usage, 5 Conclusion
  3. Introduction (Task: WYSIWYG editor, Team, Live example)
  6. Task: WYSIWYG editor
      - integrate a WYSIWYG editor into the XING frontend
      - the architect researched open source solutions
      - none was suitable, mostly for security reasons
      - so the decision was made to build it in-house
      - goals: security, and sharing profiles (the allowed tags) between frontend and backend
  7. Team
      - Christopher Blum: JavaScript
      - Ingo Chao: QA (HTML5/CSS)
      - Uwe Voelker: Perl
  8. Live example
  9. HTML parser choice (CPAN modules, Evaluation, Final decision)
  10. HTML parsers on CPAN: HTML::Parser, HTML::TreeBuilder, HTML::TreeBuilder::LibXML, XML::LibXML, HTML::HTML5::Parser, Marpa::HTML, ...
  15. Evaluation and final decision
      - started with HTML::HTML5::Parser (HH5P), because it understands the semantics of HTML 5 tags
      - but it also turned
          http://example.com/?section=2&copy=3&lang=en
        into
          http://example.com/?section=2©=3&lang=en
        because "&copy" (without a semicolon) was decoded as the character reference for ©
      - final choice: XML::LibXML
  16. HTML5::Sanitizer internals (Processing phases, Parsing, Converting, Writing)
  20. Processing phases
      - preprocessing (e.g. migration)
      - parsing (HTML → DOM tree)
      - converting (rebuild the tree according to the profile)
      - writing (DOM tree → HTML)
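A minimal sketch of how the four phases could be chained while keeping every intermediate value, which is what the debugging slide later exposes. The helper `run_phases`, the phase names and the result hash are hypothetical and not HTML5::Sanitizer's API; each phase is just a code ref that takes the previous phase's output.

```perl
use strict;
use warnings;

# Hypothetical helper: runs the phases in order and records each
# intermediate value under the phase's name, so every stage can be
# inspected afterwards. Not the module's actual API.
sub run_phases {
    my ($input, @phases) = @_;
    my %result = (input => $input);
    my $data   = $input;
    for my $phase (@phases) {
        my ($name, $code) = @$phase;
        $data = $code->($data);      # feed the previous result forward
        $result{$name} = $data;
    }
    $result{output} = $data;         # last phase's result is the output
    return \%result;
}

# Intended shape of a call (placeholder subs, not real functions):
# my $res = run_phases($html,
#     [ preprocessed => \&preprocess ],   # e.g. migration of old markup
#     [ parsed       => \&parse      ],   # HTML -> DOM tree
#     [ converted    => \&convert    ],   # rebuild tree per profile
#     [ written      => \&write_html ],   # DOM tree -> HTML
# );
```

Keeping the phases as separate code refs makes each one testable on its own and lets a debugging mode return all intermediate results at no extra cost.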
  21. Parsing HTML with XML::LibXML
          use XML::LibXML;

          my $parser = XML::LibXML->new(
              encoding          => 'UTF-8',
              recover           => 2,    # recover from errors silently
              keep_blanks       => 1,    # keep blank (whitespace-only) nodes
              no_cdata          => 1,    # merge CDATA sections as text nodes
              expand_entities   => 1,
              no_network        => 1,    # never fetch external resources
              suppress_errors   => 1,
              suppress_warnings => 1,
          );
  22. Parsing HTML with XML::LibXML
          my $doc = $parser->parse_html_string(
              $html,
              {
                  no_cdata          => 1,
                  suppress_errors   => 1,
                  suppress_warnings => 1,
              },
          );
  26. Converting: rebuilding the DOM tree
      - loop through every node (only ELEMENT and TEXT nodes)
      - drop unwanted elements completely (e.g. <script>)
      - change unknown elements to <span>
      - possibly change the tag name (according to the profile)
      - transform (or copy) attributes
      - proceed recursively with the child nodes
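The converting steps can be sketched in Perl roughly as follows. To stay self-contained, this sketch does not use XML::LibXML; it walks a hypothetical hash-based tree instead. The names `convert_node` and `convert_children`, the tree layout, and the meaning of `check_attributes` (attribute name mapped to a code ref returning the cleaned value or undef) are assumptions, not HTML5::Sanitizer's actual code. The profile is any class with an `element($tag)` method.

```perl
use strict;
use warnings;

# Hypothetical tree layout (not XML::LibXML):
#   TEXT node:    { text => '...' }
#   ELEMENT node: { tag => 'p', attrs => {...}, children => [...] }
# Anything else (e.g. { comment => '...' }) is dropped.

# Converts one node according to the profile; returns zero or one new nodes.
sub convert_node {
    my ($profile, $node) = @_;

    return +{ text => $node->{text} } if exists $node->{text};  # copy TEXT
    return () unless exists $node->{tag};                       # skip others

    my $rule = $profile->element($node->{tag});

    # Unknown element: becomes a plain <span>; children are still processed
    if (!defined $rule) {
        return +{
            tag      => 'span',
            attrs    => {},
            children => convert_children($profile, $node),
        };
    }

    return () if $rule->{remove};    # drop the element and its whole subtree

    my $in = $node->{attrs} || {};
    my %attrs;

    # Assumption: check_attributes maps an attribute name to a code ref that
    # returns the cleaned value, or undef to drop the attribute.
    my $checks = $rule->{check_attributes} || {};
    for my $name (keys %$checks) {
        next unless exists $in->{$name};
        my $value = $checks->{$name}->($in->{$name});
        $attrs{$name} = $value if defined $value;
    }

    # Fixed attributes and class from the profile override checked ones
    my $set = $rule->{set_attributes} || {};
    @attrs{ keys %$set } = values %$set;
    $attrs{class} = $rule->{set_class} if defined $rule->{set_class};

    return +{
        tag      => $rule->{rename_tag} // $node->{tag},
        attrs    => \%attrs,
        children => convert_children($profile, $node),
    };
}

# Recursively converts all child nodes and returns them as an array ref.
sub convert_children {
    my ($profile, $node) = @_;
    return [ map { convert_node($profile, $_) } @{ $node->{children} || [] } ];
}
```

Attributes that the rule does not mention are dropped in this sketch, which makes it a strict whitelist; the real module may handle this differently.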
  28. Writing HTML
      - mainly for the additional escapes
      - could not find a nice way to integrate this into XML::LibXML
          $text =~ s/&/&amp;/g;    # first, so the entities below are not escaped again
          $text =~ s/'/&#39;/g;
          $text =~ s/"/&quot;/g;
          $text =~ s/</&lt;/g;
          $text =~ s/>/&gt;/g;
          $text =~ s/`/&#96;/g;
          $text =~ s/\{/&#123;/g;
          $text =~ s/\}/&#125;/g;
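A sketch of the writing phase for a hypothetical hash-based tree (`{ text => ... }` or `{ tag => ..., attrs => {...}, children => [...] }`), not XML::LibXML's own serializer. It escapes the same eight characters as the substitutions above, but in a single pass through a lookup table, so the order of the replacements no longer matters. `escape_html`, `write_node` and the short void-element list are illustrative choices.

```perl
use strict;
use warnings;

# Entity for every character the slide escapes; one pass, so '&' cannot be
# escaped twice regardless of order.
my %ENTITY = (
    '&' => '&amp;',  "'" => '&#39;',  '"' => '&quot;', '<' => '&lt;',
    '>' => '&gt;',   '`' => '&#96;',  '{' => '&#123;', '}' => '&#125;',
);

sub escape_html {
    my ($text) = @_;
    $text =~ s/([&'"<>`{}])/$ENTITY{$1}/g;
    return $text;
}

# A few void elements that get no closing tag (illustrative, not complete).
my %VOID = map { $_ => 1 } qw(br hr img);

# Serializes one node of the hypothetical tree back to HTML.
sub write_node {
    my ($node) = @_;
    return escape_html($node->{text}) if exists $node->{text};

    my $attrs = $node->{attrs} || {};
    my $html  = '<' . $node->{tag};
    for my $name (sort keys %$attrs) {    # sorted for stable output
        $html .= sprintf ' %s="%s"', $name, escape_html($attrs->{$name});
    }
    $html .= '>';
    return $html if $VOID{ $node->{tag} };

    $html .= join '', map { write_node($_) } @{ $node->{children} || [] };
    return $html . '</' . $node->{tag} . '>';
}
```

Because `&` is always escaped, the `&copy=3` query parameter from the HTML::HTML5::Parser example is written as `&amp;copy=3` and cannot be read back as ©.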
  29. HTML5::Sanitizer usage (Usage, Profile, Examples, Debugging)
  30. Usage
          use HTML5::Sanitizer;

          # construct object
          my $sanitizer = HTML5::Sanitizer->new(
              profile => 'My::Profile',
          );

          # call process()
          my $clean = $sanitizer->process($html);
  33. Profile
      - you have to build your own
      - a class with just one method: element($tag)
      - it returns undef or a hashref with these keys:
          remove            remove the complete subtree (boolean)
          rename_tag        rename the tag (string)
          set_attributes    set these attributes (hashref)
          check_attributes  check/transform these attributes (hashref)
          set_class         set the class (string)
          add_class         add classes derived from other attributes (hashref)
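A minimal profile class following the interface described above. The rule table is hypothetical, and it is an assumption that an empty hashref means "keep the tag unchanged". `element` ignores its invocant, so it works whether the sanitizer calls it on the class name or on an instance.

```perl
use strict;
use warnings;

{
    package My::Profile;    # class name as used on the usage slide

    # Hypothetical whitelist; the rule keys are the ones listed above.
    my %RULES = (
        p      => {},                          # assumed: {} keeps the tag as-is
        b      => { rename_tag => 'strong' },
        script => { remove => 1 },
    );

    # The one required method: undef for tags the profile does not know
    # (these become <span>), otherwise a hashref of rules.
    sub element {
        my ($self, $tag) = @_;
        return $RULES{ lc $tag };
    }
}
```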
  36. Example: script
      - completely remove <script> (including all children):
          { remove => 1, }
      - otherwise it would be converted to <span> and all its children processed recursively
  38. Example: big
      - <big> → <span class="big">
          { rename_tag => 'span', set_class => 'big', }
  40. Example: a
      - add rel="nofollow" and target="_blank" to every link:
          {
              set_attributes => {
                  rel    => 'nofollow',
                  target => '_blank',
              },
          }
  42. Example: font
          rename_tag => 'span',
          add_class  => { size => 'size_font' },

          sub class_size_font {
              my ($self, $val) = @_;
              return unless $val;
              return 'size-xx-large' if $val eq '7';
              # ...
              return 'size-xx-small' if $val eq '1';
              return 'size-larger'   if $val =~ /^\+/;
              return 'size-smaller'  if $val =~ /^-/;
              return;
          }
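The slides do not show how `add_class => { size => 'size_font' }` reaches `class_size_font`. Judging from the naming, the value seems to name a `class_*` method on the profile. Below is a hypothetical helper `classes_for` built on that assumption, together with an illustrative `class_align` method; neither is part of HTML5::Sanitizer, and joining `set_class` and `add_class` results with spaces is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the class attribute for one element.
# Assumption: add_class maps an attribute name to a suffix NAME, and the
# profile method class_NAME turns that attribute's value into a class
# (or returns nothing to add no class).
sub classes_for {
    my ($profile, $rule, $attrs) = @_;
    my @classes;
    push @classes, $rule->{set_class} if defined $rule->{set_class};

    my $add = $rule->{add_class} || {};
    for my $attr (sort keys %$add) {
        next unless defined $attrs->{$attr};
        my $method = $profile->can('class_' . $add->{$attr}) or next;
        my $class  = $profile->$method($attrs->{$attr});
        push @classes, $class if defined $class && length $class;
    }
    return join ' ', @classes;
}

{
    # Illustrative profile method in the style of class_size_font
    package Demo::Profile;

    sub class_align {
        my ($self, $val) = @_;
        return "align-$1" if $val =~ /^(left|right|center)$/;
        return;    # unknown value: no class
    }
}
```

Mapping values to a fixed set of class names, rather than copying them, keeps arbitrary user input out of the `class` attribute.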
  43. Debugging
      - if the result is not as expected, you can access the intermediate results:
          use feature 'say';

          my $res = $sanitizer->process($html, { return_result => 1 });
          # see HTML5::Sanitizer::Result
          say $res->input;
          say $res->preprocessed;
          say $res->parsed_doc->toString;
          say $res->converted_doc->toString;
          say $res->output;
          print $res->debug_output;
  46. Repositories
      - HTML5::Sanitizer (backend): http://github.com/xing/html5-sanitizer
      - wysihtml5 (JavaScript frontend): http://github.com/xing/wysihtml5
      - Feedback? uwe@uwevoelker.de
  47. Questions?
