11.
    split :: String -> String -> [String]
    str `split` pat = split' str pat ""

    -- memo accumulates the current field in reverse order
    split' :: String -> String -> String -> [String]
    split' "" _ memo = [reverse memo]
    split' str pat memo =
      let (a, b) = splitAt (length pat) str in
      if a == pat
        then reverse memo : (b `split` pat)
        else split' (tail str) pat (head str : memo)

• split "aaa<>bb<>c<><>d" "<>"
• split' "aaa<>bb<>c<><>d" "<>" ""
• split' "aa<>bb<>c<><>d" "<>" "a"
• split' "a<>bb<>c<><>d" "<>" "aa"
• split' "<>bb<>c<><>d" "<>" "aaa"
• "aaa" : split "bb<>c<><>d" "<>"
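The same recursion can also be written without the reversed accumulator. The following is only a sketch: splitOnPat and breakOnPat are hypothetical names, and a non-empty pattern is assumed.

```haskell
import Data.List (isPrefixOf)

-- splitOnPat is a hypothetical name; it assumes a non-empty pattern.
splitOnPat :: String -> String -> [String]
splitOnPat str pat = go str
  where
    n = length pat

    -- Emit one field, then continue after the separator if there was one.
    go s = case breakOnPat s of
      (field, Nothing)   -> [field]
      (field, Just rest) -> field : go rest

    -- Text before the first occurrence of pat, plus the text after it (if any).
    breakOnPat "" = ("", Nothing)
    breakOnPat s@(c : cs)
      | pat `isPrefixOf` s = ("", Just (drop n s))
      | otherwise          = let (field, rest) = breakOnPat cs
                             in (c : field, rest)

main :: IO ()
main = do
  print (splitOnPat "aaa<>bb<>c<><>d" "<>")  -- ["aaa","bb","c","","d"]
  print (splitOnPat "a<>" "<>")              -- ["a",""]
```

As with split, two adjacent separators and a trailing separator each yield an empty field.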
Tuesday, October 11, 2011
12. Another approach
• Text.Parsec: the Parsec 3 API
• Text.ParserCombinators.Parsec: the Parsec 2 compatible API
• Real World Haskell Parsec chapter
• csv parser
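For reference, a minimal csv parser in the style of that chapter might look like the sketch below. It assumes every record ends with a newline and does not handle quoted cells. The names record, cell, eol and parseCSV are chosen here for illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A file is a sequence of records, each terminated by a newline.
csvFile :: Parser [[String]]
csvFile = record `endBy` eol

-- A record is a list of cells separated by commas.
record :: Parser [String]
record = cell `sepBy` char ','

-- A cell is any run of characters other than a comma or a newline (no quoting).
cell :: Parser String
cell = many (noneOf ",\n")

eol :: Parser Char
eol = char '\n'

parseCSV :: String -> Either ParseError [[String]]
parseCSV = parse (csvFile <* eof) "(csv)"

main :: IO ()
main = print (parseCSV "a,b,c\n1,2,3\n")  -- Right [["a","b","c"],["1","2","3"]]
```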
13. Design of split
• split "aaa<>bb<>c<><>d" "<>"
• many fields, where each field is
• any run of chars that does not contain the string "<>"
• separated by "<>" or terminated by the end of the string
14.
    import qualified Text.Parsec as P

    split :: String -> String -> [String]
    str `split` pat = case P.parse (split' (P.string pat)) "split" str of
      Right x -> x
      Left err -> error (show err)

    split' :: P.Parsec String () String -> P.Parsec String () [String]
    split' pat = P.anyChar `P.manyTill` (P.eof P.<|> (P.try (P.lookAhead pat) >> return ())) `P.sepBy` pat
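15.
    import qualified Text.Parsec as P

    split :: String -> String -> [String]
    str `split` pat = case P.parse (split' (P.string pat)) "split" str of
      Right x -> x
      Left err -> error (show err)

    split' :: P.Parsec String () String -> P.Parsec String () [String]
    split' pat = P.anyChar `P.manyTill` (P.eof P.<|> (P.try (P.lookAhead pat) >> return ())) `P.sepBy` pat

• P.anyChar: any char
• P.eof P.<|> (P.try (P.lookAhead pat) >> return ()): except at the end of the string or at the separator pattern (checked without consuming text)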
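To make the annotated pieces easier to see, the one-line split' can be broken into named parsers. This is a sketch only: boundary, field and splitP are hypothetical names, and a non-empty pattern is assumed.

```haskell
import qualified Text.Parsec as P
import Text.Parsec.String (Parser)

-- Stop condition: end of input, or the separator just ahead (not consumed).
boundary :: Parser String -> Parser ()
boundary sep = P.eof P.<|> (P.try (P.lookAhead sep) >> return ())

-- One field: any chars up to the boundary.
field :: Parser String -> Parser String
field sep = P.anyChar `P.manyTill` boundary sep

-- Fields separated by the pattern; returns the parse error instead of crashing.
splitP :: String -> String -> Either P.ParseError [String]
splitP str pat = P.parse (field sep `P.sepBy` sep) "split" str
  where
    sep = P.string pat

main :: IO ()
main = print (splitP "aaa<>bb<>c<><>d" "<>")  -- Right ["aaa","bb","c","","d"]
```

P.try is needed around P.lookAhead because P.string may consume part of the input before failing, for example on a lone "<".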
16.
    import qualified Text.Parsec as P

    main = do
      print $ abc1 "abc"   -- True
      print $ abc1 "abcd"  -- False
      print $ abc2 "abc"   -- True
      print $ abc2 "abcd"  -- False

    abc1, abc2 :: String -> Bool
    abc1 str = str == "abc"
    abc2 str = case P.parse (P.string "abc" >> P.eof) "abc" str of
      Right _ -> True
      Left _ -> False
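The P.eof in abc2 is what rejects "abcd". A small sketch of the difference, where accepts is a hypothetical helper:

```haskell
import qualified Text.Parsec as P

-- accepts is a hypothetical helper: True when the parser succeeds on the input.
accepts :: P.Parsec String () a -> String -> Bool
accepts p str = either (const False) (const True) (P.parse p "accepts" str)

main :: IO ()
main = do
  print $ accepts (P.string "abc") "abcd"           -- True: the trailing "d" is ignored
  print $ accepts (P.string "abc" >> P.eof) "abcd"  -- False: eof rejects leftover input
  print $ accepts (P.string "abc" >> P.eof) "abc"   -- True
```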
17.
    import qualified Text.Parsec as P

    main = do
      print $ parenthMatch1 "(a (b c))"  -- True
      print $ parenthMatch1 "(a (b c)"   -- False
      print $ parenthMatch1 ")(a (b c)"  -- False
      print $ parenthMatch2 "(a (b c))"  -- True
      print $ parenthMatch2 "(a (b c)"   -- False
      print $ parenthMatch2 ")(a (b c)"  -- False

    -- Hand-written version: count the open parentheses
    parenthMatch1 :: String -> Bool
    parenthMatch1 str = f str 0
      where
        f "" 0 = True
        f "" _ = False
        f ('(':xs) n = f xs (n + 1)
        f (')':xs) 0 = False
        f (')':xs) n = f xs (n - 1)
        f (_:xs) n = f xs n

    -- Parsec version: a recursive grammar
    parenthMatch2 :: String -> Bool
    parenthMatch2 str =
      case P.parse (f >> P.eof) "parenthMatch" str of
        Right _ -> True
        Left _ -> False
      where
        f = P.many (P.noneOf "()" P.<|> g)
        g = do
          P.char '('
          f
          P.char ')'
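The same recursive grammar can return a value instead of only succeeding or failing. As an illustration, the sketch below computes the maximum nesting depth. The names depth and maxDepth are hypothetical and not part of the slides.

```haskell
import qualified Text.Parsec as P
import Text.Parsec.String (Parser)

-- Depth of a sequence of items: plain chars count 0, a group counts 1 + its inner depth.
depth :: Parser Int
depth = maximum . (0 :) <$> P.many (plain P.<|> nested)
  where
    plain  = 0 <$ P.noneOf "()"
    nested = (+ 1) <$> P.between (P.char '(') (P.char ')') depth

-- Nothing when the parentheses do not match.
maxDepth :: String -> Maybe Int
maxDepth str = either (const Nothing) Just (P.parse (depth <* P.eof) "maxDepth" str)

main :: IO ()
main = do
  print (maxDepth "(a (b c))")  -- Just 2
  print (maxDepth "(a (b c)")   -- Nothing
  print (maxDepth ")(a (b c)")  -- Nothing
```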
22. Three types of text
• String
• ByteString
• Text
23. String
• [Char]
• Char: a Unicode code point
• "aaa" is a String
• A lazy linked list, hence slow
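A short illustration of these points, using only base:

```haskell
import Data.Char (ord, toUpper)

main :: IO ()
main = do
  let s = "aaa" :: [Char]        -- String is a synonym for [Char]
  print (s == ['a', 'a', 'a'])   -- True
  print (length s)               -- 3 (walks the whole list)
  print (map toUpper s)          -- "AAA": ordinary list functions apply
  print (map ord "a\955")        -- [97,955]: code points, not UTF-8 bytes
```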
24. ByteString
• import Data.ByteString
• Base64
• Char8
• UTF8
• Lazy (Char8, UTF8)
• Fast. The default string type of the Snap framework
25. ByteString (cont'd)
    {-# LANGUAGE OverloadedStrings #-}
    import Data.ByteString.Char8 ()
    import Data.ByteString (ByteString)

    main = print ("hello" :: ByteString)

• OverloadedStrings with Char8
• Give the type explicitly or use the literal with ByteString functions
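The second option can be sketched as follows: when a literal is passed to a ByteString function, no annotation is needed. Note that the Char8 interface keeps only the low 8 bits of each Char, so this is suitable for ASCII or Latin-1 literals only.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC

main :: IO ()
main = do
  BC.putStrLn "hello"               -- the literal's type is fixed by BC.putStrLn
  print (BC.length "hello")         -- 5
  print (BC.append "foo" "bar")     -- "foobar"
```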
26. ByteString (cont'd)
    import Data.ByteString.UTF8 ()
    import qualified Data.ByteString as B
    import Codec.Binary.UTF8.String (encode)

    main = B.putStrLn (B.pack $ encode " " :: B.ByteString)
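As an alternative to utf8-string, the same encoding step can be done with Data.Text.Encoding from the text package. This is a different library from the one on the slide, and utf8Bytes is a hypothetical helper name.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- utf8Bytes is a hypothetical helper: encode a String as UTF-8 bytes.
utf8Bytes :: String -> B.ByteString
utf8Bytes = TE.encodeUtf8 . T.pack

main :: IO ()
main = do
  let bytes = utf8Bytes "a\955"                  -- 'a' followed by U+03BB
  print (B.unpack bytes)                         -- [97,206,187]
  print (TE.decodeUtf8 bytes == T.pack "a\955")  -- True
```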
27. Text
• import Data.Text
• import Data.Text.IO
• always Unicode text
• import Data.Text.Lazy
• Fast
28. Text (cont'd)
    {-# LANGUAGE OverloadedStrings #-}
    import Data.Text (Text)
    import qualified Data.Text.IO as T

    main = T.putStrLn (" " :: Text)
• UTF-8 friendly
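Data.Text also ships its own splitOn, which covers the split example from the earlier slides. A short sketch:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

main :: IO ()
main = do
  let parts = T.splitOn "<>" "aaa<>bb<>c<><>d"  -- separator first, then the input
  print parts                                   -- ["aaa","bb","c","","d"]
  TIO.putStrLn (T.intercalate "," parts)        -- aaa,bb,c,,d
  print (T.length "a\955")                      -- 2: counts characters, not bytes
```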
32. Attoparsec pros/cons
• Pros
• fast
• text support
• enumerator/iteratee
• Cons
• no lookAhead/notFollowedBy
33. Parsec and Attoparsec
• Parsec
    import qualified Text.Parsec as P

    main = print $ abc "abc"

    abc str = case P.parse f "abc" str of
      Right _ -> True
      Left _ -> False
    f = P.string "abc"

• Attoparsec
    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.Attoparsec.Text as P

    main = print $ abc "abc"

    abc str = case P.parseOnly f str of
      Right _ -> True
      Left _ -> False
    f = P.string "abc"
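Like P.parse in Parsec, parseOnly does not require the whole input to be consumed. Attoparsec's counterpart to eof is endOfInput. A sketch, where abcPrefix and abcExact are hypothetical names:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Attoparsec.Text as A
import Data.Either (isRight)
import Data.Text (Text)

-- Succeeds when the input starts with "abc".
abcPrefix :: Text -> Bool
abcPrefix = isRight . A.parseOnly (A.string "abc")

-- Succeeds only when the input is exactly "abc".
abcExact :: Text -> Bool
abcExact = isRight . A.parseOnly (A.string "abc" <* A.endOfInput)

main :: IO ()
main = do
  print (abcPrefix "abcd")  -- True
  print (abcExact "abcd")   -- False
  print (abcExact "abc")    -- True
```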