3. Requirements
• Parse a string. Convert all occurrences of HTML escape sequences (character entities) into their Unicode equivalents
• "If you see '&lt;' convert it to '<'"
4. How Google Did It
static HTMLEscapeMap gAsciiHTMLEscapeMap[] = {
  // A.2.2. Special characters
  { @"&quot;", 34 },
  { @"&amp;", 38 },
  { @"&apos;", 39 },
  { @"&lt;", 60 },
  ...
  { @"&hearts;", 9829 },
  { @"&diams;", 9830 }
};
https://code.google.com/p/google-toolbox-for-mac/source/browse/trunk/Foundation/GTMNSString%2BHTML.m
5. How Google Did It
for (unsigned i = 0; i < sizeof(gAsciiHTMLEscapeMap) / sizeof(HTMLEscapeMap); ++i) {
  if ([escapeString isEqualToString:gAsciiHTMLEscapeMap[i].escapeSequence]) {
    [finalString replaceCharactersInRange:escapeRange
                               withString:[NSString stringWithCharacters:&gAsciiHTMLEscapeMap[i].uchar
                                                                  length:1]];
    break;
  }
}
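The loop above compares the candidate entity against each table entry in turn, so the cost of one lookup grows with the size of the table. A minimal plain-C sketch of the same idea follows. The names `EscapeEntry`, `kEscapeMap` and `lookup_entity_linear` are hypothetical and are not taken from the Google Toolbox source; the table holds only the six entries visible on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the slide's HTMLEscapeMap:
   the entity text and the code point it represents. */
typedef struct {
    const char *escape_sequence;
    unsigned code_point;
} EscapeEntry;

/* Only the entries shown on the slide; the real table is much longer. */
static const EscapeEntry kEscapeMap[] = {
    { "&quot;", 34 },
    { "&amp;", 38 },
    { "&apos;", 39 },
    { "&lt;", 60 },
    { "&hearts;", 9829 },
    { "&diams;", 9830 },
};

/* Linear scan over the table: returns the code point for a known
   entity, or 0 when the entity is not in the table. */
static unsigned lookup_entity_linear(const char *entity) {
    assert(entity != NULL);
    size_t count = sizeof(kEscapeMap) / sizeof(kEscapeMap[0]);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(entity, kEscapeMap[i].escape_sequence) == 0) {
            return kEscapeMap[i].code_point;
        }
    }
    return 0;
}
```

In this sketch an entity near the end of the table, such as "&diams;", costs one string comparison per preceding entry, and an unknown entity costs a comparison against every entry.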
7. “flex is a tool for generating scanners. A
scanner is a program which recognizes
lexical patterns in text. The flex program
[looks for a] description of a scanner to
generate. The description is in the form of
pairs of regular expressions and C code,
called rules. flex generates as output a C
source file”
Lexical Analysis With Flex Introduction
http://flex.sourceforge.net/manual/Introduction.html#Introduction
10. Main loop
while ((expression = WSLlex(scanner))) {
  switch (expression) {
    case WSL_ENTITY_NOMATCH:
      // append the matched text unchanged
      [output appendFormat:@"%@", [NSString stringWithCString:WSLget_text(scanner)
                                                     encoding:NSISOLatin1StringEncoding]];
      break;
    case WSL_ENTITY_NUMBER:
      // skip the leading "&#" and parse the decimal digits
      expression = atoi(&WSLget_text(scanner)[2]);
      // fall through so expression is added to string
    default:
      [output appendFormat:@"%C", (unsigned short) expression];
      break;
  }
}
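The WSL_ENTITY_NUMBER case depends on the matched text starting with "&#": indexing from position 2 leaves the digits, and atoi stops at the first non-digit, so the trailing ";" is ignored. A small C sketch of that step in isolation, where the function name `numeric_entity_value` is hypothetical and not part of WSLHTMLEntities:

```c
#include <stdlib.h>
#include <string.h>

/* Returns the decimal value of a numeric entity such as "&#60;",
   or -1 if the text does not start with "&#". */
static int numeric_entity_value(const char *token) {
    if (strncmp(token, "&#", 2) != 0) {
        return -1;
    }
    /* atoi parses leading digits and stops at ';' */
    return atoi(token + 2);
}
```

Note that atoi yields 0 for a hexadecimal form such as "&#x3C;", because "x" is not a digit. Whether the scanner's rule for WSL_ENTITY_NUMBER ever matches that form is not shown on the slide.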
12. Benefits
• Right tool for the right job
• Consistent performance
• Xcode knows about Flex (with some caveats), so it is simple to integrate
• Flex has various flags to optimise performance; for example, -Cf is much faster but uses a lot more memory
13. Further information
• WSLHTMLEntities is on GitHub (https://github.com/sdarlington/WSLHTMLEntities)
• Flex documentation (http://flex.sourceforge.net/manual/)
• "Introduction to Compiling Techniques," J P Bennett