2. Strings and Regular Expressions
Prepared By: Abed ElAzeem Bukhari
What’s in This Chapter?
➤ Building strings
➤ Formatting expressions
➤ Using regular expressions
3. Examining System.String
string message1 = "Hello"; // returns "Hello"
message1 += ", There"; // returns "Hello, There"
string message2 = message1 + "!"; // returns "Hello, There!"
C# also allows extraction of a particular character using an indexer-like syntax:
string message = "Hello";
char char4 = message[4]; // returns 'o'. Note the string is zero-indexed
6. Building strings cont.
The StringBuilder class has two main properties:
➤ Length, which indicates the length of the string that it actually contains
➤ Capacity, which indicates the maximum length of the string in the memory allocation
StringBuilder myBuilder = new StringBuilder("Hello for all ", 100);
myBuilder.Append("students.");
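The difference between Length and Capacity can be seen by printing both before and after the Append call. This is a minimal sketch; the class name StringBuilderDemo is hypothetical and not part of the original slides.

```csharp
using System;
using System.Text;

class StringBuilderDemo
{
    static void Main()
    {
        // "Hello for all " is 14 characters long; 100 is the requested initial capacity.
        StringBuilder myBuilder = new StringBuilder("Hello for all ", 100);
        Console.WriteLine("Length = {0}, Capacity = {1}", myBuilder.Length, myBuilder.Capacity);

        // Appending 9 more characters raises Length to 23. The text still fits
        // in the memory already reserved, so no reallocation is needed.
        myBuilder.Append("students.");
        Console.WriteLine("Length = {0}, Capacity = {1}", myBuilder.Length, myBuilder.Capacity);

        Console.WriteLine(myBuilder.ToString()); // Hello for all students.
    }
}
```

Length grows with the content, while Capacity only changes when the builder has to allocate more memory.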
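7. StringBuilder members
You can create a StringBuilder from an initial string:
StringBuilder sb = new StringBuilder("Hello");
Or you can create an empty StringBuilder with a given capacity:
StringBuilder sb = new StringBuilder(20);
Another way to set the capacity is through the Capacity property:
StringBuilder sb = new StringBuilder("Hello");
sb.Capacity = 100;
The MaxCapacity property is read-only, so it can be set only through the constructor (here the initial capacity is 100 and the maximum capacity is 500):
StringBuilder sb = new StringBuilder(100, 500);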
9. Format Strings double d = 16.45; int i = 25; Console.WriteLine("The double is {0,10:E} and the int contains {1}", d, i);
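In a format item such as {0,10:E}, the first number is the argument index, the number after the comma is the minimum field width, and the text after the colon is the format specifier. The sketch below illustrates this. The class name FormatDemo and the use of CultureInfo.InvariantCulture (chosen so the output does not depend on the machine's regional settings) are assumptions, not part of the original slides.

```csharp
using System;
using System.Globalization;

class FormatDemo
{
    static void Main()
    {
        double d = 16.45;
        int i = 25;

        // {0,10:E}: argument 0, minimum width 10, scientific (E) format.
        // The formatted value is longer than 10 characters, so no padding is added.
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "The double is {0,10:E} and the int contains {1}", d, i));
        // The double is 1.645000E+001 and the int contains 25

        // A positive width right-aligns the value; a negative width left-aligns it.
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "[{0,6}]", i));  // [    25]
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "[{0,-6}]", i)); // [25    ]
    }
}
```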
11. How is the string formatted?
Console.WriteLine("The double is {0,10:E} and the int contains {1}", d, i);
// Likely implementation of Console.WriteLine()
public void WriteLine(string format, object arg0, object arg1)
{
    this.WriteLine(string.Format(this.FormatProvider, format, new object[]{arg0, arg1}));
}
13. The FormattableVector example
FormattableVector.cs
The format specifiers you are going to support are:
- N: Should be interpreted as a request to supply a quantity known as the Norm of the Vector. This is just the sum of squares of its components, which for mathematics buffs happens to be equal to the square of the length of the Vector, and is usually displayed between double vertical bars, like this: ||34.5||.
- VE: Should be interpreted as a request to display each component in scientific format, just as the specifier E applied to a double indicates (2.3E+01, 4.5E+02, 1.0E+00).
- IJK: Should be interpreted as a request to display the vector in the form 23i + 450j + 1k.
- Anything else should simply return the default representation of the Vector (23, 450, 1.0).
14. Regular Expressions System.Text.RegularExpressions The regular expressions language is designed specifically for string processing. It contains two features: ➤ A set of escape codes for identifying specific types of characters. You will be familiar with the use of the * character to represent any substring in DOS expressions. (For example, the DOS command Dir Re* lists the files with names beginning with Re .) Regular expressions use many sequences like this to represent items such as any one character , a word break , one optional character , and so on. ➤ A system for grouping parts of substrings and intermediate results during a search operation.
15. Regular Expressions cont.
With regular expressions, you can perform quite sophisticated and high-level operations on strings. For example, you can:
➤ Identify (and perhaps either flag or remove) all repeated words in a string (for example, “The computer books books” to “The computer books”)
➤ Convert all words to title case (for example, “this is a Title” to “This Is A Title”)
➤ Convert all words longer than three characters to title case (for example, “this is a Title” to “This is a Title”)
➤ Ensure that sentences are properly capitalized
➤ Separate the various elements of a URI (for example, given http://www.najahclub.net, extract the protocol, computer name, file name, and so on)
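The first three operations in the list can be sketched with Regex.Replace. The patterns and the class name RegexTasksDemo below are illustrative assumptions rather than the slides' own solutions, and they only handle the simple ASCII inputs shown.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexTasksDemo
{
    static void Main()
    {
        // Remove a repeated word: (\w+) captures a word, \s+ matches the gap,
        // and \1 requires the same word again. The pair is replaced by one copy.
        string deduped = Regex.Replace("The computer books books", @"\b(\w+)\s+\1\b", "$1");
        Console.WriteLine(deduped); // The computer books

        // Title case: upper-case every lowercase letter that starts a word.
        string titled = Regex.Replace("this is a Title", @"\b[a-z]",
            m => m.Value.ToUpperInvariant());
        Console.WriteLine(titled); // This Is A Title

        // Title case only for words longer than three characters: the lookahead
        // (?=\w{3,}) demands at least three more word characters after the first letter.
        string longOnly = Regex.Replace("this is a Title", @"\b[a-z](?=\w{3,})",
            m => m.Value.ToUpperInvariant());
        Console.WriteLine(longOnly); // This is a Title
    }
}
```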
16. Regular Expressions Examples const string pattern = "ion"; MatchCollection myMatches = Regex.Matches(myText, pattern, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture ); foreach (Match nextMatch in myMatches) { Console.WriteLine(nextMatch.Index); } Another example in RegularExpressions.cs
19. Regular Expressions Matches, Groups, and Captures
For example, URIs have the format <protocol>://<address>:<port>, where the port is optional. An example of this is http://www.el-bukhari.com:4355. Suppose that you want to extract the protocol, the address, and the port from a URI, where you know that there may or may not be whitespace (but no punctuation) immediately following the URI. You could do so using this expression:
\b(\S+)://([^:]+)(?::(\S+))?\b
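The following sketch shows how the three parts could be read back through Match.Groups. It assumes the expression on the slide is meant to be \b(\S+)://([^:]+)(?::(\S+))?\b (the \S and \b escapes appear to have been lost in the slide text), and the class name UriGroupsDemo is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class UriGroupsDemo
{
    static void Main()
    {
        const string uri = "http://www.el-bukhari.com:4355";

        // Group 1: protocol, group 2: address, group 3: port.
        // (?: ... ) groups the ":port" part without capturing the colon itself,
        // and the trailing ? makes that whole part optional.
        const string pattern = @"\b(\S+)://([^:]+)(?::(\S+))?\b";

        Match match = Regex.Match(uri, pattern);
        if (match.Success)
        {
            Console.WriteLine("Protocol: " + match.Groups[1].Value); // Protocol: http
            Console.WriteLine("Address:  " + match.Groups[2].Value); // Address:  www.el-bukhari.com
            Console.WriteLine("Port:     " + match.Groups[3].Value); // Port:     4355
        }
    }
}
```

Group 0 always holds the whole match, which is why the numbered groups start at 1.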
// Encoder.cs
using System;

namespace Encoder
{
    class Program
    {
        static void Main()
        {
            string greetingText = "Hello from all the students at Najah. ";
            greetingText += "We do hope you enjoy this codes as much as we enjoyed writing it";

            // Shift each lowercase letter to the next character, working from 'z'
            // down to 'a' so that no letter is shifted twice.
            for (int i = (int)'z'; i >= (int)'a'; i--)
            {
                char old1 = (char)i;
                char new1 = (char)(i + 1);
                greetingText = greetingText.Replace(old1, new1);
            }

            // Do the same for the uppercase letters.
            for (int i = (int)'Z'; i >= (int)'A'; i--)
            {
                char old1 = (char)i;
                char new1 = (char)(i + 1);
                greetingText = greetingText.Replace(old1, new1);
            }

            Console.WriteLine("Encoded:\n" + greetingText);
            Console.ReadLine();
        }
    }
}

// Encoder2.cs
using System;
using System.Text;

namespace Encoder2
{
    class Program
    {
        static void Main()
        {
            StringBuilder greetingBuilder =
                new StringBuilder("Hello from all the students at Najah. ", 150);
            greetingBuilder.Append("We do hope you enjoy this codes as much as we enjoyed writing it");

            // StringBuilder.Replace modifies the builder in place, so no new
            // string is allocated on each pass.
            for (int i = (int)'z'; i >= (int)'a'; i--)
            {
                char old1 = (char)i;
                char new1 = (char)(i + 1);
                greetingBuilder = greetingBuilder.Replace(old1, new1);
            }

            for (int i = (int)'Z'; i >= (int)'A'; i--)
            {
                char old1 = (char)i;
                char new1 = (char)(i + 1);
                greetingBuilder = greetingBuilder.Replace(old1, new1);
            }

            Console.WriteLine("Encoded:\n" + greetingBuilder);
            Console.ReadLine();
        }
    }
}
// FormattableVector.cs
using System;
using System.Text;

namespace Najah.ILoveCsharp.FormattableVector
{
    class MainEntryPoint
    {
        static void Main()
        {
            Vector v1 = new Vector(1, 32, 5);
            Vector v2 = new Vector(845.4, 54.3, -7.8);
            Console.WriteLine("\nIn IJK format,\nv1 is {0,30:IJK}\nv2 is {1,30:IJK}", v1, v2);
            Console.WriteLine("\nIn default format,\nv1 is {0,30}\nv2 is {1,30}", v1, v2);
            Console.WriteLine("\nIn VE format\nv1 is {0,30:VE}\nv2 is {1,30:VE}", v1, v2);
            Console.WriteLine("\nNorms are:\nv1 is {0,20:N}\nv2 is {1,20:N}", v1, v2);
        }
    }

    struct Vector : IFormattable
    {
        public double x, y, z;

        public Vector(double x, double y, double z)
        {
            this.x = x;
            this.y = y;
            this.z = z;
        }

        // Handles the custom format specifiers N, VE, and IJK;
        // anything else falls back to the default representation.
        public string ToString(string format, IFormatProvider formatProvider)
        {
            if (format == null)
                return ToString();
            string formatUpper = format.ToUpper();
            switch (formatUpper)
            {
                case "N":
                    return "|| " + Norm() + " ||";
                case "VE":
                    return String.Format("( {0:E}, {1:E}, {2:E} )", x, y, z);
                case "IJK":
                    StringBuilder sb = new StringBuilder(x.ToString(), 30);
                    sb.Append(" i + ");
                    sb.Append(y.ToString());
                    sb.Append(" j + ");
                    sb.Append(z.ToString());
                    sb.Append(" k");
                    return sb.ToString();
                default:
                    return ToString();
            }
        }

        // Sum of the squares of the components.
        public double Norm()
        {
            return x*x + y*y + z*z;
        }

        public Vector(Vector rhs)
        {
            x = rhs.x;
            y = rhs.y;
            z = rhs.z;
        }

        public override string ToString()
        {
            return "( " + x + " , " + y + " , " + z + " )";
        }

        public double this [uint i]
        {
            get
            {
                switch (i)
                {
                    case 0: return x;
                    case 1: return y;
                    case 2: return z;
                    default:
                        throw new IndexOutOfRangeException(
                            "Attempt to retrieve Vector element" + i);
                }
            }
            set
            {
                switch (i)
                {
                    case 0: x = value; break;
                    case 1: y = value; break;
                    case 2: z = value; break;
                    default:
                        throw new IndexOutOfRangeException(
                            "Attempt to set Vector element" + i);
                }
            }
        }

        //public static bool operator == (Vector lhs, Vector rhs)
        //{
        //    if (lhs.x == rhs.x && lhs.y == rhs.y && lhs.z == rhs.z)
        //        return true;
        //    return false;
        //}

        private const double Epsilon = 0.0000001;

        // Compares components within a small tolerance rather than exactly.
        public static bool operator == (Vector lhs, Vector rhs)
        {
            if (Math.Abs(lhs.x - rhs.x) < Epsilon &&
                Math.Abs(lhs.y - rhs.y) < Epsilon &&
                Math.Abs(lhs.z - rhs.z) < Epsilon)
                return true;
            return false;
        }

        public static bool operator != (Vector lhs, Vector rhs)
        {
            return !(lhs == rhs);
        }

        public static Vector operator + (Vector lhs, Vector rhs)
        {
            Vector result = new Vector(lhs);
            result.x += rhs.x;
            result.y += rhs.y;
            result.z += rhs.z;
            return result;
        }

        public static Vector operator * (double lhs, Vector rhs)
        {
            return new Vector(lhs*rhs.x, lhs*rhs.y, lhs*rhs.z);
        }

        public static Vector operator * (Vector lhs, double rhs)
        {
            return rhs*lhs;
        }

        // Dot product of two vectors.
        public static double operator * (Vector lhs, Vector rhs)
        {
            return lhs.x*rhs.x + lhs.y*rhs.y + lhs.z*rhs.z;
        }
    }
}

/* Results:
FormattableVector

In IJK format,
v1 is 1 i + 32 j + 5 k
v2 is 845.4 i + 54.3 j + -7.8 k

In default format,
v1 is ( 1 , 32 , 5 )
v2 is ( 845.4 , 54.3 , -7.8 )

In VE format
v1 is ( 1.000000E+000, 3.200000E+001, 5.000000E+000 )
v2 is ( 8.454000E+002, 5.430000E+001, -7.800000E+000 )

Norms are:
v1 is || 1050 ||
v2 is || 717710.49 ||
*/
using System;
using System.Text.RegularExpressions;

namespace Najah.ILoveCsharp.RegularExpressionPlayaround
{
    class MainEntryPoint
    {
        static void Main()
        {
            Find1();
            Console.ReadLine();
        }

        // Finds words that start with 'n' and end with "ion".
        static void Find1()
        {
            const string text = @"XML has made a major impact in almost every aspect of software development. Designed as an open, extensible, self-describing language, it has become the standard for data and document delivery on the web. The panoply of XML-related technologies continues to develop at breakneck speed, to enable validation, navigation, transformation, linking, querying, description, and messaging of data.";
            const string pattern = @"\bn\S*ion\b";
            MatchCollection matches = Regex.Matches(text, pattern,
                RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace |
                RegexOptions.ExplicitCapture);
            WriteMatches(text, matches);
        }

        // Finds every 'n' that begins a word.
        static void Find2()
        {
            const string text = @"XML has made a major impact in almost every aspect of software development. Designed as an open, extensible, self-describing language, it has become the standard for data and document delivery on the web. The panoply of XML-related technologies continues to develop at breakneck speed, to enable validation, navigation, transformation, linking, querying, description, and messaging of data.";
            const string pattern = @"\bn";
            MatchCollection matches = Regex.Matches(text, pattern, RegexOptions.IgnoreCase);
            WriteMatches(text, matches);
        }

        // Prints each match with up to five characters of context on either side.
        static void WriteMatches(string text, MatchCollection matches)
        {
            Console.WriteLine("Original text was: \n\n" + text + "\n");
            Console.WriteLine("No. of matches: " + matches.Count);
            foreach (Match nextMatch in matches)
            {
                int index = nextMatch.Index;
                string result = nextMatch.ToString();
                int charsBefore = (index < 5) ? index : 5;
                int fromEnd = text.Length - index - result.Length;
                int charsAfter = (fromEnd < 5) ? fromEnd : 5;
                int charsToDisplay = charsBefore + charsAfter + result.Length;
                Console.WriteLine("Index: {0}, \tString: {1}, \t{2}",
                    index, result,
                    text.Substring(index - charsBefore, charsToDisplay));
            }
        }
    }
}