Regex vs. string.IndexOf

I sent Justin a piece of code that did some simple text parsing. His comment was:

text.Substring(lastIndex, currentIndex - lastIndex);

Dude, Regex, dude!

This code reminds me of when I wrote an XML parser in ASP3.

I used IndexOf there for performance: this piece of code is in the critical path, and I don't think that Regex would give me much there. But Justin said that a compiled Regex is more efficient than IndexOf, so I decided to check it.

Here is my quick perf test:

using System;
using System.Text.RegularExpressions;

class Program
{
	static void Main(string[] args)
	{
		string testStr = "select foo, bar, x, y, z, 5 from Items";
		int count = 500000;

		// Pass 1: find each comma with IndexOf and cut out the text before it.
		DateTime start = DateTime.Now;
		for (int i = 0; i < count; i++)
		{
			int last = 0, current = 0;
			while ((current = testStr.IndexOf(',', current)) != -1)
			{
				string x = testStr.Substring(last, current - last);
				// Continue searching just past the comma.
				current = last = current + 1;
			}
		}
		Console.WriteLine(DateTime.Now - start);

		// Pass 2: the same split, driven by a compiled Regex.
		// Note that constructing the Regex falls inside the timed region.
		start = DateTime.Now;
		Regex r = new Regex(",", RegexOptions.Compiled);
		for (int i = 0; i < count; i++)
		{
			int last = 0, current = 0;
			Match match = r.Match(testStr, current);
			while (match.Success)
			{
				current = match.Index;
				string y = testStr.Substring(last, current - last);
				// Continue searching just past the comma.
				current = last = current + 1;
				match = r.Match(testStr, current);
			}
		}
		Console.WriteLine(DateTime.Now - start);
	}
}

The results, on my machine:

String.IndexOf: 00.2343750
Compiled Regex: 01.4687500
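
That works out to the compiled Regex being roughly six times slower than IndexOf for this input. For anyone who wants to repeat the test, here is a rough sketch of a slightly tighter harness. It is not the code that produced the numbers above. It assumes Stopwatch is a better clock than DateTime.Now for this kind of measurement, builds the Regex before the timer starts, and makes one untimed warm-up call. The names SplitBenchmark, CountWithIndexOf, CountWithRegex, and Time are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class SplitBenchmark
{
	// Hypothetical helper: same work as the IndexOf loop in the test above.
	// Returns how many comma-terminated pieces were cut out.
	public static int CountWithIndexOf(string text)
	{
		int count = 0, last = 0, current;
		while ((current = text.IndexOf(',', last)) != -1)
		{
			string piece = text.Substring(last, current - last);
			count++;
			last = current + 1; // continue just past the comma
		}
		return count;
	}

	// Hypothetical helper: same work, but the comma is located by a Regex.
	public static int CountWithRegex(Regex regex, string text)
	{
		int count = 0, last = 0;
		Match match = regex.Match(text, last);
		while (match.Success)
		{
			string piece = text.Substring(last, match.Index - last);
			count++;
			last = match.Index + 1; // continue just past the comma
			match = regex.Match(text, last);
		}
		return count;
	}

	// Runs body once untimed as a warm-up, then times the given number of iterations.
	public static TimeSpan Time(int iterations, Action body)
	{
		body();
		Stopwatch watch = Stopwatch.StartNew();
		for (int i = 0; i < iterations; i++)
		{
			body();
		}
		watch.Stop();
		return watch.Elapsed;
	}

	public static void Main()
	{
		string testStr = "select foo, bar, x, y, z, 5 from Items";
		int count = 500000;

		// Built before any timing starts, so construction cost is not measured.
		Regex comma = new Regex(",", RegexOptions.Compiled);

		Console.WriteLine("String.IndexOf: " + Time(count, () => CountWithIndexOf(testStr)));
		Console.WriteLine("Compiled Regex: " + Time(count, () => CountWithRegex(comma, testStr)));
	}
}
```

Both helpers skip the text after the last comma, just as the original loops do, so the two passes still do the same amount of work. The timings will differ from machine to machine and between runtime versions.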