Recent Representative Line

A single line of code from a large application that somehow manages to provide an almost endless insight into the pain that its maintainers face each day.

Feb 2015

How to Validate a URL


There's an old joke among programmers, particularly those who have had to use regexes more often than they're comfortable with:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

It's a seductive trap: regexes are good at processing strings and are more sophisticated than your usual string-processing utilities, so it seems logical to use them for advanced string parsing. But regular expressions are not meant to do arbitrary string parsing; they are meant to match regular languages. Furthermore, regular expressions are notoriously hard to read, and often look like a string of random characters sneezed out all over your screen. For example, consider the following, which is used to validate a URL:


Regex regex = new Regex(
  @"^((((H|h)(T|t)|(F|f))(T|t)(P|p)((S|s)?))\://)?(www.|[a-zA-Z0-9].)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,6}(\:[0-9]{1,5})*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*$"
);
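
To get a feel for what that pattern actually accepts, it helps to run a few strings through it. The following is a minimal sketch, not anything from the original application: the class name `UrlRegexProbe`, the helpers `RegexAccepts` and `UriAccepts`, and the sample strings are all made up for illustration. The pattern itself is copied verbatim from above. The `System.Uri`-based check is a swapped-in alternative shown only for comparison, and limiting it to http, https, and ftp is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlRegexProbe
{
    // The pattern from the article, copied verbatim.
    private static readonly Regex UrlRegex = new Regex(
        @"^((((H|h)(T|t)|(F|f))(T|t)(P|p)((S|s)?))\://)?(www.|[a-zA-Z0-9].)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,6}(\:[0-9]{1,5})*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*$"
    );

    // Hypothetical helper: what the article's regex says about a candidate string.
    public static bool RegexAccepts(string candidate) => UrlRegex.IsMatch(candidate);

    // Hypothetical helper: a parser-based check that leans on System.Uri
    // instead of a hand-written pattern. Restricting the scheme to
    // http, https, and ftp is an assumption made for this sketch.
    public static bool UriAccepts(string candidate) =>
        Uri.TryCreate(candidate, UriKind.Absolute, out var parsed)
        && (parsed.Scheme == Uri.UriSchemeHttp
            || parsed.Scheme == Uri.UriSchemeHttps
            || parsed.Scheme == Uri.UriSchemeFtp);

    public static void Main()
    {
        string[] samples =
        {
            "http://www.example.com",    // ordinary URL
            "http://www!example.com",    // the '.' after "www" is unescaped, so it matches any character
            "http://example.com:80:443", // the port group is repeated with '*', so several ports are allowed
            "http://localhost",          // host with no dot-separated suffix
            "http://example.technology", // suffix longer than six letters
            "http://192.168.0.1",        // numeric host, no alphabetic suffix
        };

        // Print both verdicts side by side for each sample.
        foreach (string sample in samples)
        {
            Console.WriteLine(
                $"{sample,-28} regex={RegexAccepts(sample),-5} uri={UriAccepts(sample)}");
        }
    }
}
```

Tracing the pattern by hand, the regex should accept the first three samples, including the two that are plainly not sensible URLs, and reject the last three, all of which are addresses you might reasonably type into a browser. How `System.Uri` rules on each is left for the program to report.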