Peter M Howard

Peter Howard is Wintermute, mythologist

The site of a film student and geek from Sydney, Australia.

Converting Relative to Absolute links in PHP (preg_replace)

It took a while, but I managed to get my head around the regex syntax needed to change relative links (in href or src attributes) to absolute ones, which I had to do for the RSS feed. It takes the following line of PHP:

$str=preg_replace('#(href|src)="([^:"]*)("|(?:(?:%20|\s|\+)[^"]*"))#','$1="http://wintermute.com.au/$2$3',$str);

Explanations, from left to right:

  • The preg_replace function returns a string (in this case; given an array it returns an array). I already had the contents I wanted in $str, so I assign the result back to it
  • preg_replace takes three required arguments: the pattern to find, the replacement, and the subject to search
  • The find pattern looks complicated. It sits between the two ‘#’ delimiters; any characters outside parentheses or square brackets (such as the =" after the attribute name) are matched literally
  • Inside parentheses it finds a match and records it; the first group comes through as $1, the next as $2, &c. (?:…) groups in the same way but doesn’t record anything
  • (href|src) means I’m searching for the href or src attribute, both of which often contain relative URIs
  • ([^:"]*) means I want to match any run of characters that are NEITHER a colon nor a quotemark; I use the colon as it is always present in absolute URIs, which will fail the match at this stage and be passed over
  • The final bit gets messy, as on my site I have some relative links that use the colon; they only occur as part of a ‘postdate’, which I know contains a %20 or a space (or a +) prior to the timestamp. The pipe (|) acts as an ‘or’, so I’m checking for either a closing quote, or a space (or %20, or +) followed by any number of non-quote characters and a final closing quote. This lets a relative link match even when it contains a colon, as long as the colon comes after the space
  • The second argument reassembles the parts of the URI: $1 puts back the href or src attribute name, then comes the =" and the website address; $2 is the midsection, leading up to either the final quote or a space; $3 is either the final quote, or the space and the rest of the URI, including any stray colons and the final quote
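
As a rough illustration of the explanations above, here is a small sketch that runs the same pattern over three links and prints what each group captured. The sample markup, the paths and the postdate value are made up for the example; only the pattern and the replacement string come from the line above.

```php
<?php
// Same pattern and replacement as the one-liner above.
$pattern     = '#(href|src)="([^:"]*)("|(?:(?:%20|\s|\+)[^"]*"))#';
$replacement = '$1="http://wintermute.com.au/$2$3';

// Made-up sample links, for illustration only.
$samples = [
    '<a href="bits/archive">',                       // plain relative link
    '<a href="http://example.com/">',                // absolute: the colon stops the match
    '<a href="bits?postdate=2005-03-14+09:30:00">',  // colon appears only after the +
];

foreach ($samples as $html) {
    if (preg_match($pattern, $html, $m)) {
        // $m[1] = attribute name, $m[2] = midsection, $m[3] = closing quote or the tail
        echo "$m[1] | $m[2] | $m[3]\n";
    } else {
        echo "no match: $html\n";
    }
    echo preg_replace($pattern, $replacement, $html), "\n";
}
```

For the third link, $2 ends up as bits?postdate=2005-03-14 and $3 as +09:30:00" (the regex engine backs off from the colon until it finds the +), so the whole URI is carried across intact. The absolute link comes back unchanged.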

It took me a while to find any information like this on the internet, so I’m posting this hint here. There’s plenty on regex in general (a search on regex syntax will pick up lots), but I struggled to find anything on rewriting links in this context… It was also made more difficult by the timestamps using colons, but though ugly, the solution works.

Update: I’m seeing this entry get a lot of search engine hits, so in order that it might be a little more useful, I’m going to add some simplified code.

Still read the above section as it helps to get an idea of what’s going on, BUT, if you know that the relative link you’re rewriting has NO colons (:) in it, there’s a simpler piece of code you can use:

$str=preg_replace('#(href|src)="([^:"]*)(?:")#','$1="http://wintermute.com.au/$2"',$str);
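
A quick way to try the simplified line is to run it over a scrap of markup. This is only a sketch: the HTML below is invented for the example, and the domain is the one used throughout this post.

```php
<?php
// Made-up markup: two relative links and one absolute link.
$html = '<a href="bits/archive"><img src="images/logo.png"></a> '
      . '<a href="http://example.com/">elsewhere</a>';

// The simplified pattern from above, unchanged.
$out = preg_replace(
    '#(href|src)="([^:"]*)(?:")#',
    '$1="http://wintermute.com.au/$2"',
    $html
);

echo $out, "\n";
```

Both relative links gain the http://wintermute.com.au/ prefix, while the absolute one is left alone because of its colon. The trailing (?:") could just as well be written as a bare "; the non-capturing group isn’t doing any work there.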

Of course, if your relative links MAY have colons in them, and you know nothing about where they fall in the string, neither of these snippets will catch them. Further, any quotes (") inside a relative link MUST be encoded, or the match will stop at the first one. I have the whole end bit checking for spaces because I KNOW that a space will occur before any colons in my own relative links. YMMV, and of course I recommend testing the patterns thoroughly before relying on them.
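
If you can’t make those assumptions, one alternative (a different technique from the one-liners above, sketched here rather than battle-tested) is preg_replace_callback, which hands each attribute value to a function that decides whether it already looks absolute. The function name absolutize_links, the $base parameter and the scheme test are all hypothetical choices for this example; it still only handles double-quoted attributes.

```php
<?php
// Hypothetical helper, not part of the original one-liners: prefixes relative
// href/src values with $base and leaves anything that starts with a URI
// scheme (http:, mailto:, ...), "//" or "#" untouched.
function absolutize_links(string $html, string $base): string
{
    $result = preg_replace_callback(
        '#(href|src)="([^"]*)"#',
        function (array $m) use ($base): string {
            $url = $m[2];
            // Scheme = a letter, then letters, digits, "+", "." or "-", then a colon.
            if (preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $url)) {
                return $m[0]; // already absolute, or a fragment: keep as-is
            }
            // Trim slashes on both sides so "/path" does not become "//path".
            return $m[1] . '="' . rtrim($base, '/') . '/' . ltrim($url, '/') . '"';
        },
        $html
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

echo absolutize_links(
    '<a href="bits?postdate=2005-03-14 09:30:00">',
    'http://wintermute.com.au/'
), "\n";
```

Because the colon test only looks at the start of the value, a timestamp later in the link no longer matters, with or without a space in front of it.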

Update: Months later I’ve gone through and fixed my internal URIs to use a ‘+’ instead of a space (the space wasn’t entirely proper); I’ve updated the first string accordingly.
