Converting Relative to Absolute links in PHP (preg_replace)
21Sep2005/16Sep2006 [webprog]
It took a while, but I managed to get my head around the regex syntax needed to change relative links (in href or src attributes) to absolute ones, which I had to do for the RSS feed. It takes just one line of PHP:
$str=preg_replace('#(href|src)="([^:"]*)("|(?:(?:%20|\s|\+)[^"]*"))#','$1="http://wintermute.com.au/$2$3',$str);
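Here is that line run over a small made-up snippet. The sample markup is invented for illustration and does not come from the site. It contains one relative link, one absolute link, one relative 'postdate' link with colons in its timestamp, and an image. Only the absolute link should be left alone.

```php
<?php
// Invented sample markup: relative link, absolute link, relative 'postdate'
// link whose timestamp contains colons, and a relative image source.
$str = '<a href="blog/entry">Entry</a> '
     . '<a href="http://example.com/">Elsewhere</a> '
     . '<a href="blog/?postdate=2005-09-21+16:30:00">Post</a> '
     . '<img src="images/logo.png">';

// Same pattern and replacement as the line above.
$str = preg_replace(
    '#(href|src)="([^:"]*)("|(?:(?:%20|\s|\+)[^"]*"))#',
    '$1="http://wintermute.com.au/$2$3',
    $str
);

echo $str, "\n";
```

One case the sample doesn't show is a root-relative value such as /about. It would come out as http://wintermute.com.au//about, because $2 keeps the leading slash.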
Explanations, from left to right:
- preg_replace returns a string here. Given an array as the subject it returns an array, and on error it returns null. I already had the contents I wanted in $str, so I assign the result straight back to it
- preg_replace takes three required arguments, in order: pattern, replacement, subject. An optional limit and a by-reference count can follow
- The pattern looks complicated. The '#' characters at either end are just delimiters; PHP allows delimiters other than '/', and using # means the slashes in URLs needn't be escaped. Ordinary characters outside any group, such as =", are matched literally
- Each pair of parentheses is a capturing group. Whatever it matches is recorded: the first group comes through as $1, the second as $2, and so on. A group written (?:...) matches but doesn't capture, so it gets no number
- (href|src) means I'm searching for the href or src attribute, both of which often contain relative URIs
- ([^:"]*) matches any run of characters that are neither a colon nor a double quote. I use the colon because it always appears in an absolute URI (after the scheme, as in http:). Absolute links therefore fail the match at this point and are passed over
- The final group gets messy, because on my site some relative links do contain a colon. It only occurs as part of a 'postdate', and I know the postdate has a %20, a space or a + before the timestamp. The pipe (|) acts as an 'or', so the group matches one of two things. It either matches the closing quote straight away, or it matches one of those separators followed by any number of non-quote characters and then the closing quote. This lets a relative link match even if it contains a colon, as long as the colon comes after the separator
- The second argument reassembles the pieces. $1 restores href or src, followed by =" and the website address. $2 is the middle section, up to either the closing quote or the separator. $3 is either the closing quote alone, or the separator plus the rest of the URI (including any stray colons) and the closing quote
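Below is a minimal sketch for checking the groups yourself, using an invented sample link. preg_match fills an array with the whole match at index 0 and the capturing groups at 1, 2 and 3. Those line up with $1, $2 and $3 in the replacement string. The (?:...) groups capture nothing, so there are exactly four entries.

```php
<?php
// Pattern from the first example; the sample link is invented.
$pattern = '#(href|src)="([^:"]*)("|(?:(?:%20|\s|\+)[^"]*"))#';
$link    = 'href="blog/?postdate=2005-09-21+16:30:00"';

if (preg_match($pattern, $link, $m)) {
    // $m[0] is the whole match; $m[1]..$m[3] correspond to $1..$3.
    echo "1: {$m[1]}\n";   // href
    echo "2: {$m[2]}\n";   // blog/?postdate=2005-09-21
    echo "3: {$m[3]}\n";   // +16:30:00"
}
```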
It took me a while to find any information like this online, so I'm posting this hint here. There's plenty on regex in general (a search for regex syntax will turn up lots), but I struggled to find anything on rewriting links in this context. The colons in the timestamps made it harder still, but though the pattern is ugly, the solution works well.
Update: This entry is getting a lot of search engine hits, so to make it a little more useful, I'm adding some simplified code.
It's still worth reading the section above, as it gives an idea of what's going on, BUT if you know that the relative links you're rewriting contain NO colons (:), there's a simpler piece of code you can use:
$str=preg_replace('#(href|src)="([^:"]*)"#','$1="http://wintermute.com.au/$2"',$str);
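Here is a quick check of the simpler pattern on invented sample markup. The relative link gets the site address prepended. Both the absolute link and the postdate link, which contains colons, are left as they were.

```php
<?php
// Invented sample markup: relative, absolute, and a relative link with colons.
$str = '<a href="about">About</a> '
     . '<a href="https://example.org/x">Out</a> '
     . '<a href="blog/?postdate=2005-09-21+16:30:00">Post</a>';

// The closing quote is matched directly; this is equivalent to (?:").
$str = preg_replace(
    '#(href|src)="([^:"]*)"#',
    '$1="http://wintermute.com.au/$2"',
    $str
);

echo $str, "\n";
```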
Of course, if your relative links MAY contain colons and you know nothing about where they fall in the string, neither pattern will do the job. Further, any double quotes (") in a relative link MUST be encoded (as %22 or &quot;), because the pattern treats the first quote it meets as the end of the attribute. The final group checks for a separator because I KNOW one will occur before any colon in my own relative links. YMMV, and of course I recommend testing the patterns thoroughly before relying on them.
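If your relative links may contain colons anywhere, one alternative is to check for a URI scheme at the start of the value instead of treating every colon as a sign of an absolute URI. This is a sketch, not the approach used above. Per RFC 3986, an absolute URI starts with a scheme: a letter followed by letters, digits, +, - or ., then a colon. A relative path also cannot have a colon in its first segment. Some details are hypothetical: the function name absolutise_links and its $base parameter. The function uses preg_replace_callback and still only handles double-quoted attributes. It assumes $base is the site root ending in a slash, and it does not resolve ../ segments.

```php
<?php
// Hypothetical helper: rewrites relative href/src values (double-quoted only)
// to absolute ones, treating a value as absolute only if it starts with a
// URI scheme such as http: or mailto:. Assumes $base is the site root and
// ends with a slash; ../ segments are not resolved.
function absolutise_links(string $html, string $base): string
{
    $result = preg_replace_callback(
        '#\b(href|src)="([^"]*)"#i',
        function (array $m) use ($base): string {
            $url = $m[2];
            if ($url === ''                                   // empty value
                || preg_match('#^[a-z][a-z0-9+.-]*:#i', $url) // has a scheme
                || strncmp($url, '//', 2) === 0               // protocol-relative
                || $url[0] === '#') {                         // fragment only
                return $m[0];                                 // leave untouched
            }
            // Strip a leading slash so root-relative links aren't doubled.
            return $m[1] . '="' . $base . ltrim($url, '/') . '"';
        },
        $html
    );
    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $html;
}

$html = '<a href="/about">About</a> '
      . '<a href="blog/?postdate=2005-09-21+16:30:00">Post</a> '
      . '<a href="mailto:someone@example.com">Mail</a> '
      . '<a href="#top">Top</a>';

echo absolutise_links($html, 'http://wintermute.com.au/'), "\n";
```

The scheme test lets colons appear anywhere after the first slash, question mark or separator. A relative link whose first segment contains a colon, such as entry:1, would need a ./ prefix to be read as relative.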
Update: Months later I went through and changed my internal URIs to use a '+' instead of a space, since a raw space in a URI wasn't entirely proper. I've updated the first pattern accordingly so that it also accepts the +.