Apache's mod_rewrite

Last updated: January 28, 2021.

Apache's low cost and powerful set of features make it the server of choice around the world. One of its real treasures is the mod_rewrite module, whose purpose is to redirect a visitor's request in the manner specified by a set of rules. This article will lead you through the Why, Installation and Test, Regex, RewriteCond(itions), Flags, Comments, Linking, Introduced Problems and Examples, and will summarize with the best references I've discovered.

Why Redirect a URL?

The simple answer is to make URLs human-readable (commonly called "user-friendly" or "Search Engine Optimized"). URLs with query strings (the URL's text after a question mark) confuse most visitors and are difficult for them to type correctly. By changing the URL, you can make your site more "user-friendly." For example:

http://www.example.com/display.php?country=USA&state=California&city=San_Diego

could be changed to

http://www.example.com/USA/California/San_Diego

Critical Note: As the webmaster, YOU must create your links in the "new format" and then create the mod_rewrite code to redirect that link to the file you wish to serve.

Other possible reasons might include:
mod_rewrite has other uses, too, but let's get on to the basics first.

Server Setup

Some hosts do not have mod_rewrite enabled (it is, by default, not enabled). You can find out whether your server has mod_rewrite enabled by using a script with the simple PHP code:

phpinfo();

Look in the Apache2Handler section and, if mod_rewrite is not listed, you will have to ask your host to enable it - or find a "good host" (most hosts will have it enabled).

The following describes how to enable and test mod_rewrite on your test server. First, you will need to change the default Apache configuration (this is in Apache's httpd.conf file) by removing the "#" at the beginning of the line

# LoadModule rewrite_module modules/mod_rewrite.so

While you're in the httpd.conf file, be sure that you have

<Directory />

You will need to RESTART Apache for these changes to take effect. Apache will now be running with mod_rewrite, as you will see with another look at the phpinfo() output.

Test

To be sure that you have mod_rewrite installed and working properly, here is a simple test. Create three files: test.html, test.php and .htaccess.

test.html:

<h2>This is the HTML file.</h2>

and test.php:

<h2>This is the PHP file.</h2>

Create the third file, .htaccess, with the following:

RewriteEngine on

If you are using Notepad, you may have to save it as htaccess.txt, upload it and change the name to .htaccess on the server. Upload all three files (in ASCII mode) to your server and then type http://www.example.com/test.html in the location box - using your domain, of course! If the page shows "This is the HTML file." then mod_rewrite is not working and you have to start over. If it shows "This is the PHP file." then it is working properly! Note, please, that the test.html URL has remained in the browser's location box.
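The full contents of the test .htaccess file are not shown above. A minimal sketch that would produce the behavior described (test.html is requested, test.php is served, and the URL in the location box is unchanged) might look like the following. It assumes all three files sit in the same directory and that .htaccess overrides are allowed for that directory.

```apache
# Sketch only: serve test.php whenever test.html is requested
RewriteEngine on
# The escaped dot matches a literal "."; there is no R flag, so the
# rewrite is internal and the browser keeps showing test.html
RewriteRule ^test\.html$ test.php [L]
```

Because the rule has no Redirect flag, Apache substitutes the file internally, which is why the location box still shows test.html.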
Specificity (your specification)

Whether you're changing from one URI to another or creating a whole new file structure (e.g., renaming all files from .html to .php or eliminating the file extension), you must create a specification for what your redirection will accomplish (and what it must NOT accomplish). To amplify, your specification needs to tell you in an unambiguous manner exactly what you want to change so mod_rewrite can match ONLY that URI, and the redirection must NOT create a loop.

Matching: Do you want to match/redirect EVERYTHING (or NOTHING)? If not, eliminate (.*) NOW! If you want to remain at the same depth in your directory structure (highly recommended), eliminate /'s from your regex's character set. Uppercase letters? If not, you're pretty much left with lowercase letters, digits and the dot, dash and underscore (as allowed characters - ref: Uniform Resource Identifiers (URI): Generic Syntax by Tim Berners-Lee et al).

Redirection: Will your redirection loop? That's the primary problem with (.*) - although it will also pass unexpected garbage (or nothing at all). ALWAYS check that the redirection cannot be matched by the regex and, if it can, specify an exclusion. (WordPress users will know that WP redirects EVERYTHING to index.php with the exclusion that it will not redirect existing directories or files.)

mod_rewrite Regex

Remember that you create the "new format" URIs and then the mod_rewrite code to convert them into a file request for the script you want to provide. Now we can begin with rewriting your URIs!

If you are not familiar with regular expressions (regex), there are many sites which provide excellent tutorials. At the end of this article, I have listed the best pages I have found: a tutorial, a "cheat sheet," a very nice text editor with regex capabilities and a test tool for your regex. If you are not able to follow my explanations, review the first two of those links.
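The WordPress-style exclusion mentioned under Redirection is a common way to keep a catch-all rule from looping. A hedged sketch of that idea, assuming an index.php in the DocumentRoot acts as the (hypothetical) front controller:

```apache
RewriteEngine on
# Skip the rule when the request maps to an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...or to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else is rewritten internally to the front controller
RewriteRule . /index.php [L]
```

Once the request has been rewritten to /index.php, which exists as a file, the first condition fails on the next pass, so the rule cannot match its own result.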
Problem: Display city information based on the country, state and city requested.

To change

http://www.example.com/USA/California/San_Diego

to

http://www.example.com/display.php?country=USA&state=California&city=San_Diego

so your display script can read and parse the query string, you will need to use regex to tell mod_rewrite what to attempt to match. Too many people just use (.*) to select (NOTHING OR) EVERYTHING in an "atom" (an Apache variable you can create and use within mod_rewrite) and try to pass that along to the redirection string. In this case, you would need three of these atoms separated by the subdirectory slashes ("/") so the regex would become:

(.*)/(.*)/(.*)

Note #1: (.*) combines two metacharacters, the dot character (which means ANY character) and the * character (which specifies ZERO or MORE of the preceding character), within an atom (an Apache variable created by mod_rewrite). Thus, (.*) matches EVERYTHING in the {REQUEST_URI} string ({REQUEST_URI} is that part of the URL which follows the domain up to but not including the ? of a query string and is the ONLY Apache variable that a RewriteRule can attempt to match).

With the above regex, the regex engine will progress to learn that you have required two slashes (anywhere) in the string. For our purposes, though, we need to capture the three values in the {REQUEST_URI} so I've used the slashes to separate them. To tell mod_rewrite that the URI should begin and end with this string, we add the start anchor (^) and end anchor ($) so the regex becomes:

^(.*)/(.*)/(.*)$

Note #2: Whether the string seen by a RewriteRule begins with a slash depends on where the rule is placed. In a .htaccess file (per-directory context), Apache 2.x removes the leading slash before the rule is matched, while rules in the server or virtual host configuration see the path with its leading slash. I satisfy both cases by making the leading slash optional, i.e., ^/? (? is the metacharacter for zero or one of the preceding character) but I'll use the .htaccess form here, the bare ^.
The (.*) version allows TOO MUCH to be sent to your query string - often a security hazard - and, when used inappropriately, WILL cause mod_rewrite to loop! To avoid unnecessary problems, I'll change the EVERYTHING atoms to specify exactly the characters I will allow. Thus, the first atom (USA) can be matched by ([A-Z]+) which ONLY allows one or more uppercase letters (the "+" metacharacter specifies one or more of the preceding character while the "*" metacharacter specifies zero or more - I want to ensure at least one character in the range from A to Z). California contains both uppercase and lowercase letters so this atom becomes ([a-zA-Z]+). San_Diego also contains an underscore (replacing the space which would display as the "ugly" %20 in the URI) so this atom becomes ([a-zA-Z_]+) and, with the anchors and the separating slashes, we have:

^([A-Z]+)/([a-zA-Z]+)/([a-zA-Z_]+)$

All that would be well and good if the only country were USA, but we'll need to expand the regex for other countries and allow an underline to replace the spaces in the "North," "South," "West" and "New" states, so the regex would expand once again to:

^([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$

Note #3: If you have a short list of allowable countries, it would be best to avoid database problems by specifying the acceptable values with regex:

^(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$

Note #4: If you are concerned about people typing in CAPS when your database is strictly lowercase, have regex ignore the case by adding the No Case flag ("[NC]") after the redirection. Just don't forget to convert to lowercase in your script after obtaining the $_GET array! More on flags later.

Note #5: Since URLs can't have spaces (except as %20), use underlines or hyphens to replace them. If you ABSOLUTELY have to use spaces (%20) in your URIs, you can include them in your regex within a range definition as \{space}, i.e., ([a-zA-Z\ ]+). However, this is NOT advised.
Note #6: If you are converting to/from a database field which does contain spaces, you should convert the spaces to some other character. Using PHP, you can use

$state = str_replace ( ' ', '_', $state );

before placing $state in the link and reverse the process with

$state = str_replace ( '_', ' ', $state );

before matching $state to the database field. Using _'s is better than -'s because text can often include the hyphen character, which would be converted to a space by this code, and is better than %20 in the URI because spaces require special treatment in the regex and redirection.

With the regex in hand, you can now map the atoms to the query string:

display.php?country=$1&state=$2&city=$3

where display.php is the name of the script, $1 is the first (country) atom, $2 is the second (state) atom and $3 is the third (city) atom. Note that only nine atoms can be created, $1 … $9 (the tenth, $0, is the entire target string, the {REQUEST_URI}).

Almost there! Open a new document with EditPad (or your text editor) and type:

RewriteEngine on

Note #7: The RewriteRule must go on ONE line with one space between the RewriteRule, the regex and the redirection (and before any optional flags). Notepad indiscriminately inserts line returns in long lines so you're far better off using a good text editor (see references at the end).

Note #8: If you won't always have the city, or the state and city, then you can easily make them optional by replacing the above with:

RewriteEngine on

where
If the optional atoms confused you, use three separate statements. Optional atoms are NOT mandatory, just an easy way to combine several statements into one.

Save this as .htaccess in the directory where display.php resides.

If you want to use digits (0, 1, ... 9) for, say, Congressional Districts, then you'll need to change an atom's specification from ([a-zA-Z_]+) to ([0-9]) to signify a single digit, ([0-9]{1,2}) for one or two digits (0 through 99) or ([0-9]+) for one or more digits (0 through ...; useful for database id's).

The RewriteCond(ition) Statement

Now that you have learned to match mod_rewrite's basic RewriteRule(s) with the {REQUEST_URI} string, it's time to learn to use conditionals to access other variables with the RewriteCond(ition). RewriteCond is similar in format to the RewriteRule in that you have the command name, RewriteCond, a variable to be matched, the regex and flags (the logical OR flag is a useful flag to keep in mind as RewriteConds are ANDed by default). The best list of Server Variables I've found is located here.

For an example, let me assume that you want to force the www in your domain name (and you don't have subdomains to be concerned with). To do this, you will need to test the Apache {HTTP_HOST} variable to see if the www. is already there and, if not, redirect.

RewriteEngine on

Here, to denote that {HTTP_HOST} is an Apache variable, we must prepend a %. Then, the regex says to match the logical negation of (i.e., NOT) the following: the start anchor (to match the start of the {HTTP_HOST} string), www, an escaped dot (meaning that it ONLY matches the dot character), the domain name example, another escaped dot, com and the end anchor (to match the end of the {HTTP_HOST} string). The No Case flag ([NC]) is necessary because a domain name is not case sensitive. AND … The RewriteRule says to match zero or one of anything and then redirect to http://www.example.com with the original {REQUEST_URI}.
The R=301 tells the browser (and search engines) that this is a permanent redirection and the Last flag tells mod_rewrite that you've completed your redirection.

RewriteCond statements can also create atoms via their regex but these are denoted by %1 … %9 the same way that RewriteRule atoms are $1 … $9. You'll see these in operation in the Examples.

Flags

mod_rewrite uses "flags" to give your mod_rewrite code additional power. I've used the Last, Redirect and No Case flags above but the main ones you'll need to be familiar with are:
There are other flags but you can get their definitions from Apache.org's mod_rewrite documentation.

mod_rewrite Comments

While the RewriteEngine on statement tells Apache to "start your engines," it also serves to denote mod_rewrite comments. As a good programmer, you know how important comments are in your code. mod_rewrite allows comments after a # at the beginning of a line but it also allows you to comment out an entire block of mod_rewrite code by wrapping the code in RewriteEngine off and RewriteEngine on statements:

RewriteEngine off

RewriteEngine statements can be very helpful when developing new mod_rewrite code - just use them as you would the /* … */ wrapper for PHP comments.

WARNING: Do not use RewriteEngine statements to hide your mod_rewrite code if you don't have mod_rewrite enabled, as you will get the same "500" error as if you used the "foo directive" (merely placing foo on a line in your .htaccess file). RewriteEngine is itself a mod_rewrite directive.

Note: You only need ONE RewriteEngine on statement per .htaccess file (unless you also include RewriteEngine off statement(s) for commenting blocks of code).

mod_rewrite Links*

As a webmaster, it is for YOU to determine how your pages will be identified to visitors as well as how to rewrite those URIs so Apache can serve the appropriate content. Since nobody yet knows that you have made your links "user-friendly" (nor how you have formatted them), YOU have to create the links in your site's pages. You can use an editor (like Dreamweaver) which will perform multiple find and replace actions across your website (because you did not know about user-friendly URLs when you built it).

In the example in the section above, I used countries, states and cities - items that would be unique in a database. As I build websites for clients to update themselves, it is not reasonable for me to insist that they provide unique names for all their articles, so database articles are typically identified by an auto-incremented ID.
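As an illustration of an ID-based link, a URL such as http://www.example.com/article/123 could be mapped to a query string with a sketch like the one below. The article path prefix and the article.php script name are hypothetical, chosen only for this example.

```apache
RewriteEngine on
# One or more digits are captured as $1 and passed along as the id
RewriteRule ^article/([0-9]+)$ article.php?id=$1 [L]
```

Restricting the atom to digits means nothing but a numeric ID can reach the script's query string.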
That's all that's required to pick a single article out of that database! So long as you can use a unique key, you will be able to use any key in your query string.

* There have been many questions about how to use a database to redirect from a title (or other field) to an ID. Unless you have access to your httpd.conf (in order to create a RewriteMap application), forget about using a database for your redirections. Instead, make the field of choice unique and use that field to create your links. The only thing to remember is that spaces appear as %20 in URLs so convert them before creating the link and back after obtaining the string in the $_GET array - the str_replace() code I offered above is perfect for this.

WARNING: There are other characters which are "reserved," "unreserved" or must be "escaped." There is a rather technical article which identifies them, Uniform Resource Identifiers (URI): Generic Syntax. Obviously, you'll need to remove or escape these characters as appropriate.

Relative Links Are Missing!

Sorry, you are not ready yet because, when you test your user-friendly URLs, they work the same as the original links except that all your CSS, javascript files and images have disappeared! You can blame mod_rewrite if you like but it is your fault, as you have used URLs that tell the browser that the script is in another directory (in my example, you are considered to be in the USA/California/ subdirectory - San_Diego would be the script's name) which is two subdirectories deeper into the website than display.php!

To get around this seeming "bad feature" of mod_rewrite, you can use absolute links throughout your site instead of relative links OR use HTML's <base> tag to identify the real location:

<head>

Note that an absolute link (with either a leading / to denote DocumentRoot OR the full URL) is required as you are trying to "fix" the problem with relative links.
Problems With WordPress mod_rewrite

WordPress (WP) installs its mod_rewrite code in the .htaccess in your DocumentRoot but there are problems which you need to know about. First, their latest code:

# BEGIN WordPress

Okay, what's wrong with this?
If you're going to use third party code, KNOW what it's doing and be sure it fits your website!

Examples

Let's get on to examples which combine these basic structures to do some useful work!

Replace A Character

Once you've discovered that the hyphens (dashes) in your URLs are causing problems (with your regex as well as converting to and from your database fields), you'll want to change them to underscores (the underline character). The problem is that you don't know how many hyphens you have in your URLs so you'll use regex to repetitively replace the hyphen:

RewriteEngine on

The Next flag tells Apache to restart the mod_rewrite rules (upon successful match and redirection). Unfortunately, you'll need to do further processing to be able to use an R=301 on the resultant redirection so that others will know you've changed your URL format, so do this first.

Unlimited key/value pairs

If you followed the Regex section above to its conclusion, you might guess that there is a limit to the number of key/value pairs. There is: As already explained, the number of Apache variables that can be created is nine. If you need more, however, don't despair! Using the Next flag, I've just demonstrated how to change an unlimited number of -'s to _'s. We'll now extend that to unlimited key/value pairs.

RewriteEngine on

This will capture a new key/value pair with the first two atoms ($1 and $2) and anything "leftover" with $3 (which includes the trailing /), then redirect to the "leftover" with the redirect.php script remaining as the target. The key/value pair is ADDED to any existing query string by the Query String Append flag before the process is restarted by the Next flag - the Last flag ensures that the mod_rewrite statement is terminated (not ANDed with any following statements). If you don't want to show the redirect script in the URL, you'll need to account for the final redirection another way.
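Returning to the Replace A Character example, the repeated hyphen replacement with the Next flag might be sketched as follows. This is an assumption about the general shape of such a rule, not necessarily the code originally shown.

```apache
RewriteEngine on
# Replace the first hyphen in the path with an underscore, then restart
# the rules ([N]); the pattern stops matching once no hyphen remains
RewriteRule ^([^-]*)-(.*)$ $1_$2 [N]
```

Each pass converts one hyphen, so a path with three hyphens takes three passes before processing moves on. The same restart idea underlies the key/value approach described above.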
RewriteEngine on

Here, I've captured the key and value pairs with the first two atoms again and used the third to capture anything else. Assuming that the atoms are properly paired, the result will be a query string in the DocumentRoot. Assuming that the DirectoryIndex (normally index.php or index.html) is not the target of your redirection (and does not receive a query string), the existence of a query string (as denoted by finding an = within the query string) is used as a marker to effect a redirection to the script which will handle the redirect.

WARNING: Do NOT exceed 255 characters in your URI. (I recall 255 as the limit but I can't find the source to confirm.)

Force www for a Domain

[repeated from above] If you want to force a browser to use the full domain with the www. prefix, you will need to test the Apache {HTTP_HOST} variable to see if the www. is already there and, if not, redirect.

RewriteEngine on

If you have subdomains, however, preserve the subdomain like this:

RewriteEngine on

Capture the optional subdomain and, if it does not start with www., redirect with www. prepended to the subdomain and domain with the original {REQUEST_URI}.

Eliminate www for a Domain

Going the other way (getting rid of the www prefix)?

RewriteEngine on

Get rid of the www but preserve a subdomain with:

RewriteEngine on

Here, the subdomain is captured in %2 (the inner atom) but, since it's optional and already captured in the %1 Apache variable, all you need is the %1 for the subdomain and domain without the leading www.

Prevent Image Hotlinking

If some unscrupulous webmaster is stealing your bandwidth (leeching) by linking to images on your site to post on his:

RewriteEngine on

This example uses the optional list to select just GIF and JPG images - do not allow a space in that list and remember, example.com is your site!

If you are upset enough at these pirates, you could change the image and feed something to let his visitors know he's hotlinking.
Just don't forget to exclude the hotlinked image:

RewriteEngine on

Of course, these both require the visitor to have his HTTP_REFERER enabled (most browsers do by default).

Block specific hotlinkers with:

RewriteEngine on

This blocks visitors coming from the leecher's site from viewing GIF and JPG files.

Would you rather allow (or forbid) visitors from specific IP addresses? Use {REMOTE_ADDR} instead, like:

RewriteEngine on

Redirect to a 404 Page

If your host doesn't provide for a "file not found" redirection, create it yourself!

# you SHOULD be using

This script checks to see that the requested filename does not exist and then that it is not really a directory before it redirects to the DocumentRoot's 404.php script. Extend this just a bit by including the URI in a query string by adding ?url=$1 immediately after the /404.php:

RewriteEngine on

Rename Your Directories

You've shifted files around on your site, changing directory name(s):

# mod_alias can do this faster without the regex engine

Note that I've included the dot character (not the "any character" metacharacter) inside the range to allow file extensions but the a-z will accept only lowercase characters. If you need uppercase, you know from above how to modify this code.

Convert .html Links to .php Links

Updating your website but need to be sure that bookmarked links will still work?

RewriteEngine on

This is not a permanent redirection so it will be invisible to your visitors. To make it permanent (and visible), change the flag to [R=301,L]. Obviously, this will also work for changing any file extension from one to another by changing the html and php above.

Extensionless Links

Need to make your links easier to remember or just want to hide your file types? Typically you're only using either .html or .php files so:

RewriteEngine on

Someone has asked about using extensionless URIs for both .html and .php files.
Considering both the php and html extensions requires that you use RewriteCond statements to check whether the filename with either extension exists as a file:

RewriteEngine on

As in the 404 example, the -f checks for the existence of a file.

Redirect TO New Format

I have fielded questions where someone wanted to redirect their real URIs to extensionless URIs so search engines would update to their new, extensionless format. Okay, Apache can do that but it cannot serve scripts in the new format (they have to be redirected back to the real link!). Have I got your head spinning? I do NOT recommend this (unless you're on a dedicated server with low volume) as it requires additional processing by Apache. The key to this is the No Subrequest flag which will prevent redirection if a request has already been redirected.

# Assumes "usable link" is index.php?id=alpha
Here, the original http://www.example.com/index.php?id=something has not been redirected so it is redirected to http://www.example.com/something. Then, the second RewriteRule finds the something and redirects it back to index.php which is prohibited from redirecting again by the No Subrequest flag.

RewriteMap

A major problem arises for a webmaster who has redesigned a website without preserving the old links. The problem is that the old links will return 404s unless a mapping from the old link to the new can be implemented. This is where the RewriteMap shines! Defined in the server or virtual host configuration files, the syntax is:

RewriteMap MapName MapType:MapSource

where:

MapName is the name you assign,

txt - A plain text file containing space-separated key-value pairs, one per line.

Depending upon the number of mappings required, it's probable that a simple text file will suffice, i.e., the MapSource will be the absolute path to a text file like:

Link1 NewLink1

and would be called after checking for the existence of the file or directory by:

RewriteCond %{REQUEST_FILENAME} !-f

where NOTFOUND is necessary to account for the link not being found in the text map.

For our purposes, neither random file selection nor internal functions are appropriate. The dbm (binary database of key-value pairs) and dbd/fastdbd types are useful as they are virtually the same as the txt file but require the mod_auth_dbm or mod_dbd module to access.

That leaves the prg (program) as an alternative to txt. In this case, the MapSource should lead to an application file (PHP or Perl) to access a database table and return the link (or a tailored 404 file) to mod_rewrite.

Unfortunately, none of the above is useful to the average webmaster as they will not have access to the server or virtual host configuration file.

Stop and think a moment.
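For readers who do have access to the server or virtual host configuration, the txt map lookup described above might be sketched as follows. The map name oldlinks and the path /path/to/oldlinks.txt are hypothetical, and the sketch assumes the map's keys and values are paths written without a leading slash (as in the Link1 NewLink1 line).

```apache
# In the server or virtual host configuration
# (RewriteMap cannot be declared in .htaccess)
RewriteMap oldlinks txt:/path/to/oldlinks.txt

# The rules that use the map, e.g. in the DocumentRoot's .htaccess
RewriteEngine on
# Only consult the map when the request is not an existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Look up the requested path; NOTFOUND is the default when no key matches
RewriteCond ${oldlinks:$1|NOTFOUND} !^NOTFOUND$
# Permanently redirect to the replacement found in the map
RewriteRule ^(.+)$ /${oldlinks:$1} [R=301,L]
```

The third condition is what keeps unmapped requests from being redirected: without a default value, a missing key would yield an empty string and the rule would redirect to the DocumentRoot.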
If an old link no longer exists, mod_rewrite can redirect to a 404 handler script which can look into a text list or database to determine whether the request was for a replaced file and select the replacement. If a replacement is found, the handler file could use a header("Location:{redirection}"); to redirect to the replacement file! I call this a "Poor Man's RewriteMap" but, in reality, it's just a smart 404 handler.

A note from php.net: "Note: The HTTP status header line will always be the first sent to the client, regardless of the actual header() call being the first or not. The status may be overridden by calling header() with a new status line at any time unless the HTTP headers have already been sent."

Translated, that means that the "Poor Man's RewriteMap" needs to send:

header("Status: 301"); // permanent redirection vs 200 (Okay)

AND this must be done before any output is made from the 404 handler.

Check for Key in Query String

If you need to have a specific key's value in your query string, you can check for its existence with RewriteCond:

RewriteCond %{QUERY_STRING} !uniquekey= ...

will check the {QUERY_STRING} variable for lack of the key "uniquekey" and, if the {REQUEST_URI} is the script_that_requires_uniquekey, it will redirect. If you are looking for a unique value, remove the = in the RewriteCond statement.

RewriteCond %{QUERY_STRING} !uniquevalue

Delete the Query String

Apache's mod_rewrite automatically passes through a query string UNLESS you
Enforce Secure Server

Apache can determine whether you're using a secure server in two ways: using the {HTTPS} variable or the {SERVER_PORT} variable (which is 443 for a secure server). So, these two bits will redirect to a secure server if you're not already there:

RewriteEngine on

OR

RewriteEngine on

Since the {HTTPS} variable is null when you've not requested a secure server, I use the {SERVER_PORT} option.

Selective Enforce Secure Server

(where the secure and unsecure domains share the DocumentRoot)

This requires a RewriteCond statement to check whether the secure server port is being used and, if not AND the requested script is one in the list requiring a secure server, redirect.

RewriteEngine on

And, to redirect pages not requiring a secure server,

RewriteEngine on

will force the http ({SERVER_PORT}=80) mode.

WARNING: Mixing these two (forcing HTTPS and HTTP at the same time) will force non-script files to be served in HTTP protocol, i.e., not encrypted. The result WILL BE a warning that some content has NOT been authenticated.

To avoid the "mixing" problem of forcing non-secure pages, target the scripts rather than all files:

RewriteEngine on

Another method utilizes a regex "trick":

RewriteEngine on

Explanation: I needed an explanation so here it is:
In short, this is a trick to replace on with s, and similar techniques can be used in other situations, too. Since the first version is far simpler to understand, I recommend that one.

Summary

mod_rewrite is primarily used to allow "Search Engine Optimization" / "User-Friendly" URLs but it is an extremely flexible webmaster tool for other important redirection tasks.

Reference Links