Article of the Day at ASP.NET (November 3rd, 2010)
FULL SOURCE CODE IS AVAILABLE AT THE BOTTOM OF THIS ARTICLE.
PLEASE ALSO READ MY NEWER POST ABOUT AN UPDATED IMPLEMENTATION.
In the last few days I had a bad surprise: most of the visits to my CodeGolem blog actually came from referrer-spamming bots.
Many bloggers suffer from referrer spam...
Well, I hope this article will help them stop referrer spam for good... or at least those using a .NET®-powered blog engine.
Here is the overall idea:
- A referrer-spamming bot requests a page from our site
- An HttpModule analyses the request and, if needed, redirects it to a bot-challenge HttpHandler
- The HttpHandler performs a two-step, JavaScript-based challenge (bots rarely execute client scripts)
- On success, the client is redirected to the originally requested URL. If the challenge fails, the client IP address stays marked as BAD and its requests are terminated with a 403 (Forbidden) HTTP status code
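The per-request decision in the steps above can be sketched as a small pure function. This is a hypothetical illustration, not code from the downloadable source. TrapAction and ReferrerTrapFlow.Decide are made-up names, and the three inputs are assumed to be computed by the HttpModule.

```csharp
using System;

// Hypothetical outcome of inspecting one request (not part of the original source).
public enum TrapAction
{
    PassThrough, // serve the page normally
    Forbid,      // answer 403 (Forbidden)
    Challenge    // flag the IP as BAD and redirect to the challenge handler
}

public static class ReferrerTrapFlow
{
    // Sketch of the decision: only requests with an external referrer are inspected,
    // requests for the challenge handler itself are let through, already-flagged IPs
    // are refused, and every other client is challenged.
    public static TrapAction Decide(bool hasExternalReferrer, bool isChallengeRequest, bool ipFlaggedAsBad)
    {
        if (!hasExternalReferrer || isChallengeRequest)
            return TrapAction.PassThrough;
        return ipFlaggedAsBad ? TrapAction.Forbid : TrapAction.Challenge;
    }
}
```

For example, Decide(true, false, false) returns TrapAction.Challenge. An unknown client arriving from an external site is flagged and sent to the challenge.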
Implementing the whole thing is much simpler than it sounds.
This is the code I used for the HttpModule:
using System;
using System.IO;
using System.Web;

public class SpamBlockerModule : IHttpModule
{
    #region IHttpModule Members

    public void Dispose()
    {
    }

    public void Init(HttpApplication context)
    {
        context.BeginRequest += new EventHandler(context_BeginRequest);
    }

    #endregion

    /// <summary>
    /// Terminates the current request and returns a 403 HTTP Status Code (Forbidden)
    /// </summary>
    public static void TerminateRequest()
    {
        HttpContext.Current.Response.StatusCode = 403;
        HttpContext.Current.Response.End();
    }

    /// <summary>
    /// Stores the requested URL in the Cache under a new Guid-based key (1 minute sliding expiration)
    /// </summary>
    /// <returns>Generated key</returns>
    public static string SetRequestedUrl()
    {
        string key = Guid.NewGuid().ToString();
        HttpContext.Current.Cache.Insert(key,
            HttpContext.Current.Request.Url.ToString(),
            null,
            System.Web.Caching.Cache.NoAbsoluteExpiration,
            TimeSpan.FromMinutes(1));
        return key;
    }

    /// <summary>
    /// Gets the requested URL from the Cache, using the "key" querystring value
    /// </summary>
    /// <returns>The original URL, or null if the key is missing or has expired</returns>
    public static string GetRequestedUrl()
    {
        string key = HttpContext.Current.Request.QueryString["key"];
        // guard against a missing key: the Cache indexer does not accept null
        if (string.IsNullOrEmpty(key))
            return null;
        return HttpContext.Current.Cache[key] as string;
    }

    /// <summary>
    /// Handles referral requests
    /// </summary>
    void context_BeginRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        Uri referrer = context.Request.UrlReferrer;

        // no action for requests without referrer, whose referrer is in the current domain,
        // or requests to the BotTrap.ashx file
        if (referrer == null
            || string.Equals(referrer.Authority, context.Request.Url.Authority, StringComparison.OrdinalIgnoreCase)
            || string.Equals(Path.GetFileName(context.Request.FilePath), "bottrap.ashx", StringComparison.OrdinalIgnoreCase))
            return;

        // refuses requests from BAD IPs
        if (isBadIp())
        {
            TerminateRequest();
        }
        else
        {
            // flags the current IP as BAD
            SetBadIp();
            // stores requested URL in cache
            string key = SetRequestedUrl();
            // redirects to the BotTrap handler
            context.Response.Redirect("~/BotTrap.ashx?key=" + key, true);
        }
    }

    // (the bad-IP helper methods shown later in this article also belong to this class)
This HttpModule analyses each request.
It performs no action if the incoming request has no referrer, if the referrer URL has the same authority (host and port) as the current request, or if the client is requesting the bot-challenge handler ("BotTrap.ashx").
The module then checks whether the client's IP address is already flagged as BAD. If so, it terminates the current request and returns a 403 (Forbidden) HTTP status code.
Otherwise, it flags the IP as BAD and stores the originally requested URL in the cache under a Guid-based key with a short (1 minute) sliding expiration. The generated key is then passed to the handler in the query string.
At this point, you may notice that ALL IP addresses arriving from an external referrer are flagged as BAD before being redirected to the challenge HttpHandler. Only clients that pass the challenge are unflagged and redirected to the original URL.
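The module's first check can be isolated into a small helper. This sketch uses the hypothetical name ReferrerFilter.ShouldInspect and replaces the ToLower() comparisons with StringComparison.OrdinalIgnoreCase. It relies only on System.Uri and System.IO.Path, so it can be tried outside ASP.NET.

```csharp
using System;
using System.IO;

public static class ReferrerFilter
{
    // Hypothetical helper mirroring the module's first check: returns true when the
    // request should go through the bot trap, meaning it has an external referrer
    // and is not a request for the challenge handler itself.
    public static bool ShouldInspect(Uri referrer, Uri requestUrl, string filePath)
    {
        if (referrer == null)
            return false; // direct visit, bookmark, etc.

        // Authority = host plus port (the port appears only when it is not the scheme default)
        if (string.Equals(referrer.Authority, requestUrl.Authority, StringComparison.OrdinalIgnoreCase))
            return false; // internal navigation

        // never trap the challenge handler, otherwise the client would loop
        string fileName = Path.GetFileName(filePath);
        return !string.Equals(fileName, "BotTrap.ashx", StringComparison.OrdinalIgnoreCase);
    }
}
```

Uri.Authority is the host plus any non-default port. A referrer from www.example.com is therefore treated as external when the site is served from example.com.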
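The Guid-keyed, short-lived URL entry can be illustrated without the ASP.NET Cache. The following is a hypothetical in-memory stand-in and not part of the original code. PendingUrlStore uses an absolute one-minute lifetime instead of the Cache's sliding expiration, and it takes the clock as a parameter so that expiry can be tested. Unlike the original, it also removes each entry on first read, so a key cannot be reused.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory stand-in for the Guid-keyed Cache entries (not the ASP.NET Cache).
public class PendingUrlStore
{
    private readonly ConcurrentDictionary<string, Tuple<string, DateTime>> entries =
        new ConcurrentDictionary<string, Tuple<string, DateTime>>();
    private readonly TimeSpan lifetime;
    private readonly Func<DateTime> clock;

    public PendingUrlStore(TimeSpan lifetime, Func<DateTime> clock)
    {
        this.lifetime = lifetime;
        this.clock = clock;
    }

    // Stores the URL under a fresh Guid key and returns the key for the query string.
    public string Add(string url)
    {
        string key = Guid.NewGuid().ToString();
        entries[key] = Tuple.Create(url, clock() + lifetime);
        return key;
    }

    // Returns the URL only if the key exists and has not expired.
    // The entry is removed either way, so the same key cannot be replayed.
    public string Take(string key)
    {
        Tuple<string, DateTime> entry;
        if (key == null || !entries.TryRemove(key, out entry))
            return null;
        return clock() <= entry.Item2 ? entry.Item1 : null;
    }
}
```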
These are the HttpModule's methods used to check and flag/unflag bad IPs:
    // SpamBlockerModule members (continued)

    /// <summary>
    /// Gets a unique cache key for the UserHostAddress from the HttpContext
    /// </summary>
    /// <returns>Cache key</returns>
    private static string getCacheKey()
    {
        return string.Format("ReferralSpamBlockerBadIP_{0}", HttpContext.Current.Request.UserHostAddress);
    }

    /// <summary>
    /// Verifies if the UserHostAddress from the HttpContext is marked as BAD
    /// </summary>
    /// <returns>Boolean value</returns>
    bool isBadIp()
    {
        return HttpContext.Current.Cache[getCacheKey()] != null;
    }

    /// <summary>
    /// Marks the UserHostAddress from the HttpContext as BAD (10 minute sliding expiration)
    /// </summary>
    public static void SetBadIp()
    {
        HttpContext.Current.Cache.Insert(getCacheKey(),
            true,
            null,
            System.Web.Caching.Cache.NoAbsoluteExpiration,
            TimeSpan.FromMinutes(10));
    }

    /// <summary>
    /// Removes the BAD flag from the UserHostAddress in the HttpContext
    /// </summary>
    public static void UnsetBadIp()
    {
        HttpContext.Current.Cache.Remove(getCacheKey());
    }
} // end of SpamBlockerModule
To store and check bad IPs I used the ASP.NET Cache with a 10-minute sliding expiration, but you can customize your HttpModule to use a different timeout or to store bad IPs in the file system or a database if you prefer.
You may also want to implement some logging logic to keep track of the spammers' URLs.
Now let's take a look at the challenge HttpHandler.
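If you want to move bad-IP storage out of the ASP.NET Cache, one option is to put it behind a small interface. IBadIpStore and SlidingBadIpStore are hypothetical names. The in-memory implementation mimics the sliding expiration used above: an IP stays flagged until the whole timeout passes without it being looked up or flagged. A file- or database-backed class could implement the same interface.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction for bad-IP storage (not part of the original source).
public interface IBadIpStore
{
    bool IsBad(string ip);
    void Flag(string ip);
    void Unflag(string ip);
}

// In-memory sketch with sliding expiration: every lookup of a flagged IP
// pushes its expiration forward by the full timeout.
public class SlidingBadIpStore : IBadIpStore
{
    private readonly Dictionary<string, DateTime> lastSeen = new Dictionary<string, DateTime>();
    private readonly object sync = new object();
    private readonly TimeSpan timeout;
    private readonly Func<DateTime> clock;

    public SlidingBadIpStore(TimeSpan timeout, Func<DateTime> clock)
    {
        this.timeout = timeout;
        this.clock = clock;
    }

    public bool IsBad(string ip)
    {
        lock (sync)
        {
            DateTime seen;
            if (!lastSeen.TryGetValue(ip, out seen))
                return false;
            DateTime now = clock();
            if (now - seen > timeout)
            {
                lastSeen.Remove(ip); // expired: forget the IP
                return false;
            }
            lastSeen[ip] = now; // each hit slides the expiration
            return true;
        }
    }

    public void Flag(string ip)
    {
        lock (sync) { lastSeen[ip] = clock(); }
    }

    public void Unflag(string ip)
    {
        lock (sync) { lastSeen.Remove(ip); }
    }
}
```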
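As a starting point for that logging, here is a minimal sketch that appends one tab-separated line per trapped request to any TextWriter, for example a StreamWriter over a log file. SpamLog and its Record method are hypothetical names, and the choice of fields (UTC time, client IP, referrer, requested URL) is an assumption.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical logger for trapped requests (not part of the original source).
public class SpamLog
{
    private readonly TextWriter writer;
    private readonly object sync = new object();

    public SpamLog(TextWriter writer)
    {
        this.writer = writer;
    }

    // Writes: UTC time <TAB> client IP <TAB> referrer ("-" if none) <TAB> requested URL
    public void Record(DateTime utcTime, string ip, Uri referrer, Uri requestedUrl)
    {
        string line = string.Join("\t",
            utcTime.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture),
            ip,
            referrer == null ? "-" : referrer.ToString(),
            requestedUrl.ToString());
        lock (sync)
        {
            writer.WriteLine(line);
            writer.Flush();
        }
    }
}
```

In the module, Record could be called just before the redirect to BotTrap.ashx, passing context.Request.UserHostAddress, context.Request.UrlReferrer and context.Request.Url.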
using System;
using System.Web;

public class BotTrapHandler : IHttpHandler
{
    #region IHttpHandler Members

    public bool IsReusable
    {
        get { return false; }
    }

    public void ProcessRequest(HttpContext context)
    {
        // sets no cacheability
        context.Response.Cache.SetNoStore();
        context.Response.Cache.SetCacheability(HttpCacheability.NoCache);

        // the key must be a Guid; anything else has been tampered with and must never
        // be echoed into the page (Guid.TryParse requires .NET Framework 4)
        Guid key;
        if (!Guid.TryParse(context.Request.QueryString["key"], out key))
        {
            SpamBlockerModule.TerminateRequest();
            return;
        }

        // sets current step
        int step = 1;
        if (context.Request.QueryString["steptwo"] != null)
            step = 2;

        switch (step)
        {
            case 1:
                // javascript redirect to second step, fired by a click or a mouseover
                context.Response.Write(string.Format("<html><head><script type='text/javascript'>function redirect() {{ window.location = 'BotTrap.ashx?steptwo=true&key={0}'; }}</script></head><body onmouseover='redirect()'>Redirecting to the requested URL. <a href='javascript:redirect()'>Click here</a> if you are not redirected automatically.</body></html>",
                    key));
                break;
            case 2:
                string requestedUrl = SpamBlockerModule.GetRequestedUrl();
                if (requestedUrl == null)
                    SpamBlockerModule.TerminateRequest();
                else
                {
                    // unflags BAD IP
                    SpamBlockerModule.UnsetBadIp();
                    // redirects to originally requested url
                    context.Response.Redirect(requestedUrl);
                }
                break;
        }
    }

    #endregion
}
Its implementation is divided into two steps.
In the first step, it simply renders a small piece of HTML with a link and a JavaScript function that redirects back to the handler itself. Notice that a user action is needed to proceed: the redirect is fired by a click on the link or by a mouseover event on the page.
In the second step, it checks that the Guid key exists in the cache, reads the originally requested URL from it, removes the BAD flag from the client IP and then performs a server-side redirect to the original URL.
This two-step process ensures that the client has JavaScript enabled, that it follows all redirects as expected and, most likely, that it is not a bot. Storing the original URL in the server-side cache ensures that the client goes through all the steps instead of requesting the HttpHandler directly with tampered query string values.
Most bots get caught in the trap: their IP is flagged as BAD, and subsequent requests from those IPs that carry an external referrer are immediately refused.
Now all we have to do is register the HttpModule and the HttpHandler in our site's web.config file (this is the <system.webServer> section used by the IIS 7 integrated pipeline):
<system.webServer>
  <modules>
    <add name="ReferrerSpamBlocker" type="CodeGolem.ReferrerSpamBlocker.SpamBlockerModule, CodeGolem.ReferrerSpamBlocker"/>
  </modules>
  <handlers>
    <add name="BotTrapHandler" verb="GET" path="BotTrap.ashx" type="CodeGolem.ReferrerSpamBlocker.BotTrapHandler, CodeGolem.ReferrerSpamBlocker"/>
  </handlers>
</system.webServer>
I am testing the whole thing on my own CodeGolem blog, and it seems to be working fine.
If you came here from an external referrer, you may have noticed the redirection page... well, now you know why it was there!
I hope my Anti-Referrer-Spam-Bot-Trap is useful to all of you ASP.NET-powered bloggers!
Any feedback or comments are welcome!
Happy spam-free blogging! 
Here you can download the full source code for the Anti-Referrer-Spam-Bot-Trap implementation:
CodeGolem.ReferrerSpamBlocker.zip (4.27 kb)