Storing URLs in SQL Server

Using SQL Server I want to store a list of URLs in a table. In addition I have the requirement that I do not want any URL to appear in the table more that once.

This would suggest that I would like to make the URL the primary key for the table but this is not possible in SQL Server because of the length of URLs. SQL Server has a constraint that the maximum length of a character field that can be indexed is 900 characters while URLs according to the spec are potentially unlimited and as a practical matter IE supports URLs up to 2k so 900 is just too short.

My next thought is to use the HashBytes function to create a hash of the URL to use as a column to be indexed. In this case the potential exists that two different URLs might hash to the same value (unlikely but possible) so I can not use a unique index.

The bulk of the processing against this table will be inserts which is the performance I wist to optimize for.

My thought is to have a URL column and a Hashvalue column and create a non-unique index on the Hashvalue.

Then I would create a Trigger for Insert which would rollback the insert if the inserted Hashvale = an existing Hashvalue and the Inserted URL = an existing URL. My hope is that the query optimizer would use the index to first find the record(s) where the Hashvalues match and then not have to do a full table scan to try and match the URL.

Am I on the right track here or is there a better way to go about this?

6
задан JonnyBoats 24 August 2010 в 22:45
поделиться