I would do the following:
Using a hash also lets you merge the same image uploaded multiple times.
Joe Beda's answer is almost perfect, but please note that MD5 has been shown to be vulnerable to collisions, in (iirc) about 2 hours on a laptop.
So if you really do use the MD5 hash of the file in the manner described, your service becomes vulnerable to attack. What would the attack look like?
Someone will say: then let's not overwrite. In that case, if an attacker can predict that someone will upload a particular file (for example, a popular image from the web), they can claim its "hash slot" first. The user happily uploads a picture of a kitten, only to find that it actually shows up as (use your imagination here). I say: use SHA1, since (iirc) it has been shown that a cluster of 10,000 computers would need 127 years to break it.
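As a rough illustration of the hash-named storage discussed above, here is a minimal Python sketch. The function names, the two-level directory split, and the `.jpg` extension are assumptions made for the example and do not come from either answer. It never overwrites an existing file, and it compares bytes when a name is already taken, so a duplicate upload is merged while a colliding upload is rejected.

```python
import hashlib
from pathlib import Path


def hashed_path(data: bytes, root: Path, algorithm: str = "sha1") -> Path:
    """Return root/ab/cd/<full hex digest>.jpg for the given image bytes."""
    digest = hashlib.new(algorithm, data).hexdigest()
    # Hypothetical layout: the first two byte pairs of the digest pick the directories.
    return root / digest[:2] / digest[2:4] / f"{digest}.jpg"


def store_image(data: bytes, root: Path, algorithm: str = "sha1") -> Path:
    """Store the image once; an identical upload maps to the same file."""
    path = hashed_path(data, root, algorithm)
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "xb" fails if the file already exists, so nothing is overwritten.
        with open(path, "xb") as handle:
            handle.write(data)
    except FileExistsError:
        # The name is taken: accept only if the stored bytes are identical.
        if path.read_bytes() != data:
            raise ValueError("same digest, different content") from None
    return path
```

Passing `algorithm="sha256"` switches to a stronger digest without changing anything else in the sketch.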
To expand upon Joe Beda's approach:
If you care about grouping or finding files by user, original filename, upload date, photo-taken-on date (EXIF), etc., store this metadata in a database and use the appropriate queries to pick out the appropriate files.
Use the database primary key, whether a file hash or an autoincrementing number, to locate files among a fixed set of directories. Alternatively, use a fixed maximum number of files N per directory and move on to the next directory when one fills up, e.g. the kth photo should be stored at {somepath}/aaaaaa/bbbb.jpg
where aaaaaa = floor(k/N), formatted as decimal or hex, and bbbb = mod(k,N), formatted as decimal or hex. If that's too flat a hierarchy for you, use something like {somepath}/aa/bb/cc/dd/ee.jpg
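The floor/mod layout can be sketched in a few lines of Python. The function names, the default N of 10,000, the `photos` root, and the choice of decimal for the flat form and hex for the nested form are all assumptions made for illustration.

```python
def photo_path(k: int, n: int = 10_000, root: str = "photos") -> str:
    """Flat layout: {root}/aaaaaa/bbbb.jpg with aaaaaa = floor(k/N), bbbb = mod(k, N)."""
    if k < 0 or n <= 0:
        raise ValueError("k must be non-negative and n positive")
    directory, slot = divmod(k, n)
    width = len(str(n - 1))  # pad the file name so every slot has the same width
    return f"{root}/{directory:06d}/{slot:0{width}d}.jpg"


def nested_path(k: int, root: str = "photos") -> str:
    """Deeper layout: {root}/aa/bb/cc/dd/ee.jpg from ten hex digits of k."""
    if not 0 <= k < 16 ** 10:
        raise ValueError("k does not fit in ten hex digits")
    digits = f"{k:010x}"
    pairs = [digits[i:i + 2] for i in range(0, 10, 2)]
    return f"{root}/" + "/".join(pairs) + ".jpg"


print(photo_path(123456))   # photos/000012/3456.jpg
print(nested_path(123456))  # photos/00/00/01/e2/40.jpg
```

With two hex digits per level, each directory in the nested form holds at most 256 entries.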
Don't expose the directory structure directly to your users. If they are using web browsers to access your server via HTTP, give them a URL like www.myserver.com/images/{primary key} and send the proper file type in the Content-Type header.
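A minimal sketch of that lookup in Python, independent of any web framework. The `index` dictionary stands in for the database query, and `resolve_image` is a hypothetical name; a real handler would plug the same logic into whatever server is in use.

```python
import mimetypes
from pathlib import Path


def resolve_image(key: int, index: dict[int, Path]) -> tuple[int, dict[str, str], bytes]:
    """Map a primary key to (status, headers, body) without exposing the path."""
    path = index.get(key)  # stand-in for a database lookup
    if path is None or not path.is_file():
        return 404, {"Content-Type": "text/plain"}, b"not found"
    # Guess the MIME type from the stored file's extension; fall back to a generic type.
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    return 200, {"Content-Type": content_type}, path.read_bytes()
```

The client only ever sees the key in the URL, so the files can be moved to a different directory scheme later without breaking links.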
What I used for another requirement, and which could fit your needs, is a simple convention.
Increment the last ID by 1, count the digits in the new number, and prefix the number with that digit count.
For example:
Assume 'a' is a variable holding the last ID.
    a = 564;
    ++a;                     // a is now 565
    prefix = length(a);      // 3 digits
    id = concat(prefix, a);  // "3565" (string concatenation, not addition)
Then, you can use a timestamp for the directory, using this convention:
20090523 (yyyymmdd)
Then you can explode your path like this:
2009/05/23/3565.jpg
(or more)
It's interesting because you can keep a sort order by date and by number at the same time (sometimes useful). And you can still decompose your path into more directories.
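Here is a small Python sketch of this convention, assuming year/month/day order for the directories so that paths sort chronologically. The function names and the `.jpg` extension are made up for the example.

```python
from datetime import date


def next_id(last_id: int) -> str:
    """Increment the ID and prefix it with its own digit count, e.g. 564 -> "3565"."""
    new_id = last_id + 1
    return f"{len(str(new_id))}{new_id}"


def dated_path(last_id: int, day: date) -> str:
    """Build yyyy/mm/dd/<prefixed id>.jpg for the next upload on the given day."""
    return f"{day:%Y/%m/%d}/{next_id(last_id)}.jpg"


print(next_id(564))                         # 3565
print(dated_path(564, date(2009, 5, 23)))   # 2009/05/23/3565.jpg
```

The length prefix is what makes plain string sorting agree with numeric order: "299" (99) sorts before "3100" (100), whereas the bare strings "99" and "100" would sort the other way round. This holds as long as the digit count itself stays a single digit, i.e. for IDs of up to nine digits.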
Here are two functions I wrote a while back for exactly this situation. They've been in use for over a year on a site with thousands of members, each of whom has lots of files.
In essence, the idea is to use the last digits of each member's unique database ID to calculate a directory structure, with a unique directory for everyone. Using the last digits, rather than the first, ensures a more even spread of directories. A separate directory for each member means maintenance tasks are a lot simpler, plus you can see where people's stuff is (as in visually).
    // Checks for member directories and creates them if required.
    function member_dirs($user_id) {
        $user_id = sanitize_var($user_id);

        // substr()'s third argument is a length, not an end position. Passing
        // $last_pos just means "through to the end of the string", so for IDs of
        // three or more digits these are the last 1, 2 and 3 digits of the ID.
        $last_pos  = strlen($user_id);
        $dir_1_pos = $last_pos - 1;
        $dir_2_pos = $last_pos - 2;
        $dir_3_pos = $last_pos - 3;
        $dir_1 = substr($user_id, $dir_1_pos, $last_pos);
        $dir_2 = substr($user_id, $dir_2_pos, $last_pos);
        $dir_3 = substr($user_id, $dir_3_pos, $last_pos);

        // Each entry is nested inside the previous one, e.g. 12345 -> 5/45/345/12345/.
        $user_dir = array();
        $user_dir[0] = $GLOBALS['site_path'] . "files/members/" . $dir_1 . "/";
        $user_dir[1] = $user_dir[0] . $dir_2 . "/";
        $user_dir[2] = $user_dir[1] . $dir_3 . "/";
        $user_dir[3] = $user_dir[2] . $user_id . "/";
        $user_dir[4] = $user_dir[3] . "sml/";
        $user_dir[5] = $user_dir[3] . "lrg/";

        foreach ($user_dir as $this_dir) {
            if (!is_dir($this_dir)) { // directory doesn't exist
                // 0777 requests full permissions; the process umask still applies
                if (!mkdir($this_dir, 0777)) {
                    return false; // bail out if it can't be created
                }
            }
        }

        // if we've got to here, all directories exist or have been created
        return true;
    }
    // Accompanying function to the above: returns the member's path relative to the site root.
    function make_path_from_id($user_id) {
        $user_id = sanitize_var($user_id);

        // Same digit extraction as in member_dirs(): the third argument to
        // substr() is a length, so each call runs to the end of the string.
        $last_pos  = strlen($user_id);
        $dir_1_pos = $last_pos - 1;
        $dir_2_pos = $last_pos - 2;
        $dir_3_pos = $last_pos - 3;
        $dir_1 = substr($user_id, $dir_1_pos, $last_pos);
        $dir_2 = substr($user_id, $dir_2_pos, $last_pos);
        $dir_3 = substr($user_id, $dir_3_pos, $last_pos);

        $user_path = "files/members/" . $dir_1 . "/" . $dir_2 . "/" . $dir_3 . "/" . $user_id . "/";
        return $user_path;
    }
sanitize_var() is a supporting function for scrubbing input and ensuring it's numeric; $GLOBALS['site_path'] is the absolute path for the server. Hopefully the rest is self-explanatory.