Perl script to delete millions of files in one directory

Got a ticket today about a possible “HDD issue” because the user was unable to run du -h on a partition. I looked and figured out it was just due to a directory that had 12+ million files in it. I'm not exaggerating: since the box was out of production, I was able to run `ls | wc -l` on it inside screen. I'm not sure how long it took, but the next morning I saw this after re-attaching to the screen session. (I still don't have a clue where he got a possible HDD issue from, but whatever.)

# ls | wc -l
12399466

After cleaning out my shorts I started looking into a way to delete the files, knowing full well there was no way rm with a wildcard was going to work without hitting the “Argument list too long” error, which I’m sure we have all seen time and time again.
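
For a rough sense of why rm chokes here: when you hand rm a glob, the shell expands it into one giant argument list, and the system caps how big that list can be. Here is a small sketch. The 20-byte average name length is a number I made up for illustration, so treat the result as a ballpark only.

```shell
# Upper bound, in bytes, on the arguments plus environment
# that can be passed to a newly started program.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"

# Hypothetical estimate: 12 million names at about 20 bytes each.
needed=$((12000000 * 20))
echo "A glob over the directory would need roughly $needed bytes"
```

On a typical box the second number comes out far larger than the first, which is why the command never even starts.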

I also tried the following. These would actually work, but it would take about 2 days to remove all those files, which is too long.

find . -type f -delete
find . -type f | xargs rm -f

Then I found a post by Randal L. Schwartz about using a Perl command that bypasses your shell completely. This is working, and it’s deleting the files pretty quickly.

perl -e 'chdir "problem_dir" or die; opendir D, "."; while ($n = readdir D) { unlink $n }'

In relative path terms…
You will want to run this Perl line one level up from the directory that contains the millions of files. Change “problem_dir” to the name of the directory that holds all the files.

In absolute path terms…
Run the Perl line from wherever you want and change “problem_dir” to the absolute path of the directory, for example /usr/local/nas/logs/

I would think the relative path one would be a tad faster, but I really don’t know. Either way, you should start noticing the free space on the file system slowly increasing via `df -k`.

2 Comments

  1. Andrew says:

    Thanks for the help, I had an issue on my server with millions of files.
    The Perl script seems to be working, as I'm noticing a slow decrease in used disk space. But I think it's still got hours to run before all is clean.

    *PS: you said “run this one directory below the problem directory”, and that made me a little confused. It should rather say to run this in the problem directory.

  2. phatdee says:

    Thanks for the comment. You are right, that line was a little confusing. I updated it, and most likely made it even more confusing, haha. But hopefully not.
