
Recycle or trash bin configuration
There will also be cases where we need to restore an accidentally deleted file or directory. This may be due to user error or an archiving policy that cleans data periodically.
For such situations, we can configure the recycle bin so that deleted files can be restored for a specified amount of time. In this recipe, we will see how this can be configured.
Getting ready
This recipe shows the steps needed to edit the configuration file and add new parameters to the file to enable trash in the Hadoop cluster.
How to do it...
- ssh to the Namenode and edit the `core-site.xml` file to add the following property to it:

```xml
<property>
  <name>fs.trash.interval</name>
  <value>10080</value>
</property>
```
- The `fs.trash.interval` parameter defines the time in minutes after which a trash checkpoint will be deleted.
- Restart the `namenode` daemon for the property to take effect:

```shell
$ hadoop-daemon.sh stop namenode
$ hadoop-daemon.sh start namenode
```
- Once trash is enabled, delete any unimportant file. You will see a different message: rather than saying `deleted`, it says `moved to trash`.
- The deleted file can be restored by using the following command:

```shell
$ hadoop fs -cp /user/hadoop/.Trash/Current/input/new.txt /input/
```
How it works...
Any deleted data is moved to the `.Trash` directory under the home directory of the user who executed the command. Every time the checkpointer runs, it creates a new checkpoint out of `Current` and removes any checkpoints created more than `fs.trash.interval` minutes ago.
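The expiry rule described above can be sketched as a small simulation. Note this is an illustrative model with hypothetical helper names, not HDFS source code:

```python
from datetime import datetime, timedelta

def expired_checkpoints(checkpoints, now, trash_interval_minutes):
    """Return the checkpoint timestamps the checkpointer would remove:
    those created more than fs.trash.interval minutes ago."""
    cutoff = now - timedelta(minutes=trash_interval_minutes)
    return [ts for ts in checkpoints if ts < cutoff]

# With fs.trash.interval = 10080 (7 days), a checkpoint from 8 days ago
# is removed, while one from 2 days ago is kept.
now = datetime(2023, 1, 10)
checkpoints = [now - timedelta(days=8), now - timedelta(days=2)]
print(expired_checkpoints(checkpoints, now, 10080))
```

Running this prints only the 8-day-old timestamp, mirroring how a week-old checkpoint is purged while newer ones survive.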
There's more...
In addition to the preceding parameter, there is an `fs.trash.checkpoint.interval` parameter, which defines the number of minutes between checkpoints. Its value should be less than or equal to `fs.trash.interval`; if it is set to zero, it defaults to the value of `fs.trash.interval`.
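For example, the two properties can be combined in `core-site.xml` so that a new checkpoint is rolled every hour and deleted files are kept for seven days (illustrative values; tune them to your own retention policy):

```xml
<property>
  <name>fs.trash.interval</name>
  <!-- keep deleted files for 7 days (in minutes) -->
  <value>10080</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <!-- roll a new trash checkpoint every hour -->
  <value>60</value>
</property>
```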