Cloudera Enterprise 5.15.x

Configuring HDFS Trash

The Hadoop trash feature helps prevent accidental deletion of files and directories. When you delete a file in HDFS, the file is not removed immediately. It is first moved to the /user/<username>/.Trash/Current directory, with its original filesystem path preserved. After a user-configurable period of time (fs.trash.interval), a process known as trash checkpointing renames the Current directory to the current timestamp, that is, /user/<username>/.Trash/<timestamp>. The checkpointing process also checks the rest of the .Trash directory for existing timestamp directories and permanently removes from HDFS those that are older than fs.trash.interval. You can restore files and directories in the trash by moving them to a location outside the .Trash directory.
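
The path mapping and the restore step described above can be sketched in shell. This is a minimal sketch, not Cloudera tooling: the function names trash_path and restore_from_trash and the HDFS override variable are hypothetical, and the sketch assumes the default home directory layout of /user/<username>.

```shell
#!/bin/sh
# HDFS is a hypothetical override: set HDFS="echo hdfs" to preview the
# commands without touching a cluster.
HDFS="${HDFS:-hdfs}"

# Print where a path deleted by <user> sits before the next checkpoint.
# The original absolute path is preserved under .Trash/Current.
#   $1 = username, $2 = original absolute HDFS path
trash_path() {
  printf '/user/%s/.Trash/Current%s\n' "$1" "$2"
}

# Restore a deleted path by moving it out of .Trash to its original
# location. This only finds items that are still under Current, that is,
# items that have not yet been renamed into a timestamp directory.
restore_from_trash() {
  $HDFS dfs -mv "$(trash_path "$1" "$2")" "$2"
}
```

For example, trash_path alice /data/logs/app.log prints /user/alice/.Trash/Current/data/logs/app.log (alice and the path are made-up values). The restore assumes the parent directory of the original path still exists. If it was deleted as well, restore the parent instead.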

  Important:
  • The trash feature is disabled by default. Cloudera recommends that you enable it on all production clusters.
  • The trash feature works by default only for files and directories deleted using the Hadoop shell. Files or directories deleted programmatically using other interfaces (WebHDFS or the Java APIs, for example) are not moved to trash, even if trash is enabled, unless the program has implemented a call to the trash functionality. (Hue, for example, implements trash as of CDH 4.4.)

    Users can bypass trash when deleting files using the shell by specifying the -skipTrash option to the hadoop fs -rm -r command. This can be useful when it is necessary to delete files that are too large for the user's quota.
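
As an illustration of the -skipTrash option, the following sketch wraps the command in a small helper. The function name rm_skip_trash and the HADOOP override variable are hypothetical. The wrapper adds nothing beyond an argument check, and files removed this way cannot be restored from trash.

```shell
#!/bin/sh
# HADOOP is a hypothetical override: set HADOOP="echo hadoop" to preview
# the command without deleting anything.
HADOOP="${HADOOP:-hadoop}"

# Permanently delete the given paths, bypassing .Trash.
rm_skip_trash() {
  if [ "$#" -eq 0 ]; then
    echo "usage: rm_skip_trash <path>..." >&2
    return 1
  fi
  $HADOOP fs -rm -r -skipTrash "$@"
}
```

Running the helper with HADOOP="echo hadoop" set first prints the command that would be executed, which is a cheap way to check the paths before a deletion that cannot be undone.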

Trash Behavior with HDFS Transparent Encryption Enabled

Starting with CDH 5.7.1, you can delete files or directories that are part of an HDFS encryption zone. As the procedure described above shows, moving and renaming files or directories is an important part of trash handling in HDFS. However, HDFS transparent encryption currently supports renames only within an encryption zone. To accommodate this, HDFS creates a local .Trash directory every time a new encryption zone is created. For example, when you create an encryption zone, /enc_zone, HDFS also creates the /enc_zone/.Trash/ sub-directory. Files deleted from /enc_zone are moved to /enc_zone/.Trash/<username>/Current/. After the checkpoint, the Current directory is renamed to the current timestamp, /enc_zone/.Trash/<username>/<timestamp>.

If you delete the entire encryption zone, it is moved to the .Trash directory under the user's home directory, /user/<username>/.Trash/Current/enc_zone. Trash checkpointing occurs only after the entire zone has been moved to /user/<username>/.Trash. However, if the user's home directory is already part of an encryption zone, then attempting to delete an encryption zone fails because you cannot move or rename directories across encryption zones.
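
The two trash locations involved with encryption zones can be sketched as simple path computations. The function names ez_trash_dir and deleted_zone_path are hypothetical, and the second function assumes that home directories live under /user/<username>, as in the introduction to this topic.

```shell
#!/bin/sh
# Print the trash directory used for files deleted inside an encryption zone.
#   $1 = encryption zone path, $2 = username
ez_trash_dir() {
  zone="${1%/}"    # drop one trailing slash, if present
  printf '%s/.Trash/%s/Current\n' "$zone" "$2"
}

# Print where the zone itself lands if the whole zone is deleted.
# Assumes the user's home directory is /user/<username> and is not
# itself inside an encryption zone.
deleted_zone_path() {
  zone="${1%/}"
  printf '/user/%s/.Trash/Current%s\n' "$2" "$zone"
}
```

With the made-up user alice, ez_trash_dir /enc_zone alice prints /enc_zone/.Trash/alice/Current, and deleted_zone_path /enc_zone alice prints /user/alice/.Trash/Current/enc_zone.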

If you have upgraded your cluster to CDH 5.7.1 (or higher), and you have an encryption zone that was created before the upgrade, create the .Trash directory using the -provisionTrash option as follows:
$ hdfs crypto -provisionTrash -path /enc_zone
In CDH 5.7.0, HDFS does not automatically create the .Trash directory when an encryption zone is created. However, you can use the following commands to manually create the .Trash directory within an encryption zone. Make sure you run the commands as an admin user.
$ hdfs dfs -mkdir /enc_zone/.Trash
$ hdfs dfs -chmod 1777 /enc_zone/.Trash
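
The manual steps above can be made safe to re-run by checking for the directory first. The following is a sketch only: the function name ensure_zone_trash and the HDFS override variable are hypothetical, and, like the commands above, it must run as an admin user.

```shell
#!/bin/sh
# HDFS is a hypothetical override, useful for pointing the sketch at a
# stub command while testing.
HDFS="${HDFS:-hdfs}"

# Create <zone>/.Trash with the sticky bit set (mode 1777) unless the
# directory already exists.
#   $1 = encryption zone path
ensure_zone_trash() {
  trash="${1%/}/.Trash"
  if $HDFS dfs -test -d "$trash"; then
    echo "$trash already exists"
    return 0
  fi
  $HDFS dfs -mkdir "$trash" && $HDFS dfs -chmod 1777 "$trash"
}
```

Mode 1777 lets every user write into the shared .Trash directory, while the sticky bit prevents users from removing entries that belong to others.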

Configuring HDFS Trash Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

Enabling and Disabling Trash

  1. Go to the HDFS service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway.
  4. Select or clear the Use Trash checkbox.

    To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  5. Click Save Changes to commit the changes.
  6. Restart the cluster and deploy the cluster client configuration.

Setting the Trash Interval

  1. Go to the HDFS service.
  2. Click the Configuration tab.
  3. Select Scope > NameNode.
  4. Specify the Filesystem Trash Interval property, which controls both the number of minutes after which a trash checkpoint directory is deleted and the number of minutes between trash checkpoints. For example, to have deleted files permanently removed after 24 hours, set the value of the Filesystem Trash Interval property to 1440 (24 hours x 60 minutes).
      Note: The trash interval is measured from the point at which the files are moved to trash, not from the last time the files were modified.

    To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  5. Click Save Changes to commit the changes.
  6. Restart all NameNodes.
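
Because the interval is expressed in minutes, a small conversion helper can reduce arithmetic slips. The sketch below uses hypothetical function names (hours_to_trash_interval, show_trash_interval) and a hypothetical HDFS override variable. It assumes that the Filesystem Trash Interval property corresponds to fs.trash.interval, the property named in the introduction to this topic.

```shell
#!/bin/sh
# HDFS is a hypothetical override: set HDFS="echo hdfs" to preview the
# command without a cluster.
HDFS="${HDFS:-hdfs}"

# Convert a retention period in whole hours to the minutes value
# expected by the trash interval property.
hours_to_trash_interval() {
  echo $(( $1 * 60 ))
}

# Print the fs.trash.interval value from the configuration visible on
# this host. This reads the local configuration files, so it reflects
# the NameNode setting only when run with the NameNode's configuration.
show_trash_interval() {
  $HDFS getconf -confKey fs.trash.interval
}
```

For example, hours_to_trash_interval 24 prints 1440, and hours_to_trash_interval 168 prints 10080 for a one-week retention period.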

Configuring HDFS Trash Using the Command Line

Page generated May 18, 2018.