Hadoop 2.x Administration Cookbook

Configuring HDFS block size

Getting ready

To step through the recipes in this chapter, make sure you have completed the recipes in Chapter 1, Hadoop Architecture and Deployment, or that you at least understand the basic Hadoop cluster setup.

How to do it...

  1. ssh to the master node, which is the Namenode, and navigate to the directory where Hadoop is installed. In the previous chapter, Hadoop was installed at /opt/cluster/hadoop:
    $ ssh root@10.0.0.4
    
  2. Change to the Hadoop user, or any other user that is running Hadoop, by using the following:
    $ sudo su - hadoop
    
  3. Edit the hdfs-site.xml file and set the dfs.blocksize parameter to the desired value (a sample property snippet is shown after this list).
  4. dfs.blocksize is the parameter that controls the HDFS block size. Its value is specified in bytes; the default is 64 MB in Hadoop 1 and 128 MB in Hadoop 2. Tune the block size according to the workload.
  5. Once the changes are made to hdfs-site.xml, copy the file across all nodes in the cluster.
  6. Then restart the Namenode and Datanode daemons on all nodes (sample restart commands are included after this list).
  7. The block size can also be configured per file by specifying it at copy time, as shown in the example after this list.
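
The exact value depends on the workload, but as a rough sketch, the entry in hdfs-site.xml for a 128 MB block size would look something like the following (134217728 bytes equals 128 MB; in Hadoop 2, a suffixed value such as 128m is also accepted):

    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>

To restart the daemons in step 6, commands along the following lines can be used, assuming the sbin scripts shipped with the Hadoop 2 tarball under /opt/cluster/hadoop:

    $ /opt/cluster/hadoop/sbin/hadoop-daemon.sh stop namenode    # run on the Namenode
    $ /opt/cluster/hadoop/sbin/hadoop-daemon.sh start namenode
    $ /opt/cluster/hadoop/sbin/hadoop-daemon.sh stop datanode    # run on each Datanode
    $ /opt/cluster/hadoop/sbin/hadoop-daemon.sh start datanode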
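
For the per-file override in step 7, the block size can be passed as a generic -D option at copy time; the file and directory names below are placeholders (268435456 bytes equals 256 MB):

    $ hdfs dfs -D dfs.blocksize=268435456 -put largefile.txt /user/hadoop/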

How it works...

The best practice is to keep the configuration the same across all nodes in the cluster, but it is not mandatory. For example, the block size configured on the Namenode can differ from that on the edge node. In that case, the parameter on the source node, that is, the node from which the copy is initiated, takes effect.
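
A quick way to confirm which block size actually took effect is to query the file's metadata after the copy; the following check reuses the placeholder largefile.txt from the earlier example and prints the file's block size in bytes:

    $ hdfs dfs -stat "%o" /user/hadoop/largefile.txt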