LVM with Hadoop
We will see how to use LVM (Logical Volume Management) with a Hadoop cluster to provide elasticity to the storage contributed by a DataNode.
Knowledge of setting up a basic Hadoop cluster is required.
What we want to achieve:
◾ Integrating LVM with Hadoop and providing Elasticity to DataNode Storage.
We are performing this task on a virtual machine running the RHEL 8 operating system.
I attached a disk of size 8 GiB (here /dev/sdb).
We will create two primary partitions of 3 GiB each on the attached disk. To partition the disk, enter it with the fdisk command:
fdisk /dev/sdb
When creating a new partition, fdisk selects the primary type by default; type +3G at the “Last sector” prompt to make a 3 GiB partition. Repeat this once more.
So we now have two partitions, sdb1 and sdb2.
Now, create a physical volume (PV) on each of the two partitions. (We could also use the whole /dev/sdb device, or add more storage devices, when creating physical volumes.)
pvcreate /dev/sdb1
pvcreate /dev/sdb2
Use the pvdisplay command to see the available physical volumes.
A volume group is built from physical volumes, so we create a volume group (named ‘mygroup’) from the two PVs created earlier. Since each PV is about 3 GiB, the total size of the volume group is about 6 GiB.
vgcreate mygroup /dev/sdb1 /dev/sdb2
Use the vgdisplay command to see the volume groups.
Now that we have a volume group, we can create a logical volume (it behaves like a partition). Here I created a logical volume of 2 GiB from the ‘mygroup’ VG (which has about 6 GiB, so about 4 GiB remains free after this) and named the LV ‘mylv’.
lvcreate --size 2G --name mylv mygroup
Use the lvdisplay command to see all logical volumes.
Now that the logical volume is created, we have to format it before mounting:
mkfs.ext4 /dev/mygroup/mylv
Let’s mount this logical volume on the directory that the DataNode shares with the Hadoop cluster (here /dn), so the capacity of this directory becomes the size of the logical volume:
mount /dev/mygroup/mylv /dn
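A mount made this way lasts only until reboot. To make it permanent, a line like the following can be added to /etc/fstab (a sketch, assuming the VG/LV names and mount point used above):

```
/dev/mygroup/mylv  /dn  ext4  defaults  0  0
```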
After that, start the DataNode service:
hadoop-daemon.sh start datanode
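Starting the DataNode like this assumes the cluster’s hdfs-site.xml already points the DataNode storage at /dn. A minimal sketch of that property (in Hadoop 1.x the property is named dfs.data.dir instead):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/dn</value>
</property>
```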
So, if we check the storage contributed by the DataNode, it will be around 2 GiB:
hadoop dfsadmin -report
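The full report is long; a grep is enough to pull out just the capacity line. The sample line below is an assumed excerpt of a report, shown so the extraction can be tried without a live cluster; on a real cluster, pipe `hadoop dfsadmin -report` into the same grep:

```shell
# Assumed sample line from a dfsadmin report (a live cluster prints many more)
report='Configured Capacity: 2147483648 (2 GB)'
# On a live cluster: hadoop dfsadmin -report | grep 'Configured Capacity'
echo "$report" | grep 'Configured Capacity'
```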
Now our DataNode is running and sharing 2 GiB of storage. We will increase this to 4 GiB on the fly; that is, we will add 2 GiB more from the VG to this LV. (Obviously the VG must have that much free space, and an LV can take space from only one VG.)
Use the lvextend command:
lvextend --size +2G /dev/mygroup/mylv
After extending the LV, we have to grow the filesystem over the added space, without losing the existing data; use the resize2fs command for this:
resize2fs /dev/mygroup/mylv
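The extend-then-resize pair can also be collapsed into one command: lvextend’s -r (--resizefs) flag runs the filesystem resize itself after growing the LV. A small dry-run sketch (the helper below only prints the command; run the printed line as root to apply it):

```shell
# Build (but do not run) the one-step grow command; -r makes lvextend
# resize the filesystem after extending the logical volume.
grow_lv() {
  printf 'lvextend -r --size +%s %s\n' "$2" "$1"
}
grow_lv /dev/mygroup/mylv 2G   # prints: lvextend -r --size +2G /dev/mygroup/mylv
```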
When we check the storage contribution of the DataNode again, it will now be 4 GiB.
Finally, we were able to change, on the fly, the storage the DataNode contributes to the Hadoop cluster by integrating it with LVM.