LVM with Hadoop

Vikas Verma · Published in Level Up Coding · Nov 17, 2020 · 4 min read


We will see how to use LVM (Logical Volume Management) with a Hadoop cluster to provide elasticity to the storage shared by a DataNode.

Knowledge of setting up a basic Hadoop cluster is assumed.

What we want to achieve:

◾ Integrate LVM with Hadoop and provide elasticity to DataNode storage.

We are performing this task on a virtual machine running the RHEL 8 operating system.

I have attached a disk of size 8 GiB (/dev/sdb here).

We will create two primary partitions on the attached disk, each of size 3 GiB. To open the disk in the partitioning tool, run the fdisk command:

fdisk /dev/sdb

When creating a new partition, fdisk selects the primary type by default; type +3G at the last-sector prompt to create a 3 GiB partition. Repeat this for the second partition, then write the table to disk.
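Roughly, the keystrokes inside fdisk look like this (a sketch; the exact prompt wording varies with the fdisk version):

n        # new partition
p        # primary (the default)
1        # partition number
<Enter>  # accept the default first sector
+3G      # last sector: make the partition 3 GiB
(repeat n, p, 2, <Enter>, +3G for the second partition)
w        # write the partition table and exit

If the kernel does not pick up the new partition table automatically, running partprobe /dev/sdb re-reads it.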

So we now have two partitions, /dev/sdb1 and /dev/sdb2.

Now we create physical volumes (PVs) from both partitions. (We could also use the whole /dev/sdb device, and add more storage devices, when creating physical volumes.)

pvcreate /dev/sdb1
pvcreate /dev/sdb2
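Both partitions can also be initialized in a single call, since pvcreate accepts multiple devices:

pvcreate /dev/sdb1 /dev/sdb2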

The pvdisplay command is used to see the available physical volumes.

A volume group (VG) is created from one or more physical volumes, so we create a volume group (named ‘mygroup’) from the two partitions created earlier. Since each PV is about 3 GiB, the total size of the volume group is about 6 GiB.

vgcreate mygroup /dev/sdb1 /dev/sdb2

Use the vgdisplay command to see the volume groups.
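Before carving out logical volumes, it can be handy to check the VG's free space in a compact form; vgs lets us pick the columns to show (a sketch, using the LVM2 column names):

vgs -o vg_name,vg_size,vg_free mygroup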

Now that we have a volume group, we can create a logical volume (it behaves like a partition). Here I create a logical volume of size 2 GiB from the ‘mygroup’ VG, which has about 6 GiB available, leaving about 4 GiB afterwards, and name this LV ‘mylv’.

lvcreate --size 2G --name mylv mygroup

Use the lvdisplay command to see all logical volumes.
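As an aside, lvcreate can also size an LV relative to the VG rather than with an absolute value; for instance, a hypothetical second volume taking half of the remaining free space would be:

lvcreate --extents 50%FREE --name mylv2 mygroup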

Now that we have created the logical volume, we have to format it before mounting it:

mkfs.ext4 /dev/mygroup/mylv

Let’s mount this logical volume on the directory that the DataNode shares with the Hadoop cluster (here it is /dn), so the capacity of that directory becomes the size of the logical volume.

mount /dev/mygroup/mylv  /dn
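We can confirm the mount and its size with df -h /dn. Note that a mount done this way does not survive a reboot; to make it persistent, add an /etc/fstab entry along these lines (a sketch, adjust to your setup):

/dev/mygroup/mylv  /dn  ext4  defaults  0 0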

After that, start the DataNode service:

hadoop-daemon.sh start datanode
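This assumes /dn is already configured as the DataNode's data directory. With the Hadoop 1.x-style commands used in this article, the hdfs-site.xml entry would be a sketch like the following (on Hadoop 2.x and later the property is named dfs.datanode.data.dir):

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>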

So, if we check the storage contributed by the DataNode, it will be around 2 GiB:

hadoop dfsadmin -report
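(On Hadoop 2.x and later the equivalent is hdfs dfsadmin -report; the older hadoop dfsadmin form still works but prints a deprecation notice.)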

Now our DataNode is running and sharing 2 GiB of storage. We will increase this storage to 4 GiB on the fly, that is, allocate 2 more GiB from the VG to this LV (obviously the VG must have that much free space, and an LV can draw space only from the single VG it belongs to).
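If the VG itself ever runs short of space, it too can be grown on the fly by adding another PV with vgextend, e.g. with a hypothetical new partition /dev/sdc1:

vgextend mygroup /dev/sdc1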

We use the lvextend command:

lvextend --size +2G /dev/mygroup/mylv

After extending the LV, we have to grow the filesystem over the added space without losing the existing data; for ext4, use the resize2fs command.

resize2fs /dev/mygroup/mylv
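Two asides here: recent versions of lvextend can grow the filesystem in the same step via the -r/--resizefs flag, e.g. lvextend --size +2G --resizefs /dev/mygroup/mylv. Also, resize2fs applies only to ext2/3/4 filesystems; had we formatted the LV with XFS, the equivalent would be xfs_growfs /dn (it takes the mount point, and XFS can only be grown, never shrunk).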

When we check the DataNode's storage contribution again, it will now be around 4 GiB.

Finally, we were able to change the DataNode's storage contribution to the Hadoop cluster on the fly by integrating it with LVM.

Thank YOU! 😀
