SuperUser Account / Tuesday, April 28, 2015

Big Data - on VMware?

As told by SureSkills Principal Cloud Evangelist, Calvin Riskowitz.

Apache Hadoop provides a platform for building distributed systems for massive data storage and analysis using a large cluster of standard x86-based servers. It replicates data across hosts and racks of hosts to protect against individual disk, host, and even rack failures. A job scheduler can run multiple jobs of different sizes simultaneously, which helps to maintain a high level of resource utilisation. Given Hadoop's built-in reliability and workload consolidation features, it might appear there is little need to virtualise it. However, several use cases make virtualisation of this workload compelling:

• Enhanced availability with capabilities like VMware High Availability and Fault Tolerance

• Easier deployment with vSphere tools or Serengeti, leading to easier and faster data centre management

• Sharing resources with other Hadoop clusters or completely different applications, for better data centre utilisation

• Elasticity: the ability to grow a cluster quickly as needs warrant, and to shrink it just as quickly to release resources to other applications

VMware Big Data Extensions enables the rapid deployment of Hadoop clusters on a VMware vSphere virtual platform.

Big Data Extensions is the enterprise version of Project Serengeti and is a supported feature of VMware vSphere. The main features of VMware’s Big Data solution are:

Project Serengeti:

An open source project initiated by VMware, Project Serengeti lets users deploy and manage big data clusters in a vCenter Server managed environment. The major components are the Serengeti Management Server, which provides cluster provisioning, software configuration, and management services; an elastic scaling framework; and a command-line interface. Serengeti deploys Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, and HBase on vSphere. The project is endorsed by all major Hadoop distributions, including Cloudera, Hortonworks, MapR, and Pivotal.

Serengeti Management Server:

Provides the framework and services to run Big Data clusters on vSphere. The Serengeti Management Server performs resource management, policy-based virtual machine placement, cluster provisioning, software configuration management, and environment monitoring. Big Data Extensions runs on top of Project Serengeti. Big Data Extensions is delivered as a vCenter Server Appliance.

Big Data Extensions includes all the Project Serengeti functions and the following additional features and components:

• Enterprise level support from VMware.

• Hadoop distribution from the Apache community.

• The Big Data Extensions plug-in, a graphical user interface integrated with vSphere Web Client. This plug-in lets you perform common Hadoop infrastructure and cluster management administrative tasks.

• Elastic scaling lets you optimise cluster performance and utilisation of physical compute resources in a vSphere environment. Elasticity-enabled clusters start and stop virtual machines, adjusting the number of active compute nodes based on configuration settings that you specify, to optimise resource consumption. Elasticity is ideal in a mixed workload environment to ensure that workloads can efficiently share the underlying physical resources while high-priority jobs are assigned sufficient resources.
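To make the elastic scaling idea concrete, here is a minimal sketch in Python of how a target number of active compute nodes might be derived from the current workload. It is an illustration only, not the Big Data Extensions implementation: the `ElasticityPolicy` class, its fields, and both functions are hypothetical names invented for this example, and real clusters would use the configuration settings exposed by the product.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ElasticityPolicy:
    """Hypothetical settings; not actual Big Data Extensions parameters."""
    min_nodes: int       # never shrink below this many compute nodes
    max_nodes: int       # never grow beyond this many compute nodes
    tasks_per_node: int  # assumed number of tasks one compute node can run


def target_active_nodes(pending_tasks: int, policy: ElasticityPolicy) -> int:
    """Return how many compute nodes should be powered on for the workload."""
    # Ceiling division: nodes needed to run all pending tasks at once.
    needed = -(-pending_tasks // policy.tasks_per_node)
    # Clamp the result to the configured minimum and maximum.
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def plan_power_actions(active: int, target: int) -> Tuple[str, int]:
    """Translate the gap between active and target nodes into one action."""
    if target > active:
        return ("power_on", target - active)
    if target < active:
        return ("power_off", active - target)
    return ("none", 0)


policy = ElasticityPolicy(min_nodes=2, max_nodes=8, tasks_per_node=4)
target = target_active_nodes(pending_tasks=10, policy=policy)
print(plan_power_actions(active=2, target=target))
```

In this sketch an idle cluster settles at `min_nodes`, releasing the remaining physical resources to other workloads, while a busy cluster grows only up to `max_nodes`.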

Big Data Extensions performs the following steps to deploy a big data cluster:

• The Serengeti Management Server searches for ESXi hosts with sufficient resources to operate the cluster based on the configuration settings that you specify, and then selects the ESXi hosts on which to place Hadoop virtual machines.

• The Serengeti Management Server sends a request to the vCenter Server to clone and configure virtual machines to use with the big data cluster.

• The Serengeti Management Server configures the operating system and network parameters for the new virtual machines.

• Each virtual machine downloads the Hadoop software packages and installs them by applying the distribution and installation information from the Serengeti Management Server.

• The Serengeti Management Server configures the Hadoop parameters for the new virtual machines based on the cluster configuration settings that you specify.

• The Hadoop services are started on the new virtual machines, at which point you have a running cluster based on your configuration settings.
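The first of these steps, finding ESXi hosts with sufficient resources and choosing where to place the Hadoop virtual machines, can be sketched roughly as follows. This is a simplified greedy illustration under stated assumptions, not Serengeti's actual placement policy: the `Host` class, the `place_vms` function, and the host names are all hypothetical, and only free vCPUs and memory are considered.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Host:
    """Hypothetical view of an ESXi host's spare capacity."""
    name: str
    free_vcpus: int
    free_mem_gb: int


def place_vms(hosts: List[Host], vm_count: int,
              vcpus: int, mem_gb: int) -> Dict[str, int]:
    """Greedily assign identical VMs to hosts; return VMs per host name."""
    # Track remaining capacity separately so the input list is not mutated.
    free = {h.name: [h.free_vcpus, h.free_mem_gb] for h in hosts}
    placement: Dict[str, int] = {}
    for _ in range(vm_count):
        # Keep only hosts that can still fit one more VM.
        candidates = [n for n, (c, m) in free.items()
                      if c >= vcpus and m >= mem_gb]
        if not candidates:
            raise RuntimeError("not enough free capacity for the cluster")
        # Prefer the host with the most free memory, then most free vCPUs;
        # the name is the final tie-breaker so the result is deterministic.
        best = sorted(candidates,
                      key=lambda n: (-free[n][1], -free[n][0], n))[0]
        free[best][0] -= vcpus
        free[best][1] -= mem_gb
        placement[best] = placement.get(best, 0) + 1
    return placement


hosts = [Host("esxi-a", free_vcpus=8, free_mem_gb=32),
         Host("esxi-b", free_vcpus=4, free_mem_gb=16)]
print(place_vms(hosts, vm_count=4, vcpus=2, mem_gb=8))
```

A real placement engine would also weigh storage, networking, and rack topology; the sketch only shows the shape of the capacity check that precedes cloning.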

To find out more about Big Data on VMware, contact one of our Learning Consultants today: call 01 240 2262 or email info@sureskills.com (Dublin office), or call +44 (0) 28 9093 5555 or email niinfo@sureskills.com (Belfast office).

If you’re interested in setting up your own VMware virtualised Hadoop cluster, VMware Big Data Extensions may be downloaded from https://labs.vmware.com/flings/big-data-extensions-for-vsphere-standard-edition
