Tuesday, October 11, 2016

Hortonworks Hears a Who


I foresee that someday HDInsight will be a great product. Taking away all the pain of configuring Hadoop clusters and providing a cloud service with direct connections to the other Windows Azure services is something of great value.

Yet there is a problem when you take a dependency on open source: at times it evolves too quickly for anyone to keep a stable build. One of the beauties of open source is that anyone can fix a bug, so the first person to hit one can jump right into the code and fix it. The catch is that this person may not be the best qualified to fix the problem or, in the process of fixing it, may introduce yet another bug, or even more than one. An early adopter of new technologies risks stepping into quicksand and being stuck there for a while. Which makes it ironic that virtual machines with "stable" configurations of several services properly bound together are called a "sandbox".

A while ago, Microsoft partnered with Hortonworks to make HDP (Hortonworks Data Platform) the basis for the HDInsight clusters in Azure. You can go to the Azure Management Portal and get an HDInsight cluster provisioned in minutes. It would be all nice and splashing... and we could all be enjoying the jungle's great joys... except for... the quicksand. If you create an HDInsight cluster as of midway through March 2014, you get Hadoop 1.2.0, HCatalog 0.11.0, Pig 0.11.0, Hive 0.11.0, Oozie 3.3.2, and some other services in some other versions. If you go to the Hortonworks site and download their Sandbox version 2.0, which was built around late October 2013, you get Hadoop 2.2.0, HCatalog 0.12.0, Pig 0.12.0, Hive 0.12.0, Oozie 4.0.0, and some other services in some other versions. Notice a pattern? A difference in version is a difference in version, no matter how small. Even worse if it is not that small after all...
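To make the pattern concrete, here is a minimal Python sketch that lines up the version numbers quoted above and reports where each pair first differs. The helper names (`parse`, `gap`, `compare`) are made up for this illustration, and the sketch assumes every version is a plain dotted triple, which holds for the five services listed here.

```python
# Versions quoted in the post: HDInsight (mid-March 2014) vs. HDP Sandbox 2.0.
HDINSIGHT = {"Hadoop": "1.2.0", "HCatalog": "0.11.0", "Pig": "0.11.0",
             "Hive": "0.11.0", "Oozie": "3.3.2"}
SANDBOX = {"Hadoop": "2.2.0", "HCatalog": "0.12.0", "Pig": "0.12.0",
           "Hive": "0.12.0", "Oozie": "4.0.0"}


def parse(version):
    """Turn '0.11.0' into (0, 11, 0) so components compare numerically."""
    return tuple(int(part) for part in version.split("."))


def gap(a, b):
    """Name the first component (major/minor/patch) where a and b differ."""
    for label, x, y in zip(("major", "minor", "patch"), parse(a), parse(b)):
        if x != y:
            return label
    return "same"


def compare(cloud, local):
    """Map each service present in both dicts to the kind of version gap."""
    return {name: gap(cloud[name], local[name])
            for name in cloud if name in local}


if __name__ == "__main__":
    for name, kind in compare(HDINSIGHT, SANDBOX).items():
        print(f"{name:9} {HDINSIGHT[name]:>7} -> {SANDBOX[name]:<7} ({kind})")
```

Running it shows that none of the five services match between the two environments, and for Hadoop and Oozie the gap is a whole major version.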

A Sandbox has an important value: when you are developing some Hadoop-based workflow and something is failing, you would very much prefer to debug issues locally. With laptops nowadays having i7 CPUs and SSDs large enough to hold at least some medium-size data for tests, it is definitely worth having a local Sandbox you can use. As long as it doesn't differ that much from what you will get in your cloud service. And as long as getting the Sandbox up and running is quick and easy.

Which brings us to the fact that Hortonworks, for a reason that I fail to understand, decided to make the HDP2 Sandbox available for VirtualBox and VMware, and no longer for Microsoft's Hyper-V. Maybe that is waiting for when the HDInsight image is in sync. Yet, in the meantime, a solution would be to use VirtualBox or VMware. However, if you just download and install both VirtualBox and the corresponding HDP2 Sandbox, you will face the following error message.
VirtualBox - Error
Failed to open a session for the virtual machine Hortonworks Sandbox 2.0.
VT-x is not available. (VERR_VMX_NO_VMX).
Details

Result Code: E_FAIL (0x80004005)
Component: Console
Interface: IConsole {db7ab4ca-2a3f-4183-9243-c1208da92392}


Quite annoying. Yet searching the web suggests this can likely be resolved by simply disabling hardware virtualization. I went ahead and did that.
C:\Program Files\Oracle\VirtualBox>VBoxManage list vms
"Hortonworks Sandbox 2.0" {93a61b40-5bb4-4038-bf3d-e7a1285f5063}
C:\Program Files\Oracle\VirtualBox>VBoxManage modifyvm "Hortonworks Sandbox 2.0" --hwvirtex off


That at least starts the VM in VirtualBox, only to get to this new error message:
This kernel requires an x86-64 CPU, but only detected an i686 CPU.
Unable to boot - please use a kernel appropriate for your CPU.


It looks like VirtualBox has, for a while now, depended on hardware virtualization by default, and without it the guest only sees a 32-bit (i686) CPU. So the way out is to enable virtualization in the BIOS, turn hardware virtualization back on for the VM in VirtualBox, and then disable Microsoft Hyper-V, which would otherwise compete with VirtualBox for the hardware-assisted virtualization:
C:\Program Files\Oracle\VirtualBox>VBoxManage modifyvm "Hortonworks Sandbox 2.0" --hwvirtex on
>dism.exe /Online /Disable-Feature:Microsoft-Hyper-V

After a reboot, everything works! Should you need to re-enable the Microsoft Hyper-V feature, you can do so via the GUI for turning Windows features on or off, or with this command line:
>dism.exe /Online /Enable-Feature:Microsoft-Hyper-V /All
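If you flip between Hyper-V and VirtualBox often, the VBoxManage side of this can be scripted. The following is only a sketch under a few assumptions: it parses text in the `"name" {uuid}` shape that `VBoxManage list vms` printed above, and it builds (but does not run) the same `modifyvm ... --hwvirtex on|off` command used in this post. The function names are made up for the example.

```python
import re

# One line of `VBoxManage list vms` output, as shown in the post:
#   "Hortonworks Sandbox 2.0" {93a61b40-5bb4-4038-bf3d-e7a1285f5063}
_VM_LINE = re.compile(r'^"(?P<name>.+)"\s+\{(?P<uuid>[0-9a-fA-F-]+)\}$')


def parse_vm_list(output):
    """Return (name, uuid) pairs from `VBoxManage list vms` text output."""
    vms = []
    for line in output.splitlines():
        match = _VM_LINE.match(line.strip())
        if match:
            vms.append((match.group("name"), match.group("uuid")))
    return vms


def hwvirtex_command(vm_name, enabled):
    """Build the argument list for toggling hardware virtualization on a VM."""
    state = "on" if enabled else "off"
    return ["VBoxManage", "modifyvm", vm_name, "--hwvirtex", state]


if __name__ == "__main__":
    sample = '"Hortonworks Sandbox 2.0" {93a61b40-5bb4-4038-bf3d-e7a1285f5063}\n'
    for name, uuid in parse_vm_list(sample):
        # Print the command instead of executing it; pass the list to
        # subprocess.run() yourself once VBoxManage is on your PATH.
        print(uuid, " ".join(hwvirtex_command(name, enabled=True)))
```

Building the command as a list rather than a single string avoids quoting trouble with VM names that contain spaces, such as "Hortonworks Sandbox 2.0".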

Now the only remaining problem is to get the Hortonworks Sandbox and the HDInsight images in sync... Hope someone is hearing the calls for help all over the web...
