Hadoop software vendor Tresata announced this week that it has developed the first open source algorithm library built completely on Scalding and designed to work with Mahout and Hadoop. Scalding, a Scala library, wicks away low-level Hadoop complexities, aiming to make it easier for developers to specify Hadoop MapReduce jobs. Tresata uses Scalding as the core API for all of its Hadoop-focused software.
Tresata boasts that its new library, dubbed “ganitha,” is the first open source implementation of machine learning and statistical techniques on Scalding. “The core idea behind ganitha was to make complex pieces of MapReduce logic available in a much more clean, simple and powerful abstraction that allows you to run real world algorithms at massive scale,” explained Koert Kuipers, the company’s CTO, in a blog post this week. Kuipers says that the need for sparse vectors in Scalding, with compact in-memory and serializable representations, drove the integration of Mahout vectors into Scalding.
Abhishek Mehta, CEO and co-founder of Tresata, explained that the decision to open source ganitha stems from a fundamental belief that the intellectual property isn’t in the algorithms, but in how and where they are applied.
Ganitha can be found at its GitHub home here.