TECH: announcing our 1st open source project – ganitha


Abhishek Mehta

Aug 19, 2013


this will probably rank amongst the proudest of our achievements when we look back on the path to making tresata into the company we believe will lead and define what next-gen analytics software should do…

so without much ado, we are announcing the first open source algorithm library built completely on scalding…and designed to work with/in mahout & hadoop

we decided to name it “ganitha” – derived from the sanskrit word for mathematics, or the science of computation

ganitha is a collection of algorithms we have written, driven by a need to perform at-scale machine-learning and statistical analysis on hadoop. the reasons we picked scalding and hadoop would be obvious to the most passionate of developers in the scala and hadoop ecosystems…

but why open-source parts of our machine learning library?  isn’t our software all about the algorithms anyway?

well…we fundamentally believe that the ‘IP’ isn’t in the algorithms (it never is)…but in how and where they are applied…

not to mention we are fervent supporters of the OSS movement – vociferous in our support of it, our gratitude towards it and our drive to give back…and while we have given back through numerous small & some significant contributions to other open source projects like hadoop, scala, avro, scalding and cascading…we wanted to push the ball rapidly forward on ‘top-of-the-stack’ data analysis technology…and hence ganitha

as a start, we will be open-sourcing…

1.  our integration of mahout vectors into scalding, and

2.  our clustering (k-means||) implementation

…with plans to keep adding to the library at a rapid pace, both on our own and with the help of the growing community of developers looking to get maximum analytics power out of their hadoop implementations (rough sketches of both pieces follow below).
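to give a flavor of what the mahout-vector integration enables, here is a hypothetical sketch (not ganitha’s actual API…the job class, field names and paths below are purely illustrative) of parsing rows into mahout vectors inside a scalding job and computing their L2 norms on hadoop; the real integration also has to make mahout vectors serializable through a hadoop flow, something this toy example sidesteps by never passing a vector between pipe stages…

```scala
import com.twitter.scalding._
import org.apache.mahout.math.{DenseVector, Vector}

// hypothetical sketch: turn tab-separated rows of comma-delimited numbers
// into mahout DenseVectors and emit the L2 norm of each one.
// runs like any scalding job, e.g. via com.twitter.scalding.Tool on hadoop.
class VectorNormJob(args: Args) extends Job(args) {
  Tsv(args("input"), ('id, 'features))
    .map('features -> 'norm) { raw: String =>
      // parse "1.0,2.0,3.0" into a mahout vector and take its euclidean norm
      val v: Vector = new DenseVector(raw.split(",").map(_.toDouble))
      v.norm(2)
    }
    .project('id, 'norm)
    .write(Tsv(args("output")))
}
```

and to illustrate the idea behind k-means|| (the “scalable k-means++” initialization), here is a plain local scala sketch of the oversampling step only…ganitha’s implementation runs this over hadoop via scalding, which this toy code makes no attempt to reproduce, and the final reduction of the oversampled candidates down to k centers is omitted…

```scala
import scala.util.Random

// local illustration of k-means|| oversampling: a few rounds, each sampling
// points with probability proportional to their squared distance from the
// current candidate centers, leaving roughly l * rounds candidates that are
// later reweighted and reduced to k centers (that last step is not shown)
object KMeansParallelInit {
  type Point = Array[Double]

  // squared euclidean distance between two points
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // distance from a point to its nearest candidate center
  def nearest(p: Point, centers: Seq[Point]): Double =
    centers.map(c => sqDist(p, c)).min

  // total clustering cost of the current candidate set
  def cost(points: Seq[Point], centers: Seq[Point]): Double =
    points.map(nearest(_, centers)).sum

  def oversample(points: IndexedSeq[Point], l: Double, rounds: Int, rng: Random): Seq[Point] = {
    // start from a single uniformly sampled point
    var candidates: Seq[Point] = Seq(points(rng.nextInt(points.size)))
    for (_ <- 1 to rounds) {
      val phi = cost(points, candidates)
      val sampled = points.filter { p =>
        rng.nextDouble() < math.min(1.0, l * nearest(p, candidates) / phi)
      }
      candidates = candidates ++ sampled
    }
    candidates
  }
}
```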

all the software open sourced here has been rigorously tested and deployed against very large sets of transactional and social data, with very good results…

the details on ganitha are coming in the next couple of days in posts by koert and andy…

time to make the elephant eat humble ‘pi’ (i know you got that one)