Sunday, November 23, 2014

Big Data on Amazon - 1

There is so much talk about Big Data that I wrote another post on it after coming back from Oracle OpenWorld this year.

The best way to learn a technical skill is to use it. But how? In companies, especially large ones, it is hard to work with the new technologies you want unless you happen to join a new project or a newly established team. Another way to learn by doing is to join an open source project outside your company work.

But like most IT professionals, you might not have enough spare time to join open source projects if you have a family to take care of after your daily work. So can we learn to use Big Data by testing on an existing platform with some examples?

I am sure you are thinking the same thing as I am. How about checking Amazon? It is still the main provider of PaaS and IaaS in the market.

For testing Big Data, Amazon offers Amazon EMR, which simplifies running Hadoop and related big-data applications on AWS and can be used to manage and analyze vast amounts of data.
To learn more about Amazon EMR, you can visit the FAQs. Beginners like me can focus on the following questions:

Q: Where can I find code samples?
Check out the sample code in these Articles and Tutorials.
Q: How do I develop a data processing application?
You can develop a data processing job on your desktop, for example, using Eclipse or NetBeans plug-ins such as IBM MapReduce Tools for Eclipse (http://www.alphaworks.ibm.com/tech/mapreducetools). These tools make it easy to develop and debug MapReduce jobs and test them locally on your machine. Additionally, you can develop your cluster directly on Amazon EMR using one or more instances.
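
To get a feel for what "developing and testing a MapReduce job locally" means before touching any cluster, here is a minimal sketch of my own in plain Python. It is not taken from the Amazon FAQs and does not use Hadoop or EMR at all; the function names (mapper, reducer, run_local) are hypothetical and only imitate the map, shuffle, and reduce steps of a word count on your own machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: add up all the counts collected for one word."""
    return word, sum(counts)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Imitate the shuffle phase in memory: group values by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Sorting the keys mimics the sorted order in which reducers receive them.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    sample = ["big data on amazon", "big data on emr"]
    for word, count in run_local(sample).items():
        print(word, count)
```

Once the mapper and reducer logic behaves as expected on a small sample like this, the same idea can be carried over to a real Hadoop job and run on a cluster.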