Wednesday, February 20, 2013

If you want to be a data scientist, you should know these 67 questions.

Recently, I read a blog post by Vincent Granville on Data Science Central. If you would like to apply for a data scientist job, or are preparing for this kind of job, try answering these 66 open-ended questions.

66 job interview questions for data scientists

Here are some questions from it:

What is the curse of big data? (answer)

Examples where MapReduce does not work? Examples where it works very well? What are the security issues involved with the cloud? What do you think of EMC's solution offering a hybrid approach - both internal and external cloud - to mitigate the risks and offer other advantages (which ones)? (answer)

I would like to add one more question (and no, it is not "Why not 67 questions?"):

Why do we need more data scientists? (answer)

Tuesday, February 12, 2013

Big Data in the Financial Services Industry

In order to get senior management's buy-in on Big Data, you will have to show them some use cases.

Let's start with the financial services industry, including banks and others.

From Oracle:

This Oracle White Paper briefly talks about Oracle Big Data technology and several use cases in the financial services industry.

Financial Services Data Management: Big Data Technology in Financial Services

From IBM:

IBM solutions for big data provide banks with an integrated and scalable set of cost-effective, high-performance tools that support the rapid ingestion of important customer data from a variety of sources and the fast analysis of large volumes of data at transactional, product or enterprise levels.

See this page on the IBM website: Deriving Business Insight from Big Data in Banking

And White Paper: IBM Information Agenda for Banking - Financial Crisis and Integrated Risk Management for Financial Institutions

From IDC:

The document is not free. You will have to pay US$1,000 to get it.

Big Data - Use Cases in Financial Services
Price: US $1,000

Author: Michael Versace

Insights Presentation
July, 2012  -  Doc # FIN236035
Number of Pages: 18
Abstract
Data is the currency of competition in financial service. The effective use of data and information is the foundation upon which firms compete. Services are wrapped around data to differentiate products and services. For example, knowing which customers represent the best credit revenue and profitability opportunity to a bank is a question that only data and analysis can answer.
As an extension, IDC Financial Insights believes that Big Data and business analytics can quickly deliver competitive advantage for those firms that effectively harness and leverage the trend. In this IDC Financial Insights presentation, we describe some of the drivers behind big data with examples for how big data technologies are being applied against some demanding business imperatives in the financial markets today. The presentation concludes with Essential Questions and Guidance to practitioners.


Sunday, February 10, 2013

The History of Big Data - 2

In my earlier post "The History of Big Data", I wrote that the name "big data" originated as a tag for a class of technology with roots in high-performance computing.

After reading the New York Times article "The Origins of ‘Big Data’: An Etymological Detective Story", I found that the origins of Big Data might not be different.

The article mentions Francis X. Diebold, an economist at the University of Pennsylvania, and his most recent paper. The paper concludes: “The term Big Data, which spans computer science and statistics/econometrics, probably originated in the lunch-table conversations at Silicon Graphics in the mid-1990s, in which John Mashey figured prominently.”

Wednesday, February 6, 2013

Big Data skills

Lots of people, especially IT professionals, are wondering what kinds of skills are required to get into the field of Big Data.

Here are some skills you should have or plan to have:



It will take time to learn and explore, but all of the above skills will help you build a Big Data career path, such as becoming a data scientist.

Here are some articles for your reference:

"Big data analytics is sometimes sold as a boon for IT workers, with analyst house Gartner predicting that within three years there will be 4.4 million staff working on big data projects."

"The U.S. faces a substantial shortage of workers with data science skills, according to a much-talked about report published last year by consulting firm McKinsey and Company. The report predicted that by 2018 the country will lack 1.5 million analysts who can make strategic decisions using big data and between 140,000 to 190,000 workers with the proper data-processing technology skills."

"Regardless if they are called Data Scientists or Data Analysts, Data geeks need to be more in control of their destiny."

Saturday, February 2, 2013

Big Data University

You may be wondering where you should start your Big Data learning journey. After a bit of research, I found that Big Data University is a good place to start.

Big Data University is an online educational site run by new and experienced Hadoop, Big Data and DB2 users who want to learn, contribute course materials, or look for job opportunities. It is hosted in the cloud and uses the Moodle 2 course management system, enabled to run on DB2. It is in the beta stage.

The site includes free and fee-based courses delivered by experienced professionals and teachers.
When I saw DB2 mentioned but no other databases (including open source ones), I guessed that this site is either sponsored by IBM or run by fans of IBM products. Either way, there is no harm in using it to learn Big Data.


In order to study at this "university", you need to register, either with your Google, Facebook, Yahoo or ChannelDB2 account or by creating a Big Data University account. Most IT or data professionals should already have at least one account from Google, Facebook or Yahoo. If you don't use DB2, you might not even know about ChannelDB2.


According to the site statistics, there are 63,339 registered students (as of today, Feb. 2, 2013; I am not sure whether the site publishes the latest number). To put this number in perspective against real universities, it is about three times the size of Harvard (about 20,000 students) or Stanford (about 18,000).


So, you want to join?