Sunday, January 27, 2013

Big Data Use Case #2 - Netflix

Just last week, Netflix stock soared after its fourth-quarter results topped forecasts. On Jan. 24, shares of Netflix rose $43.60 to $146.86 on Nasdaq, their highest level since September 2011.

Also in its earnings report, the company predicted it will add as many as 2.1 million U.S. streaming members in the first quarter, more than it gained during the first three months of last year.

How will Netflix attract new subscribers? Although the company didn't say, analyzing Big Data should be one of its techniques. People who have been following Netflix will know about the Netflix Prize contest. The following provides a bit more detail:

Netflix is all about connecting people to the movies they love. To help customers find those movies, we’ve developed our world-class movie recommendation system: Cinematch℠. Its job is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies. We use those predictions to make personal movie recommendations based on each customer’s unique tastes. And while Cinematch is doing pretty well, it can always be made better.

Now there are a lot of interesting alternative approaches to how Cinematch works that we haven’t tried. Some are described in the literature, some aren’t. We’re curious whether any of these can beat Cinematch by making better predictions. Because, frankly, if there is a much better approach it could make a big difference to our customers and our business.

So, we thought we’d make a contest out of finding the answer. It’s “easy” really. We provide you with a lot of anonymous rating data, and a prediction accuracy bar that is 10% better than what Cinematch can do on the same training data set. (Accuracy is a measurement of how closely predicted ratings of movies match subsequent actual ratings.) If you develop a system that we judge most beats that bar on the qualifying test set we provide, you get serious money and the bragging rights. But (and you knew there would be a catch, right?) only if you share your method with us and describe to the world how you did it and why it works.

Serious money demands a serious bar. We suspect the 10% improvement is pretty tough, but we also think there is a good chance it can be achieved. It may take months; it might take years. So to keep things interesting, in addition to the Grand Prize, we’re also offering a $50,000 Progress Prize each year the contest runs. It goes to the team whose system we judge shows the most improvement over the previous year’s best accuracy bar on the same qualifying test set. No improvement, no prize. And like the Grand Prize, to win you’ll need to share your method with us and describe it for the world.

According to the company blog, Netflix announced team BellKor’s Pragmatic Chaos as the $1M Grand Prize winner of the Netflix Prize contest for their verified submission on July 26, 2009 at 18:18:28 UTC, which achieved the winning RMSE of 0.8567 on the test subset. This represents a 10.06% improvement over Cinematch’s score on the test subset at the start of the contest.
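To make the numbers above concrete, here is a minimal Python sketch of how RMSE (root mean squared error) and a percentage improvement can be computed. The function names and the toy ratings are my own illustration, not Netflix's code or data. The only figures taken from the announcement are the winning RMSE (0.8567) and the 10.06% improvement, and the baseline score is simply back-calculated from those two.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def improvement_over(baseline_rmse, new_rmse):
    """Fractional improvement of new_rmse over baseline_rmse (0.10 means 10%)."""
    return (baseline_rmse - new_rmse) / baseline_rmse


def implied_baseline(new_rmse, improvement):
    """Baseline RMSE implied by a new RMSE and a fractional improvement."""
    return new_rmse / (1.0 - improvement)


# Toy ratings on a 1-5 scale, made up purely for illustration.
toy_predicted = [3.5, 4.0, 2.0, 5.0]
toy_actual = [4, 4, 1, 5]
print("toy RMSE:", rmse(toy_predicted, toy_actual))

# Figures quoted in the announcement: winning RMSE and 10.06% improvement.
winning_rmse = 0.8567
baseline = implied_baseline(winning_rmse, 0.1006)
print("implied Cinematch RMSE:", round(baseline, 4))
print("improvement:", improvement_over(baseline, winning_rmse))
```

A lower RMSE means the predicted ratings sit closer to the actual ones. Working backwards from the two quoted figures puts Cinematch's starting score at about 0.95.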

Working out how much Cinematch has contributed to Netflix's financial results would take a separate project. One thing is for sure: the company should collect more data about its subscribers, not only from its own business but also from social sources. The more data it gets, the better its recommendations should be, and the larger its revenue should be.

Friday, January 25, 2013

Big Data Use Case #1 - NBA

Have you ever heard of a company named Ayasdi? I didn't know the name until I recently read Sarah Reedy's blog post Ideas Watch: Ayasdi Gives Big-Data a Name.

In her post, she reported that Ayasdi had just received $10.25 million in Series A funding. For what? Ayasdi's cloud-based Insight Discovery Platform uses distributed computing, machine learning, and user-experience technologies to take all the guesswork out of massive data sets. On the company's own website, it says "Solving Today’s Biggest Problems Requires an Entirely New Approach to Data" and "A New Way to Discover Insights Leading to Breakthrough Outcomes".

I was amazed by the following picture, named "Big-Data Basketball". If I hadn't read the note under the picture, I would have thought it showed galaxies newly discovered by NASA or some new genetic maps found by scientists. It is actually a topological similarity network of 452 NBA players during the 2010-2011 season. Ayasdi used its software to discover patterns in those players' data and broke the players down into 13 classifications beyond the 5 normal positions on the court (point guard, shooting guard, small forward, power forward and center).
 

Then what? The results of the analysis could change how coaches and general managers think about the roles their players fill and help teams win more games. The analysis could also help teams find good players and potentially good players. In other words, the software makes Big Data create value (money). You can get more detail from the WIRED magazine article "Analytics Reveal 13 New Basketball Positions".

This use case also shows that this valuable analysis of big data was done not by a large company like IBM, Oracle or Microsoft, but by a startup.

Big Data provides huge opportunities for startup companies.

Thursday, January 24, 2013

Oracle and Big Data

When people talk about Oracle, they first think of its RDBMS (relational database). Since Oracle acquired companies such as BEA and Sun, people also know it for Java and WebLogic. So where is Oracle's Big Data product?

On its own site, Oracle describes products that help customers acquire and organize big data and analyze it alongside their existing data to find new insights and make better business decisions. Oracle's Big Data platform provides an end-to-end solution: all the components customers need to get real results from their big data initiatives.

Acquire Big Data

Making the most of big data means quickly analyzing a high volume of data generated in many different formats. Oracle offers a range of products for acquiring all your data including:
Oracle NoSQL Database
Oracle Database

Organize Big Data

A big data platform needs to process massive quantities of data in batch and in parallel—filtering, transforming and sorting it before loading it into an enterprise data warehouse. Oracle offers a choice of products for organizing big data including:
Oracle Big Data Appliance
Oracle Data Integrator
Oracle Big Data Connectors

Analyze Big Data

Analyzing big data within the context of all your other enterprise data can reveal new insights that can have a significant impact on your bottom line. Oracle offers a portfolio of tools for statistical and advanced analysis that complement Oracle Exadata, including:
Oracle Advanced Analytics
Oracle Exadata Database Machine
Oracle Data Warehousing
Oracle Exalytics In-Memory Machine

You can watch Oracle's Big Data videos on YouTube.



Also, you can read this Oracle White Paper - Oracle Information Architecture: An Architect's Guide to Big Data.

Tuesday, January 22, 2013

IBM and Big Data

Most large companies, such as IBM, Microsoft and Oracle, are promoting Big Data ideas and their related software.

IBM maintains a good site that tells you where to start with Big Data.

IBM Big Data - Where do I start?

One of the sections is as follows:

If you are new to BigData concepts you can start with this
1. http://www.ibm.com/bigdata - Quick introduction to Big Data. Reading time - 5 minutes
2. http://www-01.ibm.com/software/data/bigdata/enterprise.html - Gives you an overview of the two products in IBM Big Data - InfoSphere Streams and InfoSphere BigInsights - Reading time 10 minutes
3. http://bigdatauniversity.com - This contains an excellent Certification Course for Hadoop Fundamentals - and has good coverage of the open source foundational components such as Hadoop, MapReduce concepts, Pig, Hive, Flume, JAQL etc. There are videos, a hands-on downloadable VM, lab exercises, reading material etc. The bible of Hadoop and MapReduce reference pdf book is available for download. If your expertise so far has been a one-line summary of each of the technologies mentioned above, you will need to spend about 3 to 4 days to cover this course, reading time + exercises. There's a test that you can take at the end of the course and yes, you get a certificate if you clear it. Reading up and clearing certification time 4 to 5 days

Also, another site from IBM:

Big Data - Find developer and DBA resources, tutorials, and articles to help you grow your knowledge on big data technology and IBM's integrated big data platform.

Monday, January 21, 2013

The History of Big Data

Suddenly, everybody is talking about Big Data. When did Big Data start? Who invented the name "Big Data"?

I like Gil Press's blog post "A Very Short History of Big Data", which summarizes big data's brief history starting in 1944, when Fremont Rider, Wesleyan University Librarian, published The Scholar and the Future of the Research Library. In December 2008, Randal E. Bryant (CMU), Randy H. Katz (Berkeley), and Edward D. Lazowska (Univ of Washington) published “Big-Data Computing: Creating Revolutionary Breakthroughs in Commerce, Science and Society.” They wrote: “Big-data computing is perhaps the biggest innovation in computing in the last decade. We have only begun to see its potential to collect, organize, and process data in all walks of life. A modest investment by the federal government could greatly accelerate its development and deployment.”

According to another article, "Forrester: Big data – start small, but scale quickly", the name "big data" originated as a tag for a class of technology with roots in high-performance computing, as pioneered by Google in the early 2000s. That means the name "Big Data" has been in use for only about 10 years.

Thursday, January 17, 2013

McKinsey and Big Data

Big Data is hot. Big Data is as hot as Mobile and Cloud. If we add Mobile, Cloud and Big Data together (M+C+BD), the result (MCBD) will be the hottest thing in the world right now.

This blog will focus only on Big Data. So I call it "Big Data Big".

When I started paying attention to Big Data a while ago (and even recently, when I attended a seminar about Big Data), McKinsey was always mentioned. If you go to Google and type in McKinsey and Big Data, the first result (excluding paid results) will be the following:


Big data: The next frontier for innovation, competition - McKinsey ...

www.mckinsey.com/.../big_data_the_next_frontier_for_innov...
MGI studied big data in five domains—healthcare in the United States, the public ... For example, a retailer using big data to the full could increase its operating ...

Big data will become a key basis of competition, underpinning new ...


After you click the link, you will see the article (Big data: The next frontier for innovation, competition, and productivity, dated May 2011), which briefly introduces Big Data and the McKinsey Global Institute (MGI)'s seven key insights about it. From there you can download the full report.