Definition of Big data
Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios, the volume of data is too big, it moves too fast, or it exceeds current processing capacity. Despite these problems, big data has the potential to help companies improve operations and make faster, more intelligent decisions.
Or
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can mean greater operational efficiency, cost reduction and reduced risk.
What is Big data, really?
There's nothing new about the notion of big data, which has been around since at least 2001. In a nutshell, Big Data is your data. It's the information owned by your company, obtained and processed through new techniques to produce value in the best way possible.
Ask any Big Data expert to define the subject and they'll quite likely start talking about "the three V's" of volume, velocity and variety, concepts originally coined by Doug Laney in 2001 to describe the challenge of data management. In short, it's a lot of data produced very quickly in many different forms. This could involve customer transactional histories, production databases, web traffic logs, online videos, social media interactions, and so forth.
An August 2013 blog post by Mark van Rijmenam titled "Why The 3V's Are Not Sufficient To Describe Big Data" added "veracity, variability, visualization, and value" to the definition, broadening the realm even further. Van Rijmenam stated, "90% of all data ever created was created in the past two years. From now on, the amount of data in the world will double every two years."
Beyond Volume, Variety and Velocity is the issue of Big Data Veracity
We have all heard of the 3Vs of big data: Volume, Variety and Velocity. Yet Inderpal Bhandar, Chief Data Officer at Express Scripts, noted in his presentation at the Big Data Innovation Summit in Boston that there are additional Vs that IT, business and data scientists need to be concerned with, most notably big data Veracity. Other big data V’s getting attention at the summit were validity and volatility. Here is an overview of the 6 V’s of big data.
Volume
Big data implies enormous volumes of data. It used to be that employees created data. Now that data is generated by machines, networks and human interaction on systems like social media, the volume of data to be analyzed is massive. Yet Inderpal states that volume is not as much of a problem as other V’s like veracity.
Variety
Variety refers to the many sources and types of data, both structured and unstructured. We used to store data from sources like spreadsheets and databases. Now data comes in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. This variety of unstructured data creates problems for storing, mining and analyzing data. Jeff Veis, VP Solutions at HP Autonomy, presented how HP is helping organizations deal with big challenges, including data variety.
Velocity
Big Data Velocity deals with the pace at which data flows in from sources like business processes, machines, networks and human interaction with things like social media sites, mobile devices, etc. The flow of data is massive and continuous. If you are able to handle the velocity, this real-time data can help researchers and businesses make valuable decisions that provide strategic competitive advantages and ROI. Inderpal suggests that sampling data can help deal with issues like volume and velocity.
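The article does not say which sampling method Inderpal has in mind. One standard option for a continuous stream of unknown total size is reservoir sampling (Algorithm R). It keeps a fixed-size, uniformly random sample in a single pass. The sketch below assumes that technique, and the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a stream,
    reading it once and holding at most k items in memory (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical usage: sample 5 events from a simulated stream of 100,000.
    events = (f"event-{n}" for n in range(100_000))
    print(reservoir_sample(events, 5, random.Random(42)))
```

Memory stays at k items no matter how long or how fast the stream runs, which is why sampling helps with both volume and velocity. The trade-off is that rare events can be missed.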
Veracity
Big Data Veracity refers to the biases, noise and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analyzed? Inderpal feels veracity is the biggest challenge in data analysis when compared to things like volume and velocity. In scoping out your big data strategy, you need your team and partners to work to keep your data clean, and you need processes that keep ‘dirty data’ from accumulating in your systems.
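One way to keep ‘dirty data’ out is a validation gate at ingest. The gate sends bad records to a quarantine list instead of the main store. The sketch below assumes hypothetical transaction records with `customer_id` and `amount` fields. Real rules depend on your data and the problem being analyzed.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def find_problems(record: Record) -> List[str]:
    """Return the data-quality problems found in one record (empty if clean)."""
    problems: List[str] = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    amount = record.get("amount")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        problems.append("amount is not numeric")
    elif amount < 0:
        problems.append("negative amount")
    return problems


def partition(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into clean ones and quarantined ones (with reasons)."""
    clean: List[Record] = []
    quarantined: List[Tuple[Record, List[str]]] = []
    for record in records:
        problems = find_problems(record)
        if problems:
            quarantined.append((record, problems))
        else:
            clean.append(record)
    return clean, quarantined


if __name__ == "__main__":
    batch = [
        {"customer_id": "c1", "amount": 19.99},
        {"customer_id": "", "amount": 5},
        {"customer_id": "c3", "amount": -2},
        {"customer_id": "c4", "amount": "n/a"},
    ]
    clean, quarantined = partition(batch)
    print(len(clean), "clean;", len(quarantined), "quarantined")
```

Storing the reasons with each rejected record makes it easier to trace where bad data comes from and to fix it upstream.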
Validity
Like big data veracity, validity is about whether the data is correct and accurate for the intended use. Clearly, valid data is key to making the right decisions. Phil Francisco, VP of Product Management at IBM, spoke about IBM’s big data strategy and the tools it offers to help with data veracity and validity.
Volatility
Big data volatility refers to how long data is valid and how long it should be stored. In this world of real-time data, you need to determine at what point data is no longer relevant to the current analysis.
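In practice, the decision about when data stops being relevant often becomes a retention window (a time-to-live). The sketch below assumes each record carries a hypothetical timezone-aware `created_at` timestamp. It also assumes the business chooses the window separately for each data set. The code only splits records into live and expired groups and leaves archiving or deletion to the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_retention(records: List[Record], now: datetime,
                       ttl: timedelta) -> Tuple[List[Record], List[Record]]:
    """Return (live, expired): records created at or after now - ttl are live."""
    cutoff = now - ttl
    live = [r for r in records if r["created_at"] >= cutoff]
    expired = [r for r in records if r["created_at"] < cutoff]
    return live, expired


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    readings = [
        {"sensor": "s1", "created_at": now - timedelta(hours=2)},
        {"sensor": "s2", "created_at": now - timedelta(days=45)},
    ]
    # Hypothetical policy: sensor readings stay relevant for 30 days.
    live, expired = split_by_retention(readings, now, timedelta(days=30))
    print([r["sensor"] for r in live], [r["sensor"] for r in expired])
```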
Big data
clearly deals with issues beyond volume, variety and velocity to other concerns
like veracity, validity and volatility.
Big Data applications: Real-world strategies for managing Big Data
Even when organizations feel compelled to become more data-driven, many don’t know how to transform themselves out of the use-your-gut mentality and into a data-first one.
The easiest way? Take shortcuts by refusing to reinvent the
wheel and following the trails blazed by early adopters. Here are 5 cool Big
Data apps, along with the use cases (and end users) that are helping to change
the meaning of “business as usual.”
1. Big Data application: Roambi
How this Big Data app works: One thing often overlooked in the rush towards data-driven decision making is mobility. Increasingly mobile workforces need more ways to manipulate data from a smartphone than just basic business tools, which are so often stripped down for mobile. Mobile workers need the ability to access and analyze the same business data they use in the office in order to make smart, on-the-go decisions.
Roambi contends that it was founded to solve this very
problem. Roambi’s goal is to reinvent the mobile business app to improve the
productivity and decision-making of on-the-go employees. Roambi re-designs the
way people interact with, share, and present data from a completely mobile
perspective.
Use case of note: The Phoenix Suns. In addition to their goal of consistently
performing at an elite level on the court, the Phoenix Suns are making big
strides off the court through the use of analytics, which they use to help
drive strategy for both business and basketball decisions.
While some in the NBA consider the Suns a small business in terms of the infrastructure and processes in place, in the past three years the organization has invested significant resources not only in organizing the data it accumulates, but also in guaranteeing the accuracy of that data and ensuring that it is used by all decision makers across the organization.
Whether it’s an off-site meeting or a long road trip, as is the nature of any professional sports team, a majority of their work is done away from the office. The organization’s ownership was looking for a way to make their critical business data available wherever their decision makers were located.
As the Suns began taking steps to become more mobile, there
was a healthy amount of skepticism that a mobile solution could be found that
was both valuable and, more importantly, easy enough for end users (most of
whom don’t have a very technical background) in the organization to adopt.
That changed when the Suns adopted Roambi. They started by using Roambi analytics in their front office, organizing and visualizing key player scouting information all in one place and making this information available in real time.
After the success of the initial rollout, the Suns decided to expand their use of Roambi to their back office. On the business side, the Suns optimized their operations by providing KPIs across sales and marketing, reporting on everything from ticket sales to game summary reports to in-stadium promotions to customer buying behavior to inventory, all via mobile devices. As a result, executives were all working off the same set of numbers and were able to make critical business decisions at a moment’s notice.
2. Big Data application: Esri ArcGIS
How this Big Data app works: Esri ArcGIS, as the name
implies, is a Geographic Information System (GIS) that makes it easy to create
data-driven maps and visualizations.
Use case of note (in this case it is more of a partnership): In mid-July at the Esri User Conference, the company radically updated its Urban Observatory. Developed in partnership with Richard Saul Wurman and Radical Media and originally launched last year, the Urban Observatory helps cities use the common language of maps to understand patterns in diverse datasets.
“Our world has always had Big Data surrounding us that, until
recently, has remained untapped for any real understanding,” said Wurman. “We
are several iterations into developing a common language for mapping
urbanization. It will allow cities to understand not only the major threads of
their performance, land use, and contents comparatively but [also,] eventually,
the nuance of change and action.”
I attended the Esri UC last week and spent plenty of time
playing with (and before that standing in line to get access to) the Urban
Observatory exhibit, an interactive exhibit that makes it easy to compare and
contrast data from cities worldwide, all on a touch screen.
At least half of the world's population is currently living
in urbanized areas. The Global Health Observatory (GHO) projects that by 2050,
7 out of 10 people will live in a city. This year, nearly 60 cities are part of
the Urban Observatory.
Participation in Urban Observatory is open to every city
around the globe. Any city that has data its officials would like to share is
eligible to be included. In February 2015, Urban Observatory will go on
permanent display in the Smithsonian Institution.
3. Big Data Application: Cloudera Enterprise
How this Big Data app works: Not long after nailing down one
of the (if not the) biggest funding rounds in history, Cloudera is now making
inroads into the Internet of Things market with its app, locking down a deal
with a major home automation company in mid-July. Oh, and I almost forgot: that
close partnership with/funding from Intel is something you just can’t ignore
either.
Use case of note: Cloudera has a ton of customers, but Wells
Fargo and home automation company Vivent are two to pay attention to. Wells Fargo has used Cloudera Enterprise to
build an enterprise data hub.
Vivent is the use case that really caught my attention,
though, since it ties two of the hottest, most promising tech trends of the
moment together. Vivent is using Cloudera Enterprise to glean insights from the
data generated from intelligent devices and systems embedded with sensors in
and around homes. “[With Cloudera, we can now] look across many data streams
simultaneously for behaviors, geo-location, and actionable events in order to
better understand and enrich our customers’ lives. This platform has
differentiated our business and given us a tremendous competitive advantage,”
said Brandon Bunker, senior director, Customer Analytics and Insights.
Vivent says that it has acquired more than 800,000 customers
using a variety of third-party smart-enabled devices – roughly 20-30 sensors
per home. Many of those devices come in the form of thermostats, smart
appliances, video cameras, window and door sensors, and smoke and carbon
monoxide detectors. Without a central internal repository to gather and analyze
the data generated from each sensor, Vivent was previously limited in its
ability to innovate and to add higher intelligence to its security offerings.
For example, knowing when a home is occupied or vacant is
important to security – but when tied into the HVAC system (which tends to be
the largest contributor to a home’s energy bill and carbon emissions), you can
add a layer of energy cost savings by cooling or heating a home based on
occupancy. Similarly, by adding geo-location into the equation, you can begin
to adjust temperature changes to a home based on the proximity to an owner’s
arrival, for instance, when the owner has a connected vehicle. Studies have
shown that consumers could see 20 to 30 percent energy savings by turning off
HVAC systems when residents are away or sleeping.
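The occupancy and geo-location logic described above can be expressed as a simple rule. This is only an illustration and not Vivent's implementation. The `HomeState` fields, the heating setpoints in degrees Celsius, and the 5 km pre-conditioning radius are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HomeState:
    occupied: bool                      # e.g. inferred from motion and door sensors
    residents_asleep: bool
    owner_distance_km: Optional[float]  # None when no connected-vehicle location is available


def heating_setpoint_c(state: HomeState, comfort_c: float = 21.0,
                       setback_c: float = 16.0,
                       precondition_km: float = 5.0) -> float:
    """Pick a heating setpoint from occupancy and the owner's distance from home."""
    if state.occupied:
        # Residents at home: save energy only while they sleep.
        return setback_c if state.residents_asleep else comfort_c
    if state.owner_distance_km is not None and state.owner_distance_km <= precondition_km:
        # Home is empty but the owner is close: start heating before arrival.
        return comfort_c
    # Home is empty and nobody is nearby: set back to save energy.
    return setback_c


if __name__ == "__main__":
    print(heating_setpoint_c(HomeState(occupied=False, residents_asleep=False,
                                       owner_distance_km=3.2)))
```

A real system would probably also weigh travel direction and how long the house takes to warm up. A distance threshold is just the simplest stand-in.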
4. Big Data application: Zaloni Bedrock
How this Big Data app works: Many businesses know they want
to implement a Hadoop data lake, but don’t know how to do so in a
cost-effective, scalable way. Moreover, simply putting data into Hadoop does
not make it ready for analytics. To use common analytics toolsets, you must
know where data is, how it’s structured (or not) and where it came from.
You may also need to prepare it by filtering or joining datasets together, or by masking out parts that are sensitive in nature. This typically takes a significant amount of time and effort and can be highly error-prone. If you’ve done a poor job ingesting, organizing, and preparing data for analytics, the results of your analytics will be equally poor. Flawed analytics can lead to flawed business decisions, and making better business decisions was the whole point of the data lake in the first place.
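The preparation steps mentioned above (filtering, joining and masking) can be shown with a small in-memory sketch. It does not show how Bedrock or Hadoop perform these steps. The `orders` and `customers` data sets, their field names, and the keep-last-four masking rule are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def mask(value: str, keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` characters with '*'."""
    if keep_last <= 0:
        return "*" * len(value)
    hidden = max(len(value) - keep_last, 0)
    return "*" * hidden + value[-keep_last:]


def prepare(orders: List[Row], customers: List[Row]) -> List[Row]:
    """Filter out cancelled orders, inner-join to customers, and mask card numbers."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    prepared: List[Row] = []
    for order in orders:
        if order["status"] == "cancelled":
            continue  # filter step
        customer = customers_by_id.get(order["customer_id"])
        if customer is None:
            continue  # inner join: drop orders with no matching customer
        prepared.append({
            "order_id": order["order_id"],
            "amount": order["amount"],
            "customer_name": customer["name"],
            "card_number": mask(customer["card_number"]),  # masking step
        })
    return prepared
```

Indexing customers by ID first turns the join into a single dictionary lookup per order. That choice matters more as the data sets grow.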
With Zaloni Bedrock, the process is automated. According to
Zaloni, you set it up once and you’re done. It doesn’t matter how much data you
are adding to the lake, since there is no technical limit.
Zaloni argues that without a product like Bedrock to help you
along, 60 percent or more of the time and effort you spend to build an
analytics system using a Hadoop data lake will be spent on data management and
data preparation alone.
Use case of note: UnitedHealth Group’s Optum division, an IT and tech-enabled health services business, uses Bedrock as part of its data platform to manage services like data ingest and workflow execution. Bedrock enables Optum to monitor multiple data sources and capture and store schema/operational metadata, and it provides features like data catalog search for end users.
5. Big Data application: Tamr
How this Big Data app works: Tamr is a data-connection and
machine-learning platform designed to make enterprise data as easy to find,
explore, and use as Google. According to Tamr, due to the cost and complexity
of connecting and preparing the vast, untapped reserves of data sources
available for analysis, most organizations use less than 10 percent of the
relevant data available to them.
It’s just too manual, too inefficient and too expensive to
connect and ready the massive variety of internal and external data for
analytics and other applications critical for business growth. Tamr argues that
if the industry is going to be successful at helping customers manage the
growth and variety of data that lies ahead – from internal sources, external
public and private sources, Internet of Things feeds, etc. – a complete
overhaul of traditional methods of information integration and quality
management will be required.
Use case of note: Multinational media and information company
Thomson Reuters faced challenges maintaining critical, accurate data. It had
outgrown its manual curation processes and looked to Tamr to provide a better
solution for continuously connecting and enriching its core enterprise information
assets (data on millions of organizations with more than 5.4 million records
pulled from internal and external data sources).
Using Tamr, one project that Thomson Reuters estimated would
take six months was completed in only two weeks, requiring just forty hours of
manual review time – a 12x improvement over the previous process. The number of
records requiring manual review shrank from 30 percent to 5 percent, and the
number of identified matches across data sources increased by 80 percent – all
while achieving Thomson Reuters’ 95-percent precision benchmark.
Tamr says that the disambiguation rate (or the rate of
resolving conflicts) rose from 70 percent to 95 percent. Furthermore, the
knowledge Tamr gleaned from its machine learning activities means that future
data integration will take even less time per source.
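The sketch below makes "matches across data sources" and "precision" concrete. Tamr uses machine learning. This example swaps in a simple rule-based stand-in: it normalizes organization names and pairs records whose normalized names are equal. It then measures precision as the share of proposed matches that are correct. The function names, suffix list and sample records are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co", "plc"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and remove common legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def match(source_a: Dict[str, str], source_b: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Pair record ids from two sources whose normalized names are equal."""
    index: Dict[str, List[str]] = {}
    for id_b, name in source_b.items():
        index.setdefault(normalize(name), []).append(id_b)
    pairs: Set[Tuple[str, str]] = set()
    for id_a, name in source_a.items():
        for id_b in index.get(normalize(name), []):
            pairs.add((id_a, id_b))
    return pairs


def precision(predicted: Set[Tuple[str, str]], truth: Set[Tuple[str, str]]) -> float:
    """Fraction of predicted matches that are correct (0.0 if nothing was predicted)."""
    if not predicted:
        return 0.0
    return len(predicted & truth) / len(predicted)


if __name__ == "__main__":
    a = {"a1": "Acme Corp.", "a2": "Globex, Inc.", "a3": "Initech"}
    # Assume b4 is a different company that happens to share a name with a3.
    b = {"b1": "ACME Corporation", "b2": "Globex", "b3": "Initech Ltd", "b4": "Initech Co"}
    truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
    predicted = match(a, b)
    print(sorted(predicted), precision(predicted, truth))
```

On this toy data the rule over-matches, pairing a3 with both b3 and b4. Catching that kind of error is where learned models and a manual review step come in.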