Continuing our research on “Top 25 Digital Startup ideas and technologies for 2017,” this section evaluates and highlights aspects of “Big Data – Data Analytics and Analysis.”
Big Data Integration Tools
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The term “big data” often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. – Wikipedia
Big data refers to a massive volume of both structured and unstructured data, so large that it is difficult to process using traditional database and software techniques. Some of the key attributes of big data include:
- Volume – Data at rest. Terabytes to exabytes of existing data to process.
- Variety – Data in many forms, including structured, unstructured, text, and multimedia.
- Velocity – Data in motion, including streaming data, with milliseconds to seconds to respond.
- Veracity – Data in doubt. Uncertainty due to data inconsistency, incompleteness, ambiguities, latency, deception, approximations.
Data integration, in the context of big data, is all about the V’s:
- Size: large volume of sources, changing at high velocity.
- Complexity: huge variety of sources, of questionable veracity.
Big data integration, which means ensuring the consistent transfer of large volumes of data across systems and platforms, is a key operational challenge that Information Technology (IT) leaders need to address. Most data integration infrastructure combines advanced hybrid data integration capabilities and centralized governance with flexible self-service business access for analytics.
Why is Integration challenging?
Beyond the typical data integration challenges highlighted above, volume and velocity are a specific focus for data integration specialists.
Volume: Number of structured sources. Big-data sources can involve millions of high-quality relational tables, high-quality deep web sources, and useful relational tables derived from web lists, all available on the web.
Challenges can include:
- Difficulty in schema alignment and data mapping (source to target)
- Expensive to warehouse all the integrated data
- Infeasible to support virtual integration
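To make the schema alignment and source-to-target mapping challenge concrete, here is a minimal Python sketch. It assumes two hypothetical sources, `crm` and `webshop`, whose field names are invented for illustration; neither comes from a real system or from any tool discussed in this article. Each source gets a hand-written mapping onto a shared target schema.

```python
from typing import Any

# Hypothetical source-to-target field mappings, one per source system.
# All source and field names here are invented for illustration.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "crm": {"cust_id": "customer_id", "fname": "first_name", "mail": "email"},
    "webshop": {"id": "customer_id", "given_name": "first_name", "email_addr": "email"},
}


def to_target(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename a source record's fields to the shared target schema.

    Source fields with no mapping are dropped; target fields that the
    source record does not supply are set to None.
    """
    mapping = FIELD_MAPS[source]
    target: dict[str, Any] = {name: None for name in mapping.values()}
    for src_field, value in record.items():
        if src_field in mapping:
            target[mapping[src_field]] = value
    return target


crm_row = {"cust_id": 7, "fname": "Ada", "mail": "ada@example.com", "extra": 1}
shop_row = {"id": 7}
aligned = [to_target("crm", crm_row), to_target("webshop", shop_row)]
```

Maintaining mappings like this by hand is manageable for a handful of sources. It stops being manageable when the number of sources runs into the thousands or millions, which is where the volume challenge comes from.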
Velocity: Rate of change in structured sources. Many sources provide rapidly changing data, e.g. stock prices. There are millions of high-quality deep web sources. By some estimates, there are 96,000 deep web sources, 450,000 databases, and 1.25 million query interfaces on the web.
Challenges can include:
- Difficulty in understanding the evolution of semantics
- It is extremely expensive to warehouse data history
- Infeasible to capture rapid data changes in a timely fashion
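One simple way to picture the change capture problem is to compare two snapshots of the same source. The Python sketch below is an illustration only: it assumes each snapshot fits in memory as a dictionary keyed by a primary key, and the ticker symbols and prices are made up. Commercial change data capture tools often read database logs instead of comparing snapshots.

```python
from typing import Any

# A snapshot maps a primary key to the row stored under that key.
Snapshot = dict[str, dict[str, Any]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> dict[str, list[str]]:
    """Return the keys inserted, updated, and deleted between two snapshots."""
    old_keys, new_keys = set(old), set(new)
    return {
        "inserted": sorted(new_keys - old_keys),
        "deleted": sorted(old_keys - new_keys),
        "updated": sorted(k for k in old_keys & new_keys if old[k] != new[k]),
    }


# Made-up stock quotes taken at two points in time.
earlier = {"AAA": {"price": 10.0}, "BBB": {"price": 5.0}}
later = {"AAA": {"price": 10.5}, "CCC": {"price": 1.0}}
changes = diff_snapshots(earlier, later)
```

Each comparison touches every row, so the cost grows with the size of the source and with how often snapshots are taken. That is one reason capturing rapid changes in a timely fashion is hard at scale.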
How should IT leaders approach Big Data Integration?
Many organizations are unaware of the benefits of automating data integration. Data integration can enhance and streamline business processes in several ways, including closing deals faster, increasing revenue, and creating deep-seated roots within accounts. IT leaders should consider evolving data trends and how their strategy will scale to handle expanding volumes of data from several sources. Integration can enhance business processes such as:
- Marketing Automation – Businesses and organizations need hubs that combine data from all their customer data sources so that they can make the best possible business decisions. Such hubs also help them manage customer relationships more effectively. For marketers, integration means better conversions, customer insights, and improved customer retention.
- Big Data Analytics – “Big data” is unstructured and structured data that comes from many different sources. Without integration it will be hard to manage the movement of all this data from one store to another. With the right integration technology you can perform analytics faster and, in turn, gain important business insights.
- Cloud Integration – Many companies use cloud storage services such as Amazon Web Services for their data. A company needs to synchronize its data and use integration to ensure that the data stays consistent throughout the process. Without integration, another data silo may be created, leading to inconsistencies in work.
- Business Intelligence – Data integration is important to agile business intelligence. The secret here is to implement smart integration. Effective decision-making happens when data integration happens in real time or close to it.
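The consistency concern raised under Cloud Integration can be illustrated with a small check that compares two copies of the same records. This is a sketch under assumptions, not how any particular cloud service or integration product works: both copies are assumed to be available as in-memory dictionaries keyed by record ID, and the names `primary` and `replica` are hypothetical.

```python
import hashlib
import json
from typing import Any

Store = dict[str, dict[str, Any]]  # record ID -> record


def record_digest(record: dict[str, Any]) -> str:
    """Hash a record in a canonical form so that field order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_mismatches(primary: Store, replica: Store) -> list[str]:
    """Return IDs that are missing from either copy or whose contents differ."""
    all_ids = set(primary) | set(replica)
    return sorted(
        record_id
        for record_id in all_ids
        if record_id not in primary
        or record_id not in replica
        or record_digest(primary[record_id]) != record_digest(replica[record_id])
    )


primary = {"1": {"a": 1, "b": 2}, "2": {"a": 3}}
replica = {"1": {"b": 2, "a": 1}, "3": {"a": 3}}
out_of_sync = find_mismatches(primary, replica)
```

Comparing digests rather than full records means only short hashes need to be exchanged between systems. Any ID the check reports points to a copy that has started to drift into its own silo.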
A few popular tools for big data integration include:
- Adeptia – Adeptia Integration Suite (AIS) is an enterprise-class solution that simplifies all aspects of cloud and on-premise integration, including B2B Integration, Application Integration (ESB), Business Process Management, and Data Integration (ETL). It eliminates the need for multiple vendors and applications, maximizing your investment and ensuring your organization is prepared for the future. With Adeptia’s unified approach, you pay for only the functionality you need, while being able to scale as your needs evolve.
- Attunity – Attunity is a leading provider of information availability software solutions that enable access, sharing and distribution of data, including Big Data, across heterogeneous enterprise platforms, organizations, and the cloud. Attunity data integration solutions include data replication, change data capture (CDC), data connectivity, enterprise file replication (EFR) and managed-file-transfer (MFT).
- Denodo – Denodo is a market leader in Data Virtualization. Founded in 1999 and based in Palo Alto, California, Denodo offers high-performance Data Integration and abstraction across a wide range of Big Data, enterprise, cloud, unstructured data sources and real-time data services. Denodo also enables access to unified business data for agile BI, analytics, and single-view applications. The company is privately held and has roughly 300 customers of varying sizes.
- HVR – HVR provides real-time data replication for Business Intelligence, Big Data, and hybrid cloud. HVR addresses a variety of Data Integration use cases, including data migrations, Data Lake consolidation, geographic replication, database replication, and cloud integrations.
- Informatica – Informatica is a leading provider of data integration software. Informatica’s enterprise data integration and management solutions are both mainframe- and cloud-based and include data governance, data migration, data quality, data synchronization and data warehousing.
Opportunity for Startups in Big Data Integration
Corporate digitization efforts and the need for expertise to guide the transformation translate into opportunity for consultants and software product development firms. Startups have begun exploring new and innovative techniques for big data management, data aggregation, and visualization.
Analyst firms periodically review Trends and Systems Integrators in this space. A few articles and research reports on the topic:
- What is Big Data? – IBM
- Big data: The next frontier for innovation, competition, and productivity – McKinsey report 2011
- Visualization is the future: 6 startups re-imagining how we consume data – Interesting article from 2013. Many of these startups have since been acquired.
- Big Data – Data aggregation and visualization tools
Compiled and Edited by: Mohan K | Research assisted by Sanwar Tagra, who is currently pursuing his Bachelors in Engineering at BML Munjal University.
June 2017 | Reproduction with permission only