DataFlow - The Architecture and Ecosystem

Data Volumes are Exploding

The amount of data a business must access and manage today is exploding not only in volume but also in complexity and in the heterogeneity of formats and sources. The proliferation of mission-critical software applications and services, combined with the requirement for better, almost universal access, makes the already difficult problem of data integration bigger and more complex than ever before.


“…According to estimates, mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes. Merely keeping up with this flood, and storing the bits that might be useful, is difficult enough. Analysing it, to spot patterns and extract useful information, is harder still. Even so, the data deluge is already starting to transform business, government, science and everyday life…”

Not only is the quantity of data projected to grow exponentially, but the major sources of that data are also projected to shift from predominantly enterprise sources to cloud and hosted SaaS applications and the web itself.

A key problem with this shift is the number of sources involved. Historically, growth meant more and more data from a relatively consolidated, and therefore limited, number of leading enterprise applications. The projected growth, by contrast, comes from a much larger and rapidly proliferating set of SaaS, cloud and web applications, as well as data streams from social networks and syndicated feeds such as RSS and Atom. The result is a “Long Tail” of data sources that need to be integrated into everything from enterprise applications and databases to rich internet applications (RIAs), enterprise mashups, and both business and market analytical tools.

This “Long Tail” creates a second, even more critical issue for those who need to integrate these important, widely diverse data sources. Gartner estimates that today, even with all the “traditional” data integration solutions available from vendors for implementing EAI, EII and ETL, and the newer approaches of ESBs, SOAs and other MOM solutions, 80% of the $10B integration market still consists of hand-coded, point-to-point solutions.

As the rapid proliferation of new data sources floods organizations with demands for even more data connections, the need for something better than current integration products or hand-coded patchwork will only grow. If vendors’ current business models cannot serve more than 20% of integration demand, a new model is clearly required to address an ever-increasing long tail.

Conventional architectures and business models cannot address this ever-increasing demand for connectivity to an increasingly diverse universe of data sources. A new approach is required.

Alphabet Soup Doesn’t Solve the Problem

Each of the alphabet-soup approaches to data integration (ETL, EAI, EII) evolved to solve a specific information management challenge, and each just as quickly became rigidly defined as a monolithic, proprietary, purpose-built solution for one type of integration.

However, the kinds of integrations that developers spend most of their time on today generally don’t fit within the confines of a conventional ETL, EAI, or EII solution (see footnote 1). This is because they require integrating data with a wide variety of endpoints, including files, spreadsheets, reports, public websites, cloud-based SaaS applications and services, web services, social media and the like.

The availability of easy-to-use dynamic languages like Perl, Python, PHP, and Ruby has encouraged developers to hand-code these integrations one at a time. Unfortunately, this bespoke approach results in code that is fragile, hard to maintain, not reusable, and not easily extended as APIs and data requirements change. This is the classic iceberg scenario: the cost of development is just the visible tip, while the cost of maintenance is the menacing mass of ice below the water, and it is often overlooked.


1 ETL: Extract, Transform, and Load. EAI: Enterprise Application Integration. EII: Enterprise Information Integration.

As a rapidly increasing portion of important business data either originates or migrates outside an organization, a greater portion of developers’ time is spent on these integrations. And, as the number of data feeds and the volume of web-based application data grow exponentially, the challenge of integrating, filtering, modifying and analyzing this data multiplies exponentially as well.

Unfortunately, hand-coded integration solutions have high hidden costs due to their single-use focus and fragility, as well as their lack of documentation and metadata support. The irony of easy-to-use programming methodologies is that they, like the ice below the waterline, mask the growth of the underlying problem of integrating an exponentially growing and diverse collection of data sources. Clearly, this situation is unsustainable: the problem grows exponentially as requirements accelerate, while the solutions scale linearly, and IT staff and budgets continue to shrink. To solve the integration problem in an era of modern IT architectures, a fundamentally new approach is needed.

That approach must address several key issues:

  • Simple, consistent and reusable integration modules that can be written once
    and reused in infinite combinations to satisfy the exponential growth of the
    combinations of data sources that they address.
  • A simple, powerful and infinitely extensible server framework that can be
    distributed across multiple instances and yet supports, manages and
    configures the needed combinations of integration modules to allow for both
    infinite connectivity and infinite combinations of data operations between
    endpoints while not imposing the overhead of any unneeded functionality.
  • A simple internal architecture for both server and modules that maximizes
    ease of use, ease of interconnection, robustness and security.
  • An open architecture that allows developers to create, share and sell
    reusable components, and a store and ecosystem that enables development
    and reuse of connectors to meet the exploding demand.
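
To make the first requirement concrete, here is a minimal Python sketch of what "written once and reused in combinations" could look like. It is an illustration only, not the DataFlow API: every name in it (csv_lines_source, filter_step, rename_step, list_sink, run_pipeline) is hypothetical, and it assumes that records are plain dictionaries and that each module is an ordinary function.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption for this sketch: a record is a plain dict and every module
# is a small function. None of these names come from a real product.
Record = dict
Source = Callable[[], Iterable[Record]]
Step = Callable[[Iterable[Record]], Iterable[Record]]
Sink = Callable[[Iterable[Record]], None]


def csv_lines_source(lines: List[str]) -> Source:
    """Hypothetical source module: one record per 'name,amount' line."""
    def read() -> Iterator[Record]:
        for line in lines:
            name, amount = line.split(",")
            yield {"name": name.strip(), "amount": float(amount)}
    return read


def filter_step(predicate: Callable[[Record], bool]) -> Step:
    """Reusable step: keep only the records that satisfy the predicate."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        return (r for r in records if predicate(r))
    return run


def rename_step(old: str, new: str) -> Step:
    """Reusable step: rename one field, leaving the input record untouched."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            r = dict(r)          # copy so the caller's record is not mutated
            r[new] = r.pop(old)
            yield r
    return run


def list_sink(target: List[Record]) -> Sink:
    """Hypothetical sink module: append every record to an in-memory list."""
    def write(records: Iterable[Record]) -> None:
        target.extend(records)
    return write


def run_pipeline(source: Source, steps: List[Step], sink: Sink) -> None:
    """Chain any source through any sequence of steps into any sink."""
    records = source()
    for step in steps:
        records = step(records)
    sink(records)
```

Because any source can be paired with any sink, n source modules and m sink modules cover n × m source-to-sink pairings with only n + m pieces of code, whereas point-to-point scripts need one per pairing. For example, calling run_pipeline with csv_lines_source(["acme, 120.5", "globex, 15"]), a filter_step that keeps amounts above 100, rename_step("name", "customer") and a list_sink leaves a single record for "acme" in the target list.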

We have the solution. Contact one of our specialists today.
