We call it the “Last Mile Problem” and we see it a lot. Financial institutions consume enormous volumes of market data, and they tell us it is hard to manage and takes a great deal of work. Meanwhile, market data vendors often tell us that onboarding new feeds into clients, or upselling more data to existing clients, is slow and costly. This is the same problem seen from both sides.
Most financial institutions make significant investments in best-of-breed applications. All these applications need operational and market data to run, but none of them use it in exactly the same way, or with the same requirements.
Application vendors deploy data interfaces and other APIs to allow data flows to be wired up from sources to destinations. All these interfaces are different. They have different physical structures and embed different semantics. The consultants who implement these systems are not bad people. They do a difficult job, but often what the customer is left with is the equivalent of this:
The root cause of this unhappy and potentially dangerous mess is an impedance mismatch between producer and consumer.
Market data vendors are huge data generators, built like factories. They are utilities. Their purpose is to transport what they produce to a wide audience as efficiently as they can. They are not trying to directly deliver exactly what each of your applications needs. Therefore, plugging your applications directly into them is not sensible. There is a long network in between what they produce and how it gets consumed.
For many years the green box above (a step-down transformer) was, to extend this data management analogy, an FTP server. Nowadays there are some more options, but the concept remains. This step is the limit of the producer’s responsibility. From this point on, the consumers are on their own. The data vendor typically dumps a batch of files here and the customer takes it from there.
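To make the handoff concrete, here is a minimal sketch of that batch pickup in Python using the standard-library `ftplib`. The host, credentials, and directory names are hypothetical, and a real consumer would add retries, checksums, and scheduling; this only illustrates the point at which responsibility transfers to the consumer.

```python
# Sketch of the "step-down transformer" handoff: the vendor dumps batch
# files in an FTP drop directory, and the consumer polls and collects them.
# All connection details below are illustrative placeholders.
from ftplib import FTP
from pathlib import Path


def new_files(listing: list[str], already_fetched: set[str]) -> list[str]:
    """Pure helper: which files in the vendor's drop we have not seen yet."""
    return sorted(f for f in listing if f not in already_fetched)


def collect_batch(host: str, user: str, password: str,
                  remote_dir: str, local_dir: str,
                  already_fetched: set[str]) -> list[str]:
    """Download every unseen file from the vendor's drop directory."""
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    fetched = []
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir)
        for name in new_files(ftp.nlst(), already_fetched):
            target = Path(local_dir) / name
            with open(target, "wb") as f:
                ftp.retrbinary(f"RETR {name}", f.write)
            fetched.append(name)
    return fetched
```

Everything after `collect_batch` returns — parsing, cleansing, routing to applications — is, as the vendor sees it, the customer's problem.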
Application vendors have tried to solve this problem by selling market data interfaces specialized to particular vendor feeds. To tame this mushrooming situation, enterprise data management systems came to market in the 1990s, enabling customers to load multiple feeds into a monolithic data warehouse and have discrepancies removed. This store, often called a “single version of the truth”, would then be wired up to all the applications. An obvious advantage is that they provided a central place to fix data once and replicate the correction everywhere, rather than repeating the work in each application. A neat idea, but in practice one that has completely failed.
This analogy is carefully chosen. In the real world, data is indeed very much like electricity: it always takes the path of least resistance. Individual departments, incompletely served by the central data warehouse, often wire additional data sources directly into their applications. This may be frowned upon, but it is often the only way a business can operate or innovate. Creating a single data store to support a whole financial institution, across all its departments, is a gargantuan task. We have seen many attempted and concluded it is probably impossible, certainly not within a sensible timeframe and budget. Usually the scope gets cut during implementation, leaving gaps in the content, or the solution is so rigid that it impedes future innovation.
Instead of building another generator internally, we think a better, more flexible and more closely aligned approach is to treat data management as, in essence, a networking problem. Rather than a gargantuan data warehouse, the solution is intelligent middleware that sits between the raw data and its consumers. Its primary purpose is to cut data down to size, route it to where it needs to go, and detect errors while the data is in transit. We believe the first-class citizen in a good data infrastructure is the consuming application, not a single, central data warehouse. Indeed, building such a warehouse can actually make the problem worse, and given the challenges ahead, that is the last thing we need.
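The middleware idea above — trim, route, and check in transit, with the consuming application as the first-class citizen — can be sketched in a few lines. This is a toy model under assumed conventions: the field names (`isin`, `price`) and validation rules are illustrative, not a real feed specification.

```python
# Sketch of data management as a routing problem: each consuming application
# declares a subscription (which records it wants, which fields it needs);
# the middleware validates records in transit, quarantines bad ones, and
# delivers only the trimmed-down slice each application asked for.
from dataclasses import dataclass, field
from typing import Callable

Record = dict  # one raw record from a vendor feed


@dataclass
class Subscription:
    name: str
    wants: Callable[[Record], bool]          # which records this app cares about
    fields: list[str]                        # which fields it actually needs
    inbox: list[Record] = field(default_factory=list)


def validate(rec: Record) -> list[str]:
    """Error detection in transit: return a list of problems, empty if clean."""
    errors = []
    if not rec.get("isin"):
        errors.append("missing isin")
    if rec.get("price", 0) <= 0:
        errors.append(f"{rec.get('isin')}: non-positive price")
    return errors


def route(raw: list[Record], subs: list[Subscription]) -> list[str]:
    """Cut each record down to size and deliver it; collect errors as we go."""
    all_errors = []
    for rec in raw:
        errs = validate(rec)
        if errs:
            all_errors.extend(errs)          # quarantine, don't deliver
            continue
        for sub in subs:
            if sub.wants(rec):
                sub.inbox.append({k: rec[k] for k in sub.fields if k in rec})
    return all_errors
```

Each application gets only what it subscribed to, errors are caught once in the middle rather than in every consumer, and adding a new consumer means adding a subscription, not another point-to-point wire.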