Integrating Data on a Shared Hadoop Platform for the Financial Sector

Posted by Michael Shaw


Financial services firms are notorious record-keepers. They have to be: banking is among the most heavily regulated industries in the nation. Yet despite this fastidious data-gathering, banks are like most organizations in that their information architecture evolved organically over time and now consists of a patchwork of systems and solutions. That approach may have worked for a while, but the piecemeal setup now prevents timely insights, thwarts accurate reporting and hinders financial crime prevention efforts.

Facing these problems, a major player in the financial services sector turned to Clarity Insights to revamp its data infrastructure. Our consultants used big data technologies to overhaul the company’s data and analytics capabilities, and took an Agile approach aimed at reshaping the corporate culture into one driven by insights.

Key Technologies

Using a shared Hadoop data lake, we built a robust, bespoke solution that integrates data from a multitude of sources and streamlines reporting. Spark and Impala act as data transformation engines. IBM InfoSphere CDC captures changed data, and Sqoop extracts historical data from relational sources. Flume collects message data from the master data management system, while Oracle Exadata supports the enterprise data warehouse (EDW). Git, together with Atlassian’s Bamboo and Bitbucket Server, enables continuous building, testing and deployment.
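Pairing a one-time historical extract with ongoing change data capture implies a merge step somewhere in the pipeline. This post does not describe how that merge was built, so the following is only a minimal sketch in plain Python of the general idea: replaying ordered change records over a snapshot. The `ChangeRecord` type, its fields and the `apply_changes` function are hypothetical names chosen for illustration, not part of Sqoop, IBM InfoSphere CDC or Spark.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical shape of one captured change (an assumption, not a CDC format)."""
    op: str               # "I" = insert, "U" = update, "D" = delete
    key: str              # primary key of the affected row
    row: Optional[dict]   # new row contents; None for deletes
    seq: int              # ordering of the change in the source log


def apply_changes(snapshot: Dict[str, dict],
                  changes: Iterable[ChangeRecord]) -> Dict[str, dict]:
    """Return a new snapshot with the changes replayed in sequence order."""
    result = dict(snapshot)  # shallow copy so the historical extract is left untouched
    for change in sorted(changes, key=lambda c: c.seq):
        if change.op in ("I", "U"):
            result[change.key] = dict(change.row or {})
        elif change.op == "D":
            result.pop(change.key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {change.op!r}")
    return result


# Usage: a historical extract plus three captured changes, given out of order.
historical = {"a1": {"balance": 100}, "a2": {"balance": 50}}
captured = [
    ChangeRecord("U", "a1", {"balance": 120}, seq=2),
    ChangeRecord("D", "a2", None, seq=3),
    ChangeRecord("I", "a3", {"balance": 10}, seq=1),
]
current = apply_changes(historical, captured)
```

At data lake scale the same logic would normally be expressed as a join or merge inside a distributed engine rather than an in-memory dictionary. The sketch only shows the ordering and overwrite semantics.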

In addition, we developed several key analytical models: a global and domestic transactions model, a positions and valuations model, and a payments model. The transactions model and the positions and valuations model support customer reporting, while the payments model brings data from the Automated Clearing House, tellers, ATMs, lockboxes and the Federal Reserve’s paper check clearing services together into one consistent picture.
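To make the idea of a unified payments model concrete, here is a rough sketch of mapping channel-specific records onto one common shape. Everything in it is an assumption made for illustration: the `Payment` fields, the raw field names such as `amount_cents` and `amt`, and the choice of only two channels. The client’s actual model is not described in this post.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict


@dataclass(frozen=True)
class Payment:
    """Hypothetical common record shared by all payment channels."""
    channel: str
    account_id: str
    amount: Decimal
    currency: str


def from_ach(raw: dict) -> Payment:
    # Assumption: the ACH feed reports integer cents in US dollars.
    return Payment("ACH", raw["acct"], Decimal(raw["amount_cents"]) / 100, "USD")


def from_atm(raw: dict) -> Payment:
    # Assumption: the ATM feed reports a decimal string and an optional currency.
    return Payment("ATM", raw["account"], Decimal(raw["amt"]), raw.get("ccy", "USD"))


# One normalizer per channel; tellers, lockboxes and checks would be added the same way.
NORMALIZERS: Dict[str, Callable[[dict], Payment]] = {
    "ACH": from_ach,
    "ATM": from_atm,
}


def normalize(channel: str, raw: dict) -> Payment:
    """Convert a raw channel record into the common Payment shape."""
    try:
        normalizer = NORMALIZERS[channel]
    except KeyError:
        raise ValueError(f"no normalizer registered for channel {channel!r}") from None
    return normalizer(raw)


# Usage: two differently shaped inputs end up in the same structure.
ach_payment = normalize("ACH", {"acct": "A-1", "amount_cents": 1250})
atm_payment = normalize("ATM", {"account": "A-2", "amt": "40.00"})
```

Keeping one small function per channel means a new source can be added without touching the downstream reporting logic, which only ever sees `Payment` records.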

Change From Within

Ultimately, a successful data implementation relies on people and corporate culture as much as on technology. So we adopted the Spotify model of Agile, in which teams are organized into squads and chapters. Squads provide line-of-business services, while chapters address cross-cutting capabilities such as development and testing.

Our solution has enabled the client to pursue its business-critical goals and rethink its approach to governance, data processing and analytics. The result will enhance regulatory compliance, tighten data security and provide near-real-time insights. Additionally, the company will improve its detection of fraud and financial crimes (including its anti-money laundering efforts), accelerate reporting and significantly reduce server costs.

Interested in doing something similar in your organization? Contact us to get started!


Written by Michael Shaw

Mike is Managing Director of Data Engineering and a solution architect, developer, engagement manager and thought leader in distributed technologies for big data analytics and cloud computing.

Topics: Data Lake, Financial Services
