How To Avoid Data Lake Crocodiles


Perforce Software, a company that develops version control and collaboration software, has turned its attention to working with data lakes.

Data lakes are, by definition, massive. They house the morass of unstructured and semi-structured data that is generally unfiltered, often duplicated, and typically unparsed and low-level (e.g. log files, system status readings, website clickstream data). Increasingly, that data is machine-generated by sensors across the Internet of Things, and now by AI agents that pour their output into the lake as well.
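
To make that concrete, here is a minimal, illustrative sketch (in Python, and not specific to Perforce or any particular lake platform) of what "landing" such raw data can look like: heterogeneous records are appended to a date-partitioned raw zone exactly as they arrive, with no parsing or schema applied up front. The paths, source names, and fields are hypothetical assumptions for the example.

```python
# Illustrative sketch: landing raw, semi-structured records (log events,
# sensor readings, clickstream events) into the "raw zone" of a data lake
# as newline-delimited JSON, partitioned by source and ingestion date.
# The lake root, source names, and record fields are hypothetical.
import json
from datetime import datetime, timezone
from pathlib import Path

LAKE_ROOT = Path("/data/lake/raw")  # hypothetical root; could equally be object storage

def land_records(source: str, records: list[dict]) -> Path:
    """Append raw records, untouched, to a date-partitioned NDJSON file."""
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    target_dir = LAKE_ROOT / source / f"ingest_date={today}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "part-000.ndjson"
    with target.open("a", encoding="utf-8") as f:
        for record in records:
            # Stored as-is: structure is imposed later, on read ("schema-on-read").
            f.write(json.dumps(record) + "\n")
    return target

# Example: landing a few heterogeneous, unparsed events from different sources.
land_records("iot_sensors", [{"device": "pump-7", "temp_c": 81.4}])
land_records("web_clickstream", [{"session": "abc123", "path": "/pricing"}])
```

The point of the sketch is simply that nothing is filtered, deduplicated or modelled at ingestion time; that work is deferred until someone actually queries the data.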

On balance, data lakes are regarded as a good thing. They let organizations make sure they are capturing all the data that might flow through every operational pipe of their IT stack, and having access to as-yet-untapped data stores when needed is a comfortable position for the chief data scientist in any business. Viewed as a key move for firms to future-proof their data strategy, a data lake also represents a democratization of data: it's a really deep pool and, as long as you wear a life jacket (that is, adhere to security and compliance guidelines), anyone, including business users, can potentially take a dip at any time.