The contrarian bet: Sensor time series data quality/observability

Bert Baeck
Feb 16, 2023 · 4 min read

After Software AG acquired my first startup in 2018, I joined the technology VC firm Smartfin Capital as one of the four partners. During this time, I had the opportunity to study the entire market, focusing on IoT, data, AI/ML, and industrial software. Having worked for over a decade in analytics, I noticed a trend within the data ecosystem: the market had moved directly from data collection to AI/analytics without ensuring that the data was fit for purpose and reliable end to end. We were part of that movement; however, thought leaders in the industry began to realize that better data, rather than just better models, was the prerequisite for the next step in analytics maturity.

By studying the market, I observed that an old category, data quality, suddenly gained traction due to the ‘great unlock’: with companies able to store and compute data at scale, data-driven applications became mission-critical, and data reliability became paramount. The quadrant of data quality 2.0 players began to take shape in 2019, breaking down into data observability and pure data quality. All those players focused mainly on relational data, working with data teams on use cases such as data arriving late in a dashboard or schema changes over the last 24 hours: use cases that do not hold in industrial operations or IoT.

My contrarian thinking (going against the prevailing consensus and taking positions opposite to what most of the market is doing) emerged when I questioned why nobody focused on time series/sensor data.

Working with sensor time series data is different from working with relational data: there is no database schema or data model, causality plays an essential role, connectivity is different, no global baseline is available, local anomalies are possible, and the cleaning/preparation and triage workflows differ. Additionally, not all issues arise from data quality problems, as some can be related to second-order sensor or asset problems, and telling these apart requires deep domain expertise.
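To illustrate what “no baseline” and “local anomalies” mean in practice, here is a minimal sketch of a detector that only compares each point against its local neighborhood; the rolling window, threshold, and synthetic data are illustrative assumptions of mine, not the Timeseer.AI implementation:

```python
# Minimal sketch: detect *local* anomalies in a sensor series with a rolling
# median and median absolute deviation (MAD), so no global baseline, schema,
# or fixed rule is needed. Window size and threshold are illustrative.
import numpy as np
import pandas as pd

def local_anomalies(series, window=51, threshold=6.0):
    """Return a boolean mask marking points far from their local median."""
    rolling_median = series.rolling(window, center=True, min_periods=1).median()
    abs_dev = (series - rolling_median).abs()
    mad = abs_dev.rolling(window, center=True, min_periods=1).median()
    # A zero MAD (perfectly flat stretch) would flag any deviation at all;
    # treat it as undefined so flat stretches are never flagged.
    return abs_dev > threshold * mad.replace(0, np.nan)

# Example: a slowly drifting signal with one spike. The spike stays inside a
# plausible global range, so only its local context reveals the anomaly.
t = np.arange(1000)
values = pd.Series(20 + 0.01 * t + np.random.default_rng(0).normal(0, 0.2, 1000))
values.iloc[600] += 3.0
print(values[local_anomalies(values)])
```

A fixed validity rule would pass that spike, because its absolute value remains plausible; only the local context exposes it.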

After observing for several years that time series databases were the fastest-growing database segment, and considering the advent of 5G and the increasing number of cheap sensors in the field, I recognized the potential of focusing entirely on sensor time series data.

On a technical level, 90% of all data quality metrics in existing platforms were based on freshness, validity, and accuracy, with only 10% on distribution. With time series data, it is the opposite: roughly 90% of the relevant checks relate to distribution changes. Realizing the potential of focusing on time series/sensor data, I abandoned my promise never to start a company again and reassembled part of my core dream team, each of us with more than a decade of experience with time series data, to pursue this mission.
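To make the distribution point concrete, here is a minimal sketch of a rolling distribution check; the two-sample Kolmogorov-Smirnov test, the window size, and the p-value threshold are my illustrative choices, not our product’s method:

```python
# Minimal sketch: flag distribution drift in a sensor series by comparing
# each recent window against the window just before it with a two-sample
# KS test. Window size and p-value threshold are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

def drift_alerts(values, window=500, p_threshold=0.01):
    """Yield (start_index, KS statistic) for windows whose distribution
    differs significantly from the preceding reference window."""
    for end in range(2 * window, len(values) + 1, window):
        reference = values[end - 2 * window : end - window]
        recent = values[end - window : end]
        result = ks_2samp(reference, recent)
        if result.pvalue < p_threshold:
            yield end - window, result.statistic

# Example: a sensor whose noise level doubles halfway through; the mean,
# range, and freshness of the signal never change, only its distribution.
rng = np.random.default_rng(42)
series = np.concatenate([rng.normal(50, 1, 2000), rng.normal(50, 2, 2000)])
for start, stat in drift_alerts(series):
    print(f"possible distribution change near sample {start} (KS={stat:.2f})")
```

Freshness, validity, and accuracy checks would all pass this series; only comparing distributions across windows surfaces the change.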

In what way is this journey different from that of other data quality/observability players?

  1. Working with sensor time series data presents a unique set of challenges that require a different approach compared to other data types. In particular, issues with sensor time series data can stem from data quality or sensor issues in the first order, and from asset or process issues in the second order.
  2. Moreover, the verification and scanning of sensor data need to happen at the source, which is not necessarily a data warehouse. Instead, sensor data is often distributed across the edge, the fog (OT systems), and eventually the cloud. As a result, any data downtime can have a significant impact on business operations, ranging from hazardous and abnormal situations to operational downtime. A minimal sketch of such a source-side check follows this list.
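As a sketch of what a source-side check can look like (the Reading structure, valid range, and staleness budget below are hypothetical illustrations, not a Timeseer.AI API):

```python
# Minimal sketch of a source-side (edge) check, run before a reading leaves
# the device or OT system. The Reading structure, valid range, and staleness
# budget are hypothetical illustrations, not a Timeseer.AI API.
import time
from dataclasses import dataclass

@dataclass
class Reading:
    sensor_id: str
    timestamp: float  # Unix epoch seconds
    value: float

def validate_at_source(reading, valid_range=(0.0, 150.0), max_staleness_s=30.0):
    """Return a list of issues found; an empty list means the reading passes."""
    issues = []
    lo, hi = valid_range
    if not (lo <= reading.value <= hi):
        issues.append(f"value {reading.value} outside range [{lo}, {hi}]")
    if time.time() - reading.timestamp > max_staleness_s:
        issues.append("stale reading: exceeds staleness budget")
    return issues

# A reading that is both out of range and two minutes old.
reading = Reading("pump-7/pressure", time.time() - 120, 180.0)
for issue in validate_at_source(reading):
    print(f"{reading.sensor_id}: {issue}")  # flag before it reaches the cloud
```

Running checks like this at the edge means a bad reading can be flagged before it propagates into OT systems and cloud analytics.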

These challenges make it critical for companies to have reliable, high-quality sensor time series data, and our company is uniquely positioned to provide solutions that address these specific needs. By focusing solely on sensor time series data quality and observability, we can provide targeted solutions that help companies optimize their operations and avoid the negative consequences of data downtime.

At Timeseer.AI, we predict that by 2030 all consumed device and sensor data globally will be verified and monitored for reliability, ensuring data is fit for purpose and treated as a strategic asset. For that, we double down on our mission to elevate 📈 our clients’ sensor data excellence to drive their digital maturity ⚡. 20% of the Fortune 5000 companies have sensors at the heart of their operations. They are the ones that need to work with us for sensor time series data, and with the horizontal data quality/data observability players on the IT level.

We are confident that our company will have a significant impact on industrial and IoT data operations, as we provide solutions that align with short-term, medium-term, and long-term strategic objectives.

Source: Gartner

