The startup Heartex has raised $25 million

23 May, 2022

Heartex is a startup that bills itself as an “open source” platform for data labeling. The service provides tools for labeling the data that machine-learning algorithms use to learn how to make decisions where humans used to make them.

This May, the startup landed $25 million in a Series A funding round led by Redpoint Ventures. Unusual Ventures, Bow Capital and Swift Ventures also participated, bringing Heartex’s total capital raised to $30 million. The new money will be put toward improving Heartex’s product and expanding the company’s workforce from 28 people to 68 by the end of the year.

The company makes a data labeling system for machine learning. Its customers are companies that have accumulated large amounts of data, for example insurance companies with masses of photos from insurance claims. They use Heartex’s tools to put this data into a form that is more comprehensible to computers.

The startup was launched in 2019 by Mikhail Malyuk, Maxim Tkachenko and Nikolai Lyubimov. Heartex currently has an office in San Francisco, California, but several of the company's engineers are based in Georgia.

“Coming from engineering and machine learning backgrounds, [Heartex’s founding team] knew what value machine learning and AI can bring to the organization,” said co-founder and CEO Mikhail Malyuk. He argues that more and more companies are looking to incorporate artificial intelligence into their operations to process large amounts of data. However, unorganized and unlabeled photos, videos or other files have no practical value.

So, the labeling process produces a qualitative transformation: raw data is supplemented by metadata and turned into information.

"Data labeling is receiving increased attention from companies pursuing AI, it’s because labeling is a core part of the AI development process. Many AI systems “learn” to make sense of images, videos, text and audio from examples that have been labeled by teams of human annotators," explains Malyuk.

The problem is that not all labels are of equal quality. Labeling data such as legal contracts, medical images and scientific literature requires specialist knowledge that not every annotator has. And annotators, being human, make mistakes. In an analysis of popular AI datasets conducted by the Massachusetts Institute of Technology, researchers found mislabeled examples. For instance, one breed of dog was confused with another, and an Ariana Grande high note was classified as a whistle.

Malyuk makes no claim that Heartex completely solves these issues. But the platform aims to support as many AI scenarios as possible, and to make data quality management, reporting and analytics more transparent. For example, data engineers using Heartex can see the names and email addresses of data annotators and reviewers, which are tied to the labels they have contributed or verified. This helps to monitor the quality of labels and fix problems before they affect training data.
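To make the idea of labels tied to named annotators and reviewers concrete, here is a minimal Python sketch. It is not Heartex's code or API: the `LabelRecord` type, its field names, the `find_disagreements` helper and the sample data are all hypothetical, chosen only to illustrate how this kind of provenance could let a data engineer spot conflicting or unreviewed labels.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelRecord:
    """One label on one data item, with hypothetical provenance fields."""
    item_id: str
    label: str
    annotator_email: str
    reviewer_email: Optional[str] = None  # None until someone verifies the label


def find_disagreements(records):
    """Return {item_id: {label: [annotator emails]}} for items with conflicting labels."""
    by_item = defaultdict(lambda: defaultdict(list))
    for rec in records:
        by_item[rec.item_id][rec.label].append(rec.annotator_email)
    # Keep only the items that received more than one distinct label.
    return {
        item_id: dict(votes)
        for item_id, votes in by_item.items()
        if len(votes) > 1
    }


# Illustrative data only: two annotators labeling insurance claim photos.
records = [
    LabelRecord("claim-001.jpg", "dented bumper", "ann@example.com", "rev@example.com"),
    LabelRecord("claim-001.jpg", "dented bumper", "bob@example.com"),
    LabelRecord("claim-002.jpg", "broken windshield", "ann@example.com"),
    LabelRecord("claim-002.jpg", "scratched door", "bob@example.com"),
]

conflicts = find_disagreements(records)
unreviewed = [r for r in records if r.reviewer_email is None]
```

In this sketch, `conflicts` contains only `claim-002.jpg`, along with who assigned each competing label, and `unreviewed` lists the labels that no reviewer has verified yet. Both could be checked before the data is used for training.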

The startup was also one of the winners of the international IT accelerator Sber500 in 2019. Heartex's pitch presentation can be viewed at the link.