Visualization techniques to drive scientific discovery, customer analytics, etc.
Researchers from Skoltech and AIRI, the Artificial Intelligence Research Institute, have devised a visualization technique that lets humans explore highly complex biomedical, financial, and other datasets without losing their multidimensional structure. Retaining this so-called data topology is crucial for drawing conclusions about cancer genes, consumer behavior, etc. Existing methods, however, do this poorly. The research will be presented as a conference paper at ICLR 2023, and the paper is available on the arXiv preprint server.
Business analysts and scientists often have to make sense of datasets where every item is characterized by many so-called dimensions. For example, a bank might rate each of its customers on a range of behavioral indicators. Biologists characterize cells by how active each of a large number of genes is in them. Weather data is of the same nature, given the number of parameters reported for every time and location.
However, people are not used to thinking in multiple dimensions, and without reducing the data set to a neat two- or three-dimensional representation, it can be difficult to make meaningful hypotheses and recognize important patterns.
“Visualization makes data visual, but it doesn’t necessarily reveal its ‘shape.’ A dataset can have a large-scale structure, complete with clusters, gaps, rings, loops, etc., and we want all of that to be preserved in the reduced-dimensional representation as well. Market researchers need it to identify consumer groups, and climate scientists need it to tell where a certain process starts and ends. Unlike other techniques, ours achieves dimensionality reduction without compromising the global structure of the data,” said co-author Daniil Cherniavskii.
There are several approaches to dimensionality reduction, some of which use so-called autoencoders, that is, neural networks that produce lower-dimensional representations of the data. “The problem is that most of the techniques used, including those involving autoencoders, work locally. They are concerned with the position of a data point relative to its neighbors, but the large-scale structure is lost,” says Cherniavskii.
“What we’ve done is add an additional loss function to the autoencoder. Its sole purpose is to minimize the topological difference between the original dataset and its lower-dimensional representation. When that loss is zero, the ‘shape’ of the visualization is guaranteed to match that of the original.”
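The idea of adding a topological term to an autoencoder's usual reconstruction loss can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' loss, which the article does not spell out. It swaps in a much simpler stand-in that compares the sorted minimum-spanning-tree edge lengths of the original points and of their low-dimensional codes. Those lengths are the scales at which clusters merge, so a mismatch signals that cluster structure has been distorted. The function names and the weight `lam` are hypothetical, and a real implementation would need a differentiable version inside a deep learning framework.

```python
import math


def pairwise_distances(points):
    """Full matrix of Euclidean distances between all points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def mst_edge_lengths(dist):
    """Sorted edge lengths of a minimum spanning tree (Prim's algorithm)."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each point to the tree
    best[0] = 0.0
    lengths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the starting point is not attached by any edge
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return sorted(lengths)


def topological_penalty(original, codes):
    """Stand-in topological term: squared mismatch of MST edge lengths."""
    a = mst_edge_lengths(pairwise_distances(original))
    b = mst_edge_lengths(pairwise_distances(codes))
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_loss(original, reconstructed, codes, lam=1.0):
    """Mean squared reconstruction error plus the weighted topological term."""
    recon = sum(math.dist(x, r) ** 2 for x, r in zip(original, reconstructed))
    return recon / len(original) + lam * topological_penalty(original, codes)


# Two well-separated clusters in 3D (toy data, assumed for illustration).
original = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
good_codes = [(0,), (1,), (10,), (11,)]  # the gap between clusters is kept
bad_codes = [(0,), (1,), (2,), (3,)]     # the clusters are merged into one

print(topological_penalty(original, good_codes))  # 0.0
print(topological_penalty(original, bad_codes))   # 64.0
```

In this toy case the penalty is zero when the low-dimensional codes keep the gap between the two clusters and grows when the gap collapses, which is the kind of behavior an extra topological loss term is meant to enforce. Unlike the loss described in the quote, this stand-in carries no guarantee that a zero value means the shapes match.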
The team tested how well the dataset topology is preserved using multiple metrics that capture how well the relative positions of data points in general, not just those in the immediate vicinity, are retained. Tests on datasets of different natures confirmed that the team’s solution outperformed all of the most popular methods for dimensionality reduction (see image above).
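The article does not name the metrics used. One simple global measure of this kind, assumed here purely for illustration, is the Pearson correlation between all pairwise distances in the original data and the corresponding distances in the low-dimensional representation. Because it looks at every pair of points, it penalizes distortions of the large-scale layout and not only of local neighborhoods. The function names and toy data below are hypothetical.

```python
import math
from itertools import combinations


def all_pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def pairwise_distance_pearson(original, embedded):
    """Pearson correlation between original and embedded pairwise distances."""
    x = all_pairwise_distances(original)
    y = all_pairwise_distances(embedded)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("all pairwise distances are equal; correlation undefined")
    return cov / math.sqrt(var_x * var_y)


# Toy data: two clusters in 3D and two candidate 2D embeddings.
points_3d = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
faithful = [(0, 0), (2, 0), (20, 0), (22, 0)]   # uniformly rescaled layout
scrambled = [(0, 0), (20, 0), (2, 0), (22, 0)]  # clusters are mixed up

score_faithful = pairwise_distance_pearson(points_3d, faithful)
score_scrambled = pairwise_distance_pearson(points_3d, scrambled)
print(score_faithful, score_scrambled)
```

A score close to 1 means the embedding keeps the overall arrangement of the points up to scale, while the scrambled embedding scores much lower even though each point still has a close neighbor.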
“Topological data analysis is becoming an increasingly popular tool for investigating the properties of multidimensional data. We expect that the method we have developed and other similar methods will become the norm in the near future,” said study co-author Professor Evgeny Burnaev of Skoltech Applied AI and AIRI.
Ilya Trofimov et al, Learning Topology-Preserving Data Representations, arXiv (2023). DOI: 10.48550/arxiv.2302.00136
Skolkovo Institute of Science and Technology
Citation: Visualization techniques to drive scientific discovery, customer analytics, etc (2023, March 23) retrieved March 23, 2023 from https://techxplore.com/news/2023-03-visualization-technique-scientific-Discoveries-customer.html