MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation
by Chandrani Kumari, Rahul Siddharthan
We provide new algorithms for two tasks relating to heterogeneous tabular datasets: clustering, and synthetic data generation. Tabular datasets typically consist of heterogeneous data types (numerical, ordinal, categorical) in columns, but may also have hidden cluster structure in their rows: for example, they may be drawn from heterogeneous (geographical, socioeconomic, methodological) sources, such that the outcome variable they describe (such as the...