A system and means generate synthetic forms of social media data, such as data from microblogging services (e.g., Twitter) and social networking services (e.g., Facebook). This
system and means jointly generate interaction graph structures and text features similar to those of the input social media data. First, an interaction graph is generated by mapping social network interactions in the input (real) social media data to graph structures. A social network model (or a composite model) is then fitted to this interaction graph by minimizing the distance between the input and synthetic interaction graphs, which may differ in size. The distance is measured statistically or based on the performance of
social media analytics. Various patterns (such as anomalies), interaction types, and temporal dynamics are generated synthetically. Second, text features are extracted from the input social media data using topic modeling and statistical analysis of word tuple distributions. Based on these features, synthetic social media text is generated. Third, the synthetic graph structures and text features are combined to generate the synthetic social media data. The
system is particularly useful for generating data for developing and testing new social media analytics, for generating or analyzing social bot network behavior and campaigns in social media, and for sharing test data with others without rate-limit or privacy concerns.
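
The three steps described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation, and every function name in it is hypothetical. It makes several simplifying assumptions: interactions are taken to be @-mentions, the social network model is a directed Erdős-Rényi random graph with a single edge-probability parameter, the statistical distance is the total variation distance between normalized degree distributions (so graphs of different sizes can be compared), and word tuple statistics are reduced to a bigram table. Topic modeling, anomaly patterns, and temporal dynamics are omitted.

```python
import random
from collections import Counter, defaultdict

def build_interaction_graph(posts):
    """Step 1: map @-mentions in (author, text) posts to directed edges."""
    edges = []
    for author, text in posts:
        for token in text.split():
            target = token[1:].strip(".,!?:;")
            if token.startswith("@") and target:
                edges.append((author, target))
    return edges

def degree_distribution(edges):
    """Normalized degree histogram over non-isolated nodes (size-independent)."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    hist = Counter(degree.values())
    total = sum(hist.values())
    return {k: v / total for k, v in hist.items()} if total else {}

def distance(p, q):
    """Total variation distance between two degree distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def random_graph(n, prob, rng):
    """Assumed model: each ordered node pair becomes an edge with probability prob."""
    return [(f"u{i}", f"u{j}") for i in range(n) for j in range(n)
            if i != j and rng.random() < prob]

def fit_edge_probability(real_edges, n, candidates, seed=0):
    """Pick the candidate parameter whose synthetic graph is closest to the input."""
    target = degree_distribution(real_edges)
    best = None
    for prob in candidates:
        synth = random_graph(n, prob, random.Random(seed))
        d = distance(target, degree_distribution(synth))
        if best is None or d < best[0]:
            best = (d, prob)
    return best[1]

def bigram_table(texts):
    """Step 2: collect word-pair successors, ignoring @-mentions."""
    table = defaultdict(list)
    for text in texts:
        words = [w for w in text.split() if not w.startswith("@")]
        for a, b in zip(words, words[1:]):
            table[a].append(b)
    return table

def generate_text(table, rng, max_words=8):
    """Walk the bigram table to produce a short synthetic message."""
    if not table:
        return ""
    word = rng.choice(sorted(table))
    out = [word]
    while len(out) < max_words and table.get(word):
        word = rng.choice(table[word])
        out.append(word)
    return " ".join(out)

def synthesize(posts, n=20, seed=0):
    """Step 3: attach synthetic text to each edge of the fitted synthetic graph."""
    rng = random.Random(seed)
    real_edges = build_interaction_graph(posts)
    prob = fit_edge_probability(real_edges, n, [0.02, 0.05, 0.1, 0.2], seed)
    table = bigram_table(text for _, text in posts)
    return [(src, f"@{dst} {generate_text(table, rng)}")
            for src, dst in random_graph(n, prob, rng)]
```

Calling `synthesize(posts, n=20, seed=0)` on a list of `(author, text)` pairs returns synthetic `(author, text)` pairs whose authors and mention targets are generated node labels rather than real accounts. A grid search over one parameter is used only to keep the sketch short; a composite model or an analytics-based distance would replace `random_graph` and `distance` respectively.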