Where is the data set from? At the very least it will be skewed towards the people who post most on the social media networks that were scraped for training data. I wouldn’t be surprised if foreign-language social media was substantially underrepresented in the data set because the programmers putting it together weren’t as familiar with it.
The actual average probably looks much more Asian, given where the largest population centres are.