Detecting Twitter users’ gender, in French
Data miners have been working hard to infer attributes of Twitter users, such as gender and age, that are not explicitly revealed in their Twitter feeds. That information could be hugely valuable to marketers, enabling them to target messages at their desired audience. Nearly all the research done so far, however, has focused on English-speaking users and English-language content.

Now a McGill University research team has conducted one of the first studies designed to determine the gender of Twitter users who primarily write in languages other than English. Among the key findings: using a special detector based on French syntax, the researchers showed that gender is very easy to classify for Twitter users who write in French, and probably for users of other Romance languages as well.

In particular, the researchers developed an algorithm that looks for masculine or feminine adjectives or past participles following the phrase "Je suis" (or variants such as "je ne suis pas"). Because French adjectives and past participles in this position agree in gender with the subject, the word after "Je suis" often reveals the writer's gender: a woman would write "je suis fatiguée" (I am tired), a man "je suis fatigué". Based on this construction, the detector identified users' gender with 90% accuracy, significantly higher than the 80% to 85% accuracy achieved by various algorithms developed to analyze English-language content.
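The article does not publish the detector itself. As a rough illustration only, here is a minimal Python sketch of the idea, assuming a hand-built lexicon of gendered forms and a simple majority vote over a user's tweets. The `GENDERED_FORMS` table, the `PATTERN` regex, and the `gender_votes`/`classify_user` helpers are all hypothetical and are not the McGill team's implementation.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical toy lexicon (assumption): a real system would need a full
# French morphological dictionary of adjectives and past participles.
GENDERED_FORMS = {
    "content": "M", "contente": "F",
    "fatigué": "M", "fatiguée": "F",
    "heureux": "M", "heureuse": "F",
    "prêt": "M", "prête": "F",
    "allé": "M", "allée": "F",
    "déçu": "M", "déçue": "F",
}

# Matches "je suis X", "je ne suis pas X", and informal "je suis pas X",
# optionally with an intensifier such as "très" or "trop" before X.
PATTERN = re.compile(
    r"\bje\s+(?:ne\s+)?suis\s+(?:pas\s+)?(?:(?:très|trop|si|vraiment)\s+)?(\w+)",
    re.IGNORECASE,
)


def gender_votes(tweet: str) -> Counter:
    """Count masculine ("M") and feminine ("F") cues in one tweet."""
    votes = Counter()
    for match in PATTERN.finditer(tweet):
        label = GENDERED_FORMS.get(match.group(1).lower())
        if label is not None:  # ignore words not in the toy lexicon
            votes[label] += 1
    return votes


def classify_user(tweets: List[str]) -> Optional[str]:
    """Majority vote over all of a user's tweets; None if there is no clear signal."""
    total = Counter()
    for tweet in tweets:
        total += gender_votes(tweet)
    if total["M"] == total["F"]:
        return None  # no cues found, or an exact tie
    return "M" if total["M"] > total["F"] else "F"


if __name__ == "__main__":
    tweets = [
        "Je suis trop fatiguée ce soir",
        "je ne suis pas contente du match",
        "Bon, je suis prêt",
    ]
    print(classify_user(tweets))  # two feminine cues vs one masculine: prints F
```

Restricting matches to a lexicon, rather than guessing gender from word endings, avoids invariant adjectives such as "triste" and keeps the first-person form of "suivre" ("je suis le match", I'm following the game) from producing spurious votes.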