Tonality

What is tonality?

Tonality refers to the different moods expressed in a text. A text might be positive overall, with a few negative statements and a skeptical utterance here and there. Tonality is the quantification of these moods.

What different tonalities are currently available?

Positivity, Negativity, Fear, Hate, Love, Skepticism, Violence, and Desire in all supported languages.

What languages do you support?

We currently support Albanian, Arabic, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Korean, Latvian, Lithuanian, Malay, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Thai, Turkish, Urdu, and Vietnamese. We are constantly adding new languages to the system. If you have projects in an unsupported language, don’t hesitate to contact us. We might be able to cook something up quite fast.

Can I get a custom tonality measure to fit my data or task?

Because of the nature of our system, implementing custom tonality measures is relatively easy. Do you want to measure the attitude towards car brands with respect to rust? Or perhaps the general well-being of Russians? Please contact us directly and tell us what you are thinking about.

What do the tonality scores mean?

When you call /tonality, you will get back something similar to this:

    {
        "tone": "violence",
        "score": 1,
        "normalizedScore": 0.5
    },
    {
        "tone": "love",
        "score": 0,
        "normalizedScore": 0
    },
    {
        "tone": "skepticism",
        "score": 1,
        "normalizedScore": 0.5
    }

This will be the result for each text you send in for analysis. The score is a measurement of the amount of the given tonality found in each text. There is no upper limit to it, and the longer the text you send in, the higher the score can be. The normalizedScore is simply the share of the given tonality in the total score across all tonalities; in the example, we have an overall tonality of 2 for the text, with violence and skepticism scoring 1 each. As a result, we get a normalizedScore of 0.5 for both violence and skepticism.
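The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the ratio described in the text, not our server-side implementation; the function name normalize_scores is hypothetical, and returning 0 for every tone when the total is 0 is an assumption.

```python
def normalize_scores(scores):
    """Return each tone's share of the total score across all tones.

    Assumption: when no tonality is detected (total of 0), every
    normalized score is reported as 0.
    """
    total = sum(scores.values())
    if total == 0:
        return {tone: 0.0 for tone in scores}
    return {tone: score / total for tone, score in scores.items()}


# Scores taken from the example response above.
scores = {"violence": 1, "love": 0, "skepticism": 1}
print(normalize_scores(scores))
# {'violence': 0.5, 'love': 0.0, 'skepticism': 0.5}
```

Because the normalized values are shares of one total, they sum to 1 whenever at least one tonality is present, which is what makes them comparable across texts of different lengths.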

How are tonality scores calculated?

We use a combination of algorithms and data sources to determine the tonality of a text. These include information about how important words are across overall language use, which words are often used in a similar manner, which words are more meaningful in combination than by themselves, and so on. We analyze the text you send in by looking for concepts that trigger one or more tonalities. Several algorithms then examine the context in which these keywords occur to decide whether to increase or reduce the score.

How do I best present the tonality scores in my integrated app or website?

It depends on your application. We have seen several different implementations that have all been successful in showcasing tonality in their specific cases. One way could be to use the normalizedScore as values in a pie chart, donut chart, or area chart, to show the spread of each tonality. If the absolute values of the tonalities are more important, such as in cases where neutrality should be easy to deduce, individual bar charts for each tonality might be a better choice. Please experiment, and tell us about your progress!
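As a starting point for the pie or donut chart idea, here is a minimal Python sketch that turns tonality results into label and percentage pairs that most charting libraries accept. It assumes the results have been parsed into a list of dictionaries with the "tone" and "normalizedScore" keys shown in the example response; the function name chart_slices is hypothetical.

```python
def chart_slices(tonalities):
    """Convert tonality results into (label, percentage) pairs, largest first.

    Tones with a normalizedScore of 0 are dropped, since they would
    produce empty slices in a pie or donut chart.
    """
    slices = [
        (item["tone"], round(item["normalizedScore"] * 100, 1))
        for item in tonalities
        if item["normalizedScore"] > 0
    ]
    # sorted() is stable, so tones with equal shares keep their original order.
    return sorted(slices, key=lambda pair: pair[1], reverse=True)


# Results shaped like the example response (assumed to be parsed into a list).
results = [
    {"tone": "violence", "score": 1, "normalizedScore": 0.5},
    {"tone": "love", "score": 0, "normalizedScore": 0},
    {"tone": "skepticism", "score": 1, "normalizedScore": 0.5},
]
print(chart_slices(results))
# [('violence', 50.0), ('skepticism', 50.0)]
```

For the bar chart case, you would pass the raw score values instead, so that a text with little tonality overall still reads as close to neutral.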

Why not call tonality sentiment like everybody else?

Sentiment generally refers to the positive-negative spectrum of semantic analysis. While tonality does include this grading, we do so much more, and we wanted to call it something that reflects that. Skepticism, love, and violence are all examples of moods we would miss if we only used the classic sentiment classification approach.

How well does your tonality algorithm perform compared to others?

For quality assessment purposes, our scores are continuously evaluated against research benchmarks such as the RepLab collection of tweets for corporate reputation management and the Stanford Sentiment Treebank collection of positive and negative sentences. We have not optimised for these collections specifically (although we did participate in the RepLab experiments with our system), and our system at present (Spring 2015) performs on par with reported human assessment performance on these collections.

How about sarcasm and irony?

Sarcasm and irony are difficult even for humans to interpret, and even more so in textual conversations. We are actively researching the possibilities of automatically detecting sarcasm; subscribe to our blog for news in this area.

Multi-document Summarization

What is stories?

Stories is our summarization algorithm. It looks through heaps of documents and finds similarities between them. Documents that are similar will end up in the same story. Each story is then summarized based on the content of the document members. The result is a list of the most significant stories of your document collection.

How does the summarization work?

We make pairwise comparisons of your texts, looking for similarities using our semantic memories. This means that the texts don’t necessarily have to express everything in exactly the same way, or be written in what is considered “correct” English. Our technology works just as well for the written words of a seven-year-old as for those of a famous author.

What is the difference between stories and topics?

/stories brings you the basic functionality of multi-document summarization. Here you will get back the gathered text collections with a summary for each cluster.

/topics takes this to the next level by allowing you to make a more fine-grained analysis. You can provide keywords that are of particular interest, as well as words that the summarizer should ignore in its analysis (if you are Burger King providing documents about user experience, your own brand might not be that relevant for the clustering). At first glance this might seem to be a rather small improvement, but it really opens up a completely new type of analysis. We developed Gavagai Explorer on top of this functionality to showcase the possibilities.

I cannot get any topics from my data. What does this mean?

How large is your text collection? Usually this is a sign of too little data (you should have at least 20 or so documents in your collection), or of a text collection that is too heterogeneous. If you are unsure what the problem is, please send your question to our support and we will look into it more closely.

Other

What is a Gavagai?

If you travel to a foreign country and bump into a native, you might not share a common language for communication. Despite this, both of you make your best efforts to be understood. They might point at a rabbit and shout “gavagai”, which, based on the situation and your intuition, you would deduce to mean rabbit. But it might as well mean hairy animal, white fluffy thing, edible thing, or really anything. In the end, it doesn’t matter. After a while you will learn how to use the word in more and more types of conversations, and while your internal understandings of it might not match up exactly, you can both use it when you talk to each other. This is a thought experiment about the indeterminacy of translation, introduced by Willard Quine, and our semantic memories work in pretty much the same way.

I cannot access my account

Please contact our support and we will help you out as soon as possible.

How do I contact you about other questions?

Our support mail is suitable for technical inquiries. For any other type of question, please use our contact form.