The field of analytics depends on statistical patterns and mathematical models. Numbers can play a vital role in representing success or failure; they can also, unintentionally or sometimes intentionally, mislead the observer and create a false impression. Take the common expression “I am 99% sure that …”. Strictly speaking, sureness is not something that can be measured as a percentage: one is either sure or not sure. Yet attaching a high percentage to the claim makes it far more convincing to an audience.

Psychological Wallop

So why do people do it? The answer lies in how human psychology responds to numerical values. People compare things for quality, efficiency and value. Someone who likes cotton shirts will prefer a 100% cotton shirt, even though it costs more than one with 95% cotton. An obvious difference in the numbers easily conveys a sense of purity and high quality. Similarly, in text analytics, high accuracy values are used to influence the audience. In reality, accuracy or precision depends on the nature of the data on which it is reported and on how relevant the results are to the requirements of the system’s end user. A highly accurate system is good for nothing if it cannot produce the outcome its user wants.

Is Accuracy a Measure of Dominance?

Suppose system A is 80% accurate while system B is 87% accurate. Does the difference between the two numbers justify the supremacy of system B over system A? There are other factors to ponder. How similar were the inputs on which the two systems were tested? Are the expected outcomes of the two systems comparable? Data from different domains can yield different accuracy levels that cannot be compared with each other.
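The question of whether 87% beats 80% can be made concrete with a small sketch. Everything in it is hypothetical: the labels, the two test sets, and the behaviour of both "systems" are invented for illustration only. It shows how a system that never finds a single negative document can still report the higher accuracy, simply because it was tested on different data.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical test set for system A: balanced, 50 documents per class.
gold_a = ["neutral"] * 50 + ["negative"] * 50
# Assumed behaviour of system A: 40 of each class labelled correctly.
preds_a = (["neutral"] * 40 + ["negative"] * 10
           + ["negative"] * 40 + ["neutral"] * 10)

# Hypothetical test set for system B: 87 of 100 documents are neutral.
gold_b = ["neutral"] * 87 + ["negative"] * 13
# Assumed behaviour of system B: it answers "neutral" for everything.
preds_b = ["neutral"] * 100

accuracy_a = accuracy(preds_a, gold_a)
accuracy_b = accuracy(preds_b, gold_b)

# How many negative documents did each system actually find?
negatives_found_a = sum(p == g == "negative" for p, g in zip(preds_a, gold_a))
negatives_found_b = sum(p == g == "negative" for p, g in zip(preds_b, gold_b))

print(f"System A: accuracy {accuracy_a:.0%}, negatives found {negatives_found_a}")
print(f"System B: accuracy {accuracy_b:.0%}, negatives found {negatives_found_b}")
```

Under these assumptions system B wins on the headline number while being useless to anyone who cares about negative documents, which is why the two figures should only be compared when both systems are tested on the same inputs.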
Response versus Reality

Let’s take the example of data cleaning, an integral part of any text analytics project. We all know that to improve the quality of any product, the raw input should be as pure as possible. The data cleaning module does that job of purification for text analyzers. Suppose we claim that our system does an outstanding job in this area and cleans data with an accuracy of 97%. The immediate response would be a WOW! Ninety-seven percent is sky-high accuracy; it suggests that the output is the beau idéal and that most of the noise has been either cleaned or transformed into meaningful content. The reality may differ from this depiction. Issues can still be present in the output, for example in the format of the cleaned text. Take date normalization: maybe the format used is not acceptable to the user, or some abbreviations (like Jan for January) were counted as correct but are not satisfactory. So, in a nutshell, the accuracy number tells the story of the sample data and its manual verification, but in the case of text this is not as simple as it seems.

The Pursuit of Perfection

Then what is the ideal system, and where can it be found? The answers are none and nowhere. “Perfection is not attainable, but if we chase perfection we can catch excellence,” said Vince Lombardi, head coach of the Green Bay Packers. All right, we agree that perfection is impossible; the next question is which system is better if accuracy can be a misleading measure. That question might be answered with a few words: adaptable, tunable and equipped with more options. In my opinion, if system A is x% less accurate than system B but is flexible enough to provide an interface for fine-tuning its output with contextual values, then system A is the winner.

Bottom Line

Accuracy figures are flashy, bold and eye-catching, and they create a spark in the reader’s mind, but in fact they are part of a numbers game, and it is not necessarily only No. 1 that is meant to survive.
Summary

Nothing is accurate without perspective in text analytics; almost any statement can be assessed as positive or negative depending on its context. The word “cheap” is considered positive when it refers to price, but when it describes quality it is surely negative. It is the brilliance of the human brain that enables us to recognize effectiveness and potency in context. One system can be preferred over another because of its adaptability and its ability to take context into account.
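The “cheap” example can be sketched as a context-aware lookup. This is only a minimal illustration, not how any particular text analytics product works: the `CONTEXT_POLARITY` table, the aspect names and the `polarity` function are all hypothetical.

```python
# Hypothetical lexicon: the polarity of a word depends on the aspect it describes.
CONTEXT_POLARITY = {
    ("cheap", "price"): "positive",
    ("cheap", "quality"): "negative",
}


def polarity(word, aspect, lexicon=CONTEXT_POLARITY, default="unknown"):
    """Look up the polarity of a word for a given aspect, ignoring case."""
    return lexicon.get((word.lower(), aspect.lower()), default)


print(polarity("Cheap", "price"))     # positive
print(polarity("Cheap", "quality"))   # negative
print(polarity("Cheap", "delivery"))  # unknown

# A tunable system lets the user add context of their own
# without touching the built-in table.
custom = dict(CONTEXT_POLARITY)
custom[("cheap", "delivery")] = "positive"
print(polarity("cheap", "delivery", lexicon=custom))  # positive
```

The last lines hint at why adaptability matters: the same word gets a sensible reading in a new context as soon as the user is allowed to supply it.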