Today, society is immersed in a hyper-connected era in which technological development and globalization mean that data is produced at every moment. Big Data therefore represents a massive opportunity for the marketplace and for corporations to improve their plans and decision-making.
Darmont and Loudcher (2019, p. 2) establish that a big data system can be characterized along four dimensions: volume, velocity, variety, and veracity, which form the 4Vs of Big Data.
What does veracity in Big Data mean?
Veracity refers to the “biases, noise and abnormality in data” (Inside Big Data, 2013), that is, the degree of reliability of the information received.
This feature of Big Data is probably the one that poses the most significant challenge. The enormous volume of data being produced makes its overall reliability suspect: the countless variety of sources means that much of it arrives incorrect. Consequently, depending on its origin, “data processing technologies, and methodologies used for data collection and scientific discoveries, big data can have more/fewer biases” (Lukoianova and Rubin, 2014, p. 7).
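To make the idea of a "degree of reliability" concrete, here is a minimal sketch of how a pipeline might score incoming records for veracity by penalizing incompleteness and implausible values. The field names, penalties, and plausibility range are all invented for the illustration; real systems use far richer checks.

```python
# Toy veracity scoring: penalize missing fields and implausible values.
# Field names, penalty weights, and ranges are hypothetical.
REQUIRED_FIELDS = {"sensor_id", "timestamp", "temperature"}
PLAUSIBLE_RANGE = (-50.0, 60.0)  # assumed plausible temperature range (deg C)

def veracity_score(record: dict) -> float:
    """Return a 0..1 score: 1.0 = complete and plausible, lower = suspect."""
    score = 1.0
    missing = REQUIRED_FIELDS - record.keys()
    score -= 0.25 * len(missing)                 # penalize each missing field
    temp = record.get("temperature")
    if temp is not None and not (PLAUSIBLE_RANGE[0] <= temp <= PLAUSIBLE_RANGE[1]):
        score -= 0.5                             # penalize implausible values
    return max(score, 0.0)

records = [
    {"sensor_id": "a1", "timestamp": 1700000000, "temperature": 21.5},
    {"sensor_id": "a2", "timestamp": 1700000060, "temperature": 999.0},  # noisy
    {"sensor_id": "a3", "timestamp": 1700000120},                        # incomplete
]
print([veracity_score(r) for r in records])  # [1.0, 0.5, 0.75]
```

Noisy and incomplete records are not discarded outright; the score lets downstream analysis weight them according to how much they can be trusted.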
Veracity and Social Networks
Facebook is working with artificial intelligence tools that rely less on human-generated input, which can be influenced by opinion or a lack of data. The veracity of Big Data is filtered to avoid misinformation, fake news, and hate speech. For that, the company has deployed SimSearchNet++, “an improved image matching model that is trained using self-supervised learning to match variations of an image with a very high degree of precision and improved recall” (Facebook, 2020).
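SimSearchNet++'s internals are not public, but the general idea of embedding-based image matching can be sketched: each image is mapped to a vector, and near-duplicates of known misinformation are found by comparing vectors with cosine similarity. The embeddings, labels, and threshold below are invented for the illustration.

```python
# Sketch of embedding-based near-duplicate matching (hypothetical data).
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embedding database of already fact-checked images.
flagged_db = {
    "debunked_claim_01": [0.9, 0.1, 0.3],
    "debunked_claim_02": [0.2, 0.8, 0.5],
}

def match_image(embedding, threshold=0.98):
    """Return the label of a near-duplicate flagged image, or None."""
    for label, ref in flagged_db.items():
        if cosine_similarity(embedding, ref) >= threshold:
            return label
    return None

# A slightly altered copy of the first flagged image still matches.
print(match_image([0.91, 0.09, 0.31]))  # debunked_claim_01
print(match_image([0.1, 0.1, 0.9]))    # None
```

The self-supervised training Facebook describes is what makes the embeddings robust to crops, filters, and overlaid text; the lookup step itself is conceptually as simple as this.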
This fact-checking also works with predictions: according to Assiri (2020, p. 11), predictive methods have been used to forecast the veracity of social media information, since social media is one of the primary accessible sources of big data.
According to Facebook, this strategy is 95% effective: a user who is warned that content contains erroneous information will usually decide not to view it. Nevertheless, producing those labels is proving to be a real challenge, not only because of the large amount of unverified data on the platform but also because of hate speech embedded in images and videos.
SimSearchNet engineers state that the tool is “currently inspecting every image uploaded to Instagram and Facebook — billions a day” (TechCrunch, 2020). Each image is then checked against databases dedicated to the task, meaning billions of images are verified per day.
The difficulty is that the software used for detection must be "trained" to find duplicates or slightly modified versions of that content, which requires a great deal of time and resources and could delay the correct labeling of these publications. However, Facebook claims to have already found the answer to this problem.
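One common technique for catching slightly modified duplicates is a perceptual hash: reduce the image to a tiny grid, hash the pattern of above/below-average pixels, and compare hashes by Hamming distance. This toy version works on a flat 4x4 grayscale grid; production systems (including whatever Facebook uses internally, which is not public) are far more sophisticated.

```python
# Toy average-hash: robust to small brightness changes and noise.
def average_hash(pixels):
    """pixels: flat list of 16 grayscale values (0-255) -> 16-bit hash."""
    avg = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits

def hamming_distance(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")

original = [200, 200, 50, 50, 200, 200, 50, 50,
            10, 10, 240, 240, 10, 10, 240, 240]
# The same image, slightly brightened with a little noise added.
modified = [210, 205, 60, 55, 208, 212, 58, 62,
            20, 15, 250, 245, 18, 22, 248, 252]

d = hamming_distance(average_hash(original), average_hash(modified))
print(d)  # 0 -> treated as a duplicate despite pixel-level changes
```

Because the hash captures the coarse structure rather than exact pixel values, re-uploads with small edits still collide with the original, which is exactly what duplicate labeling needs.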
Not all companies will opt for the same methodology when developing their capabilities with Big Data technologies. However, in every sector there is the possibility of using these new technologies and analytics to improve decision-making and performance, both internally and in the market.
Have you ever had a terrible experience after believing fake news? What was it?
Written by: Carlos Sáez Muñoz
Assiri F. (2020) “Methods for assessing, predicting, and improving data veracity: A survey” [Online] Available at https://revistas.usal.es/index.php/2255-2863/article/download/ADCAIJ202094530/25035/ (Accessed 23 February 2021).
Darmont J. and Loudcher S. (2019) “Utilizing Big Data paradigms for Business Intelligence.” IGI Global, USA.
Facebook (2020) “Here is how we are using AI to help detect misinformation” [Online] Available at https://ai.facebook.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/ (Accessed 23 February 2021).
Inside Big Data (2013) “Beyond Volume, Variety, and Velocity is the issue of Big data veracity” [Online] Available at https://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/ (Accessed 23 February 2021).
Lukoianova T. and Rubin V. (2014) “Veracity roadmap: Is big data objective, truthful and credible? Advances in Classification Research Online.” [Online] Available at https://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=1267&context=fimspub (Accessed 23 February 2021).
TechCrunch (2020) “Facebook upgrades its AI to better tackle COVID-19 misinformation and hate speech” [Online] Available at https://techcrunch.com/2020/05/12/facebook-upgrades-its-ai-to-better-tackle-covid-19-misinformation-and-hate-speech/ (Accessed 23 February 2021).
Social media today has a very strong influence on the sharing of news, especially fake news related to real-world emergencies, while credible sources are limited and often unavailable. Many networks have tackled this problem through fact-checking; alongside Facebook's efforts, Twitter data has been used in a process named Rumor Gauge.
Rumor Gauge is a model developed for automated rumor verification. To predict the veracity of rumors, three aspects were examined:
1) The linguistic style used to express the rumor
2) The characteristics of the people involved in propagating the rumor
3) The network used for propagation, and other dynamics involved
The veracity of 75% of the rumors was predicted faster than by any other public source, such as journalists or law enforcement officials.
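The three feature families listed above can be sketched as a simple weighted model: each family yields a score in [0, 1], and a logistic combination predicts how likely the rumor is to be true. The weights and scores here are invented for the illustration; Rumor Gauge's actual features and training procedure come from the original research and are not reproduced here.

```python
# Sketch of combining rumor features into a veracity probability.
import math

# Hypothetical weights for each feature family.
WEIGHTS = {"linguistic": 1.5, "user": 2.0, "propagation": 2.5}
BIAS = -3.0

def predict_veracity(features):
    """features: dict mapping family name -> score in [0, 1].
    Returns the estimated probability that the rumor is true."""
    z = BIAS + sum(WEIGHTS[k] * features[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function

credible = {"linguistic": 0.9, "user": 0.8, "propagation": 0.9}
dubious = {"linguistic": 0.2, "user": 0.1, "propagation": 0.3}
print(round(predict_veracity(credible), 2))  # high, roughly 0.9
print(round(predict_veracity(dubious), 2))   # low, roughly 0.15
```

The appeal of such a model is that each feature family can be computed as the rumor spreads, so the prediction can be updated in real time rather than waiting for an official debunking.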
Very Insightful Article! Great Work.
Since the internet has become such a significant source of knowledge for many people, there is a clear need to verify the veracity of claims in order to, for example, detect the dissemination of false information. Because almost anyone (individuals, businesses, organizations, and so on) can write and publish something on the internet, the information is often incomplete, vague, contradictory, biased, or incorrect. Furthermore, given the large quantities of heterogeneous information generated and the speed at which it is produced, manually assessing its veracity quickly becomes impossible. A decision support system is only as effective as the data it uses to make decisions, and the issue of data veracity arises in particular when data is obtained from social media and other open sources. As a result, computerized, automated methods and tools capable of processing and analyzing large quantities of data are needed.
Good content, Carlos.