
Veracity of Big Data and fake news


Today, society is immersed in a hyper-connected era in which technology and globalization ensure that data is produced at every moment. Big Data represents a massive opportunity for the marketplace and for corporations to improve their planning and decision-making.

Darmont and Loudcher (2019, p. 2) establish that a big data system can be characterized along four dimensions: volume, velocity, variety, and veracity, which form the "4Vs" of Big Data.


[Image: typing fake news on a computer]



What does veracity in Big Data mean?


Veracity refers to the "biases, noise and abnormality in data" (Inside Big Data, 2013); that is, the degree of reliability of the information received.
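
To make this concrete: as a rough illustration (not a metric defined by any of the sources cited here), veracity checks often begin with simple signals such as missing values and statistical outliers. The Python sketch below computes two such signals for a numeric field; the example data and thresholds are hypothetical.

```python
import statistics

def veracity_signals(values, z_threshold=3.5):
    """Two rough veracity signals for a numeric field: the share of
    missing entries (noise) and the share of extreme outliers
    (abnormality), flagged with the robust modified z-score."""
    present = [v for v in values if v is not None]
    missing_rate = 1 - len(present) / len(values)

    median = statistics.median(present)
    mad = statistics.median(abs(v - median) for v in present)
    # Modified z-score (Iglewicz-Hoaglin); guard against a zero MAD.
    outliers = [v for v in present
                if mad and 0.6745 * abs(v - median) / mad > z_threshold]
    outlier_rate = len(outliers) / len(present)

    return {"missing_rate": round(missing_rate, 3),
            "outlier_rate": round(outlier_rate, 3)}

# Hypothetical sensor feed with one gap and one implausible reading.
readings = [21.0, 20.5, None, 21.3, 19.8, 250.0, 20.9]
print(veracity_signals(readings))  # {'missing_rate': 0.143, 'outlier_rate': 0.167}
```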


This feature of Big Data probably poses the most significant challenge. The enormous quantity of data produced makes its overall reliability difficult to trust: given the countless variety of sources, much of the data arrives incomplete or incorrect. Consequently, depending on its origin, "data processing technologies, and methodologies used for data collection and scientific discoveries, big data can have more/fewer biases" (Lukoianova and Rubin, 2014, p. 7).


Veracity and Social Networks


Facebook is working with artificial intelligence tools that rely less on human-generated input, which can be influenced by opinion or a lack of data. Big Data's veracity is being filtered to avoid misinformation, fake news, and hate speech. To that end, the company has deployed SimSearchNet++, "an improved image matching model that is trained using self-supervised learning to match variations of an image with a very high degree of precision and improved recall" (Facebook, 2020).
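
SimSearchNet++ itself is not public, but the general idea of matching near-duplicate images can be sketched with an off-the-shelf perceptual hash. The example below uses the ImageHash library as a deliberately simple stand-in for Facebook's learned model; the file names and distance threshold are illustrative assumptions.

```python
# pip install Pillow ImageHash
from PIL import Image
import imagehash

def is_near_duplicate(path_a, path_b, max_distance=8):
    """Compare two images with a perceptual hash (pHash). Small Hamming
    distances survive re-encoding, resizing, and light edits, which is
    the near-duplicate property an image-matching model exploits."""
    hash_a = imagehash.phash(Image.open(path_a))
    hash_b = imagehash.phash(Image.open(path_b))
    return (hash_a - hash_b) <= max_distance  # '-' gives Hamming distance

# Hypothetical files: a debunked original and a cropped re-upload.
print(is_near_duplicate("debunked_original.jpg", "reupload_cropped.jpg"))
```

A production system replaces the fixed hash with a learned embedding, so that heavier edits such as overlaid text, crops, or filters still map to nearby points.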

This fact-checking also works with predictions: according to Assiri (2020, p. 11), predictive methods have been used to forecast the veracity of social media information, since social media is one of the primary accessible sources of big data.
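
As a minimal sketch of what such a veracity predictor looks like in practice (a generic illustration, not a specific method from Assiri's survey), the following trains a bag-of-words classifier on a few hypothetical labeled posts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled posts: 1 = credible, 0 = likely false.
posts = [
    "Health ministry publishes the 2021 vaccination schedule",
    "Scientists confirm drinking bleach cures all viruses",
    "Central bank announces its interest rate decision",
    "Leaked memo proves the moon landing was filmed in a studio",
]
labels = [1, 0, 1, 0]

# TF-IDF features plus logistic regression: a deliberately simple
# stand-in for the predictive approaches surveyed in the literature.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

# Probabilities of [likely false, credible] for a new, unseen post.
print(model.predict_proba(["New memo claims bleach cures viruses"])[0])
```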


According to Facebook, this strategy is 95% effective: a user who is warned that content contains erroneous information will usually decide not to view it. Nevertheless, producing those labels is proving to be a real challenge, not only because of the large amount of unverified data on the platform but also because of hate speech embedded in images and videos.


SimSearchNet's engineers state that the tool is "currently inspecting every image uploaded to Instagram and Facebook — billions a day" (TechCrunch, 2020). Each upload is then checked against databases specific to the task; in other words, billions of images are verified every day.
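
At that scale, the database lookup matters as much as the matching model. Below is a hedged sketch of the lookup side, again using perceptual hashes rather than Facebook's actual embeddings; the flagged hashes and file name are made up for illustration.

```python
from PIL import Image
import imagehash

# Hypothetical database: pHashes of images already rated false by
# fact-checkers (the hex strings here are made-up placeholders).
flagged_hashes = {
    imagehash.hex_to_hash("e1b0c4d2a5f39078"),
    imagehash.hex_to_hash("9f4a2c6e1d3b8507"),
}

def matches_flagged(path, max_distance=8):
    """Return True if an upload is within max_distance bits of any
    flagged hash. A linear scan is fine for a demo; real systems use
    approximate nearest-neighbour indexes to cope with billions of items."""
    h = imagehash.phash(Image.open(path))
    return any(h - flagged <= max_distance for flagged in flagged_hashes)

print(matches_flagged("new_upload.jpg"))  # hypothetical file name
```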



The difficulty is that the detection software must be "trained" to find duplicates or slightly modified versions of that content, which demands a great deal of time and resources and could delay the correct labeling of these publications. However, Facebook claims to have already found an answer to this problem.


Not every company will opt for the same methodology when developing its Big Data capabilities. However, in every sector there is an opportunity to use these technologies and analytics to improve decision-making and performance, both internally and in the market.


Have you ever had a bad experience after believing fake news? What was it?


Written by: Carlos Sáez Muñoz



Assiri, F. (2020) "Methods for assessing, predicting, and improving data veracity: A survey" [Online]. Available at: https://revistas.usal.es/index.php/2255-2863/article/download/ADCAIJ202094530/25035/ (Accessed 23 February 2021).


Darmont, J. and Loudcher, S. (2019) "Utilizing Big Data paradigms for Business Intelligence." IGI Global, USA.


Facebook (2020) "Here's how we're using AI to help detect misinformation" [Online]. Available at: https://ai.facebook.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/ (Accessed 23 February 2021).


Inside Big Data (2013) "Beyond Volume, Variety, and Velocity is the issue of Big Data veracity" [Online]. Available at: https://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/ (Accessed 23 February 2021).


Lukoianova, T. and Rubin, V. (2014) "Veracity roadmap: Is big data objective, truthful and credible?" Advances in Classification Research Online [Online]. Available at: https://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=1267&context=fimspub (Accessed 23 February 2021).


TechCrunch (2020) "Facebook upgrades its AI to better tackle COVID-19 misinformation and hate speech" [Online]. Available at: https://techcrunch.com/2020/05/12/facebook-upgrades-its-ai-to-better-tackle-covid-19-misinformation-and-hate-speech/ (Accessed 23 February 2021).

Comments

  1. Prajakta Jadhav, 10 April 2021 at 04:57

    Social media today has a very large and strong impact on the sharing of news, especially fake news related to real-world emergencies. Credible sources are limited and often unavailable. Many networks have tackled this problem through fact-checking; unlike Facebook, Twitter also carried out a process named Rumor Gauge.
    Rumor Gauge is a model developed for automated rumor verification. To predict the veracity of rumors, three aspects were examined:
    1) The linguistic style used to express the rumor
    2) The characteristics of the people involved in propagating the rumor
    3) The network used to propagate it, and other dynamics involved

    The veracity of 75% of rumors was predicted faster than by any other public source, such as journalists or law enforcement officials.

    Very Insightful Article! Great Work.

  2. Since the internet has become such a significant source of knowledge for many people, there is a clear need to verify the veracity of claims in order to, for example, detect the dissemination of false information. Since almost anyone (individuals, businesses, organizations, and so on) can write and publish something on the internet, the information is often incomplete, vague, contradictory, biased, or incorrect. Furthermore, due to the large quantities of heterogeneous information generated and the speed at which it is generated, manually assessing its veracity rapidly becomes impossible. A decision support system is only as effective as the data it uses to make decisions. The issue of data veracity arises in particular when data is obtained from social media and other open sources. As a result, computerized, automated methods and tools capable of processing and analyzing large quantities of data are needed.

    Good content, Carlos.

