Traditionally, data could be collected only from databases and spreadsheets. Today it is gathered from many sources, such as images, video, emails, PDFs, audio, social media posts and much more. Big data sourced this way is generally diverse and does not fit into neat relational structures.
Examples of Big Data Variety
- Format – Format refers to the different types of databases and files in which data is stored; a single business may store data in dozens of database formats.
- Media – Media files such as photos, audio and video exhibit extreme data variety.
- Structured – Structure refers to the variety of data models and data types; for instance, a business may have hundreds of databases, each with its own unique data model.
- Unstructured – Unstructured data is raw data that is not organised in a way a machine can readily read.
- Natural Language Processing – Natural language is complex. French, for example, exhibits a great deal of variety because there are many ways to communicate the same thing. The difficulty for NLP shows when a machine reads a sarcastic statement as positive although the correct interpretation is negative.
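As a rough illustration of format variety, the sketch below reads the same kind of record from a CSV source and a JSON source and maps both onto one common shape. The sample data and the field names (`name`, `amount`, `customer`, `total`) are hypothetical and chosen only for this example.

```python
import csv
import io
import json

# Hypothetical sample inputs: the same kind of record in two formats.
CSV_TEXT = "name,amount\nAsha,120.50\nRavi,75.00\n"
JSON_TEXT = '[{"customer": "Meera", "total": "42.10"}]'


def from_csv(text):
    """Read CSV rows into the common {'name', 'amount'} shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": row["name"], "amount": float(row["amount"])} for row in reader]


def from_json(text):
    """Map differently named JSON fields onto the same common shape."""
    return [
        {"name": item["customer"], "amount": float(item["total"])}
        for item in json.loads(text)
    ]


# Once both sources share one shape, they can be analysed together.
records = from_csv(CSV_TEXT) + from_json(JSON_TEXT)
total_amount = sum(record["amount"] for record in records)
```

Each new format needs only its own small adapter function; the analysis code that follows does not have to know where a record came from.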
Challenges that Businesses/Companies face while handling Data Variety
- A confusing variety of big data technologies
- Complexities of managing variety data
- Complex process of converting big data into valuable insights for business
- Insufficient understanding of big data
- Big data security loopholes
Despite these challenges, a business that brings the right resources together can work with big data variety with relative ease.
While businesses are finding ways to work on their big data and analytics, there is a serious need to harness the variety of this data, so that they maximize the return on their analytics and spread the benefits across business areas.
A business can put its big data into data lake repositories and then run analytics on it; adding query tools such as Hive and Pig later on can help the company sort through the data. Adding the right business context, so that the right analytics questions are asked, also improves business outcomes. The business could also use its systems of record, and the data they inherently contain, as drivers for big data analytics.
Curation is one way to tackle variety in data that has to pass through multiple systems and records. Machine learning and algorithms can support data quality by cross-referencing and connecting data from a variety of sources into a single source. The end result is not a system of record but a system of reference that can cope with the data variety that concerns so many businesses.
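The cross-referencing step can be sketched in a few lines. The example below is a minimal illustration, not a production curation method: it uses simple string similarity from Python's standard `difflib` module in place of machine learning, and the supplier names, the `link_records` helper and the 0.6 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical supplier names held in two separate purchasing systems.
SYSTEM_A = ["Acme Industries Ltd", "Globex Corporation"]
SYSTEM_B = ["ACME Industries Limited", "Initech", "Globex Corp."]


def similarity(a, b):
    """Return a case-insensitive similarity score between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(left, right, threshold=0.6):
    """Link each name in `left` to its closest name in `right`.

    A name with no candidate at or above the threshold maps to None.
    """
    links = {}
    for name in left:
        best = max(right, key=lambda candidate: similarity(name, candidate))
        links[name] = best if similarity(name, best) >= threshold else None
    return links


# The resulting mapping acts as a small "system of reference".
links = link_records(SYSTEM_A, SYSTEM_B)
```

The mapping does not replace either source system; it only records which entries refer to the same real-world supplier, which is the role a system of reference plays.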
Achieving high-quality data while dealing with this variety is important, and it is equally important that the business invests in steps such as extract, transform and load (ETL) and master data management (MDM).
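To make the ETL idea concrete, here is a minimal sketch using only Python's standard library. The CSV content, the `orders` table and its columns are hypothetical; a real pipeline would read from actual source systems and load into a proper data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent spacing, casing and date format.
RAW_CSV = "customer,order_date,amount\n asha ,17/03/2021,120.50\nRavi,18/03/2021,75\n"


def extract(text):
    """Extract: read the raw rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert DD/MM/YYYY to ISO dates, parse amounts."""
    cleaned = []
    for row in rows:
        day, month, year = row["order_date"].split("/")
        cleaned.append(
            (row["customer"].strip().title(), f"{year}-{month}-{day}", float(row["amount"]))
        )
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into a relational table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders (customer TEXT, order_date TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
loaded = connection.execute(
    "SELECT customer, order_date, amount FROM orders ORDER BY order_date"
).fetchall()
```

Keeping the three stages as separate functions means a new source format only needs a new `extract`, while the cleaning rules and the target table stay the same.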
If the right business context is applied to unstructured and semi-structured data, the business can use the variety of its data effectively to achieve growth and measure that growth through analytics.
Do you think data variety can be handled effectively? If yes, why? And if no, why not?
Written by Prajakta Jadhav
Keywords – Big Data, Data Variety, Challenges, Business, Structured, Unstructured data
_________________________________________________________________________________
Thank you for the useful blog post.
Variety is quickly becoming a third "V-factor" in big data, alongside volume and velocity. The issue is particularly prevalent in large organizations, which have numerous record-keeping systems and a large amount of structured and unstructured data to handle. Because of functional duplication, these companies often have numerous purchasing, production, sales, finance, and other departmental functions in separate branches and branch facilities, resulting in "siloed" structures. As a result, as they work on their big data and analytics initiatives, businesses are discovering that they need to use a variety of data and system sources to get the most out of their analytics and to spread the benefits of what they learn across as many areas of the company as possible. A good example is decentralized purchasing functions, which have their own purchasing systems and data repositories.
Useful blog to read.
More than once, we have probably heard that Big Data is nothing more than BI on a larger scale. However, more data does not necessarily mean Big Data. Big Data requires a certain quantity of data, but owning a substantial amount of data does not by itself mean working with Big Data.
It would be a mistake to consider Big Data to be mostly BI. Big Data is not restricted or defined by the purposes sought with the initiative; it is defined by the features of the data itself.
Today, we can base our judgments on detailed data obtained through Big Data. Thanks to this technology, each action of competitors, consumers, suppliers, and so on produces information that ranges from structured, easy-to-manage data to unstructured information that is hard to use for decision-making. Thanks for the info.
Variety is one of the most interesting developments in technology as more and more information is digitized. Traditional data types (structured data) include things on a bank statement like date, amount, and time. These are things that fit neatly in a relational database.
Structured data is augmented by unstructured data, which is where things like Twitter feeds, audio files, MRI images, web pages and web logs are put: anything that can be captured and stored but doesn't have a meta model that neatly defines it. (A meta model is a set of rules to frame a concept or idea; it defines a class of information and how to express it.)
Unstructured data is a fundamental concept in big data. The best way to understand unstructured data is by comparing it to structured data. Think of structured data as data that is well defined by a set of rules. For example, money will always be numbers with two decimal places; names are expressed as text; and dates follow a specific pattern.
Comment by
Amarbant Singh
Insightful information, and the challenges of big data variety are very well explained.