Human generated content consists of all the files and emails that we create every day, including the presentations, word processing documents, spreadsheets, audio files and other documents our employers ask us to produce hour by hour.

These are the files that take up the vast majority of digital storage space in most organizations. They are kept for significant amounts of time and they have huge amounts of metadata associated with them. Human generated content is huge, and its metadata is even bigger. Metadata is the information about a file: who created it, what type of file it is, what folder it is stored in, who has been reading it and who has access to it. The content and metadata together make up human generated big data.

The problem is that most of us, meaning organizations and governments, are not yet equipped with the tools to exploit human generated big data. The conclusion of a recent survey of over 1,000 Internet experts and other Internet users, published by the Pew Research Center and the Imagining the Internet Center at Elon University, is that the world may not be ready to properly handle and understand big data. These experts believe that the huge quantities of data that will be created by the year 2020, which they term “digital exhaust,” could very well enhance productivity, improve organizational transparency and expand the frontier of the knowable future. However, they are concerned about whose hands this information is in and whether governments or corporations will use it wisely.

The survey found that “human and machine analysis of big data could improve social, political and economic intelligence by 2020. The rise of what is known as Big Data will facilitate things like real-time forecasting of events; the development of 'inferential software' that assesses data patterns to project outcomes; and the creation of algorithms for advanced correlations that enable new understanding of the world.”

Of the Internet experts surveyed, 39% agreed with the counter-argument to big data's benefits, which posited that “Human and machine analysis of Big Data will cause more problems than it solves by 2020. The existence of huge data sets for analysis will engender false confidence in our predictive powers and will lead many to make significant and hurtful mistakes. Moreover, analysis of Big Data will be misused by powerful people and institutions with selfish agendas who manipulate findings to make the case for what they want.”

As one of the study's participants, entrepreneur Bryan Trogdon, put it: “Big Data is the new oil,” observing that “the companies, governments, and organizations that are able to mine this resource will have an enormous advantage over those that don't. With speed, agility, and innovation determining the winners and losers, Big Data allows us to move from a mindset of 'measure twice, cut once' to one of 'place small bets fast.'”

Jeff Jarvis, professor and blogger, said: “Media and regulators are demonizing Big Data and its supposed threat to privacy. Such moral panics have occurred often thanks to changes in technology. But the moral of the story remains: there is value to be found in this data, value in our newfound ability to share. Google's founders have urged government regulators not to require them to quickly delete searches because, in their patterns and anomalies, they have found the ability to track the outbreak of the flu before health officials could and they believe that by similarly tracking a pandemic, millions of lives could be saved. Demonizing data, big or small, is demonizing knowledge, and that is never wise.”

Sean Mead, director of analytics at Mead, Mead & Clark, Interbrand, said: “Large, publicly available data sets, easier tools, wider distribution of analytics skills, and early stage artificial intelligence software will lead to a burst of economic activity and increased productivity comparable to that of the Internet and PC revolutions of the mid to late 1990s. Social movements will arise to free up access to large data repositories, to restrict the development and use of AIs, and to 'liberate' AIs.”

These are very interesting arguments and they do begin to get to the heart of the matter, which is that our data sets have grown beyond our ability to analyze and process them without sophisticated automation. We simply have to rely on technology to analyze and cope with this enormous wave of content and metadata.

Analyzing human generated big data has enormous potential. More than that, harnessing the power of metadata has become essential to managing and protecting human generated content. File shares, emails, and intranets have made it so easy for end users to save and share files that organizations now have more human generated content than they can sustainably manage and protect using small data thinking.

Many organizations face real problems because questions that could be answered 15 years ago on smaller, more static data sets can no longer be answered. These questions include: where does critical data reside, who accesses it, and who should have access to it? As a consequence, IDC estimates that only half the data that should be protected is protected.

The problem is compounded with cloud-based file sharing, as these services create yet another growing store of human generated content requiring management and protection; one that lies outside corporate infrastructure with different controls and management processes.

David Weinberger of Harvard University's Berkman Center said: “We are just beginning to understand the range of problems Big Data can solve, even though it means acknowledging that we're less unpredictable, free, madcap creatures than we'd like to think.” If harnessing the power of human generated big data can make data protection and management less unpredictable, free, and madcap, organizations will be grateful.

Rob Sobers is technical manager at Varonis.
Contact 877-292-8767 [email protected]

© 2024 ALM Global, LLC, All Rights Reserved. Request academic re-use from www.copyright.com. All other uses, submit a request to [email protected]. For more information visit Asset & Logo Licensing.