It is argued that: (1) Big Data and new data analytics are disruptive innovations which are reconfiguring in many instances how research is conducted; and (2) there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution, a task that has barely begun to be tackled despite the rapid changes in research practices presently taking place. After critically reviewing emerging epistemological positions, it is contended that a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology.
Following Kuhn's lead, Kitchin suggests that we may be entering a new paradigm of research, brought on by the possibilities of Big Data. The development of four major paradigms is described in the table below, compiled from the text.
| Paradigm | Nature | Approach | When |
|---|---|---|---|
| First | Experimental science | Empiricism; describing natural phenomena | Pre-Renaissance |
| Second | Theoretical science | Modeling and generalization | Pre-computers |
| Third | Computational science | Simulation of complex phenomena | Pre-Big Data |
| Fourth | Exploratory science | Data-intensive; statistical exploration and data mining | Now |
Compiled in Kitchin's text from Hey et al. (2009).
Some have argued that the "end of theory" has arrived, brought on by Big Data and modern computing. The central arguments follow below.

Many of these ideas originated in business and marketing circles, where understanding the world (as opposed to merely predicting it) is not necessarily important.
Kitchin points out that,
Whilst this empiricist epistemology is attractive, it is based on fallacious thinking with respect to the four ideas that underpin its formulation.
Big Data may seek to be exhaustive; however, this all-seeing-eye perspective is limited by regulations and logistical realities. Thus, Big Data is still a sample that may not be representative and is subject to sampling bias like any other data.
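This point can be illustrated with a minimal simulation (all numbers here are hypothetical, not from the paper): a huge sample drawn from an unrepresentative subgroup misses the true population mean, while a far smaller random sample lands close to it.

```python
import random

random.seed(42)

# Hypothetical population of 1,000,000 people with some attribute score.
# 20% are "online" (over-represented in digitally harvested Big Data),
# and online people here have systematically higher scores.
population = []
for _ in range(1_000_000):
    online = random.random() < 0.2
    score = random.gauss(60 if online else 40, 10)
    population.append((online, score))

true_mean = sum(s for _, s in population) / len(population)

# "Big Data" sample: enormous, but drawn only from the online subgroup.
big_sample = [s for online, s in population if online]
big_mean = sum(big_sample) / len(big_sample)

# Small but properly randomized sample of 1,000 people.
small_sample = [s for _, s in random.sample(population, 1_000)]
small_mean = sum(small_sample) / len(small_sample)

print(f"true mean:         {true_mean:.1f}")
print(f"big biased mean:   {big_mean:.1f}")   # far from the true mean
print(f"small random mean: {small_mean:.1f}")  # close to the true mean
```

Despite being roughly 200 times larger, the biased sample's estimate is much further from the truth than the small random sample's, which is exactly the "n = all is still a sample" caution above.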
Big Data does not "arise from nowhere, free from ‘the regulating force of philosophy’ (Berry, 2011: 8)"
"Data are not generated free from theory, neither can they simply speak for themselves free of human bias or framing"
The idea that data can speak for themselves suggests that anyone with a reasonable understanding of statistics should be able to interpret them without context or domain-specific knowledge.
Domain-specific knowledge will always be valuable
Kitchin rips into computer and data scientists, as well as physicists, who (specifically in the study of cities) he claims:
… willfully ignore a couple of centuries of social science scholarship, including nearly a century of quantitative analysis and model building. The result is an analysis of cities that is reductionist, functionalist and ignores the effects of culture, politics, policy, governance and capital (reproducing the same kinds of limitations generated by the quantitative/positivist social sciences in the mid-20th century).
The central point being made here is that the more data-minded folks are likely to make fools of themselves if they ignore the literature that already exists.
Basically, Kitchin suggests that we need to find a middle ground between classical science and the hardcore Big Data folks. Clearly there are benefits to employing scientific methods and approaches; however, we should not limit ourselves to only these methods. Thus, using Big Data in a scientific, abductive manner is likely the way forward.
Abduction is a mode of logical inference and reasoning forwarded by C. S. Peirce (1839–1914) (Miller, 2010). It seeks a conclusion that makes reasonable and logical sense, but is not definitive in its claim. For example, there is no attempt to deduce what is the best way to generate data, but rather to identify an approach that makes logical sense given what is already known about such data production.
Just use your noggin to do stuff. 😏
Moreover, the advocates of data-driven science argue that it is much more suited to exploring, extracting value and making sense of massive, interconnected data sets, fostering interdisciplinary research that conjoins domain expertise (as it is less limited by the starting theoretical frame), and that it will lead to more holistic and extensive models and theories of entire complex systems rather than elements of them (Kelling et al., 2009).
Nonetheless, as Kitchin (2013) and Ruppert (2013) argue, Big Data presents a number of opportunities for social scientists and humanities scholars, not least of which are massive quantities of very rich social, cultural, economic, political and historical data. It also poses a number of challenges, including a skills deficit for analyzing and making sense of such data, and the creation of an epistemological approach that enables post-positivist forms of computational social science. One potential path forward is an epistemology that draws inspiration from critical GIS and radical statistics in which quantitative methods and models are employed within a framework that is reflexive and acknowledges the situatedness, positionality and politics of the social science being conducted, rather than rejecting such an approach out of hand. Such an epistemology also has potential utility in the sciences for recognizing and accounting for the use of abduction and creating a more reflexive data-driven science. As this tentative discussion illustrates, there is an urgent need for wider critical reflection on the epistemological implications of Big Data and data analytics, a task that has barely begun despite the speed of change in the data landscape.
Notes by Matthew R. DeVerna