Observing the huge volume of data generated online through an expanded Internet (the IoT and the Industrial Internet), one can use this flux of information for monitoring, predicting, controlling and decision making. When methods of statistical inference and statistical decisions were applied some 70 years ago, information derived from data collection was considered costly. Models were built in which information was linked to the payoff relevance of a decision-making criterion (a utility or payoff function), so statistical information was handled to satisfy these criteria. What is actually subsumed under ‘Big Data’ (BD) qualifies under a few main characteristics: (i) BD is primarily network generated on a large scale, by volume, variety and velocity, and comprises large amounts of information at the enterprise or public level, in the range of terabytes (10^12 bytes), petabytes (10^15 bytes) and beyond of online data. (ii) BD consists of a variety and diversity of data types and formats, many of which are dynamic, unstructured or semi-structured and hard to handle by conventional statistical methods. (iii) BD is generated by disparate sources, such as interactive applications through the IoT, wireless devices, sensors, and streaming communication generated by machine-to-machine interactions. The traditional way of formatting information from transactional systems to make it available for ‘statistical processing’ does not work in a situation where data arrive in huge volumes from diverse sources, and where even the formats may change.
Keywords: Big Data, Business Intelligence, Cloud Computing, Complexity,
Cybersecurity, Cyber Physical System (CPS), Data Analytics, Data Mining,
Exploratory Data Analysis (EDA), GPS Systems, Hadoop, IBM, IDC, Industry
4.0, M2M Communication, MapReduce, McKinsey, Predictive Analytics,
Statistical Decisions, Statistical Inference.