Observing the huge volume of data generated online through an expanded Internet (the IoT and the Industrial Internet), one can use this flux of information for monitoring, predicting, controlling and decision making. When methods of statistical inference and statistical decisions were applied some 70 years ago, information derived from data collection was considered costly. Models were built in which information was linked to the payoff relevance of a decision-making criterion (a utility or payoff function), so statistical information was handled to satisfy these criteria. What is actually subsumed under ‘Big Data’ (BD) qualifies under a few main characteristics: (i) BD is primarily network generated on a large scale, by volume, variety and velocity, and comprises large amounts of information at the enterprise or public level, in the range of terabytes (10^12 bytes), petabytes (10^15 bytes) and beyond of online data. (ii) BD consists of a variety and diversity of data types and formats, many of which are dynamic, unstructured or semi-structured and hard to handle by conventional statistical methods. (iii) BD is generated by disparate sources, such as interactive applications through the IoT, wireless devices, sensors, and streaming communication generated by machine-to-machine interactions. The traditional way of formatting information from transactional systems to make it available for ‘statistical processing’ does not work in a situation where data arrive in huge volumes from diverse sources, and where even the formats may change.
Keywords: Big Data, Business Intelligence, Cloud Computing, Complexity,
Cybersecurity, Cyber Physical System (CPS), Data Analytics, Data Mining,
Exploratory Data Analysis (EDA), GPS Systems, Hadoop, IBM, IDC, Industry
4.0, M2M Communication, MapReduce, McKinsey, Predictive Analytics,
Statistical Decisions, Statistical Inference.