Data
Fundamentals
Etymology and Terminology
The word "data" originates from the Latin datum, the neuter past participle of dare meaning "to give," thus translating to "something given" or "a thing granted." As the plural form data, it entered English in the mid-17th century, with the Oxford English Dictionary recording its earliest evidence in 1645 in the writings of Scottish author and translator Thomas Urquhart, where it referred to facts or propositions given as a basis for reasoning or calculation in scientific and mathematical contexts.[10][11] Initially borrowed directly from Latin scientific texts, the term appeared in English via scholarly works emphasizing empirical observations and computations. A historical milestone in the application of data occurred in 1662 with John Graunt's Natural and Political Observations Made upon the Bills of Mortality, which analyzed London parish records to derive demographic patterns, representing one of the earliest systematic uses of aggregated numerical data in what is now recognized as descriptive statistics, even though Graunt himself did not employ the specific term "data." The concept gained further traction in scientific discourse throughout the 17th and 18th centuries. By the 1950s, "data" was widely adopted in computing, notably by IBM in naming its systems, such as the 1953 IBM 701 Electronic Data Processing Machine, which processed large volumes of numerical information for business and scientific purposes, solidifying the term's role in technological contexts.[12][13] In the 20th century, particularly with the expansion of computing, the usage of "data" evolved from its traditional plural form—taking verbs like "are"—to a mass noun treated as singular, as in "data is," reflecting its conceptualization as an undifferentiated collection rather than discrete items; Google Books Ngram analysis shows the singular form rising from a minority in the early 1900s to parity with the plural by the late 20th century.[14] Key terminological distinctions include raw data, defined as unprocessed facts, figures, or symbols without inherent meaning or context, versus information, which arises when raw data is organized, processed, and interpreted to convey significance, as outlined in standards like the U.S. Department of Defense's data management framework.[15] Modern style guides address the singular/plural debate: the American Psychological Association (APA) recommends plural treatment ("data are") in formal and scientific writing for precision, while the Chicago Manual of Style permits either, favoring singular for general audiences but plural in technical contexts to honor the word's Latin roots.[16][17]Definitions and Meanings
Data is defined as the representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or automated systems. This encompasses numerical values, textual descriptions, symbolic notations, or other discrete units that capture observations or measurements without inherent context or significance on their own. For instance, raw sensor readings from a thermometer recording temperatures at specific intervals exemplify data as unprocessed inputs awaiting analysis. A key distinction lies between data and related concepts like information, where data serves as the raw, unstructured foundation, while information emerges from its organization, contextualization, and interpretation to convey meaning. This relationship is formalized in the DIKW hierarchy, which progresses from data (basic symbols or signals) to information (processed and related facts), knowledge (applied understanding through patterns and rules), and wisdom (evaluative judgment for decision-making). The hierarchy, introduced by Russell L. Ackoff in 1989, underscores that data alone lacks meaning until transformed, as seen in examples like isolated numbers from a database becoming meaningful sales trends when aggregated and analyzed.

In philosophical contexts, data refers to empirical observations or sense-data that form the basis of perceptual experience and epistemic justification, distinct from interpretive thought. These are immediate sensory impressions, such as visual or auditory inputs, that philosophers like Bertrand Russell analyzed as mind-independent entities grounding knowledge claims. In legal settings, data functions as evidentiary facts: recorded information or predicate details that support inferences in judicial proceedings, such as digital logs or witness statements admissible under rules like Federal Rule of Evidence 703. Everyday usage treats data as personal records, including health metrics, financial transactions, or location histories, which individuals manage for practical purposes like budgeting or fitness tracking.

Since the early 2000s, the meaning of data has evolved to incorporate digital traces of user behavior, driven by the rise of Web 2.0 platforms and big data analytics, where unstructured logs from social interactions and online activities are treated as valuable raw inputs for predictive modeling. This shift, exemplified by the growth of user-generated content on sites like early social media, expanded data's scope beyond traditional records to encompass behavioral patterns analyzed for targeted advertising and personalization.
Types of Data
Data can be broadly classified into qualitative and quantitative types based on its nature and measurability. Qualitative data, also known as categorical data, consists of non-numerical information that describes qualities, characteristics, or attributes, such as text, images, audio, or observations that capture themes, patterns, or meanings without assigning numerical values.[18] In contrast, quantitative data is numerical and measurable, allowing for mathematical operations like counting, averaging, or statistical analysis; it includes values such as heights, temperatures, or sales figures that represent quantities or amounts.[19] This distinction is fundamental in research and analysis, where qualitative data provides depth and context, while quantitative data enables precision and generalizability.[20]

Another key categorization distinguishes structured from unstructured data based on organization and format. Structured data is highly organized and stored in a predefined format, such as rows and columns in relational databases or spreadsheets, making it easily searchable, analyzable, and integrable with tools like SQL; examples include customer records in a CRM system or sensor readings in fixed schemas.[21] Unstructured data, comprising about 80-90% of all data generated today, lacks a predetermined structure and includes free-form content like emails, social media posts, videos, or documents that require advanced processing techniques for extraction and interpretation.[21] This divide impacts storage, processing efficiency, and application, with structured data suiting traditional analytics and unstructured data fueling modern AI-driven insights.[22]

Additional classifications refine these categories further. Discrete data consists of distinct, countable values with no intermediate points, such as the number of items sold (integers) or categories like gender, which can only take specific, separated states.[23] Continuous data, however, forms a spectrum of infinite possible values within a range, measurable to any degree of precision, as in weight, time, or temperature, often represented by real numbers.[23] Separately, primary data is original information collected firsthand by the researcher for a specific purpose, through methods like surveys or experiments, ensuring direct relevance but requiring more resources.[24] Secondary data, derived from existing sources compiled by others, such as published reports or databases, offers broader scope and cost savings but may introduce biases or outdated elements.[24]

Emerging types of data reflect evolving technological and analytical needs. Big data is characterized by the "three Vs" of high volume (massive scale of data generation), velocity (rapid speed of data creation and processing), and variety (diverse formats from structured to unstructured sources), demanding innovative handling beyond traditional systems, as defined by Gartner in 2011. Metadata, or "data about data," provides descriptive context for other data, including details like creation date, author, format, or location, standardized by ISO/IEC 11179 to facilitate interoperability and management across systems.
Spatiotemporal data integrates spatial (location-based, e.g., coordinates) and temporal (time-based) dimensions, capturing changes over geographic areas and periods, essential in fields like GIS for modeling phenomena such as climate patterns or urban growth.[25] These types underscore the increasing complexity and interconnectedness of data in contemporary applications.
Acquisition
Data Sources
Data sources are the origins from which raw data derives, including natural phenomena and artificial systems that produce information via observation, measurement, or recording. They supply inputs for data acquisition in scientific, social, and technological fields.[26]

Natural sources capture data from environmental and biological processes using observational tools. Weather sensors track atmospheric variables like temperature, precipitation, and wind, yielding real-time data for climate analysis; automated stations on the Juneau Icefield monitor glacial changes.[27] Geological samples from field and lab experiments reveal subsurface structures and resources, aiding assessments of aquifers and deposits.[28] Biological sources include DNA sequences in databases for genomic and biodiversity research.[29] Medical scans like MRI produce imaging data for diagnostics, enhanced by fusion with genomic data.[30]

Human-generated sources stem from human activities, yielding structured and unstructured data on behaviors. Government surveys gather demographic and economic data to track trends and policy effects.[26] Social media platforms like Twitter and Facebook provide user-generated content on sentiment and dynamics.[31] E-commerce and financial logs record transactions for analysis.[32] Digitized historical documents and archives preserve outputs for cultural evolution studies.[33]

Technological sources use engineered systems for scalable, real-time production. Internet of Things (IoT) sensors in urban settings stream data from assets for predictive maintenance.[34] Satellites with remote sensing gather Earth observation imagery for land cover and change detection.[35] Web scraping tools aggregate online data for market research.[36]

Data sources evolved from manual to digital methods. Before 1900, they relied on ledgers and paper, as in early censuses with in-person tallies.[37] The U.S. Census from 1790 used quill pens for population and agriculture data.[38] Post-1980s, digital sensors and networks automated capture; the U.S. Census shifted to tabulators in 1890 and to digital methods by the late 20th century, boosting volume and accessibility.[37][39]
Data Collection Methods
Data collection methods systematically acquire raw data from sources, ensuring quality, accuracy, and relevance for analysis. Choices depend on research goals, resources, and the nature of the data. Observational, experimental, sampling, and digital techniques capture phenomena, with ethics protecting participants.

Observational methods passively record events. Direct measurement, like calibrated thermometers for temperature, yields precise readings; hydrological thermistors follow calibration to reduce errors. Remote sensing uses satellites or aircraft with radar or infrared to detect radiation, mapping inaccessible areas without contact. Experimental methods test hypotheses via controlled interventions. Physics labs manipulate variables, like magnetic fields, to isolate effects. A/B testing in software randomizes variants to compare metrics, optimizing designs empirically.

Sampling selects population subsets to cut costs while preserving representativeness. Random sampling ensures equal chances but needs large sizes for rare events. Stratified sampling draws proportionally from subgroups to improve precision. Cluster sampling selects entire groups, suiting dispersed areas despite variance. Neyman allocation distributes sample sizes across strata according to their variability to minimize sampling error.[26]

Digital methods scale online gathering. APIs query structured data in JSON from platforms. Web scraping parses HTML for unstructured content. Crowdsourcing via Amazon Mechanical Turk (launched 2005) enables tasks like annotation. Ethics demand informed consent, detailing purpose, risks, and usage to ensure voluntary participation and privacy, per regulations.
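As a concrete illustration of Neyman allocation, the short Python sketch below distributes a fixed total sample size across strata in proportion to each stratum's size times its standard deviation; the stratum sizes and standard deviations are hypothetical values chosen only for the example.

```python
# Minimal sketch of Neyman allocation: assign a total sample size n across
# strata in proportion to N_h * sigma_h (stratum size times stratum standard
# deviation), which minimizes the variance of the stratified mean estimator.
def neyman_allocation(total_n, stratum_sizes, stratum_sds):
    weights = [N * s for N, s in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    # Rounding may make the parts differ slightly from total_n; fine for a sketch.
    return [round(total_n * w / total_weight) for w in weights]

# Hypothetical example: three strata of different sizes and variability.
print(neyman_allocation(total_n=500,
                        stratum_sizes=[10_000, 5_000, 1_000],
                        stratum_sds=[2.0, 8.0, 15.0]))
```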
Storage and Management
Data Formats and Documents
Data formats define the structure and representation of data in files or records, enabling efficient storage, interchange, and processing across systems. These formats vary based on the data's nature, such as tabular for structured records, hierarchical for nested relationships, and binary for compact, machine-readable encoding. Tabular formats like Comma-Separated Values (CSV) organize data into rows and columns separated by delimiters, with CSV formalized in RFC 4180 as a standard for text-based tabular interchange.[40] Spreadsheets, such as Microsoft Excel introduced in 1985 for the Macintosh, extend this by providing interactive tabular documents with formulas and formatting.[41]

Hierarchical formats represent data as trees or nested structures, suitable for complex, interrelated information. Extensible Markup Language (XML), a W3C Recommendation since 1998, uses tags to define hierarchical elements for document and data exchange.[42] JavaScript Object Notation (JSON), derived from ECMAScript and standardized in RFC 8259, offers a lightweight alternative with key-value pairs and arrays for web APIs and configuration files.[43] Binary formats encode data directly in machine-readable bytes to minimize size and parsing overhead; for instance, the Joint Photographic Experts Group (JPEG) format compresses images lossily under ISO/IEC 10918, finalized in 1992.[44] Relational databases, queried via Structured Query Language (SQL), store tabular data in binary files or blocks for efficient indexing and transactions, as seen in systems like MySQL, developed in 1995.[45]

Data documents encompass tools and systems for managing formatted records. Spreadsheets like Excel support user-editable tabular data with built-in computation, evolving from early versions to handle millions of cells. NoSQL databases, such as MongoDB released in 2009, use document-oriented binary storage for flexible, schema-less hierarchical data stored as JSON-like BSON.[46] These documents facilitate usability by combining format with metadata, such as headers in CSV or indexes in databases.

Standardization efforts ensure interoperability in data representation. The Resource Description Framework (RDF), a W3C Recommendation from 2004, provides a schema for semantic web data as triples in graph structures, enabling linked data across domains.[47] Electronic Data Interchange (EDI), with ANSI X12 standards established in 1979, defines protocols for structured business document exchange, reducing errors in supply chain transactions.[48]

The evolution of data formats reflects technological advances in storage and scale. Punch cards, pioneered by Herman Hollerith in the 1890s for the U.S. Census, encoded data as perforations for mechanical tabulation, marking an early shift from paper to automated processing.[49] Modern cloud-native formats like Apache Parquet, introduced in 2013 by Twitter and Cloudera, employ columnar binary storage optimized for big data analytics, compressing and partitioning datasets for distributed systems like Hadoop.[50] This progression from rigid, physical media to efficient, scalable digital structures has enabled handling vast, diverse data volumes.
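To make the tabular-versus-hierarchical distinction discussed above concrete, the following Python sketch serializes the same two records as CSV and as JSON using only the standard library; the field names and values are hypothetical.

```python
import csv
import io
import json

records = [
    {"id": 1, "name": "sensor-a", "reading": 21.4},
    {"id": 2, "name": "sensor-b", "reading": 19.8},
]

# Tabular representation: rows and columns under a header line (CSV).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "reading"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())

# Hierarchical representation: nested key-value pairs and arrays (JSON).
print(json.dumps({"device_log": {"records": records}}, indent=2))
```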
Data Preservation and Longevity
Data preservation involves a range of strategies designed to ensure that digital information remains intact, accessible, and usable over long periods, countering the inherent fragility of electronic media. Key techniques include regular backups, which can be full (capturing an entire dataset) or incremental (recording only changes since the last backup) to optimize storage and time efficiency. Another critical method is data migration, where information is transferred to newer formats or storage media to prevent obsolescence, such as converting legacy files from outdated systems to contemporary standards like PDF/A for long-term archiving. Emulation further supports preservation by simulating obsolete hardware and software environments, allowing access to data on formats like early 1980s floppy disks without original equipment.

Despite these approaches, several challenges threaten data longevity. Bit rot, or silent data corruption, occurs when errors accumulate in storage media over time due to hardware degradation or transmission faults, potentially rendering files unreadable without detection. Format obsolescence exacerbates this issue; for instance, floppy disks from the 1980s and 1990s became largely unreadable by the 2020s as compatible drives vanished from common use. Environmental factors also pose risks, including the high energy demands of data centers for cooling to prevent overheating, which can lead to hardware failures if power or climate controls falter.

To address these challenges systematically, international standards and initiatives have emerged. The Open Archival Information System (OAIS) reference model, formalized in ISO 14721 in 2003, provides a framework for creating and maintaining digital archives, emphasizing ingestion, storage, and dissemination processes to ensure long-term viability. Organizations like the Internet Archive, founded in 1996, exemplify practical implementation through vast digital repositories that preserve web content and other media via web crawling and redundant storage. Archival laws further institutionalize these efforts; in the United States, the National Archives Act of 1934 established federal requirements for preserving government records, later extended to digital formats.

Metrics for assessing data longevity highlight the urgency of proactive preservation. Estimates for the half-life of digital scientific data without intervention vary by field and storage medium, often falling within years to decades due to format shifts and hardware evolution. Such figures underscore the need for ongoing migration and verification to extend usability beyond this threshold.
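One common verification technique implied by the bit-rot discussion above is a fixity check: recompute a file's checksum and compare it with a previously recorded value. The Python sketch below shows a minimal version using SHA-256; the file name, function name, and stored reference checksum are illustrative assumptions rather than any specific archive's procedure.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum by streaming the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record a checksum at ingest time, then re-verify later to detect bit rot.
archive_file = Path("example_record.bin")          # hypothetical archived file
archive_file.write_bytes(b"payload to preserve")
recorded_checksum = sha256_of(archive_file)

print("fixity intact:", sha256_of(archive_file) == recorded_checksum)
```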
Data Accessibility and Retrieval
Data accessibility and retrieval encompass the technologies and protocols that enable efficient location, access, and sharing of data across systems. Retrieval systems rely on indexes to optimize query performance by creating structured pointers to data, allowing databases to avoid full table scans during searches.[51] For instance, in relational databases, SQL queries use indexes to retrieve specific records rapidly, forming the backbone of structured data access.[52] Search engines like Elasticsearch, first released in February 2010, extend this capability to unstructured and large-scale data through distributed indexing and full-text search.[53] Additionally, APIs facilitate data sharing by providing standardized interfaces for programmatic access, enabling seamless integration between disparate systems without direct database exposure.[52]

Key principles guide the design of accessible data systems, emphasizing openness and usability. The open data movement promotes public release of government and institutional data under permissive licenses, as outlined in the International Open Data Charter adopted in 2015 by over 170 governments and organizations.[54] Complementing this, the FAIR principles (Findable, Accessible, Interoperable, and Reusable) provide a framework for scientific data stewardship, introduced in a 2016 paper to ensure data can be discovered and utilized by both humans and machines.[55] These principles advocate for persistent identifiers, metadata standards, and open protocols to enhance discoverability and reuse.

Despite these advancements, significant barriers hinder data accessibility. Paywalls restrict access to subscription-based datasets, limiting availability to paying users or institutions. Proprietary formats, such as certain vendor-specific file structures, impede interoperability by requiring specialized software for decoding. Digital divides exacerbate these issues; as of 2023, approximately 33% of the global population (over 2.6 billion people) lacks internet access, primarily in low-income regions.[56]

To address these challenges, specialized tools support data discovery and management. Data catalogs like Google Dataset Search, launched in 2018, index over 25 million datasets from repositories worldwide, allowing users to search and filter based on metadata.[57] For tracking changes and maintaining lineages, version control systems such as Git, often extended with tools like Git LFS for large files, enable collaborative data versioning and audit trails.[58] These mechanisms ensure that data retrieval remains reliable and traceable, fostering broader usability while respecting preservation needs.
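The role of indexes described at the start of this section can be sketched with SQLite from Python's standard library; the table, column, and index names below are hypothetical, and the EXPLAIN QUERY PLAN output is specific to SQLite, used here only as a convenient stand-in for a relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                 [("acme", 120.0), ("globex", 75.5), ("acme", 42.0)])

# Without an index this lookup scans the whole table; with one, the database
# can jump directly to matching rows via the index structure.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = ?",
                    ("acme",)).fetchall()
print(plan)  # SQLite reports a search using idx_orders_customer
```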
Processing and Analysis
Data Processing Techniques
Data processing techniques encompass a range of methods used to clean, transform, and prepare raw data for subsequent analysis or storage, ensuring accuracy, consistency, and usability. These techniques address common issues in datasets, such as incompleteness, inconsistencies, and varying scales, which can otherwise lead to erroneous outcomes in downstream applications. Cleaning focuses on identifying and rectifying errors, while transformation standardizes data formats and structures. Automation through scripting and pipelines further enhances efficiency, particularly in large-scale environments.

Cleaning is a foundational step that involves handling missing values and detecting outliers to maintain data integrity. Missing values can be addressed through imputation methods, such as mean substitution, where absent entries are replaced with the average of observed values in the same feature; this approach is simple and preserves the dataset size but may introduce bias if the data is not randomly missing. For outlier detection, the Z-score method calculates the standardized distance of a data point from the mean, defined as $ z = \frac{x - \mu}{\sigma} $, where $ \mu $ is the mean and $ \sigma $ is the standard deviation; values with $ |z| > 3 $ are typically flagged as potential outliers, as they deviate significantly from the normal distribution under the assumption of approximate normality. These techniques are essential for mitigating the impact of anomalies.

Transformation techniques prepare data by rescaling, encoding, and aggregating features to make them compatible with analytical models. Normalization via min-max scaling rescales features to a fixed range, usually [0, 1], using the formula $ x' = \frac{x - \min(X)}{\max(X) - \min(X)} $, which preserves the relative relationships while bounding values to prevent dominance by large-scale features in algorithms like distance-based clustering. For categorical variables, one-hot encoding converts them into binary vectors, creating a new column for each category with 1 indicating presence and 0 otherwise; this avoids ordinal assumptions and enables numerical processing, though it increases dimensionality for high-cardinality features. Aggregation summarizes data by grouping, such as summing daily sales figures to monthly totals, which reduces granularity and computational load while highlighting trends like seasonal patterns.

ETL (Extract-Transform-Load) processes form structured pipelines for integrating data from disparate sources into a unified repository. In ETL, data is first extracted from operational databases or files, then transformed to resolve inconsistencies, such as standardizing formats or applying business rules, and finally loaded into a target system like a data warehouse; this paradigm originated in the 1970s for mainframe data integration and remains central to business intelligence. Tools like Apache Airflow, released in 2015 by Airbnb as an open-source workflow orchestrator, automate these pipelines by defining dependencies as directed acyclic graphs (DAGs), enabling scheduling and monitoring of complex ETL jobs.

Automation in data processing leverages scripting and processing paradigms to handle volume and velocity. The Python library Pandas, developed by Wes McKinney starting in 2008, provides data structures like DataFrames for efficient cleaning and transformation operations, such as filling missing values or applying one-hot encoding via built-in functions, making it a standard for interactive data manipulation.
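The cleaning and transformation steps above can be combined in a few lines of pandas; the sketch below applies mean imputation, Z-score flagging, min-max scaling, one-hot encoding, and aggregation to a small, hypothetical sales table (the column names and values are invented for illustration).

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "units": [10, None, 12, 400],      # one missing value, one likely outlier
    "region": ["north", "south", "north", "south"],
})

# Mean imputation: replace missing values with the column mean.
df["units"] = df["units"].fillna(df["units"].mean())

# Z-score outlier flagging: standardized distance from the mean.
z = (df["units"] - df["units"].mean()) / df["units"].std()
df["outlier"] = z.abs() > 3

# Min-max normalization to [0, 1].
lo, hi = df["units"].min(), df["units"].max()
df["units_scaled"] = (df["units"] - lo) / (hi - lo)

# One-hot encoding of a categorical column.
df = pd.get_dummies(df, columns=["region"])

# Aggregation: summarize units per store.
print(df.groupby("store")["units"].sum())
```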
Processing can occur in batch mode, where fixed datasets are handled offline, or stream mode for real-time ingestion; Apache Kafka, introduced in 2011 by LinkedIn as a distributed messaging system, supports stream processing by enabling low-latency pub-sub pipelines that handle millions of events per second, contrasting with batch systems like Hadoop MapReduce by processing data incrementally as it arrives.
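The batch-versus-stream distinction can be illustrated without any messaging infrastructure: the plain-Python sketch below computes the same average once over a complete dataset (batch) and incrementally as each event arrives (stream). The list of readings is a stand-in for a real event source such as a Kafka topic, not Kafka itself.

```python
# Conceptual contrast between batch and stream processing of the same events.
events = [3.0, 5.0, 4.0, 6.0, 2.0]          # stand-in for a stream of readings

# Batch mode: wait for the full dataset, then compute once.
batch_mean = sum(events) / len(events)

# Stream mode: update an aggregate incrementally as each event arrives,
# the way a consumer of a pub-sub log would.
count, running_mean = 0, 0.0
for value in events:
    count += 1
    running_mean += (value - running_mean) / count   # incremental mean update
    print(f"after event {count}: running mean = {running_mean:.2f}")

print(f"batch mean = {batch_mean:.2f}")
```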
Data Analysis and Interpretation
Data analysis and interpretation involve applying statistical and computational methods to processed datasets to uncover patterns, test hypotheses, and derive actionable insights. This process builds on cleaned and structured data, transforming raw information into meaningful knowledge that informs decision-making across various domains. Key approaches include descriptive, inferential, and predictive analyses, each serving distinct purposes in summarizing, generalizing, and forecasting from data.

Descriptive analysis focuses on summarizing the main characteristics of a dataset without making broader inferences about a population. Central tendency measures such as the mean, which calculates the arithmetic average of values, and the median, which identifies the middle value in an ordered dataset, provide essential overviews of data distribution.[59] These summaries help identify trends and outliers; for instance, the mean is sensitive to extreme values, while the median offers robustness in skewed distributions. Visualizations enhance this process: histograms display the frequency distribution of continuous variables by dividing data into bins, revealing shape, central tendency, and variability.[60] Scatter plots, meanwhile, illustrate relationships between two continuous variables, plotting points to highlight potential correlations or clusters.[61]

Inferential analysis extends descriptive insights to make probabilistic statements about a larger population based on sample data. Hypothesis testing evaluates claims about population parameters; for example, the Student's t-test, developed by William Sealy Gosset in 1908 under the pseudonym "Student," assesses whether observed differences between sample means are statistically significant, accounting for small sample sizes through the t-distribution.[62] This method assumes normality and equal variances, yielding a p-value that indicates the probability of obtaining a result at least as extreme if the null hypothesis were true. Confidence intervals complement hypothesis testing by providing a range of plausible values for a population parameter, such as the mean, with a specified level of confidence (e.g., 95%), derived from sample statistics and standard error.[63] These tools enable generalization while quantifying uncertainty, though they require careful consideration of assumptions to avoid misleading conclusions.

Predictive analysis employs models to forecast future outcomes or classify new data points. Linear regression, pioneered independently by Adrien-Marie Legendre in 1805 and Carl Friedrich Gauss around 1795, models the linear relationship between a dependent variable $ y $ and one or more independent variables $ x $ using the equation $ y = mx + b $, where $ m $ represents the slope and $ b $ the intercept, minimizing the sum of squared residuals via least squares estimation.[64] This approach assumes linearity, independence, and homoscedasticity, making it foundational for predicting continuous outcomes like sales or temperatures. In machine learning, decision trees extend predictive capabilities by recursively partitioning data based on feature thresholds to minimize impurity or variance; the Classification and Regression Trees (CART) algorithm, introduced by Leo Breiman and colleagues in 1984, uses Gini impurity for classification and mean squared error for regression, creating interpretable tree structures that handle nonlinear relationships without assuming data distribution. These methods facilitate forecasting but demand validation to ensure generalizability.
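A compact sketch of these three layers of analysis, assuming NumPy and SciPy are available, might look like the following; the sample sizes, group means, and the synthetic relationship y = 2x + 1 are invented purely for demonstration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Descriptive step: summarize a sample with its mean and median.
sample = rng.normal(loc=50, scale=5, size=100)
print(f"mean = {sample.mean():.2f}, median = {np.median(sample):.2f}")

# Inferential step: two-sample t-test comparing group means.
group_a = rng.normal(50, 5, 40)
group_b = rng.normal(53, 5, 40)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Predictive step: least-squares fit of y = m*x + b to noisy synthetic data.
x = np.arange(30, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(0, 2, size=30)
m, b = np.polyfit(x, y, deg=1)
print(f"slope ~ {m:.2f}, intercept ~ {b:.2f}")
```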
Interpreting analytical results presents significant challenges, particularly in distinguishing correlation from causation and avoiding practices like p-hacking. Correlation measures the strength and direction of linear associations between variables, but it does not imply causation, as confounding factors or reverse causality may explain observed patterns; for instance, ice cream sales correlate with drownings due to seasonal weather, not direct influence.[65] P-hacking involves selectively analyzing data, such as choosing subsets, transformations, or multiple tests, until a statistically significant p-value (typically <0.05) emerges, inflating false positives and undermining reliability.[66] The replication crisis, highlighted by the Open Science Collaboration's 2015 study replicating 100 psychological experiments, revealed that only 36% produced significant results compared to 97% in originals, attributing low reproducibility to p-hacking, publication bias, and underpowered studies, prompting calls for preregistration and transparency in the 2010s.[67]
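The false-positive reservoir that p-hacking draws from can be demonstrated with a small simulation: repeatedly testing groups drawn from the same distribution yields roughly 5% spuriously "significant" results at the 0.05 threshold. The sketch below assumes NumPy and SciPy; the number of simulated studies and the sample sizes are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulate 1,000 "studies" in which the null hypothesis is true by construction:
# both groups come from the same distribution, so every p < 0.05 is a false
# positive. Around 5% of comparisons are "significant" anyway.
false_positives = 0
n_tests = 1_000
for _ in range(n_tests):
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} null comparisons were 'significant'")
```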
Applications and Implications
Data in Computing and Information Science
In computing and information science, data is fundamentally organized using data structures to enable efficient storage, retrieval, and manipulation within algorithms and programs. Basic data structures include arrays, which provide contiguous memory allocation for fast indexed access with O(1) average time complexity for retrieval but O(n) for linear search in unsorted cases; linked lists, which allow dynamic insertion and deletion in O(1) time per operation at known positions through pointer-based connections; trees, such as binary search trees, which support O(log n) search, insertion, and deletion when balanced; and graphs, which model relationships via nodes and edges, with traversal algorithms like breadth-first search achieving O(V + E) efficiency where V is vertices and E is edges. These structures are essential for optimizing computational performance, as analyzed through Big O notation, which upper-bounds the growth rate of resource usage relative to input size.[68]

Information theory formalizes data's quantitative aspects, particularly through Claude Shannon's concept of entropy, which measures the uncertainty or average information content in a message source. Introduced in 1948, Shannon entropy is defined as
$ H = -\sum_{i} p_i \log_2 p_i $
where $ p_i $ is the probability of each possible symbol in the source, providing a foundation for data compression techniques like Huffman coding that minimize redundancy by assigning shorter codes to frequent symbols, and for quantifying channel capacity in noisy communication systems. This metric underpins modern data encoding and error-correcting codes, ensuring reliable transmission while maximizing efficiency.[69]
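As a worked example of this formula, the Python sketch below computes the empirical Shannon entropy, in bits per symbol, of short strings; the helper name shannon_entropy and the example messages are illustrative choices rather than any particular library's API.

```python
import math
from collections import Counter

def shannon_entropy(message: str) -> float:
    """Average information content in bits per symbol of the message's
    empirical symbol distribution: H = -sum(p_i * log2(p_i))."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy("aaaa"))   # 0.0 bits: a single repeated symbol, no uncertainty
print(shannon_entropy("abab"))   # 1.0 bit per symbol: two equally likely symbols
print(shannon_entropy("abcd"))   # 2.0 bits per symbol: four equally likely symbols
```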
Database management systems (DBMS) handle persistent data storage and concurrent access, enforcing reliability through ACID properties: Atomicity ensures transactions complete fully or not at all; Consistency maintains data integrity rules; Isolation prevents interference between concurrent operations; and Durability guarantees committed changes survive failures. These principles, building on Jim Gray's 1981 work on transaction concepts and formalized in full by Härder and Reuter in 1983, enable robust operations in relational databases like SQL Server. For large-scale data, frameworks like Apache Hadoop, initiated by Doug Cutting in 2006 as an open-source implementation inspired by Google's MapReduce and GFS, distribute processing across clusters using HDFS for fault-tolerant storage of petabyte-scale datasets.[70][71][72]
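Atomicity in particular is easy to demonstrate with SQLite from Python's standard library: if a transaction fails partway through, none of its changes persist. The account table, names, and simulated failure below are hypothetical, and SQLite serves only as a convenient stand-in for a transactional DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100.0), ("bob", 20.0)])
conn.commit()

# Atomicity: both updates commit together, or neither does.
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

# Both balances are unchanged because the partial transfer was rolled back.
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
```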
Modern trends in data handling emphasize scalability and decentralization, with data lakes emerging in the 2010s as repositories for raw, unstructured data in its native format, allowing schema-on-read processing without upfront transformation. Coined by James Dixon in 2010, data lakes store diverse types like images and logs using scalable object storage, often integrated with Hadoop for analytics on voluminous, schema-flexible data. Complementing this, edge computing has gained prominence post-2020 by shifting data processing to devices near the source, reducing latency and bandwidth demands in IoT ecosystems; the market for edge solutions grew from $44.7 billion in 2022 to a projected $101.3 billion by 2027, driven by real-time applications in telecom and healthcare.[73][74]