GLOSSARY

ACID Compliance

The acronym ACID refers to a set of desirable properties of database transactions:
Atomicity: a transaction is completed in its entirety or not at all. If any of its operations fails, all of its operations are rolled back.
Consistency: a transaction moves the database from one consistent state to another. If a transaction would produce data that violates the rules of the database, it is rolled back to the previous state, which complies with those rules.
Isolation: multiple transactions can be processed concurrently without affecting one another.
Durability: once a transaction is committed, its changes can be recovered in all but the most extreme circumstances. Write-ahead logging supports this by persisting each change to a log before it is written to the permanent data and index files, so committed changes can be replayed after a crash.
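
A minimal sketch of atomicity and consistency in practice, using Python's standard sqlite3 module. The accounts table, its CHECK rule (no negative balances) and the transfer helper are hypothetical examples, not part of WASP: if the debit would break the rule, the credit that was already applied is rolled back as well.

```python
import sqlite3

def make_db():
    """Hypothetical example: an in-memory table whose CHECK constraint is a consistency rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Credit dst and debit src in one transaction: both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back the whole transaction on an exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected a negative balance
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

if __name__ == "__main__":
    db = make_db()
    print(transfer(db, "alice", "bob", 30), balances(db))   # True {'alice': 70, 'bob': 80}
    print(transfer(db, "alice", "bob", 500), balances(db))  # False {'alice': 70, 'bob': 80}
```

The implicit transaction handling shown here is specific to Python's sqlite3 module; other databases expose the same guarantees through their own transaction APIs.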

Business Vertical Solution

It is a synonym of domain-specific solution. It refers to something (generally software) that is tailored to a given industry or to a specific group of customers within a given industry.

CCPA

The California Consumer Privacy Act (CCPA) is a state data privacy law that regulates how businesses, including those located outside California, may handle the personal information (PI) of California residents. For these requirements, we provide a module called Wasp Privacy that enables our customers to control how privacy rules are applied to their data. More details here: https://www.agilelab.it/wasp-wide-analytics-streaming-platform/

CDC (Change Data Capture)

Change data capture (CDC) is a process that captures the changes (inserts, updates and deletes) made in a database and ensures that they are replicated to a destination such as a data warehouse or a data lake. Wasp provides a module called Auto Data Lake to process the mutations coming from a CDC stream. More details here: https://www.agilelab.it/wasp-wide-analytics-streaming-platform/
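
As an illustration of what consuming CDC mutations involves, the sketch below replays a stream of change events onto a target table, represented as a Python dict keyed by primary key. The event shape (op, key, after) is a simplified assumption; real CDC tools and WASP's Auto Data Lake use their own formats.

```python
def apply_changes(target, events):
    """Replay CDC events, in commit order, onto a copy of `target` (a dict keyed by primary key)."""
    result = dict(target)
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            result[key] = event["after"]  # the row image after the change
        elif op == "delete":
            result.pop(key, None)  # tolerate deletes of rows the target never saw
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return result

# Hypothetical change events, in the order they were committed at the source.
events = [
    {"op": "insert", "key": 1, "after": {"name": "Ada"}},
    {"op": "update", "key": 1, "after": {"name": "Ada L."}},
    {"op": "insert", "key": 2, "after": {"name": "Bob"}},
    {"op": "delete", "key": 2},
]
print(apply_changes({}, events))  # {1: {'name': 'Ada L.'}}
```

Replaying in commit order matters: applying the update before the insert, or the delete before the insert, would leave the target in a different state than the source.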

CI/CD

CI/CD (Continuous Integration/Continuous Delivery) generally refers to a set of automated pipelines that build, test and release new versions of software.

Data Anonymization

Data anonymization is a processing technique that removes or modifies sensitive personal information in order to preserve users' privacy. After anonymization, an adversary should not be able to single out one user from another. Wasp offers a dedicated module for managing PII (personally identifiable information). For more details, please refer to https://www.agilelab.it/wasp-wide-analytics-streaming-platform/
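
A minimal sketch of two common anonymization steps, suppression (dropping direct identifiers) and generalization (coarsening quasi-identifiers such as age and ZIP code), followed by a k-anonymity check that every released record shares its quasi-identifiers with at least k-1 others. Field names and the choice of k are hypothetical; this is not how WASP's privacy module works internally.

```python
from collections import Counter

def anonymize(record):
    """Suppress direct identifiers and generalize quasi-identifiers (hypothetical schema)."""
    decade = (record["age"] // 10) * 10
    return {
        # "name" and "email" are direct identifiers: they are simply not copied.
        "age_range": f"{decade}-{decade + 9}",
        "zip_prefix": record["zip"][:3] + "**",
        "diagnosis": record["diagnosis"],  # the attribute to analyze is kept as-is
    }

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values is shared by at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())

people = [
    {"name": "Anna", "email": "anna@example.com", "age": 34, "zip": "20121", "diagnosis": "flu"},
    {"name": "Luca", "email": "luca@example.com", "age": 38, "zip": "20135", "diagnosis": "cold"},
]
released = [anonymize(p) for p in people]
print(is_k_anonymous(people, ["age", "zip"], k=2))                 # False
print(is_k_anonymous(released, ["age_range", "zip_prefix"], k=2))  # True
```

k-anonymity is only one formal criterion: if every record in a group shares the same sensitive value, that value still leaks, which is why stronger notions such as l-diversity or differential privacy exist.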

Data Harvesting

Data harvesting means collecting data and information from online resources. The term is often used interchangeably with web scraping, web crawling and data extraction.

Data Imbalance

In machine learning, data imbalance refers to a dataset whose classes are unequally represented. The problem is common (in anomaly detection, for example, anomalies are rare by definition), and many techniques exist to compensate for it or to change the data's distribution.
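
One of the simplest compensation techniques is random oversampling: minority-class samples are duplicated until all classes have the same size. A minimal sketch in plain Python; the function name and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until every class
    is as large as the biggest one. Returns new (samples, labels) lists."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for label, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_samples.extend(xs + extra)
        out_labels.extend([label] * (len(xs) + len(extra)))
    return out_samples, out_labels

readings = list(range(10))
labels = ["normal"] * 8 + ["anomaly"] * 2  # hypothetical anomaly-detection labels
_, balanced = random_oversample(readings, labels)
print(balanced.count("normal"), balanced.count("anomaly"))  # 8 8
```

Oversampling should be applied to the training set only; alternatives include undersampling the majority class, class weights in the loss function, or synthetic sampling methods such as SMOTE.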

Data Lake

A Data Lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. Typically, data is saved in a raw format.

Data Retention

Data retention refers to keeping an organization's data stored for a defined period, for compliance or business reasons. Once that period ends, the data is typically archived or deleted.
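
In practice, a retention policy is often enforced by a periodic job that finds records older than the period allowed for their category. A minimal sketch, assuming a hypothetical policy table and records that carry a category and a timezone-aware created_at timestamp:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: how long each category of data must be kept.
RETENTION_POLICY = {
    "access_logs": timedelta(days=30),
    "invoices": timedelta(days=3650),  # about ten years
}

def apply_retention(records, policy, now):
    """Return the records still inside their retention window; the rest are due for deletion or archiving."""
    kept = []
    for record in records:
        limit = policy.get(record["category"])
        # Categories without a policy are kept: a conservative choice for this sketch.
        if limit is None or record["created_at"] >= now - limit:
            kept.append(record)
    return kept

now = datetime(2024, 1, 31, tzinfo=timezone.utc)
records = [
    {"id": 1, "category": "access_logs", "created_at": datetime(2024, 1, 15, tzinfo=timezone.utc)},
    {"id": 2, "category": "access_logs", "created_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
    {"id": 3, "category": "invoices", "created_at": datetime(2020, 1, 1, tzinfo=timezone.utc)},
]
print([r["id"] for r in apply_retention(records, RETENTION_POLICY, now)])  # [1, 3]
```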

Data Streaming

Streaming data is data that is generated continuously by many sources (often thousands), which typically send their records simultaneously. It includes a wide variety of data, such as log files, application events and social network activity. Handling data at this scale can be challenging. With WASP, you can manage thousands of records with little to no effort. Read more at https://www.agilelab.it/wasp-wide-analytics-streaming-platform/
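
Streaming systems usually aggregate such continuous data over time windows. The sketch below counts events per key in tumbling (fixed, non-overlapping) windows and emits each window as soon as a later event shows it is closed. It is a simplification that assumes events arrive in timestamp order; production stream processors also have to handle late data, distributed state and failures. The function name and sample stream are hypothetical.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts_per_key) for fixed, non-overlapping time windows.

    Assumes events arrive ordered by timestamp (no late or out-of-order data)."""
    current_start, counts = None, Counter()
    for timestamp, key in events:
        start = timestamp - timestamp % window_seconds  # align to the window boundary
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # the previous window is complete
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last window when the input ends

stream = [(0, "login"), (3, "click"), (4, "click"), (12, "login"), (25, "click")]
for window in tumbling_window_counts(stream, window_seconds=10):
    print(window)
# (0, {'login': 1, 'click': 2})
# (10, {'login': 1})
# (20, {'click': 1})
```

Because it is a generator, the function never needs the whole stream in memory: it only keeps the counts of the window that is currently open.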

Data Warehouse

A data warehouse is a large collection of business data used to help an organization make decisions. Data is periodically cleaned and loaded into it, where it waits to be analyzed, typically by BI systems. In this setting, processing data in near real time is essential. WASP offers a rich set of streaming and batch features that ease this critical process. Take a look at https://www.agilelab.it/wasp-wide-analytics-streaming-platform/

Elasticsearch

Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and it’s part of the well-known ELK stack, along with Kibana and Logstash.

End-to-End Solution

An end-to-end solution is a system (or workflow) that covers all of your business needs and processes from start to finish, without relying on external vendors.

ETL

ETL (Extract, Transform, Load) refers to a process composed of three steps:
Extract: structured and unstructured data is read from one or more source systems.
Transform: a set of rules is applied to the data, for example removing records with spurious values (data cleaning) or deduplicating records.
Load: the transformed data is loaded into its destination (a data lake, a DWH, etc.), with either a full or an incremental load.

WASP is a Big Data tool built to let you create your own ETLs in a simple, yet powerful, way. Explore it by visiting the dedicated page https://www.agilelab.it/wasp-wide-analytics-streaming-platform/
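
The three steps can be sketched end to end with Python's standard library: extract rows from a CSV source, transform them (cleaning and deduplication), and load them into a SQLite table. The sample data, table name and cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data: one invalid amount and one duplicated id.
RAW_CSV = """id,name,amount
1,Ada,10.5
2,Bob,not_a_number
1,Ada,10.5
3,Carla,7
"""

def extract(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with invalid amounts (cleaning) and repeated ids (deduplication)."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # spurious value: skip the row
        if row["id"] in seen:
            continue  # duplicate record: keep only the first occurrence
        seen.add(row["id"])
        clean.append((int(row["id"]), row["name"], amount))
    return clean

def load(rows, conn):
    """Load: upsert the rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM sales ORDER BY id").fetchall())  # [(1, 'Ada', 10.5), (3, 'Carla', 7.0)]
```

Using INSERT OR REPLACE keyed on id makes the load step idempotent, so re-running it with the same or newer data behaves like a simple incremental load.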

Fault-Tolerant

It is the ability of a system to continue operating even when a failure occurs. WASP offers out-of-the-box fault-tolerance mechanisms that keep your batch and streaming jobs alive in mission-critical applications. See https://www.agilelab.it/wasp-wide-analytics-streaming-platform/ for more details.
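
One common building block of fault tolerance is retrying an operation that failed for a transient reason (a dropped connection, a busy service) with exponential backoff. The sketch below is a generic illustration with hypothetical names, not WASP's mechanism; streaming engines typically combine retries with checkpointing and replication. The sleep function is a parameter so the waits can be observed without actually sleeping.

```python
import time

def with_retries(task, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call task(); after each failure wait base_delay * 2**n seconds and try again.

    The last error is re-raised once all attempts are used up."""
    for n in range(attempts):
        try:
            return task()
        except Exception:
            if n == attempts - 1:
                raise
            sleep(base_delay * 2 ** n)  # exponential backoff: 0.5 s, 1 s, 2 s, ...

def make_flaky(failures):
    """Hypothetical task that fails `failures` times before succeeding."""
    state = {"calls": 0}
    def task():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("transient failure")
        return "done"
    return task

waits = []
print(with_retries(make_flaky(2), sleep=waits.append), waits)  # done [0.5, 1.0]
```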

HBase

HBase is a distributed, column-oriented NoSQL key-value store modeled after Google's Bigtable. It is available as open-source software and runs in the Hadoop environment. WASP provides a plugin that lets you seamlessly read from and write to HBase. For more details please see https://www.agilelab.it/wasp-wide-analytics-streaming-platform/

Kafka

Apache Kafka is an open-source distributed event streaming platform. It aims to provide high-throughput, low-latency handling of real-time data feeds. Its architecture is based on the publish/subscribe pattern: producers write records to topics, which store them, and consumers read them. It is a fundamental piece of the Big Data environment. Wasp offers a plugin module that reads from and writes to Kafka topics in a very simple way. For more details, you can consult https://www.agilelab.it/wasp-wide-analytics-streaming-platform/
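
To illustrate the publish/subscribe and topic concepts, here is a deliberately simplified in-memory model of a single-partition topic: producers append records to a log, and each consumer group reads from its own offset. It is not the Kafka client API and leaves out partitions, replication and persistence; to talk to a real cluster you would use a Kafka client library. The class and method names are hypothetical.

```python
from collections import defaultdict

class Topic:
    """Hypothetical in-memory stand-in for a single-partition topic: an append-only log."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.offsets = defaultdict(int)  # next offset to read, tracked per consumer group

    def publish(self, record):
        """Producer side: append a record and return the offset it was stored at."""
        self.log.append(record)
        return len(self.log) - 1

    def poll(self, group, max_records=10):
        """Consumer side: read the next records for `group` and advance its offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

orders = Topic("orders")
orders.publish({"id": 1})
orders.publish({"id": 2})
print(orders.poll("billing"))                  # [{'id': 1}, {'id': 2}]
print(orders.poll("billing"))                  # []
print(orders.poll("shipping", max_records=1))  # [{'id': 1}]
```

Unlike a traditional queue, reading does not remove records: the shipping group can still consume everything that the billing group has already read.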

NoSQL

NoSQL stands for “Not only SQL”. The term is used for data stores that, instead of standard relational tables, use a non-tabular data model: key-value, document-based, column-oriented or graph-based. WASP can interoperate with several NoSQL databases, such as HBase or MongoDB. For more details, please see https://www.agilelab.it/wasp-wide-analytics-streaming-platform/

Ontology

The term ontology, first introduced in philosophy, refers to describing the existing world through the entities that make it up. In computer science, an ontology is a formal definition of the properties, categories and relations between the concepts or entities of a domain.

RDF triples

An RDF triple (or semantic triple) is the basic unit of the RDF (Resource Description Framework) data model, which is often used to express an ontology. A triple is made of three entities: a subject, a predicate and an object. Each triple states a relation (the predicate) between the subject and the object, so a set of triples forms a directed, labeled graph from which knowledge and hidden relations between entities can be derived automatically. For this reason, triples are a fundamental piece of the RDF model, which aims to build a machine-readable Web.
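
A minimal sketch of how hidden relations can be derived from triples: the graph is a Python set of (subject, predicate, object) tuples, and two RDFS-style inference rules are applied until no new triple appears. The predicate names type and subClassOf are simplified stand-ins for rdf:type and rdfs:subClassOf, and the data is hypothetical.

```python
def infer(triples):
    """Apply two RDFS-style rules until no new triple appears (a fixed point):
    (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    (x type A), (A subClassOf B)       => (x type B)"""
    facts = set(triples)
    while True:
        subclass = [(s, o) for (s, p, o) in facts if p == "subClassOf"]
        new = set()
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, "subClassOf", c))
        for x, p, cls in facts:
            if p == "type":
                for a, b in subclass:
                    if a == cls:
                        new.add((x, "type", b))
        if new <= facts:
            return facts  # nothing new can be derived
        facts |= new

graph = {
    ("Rex", "type", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
}
print(sorted(infer(graph) - graph))
# [('Dog', 'subClassOf', 'Animal'), ('Rex', 'type', 'Animal'), ('Rex', 'type', 'Mammal')]
```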

Scala

Scala is a programming language that combines the functional programming (FP) and object-oriented (OO) paradigms. It runs on the Java Virtual Machine and interoperates with Java. Many frameworks used in the Big Data environment, such as Apache Spark or our WASP, are written in Scala.

SLA

SLA (Service Level Agreement) defines the level of service a customer expects from a supplier. It usually involves a list of metrics to measure, for example how fast data is processed by a pipeline. WASP uses the concept of “Telemetry” to automatically track common metrics usually involved in an SLA. For more details, please see https://www.agilelab.it/wasp-wide-analytics-streaming-platform/ or contact us.
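
SLA metrics are often expressed as percentiles, for example "95% of records are processed within 500 ms". A minimal sketch of such a check using the nearest-rank percentile method; the threshold and the latency sample are hypothetical, and this is not how WASP Telemetry computes its metrics.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile: the smallest value such that at least q% of values are <= it."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def check_sla(latencies_ms, q=95, threshold_ms=500):
    """Return the observed q-th percentile latency and whether it meets the (hypothetical) threshold."""
    observed = percentile(latencies_ms, q)
    return observed, observed <= threshold_ms

latencies = [120, 130, 140, 150, 160, 170, 180, 190, 200, 900]  # one slow outlier
print(check_sla(latencies))        # (900, False)
print(check_sla(latencies, q=50))  # (160, True)
```

The mean of this sample is 234 ms, which would hide the slow outlier; that is one reason SLAs are usually stated in percentiles rather than averages.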

SPARQL

SPARQL is a query language for data expressed in the Resource Description Framework (RDF). It is often used to extract knowledge by querying the underlying RDF triples.
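
At its core, a SPARQL query matches triple patterns containing variables against the data. The sketch below evaluates such a basic graph pattern in plain Python; the query roughly corresponds to SELECT * WHERE { ?x :knows ?y . ?y :knows ?z . ?z :name ?n }, with prefixes, IRIs, FILTER and the rest of SPARQL left out. Function names and data are hypothetical.

```python
def match(pattern, triples, binding):
    """Yield every extension of `binding` under which `pattern` matches a triple.
    Terms starting with '?' are variables; anything else must match exactly."""
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            yield candidate

def select(patterns, triples):
    """Join a list of triple patterns, like the body of a SPARQL WHERE { ... } clause."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for partial in bindings for b in match(pattern, triples, partial)]
    return bindings

data = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "name", "Carol"),
]
query = [("?x", "knows", "?y"), ("?y", "knows", "?z"), ("?z", "name", "?n")]
print(select(query, data))
# [{'?x': 'alice', '?y': 'bob', '?z': 'carol', '?n': 'Carol'}]
```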

Time-series Database

Trend Degradation

Tuples

UBI (Usage Based Insurance)

Virtual Machine

A virtual machine (VM) is a compute resource that uses software, instead of a physical computer, to run programs. A VM is typically sandboxed from the host system: it runs on top of it, but it cannot directly access the host's resources unless it is explicitly allowed to.