TECH ARTICLE

NewSQL….the new era of Relational Databases?

NewSQL….the new era of Relational Databases?

Subscribe

Subscribe

The term NewSQL represents a new generation of Relational Database Management Systems

whose goal is reaching the same scalability guarantees of NoSQL databases without giving up ACID properties and the relational model.

ACID

  • Atomicity: A transaction should be visible only if all the updates to the datastore are applied successfully, no intermediate state should be visible to clients.
  • Consistency: No referential constraints should be violated by a transaction, if any of those constraints is violated the whole transaction must fail, the database as a whole can evolve only from one consistent state to another.
  • Isolation: Each transaction should be independent from other transactions running on the database, operations done inside a transaction should not have side effects on other transactions.
  • Durability: Each successful transaction should be persisted on stable storage by the database system, data committed by a trasaction should never be lost.

ACID properties are easy to obtain when the database scales vertically [1] but vertical scaling is not pratical because price of a single machine does not scale linearly with performance.[2]

NoSql databases prefer to give up ACID properties to horizontally scale, just add more machines and your performance should scale linearly in their number.

If your application needs ACID properties [3]  you are stuck reimplementing those properties in your application, you could have better spent your money on a good RDBMS engine.

NewSQL engines can give you the best of both worlds.

Popular NewQSL Database Management System

CockroachDB

Inspired by Google Spanner[4] CockroachDB is a NewSQL database built on a transactional and strongly-consistent key-value store (RocksDB).

CockroachDB enables your use case to scale horizontally, survives disk, machine, rack, and even datacenter failures.

CockroachDB supports strongly-consistent ACID transactions and provides a familiar SQL API for structuring, manipulating, and querying data.

CockroachDB shards your data in ranges and, as more data comes in, splits those ranges. They are continuosly rebalanced between cluster nodes.

Horizontal scaling is achieved by assigning ranges to different cluster nodes, surviving failures is achieved by making use of replicas of the ranges, cluster split and rebalance is mediated by the raft consensus algorithm.

NuoDB

NuoDB like CoackroachDB is a NewSQL database that achieves horizontal scale by employing a two level architecture.

The transaction processing layer consists of in-memory process nodes called Transaction Engines (TE). TEs allow NuoDB to maintain high, in-memory performance. As an application makes requests of NuoDB, TEs naturally build up in-memory caches with affinity for that application’s workload. Requests for data not currently in cache (cache misses) can be fulfilled from the memory caches of other TEs or from the storage management layer.

The storage management layer consists of process nodes called Storage Managers (SM), which have both in-memory as well as on-disk storage components. The SM also caches data in-memory to speed up data retrieval from disk. SMs also provide on-disk data durability guarantees. Multiple SMs can be used to increase data redundancy.

Benchmark

All workloads were performed using synthetic data 3 aimed at simulating those often found in the IoT world. More precisely, three tables have been created: devices, measurements and periods of validity.

All the benchmarks have been carried out in the same conditions:

  • The NewSQL database has been installed on three nodes.
  • The queries were carried out by ten worker processes, instantiated on the same physical machine, using as many connections to the database.
  • For both databases, a replication factor of three was set. This means that each data point is replicated on each node.
  • The tables used have the following dimensions:
  • devices : 200 rows and 4 columns
  • measurements: 300,000 rows and 5 columns. With 1500 measurements for each device
  • validity periods: 10,000 rows and 6 columns. With 50 validity periods for each device.

Single row get

The first workload under analysis consists of 200,000 reading operations by primary key. As a result, each query selects a single line of the table involved. The transactions were equally distributed among the three tables.

Write Operations

15,000 write operations were carried out. The number of rows inserted by each operation varies from one to five. Here again, the writes were equally distributed between the three tables.

 

Joins

Workload composed of 75,000 read operations, which require joins among three relationships. Queries select only two lines of the table resulting from the two inner joins.

Joins 2

This workload is similar to the one just analyzed, but it highlights the changes in the performance of the two databases as the number of selected rows increases. In this case, the operations carried out were 15,000 reads with joins involving the three tables, only 300 rows are transferred to the clients.

Mixed workload

The last benchmark in analysis was performed with a workload consisting of mixed operations. In particular, 100,000 queries were carried out of which 80% are reads and 20% are writes. 1% of the reads are joins on the three tables, producing two rows. The remaining read requests were all made by primary key, returning a single record as the output. Finally, the writes inserted a number of rows that varies from one to five.

Comparison

CockroachDB

Pros

  • Surely the first big advantage of CockroachDB is to be open source. On the other hand, NuoDB only provides a free Community Edition, with different limitations.
  • Another point in favour of CockroachDB is the presence of good documentation and a support forum, which simplify setup and product configuration.
  • CockroachDB also offers a graphical interface presenting rich data and usage statistics. This interface has been very useful during the benchmark to monitor the database performance.
  • In terms of performance, CockroachDB performed better than NuoDB for read operations involving several records in the primary key scans.

Cons

  • In general, the performance of CockroachDB was lower than that of the of NuoDB, especially for join and write operations.
  • CockroachDB does not offer an integrated load balancer, which is why it was necessary to use the one provided by AWS for a fee. Without load balancer, it would not have been possible to distribute the load equally between the available nodes.

NuoDB

Pros

  • NuoDB provides on average better performance than CockroachDB, especially for join and write operations.
  • The presence of an integrated load balancer, configurable with different policies for load distribution, is another great advantage offered by NuoDB.

Cons

  • One of the great flaws of NuoDB is that it is not open source. The Community Edition, provided free of charge, also has too many limitations and is therefore not pratical to use in production. In particular, this version allows you to have only three transaction engines and a single storage manager. The second limitation results in the impossibility to have the replication of the data resulting in no fault tolerance. All the benchmarks reported above were carried out with the Enterprise Edition of NuoDB.
  • The documentation provided by NuoDB was often confusing and sometimes incomplete. This has influenced the setup times in a negative way, no public support forum is available.
  • NuoDB, unlike CockroachDB, does not provide a graphical interface for monitoring the database. This lack, although not serious, has made observing the performance during benchmarks harder.
  • NuoDB performance decreases when the number of selected lines in the scanning process grows.

[1] To scale you have to buy one machine with better performance.

[2] Machines capable of the needed performance are not available yet.

[3] Applications do not often really need ACID properties.

[4] But not requiring hardware-assisted clock synchronization using GPS clocks and atomic clocks to ensure global consistency.

Written by Andrea Fonti - Agile Lab Software Mentor & Big Data Engineer

 

If you found this article useful, take a look at our Knowledge Base and follow us on our Medium Publication, Agile Lab Engineering!

 

 

Similar posts