Você está na página 1de 3

Bending the Rules for SQL

by Avi Kapuya

NoSQL is a disruptive trend that changed


many years of rigid perception of data stores
and how they should be serving
applications. NoSQL evolved from
distributed theories and the common
notion that distributed foundations are an
absolute necessity for making the next leap
in data management.

The data management needs of large scale


services such as web search engines and
global social networking cannot be properly
served by ‖old school‖ data management systems. Comparing today’s hardware to the
past, your average $200 netbook has the computing power of a $10M mainframe of the
90’s. Yet the data growth we’re experiencing these days far exceeds hardware
improvement and scaling, and many databases cannot be contained in a single server.

Distribution of data across multiple machines enables you to deal with large amounts of
data and improve throughput. However, once the data is distributed, it becomes more
difficult to maintain data consistency. For example, transactions may change data that
resides on multiple servers. To be able to synchronize all the changes made by the
transaction into a single known state (before and after committing), a synchronization
point is needed, in the form of a transaction state.

Some NoSQL solutions loosen the requirements when it comes to consistency by not
supporting transactions, thus reducing the need for synchronization points. This relaxed
consistency approach may suit some use cases where consistency is less important, but
would be unsuitable for others.

Let’s consider the example of social networking applications. In these applications,


operations over the data objects are well spread, and there is a low likelihood of collision
to start with. On top of that, data is mostly added and rarely updated. By contrast,
NoSQL’s compromise of consistency can create major problems for most applications,
some of which include CRMs, ERPs and trading applications — those cannot accept data
integrity inconsistencies, even if they are temporary.

To date, I have yet to see proper research that recommends a viable approach to dealing
with the problem of data inconsistency. Most applications are very intolerant to the ―third
state of a bit‖ –computer systems by nature are very dichotomic — it’s either Yes or No.
There is no easy way to obtain a programming paradigm that deals well with an unknown
state of data.
Unless a worthy solution at the application level is found that can deal with the data
inconsistency issues in NoSQL, most applications can only find limited use of NoSQL in
the form of caches, accelerators for specific tables, and more.

The question is: can distributed data services (NoSQL) help to scale data, even if it takes
into account the existence of synchronization points and their limitations? What would
be sacrificed if synchronization and transaction mechanisms were added on top of
NoSQL foundations? The answer is performance. Distributed and consistent data
services will always be slower than pure raw distributed data service.

Is sacrificing performance worth it?


Yes. Despite the performance penalty introduced by adding the consistency governing
mechanisms, performance would still be reasonable for the majority of applications. It
would not be as fast as a single server, all in-memory data service – but for most needs,
the performance provided by a distributed yet consistent service would be good enough
and comparable with standard database performance.

What do I get in exchange for higher performance?


The gains would be:

 Larger storage space


 X times more throughput than a single node
 More stable and predictable performance
 Better average performance under load

What would be the performance of a SQL


implementation on top of Consistent NoSQL
foundations?
Because the level of parallelism in a distributed system is high (yes, even with locks and
transactions), the average performance of such a solution could exceed that of a standard
database if the right optimizations are done to realize the parallelism and distribution
potential.

So what sacrifices would I need to make for a


distributed SQL database as opposed to a single server
database?
When the database load is low, a single server will probably be faster. In this case,
parallelism has a smaller effect and single server has no network overheads. Single server
solutions have their drawbacks in the form of scalability, availability, throughput and
stability, and therefore they are usually not a good solution.

Xeround’s cloud database is a SQL database that combines NoSQL principles to facilitate
elasticity, alongside the mechanisms to support locks, transactions and query capabilities
as in relational databases. Xeround’s SQL cloud database couples both SQL and NoSQL
to create a highly parallel, highly scalable, and available cloud database.

Try it out now!

Você também pode gostar