Before launching a software development project, you need to decide which database management system (DBMS) best fits the expected workload. Software architects and developers put a lot of effort into weighing specific requirements for data modeling, transaction guarantees, read/write speed, horizontal scaling, failure resilience, etc.
Traditionally, the starting point is choosing between the SQL and NoSQL categories, since each represents a set of tradeoffs. Whichever you choose, low latency and high throughput are not negotiable.
If you are considering a NoSQL database for your data store, you might have already heard about the two most popular DBMSs in this category – Cassandra and MongoDB.
This article aims to give you a clear understanding of what Cassandra and MongoDB are, what they are not, and in which use cases they serve best. We will start with some basic ideas and move through similarities and differences to practical advice. It will help you make the right choice between them in the context of your application data modeling.
SQL vs. NoSQL: the borderline
More often than not, while discussing a development project with the customer, we have to explain simple things. Simple for us, but not that simple for those who have no data science background.
Recently, we had a meeting with a board of investors, top managers, business owners, and marketers. It was about building a nation-scale online platform to handle a zillion operations a day on real estate and land parcels. As we approached the SQL/NoSQL issue on our agenda, nothing seemed to be a problem. Everybody juggled the words ‘database’, ‘high availability and scalability’, and popular DB names.
At some point, we discovered that our communication got stuck in the middle. We suddenly realized that the majority of the people present had very little idea about the general distinction between SQL and NoSQL databases.
So, we’ll start with refreshing some basics.
SQL
The SQL category includes relational database management systems (RDBMS) accessing and manipulating data with Structured Query Language (SQL). SQL is a common base for a variety of relational databases like MySQL, PostgreSQL, Oracle, MS SQL, SAP HANA, etc.
Relational DBMS aims to follow the so-called “ACID” requirements for transactional systems. ACID is an acronym for:
- Atomicity
- Consistency
- Isolation
- Durability
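To make atomicity concrete, here is a minimal sketch using SQLite from the Python standard library. The `accounts` table, the `transfer` helper, and the sample balances are hypothetical and serve only as an illustration: either both updates of a transfer are applied, or neither is.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        # The connection as a context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance, so the credit
        # made by the first statement was rolled back as well.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balances as a name -> balance mapping."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw the source account leaves both balances untouched, which is the "all or nothing" behavior the A in ACID stands for.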
Currently, there are about a hundred SQL DBMSs, both open source and proprietary.
“Not relational SQL, not only SQL”
The term NoSQL was coined to denote a new generation of non-relational databases, regardless of the specific technology behind them. They evolved to resolve the scaling and accessibility issues characteristic of traditional relational databases.
NoSQL databases are distributed systems with parallel processing, designed for linearly scalable applications such as search engines. Instead of ACID, they pursue BASE properties:
- Basically available
- Soft state
- Eventual consistency
The BASE requirements, or principles, need some explanation.
Basic availability means that the system responds to every request, even if the response is a failure or reflects stale data.
Soft state means that the state of the system may change over time even when no new data entries are made, because replicas keep synchronizing in the background.
Finally, eventual consistency means that replicas may disagree for a while, but once writes stop, they all converge to the same state.
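The BASE behavior described above can be sketched with a toy model. The `Replica` class and the `sync` function below are hypothetical and do not reflect how any particular database replicates data. They only show two replicas disagreeing for a moment and then converging under a simple last-write-wins rule.

```python
class Replica:
    """A toy replica that stores key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        current = self.data.get(key)
        # Last-write-wins: keep the entry with the highest version number.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(replicas):
    """One background pass: every replica receives every other's entries."""
    snapshot = [dict(r.data) for r in replicas]
    for replica in replicas:
        for other in snapshot:
            for key, (version, value) in other.items():
                replica.write(key, value, version)
```

Between a write and the next `sync` pass, a read may return an old value (soft state). After the pass, all replicas agree (eventual consistency).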
SQL vs. NoSQL cheat sheet
Neither SQL nor NoSQL databases are inherently better. Each category has its own strengths, limitations, and weaknesses.
The main point is that if you are building your system around a transactional workload with an emphasis on strict consistency and normalization, a NoSQL solution is usually not an option. In this case, a traditional SQL DBMS is the safer choice.
NoSQL systems are distributed over multiple nodes. They are capable of handling huge volumes of unstructured data used in Big Data analytics, real-time web apps, etc.
Because all individual databases differ a lot, even inside each category, we prepared a cheat sheet to draw a general borderline between SQL and NoSQL.
* The CAP theorem, also known as Brewer’s theorem, states that a distributed database can guarantee only two of three properties at the same time: consistency, availability, and partition tolerance.
Apache Cassandra vs. MongoDB
Cassandra and MongoDB are both highly scalable, high-performance distributed database management systems belonging to the NoSQL family. They are designed to provide high availability across multiple servers and to eliminate a single point of failure.
Despite some common features and properties characteristic of many NoSQL systems, these two databases are radically different.
Cassandra database
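Released in 2008, one year before MongoDB, Cassandra is designed to handle huge data sets across multiple nodes.
In contrast to relational databases, which organize data records in rows of a fixed schema, Cassandra uses a wide-column data model to provide faster data retrieval: rows are located by a partition key, and each row can hold its own set of columns. The data is stored much like a nested hash map, and a hash of the partition key decides which node holds a given row.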
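As a rough mental model of this layout, the sketch below represents a table as nested Python dictionaries and picks a node from a hash of the partition key. The node names, the `insert` helper, and the modulo placement are simplifying assumptions. Real Cassandra uses a token ring and its own partitioner rather than this scheme.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key: str) -> str:
    """Choose a node from a hash of the partition key (simplified placement)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


def insert(table: dict, partition_key: str, clustering_key: str, **columns) -> None:
    """Store columns under partition key -> clustering key -> column name."""
    rows = table.setdefault(partition_key, {})
    rows.setdefault(clustering_key, {}).update(columns)
```

Rows in the same partition may carry different columns, and the same partition key always maps to the same node, which is why lookups by partition key are cheap.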
MongoDB database
Released in 2009, MongoDB stores data in the form of JSON-like documents instead of the table records used in relational databases. Its dynamic schema lets documents in the same collection have different fields, which allows faster data integration.
It is interesting to draw a parallel between MongoDB document-oriented data model and that of traditional table-oriented SQL database:
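A small sketch of that parallel, using only plain Python data: two flat "tables" linked by a key are folded into one nested, JSON-like document. The `customers` and `orders` data and the `to_document` helper are hypothetical and are not part of any MongoDB API.

```python
import json

# Relational shape: two flat tables linked by customer_id.
customers = [(1, "Ada Lovelace")]  # (customer_id, name)
orders = [(10, 1, "keyboard"), (11, 1, "monitor")]  # (order_id, customer_id, item)


def to_document(customer_id: int) -> dict:
    """Fold a customer row and its order rows into one nested document."""
    name = next(n for cid, n in customers if cid == customer_id)
    return {
        "_id": customer_id,
        "name": name,
        "orders": [
            {"order_id": oid, "item": item}
            for oid, cid, item in orders
            if cid == customer_id
        ],
    }


def to_json(document: dict) -> str:
    """Serialize the document the way a JSON-like store would see it."""
    return json.dumps(document, indent=2)
```

Roughly speaking, a table corresponds to a collection, a row to a document, and a column to a field. Data that SQL would reach through a join can be embedded in the document itself.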
Companies that use MongoDB and Cassandra
Both databases enjoy popularity among thousands of well-known and highly reputed organizations. See some of the examples below.
Cassandra vs. MongoDB: similarities
The similarities between Cassandra and MongoDB do not go very far.
- NoSQL. They both belong to the NoSQL family. Just like other NoSQL databases, they evolved to address the challenges of traditional SQL databases: real-time handling of large amounts of unstructured data and horizontal scaling.
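- Not ACID-compliant by design. Neither database serves as a replacement for relational databases, and neither was designed around ACID guarantees (MongoDB added multi-document ACID transactions only in version 4.0). If strict data consistency and normalization are primary requirements, neither Cassandra nor MongoDB is the first option to consider.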
- Open-source. Cassandra and MongoDB started as open-source software that you can download, modify, and use. Both projects were launched by reputable organizations and are supported by communities worldwide. Cassandra is distributed under the Apache License 2.0. MongoDB was distributed under the GNU Affero GPL 3.0 until October 2018, when it moved to the Server Side Public License (SSPL). Commercial editions of both projects are also available.
- Cross-platform. Cassandra and MongoDB support a variety of Windows, Linux, and macOS platforms.
- Young. Cassandra and MongoDB appeared in 2008 and 2009, respectively. They are relatively young compared to MySQL, which debuted in the mid-’90s.
MongoDB vs. Cassandra: differences
Though very different in most respects, Cassandra and MongoDB play an outstanding role in their application fields. Just like their similarities reflect their common ideation, the differences are a reflection of their unique value.
Based on our hands-on experience with many NoSQL systems, and with Cassandra and MongoDB in particular, we prepared a simplified comparison of the two.
Tips on performance benchmarking
The factors most impacting the performance of these two databases are database model/schema, the actual application load, and consistency requirements. The combination of all three factors defines the use cases, in which one of the databases will have the upper hand.
If you want to test and compare both databases with a benchmark load, it is crucial to choose a load as close to your application’s real workload as possible. Also take into account whether that workload is write-heavy or read-heavy.
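One way to keep the benchmark load identical for both databases is to generate the operation mix once, from a fixed seed, and replay it against each system. The sketch below is an assumption about how such a harness might start. The `make_workload` and `replay` names are hypothetical, and a plain dictionary stands in for a real database client.

```python
import random


def make_workload(n_ops: int, read_ratio: float, seed: int = 0) -> list:
    """Return a reproducible list of 'read' and 'write' operations."""
    rng = random.Random(seed)  # fixed seed: both databases see the same mix
    return ["read" if rng.random() < read_ratio else "write" for _ in range(n_ops)]


def replay(workload: list, store: dict) -> dict:
    """Apply the workload to a stand-in store and count each operation type."""
    counts = {"read": 0, "write": 0}
    for i, op in enumerate(workload):
        key = f"user:{i % 10}"
        if op == "write":
            store[key] = i
        else:
            store.get(key)
        counts[op] += 1
    return counts
```

For a read-heavy test you might call `make_workload(100_000, read_ratio=0.9)`, and for a write-heavy one lower the ratio, then time `replay` with a real client in place of the dictionary.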
When choosing a data model, keep in mind that some schemas work better with Cassandra while others work better with MongoDB. To get comparable results, use a data model that suits both databases reasonably well.
You have to be careful with data consistency settings. Make sure that the read/write consistency requirements and corresponding settings do not disadvantage one of the databases.
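For replicated systems with tunable consistency, a common rule of thumb is that a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The helper names below are hypothetical. The arithmetic is a sketch of that rule, not a configuration recipe for either database.

```python
def quorum(n_replicas: int) -> int:
    """Smallest majority of n_replicas."""
    return n_replicas // 2 + 1


def reads_see_latest_write(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > n_replicas
```

With three replicas, quorum writes and quorum reads (2 + 2 > 3) overlap, while single-replica writes and reads (1 + 1) do not. Benchmarking one database with strong settings against the other with weak settings would therefore skew the comparison.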
Other details
There are a few more characteristics that might contribute to your decision about the preferred database.
Which database is the right fit for your business
Ultimately, choosing between these two popular databases depends on where and how you will use them. There is no definite answer until you consider all the contributing factors.
We have done a lot of experimenting and benchmarking with these two NoSQL databases, and every time we came to the same conclusion: they are both great players if used in the right field. So, to make your decision easier, we’ve collected the most significant points where one database has an advantage over the other.
Data model. Some complicated domains require a rich data model. In this case, MongoDB is a better choice.
Index querying. If most queries in your application go by the primary key, Cassandra is a good choice. If secondary indexes and flexible querying on them are a primary requirement, MongoDB is a better choice.
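The difference between the two access paths can be sketched in plain Python. The `Store` class and the `city` field below are hypothetical. The point is only that a primary-key lookup is a single direct access, while querying by another field needs an extra index that must be maintained on every write.

```python
from collections import defaultdict


class Store:
    """Toy store with a primary-key map and one secondary index on 'city'."""

    def __init__(self):
        self.by_key = {}
        self.by_city = defaultdict(set)  # secondary index: city -> keys

    def put(self, key: str, record: dict) -> None:
        old = self.by_key.get(key)
        if old is not None:
            # Keep the secondary index in sync when a record changes.
            self.by_city[old["city"]].discard(key)
        self.by_key[key] = record
        self.by_city[record["city"]].add(key)

    def get(self, key: str):
        """Primary-key lookup: one direct access."""
        return self.by_key.get(key)

    def find_by_city(self, city: str) -> list:
        """Secondary-index lookup: resolve matching keys first."""
        return sorted(self.by_city.get(city, ()))
```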
Availability. The data availability strategy is the most distinctive feature that sets these systems apart. If you need uptime as close to 100% as possible, Cassandra is the preferable choice due to its ‘multiple master node’ model, in which every node can accept writes. When one or more nodes in Cassandra fail, the database stays up and running as long as enough replicas remain to satisfy the configured replication and consistency settings.
The tradeoff here is that Cassandra’s high availability translates to costly additional infrastructure. If a delay of 40-50 seconds, such as the pause while MongoDB elects a new master node after a failure, does not affect your business, you do not need to prioritize the highest availability.
Write speed. If you need to write huge amounts of data, write speed can be a crucial factor. In this case, Cassandra is a better choice because writes are not limited by the capacity of a single master node. Instead, write throughput in Cassandra is bounded by the number of nodes in the cluster.
Language support. Provided that other factors play little role, Cassandra is a better fit if your team already has SQL skills, since CQL is very similar to SQL.
Data aggregation. In contrast to Cassandra, which relies on external tools for aggregation, MongoDB has a built-in data aggregation framework. If you want to have a native tool and your data traffic is not very high, MongoDB is a winner.
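To show what aggregation means here, the sketch below mimics a two-stage filter-then-group pipeline over in-memory documents. It is plain Python, not MongoDB’s aggregation API. The `match` and `group_sum` helpers and the sample field names are hypothetical.

```python
from collections import defaultdict


def match(docs: list, **equals) -> list:
    """Keep only the documents whose fields equal the given values."""
    return [d for d in docs if all(d.get(k) == v for k, v in equals.items())]


def group_sum(docs: list, key: str, field: str) -> dict:
    """Group documents by `key` and sum `field` within each group."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key]] += d[field]
    return dict(totals)
```

A built-in framework runs stages like these inside the database. With an external tool, the data first has to be read out of the cluster and processed elsewhere.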
Workload. Due to the multiple master node model, Cassandra prevails in handling write-heavy workloads. In the case of read-heavy loads, the performance of Cassandra and MongoDB is a close match.
Conclusion
To sum up, Cassandra’s biggest strength is its ability to scale enormously without compromising availability. It is easy to set up and maintain, no matter how fast your database grows. To imagine its scaling capability, think of Instagram: Cassandra handles about 80 million photos uploaded daily to the app’s database.
On the other hand, MongoDB is a superb solution when you need scalability and caching for real-time applications. It is most often used in mobile apps, IoT-related apps, content management systems, and real-time analytics.
We hope now you have a better understanding of the differences between Cassandra and MongoDB databases.
If you need to know more about NoSQL databases or have specific questions, contact our professionals for advice. Good luck, and stay tuned!