Modern web applications require a myriad of services in order to function properly: a site is expected to store transactional data, generate dynamic web pages, and manipulate the stored data via server-side scripting. For years, many developers have relied on the LAMP stack (Linux, Apache, MySQL, and PHP) to provide these capabilities, and it has done its job well. However, as companies scale and demand more intensive processing from their infrastructure, a new set of technologies has emerged: the SMACK stack. The open source community created this new wave of technology out of necessity, and companies are adopting it rapidly, for good reason.
The SMACK Stack
The SMACK stack consists of Spark, Mesos, Akka, Cassandra, and Kafka. The idea is that this stack provides fewer points of failure and faster processing, along with better scalability, support for machine learning, and more. Here is how it breaks down:
Spark is a fast and general-purpose cluster computing system. One of its main features is in-memory caching: by keeping datasets that are reused across operations in memory, Spark avoids repeatedly reading them from disk or recomputing them, which speeds up iterative and interactive workloads.
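The effect of caching can be sketched in plain Python. This is a minimal illustration, not Spark code: `CachedDataset` and `read_from_storage` are hypothetical stand-ins for a dataset and an expensive read from storage. In Spark itself, the equivalent step is calling `cache()` or `persist()` on a DataFrame or RDD.

```python
from typing import Callable, List, Optional


class CachedDataset:
    """Hypothetical dataset that keeps its contents in memory after the first read."""

    def __init__(self, loader: Callable[[], List[int]]) -> None:
        self._loader = loader
        self._cached: Optional[List[int]] = None
        self.loads = 0  # how many times the underlying source was actually read

    def collect(self) -> List[int]:
        # Read from the source only once; later calls reuse the in-memory copy.
        if self._cached is None:
            self.loads += 1
            self._cached = self._loader()
        return self._cached

    # Two different "actions" that both reuse the same cached data.
    def count(self) -> int:
        return len(self.collect())

    def total(self) -> int:
        return sum(self.collect())


def read_from_storage() -> List[int]:
    # Stand-in for an expensive read from disk or a remote store.
    return list(range(1, 101))


orders = CachedDataset(read_from_storage)
print(orders.count(), orders.total(), orders.loads)  # 100 5050 1
```

Both actions reuse the in-memory copy, so the underlying source is read only once.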
Mesos is a distributed resource management system that enables efficient resource allocation across a cluster. It pools the CPU, memory, and storage of many machines, and applications (called frameworks in Mesos) dynamically share that pool, drawing on its resources whenever they need them.
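The offer-based sharing that Mesos uses can be sketched as a toy model: each agent (machine) offers its free CPUs and memory, and each framework launches as many tasks as fit and leaves the rest for others. This is a simplified illustration with hypothetical class names, not the Mesos API. It assumes a single offer round and a fixed framework order, whereas real Mesos decides who receives each offer through an allocation policy such as Dominant Resource Fairness.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Hypothetical machine whose free resources are offered to frameworks."""
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """Hypothetical framework that launches fixed-size tasks."""
    name: str
    cpus_per_task: float
    mem_per_task: int
    tasks_wanted: int
    placements: List[str] = field(default_factory=list)

    def accept(self, offer: Agent) -> None:
        # Launch as many tasks as fit in the offered resources;
        # whatever is left over stays available for other frameworks.
        while (self.tasks_wanted > 0
               and offer.cpus >= self.cpus_per_task
               and offer.mem_mb >= self.mem_per_task):
            offer.cpus -= self.cpus_per_task
            offer.mem_mb -= self.mem_per_task
            self.tasks_wanted -= 1
            self.placements.append(offer.name)


def offer_round(agents: List[Agent], frameworks: List[Framework]) -> None:
    # Each agent's free resources are offered to each framework in turn (toy ordering).
    for agent in agents:
        for fw in frameworks:
            fw.accept(agent)


agents = [Agent("agent-1", cpus=4, mem_mb=8192), Agent("agent-2", cpus=4, mem_mb=8192)]
spark = Framework("spark", cpus_per_task=2, mem_per_task=2048, tasks_wanted=3)
akka = Framework("akka", cpus_per_task=1, mem_per_task=1024, tasks_wanted=2)
offer_round(agents, [spark, akka])
print(spark.placements, akka.placements)
# ['agent-1', 'agent-1', 'agent-2'] ['agent-2', 'agent-2']
```

Two unrelated frameworks end up sharing the same two machines, with neither owning a fixed partition of the cluster.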
Akka is a toolkit that uses the actor model to make it easier to write concurrent and distributed systems. Concurrent programming with threads and shared state is prone to problems such as deadlocks and race conditions. In the actor model, actors communicate through asynchronous messages instead of method calls, and each actor manages its own state, so no other code touches that state directly. When responding to a message, an actor can create child actors, send messages to other actors, or stop its children or itself.
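The actor model itself can be sketched with Python's `asyncio`. This is not Akka's API: `CounterActor` and its message strings are hypothetical, and the method name `tell` merely echoes Akka's term for a fire-and-forget send. The sketch shows the core ideas of a private mailbox, messages processed one at a time, and state that only the actor touches. It assumes Python 3.8 or later.

```python
import asyncio
from typing import Optional


class CounterActor:
    """Hypothetical actor that owns a counter; only its own run loop touches the count."""

    def __init__(self) -> None:
        self._mailbox: asyncio.Queue = asyncio.Queue()  # private mailbox
        self._count = 0  # private state, never shared directly

    def tell(self, msg: str, reply_to: Optional[asyncio.Future] = None) -> None:
        # Asynchronous, fire-and-forget send: enqueue the message and return at once.
        self._mailbox.put_nowait((msg, reply_to))

    async def run(self) -> None:
        # Handle one message at a time, so the state needs no lock.
        while True:
            msg, reply_to = await self._mailbox.get()
            if msg == "increment":
                self._count += 1
            elif msg == "get" and reply_to is not None:
                reply_to.set_result(self._count)  # reply via a future, not a return value
            elif msg == "stop":
                return  # an actor can stop itself


async def main() -> int:
    actor = CounterActor()
    runner = asyncio.create_task(actor.run())
    for _ in range(1000):
        actor.tell("increment")
    reply = asyncio.get_running_loop().create_future()
    actor.tell("get", reply)
    total = await reply
    actor.tell("stop")
    await runner
    return total


print(asyncio.run(main()))  # 1000
```

Because the actor handles its mailbox strictly in order, one message at a time, the count is exact without any lock around `_count`.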
Cassandra is a distributed database management system that can handle large amounts of data with no single point of failure. Its major feature is support for clusters that span multiple datacenters, with asynchronous masterless replication. Because every node can accept reads and writes, Cassandra avoids many of the problems of the primary-replica cluster model, such as a single primary that limits write throughput and must be failed over when it goes down. The result is a decentralized and scalable database management system.
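To make masterless replication concrete, here is a simplified sketch of how keys can be placed on a token ring, loosely modeled on Cassandra's SimpleStrategy: the first replica is the first node at or after the key's token, and the remaining replicas are the next nodes clockwise. Assumptions: one token per node, an MD5-based hash (Cassandra's default partitioner uses Murmur3), and hypothetical node names and classes. Real deployments typically use virtual nodes and datacenter-aware replication strategies.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    # Hash a key onto a ring of 2**32 positions (MD5 here; Cassandra uses Murmur3 by default).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % (2 ** 32)


class TokenRing:
    """Hypothetical ring where every node has equal standing; there is no primary."""

    def __init__(self, nodes: List[str]) -> None:
        # Each node owns one position on the ring; keep them sorted by token.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, replication_factor: int) -> List[str]:
        # First replica: the first node at or after the key's token, wrapping around.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        # Remaining replicas: the next nodes clockwise, with no node repeated.
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("customer:42", replication_factor=3)
print(owners)  # three distinct nodes, any of which can serve the key
```

Since no node is special, losing one of the three replicas still leaves two copies that can serve the key.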
Kafka is a distributed streaming platform built around the log data structure: an append-only, ordered sequence of records. Because Kafka can retain messages for a long time, applications can rewind to an older timestamp in the log and reprocess events from that point, so a newly developed application or algorithm can be tested against past events. Kafka is used heavily in the Big Data space as a reliable way to ingest and move large amounts of data quickly.
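A single partition of such a log can be modeled in memory to show why rewinding works: records are only appended, each gets a sequential offset, and reading deletes nothing. This is a hypothetical sketch, not the Kafka client API; it assumes one partition and caller-supplied, non-decreasing timestamps. In the real Java consumer, the corresponding operations are `KafkaConsumer.offsetsForTimes` and `KafkaConsumer.seek`.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    offset: int
    timestamp: int
    value: str


@dataclass
class Log:
    """Hypothetical single-partition, append-only log."""
    records: List[Record] = field(default_factory=list)

    def append(self, timestamp: int, value: str) -> int:
        # Records are only ever appended; each gets the next sequential offset.
        offset = len(self.records)
        self.records.append(Record(offset, timestamp, value))
        return offset

    def offset_for_time(self, timestamp: int) -> int:
        # Earliest offset whose timestamp is >= the requested time
        # (assumes timestamps are non-decreasing).
        times = [r.timestamp for r in self.records]
        return bisect.bisect_left(times, timestamp)

    def read_from(self, offset: int) -> List[str]:
        # Reading never removes data, so any consumer can replay from any offset.
        return [r.value for r in self.records[offset:]]


log = Log()
for ts, event in [(100, "signup"), (200, "view"), (300, "purchase"), (400, "refund")]:
    log.append(ts, event)

# A new consumer rewinds to t=200 and reprocesses everything since then.
start = log.offset_for_time(200)
print(start, log.read_from(start))  # 1 ['view', 'purchase', 'refund']
```

The original consumer's position is unaffected: the replay is just a second reader starting at an earlier offset.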
The Stack Working Together
When choosing a stack, there are several things to consider, such as the type of analysis, the processing methodology, and the size and type of the data. If your site will generate large amounts of data, consider the SMACK stack for your development. A simple informational website is easily served by the LAMP stack, but for anything more demanding, such as an e-commerce site or a site where users update data, the SMACK stack can help. It benefits from powerful ingestion [Kafka] and high-volume reads and writes [Cassandra], all of which can be managed with a resource/cluster management solution [Mesos].
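The division of labor described here can be illustrated with a toy, single-process sketch in which each component is replaced by a plain Python stand-in: a list plays the ingestion log (Kafka), a function plays the processing job (Spark), and a dictionary keyed by user plays a table partitioned by user (Cassandra). Resource management (Mesos) is out of scope, and all names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# 1. Ingestion (Kafka stand-in): an append-only list of (user_id, amount) purchase events.
event_log: List[Tuple[str, float]] = [
    ("alice", 20.0), ("bob", 5.5), ("alice", 12.5), ("carol", 40.0), ("bob", 4.5),
]


def process(events: List[Tuple[str, float]]) -> Dict[str, float]:
    # 2. Processing (Spark stand-in): aggregate spend per user, as a batch job would.
    totals: Dict[str, float] = defaultdict(float)
    for user, amount in events:
        totals[user] += amount
    return dict(totals)


# 3. Storage (Cassandra stand-in): results keyed by user_id, the partition key.
spend_by_user: Dict[str, float] = {}
spend_by_user.update(process(event_log))

print(spend_by_user["alice"], spend_by_user["bob"])  # 32.5 10.0
```

In a real deployment each stage runs as a separate, independently scaled service rather than as functions in one process.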
Consider the SMACK stack if your infrastructure needs to ingest data at large scale without loss, requires predictive analytics and real-time personalization, and must be able to move data when migrating between servers and data centers. These are the main reasons to adopt it, but the individual technologies in the stack offer many more benefits to your infrastructure.