Coral8 5.5: Snapshot Queries with Live Updates
History lesson: Continuous and Snapshot Queries
First-generation CEP engines started with the concept of Continuous Queries. Unlike database queries that return a snapshot of data at a given time, Continuous Queries are registered once, subscribe to the relevant input streams, and then continuously produce streaming output on one or more output streams. Very nice. Continuous Queries are the foundation of all CEP engines today.
As nice as Continuous Queries are, it turns out that sometimes CEP users want to issue traditional old-style Snapshot Queries. Why would they want to do that? Many reasons, really. Some of them want to integrate with traditional BI tools that cannot handle the results of Continuous Queries. Some really want to get a one-time snapshot of the data. Some simply want to query the state kept in the memory of their CEP engine, in so-called "windows".
To solve this problem, Coral8 added a while back the notion of "public windows", which can be queried using plain old standard SQL in a snapshot fashion. Problem solved? Not so fast...
Snapshot Queries with Live Updates: Why?
It turns out that what many customers really want is a combination of Snapshot and Continuous Queries. They want to get the snapshot result of the query, followed by a live stream of updates to the result of the query! Why do they want it? Let's look at a couple of examples:
A financial institution may use a CEP engine to track the list of all orders for a day in a window. At any given time, one may want to ask how many orders are in the incomplete state, grouped by trader, security, or exchange. So far it sounds like a snapshot query, right? But once the result is returned, the user typically wants to keep track of the results of this query! As more orders are added and existing orders change their state, the counts returned by the query keep changing. People want their CEP engine to track these changes.
Consider an even simpler example from the RFID tags. Let's say that you are tracking the list of all assets with RFID tags in a Coral8 window. Imagine you want to query the window for the list of all tags that have not sent an update to a reader in the past 10 seconds. It's a useful list, but it can be even more useful if it's dynamically updated as the underlying window changes!
Snapshot Queries with Live Updates: How?
Hopefully these two simple examples convince you that Snapshot Queries with Live Updates are very useful in a number of settings. So how do they work?
One challenge with building this feature was that a standard non-public Coral8 window is only visible from the module where it's declared. Public windows are visible from the outside, but they only allow snapshot queries via SQL, and there are no live updates. So to implement Snapshot Queries with Live Updates, we added a new and powerful feature: Shared Windows.
To share and query a window across modules, one sets up a Master Window, which keeps track of all the relevant data (orders, RFID tags, etc.). Once a Master Window is set up, a Mirror Window can be created in a separate module, and it can dynamically get a subset of data from the Master Window. Once this initial subset is received, the Mirror Window starts receiving updates on all additions and deletions from the Master Window. Thus, a continuous query on the Mirror Window first returns the snapshot, followed by live updates. Voila!
Of course, the beauty of this whole setup is that a Mirror Window and the query over the Mirror Window can be set up completely dynamically. Coral8 has always been known for its ability to register queries dynamically, so this is probably not too surprising. In our testing, a Mirror Window with a new query can be created, compiled, registered, and relied upon to produce a result in well under 1 second! I personally find this quite amazing.
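The snapshot-then-updates behavior described above can be illustrated with a toy model. This is only a sketch in Python, not CCL and not the Coral8 API: the class names `MasterWindow` and `MirrorWindow`, the event kinds, and the idea of expressing the mirrored subset as a predicate are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Event = Tuple[str, str, Row]  # (kind, key, row); kind is "snapshot", "insert" or "delete"


class MirrorWindow:
    """Hypothetical mirror: holds the subset of master rows matching a predicate."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate
        self.rows: Dict[str, Row] = {}
        self.events: List[Event] = []  # what a continuous query on the mirror would see

    def _emit(self, kind: str, key: str, row: Row) -> None:
        if kind == "delete":
            self.rows.pop(key, None)
        else:
            self.rows[key] = row
        self.events.append((kind, key, row))

    def load_snapshot(self, key: str, row: Row) -> None:
        self._emit("snapshot", key, row)

    def on_master_change(self, key: str, row: Optional[Row]) -> None:
        # row is None when the master deleted the key.
        if row is not None and self.predicate(row):
            self._emit("insert", key, row)
        elif key in self.rows:
            # The row was deleted, or changed so that it no longer matches.
            self._emit("delete", key, self.rows[key])


class MasterWindow:
    """Hypothetical master: keeps all rows and notifies attached mirrors."""

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}
        self._mirrors: List[MirrorWindow] = []

    def attach(self, mirror: MirrorWindow) -> None:
        # Step 1: send the initial subset (the snapshot).
        for key, row in self._rows.items():
            if mirror.predicate(row):
                mirror.load_snapshot(key, row)
        # Step 2: from now on, forward every change (the live updates).
        self._mirrors.append(mirror)

    def upsert(self, key: str, row: Row) -> None:
        self._rows[key] = row
        for mirror in self._mirrors:
            mirror.on_master_change(key, row)

    def delete(self, key: str) -> None:
        if key in self._rows:
            del self._rows[key]
            for mirror in self._mirrors:
                mirror.on_master_change(key, None)
```

In the order-tracking example, a mirror created with `MirrorWindow(lambda row: row["state"] == "incomplete")` and attached to a populated master would first receive the currently incomplete orders as snapshot events, then an insert for each new incomplete order and a delete for each order that completes.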
Want to learn more?
Of course, this blog post has just a very short intro to the magical world of Shared Windows and Snapshot Queries with Live Updates. To learn more, you can refer to our documentation (search for Shared Windows), or better yet, read an excellent white paper by Bob Hagmann on this very subject. It covers how Snapshot Queries with Live Updates work, how they can be created in CCL, how they can be registered dynamically from Java, and lots of other relevant topics.
Mark Tsimelzon, President & CTO, Coral8
Cool Demo: Coral8 for Homeland Security
But luckily, one of our very own engineers, Dave Clark, has recently put together a really cool demo that we can share with the general public. This demo showcases using Coral8 to track terrorist suspects throughout the United States, as well as correlating their phone conversation records with their locations. The goal, of course, is to prevent terrorists from doing something nasty.
The demo has four parts:
- The first one is just an intro, where the basic setup is discussed.
- The second is a cool demonstration of tracking the movement of suspected terrorists using Google Earth.
- The third is a very detailed explanation of how CCL is used to code this demo. Unless you are a programmer, you may want to skip through most of the third one.
- Finally, the fourth one showcases using Coral8 Portal to visualize information through real-time dashboards.
DISCLAIMER
Now, before somebody decides to report me to the authorities for publishing this: These demos are here for educational purposes only. They are designed to illustrate how Coral8 may be used in the homeland security context. No classified information was used to build these demos. Any similarities to any real-life homeland security projects where Coral8 is or is not used are entirely coincidental.
Mark Tsimelzon, President & CTO, Coral8
What's a CEP engine, anyway?
But other people have heard about CEP engines. They've been to web sites of CEP vendors. They've read articles and blogs about CEP. Yet they are still utterly confused as to what a CEP Engine is and what it's good for. To be honest, you can't blame these people either, and here is why.
I have yet to see a simple explanation of what a CEP engine is. Some folks in the CEP community (no finger pointing, please) claim that CEP is something so wonderfully complex that no CEP engine can really do it yet. I can see how this might be confusing! Some CEP vendors focus a lot more on specific applications, especially in capital markets. Some (ok, you can point at me) focus on enumerating specific CEP Design Patterns, which makes people believe that they have to understand all these design patterns to get the basic idea of what a CEP engine is.
Nothing could be further from the truth. Let me present a one-sentence definition of what a CEP engine is:
A CEP engine is a platform that makes it easy to write and deploy applications that process and analyze real-time data.
That's it! This simple definition covers the Coral8 CEP engine, as well as all other CEP engines built by today's vendors. Now, let's analyze the definition word by word:
A CEP engine is a platform... This addresses a common misconception. A CEP engine is not an application! It does not magically do Risk monitoring, Intrusion detection, Fraud prevention, Web site ad targeting, or anything of this nature out of the box. All these, and many other applications can be written on top of a good CEP engine, but the engine by itself does not do this any more than a database knows how to query itself.
...that makes it easy... This is key. All the wonderful CEP applications people talk about can be written without CEP engines, using traditional languages such as C++ or Java. Anybody who tells you otherwise is not being honest. But a CEP engine makes it a whole lot easier to write these applications! The saving in time, resources, and therefore money, according to our many customers, is easily a factor of 10!
...to write and deploy... A CEP engine is not just a development tool, but also a scalable infrastructure for deploying CEP applications. Never forget about this. If you are going to analyze hundreds of thousands of events (pieces of real-time data) per second with sub-millisecond latency, you need a special piece of infrastructure - a CEP engine.
...applications that process and analyze real-time data. This is the key. A CEP engine is all about real-time data. If you want to collect, filter, aggregate, enrich, transform, correlate, and store real-time data, or do pretty much anything else with it, then you may benefit from a CEP engine. In fact, I would be really curious to find out about applications that need to deal with real-time data that cannot utilize a CEP engine.
I hope this makes sense. Of course, in this post I have not explained how a CEP engine works with real-time data, how it is programmed, what all its wonderful features may be, and so on. My goal was to provide a one sentence definition of what a CEP engine is, period. Whether I failed or succeeded, I would like to hear from you.
Mark Tsimelzon, President & CTO, Coral8
Webinar: Third Generation Algorithmic Trading & Execution with Complex Event Processing
Date: Wednesday, October 8, 2008 (Tomorrow!)
Time: 9 am Pacific / 12 noon Eastern
More info:
Financial markets leaders are turning to third generation algorithmic trading in response to new market dynamics:
- Increased market speeds – fast-shifting markets drive the need for sophisticated “sense-and-respond” algorithms, versus the simple, static ones;
- Lower latency – shortened “window of opportunity” to execute trades demands engines that make complex decisions and fire-off trades in milliseconds;
- Expanded trading venues – global market access, ATSs and ECNs increase the opportunities to find liquidity but also introduce increased information sources and complex decisions.
More sophisticated than the hard-coded algorithms of the first generation, and more dynamic than the simple market monitoring algorithms of the second generation (the early Complex Event Processing applications), the third generation algorithmic trading focuses on adaptive algorithms that respond continuously to these rapidly changing market conditions.
Please join Coral8's webinar and learn how Coral8 customers leverage a real CEP-based algorithmic trading and execution framework to deliver:
- Native platform for consistently low latency and high throughput – Coral8’s native C++ platform is able to analyze large sets of information AND execute actions before the market moves;
- Sophisticated analysis – Coral8’s Continuous Computation Language (CCL) allows developers to rapidly define applications that manage high-volumes of high-speed data and enables quants to easily implement complex algorithms;
- Agile integration – an agile application environment facilitates quick and easy integration of new market data, execution venues, and creative algorithms through a modular framework.
Mark Tsimelzon, President & CTO, Coral8
On Streaming SQL Standards
Given that this paper is published by Coral8's competitors, you may be surprised to hear that there is a lot in this paper that I very much agree with. First of all, I certainly agree that we need a streaming SQL standard. We all have been talking about this for a while. Yet some folks in the industry have always made it sound like creating a streaming SQL standard was easy. The paper, for the first time, shows why this is not so, even though it covers a very small portion of the hypothetical language standard - the execution model. It does not talk about syntax, semantics, data types, built-in operators, integration with databases and other external systems, and a host of other relevant issues.
But let's look at what this paper is actually talking about:
As the industry matures, there is a need for a single standard language. It is tempting to say that such a language is simply an agreement over simple syntactic differences. About a year ago, Oracle and Streambase embarked on a project to create a convergence language in which these simple differences would be resolved. What emerged was a realization that there were fundamental differences in the basic model that made convergence difficult. From both sides, there were things that one model could do that the other model could not do. What was needed was a new model that spanned the original two.
To really understand the two execution models you'll have to read the paper, but to give you a flavor of what we are talking about here, I'll just say that the Oracle model is time-driven, and it treats tuples with the same timestamp as happening simultaneously and being a part of a batch. The Streambase model is tuple-driven, and tuples are always considered ordered. There are no batches.
The differences in the two models give rise to very different behaviors, so the authors of the paper come up with a unified model, which combines the benefits of the two models. Do I like the new model? Of course I like it, because this model is very close to the model that Coral8 has been using for the past few years! Our execution model is also batch-driven, and the order of tuples is preserved. You'll have to read the paper for more details, though.
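To make the distinction concrete, here is a toy Python sketch of the two ways of stepping through a stream as summarized above. It is an illustration only, not the semantics of Oracle, Streambase, or Coral8: the function names are hypothetical, tuples are assumed to be `(timestamp, payload)` pairs, and the input is assumed to arrive already sorted by timestamp.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

Tup = Tuple[int, str]  # (timestamp, payload); an assumed shape for this sketch


def tuple_driven(stream: Iterable[Tup]) -> Iterator[List[Tup]]:
    """One processing step per tuple, in arrival order. There are no batches."""
    for tup in stream:
        yield [tup]


def time_driven(stream: Iterable[Tup]) -> Iterator[List[Tup]]:
    """One processing step per timestamp; tuples sharing a timestamp form a batch."""
    for _, batch in groupby(stream, key=itemgetter(0)):
        yield list(batch)


def running_counts(steps: Iterable[List[Tup]]) -> List[int]:
    """Emit the number of tuples seen so far after every processing step."""
    total = 0
    outputs = []
    for batch in steps:
        total += len(batch)
        outputs.append(total)
    return outputs


stream = [(1, "a"), (1, "b"), (2, "c")]
print(running_counts(tuple_driven(stream)))  # [1, 2, 3]
print(running_counts(time_driven(stream)))   # [2, 3]
```

The same query over the same input produces three outputs in one model and two in the other, which is the kind of behavioral difference at stake. A model that is batch-driven but preserves tuple order would step like `time_driven` while still treating the order of tuples inside each batch as meaningful.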
Now, while I think this paper is quite useful, I'd like to call for a more open standards process. Right now there are multiple cliques of companies trying to come up with a standard, and the work is not progressing very fast. It's a hard problem, as this paper demonstrates! Maybe we'd save the esteemed authors of this paper some time if we told them about our execution model. At the very least, we'd have pointed out that their understanding of how CCL works (from the Related Work section) is quite wrong! :)
Mark Tsimelzon, President & CTO, Coral8
Webinar: Real Time Risk, Profit & Loss Applications
The Webinar will take place Wednesday, August 20th, at 9 am Pacific / noon Eastern. Our very own John Morrell, VP of Product Marketing, and Justyn Trenner, CEO of ClientKnowledge, will present.
You will view a live demonstration of a flexible, granular Coral8 CEP-based risk and P&L monitoring solution developed by ClientKnowledge and learn the benefits and capabilities it offers, including:
- Developing powerful real-time risk and P&L monitoring solutions in a fraction of the time using advanced, SQL-based CEP language like Coral8 CCL;
- Quick deployment and easy extension of a Coral8 CEP-based, modular risk and P&L monitoring framework.
Mark Tsimelzon, President & CTO, Coral8
Is CEP Mature? Or a Curious Case of Information Asymmetry
(Yes, I'm trying to learn from the best, in this case from the very excellent blog by Opher Etzion which has very cool pictures in every post.)
A lot has been written on the various CEP accomplishments over the past five years, but also about the challenges that lie ahead. Personally, when I think about the current state of CEP, I can't help but think about the famous words of Winston Churchill: "Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning."
It's hard to disagree with this statement. If you compare CEP to a truly mature market, e.g., the market for relational databases, then it's easy to see that CEP has a long way to go. Yet if you compare the state of CEP today to, say, the year 2003, when Coral8 was founded, it's hard to argue that the CEP community has not made a lot of progress.
Then why is this debate raging? One may argue that the people who claim that CEP is mature are mainly CEP vendors, and it's in their interest to do so. But I think the answer is a bit different.
As an amateur economist (isn't everyone these days?), I like to think about what makes markets efficient or inefficient. One concept that comes up all the time is information asymmetry. Information asymmetry refers to situations when buyers and sellers in a market have different levels of access to information. Consider, for example, the markets for used cars or health insurance. In the first case, the seller of the car knows a lot more about the car than the buyer. In the second case, the buyer of health insurance knows a lot more about his or her health than the seller. This makes these markets highly inefficient, as anybody who's tried to buy a used car or health insurance will readily testify.
What does it have to do with CEP? A lot. I claim that CEP vendors know a lot more about CEP success than the general public does. I'm sure you've seen a table compiled by Tim Bass listing the number of publicly announced customers for each CEP vendor. The table would be pretty funny, if it were not so sad. According to this table, no CEP vendor has more than 4 customers! This is hardly a sign of a mature market.
I don't want to speak for every CEP vendor out there, and I don't want to name names or exact numbers, but I know for a fact that every major CEP vendor has several dozen paying customers. Not thousands of customers, not hundreds of customers, but several dozen - for sure. I believe this is why vendors have a more positive view on CEP's maturity - they see the use cases and success stories that support the view!
As I mentioned earlier, information asymmetry is not a good thing. It prevents the CEP market from developing faster, as prospects have a skewed view of how mature the products really are. So why do existing customers resist telling their stories, and even resist being named? "We believe that the use of Coral8 gives us a strategic advantage over our competitors. Why would we want to clue them in?" - this is what we, and probably other vendors, hear over and over and over.
It's a bit hard to argue with this logic, but again this represents a problem for accelerating the growth of CEP. I think identifying the problem is half the challenge. But what can we, as a community, do to solve this problem? Perhaps we can learn something from other communities? If anybody has any great stories or suggestions, I'd like to hear them.
Mark Tsimelzon, President & CTO, Coral8
What Makes a Programming Language Successful?
But there are a number of comments posted on this page, and they have typical Slashdot tags: "Interesting", "Insightful", "Informative". One of the comments is marked "Funny", but I think it is anything but. The comment answers the question "What Makes a Programming Language Successful?" with a simple one liner: "Those who don't know how to use it."
I think this deserves at least "Insightful," and this is why. You remember that a few posts ago I wrote about what makes a Coral8 expert, and we had a lively discussion on this topic, both in this blog and others. I still stand by everything I said there, but that was the post about experts. Most users of every programming language, including CCL, are not, and will never be experts.
There is no shame in not being an expert. Every person usually learns just enough of various skills to accomplish whatever tasks he or she needs. And this is what a programming language, or any other tool, should be good at - helping people, especially beginners, accomplish whatever tasks they need.
So what's the most important quality that a programming language should have? As I said, I have not read the article, so I'll give you my own answer instead. I strongly believe that first and foremost, the programming language and the programming model should look familiar.
This is one reason why CCL is based on SQL. Is SQL the only language that you could base a CEP language on? No. It's certainly a good choice, I'd even argue it's the best choice, but it's not the only one.
But whatever you choose, you must make sure that the language looks and feels familiar. Beginner users do not have time to study lengthy manuals to learn a new language, even if this is the most elegant language in the world. The language either "makes sense" to them, or it does not. In this sense, the people who don't know the programming language, or who don't know it very well, are truly the ones who make it successful. Remember, even every expert starts as a beginner.
At Coral8, we try to keep this in mind as we are building our products. We like it that Coral8 and CCL get great reviews from CEP experts, but we also like it that somebody who has never heard of CEP can download our product and build their first real CEP application within a few days, without much, if any, help. We hear about this happening every day. Their CCL may not be the most elegant or optimal, but it works. As long as it keeps happening, I know we are doing something right here.
Mark Tsimelzon, President & CTO, Coral8
Complexity Scorecard
For each of the ten questions, please choose one answer. The more points you score, the more sophisticated your application is.
What is the combined data rate you need to support?
- Less than 1 event/sec: 0 (Just curious, why are you reading this blog?)
- 1-100 events/sec: 1
- 100-1000 events/sec: 3
- 1000-10,000 events/sec: 4
- 10,000+ events/sec: 5
What is the event processing latency you need to guarantee?
- Minutes/Hours/Days: 0 (Still not sure why you are reading this)
- Seconds: 1
- 100-1000 milliseconds: 2
- 10-100 milliseconds: 3
- 1-10 milliseconds: 4
- Less than 1 millisecond: 5
How many data streams do you have?
- One stream: 1
- More than one, but I don't need to synchronize events across streams: 2
- More than one, and I need to synchronize events across streams, and handle delayed and out-of-order events: 5
How large/complex are your input events?
- Small flat events (1-10 fields): 1
- Large flat events (10+ fields): 2
- Large non-flat / hierarchical / XML events: 5
What do most of your queries look like?
- Filtering and single-event transformations: 1
- Aggregation over different kinds of windows: 3
- Joins and state management: 4
- Event Pattern Matching: 5
How many queries do you have?
- 1-10: 1
- 10-100: 3
- 100+: 5
Do you need to interface with databases?
- No: 0
- Infrequent reads or writes: 2
- Frequent reads/writes, but no read/write caching: 3
- Frequent reads/writes, need basic caching: 4
- Frequent reads/writes, need granular on-demand row caching: 5
Do you need to scale beyond one CPU?
- No: 0
- Yes, to multiple CPUs/cores: 2
- Yes, to several machines: 4
- Yes, to a large cluster: 5
What are your High-Availability / Data Persistence requirements?
- None: 0 (I don't believe you, but that's ok)
- I need to recover from failures, but I don't care if I lose data: 2
- I don't want to lose data, but I'm ok with losing a few events here and there: 4
- I never, ever, want to lose a single event!: 5
What kind of interface do you need for business users?
- None: 0 (Who is paying for your project?)
- They are ok with static reports: 1
- They want real-time dashboards: 3
- They want real-time dashboards, and they want to configure dashboards and actions dynamically: 5
Now, add up your points:
- Less than 15: Your application is not too sophisticated. You don't really need a CEP engine right now. A CEP engine will still save you time and money, especially over the long term, but you don't really need one.
- 15-30: Your application is of medium sophistication. Using a CEP engine is highly advised, but the good news is that most CEP engines should be able to handle it.
- More than 30: Your application is sophisticated. Please choose your CEP engine very carefully, as most CEP engines out there will not be able to handle it.
That's it! Of course, this scorecard is very approximate, and I have not covered a lot of areas: adapters to external systems, SDK, deployment, management, security, determinism, etc. But I hope this is still useful. Please let me know if you have any feedback or questions.
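For those who prefer to tally the score in code, the arithmetic can be sketched in a few lines of Python. The function name `classify_application` and the returned labels are made up for this sketch; the thresholds are the ones listed above, and the range check simply assumes that every answer is worth between 0 and 5 points.

```python
from typing import Sequence


def classify_application(points: Sequence[int]) -> str:
    """Add up the ten answers and map the total to a sophistication band."""
    if len(points) != 10:
        raise ValueError("expected one answer for each of the ten questions")
    if any(not 0 <= p <= 5 for p in points):
        raise ValueError("each answer is worth between 0 and 5 points")
    total = sum(points)
    if total < 15:
        return "not too sophisticated"
    if total <= 30:
        return "medium sophistication"
    return "sophisticated"


# Example: 3 + 2 + 2 + 1 + 3 + 1 + 2 + 2 + 2 + 1 = 19 points.
print(classify_application([3, 2, 2, 1, 3, 1, 2, 2, 2, 1]))  # medium sophistication
```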
Mark Tsimelzon, President & CTO, Coral8
More on CEP and Complexity
It's easy to argue against small points in Greg's post. For example, Greg writes "Coral8 sounds more like a lab tool for the engineering department of Caltech or Stanford than a tool for everyday business users." Well, everybody knows that Coral8 Engine, as well as other CEP platforms, as opposed to CEP applications, are tools for developers, and not for business users.
Or, it's easy to point to other very successful examples of enterprise software products that are more complex than the Coral8 Engine. But I don't want to do this. Because Greg has an excellent point: if a product is needlessly complex, nobody is going to use it!
The place where Greg and I disagree is whether our product is needlessly complex. Here is my claim: the list in the previous post covers the features and skills needed to write sophisticated real-world CEP applications. Those are the applications that process tens and hundreds of thousands of events per second, coming from a number of very different data sources, integrating with databases, message buses, and numerous other third party systems, and performing highly complex analysis with millisecond latency. All this while being highly scalable, available, and secure.
Unfortunately, most of the discussion around CEP today focuses on very simple use cases. Everybody publishes their implementation of VWAP (Volume-Weighted Average Price) and some benchmarks for VWAP. Well, if CEP were limited to such use cases, people would be right to consider CEP just hype. Writing a simple CEP application is indeed quite easy in C or Java, as Greg correctly notes. It's when the complexity of the application grows that CEP engines start to shine.
How do I know this? We have about 50+ customers now, and a good 90% of them started writing their applications using traditional languages, such as C or Java. And they all were happy for a while. But then they needed to add more streams, more data sources, and more complex queries. They needed to talk to databases and to Web Services. They needed to handle XML events and complex event patterns. And most of all, they needed to make sure their application would never fail.
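To show just how small such a simple use case is, here is a sketch of VWAP over the last N trades in plain Python. VWAP is the sum of price times volume divided by the sum of volume; the class name `SlidingVwap` and the choice of a count-based window are assumptions for this sketch, not anything taken from a CEP product.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class SlidingVwap:
    """Volume-weighted average price over the most recent `size` trades."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("window size must be positive")
        self._size = size
        self._trades: Deque[Tuple[float, float]] = deque()
        self._notional = 0.0  # running sum of price * volume inside the window
        self._volume = 0.0    # running sum of volume inside the window

    def add(self, price: float, volume: float) -> Optional[float]:
        """Add a trade, evict the oldest one if needed, and return the new VWAP."""
        self._trades.append((price, volume))
        self._notional += price * volume
        self._volume += volume
        if len(self._trades) > self._size:
            old_price, old_volume = self._trades.popleft()
            self._notional -= old_price * old_volume
            self._volume -= old_volume
        if self._volume == 0:
            return None  # no volume in the window, so VWAP is undefined
        return self._notional / self._volume


vwap = SlidingVwap(size=2)
print(vwap.add(10.0, 100))  # 10.0
print(vwap.add(20.0, 300))  # 17.5
print(vwap.add(30.0, 100))  # 22.5
```

A few dozen lines, one stream, one window, no persistence, no failover. The hard part begins once the requirements grow past this.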
That's when they start looking around, find this whole field of CEP, and realize that the Coral8 Engine is indeed a great choice for writing highly sophisticated CEP applications. Our product makes accomplishing all the tasks listed in the last post easy. Is there a learning curve? Sure. It does not take a week, as Greg suggests, to learn each of the 60 topics. It takes maybe an hour, give or take. Which means that in under two weeks customers can become true CEP experts, if their applications demand this level of expertise. If you consider what they get for their effort, you'll understand why they feel the effort is well worth it.
So, if I may offer Greg some friendly advice, here it is: if your CEP application is simple, by all means use your C or Java programmers. I'm sure they'll do a fine job. But if or when you realize that the requirements are more complex, feel free to give us a call. We'll be happy to talk to you and your team about our product.
Mark Tsimelzon, President & CTO, Coral8
What makes a Coral8 Expert?
At the same time, many production Coral8 applications are quite complex, and to build these powerful applications one needs a certain amount of knowledge and expertise. This is true of any CEP engine or any powerful enterprise system. So I am asked more and more often by customers and partners: What makes a Coral8 expert? How do we train our employees to best take advantage of Coral8?
We don't have a formal "Coral8 Certification" course yet, but I thought I'd put together a list of areas where one needs to develop expertise to be considered a true Coral8 expert. Of course, the list is not meant to scare anybody away. Plenty of people use Coral8 successfully without understanding 20% of these topics. But if you want to become a Coral8 Systems Integrator Partner, then here is what you need to know:
CCL
Windows
- Different kinds of windows (time-based, row-based, sliding, jumping, largest/smallest, partitioned (PER), multi-policy, bucket, etc.)
- Named windows and implicit windows
- Using aggregations and windows
- Using windows to keep and manage state
- Querying windows (public windows)
Joins
- Stream to Window join
- Window to Window joins
- Outer joins
Patterns
- Basic pattern operators ([], ",", ||, &&, !)
- Nested patterns
- Causality tracking (XMLPatternMatch())
XML
- XML datatype
- Working with XML: parsing, constructing, transforming, joining, etc.
Modules
- Using modules for abstraction and code reuse
- Module interface: input/output streams and parameters
Procedural features
- functions
- variables
- conditionals
- loops
User-Defined Functions
- Calling scalar user-defined functions written in C
- Calling aggregation user-defined functions written in C
- Calling RPC style user-defined functions
Design patterns and basic recipes
- Design Patterns: http://www.coral8.com/system/files/assets/pdf/Coral8DesignPatterns.pdf
- Cookbook: http://www.coral8.com/system/files/assets/pdf/5.2.0/CCL%20Cookbook.pdf
Integrations with databases
- Reading data from a database using database subqueries in CCL
- Writing data into a database using EXECUTE STATEMENT
- Configuring database connections
- Caching
- Performance and reliability options
- Using adapters to play back data from databases
Messaging layer
- stream URL
- publish/subscribe
- timestamp assignment: message timestamp vs server timestamp
- delayed messages and synchronization across multiple streams
- out-of-order message handling
Adapters
- Adapters shipped with the product (message bus, database, file, socket, specialty, etc.)
- In-process vs. out-of-process adapters
- Writing adapters in C/C++, Java, .NET, Perl, Python
Enterprise features
- State Persistence
- High Availability: different options
- Guaranteed Message Delivery
- Combining all Persistence, HA, and GD to meet the requirements
- Clustering for performance and failover
- Security: SSL encryption, authentication, authorization, integration with LDAP and other systems
Performance
- How to maximize throughput of Coral8 Engine
- How to minimize latency of Coral8 Engine
- How to write high-performance adapters
- How to take advantage of multiple CPUs or cores
- How to take advantage of clustering by splitting streams or queries across machines
Studio
- Creating CCL projects and modules
- Using environment files to manage connections to workspaces
- Monitoring server performance, throughput, latency and resources through the Studio
Development processes
- Using multiple workspaces for development, QA, production, etc.
- Using source control
- Upgrading the server & studio from one version to the next
Command-line tools and their options
- c8_compiler to compile CCL
- c8_client to manage the server, pub/sub to streams, submit queries dynamically, etc.
Portal
- Setting up the Portal
- Creating query templates for use with the Portal
- Different visualization options
- Actions
That's it! Not too scary, is it?
Mark Tsimelzon, President & CTO, Coral8
Deployed Globally!
Of course, we've had customers in the U.S., Europe, and Asia for a while. Some of them are financial services institutions, some are large web sites, some build or operate large sensor/RFID networks, and so on. But we also have a couple of OEM customers in Australia: they are in the network/server/app management space. And we have a customer in South America, who manages a large power grid using Coral8. And now with this bank in South Africa, we have Africa covered!
Just to be clear, I am talking about customers that have purchased the Coral8 product, rather than developers who've downloaded the software from our web site. If we count developers, the count will be in thousands. I don't have the exact count, and some don't even tell us where they come from, but I'm sure they come from at least 50 countries, from all over the globe.
Although I must admit, we have neither customers nor developers coming from Antarctica. Does anybody know what kind of CEP applications penguins are interested in?
Mark Tsimelzon, President & CTO, Coral8
On Premature Optimization
Now, this is a great coffee shop, and the workers usually know what they are doing. Normally, there are two people filling orders, and in this case indeed starting the drinks first would be faster. But in a different situation with only one worker present, the guy's intuition was wrong. The algorithm that was great for two "processors", was not so great for one.
What's the moral of the story? We all like optimizations, but faced with new unfamiliar circumstances, our intuition often fails us. And what does it have to do with CEP? CEP is still fairly new to most people who build CEP applications. I wrote in my previous post that one of the first questions we hear from people is "how fast is your engine". Now, you will not be surprised that one of the more common follow up questions is "Will I make it faster if I do X"? Often, this is before the requirements are even firmed up, let alone the application is coded and profiled!
Now, CEP engines are complex beasts, and they contain numerous internal optimizations. The Coral8 Engine, for example, contains a large number of optimizations, some of which, such as pushing filters ahead of joins, are familiar to SQL developers. Other optimizations, however, are completely new: automatic data indexing, optimized memory management to conserve space, sophisticated data caching, a unique threading model to limit context switches and take advantage of multiple CPU cores, a very lightweight messaging layer, and many others. We are taking great care to optimize throughput, latency, and resource consumption.
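To make the filter push-down idea concrete, here is a minimal sketch in Python (not CCL, and not how the Coral8 Engine is implemented). The record layout, the function names, and the `min_qty` threshold are all assumptions made up for illustration. Both plans return the same pairs, but the pushed-down plan does fewer comparisons.

```python
def join_then_filter(orders, ticks, min_qty):
    """Naive plan: build every matching (order, tick) pair, then filter."""
    comparisons = 0
    joined = []
    for o in orders:
        for t in ticks:
            comparisons += 1
            if o["symbol"] == t["symbol"]:
                joined.append((o, t))
    result = [(o, t) for o, t in joined if o["qty"] >= min_qty]
    return result, comparisons


def filter_then_join(orders, ticks, min_qty):
    """Pushed-down plan: discard small orders first, then join the rest."""
    comparisons = 0
    result = []
    for o in (o for o in orders if o["qty"] >= min_qty):
        for t in ticks:
            comparisons += 1
            if o["symbol"] == t["symbol"]:
                result.append((o, t))
    return result, comparisons


# Hypothetical sample data, for illustration only.
orders = [{"symbol": "IBM", "qty": 10}, {"symbol": "IBM", "qty": 5000},
          {"symbol": "MSFT", "qty": 20}]
ticks = [{"symbol": "IBM", "price": 100.0}, {"symbol": "MSFT", "price": 30.0}]

naive, naive_cost = join_then_filter(orders, ticks, min_qty=1000)
pushed, pushed_cost = filter_then_join(orders, ticks, min_qty=1000)
```

The point of the post still stands: this is the kind of rewrite an engine can do for you, so it is rarely worth doing by hand before you have measured anything.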
The flip side of this, however, is that it may not be immediately obvious to someone how fast their first Coral8 application will run on our engine. Therefore, our advice to all Coral8 developers is very simple: don't worry about performance at first. Write your application, or a representative portion of it first, and then run it. The Coral8 Engine will tell you how much CPU or memory it takes, what the throughput and latency are, and will help you pinpoint the problems. Then, if necessary, you can start your optimizations. But chances are, you won't have to: The vast majority of our customers are surprised at how fast their application is once they write it.
Anyway, I know I am not saying anything new here. Don Knuth said over three decades ago that premature optimization is the root of all evil. This is especially true when you are working in a new and unfamiliar environment. The SQL-based language used by Coral8 and other CEP engines looks familiar, and indeed the similarity with SQL makes programming simpler. But it helps to remember that the implementation of any CEP engine is very different from that of a relational database, and is already highly optimized for CEP applications to begin with. So it's best to build the application first, measure the performance, and then start worrying about optimizing it if necessary.
Mark Tsimelzon
President & CTO, Coral8
Something funny to start the year with
(I have no idea whether it's legal to post it here, but hopefully it's ok)
I am not going to explain here why most performance numbers and benchmarks should be treated with extreme suspicion. They typically do not fully specify what's being computed, what the incoming events look like, how they get into the engine, what processing options (guaranteed delivery, failover, etc.) are enabled, what exactly is being measured, and so on. This has already been explained in other places.
I'll just note that we get a fair number of customer inquiries that look like this: "Hello, could you, please, tell us 1) how much your engine costs? 2) how many messages it can process per second?" That's it! Without any explanation of what they want to do, what the application looks like, even what domain it is in. Nothing!
I wonder if they send such requests to multiple CEP vendors, and then cleverly divide the second number by the first, to establish a CEP "message/$" rating for each vendor?
Mark Tsimelzon
President & CTO, Coral8
CEP: Library or Server Infrastructure?
Yes, it is possible to link the Coral8 server into another application. But I want to make a bold claim here. The future of CEP is not about being a cool event processing library embedded into individual applications. The future of CEP is a shared, scalable, and reliable enterprise-wide server infrastructure.
This should not be a bold claim. Think back to my favorite analogy, databases. It's true that there exists a market for embedded database libraries. For example, SQLite is a nice simple library that can be easily linked into C applications. There are many others. But this market is dwarfed by the market for database servers. Oracle, DB2, SQL Server, Sybase - all the common databases are typically not embedded into an application, but rather run on a separate server or cluster of servers.
Why is this the case? Let's come back to the CEP world, where the advantages of an infrastructure over a library are even more obvious.
Sharing
Let's start with sharing. Even though people typically start with one application, they quickly start deploying more and more. We have customers that have deployed dozens of applications by now! These applications all talk to each other via event streams. Moreover, these applications may dynamically deploy queries to each other. Rapidly integrating new applications is very hard if each application has to link in its own standalone CEP engine.
Scalability
What if the demands on a CEP application grow, and a single server does not have enough CPU power to satisfy them? If the CEP engine is linked in as a library, there is not much one can do in this situation! On the other hand, a good CEP infrastructure can distribute event processing to multiple servers, balance the load, and satisfy the most stringent scalability demands.
Reliability
What happens if a CEP engine is linked in as a library, and the machine on which it runs dies a sudden death? Well, you've got a problem. You don't have another server standing by ready to take over. You can, of course, replicate both the application and the CEP engine on two machines, but you'll have to figure out how to synchronize the two copies, how to share state, etc. All these problems are already solved by a good server infrastructure product such as Coral8.
But wait...
So why do people even consider linking a CEP engine as a library? There are a few reasons, but they are somewhat misguided:
First, some people don't like the idea of starting a new process for a CEP server. While I sympathize with this sentiment, the price of starting a new process is usually not that high. Complex applications these days have many parts, and almost never reside in one process anyway.
The second reason sounds more serious: latency! People think that if their application has to send events to a separate server, and then receive results back, the total processing latency will be a lot higher. Thus, they want to link the CEP server into their application.
This sort of makes sense, except for one question: where do the events come from? In most cases, they don't come from the app, but from some third place: from a financial exchange, a web site, a sensor network, a message bus, etc. The right solution is not to send these events to the application, but directly to the CEP engine! Good CEP engines come with highly optimized adapters, some of them running in the process of the CEP engine itself. These adapters can process incoming events much faster than whatever an application can do! We have seen how some Coral8 adapters can process events in well under 100 microseconds! Why would an application want to re-implement these adapters? It makes little sense.
Finally, the third reason may be more emotional than anything else: Control. Some developers feel that if the engine is "just a library", they have more control over such things as threading model, memory management, and so on. Well, again, why would somebody want to worry about these issues, when CEP vendors spend years and years optimizing the performance of the CEP engines?
The CEP Cloud
I hope I've convinced you that the future of CEP belongs to server-based infrastructure rather than CEP libraries. In fact, we have customers today who don't even care about our performance on a single server. They have a vision of running a cloud of CEP servers, and deploying hundreds and thousands of applications on this cloud. Of course, the cloud of these servers has to be completely self-managing. One does not need to worry about which server or servers are running a given app at any given moment, what happens when demand changes, what happens when servers go down, etc. We like this vision a lot. This is our vision, too. You just can't realize this vision via a simple library.
Mark Tsimelzon
President & CTO, Coral8
Einstein, Time, and CEP
It turns out that handling time correctly inside a CEP engine is a very complex issue, with many implications. Some approaches to time handling in CEP systems are discussed in depth in the 2004 paper by U. Srivastava and J. Widom: Flexible Time Management in Data Stream Systems, but the paper is very technical. There is a much more approachable paper on Coral8's web site written by our very own Bob Hagmann: Fundamentals of Time in Coral8. I am not going to repeat here what these papers say, but rather I want to talk about some practical implications, relevant to anybody using a CEP engine.
Before I go there, however, I want to explain why the whole issue is so intricate. The first reaction of some folks new to CEP, when first exposed to time management issues, is predictable: why are you making this so complex??? I'm sure Einstein has heard the same objection. But in this case, the objection seems quite logical. Most non-CEP applications just don't worry about time too much. Every computer has a system clock, which reliably keeps the time. Every operating system has a number of system calls to access this clock. So what's so hard about this?
The reason this problem is so hard is that CEP applications are typically highly distributed. Events are often generated far away from where they are processed. Think of sensor networks, or of trading applications that subscribe to data feeds from multiple exchanges. Even if physically the data source and the CEP engine are not that far from each other, there may be non-trivial latency in getting events in.
So by the time the events from multiple sources reach the CEP engine, they may be delayed, typically by different time deltas, and may even be out of order! If you want to analyze precisely what has happened, the system clock has very little relevance to the times when events were generated! So rather than using the system clock, the CEP engine must use a virtual stream clock, which is the clock driven by the arrival of events on one or more input streams.
If events arrive fast and in-order, this virtual stream clock may just run slightly behind the system clock. But if events are delayed, and have to be synchronized across multiple event sources, and have to be pre-sorted to handle out-of-order issues, then the "virtual stream clock" is quite different from the system clock.
Therefore, the Coral8 Engine provides two modes for analyzing data: one according to the system clock, another according to the virtual stream clock, or, as we call it in our documentation, according to event timestamps. And in this latter mode our engine can automatically synchronize and pre-sort events coming from multiple data sources. This takes the deep magic described in the Stanford paper.
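To give a feel for what pre-sorting and synchronizing by event timestamp involves, here is a minimal Python sketch. It is not the Coral8 algorithm and not the algorithm from the Stanford paper. It assumes that no event arrives more than `max_delay` seconds behind the newest timestamp already seen on its own stream; the `Event` type, the function names, and the source names are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    ts: float     # timestamp assigned at the source (event time)
    source: str   # hypothetical source name
    payload: str


def presort(stream: Iterable[Event], max_delay: float) -> Iterator[Event]:
    """Reorder one slightly out-of-order stream with a bounded buffer.

    An event is released only once the newest timestamp seen so far is at
    least max_delay ahead of it, so nothing older can still show up
    (under the stated max_delay assumption).
    """
    heap = []
    newest = float("-inf")
    for seq, ev in enumerate(stream):
        heapq.heappush(heap, (ev.ts, seq, ev))  # seq breaks timestamp ties
        newest = max(newest, ev.ts)
        while heap and heap[0][0] <= newest - max_delay:
            yield heapq.heappop(heap)[2]
    while heap:  # end of input: flush whatever is still buffered
        yield heapq.heappop(heap)[2]


def merge_by_event_time(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge already-sorted streams into one event-time-ordered stream."""
    return heapq.merge(*streams, key=lambda e: (e.ts, e.source))


# Stream "a" arrives out of order; stream "b" is in order.
a = [Event(1.0, "a", "x"), Event(3.0, "a", "y"), Event(2.0, "a", "z")]
b = [Event(1.5, "b", "p"), Event(2.5, "b", "q")]
merged = list(merge_by_event_time(presort(a, 1.5), presort(b, 1.5)))
```

In this sketch the "virtual stream clock" is simply the timestamp of the most recently released event. The price of ordering is latency: an event is held back until the stream has advanced far enough to rule out anything older.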
If one is analyzing real-time data, the virtual stream clock is normally somewhat behind the system clock, due to transport layer delay. There are also use cases where the virtual stream clock runs much faster than the system clock! For example, one of the common use cases on Wall Street is back-testing of trading strategies. Once somebody comes up with a new strategy, they need to test it on historical data.
Playing historical data back to the CEP engine is not a problem, but people typically want to speed up this process as much as possible. So now the time is compressed! Einstein would appreciate some of the issues this causes. What in reality took 1 hour, may take 1 minute in the accelerated playback mode. A jumping 1 hour long window will be emptied every 1 minute, not every 1 hour. Of course, the Coral8 Engine implements the accelerated mode correctly, where no matter how fast you go, the results are guaranteed to be exactly the same.
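The effect of accelerated playback on a jumping window can be sketched in a few lines of Python. This is an illustration only, with made-up timestamps and hypothetical function names; it simulates arrival times arithmetically rather than actually replaying anything.

```python
from collections import Counter


def jumping_counts(timestamps, width):
    """Count events per jumping window of `width` seconds.

    Returns a sorted list of (window_index, count) pairs.
    """
    counts = Counter(int(ts // width) for ts in timestamps)
    return sorted(counts.items())


def replay_arrivals(event_times, speedup):
    """Wall-clock arrival offsets when history is replayed `speedup` times faster."""
    start = event_times[0]
    return [(ts - start) / speedup for ts in event_times]


# Hypothetical historical event times, in seconds, spanning about two hours.
event_times = [0, 1200, 3599, 3600, 5400, 7300]
arrivals = replay_arrivals(event_times, speedup=60)

# Windows driven by event timestamps: the result does not depend on speedup.
by_event_time = jumping_counts(event_times, 3600)

# Windows driven by the system clock: at 60x, everything lands in one window.
by_arrival = jumping_counts(arrivals, 3600)

# A 1-hour window measured in event time corresponds to 1 minute of wall clock.
by_arrival_compressed = jumping_counts(arrivals, 60)
```

Here `by_event_time` is the same no matter what `speedup` is, which is the property the accelerated mode has to preserve. A window driven by the system clock gives a different answer as soon as the playback speed changes.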
I've touched upon some of these notions in my post on Determinism in CEP, but I did not quite explain why some CEP engines are deterministic, and some less so. Hopefully now this is a bit clearer. Some engines handle virtual stream time, and some do not. Some handle virtual stream time for one stream, but not for many. Yet handling virtual stream time for multiple streams is the only way to address the issues we have talked about here. It may be non-trivial, but there is no way around it. Like Einstein said, everything should be made as simple as possible but not simpler.
Mark Tsimelzon, President & CTO, Coral8
A real-time view into a CEP evaluation project
Think about how things are changing! The whole process of evaluating and choosing a technology vendor is still quite opaque. The customer evaluates different products, yet nobody knows who all the candidates are, how the evaluations progress, what each product's challenges are, and how the winner is selected. The selection process is hidden from everyone. If you are lucky, a few months later the chosen vendor would publish a "use case", which will have more marketing than real technical meat.
Contrast this to the evaluation that Marc is doing. Everything is in the open. Competitors get to comment on each other's solutions. If Marc has a problem with some product, the world knows about it the next minute! This spirit of full disclosure may be uncomfortable to some, but we welcome it wholeheartedly. Both customers and vendors suffer when enterprise software is sold behind closed doors.
I guess if other customers start following Marc's lead, we'll soon have to use our very own CEP engine to subscribe to their blogs, to have real-time visibility of what our customers are doing and to respond to issues. We need to stay one step ahead of them. It's a good thing we already have an RSS adapter :)
Mark Tsimelzon
President & CTO, Coral8
Open CEP positions
Coral8 is hiring both in New York and California, but here I'm posting positions for Marc Adler, who is starting a major CEP effort at one of the main investment banks. Based on what I know about Marc and about this opportunity, this should be tremendous fun!
By the way, I think it's a very encouraging sign for CEP that despite all the current problems on Wall St, CEP projects still get funded! So here it goes:
Job Title: Various Developers for new Complex Event Processing Project
Description: Help Wanted for the Complex Event Processing Project
I have open headcount for about 4 or 5 people for 2008 for the Complex Event Processing project that I am running.
I realize that it is foolhardy to advertise for people who have prior
experience in CEP. What I am looking for are smart developers who have
a great passion to learn a new, interesting technology. The team that I
envision will consist of:
1) Visualization developer - come up with new, interesting ways to
visualize events and data. The work may entail working with the .NET
framework that my team has built, integrating visualizations with
existing Java/Swing-based trader GUIs, or even exploring WPF (as the
company gradually embraced .NET 3.x). You could be investigating
visualization tools like heatmaps and you will definitely be evaluating
third-party tools (both commercial and open-source). You will be
involved in OLAP to some extent. There will be involvement in the
building out of a notification and alerting framework.
2) CEP developer. You will be building out the analysis part inside
the CEP engine. Most of the CEP engines use a variant of SQL, so you
should be fairly comfortable with SQL concepts. It would be nice if you
had previous experience with tools like Coral8, Aleri, Streambase,
Esper, etc, but even if you haven't, you should be willing to learn
these tools. You may also be interacting with consultants from these
companies.
3) Networking, messaging, and market data specialist. Help us
decide if we should migrate to a new messaging infrastructure (like RTI
or 29West). Experience with Tibco EMS is a big plus, as well as
experience with working with high volumes of data and low latency.
Interact with Reuters and Wombat infrastructures, as well as
internally-built market data infrastructures.
4) Data specialist. You will be the person who is responsible for
breaking down silos and getting good data into the CEP engine.
Experience with SQL Server 2005 and Sybase are important. Experience
with tick databases like KDB+ and Vhayu would be nice to have.
Everyone will be doing a bit of everything, so everyone on this team will be intimately aware of what everyone else is doing.
This is a highly-visible position in an investment bank that has
promised me that they will reward good talent who comes to us from the
outside.
In addition to the positions mentioned above, I have two or three
open headcount for people who want to work on the Ventana team. Ventana
is the client-side .NET framework that is being used by various groups
in our Investment Bank.
CEP as an Elephant
THE BLIND MEN AND THE ELEPHANT

It was six men of Indostan,
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.

The First approach'd the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
"God bless me! but the Elephant
Is very like a wall!"

The Second, feeling of the tusk,
Cried, -"Ho! what have we here
So very round and smooth and sharp?
To me 'tis mighty clear,
This wonder of an Elephant
Is very like a spear!"

The Third approach'd the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
"I see," -quoth he- "the Elephant
Is very like a snake!"

The Fourth reached out an eager hand,
And felt about the knee:
"What most this wondrous beast is like
Is mighty plain," -quoth he,-
"'Tis clear enough the Elephant
Is very like a tree!"

The Fifth, who chanced to touch the ear,
Said- "E'en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!"

The Sixth no sooner had begun
About the beast to grope,
Then, seizing on the swinging tail
That fell within his scope,
"I see," -quoth he,- "the Elephant
Is very like a rope!"

And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!

MORAL,
So, oft in theologic wars
The disputants, I ween,
Rail on in utter ignorance
Of what each other mean;
And prate about an Elephant
Not one of them has seen!
Now, what does this have to do with Complex Event Processing? It's just that too many discussions about CEP that I've observed or participated in, public on the Internet or private with prospects, customers, partners, reporters, analysts, and others remind me of this story. I certainly don't mean to call anybody blind, far from it, but I want to caution people against making assumptions about CEP based on limited information. All too often people focus on some aspect of CEP that they've heard about, and tend to forget that a CEP engine is a general platform for building a wide variety of applications dealing with real-time data, not a specific solution for one problem.
For example, I see a lot of confusion over what domains CEP applies to. I sometimes hear somebody who needs to process RFID data saying "Oh, but I've heard CEP is only taking a hold in financial services, right? What about RFID?". Worse, somebody working in financial services will say "But my app is all about risk management. Isn't CEP all about algorithmic trading?" If that was not bad enough, I've heard "Yes, we know somebody is using a CEP engine for fixed income trading, but we are so different, we trade equities!". Of course, all these applications of CEP are different, but the underlying platform is one and the same!
There is also confusion about what kind of processing is suitable for CEP. Based on some published examples, some people get an extremely narrow view of what CEP can do. "I've seen examples of how CEP can be applied to one stream, but it does not work for us since we have multiple streams" Or: "We need to correlate real-time events with data in a database. Surely you cannot do that!" Or: "We need to be able to call a web service to look up some reference data for an incoming event. I have not read about a CEP engine being able to do this, so surely this is impossible?". When we tell people that they can use the Coral8 CEP engine to do all of this, and a whole lot more, they get genuinely surprised.
I may have partially contributed to the problem by writing the Ten CEP Design Patterns white paper. While I'm happy that so many people have read it, I occasionally hear from people who make an assumption that something is not possible to do with CEP just because it's not mentioned in this paper!
I strongly believe that CEP is like the elephant from the poem: it's big and powerful, but few people can really appreciate everything that it can do. In fact, finding the things that CEP cannot do is becoming harder and harder. If you can think of a good use case related to real-time data analysis that a good CEP engine such as Coral8 cannot solve, I want to hear about it!
Mark Tsimelzon, President & CTO, Coral8.
Determinism in CEP
What is determinism? For the purpose of this post, we'll use a simplified definition:
A deterministic CEP engine will always produce the same results on the same input.
For example, if one stores streaming data in one or more files, and plays it back to the engine, then the engine will always produce the same results.
Why is determinism important?
Building complex CEP applications is hard enough, but it's pretty much impossible to build them without determinism. It's very hard to test an app or its parts if the engine is non-deterministic. How do you know that every small change in the app, in the algorithm or in the engine itself does not break anything? You don't.
Moreover, as you come up with new algorithms or strategies, you will often want to back-test them on existing data. How can you do this if the engine is non-deterministic, or if the results of running the algorithm on live data are different from the results of running it on historical data, not to mention running with acceleration to speed up your testing? You can't.
Interestingly enough, one can talk about degrees of determinism. Let's coin some terms here: an engine may be non-deterministic, single-stream deterministic, or multi-stream deterministic. Let's consider them in order:
Non-determinism
A non-deterministic engine does not produce reproducible results. By far the most common reason for this is that the engine does not process events according to event timestamps, and instead uses the time of arrival as the timestamp for the event. Also, it probably uses a clock for measuring windows. This is a big no-no if one wants to achieve determinism.
A good engine will typically have both options: to process events according to event timestamps, and to process events according to the time of arrival (the system clock). Both have advantages and disadvantages, but only the first option can guarantee determinism.
Single-stream determinism
There are engines on the market that can process events according to event timestamps, but only if the app has a single input stream. How can you test that the engine is deterministic for one stream? Here is a query you can run:
Insert Into Result Select Count(*) From S Keep 1 second;
This query should work in any good CEP engine, modulo trivial syntax changes. It creates a sliding window on stream S, and for each event it computes the number of events in one second prior to the event. If S has a reasonably high data rate (10,000+ events/sec), then one should be able to use this query as a test of single-stream determinism. Namely, if you run this query on the same stored data, it should produce the same results, provided you choose the option of using event timestamps for event processing. If every time you run this query you get different results (use 'diff'), then the engine is not single-stream deterministic.
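For readers who want to see what "same input, same output" means for this query, here is a minimal Python model of a one-second sliding count driven by event timestamps. It is a sketch, not CCL semantics: it assumes timestamps arrive in non-decreasing order and that the window covers the interval (ts - 1 second, ts], including the current event. The engine's exact boundary rules may differ.

```python
from collections import deque


def sliding_counts(timestamps, width=1.0):
    """For each event, count the events in the window (ts - width, ts].

    Assumes timestamps are non-decreasing (already sorted by event time).
    """
    window = deque()
    counts = []
    for ts in timestamps:
        window.append(ts)
        # Expire everything that is a full `width` or more behind this event.
        while window[0] <= ts - width:
            window.popleft()
        counts.append(len(window))
    return counts


# Hypothetical stored data: event timestamps in seconds.
stored = [0.0, 0.4, 0.9, 1.0, 1.5, 2.6]

first_run = sliding_counts(stored)
second_run = sliding_counts(stored)
```

Because the counts depend only on the stored timestamps, `first_run` and `second_run` are identical, which is exactly what `diff` should report for a deterministic engine. If the window were expired by the system clock instead, the counts would depend on how fast each run happened to execute.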
Multi-stream determinism
Most interesting applications take more than one data stream. For example, many financial applications look at multiple feeds from exchanges and multiple FIX order streams. In this case, the relative order of events as they happened at their sources is very important! Unfortunately, achieving multi-stream determinism is pretty hard. One has to be able to handle and synchronize delayed events, out-of-order events, and so on. Such stream synchronization is tricky, so here is how you can test if your engine is multi-stream deterministic. Create a simple query which takes data from more than one stream, such as
Insert Into Result
Select *
From Orders as O,
Ticks as T Keep Last;
This query pairs each order with the last tick (of course, you can use any other streams). Again, try this query at high enough data rates. If your engine is multi-stream deterministic, you'll always get the same results! If you don't, then the engine is not multi-stream deterministic, simple as that.
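The behavior a multi-stream deterministic engine should exhibit for this query can be modeled in Python as follows. This is a sketch under assumptions, not the engine's implementation: both inputs are lists of (timestamp, value) pairs already sorted by timestamp, a tick with the same timestamp as an order is applied before that order, and orders that arrive before any tick produce no output. The function name and sample data are hypothetical.

```python
import heapq

TICK, ORDER = 0, 1  # ticks sort first when timestamps are equal (assumption)


def join_orders_with_last_tick(orders, ticks):
    """Pair each order with the most recent tick, by event timestamp.

    orders, ticks: lists of (ts, value), each sorted by ts.
    Returns a list of (order_value, last_tick_value).
    """
    merged = heapq.merge(
        ((ts, TICK, v) for ts, v in ticks),
        ((ts, ORDER, v) for ts, v in orders),
        key=lambda rec: (rec[0], rec[1]),
    )
    last_tick = None
    result = []
    for _ts, kind, value in merged:
        if kind == TICK:
            last_tick = value          # the "Keep Last" window holds one tick
        elif last_tick is not None:
            result.append((value, last_tick))
    return result


# Hypothetical stored data.
orders = [(2.0, "buy IBM"), (5.0, "sell IBM")]
ticks = [(1.0, 100.0), (3.0, 101.0), (6.0, 102.0)]

pairs = join_orders_with_last_tick(orders, ticks)
```

The pairing depends only on the timestamps carried by the events, so it does not matter which stream's events physically reached the engine first. An engine that joins by arrival order could pair "sell IBM" with 100.0 on one run and 101.0 on another, depending on how the two feeds were delayed.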
Conclusion
I hope this post has demystified the notion of determinism a little. Of course, if you don't want to run all these tests, you can just ask me which engines are deterministic and which ones are not. I won't mention any names here, but as far as I know, the Coral8 Engine is the only commercial multi-stream deterministic CEP engine on the market today. If your tests prove otherwise, please let me know, and I'll gladly report so here!
Mark Tsimelzon, President & CTO, Coral8
Unclouding and streamlining your thinking about CEP use cases
I've been trying to understand why this is easy for some customers and hard for others, and I've recently observed a pattern I want to share. I'm sure it is controversial and does not explain the whole story, but still. Here it goes:
Those people who find it hard to come up with specific use cases think about The Event Cloud. Those who find it easy think in terms of event streams.
Now, if you were following the recent "CEP vs. ESP" wars, you may be forgiven for thinking that the basic difference between clouds and streams is about ordering. Streams carry ordered events and clouds don't require ordering, or so the story goes. Well, we all know this is false. The Coral8 Engine, as well as some other ESP/CEP engines, can handle unordered streams. So ordering is a red herring.
The real difference between a cloud and a stream is much more significant. An event cloud is an abstract concept. In some sense, it does not really exist! An event cloud represents a collection of all the events flowing through an enterprise or a part of an enterprise. It may be useful to think about this collection from a theoretical standpoint, but in my experience, it does not help one come up with and understand use cases.
Streams, on the other hand, are much more real and useful. Each stream has a meaning and carries similar events. You can decompose your events into input streams, output streams, and intermediate streams. Each stream (at least in the Coral8 Engine) has a name (a URI) and is a publish/subscribe topic, so you may have multiple subscribers and publishers dynamically added. Also, each stream may have an access control list associated with it.
Once you think about a stream, you can think about the things you may want to do with it, such as filter events, maintain a window, aggregate events over a window, enrich the stream, join it with another stream, persist it to a database, and so on. Your use cases start practically defining themselves as soon as you decompose your problem into streams and operations on streams.
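As a rough illustration of decomposing a problem into streams and operations on streams, here is a small Python sketch that chains three of the operations named above: filter, enrich, and aggregate over a window. It uses plain generators rather than a CEP engine, and the event fields, the reference table, and the function names are all hypothetical.

```python
from collections import deque


def filter_stream(events, predicate):
    """Pass through only the events that satisfy the predicate."""
    return (e for e in events if predicate(e))


def enrich(events, reference):
    """Add a 'sector' field looked up from a reference table."""
    for e in events:
        yield {**e, "sector": reference.get(e["symbol"], "unknown")}


def moving_average(events, field, size):
    """Attach the average of `field` over the last `size` events."""
    window = deque(maxlen=size)  # count-based window; old values fall off
    for e in events:
        window.append(e[field])
        yield {**e, "avg_" + field: sum(window) / len(window)}


# Input stream (a list here; any iterable of events would do).
trades = [
    {"symbol": "IBM", "price": 100.0},
    {"symbol": "XYZ", "price": 5.0},
    {"symbol": "IBM", "price": 102.0},
    {"symbol": "IBM", "price": 104.0},
]

ibm_only = filter_stream(trades, lambda e: e["symbol"] == "IBM")
enriched = enrich(ibm_only, {"IBM": "tech"})
output = list(moving_average(enriched, "price", size=2))
```

Each intermediate name (`ibm_only`, `enriched`) plays the role of an intermediate stream, and each function plays the role of a query that subscribes to one stream and publishes into the next.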
Hopefully this has convinced you to stop thinking about "The Cloud" and start thinking about multiple streams, at least for the purpose of defining use cases. As some folks know, I like finding non-CEP analogies for CEP topics. Here is one: You'll never understand databases if you think that they just store "data". To understand databases, you've got to understand that they store tables. Similarly, to uncloud and streamline your thinking about CEP, start thinking streams.
Mark Tsimelzon, President & CTO, Coral8.
A cool discussion at the CEP-Interest Yahoo Group.
If one builds a CEP/BAM application that observes events in a complex distributed environment, what happens if the underlying core applications change the format and the nature of the events? Will one need to modify all the CEP applications that rely on these events?
The easy answer seems to be to create an event abstraction layer that abstracts the underlying events, but this raises more questions: Who creates this abstraction layer? How is it maintained? Does it introduce any overhead? For the latest in-depth answers, read the CEP-Interest Yahoo group (http://tech.groups.yahoo.com/group/CEP-Interest/messages).
Mark Tsimelzon, President & CTO, Coral8.
Great Customer Stories
But I am even more amazed by the stories of some engineers building complex applications completely on their own, without any help from Coral8. I remember visiting a large bank and starting the meeting with opening my laptop and saying "How about I show you a quick Coral8 demo". No, said the two engineers, how about we show you our demo of a trading application that we built with Coral8! The demo was pretty impressive, and so were the guy and the gal who built it. They downloaded the Coral8 product from our web site, read through the tutorial, and built a very nice prototype in just one week, without any training or help whatsoever!
It's stories like this that brighten my day, and the day of everyone at Coral8. Do you have a great Coral8 story? Do you want to share it? If you do, please leave it in the comments or send it to me: mark AT coral8 DOT com.
Mark Tsimelzon, President & CTO, Coral8
Our Biggest Competitor
Most of our customers are facing the "Build vs. Buy" decision. "Build it yourself" is always an attractive option that gets considered. Indeed, we have several prospects right now who have eliminated all the CEP vendors other than Coral8, yet "build it yourself" is still on the table!
Since I work for a software vendor, it's natural for me to say that you should always, or almost always, choose "Buy" instead of "Build". It's all about one's core competency, and building high throughput, low latency event processing engines is our core competency, but not our customers'. It all makes sense.
But recently, I asked myself a question: when Coral8 needs a certain component, how often do I choose "Buy" instead of "Build"? What concerns do I have with "Buy"? And how can Coral8 address similar concerns when we work with our prospects?
Just to be clear, I am not talking about business concerns here. For example, everybody has concerns about buying from a startup, but we at Coral8 put these concerns to rest very effectively once we discuss our customers, partners, team, funding, etc. And I am not talking about the price: most prospects usually agree that our price is quite fair. Finally, I am not talking about the concerns about the "Build" option. We all know what they are, but it's a whole separate discussion. I'm talking about technical concerns with the "Buy" option, such as:
Is the product easy to learn and use?
I don't want to learn brand new languages and computing paradigms. I've been known to reject products if I could not figure out how to use them in 20 minutes, and I am sure I am not an exception. Sometimes the product just does not "feel right", and this makes most prospects abandon their evaluation.
What do we do about this at Coral8? First, we've based our programming language on SQL, the language familiar to millions of developers. Second, we've thought really hard about the programming model. Continuous queries subscribe and publish into data streams. Interaction with the outside world is handled by adapters, and adapters also subscribe and publish into streams. Queries running in the same process or on different machines interface via streams. We get high marks for the clean and easy to understand architecture. Third, we believe in providing as many tutorials, manuals, and examples as possible. People learn to program by "copy & paste", and we want to provide lots and lots of places to "copy" from.
Is the quality of the product and performance good enough?
I don't want to buy a product only to discover later that it's buggy and slow. Nobody does. Yet many products on the market *are* buggy and slow. So, what can one do about this?
At Coral8, we do not believe that we can convince anybody that our product is the best and the fastest by issuing press releases. We don't even want to convince anybody by telling them about our internal QA, unit tests, regression tests, compatibility tests, performance tests, and many other kinds of tests we have here. We don't really want to convince anybody, period.
We want people to convince themselves that our product is great. We were the first vendor in the CEP space, by a wide margin, to publish our product on our Web site and encourage open downloads. We want prospects to download the product, exercise all its features, build their applications, run benchmarks, and compare our product to those of competitors and to their home-grown solutions. We think this is the only way to sell sophisticated software such as CEP engines.
Does the product provide enough visibility and control over its internal operations?
I don't want to buy a black box. I care about two things: Visibility and Control. Visibility is all about providing a view into how the product operates internally, how well it's doing, what resources are being consumed, what errors are being raised, and so on. And Control, of course, is all about having an ability to change this!
At Coral8, we try to take Visibility and Control very seriously. We provide countless runtime statistics via Studio, SNMP, Status streams, Status objects, command line tools, and other interfaces. We automatically measure and report everything from resource consumption to end-to-end latency. We want the customer to know how the engine is doing!
And if something is not working well, customers should be able to Control it. One can change server operation at runtime, again via the Studio, command line tools, SOAP interfaces, and Java, .NET, C/C++, Perl, and Python APIs. One can stop and start queries, and change various parameter values at runtime. Most customers are amazed at how much control they have over the deployment and operation of the Coral8 server, but of course we can always do better.
Anyway, these are the things I care about when considering Build vs. Buy myself, and I think most customers are similar to me in this respect. But what about you? Did I miss anything you worry about when considering "Buy"? And if I did, how can we at Coral8 make sure we address these concerns?
Mark Tsimelzon
President & CTO, Coral8.
CEP and SOA
SOA and Web Services are all about interoperability and interfaces. SOAP, the cornerstone of SOA and Web Services, is all about having one standard way of connecting applications and application components together. How those applications and components are internally implemented is largely irrelevant. Developers implement the logic inside web services using Java, .NET languages, C/C++, or any other standard language. The innovation is not in the implementation of the service, but in the interface and interoperability!
Contrast this to CEP. CEP is not really about interfaces. It's about a radically new way of implementing applications that analyze large volumes of real-time events. It's about new languages, even if those languages are based on familiar ones, such as SQL. It's about new runtime engines. It's about new algorithms. It's about new design patterns. In short, CEP goes way beyond interfaces.
Thus, while major CEP vendors spend a lot of time on helping developers implement CEP applications, they typically have less to say about interfaces. Indeed, there is less innovation here. The Coral8 Engine, for example, supports a number of standard integration technologies: sockets, JMS, MS MQ, MQ Series, Tibco Rendezvous, and even SOAP. Yes, SOAP. SOAP is not well suited for sending hundreds of thousands of input events per second -- the overhead of SOAP is too great. But it is great for triggering actions as the result of event processing.
Hopefully, at some point SOAP's limitations will be addressed, and we'll get a standard way of exchanging very large volumes of real-time events. That will be an exciting improvement on the interoperability side for CEP. But meanwhile, the main innovation is inside the engine, not outside. And this is a huge difference between CEP and SOA.
Mark Tsimelzon
President & CTO, Coral8
The CEP vs. ESP Debate: Enough Already?
It's hard to believe, but some folks are still continuing the debate that you'd think died a while ago: "CEP" vs. "ESP." I guess some people just like categorizations, and they find it useful to put products into artificial categories. Or maybe they like terminology debates. Or maybe this is their marketing strategy, I don't really know.
The debate is shifting all the time. Originally ESP was supposed to be looking at "streams", while CEP was supposed to be looking at "clouds". Then it was "ordered streams" vs. "unordered streams". Then "small windows" vs. "long windows". Then looking at one stream vs. looking at multiple streams. Then "latency matters" vs. "latency does not matter". I think now it's simple patterns vs. more complex patterns, but I am losing track of it.
All the while it's been perfectly clear that most products on the market are adding both kinds of features. Coral8 today supports all kinds of streams, ordered and unordered. Small time-based windows and very large windows with complex policies. Simple patterns and very complex patterns. Applications where our 1 millisecond latency matters, and applications where it does not matter. But the debate continues.
Meanwhile, I have yet to meet a customer who'd ask me: is Coral8 an ESP engine or a CEP engine? They don't care. The question they ask is "What can your engine do for me?" To me, this is a much more interesting question to talk about.
Mark Tsimelzon, President & CTO, Coral8.
My New Startup (and a few CEP use cases)
Excited as I am about this development, I do not claim it's big news for the CEP community. Instead, having spent a few days and nights in a hospital, I want to share with you some thoughts and personal observations on the place of the CEP technology in this environment.
First of all, I knew that some of our partners have deployed Coral8-based location monitoring applications in hospitals throughout the country. I did not realize, however, how common it is for a hospital to lose track of their doctors, anesthesiologists, nurses. Yet every so often somebody would enter our room, asking a question like "Is Dr. Joe Shmoe here, by any chance?". There have been studies linking such disturbances to increased stress and infection for new mothers, and I can testify they are supremely annoying. I hear they also lose patients, but luckily we did not get to experience that. And of course they lose equipment even more often, which costs them millions of dollars.
But location monitoring applications are just the beginning. After all, to most folks CEP is all about real-time sensor data processing, cool dashboards, and alerts. I certainly saw quite a few of those! The baby pulse signal, the mother pulse signal, the baby movement signal, the contraction strength signal, the mother's blood pressure, and so on. All these signals need to be monitored. The signals have got to lie within a given range. And if they go outside of the range, they should not do so for long. Sliding windows, MAX and MIN within the window, you get the idea.
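To make the "sliding window plus MIN and MAX" idea concrete, here is a minimal Python sketch (not CCL, and not anything a hospital actually runs). The class name RangeMonitor, the thresholds, and the three-second window are all made up for illustration. It raises a flag only when the signal has been outside the allowed range for the whole window, which is one plausible reading of "should not do so for long".

```python
from collections import deque
from typing import Deque, Tuple


class RangeMonitor:
    """Flag a signal that stays outside [low, high] for a full time window.

    Hypothetical helper for illustration; thresholds are not clinical values.
    """

    def __init__(self, low: float, high: float, window_seconds: float) -> None:
        self.low = low
        self.high = high
        self.window_seconds = window_seconds
        self.samples: Deque[Tuple[float, float]] = deque()  # (timestamp, value)

    def add(self, timestamp: float, value: float) -> bool:
        """Record one sample; return True if the whole window is out of range."""
        self.samples.append((timestamp, value))
        start = timestamp - self.window_seconds
        # Keep the newest sample at or before the window start, so that the
        # retained samples describe the signal over the entire window.
        while len(self.samples) >= 2 and self.samples[1][0] <= start:
            self.samples.popleft()
        covers_window = self.samples[0][0] <= start
        values = [v for _, v in self.samples]
        stuck_high = min(values) > self.high   # MIN above the upper bound
        stuck_low = max(values) < self.low     # MAX below the lower bound
        return covers_window and (stuck_high or stuck_low)


monitor = RangeMonitor(low=110, high=160, window_seconds=3)
readings = [(0, 140), (1, 170), (2, 172), (3, 171), (4, 175), (5, 150)]
alerts = [monitor.add(t, v) for t, v in readings]
```

In this run the only alert fires at t=4, once every reading in the preceding three seconds is above 160; the in-range reading at t=5 clears it again.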
Today, the software they have monitors for simple conditions, such as signal value too high or too low. But more complex conditions are monitored by humans, who, like traders, look at screens full of charts coming from multiple patients all day long. A CEP Engine could certainly automate that.
Now, you may ask, monitoring a signal like heart rate is fine, but what about correlations? They have those requirements in spades! Here is a cool example I learned: You are watching two signals, the contraction strength and the baby pulse rate. Whenever the first peaks, the second should NOT drop appreciably. If it does, it means the baby is not handling contractions well. Who wants to write a CCL query for this?
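I won't attempt the CCL here, but the logic can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the two signals are sampled at the same instants, a "peak" is a local maximum above min_height, and "drop appreciably" means falling more than max_drop below the pulse value at the peak within the next lookahead samples. The numbers are invented, not clinical.

```python
from typing import List, Sequence


def find_peaks(signal: Sequence[float], min_height: float) -> List[int]:
    """Indices of local maxima that reach at least min_height."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] >= min_height
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]


def pulse_drops_after_peaks(contraction: Sequence[float],
                            pulse: Sequence[float],
                            min_height: float,
                            lookahead: int,
                            max_drop: float) -> List[int]:
    """Return the contraction peak indices followed by a large pulse drop."""
    flagged = []
    for i in find_peaks(contraction, min_height):
        baseline = pulse[i]                          # pulse at the peak
        following = pulse[i + 1:i + 1 + lookahead]   # samples right after it
        if following and baseline - min(following) > max_drop:
            flagged.append(i)
    return flagged


contraction = [10, 20, 60, 30, 10, 15, 70, 40, 10]
pulse = [140, 141, 139, 138, 140, 141, 140, 118, 125]
flagged = pulse_drops_after_peaks(contraction, pulse,
                                  min_height=50, lookahead=2, max_drop=15)
```

Here the first contraction peak (index 2) is followed by a steady pulse, while the second (index 6) is followed by a drop from 140 to 118, so only the second is flagged.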
You may also want to correlate these signals with location information. If you unplug the sensors to exit the room for a while, the software goes berserk thinking that something horrible has happened, and a nurse runs into your room to check if you are alive. Yet if you have RFID in place (which they already do on babies for security purposes, but not on mothers), you can figure out that the reason you don't have a signal is because the mother is no longer in the room. If she is moving from room to room, it's a good bet that she is alive!
Anyway, these are just some random thoughts I had over the few days we spent in the hospital. I certainly had more important things to worry about. But it's very nice to know first hand that what we do here at Coral8 makes a difference already, and has potential to do so even more.
Mark Tsimelzon, President & CTO, Coral8
CEP and SQL: The Top Five Myths
As everybody reading this blog knows by now, the Coral8 Engine is programmed via a SQL-based language called CCL. I know at least two other companies that use a SQL-based language for CEP (or ESP; or CESP -- choose the acronym you like best. I consider them all to be largely one and the same, but that's not the topic for this post). Yet there is an amazing degree of misunderstanding over what it means to use a SQL-based language for CEP.
Some of this misunderstanding stems from marketing done by the vendors who use non-SQL-based languages for CEP, but some of it reflects genuine confusion. A SQL-based language is not the same thing as SQL! SQL would indeed be unsuitable for CEP, but CCL is a language designed specifically for CEP. So let's examine the top five myths about using a SQL-based language for CEP.
Myth 1: the request-response nature of SQL makes it unsuitable for low-latency event processing
It's a very good point, but CCL is not a request-response language. CCL (Continuous Computation Language) uses a completely different processing model in which queries subscribe to input streams, run continuously, and publish results into output streams. This is what gives the Coral8 Engine its amazing low latency (about one millisecond).
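For readers who think in code, the difference from request-response can be sketched in Python. This is only a toy model of the subscribe/publish idea, not the Coral8 API: the Stream class, register_filter_query, and the stream names are all hypothetical. The point is that the query is registered once and then reacts to every event pushed into its input stream; nobody polls for results.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]


class Stream:
    """A named stream that pushes each published event to its subscribers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: Event) -> None:
        for callback in self._subscribers:
            callback(event)


def register_filter_query(source: Stream, sink: Stream,
                          predicate: Callable[[Event], bool]) -> None:
    """Register once; afterwards every matching input event flows to the sink."""
    def on_event(event: Event) -> None:
        if predicate(event):
            sink.publish(event)

    source.subscribe(on_event)


trades = Stream("Trades")
large_trades = Stream("LargeTrades")

received: List[Event] = []
large_trades.subscribe(received.append)   # stands in for an output adapter

register_filter_query(trades, large_trades, lambda e: e["size"] >= 1000)

trades.publish({"symbol": "ABC", "size": 500})
trades.publish({"symbol": "XYZ", "size": 2500})
```

After the two publish calls, received holds only the XYZ trade: the result was pushed downstream the moment the input event arrived.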
Myth 2: SQL is good for handling sets, but events do not come in sets.
It's true that events do not come in sets. Events come in streams. A stream may be thought of as an ordered set of events of infinite size. But storing and operating on infinite sets is inconvenient, so CCL includes a feature called "windows". A window creates a bounded set. For example, the window "KEEP 10 SECONDS" always keeps the last ten seconds' worth of data. The window "KEEP 100 ROWS" always keeps the last 100 events, and the window "KEEP 10 ROWS KEEP 10 SECONDS" keeps up to ten events from the past ten seconds. More interesting windows are possible: for example, the window "KEEP 10 LARGEST ROWS BY PRICE" keeps the ten largest events by price. Once windows are defined, they may be used in SQL-like queries just like tables!
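The window policies quoted above can be approximated in Python to show what "a bounded set over a stream" means. This is a rough sketch of the semantics, not how the Coral8 Engine implements them. The class names are made up, expiry is driven by the timestamp of each arriving row (an assumption), and a row exactly max_seconds old is treated as expired (also an assumption).

```python
import heapq
from collections import deque
from typing import Deque, List, Optional, Tuple

Row = Tuple[float, str, float]  # (timestamp in seconds, symbol, price)


class KeepWindow:
    """Rough analogue of KEEP n ROWS and/or KEEP s SECONDS."""

    def __init__(self, max_rows: Optional[int] = None,
                 max_seconds: Optional[float] = None) -> None:
        self.max_rows = max_rows
        self.max_seconds = max_seconds
        self.rows: Deque[Row] = deque()

    def insert(self, row: Row) -> None:
        self.rows.append(row)
        now = row[0]
        if self.max_seconds is not None:
            # Expire rows that are max_seconds old or older.
            while self.rows and self.rows[0][0] <= now - self.max_seconds:
                self.rows.popleft()
        if self.max_rows is not None:
            # Evict the oldest rows beyond the row limit.
            while len(self.rows) > self.max_rows:
                self.rows.popleft()


class KeepLargestWindow:
    """Rough analogue of KEEP n LARGEST ROWS BY PRICE."""

    def __init__(self, n: int) -> None:
        self.n = n
        self._heap: List[Tuple[float, int, Row]] = []  # min-heap keyed on price
        self._seq = 0                                   # tie-breaker for equal prices

    def insert(self, row: Row) -> None:
        self._seq += 1
        entry = (row[2], self._seq, row)
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, entry)
        else:
            heapq.heappushpop(self._heap, entry)  # drop the smallest price

    def rows(self) -> List[Row]:
        return [entry[2] for entry in sorted(self._heap, reverse=True)]


recent = KeepWindow(max_rows=3, max_seconds=10)
for t in [0, 1, 2, 3, 20]:
    recent.insert((float(t), "ABC", 100.0 + t))

largest = KeepLargestWindow(2)
for t, price in enumerate([5.0, 9.0, 7.0, 1.0]):
    largest.insert((float(t), "ABC", price))
```

After these inserts, recent holds only the row stamped 20 (the older rows aged out), and largest holds the rows priced 9 and 7. Either collection can then be scanned like a small table.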
Myth 3: SQL-based systems do not include the notion of time.
Indeed, SQL does not have a notion of time, but CCL does. Every CCL message has a timestamp, and CCL has a number of features for dealing with time, such as built-in time-based windows (KEEP 10 SECONDS), output control (OUTPUT EVERY 1 MINUTE), and so on. Time is central to CCL!
Myth 4: SQL-based systems do not handle delayed messages, out-of-order messages, revisions, etc.
Databases do not do this, but the Coral8 CEP engine certainly does. It has built-in configuration parameters for handling delayed and out-of-order messages. Coral8 windows are powerful enough to handle revisions. Data stored in a window can be easily removed or replaced when a revision arrives.
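To give a feel for what handling delayed and out-of-order messages involves, here is a small Python sketch of one common technique: a reorder buffer with a maximum allowed delay. It is an illustration only and says nothing about how the Coral8 Engine does it or what its configuration parameters are called. ReorderBuffer and max_delay are hypothetical names.

```python
import heapq
from typing import List, Tuple


class ReorderBuffer:
    """Hold events briefly and release them in timestamp order.

    An event is released once the newest timestamp seen is at least
    max_delay ahead of it. Events that arrive later than that are dropped.
    """

    def __init__(self, max_delay: float) -> None:
        self.max_delay = max_delay
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = 0
        self._latest = float("-inf")  # newest timestamp seen so far

    def push(self, timestamp: float, payload: str) -> List[Tuple[float, str]]:
        """Accept one event; return the events now safe to release, in order."""
        if timestamp <= self._latest - self.max_delay:
            return []  # too late: its slot has already been released
        self._seq += 1
        heapq.heappush(self._heap, (timestamp, self._seq, payload))
        self._latest = max(self._latest, timestamp)
        watermark = self._latest - self.max_delay
        released = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, item = heapq.heappop(self._heap)
            released.append((ts, item))
        return released


buffer = ReorderBuffer(max_delay=2.0)
arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (3.5, "late"), (5, "e"), (9, "i")]
ordered: List[Tuple[float, str]] = []
for ts, item in arrivals:
    ordered.extend(buffer.push(ts, item))
```

In this run "b" arrives after "c" but is still released in the right place, while "late" shows up more than two seconds behind the newest event and is discarded. The trade-off is the usual one: a larger max_delay tolerates more disorder at the cost of added latency.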
Myth 5: it's hard to code complex event patterns with SQL-based systems.
Yes, SQL makes it hard to write a statement that checks whether a certain sequence of events, say A, then B, then C or D, then E and F happen within a certain period of time, like ten minutes. But CCL has a powerful built-in event pattern matching engine, making it easy to write clauses such as MATCHING [10 MINUTES: A, B, C || D, E && F]. The syntax even supports negative events (events that do NOT occur), using the !A notation. Patterns may be arbitrarily nested, too!
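To show what such a clause has to do under the hood, here is a deliberately simplified Python matcher for the pattern "A, then B, then C or D, then both E and F, all within a time limit". It is a sketch under my own assumptions, not the CCL matching engine: the step encoding and names are invented, unrelated events between steps are simply ignored, and negative events and nesting are not handled.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Step = Tuple[str, Set[str]]  # ("any" or "all", event types for this step)

PATTERN: List[Step] = [
    ("any", {"A"}),
    ("any", {"B"}),
    ("any", {"C", "D"}),  # C || D
    ("all", {"E", "F"}),  # E && F, in either order
]


@dataclass
class Partial:
    start: float                      # timestamp of the first matched event
    step: int = 0                     # index of the step being waited for
    seen: Set[str] = field(default_factory=set)


def match(events: List[Tuple[float, str]], pattern: List[Step],
          window: float) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps for every completed match."""
    partials: List[Partial] = []
    matches: List[Tuple[float, float]] = []
    for ts, kind in events:
        # Discard partial matches that can no longer finish in time.
        partials = [p for p in partials if ts - p.start <= window]
        if kind in pattern[0][1]:
            partials.append(Partial(start=ts))  # a possible new match begins
        still_open = []
        for p in partials:
            mode, kinds = pattern[p.step]
            if kind in kinds:
                p.seen.add(kind)
                if mode == "any" or p.seen == kinds:
                    p.step += 1
                    p.seen = set()
            if p.step == len(pattern):
                matches.append((p.start, ts))
            else:
                still_open.append(p)
        partials = still_open
    return matches


events = [
    (0, "A"), (60, "B"), (120, "D"), (180, "F"), (240, "X"), (300, "E"),
    (1000, "A"), (1100, "B"), (1200, "C"), (1300, "E"), (1700, "F"),
]
matches = match(events, PATTERN, window=600)  # 600 seconds = ten minutes
```

The first run of events completes in 300 seconds and is reported. The second one has all the right events, but the final F arrives 700 seconds after its A, so it does not count.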
As we've seen, CCL is significantly more powerful than SQL. For the mathematically inclined, CCL is Turing-complete, and SQL is not (don't worry if you don't know what this means). At the same time, the fact that CCL is based on SQL makes it very easy to learn. So you get the best of both worlds: a powerful language designed for CEP that uses the syntax and concepts of the language you already know! Sounds like a good deal to me.
Dashboards and CEP
- Send results to a dashboard
- Send an alert (via email, SMS, IM)
- Issue a command ("Buy this stock", "Block this IP address")
- Publish results to a message bus (Tibco Rendezvous, IBM MQ Series, etc)
- Save results to a database
There are a lot of dashboards on the market, but very few of them can be useful for CEP applications. Most dashboards are not designed for displaying real-time results of CEP. They pull data from various data sources, typically relational databases. This introduces unavoidable latency. A CEP dashboard, on the other hand, must allow the CEP engine to push data to it in real-time. And many old fashioned dashboards or dashboard toolkits were never designed for this.
Luckily, there are some options on the market for building very nice real-time dashboards, and they do not even cost that much. Coral8 does not market dashboards (we prefer to focus on building the best CEP engine), but some of our customers are using Adobe Flex to build nice real-time dashboards that run inside a Flash container, which means they can run inside any browser!
Flex is a great environment for building all kinds of interactive applications. We've put together a simple example of a dashboard that can directly subscribe to Coral8 streams. Using this example as a model, it is very easy to build nice interactive real-time dashboards. Well done, Adobe!
Here is what a Flex dashboard may look like:

Mark Tsimelzon
Coral8, Inc.
