Datomic: Event Sourcing without the hassle

vvvvalvalval.github.io

206 points by fnordsensei 5 years ago

codebje 5 years ago

The Events here give up on the naming, granularity, and semantics problem: they're extremely low level fine-grained changes to fields in a database.

Events themselves are no longer interesting or semantically meaningful, because they're a single atomic change to a database field. A change in the way state is represented means a different set of events is produced. Subscribing to meaningful occurrences in this model is difficult, and probably will eventually result in the creation of a "meta-event" for each action that contains the semantic intent of the outcome.

Events are IMO most useful for analytics and processing when they correspond to meaningful business outcomes: steps in a workflow, consequences of user actions, and the like - despite this having the problem of making extraordinary and rare business outcomes more difficult to accommodate.

valw 5 years ago

> The Events here give up on the naming, granularity, and semantics problem: they're extremely low level fine-grained changes to fields in a database.
You say "fields", I say "facts".
Which of a set of nominal events or a graph of facts is higher-level will ultimately depend on the domain, but using Datomic over several years now has led me to think that it's much more often the latter than conventional Event Sourcing has trained us to believe.
In fact, it may well be that Event Sourcing has traditionally been relegated to those (few) use cases where the "set of nominal Event Types" approach is better, being the only situations where conventional Event Sourcing is practical.
> Events themselves are no longer interesting or semantically meaningful, because they're a single atomic change to a database field. A change in the way state is represented means a different set of events is produced. Subscribing to meaningful occurrences in this model is difficult,
Datoms are not 'single' or 'atomic' - they're packed coherently together in a Transaction, and are immediately relatable to entire database values. Subscribing is just pattern matching, and is not hard, as the following section of the article tried to show: https://vvvvalvalval.github.io/posts/2018-11-12-datomic-even....
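To make "subscribing is just pattern matching" concrete, here is a minimal Python sketch. It assumes a simplified model in which a transaction is a list of datoms `(entity, attribute, value, tx, added)`. The attribute names (`:order/status`, `:user/email`) and the helpers `matches` and `shipped_orders` are hypothetical, and this is not Datomic's actual API.

```python
from typing import Any, NamedTuple, Optional


class Datom(NamedTuple):
    """One fact: entity, attribute, value, transaction id, and asserted (True) / retracted (False)."""
    e: int
    a: str
    v: Any
    tx: int
    added: bool


def matches(datom: Datom, a: Optional[str] = None, v: Any = None, added: Optional[bool] = None) -> bool:
    """True if the datom fits the pattern; None means 'any value' for that slot."""
    return ((a is None or datom.a == a)
            and (v is None or datom.v == v)
            and (added is None or datom.added == added))


def shipped_orders(tx_datoms: list[Datom]) -> list[int]:
    """A 'subscriber': entity ids whose status was asserted as 'shipped' in this transaction."""
    return [d.e for d in tx_datoms if matches(d, a=":order/status", v="shipped", added=True)]


# One transaction (tx 1001) that ships order 42 and also edits an unrelated user.
tx = [
    Datom(42, ":order/status", "pending", 1001, False),  # retract the old status
    Datom(42, ":order/status", "shipped", 1001, True),   # assert the new status
    Datom(7, ":user/email", "a@example.com", 1001, True),
]
print(shipped_orders(tx))  # [42]
```

Because the datoms of one transaction arrive together, a subscriber sees the whole change at once rather than isolated field updates.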
> [...] and probably will eventually result in the creation of a "meta-event" for each action that contains the semantic intent of the outcome.
Which would only bring you back to the 'set of nominal Events' case, so it's not a regression compared to conventional Event Sourcing anyway. That's what the article meant by 'annotating Reified Transactions', and that's something you can even do after the fact (i.e. in a later transaction, when the requirement for it becomes apparent), which means that you don't have to get these aspects right upfront, nor commit to them.
For a more in-depth discussion of Datomic's Reified Transactions, I suggest this talk by Tim Ewald: https://docs.datomic.com/on-prem/videos.html#reified-transac...
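The idea of annotating a reified transaction, even after the fact, can be sketched in Python under a simplifying assumption: the transaction is itself an entity, so annotations are just facts whose entity is the transaction id. The in-memory `log`, the id generator, and the helpers `transact` and `tx_facts` are hypothetical illustrations, not Datomic's API.

```python
import itertools
from typing import Any

Fact = tuple[int, str, Any]      # (entity, attribute, value)

log: dict[int, list[Fact]] = {}  # hypothetical log: tx id -> facts asserted in that transaction
_tx_ids = itertools.count(1001)  # hypothetical tx id generator; a tx id is also an entity id


def transact(facts: list[Fact], **annotations: Any) -> int:
    """Commit facts in a new transaction; keyword annotations become facts about the tx entity."""
    tx = next(_tx_ids)
    log[tx] = list(facts) + [(tx, f":tx/{name}", value) for name, value in annotations.items()]
    return tx


def tx_facts(tx: int) -> dict[str, Any]:
    """Everything ever asserted about transaction `tx`, including annotations added later."""
    return {a: v for facts in log.values() for (e, a, v) in facts if e == tx}


# Commit the change along with who made it...
t1 = transact([(42, ":order/status", "shipped")], author="alice")
# ...and later, once the need appears, record its business intent in a separate transaction.
t2 = transact([(t1, ":tx/intent", "order-shipped")])
print(tx_facts(t1))  # {':tx/author': 'alice', ':tx/intent': 'order-shipped'}
```

The second transaction adds the "meta-event" semantics without rewriting the first one, which is the point about not having to commit to these aspects upfront.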
weavejester 5 years ago

"Events themselves are no longer interesting or semantically meaningful, because they're a single atomic change to a database field."
This isn't true; changes to datoms are not considered individually, but are grouped together into transactions. Additionally, you can add arbitrary keys to annotate the transaction currently being committed.
See: https://docs.datomic.com/cloud/transactions/transaction-proc...
seer 5 years ago

That is true, but isn’t it trivially solvable with the datomic model? You just have a datum “significant event” list and add stuff to it?
You still get all the benefits of event sourcing, but then you can query the state much more easily.
At least that’s how it looks to me from a cursory look at Datomic and my daily struggles with normal event sourcing (Kafka).
- pwm 5 years ago
  
  Have you written about your daily struggles somewhere? I, for one, would be very interested in reading about it, as I think not nearly enough people talk about the cons/war stories when it comes to ES (or I just don't find those blogs).
- MaxGabriel 5 years ago
  
  We didn’t have much success with it, but there are tools like Debezium that read the Postgres WAL and send row-level changes like this to Kafka.

ChrisMalherbe 5 years ago

Great article, but he should have started with the "it's proprietary" bit.

Life is too short to run proprietary software.

hjek 5 years ago

> Life is too short to run proprietary software.
Agreed, but Datomic uses Datalog as query language:
> In practice, a Datomic Database Value is not implemented as a basic list; it's a sophisticated data structure comprising multiple indexes, which allows for expressive and fast queries using Datalog, a query language for relational data.
Hence the article should still be relevant to free Datalog implementations, such as the Racket Datalog package[0].
[0]: https://docs.racket-lang.org/datalog/index.html
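For readers who haven't seen Datalog, here is a minimal Python sketch of naive bottom-up evaluation for the classic recursive rules `ancestor(X, Y) :- parent(X, Y).` and `ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).` It only illustrates the evaluation idea (apply rules until no new facts appear); it is not how Datomic or the Racket package evaluate queries, and the facts are made up.

```python
def ancestors(parent: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Naive bottom-up evaluation: apply the rules until a fixpoint is reached."""
    # Rule 1: ancestor(X, Y) :- parent(X, Y).
    ancestor = set(parent)
    while True:
        # Rule 2: ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).  (a join on Y)
        derived = {(x, z) for (x, y) in parent for (y2, z) in ancestor if y == y2}
        if derived <= ancestor:  # nothing new: fixpoint reached
            return ancestor
        ancestor |= derived


parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}
print(sorted(ancestors(parent)))
```

The recursive rule is what plain SQL needs `WITH RECURSIVE` for; in Datalog it is just another rule.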
- vbuwivbiu 5 years ago
  
  and after a few days of writing datalog queries you'll wonder why you ever put up with SQL
  
  javajosh 5 years ago
  
  "After a few days of X you'll wonder why you ever put up with Y". If you added all the (X,Y) pairs where this has been said, I wonder how many days, nay, lifetimes you'd need to prove their points for them?
  
  village-idiot 5 years ago
  
  More hilariously, how many pairs of (x,y) would have their opposite of (y,x) in the same set?
  Also, having used datalog, I couldn’t disagree more. SQL is fine.
jwr 5 years ago

I do not intend to use Datomic (I currently use RethinkDB and I am considering FoundationDB in the future), and yet I found the article interesting and I think I can learn a lot from it (get a good feeling for a number of approaches to a hard problem).
manigandham 5 years ago

I'm not sure what proprietary software has to do with limited time, other than helping you move faster in most cases.
fnordsensei 5 years ago

Setting aside idealism for a moment, I find it difficult to criticize those who do choose to make their software proprietary. In some cases, it's the only way for the software to exist.
- Barrin92 5 years ago
  
  I don't even understand the 'idealistic' part. Without proprietary software, software as a distinguishing feature falls away, and scale and hardware seem to become all that matters, and nobody can compete with the big players on cash.
  It's no wonder large companies have become fans of open source with the advent of cloud computing. People are throwing software at their magical money making machines entirely for free.
mhd 5 years ago

Life is also usually too short to implement event sourcing. Strangely enough I see this beloved YAGNI pattern enjoying some sort of renaissance lately.
- bantunes 5 years ago
  
  One of the big drawbacks to it was storage (for all those events), and it's dirt cheap these days.
Scarbutt 5 years ago

Many succeed with proprietary software while others worry and don't ship.
- chongli 5 years ago
  
  Proprietary software is a cousin to technical debt. You're trading expedience for future risk. You simply never know when the copyright holder will fold up shop and leave you in the lurch.
  Edit: If you think this is just an abstract risk, remember that Google shuts down popular and incredibly useful products so often it's become a meme around here.
  
  manigandham 5 years ago
  
  Every decision is a trade-off. Proprietary software with support from a vendor can greatly speed up your product viability and market profitability.
  For most companies, and especially startups, that is far more important than the very unlikely risk that a vendor completely disappears overnight, and the even more unlikely risk that their software also stops working completely before you can migrate to something else.
  
  chongli 5 years ago
  
  The vendor disappearing overnight is only one of many risks from using proprietary software. Others include the vendor discontinuing the product, taking the product in a radically different direction, the company being acquired, or simply changing the licensing model (see Adobe Creative Cloud) to dramatically increase the costs of using the software.
  Look, right now I am working for a company that is in the midst of attempting to transition out of a very old source code and release management system and they're having a hell of a time doing it. That system happens to be proprietary and the support plus licensing fees are astronomical while the actual tech support is abysmal.
  Yes, the risk may seem like an easy tradeoff when you're starting out and you need to ship and you don't have any market share to worry about. It's a whole different story when you're dealing with a very clunky, yet very profitable legacy system that you're not allowed to fix because it's proprietary and yet your business depends on it.
  
  aidenn0 5 years ago
  
  > The vendor disappearing overnight is only one of many risks from using proprietary software. Others include...
  > the vendor discontinuing the product
  This is not a zero risk proposition with open source software either; your costs go up significantly if you have to start maintaining a legacy codebase.
  > taking the product in a radically different direction
  See above; if you just want to run the old version, proprietary software lets you do this as well.
  > the company being acquired
  This is definitely the biggest risk with Datomic; if Cognitect decides to EOL Datomic, there is a very high chance that they open source it (see the various free software they develop already), but if they are acquired by Oracle that chance becomes zero.
  > or simply changing the licensing model (see Adobe Creative Cloud) to dramatically increase the costs of using the software.
  Datomic licenses are perpetual I believe, so not a risk with Datomic.
  
  fnordsensei 5 years ago
  
  > Datomic licenses are perpetual I believe, so not a risk with Datomic.
  On-prem is perpetual with a year of maintenance (which can be extended), while Datomic Cloud is integrated with AWS and charged and licensed like other AWS services: month-by-month.
  
  pritambaral 5 years ago
  
  Could one transition from Datomic Cloud to an on-prem setup on self-managed EC2 instances?
  
  Scarbutt 5 years ago
  
  You can't (at least not without great effort): Cloud depends completely on various AWS services. That's the worst part of Datomic IMO; Cloud and On-Prem are two different, incompatible databases.
  
  manigandham 5 years ago
  
  Sure, but that's my point: weigh the actual risks.
  How big is the company you're working for? Could they have gotten that big in the first place without using these tools? Companies change as they scale and solutions that worked when they were young will almost always need to change as they grow, so I don't see that as a particularly bad situation. It's a cycle of constant change management and risk mitigation.
  Usually the bigger company has the resources to make changes while a startup trying to plan for 100x future size usually ends up limiting its own growth.
  
  TeMPOraL 5 years ago
  
  Here's a trick though: for a lot of companies, startups in particular, their expected lifetime is comparable to the expected lifetime of the proprietary software/services they depend on. It's quite likely that they'll fold themselves before their vendor does.
  I don't like this much (especially the part when an exit-oriented startup says they want to "change the world" and/or they "care about the users"), but it's how things are.
  
  ken 5 years ago
  
  Open source projects stop being developed, too. It's nice to have access to the source code, but in practice whenever it's happened to me, I don't have the resources to maintain even one abandoned program on my own.
  
  dominotw 5 years ago
  
  I think the greater risk is the vendor raising the contract price to something you can't afford, because now they know you are locked in. I've seen this happen time and again.
- hjek 5 years ago
  
  Studies have shown that most people consider arguments based solely on weasel words completely meaningless.
aidenn0 5 years ago

As long as it's not profitable to make money solely off of consulting for free software, but is profitable to sell proprietary software, proprietary software will be inevitable (the company that makes Datomic previously had that business model before deciding to release Datomic as proprietary software).
- zcam 5 years ago
  
  Both models are/can be profitable, it's down to a personal (strategic) choice from the authors ultimately.
  There are many success stories of proprietary dbs, and also a number of open source ones in terms of profitability, e.g. Elastic, Datastax, Confluent, Citus, etc.
  Personally I wish it were open source; it looks quite capable, but there is too much risk involved for me to be comfortable using it, not to mention it's quite pricey. 1$/day is for dev setups; prod cloud setups start around 4-5k/year last time I read about it. That might be fine for a single deploy backing your service, but not when it's a cost you have to add for every client.
  Another thing is that it is very specific to some uses and has some limitations (subjectively) that will often require pairing it with other solutions to be actually usable for some things (e.g. strings are limited to 4096 characters, no bytes type). All in all it makes sense given what you should use it for (and not use it for), but it's not your usual db product, and sometimes I have the feeling that it's advertised as a potential drop-in replacement for <insert favorite relational db> when, by itself, it quite often isn't (arguably, apples vs oranges).
  There are also a number of interesting projects that were inspired by it in one way or another, but nothing directly comparable:
  * datahike (and the upcoming datopia.io)
  * datascript
  That said datalog is a pleasure to use and datomic looks fantastic it's just not for everybody.
  
  jdminhbg 5 years ago
  
  > 1$/day is for dev setups
  This is just for the AWS-hosted cloud version; you can run the dev version locally for free.

nathan_long 5 years ago

I like the idea of storing events rather than simply the current, mutable state of the world.

However, it seems like privacy requirements like "forget you ever knew about this user" would throw a wrench in the gears.

valw 5 years ago

That's manageable: https://vvvvalvalval.github.io/posts/2018-05-01-making-a-dat...
- nathan_long 5 years ago
  
  How cool that you've already written a whole blog post dealing thoroughly with that problem. :D
cfontes 5 years ago

You can encrypt personal fields on the events and throw the key away if you need to forget that user.
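This approach is often called "crypto-shredding". Below is a minimal Python sketch, assuming per-user keys live in a separate, mutable key store while the event log stays immutable. The cipher is a TOY keystream built from `hashlib.sha256` in counter mode, used only so the example runs with the standard library; it is not secure (among other things, it reuses the keystream across fields), so a real system should use a vetted authenticated cipher. `key_store`, `encrypt_field`, `decrypt_field` and `forget_user` are hypothetical names.

```python
import hashlib
import secrets
from typing import Optional

key_store: dict[str, bytes] = {}  # hypothetical per-user keys, kept OUTSIDE the immutable event log


def _keystream(key: bytes, n: int) -> bytes:
    """TOY keystream: SHA-256(key || counter) blocks. Illustration only, NOT secure encryption."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def encrypt_field(user_id: str, plaintext: str) -> bytes:
    """Encrypt a personal field with the user's key, creating the key on first use."""
    key = key_store.setdefault(user_id, secrets.token_bytes(32))
    data = plaintext.encode()
    return bytes(b ^ k for b, k in zip(data, _keystream(key, len(data))))


def decrypt_field(user_id: str, ciphertext: bytes) -> Optional[str]:
    """Decrypt a field, or return None if the user's key has been shredded."""
    key = key_store.get(user_id)
    if key is None:
        return None
    return bytes(b ^ k for b, k in zip(ciphertext, _keystream(key, len(ciphertext)))).decode()


def forget_user(user_id: str) -> None:
    """'Forget' the user by deleting their key; the stored events themselves are never rewritten."""
    key_store.pop(user_id, None)


event = {"type": "UserRegistered", "user": "u1", "email": encrypt_field("u1", "ann@example.com")}
print(decrypt_field("u1", event["email"]))  # ann@example.com
forget_user("u1")
print(decrypt_field("u1", event["email"]))  # None
```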

z3t4 5 years ago

After implementing "collaboration" features in an editor, I discovered that the same functionality can also be used elsewhere, for example for undo/redo, but also in a web app that runs on many servers, e.g. a site like Facebook. For example: 1) you get message A, 2) you read message A, 3) you get message A again. That would usually end up with annoying user interface errors, like "you have one unread message", even though you've read them all. But if you transform the event, that last message gets marked as read, because there was a prior event marking it as read.
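The duplicate-message scenario above can be sketched in Python as a fold over the event history. This is not full operational transformation; it only shows how interpreting each event against everything that came before lets a redelivered message stay read. `Inbox` and the `("received" | "read", message_id)` event shape are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Inbox:
    """Read model for one user's inbox, built by folding events in order."""
    received: set[str] = field(default_factory=set)
    read: set[str] = field(default_factory=set)

    def apply(self, event: tuple[str, str]) -> None:
        kind, msg_id = event
        if kind == "received":
            # A redelivered message is just re-added to a set; the earlier "read" event still counts.
            self.received.add(msg_id)
        elif kind == "read":
            self.read.add(msg_id)

    @property
    def unread(self) -> int:
        return len(self.received - self.read)


inbox = Inbox()
for event in [("received", "A"), ("read", "A"), ("received", "A")]:
    inbox.apply(event)
print(inbox.unread)  # 0
```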

Kiro 5 years ago

I don't understand how you can be performant with Event Sourcing. So to read the latest value I need to backtrack all the changes from the start? Every time?

beaker52 5 years ago

You can pre-compute the latest values by consuming the event stream and building up some sort of persistent, disposable read model that has the values you want. Similar to the redis-read cache strategy that has been popular in Ruby and PHP communities in the past, for example.
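A disposable read model of this kind can be sketched in Python: it remembers how far into the event log it has read and, on each catch-up, applies only the newer events, so a read never replays the log from the start. The event shape (`OrderPlaced` with `customer` and `amount_cents`) and the class name are hypothetical.

```python
from collections import defaultdict


class OrderTotalsView:
    """Disposable read model: folds the event log into per-customer totals, remembering its position."""

    def __init__(self) -> None:
        self.position = 0                               # index of the next unseen event
        self.totals: dict[str, int] = defaultdict(int)  # customer -> total spent, in cents

    def catch_up(self, log: list[dict]) -> None:
        """Apply only the events appended since the last call."""
        for event in log[self.position:]:
            if event["type"] == "OrderPlaced":
                self.totals[event["customer"]] += event["amount_cents"]
        self.position = len(log)


log = [
    {"type": "OrderPlaced", "customer": "ann", "amount_cents": 1500},
    {"type": "OrderPlaced", "customer": "bob", "amount_cents": 700},
]
view = OrderTotalsView()
view.catch_up(log)
log.append({"type": "OrderPlaced", "customer": "ann", "amount_cents": 500})
view.catch_up(log)  # only the newly appended event is applied
print(dict(view.totals))  # {'ann': 2000, 'bob': 700}
```

Since the view is derived entirely from the log, it can be thrown away and rebuilt (e.g. with a new shape) by starting again from position 0.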
Bonus: When your read model differs from the model you use to determine and enforce business logic constraints, this is a strategy that's known as CQRS (Command Query Responsibility Segregation) - this enables you to trade consistency and speed off against each other for both the read/write sides of your model independently of each other.
For example, you can have fast, inconsistent reads by reading from a cache which you use to power your user interfaces and when it comes to processing user commands, you separately and intentionally read the event stream and build up a consistent model to inform whether you can accept the incoming user command. This way you've achieved consistent-yet-slower writes, minimising the opportunity for the system state to become inconsistent with your programmed business logic, whilst minimising the processing requirement (thus time) to display a user interface, at the cost of consistency. That's not to say your user interface will be wrong - but you should be aware that under certain conditions (e.g. your read model cache is unable to reflect changes fast enough) then your read model and thus user interface will be inconsistent with the state chronicled in the event log.
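The write side described above can be sketched in Python: before accepting a command, the handler rebuilds a consistent model from the full event log and only then appends a new event, while the UI would read from a separate, possibly stale, cache. The seat-booking domain, `CommandRejected`, and the helper names are hypothetical. A real system would also need optimistic concurrency on the append so that two concurrent commands can't both pass the check.

```python
class CommandRejected(Exception):
    """Raised when a command would violate a business rule."""


def booked_seats(log: list[dict], flight: str) -> int:
    """Write-side model, rebuilt from the full event log on every command, so it is consistent."""
    return sum(e["seats"] for e in log if e["type"] == "SeatsBooked" and e["flight"] == flight)


def handle_book_seats(log: list[dict], flight: str, seats: int, capacity: int) -> None:
    """Accept the command only if the consistent model says there is room, then append the event."""
    already = booked_seats(log, flight)
    if already + seats > capacity:
        raise CommandRejected(f"cannot book {seats} seats on {flight}: {capacity - already} left")
    log.append({"type": "SeatsBooked", "flight": flight, "seats": seats})


log: list[dict] = []
handle_book_seats(log, "XY123", seats=2, capacity=3)
try:
    handle_book_seats(log, "XY123", seats=2, capacity=3)
except CommandRejected as err:
    print(err)  # cannot book 2 seats on XY123: 1 left
```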
jonathanoliver 5 years ago

Almost. Think of it like a bank account. You don't sum up every transaction amount across all time. Instead, to calculate your balance you start from the last statement balance (the last memento according to Greg Young) and apply newer events. So it's VERY fast. You can even compute and save your newer balance (memento) every few events if you'd like.
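The bank-statement analogy can be sketched in Python: keep a snapshot (the "memento") recording the balance and how many events it already covers, compute the current balance from the snapshot plus only the newer events, and take a fresh snapshot every N events. Modelling events as signed amounts in cents, and the names `Snapshot` and `maybe_snapshot`, are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    balance: int  # balance in cents as of `version`
    version: int  # number of events already folded into `balance`


def current_balance(events: list[int], snapshot: Snapshot) -> int:
    """Start from the last snapshot ('statement balance') and apply only the newer events."""
    return snapshot.balance + sum(events[snapshot.version:])


def maybe_snapshot(events: list[int], snapshot: Snapshot, every: int = 100) -> Snapshot:
    """Take a new snapshot once `every` events have accumulated since the last one."""
    if len(events) - snapshot.version >= every:
        return Snapshot(current_balance(events, snapshot), len(events))
    return snapshot


events = [10_000, -2_500, 4_000]  # deposits are positive, withdrawals negative
snap = maybe_snapshot(events, Snapshot(0, 0), every=2)
events.append(-1_000)
print(snap, current_balance(events, snap))  # Snapshot(balance=11500, version=3) 10500
```

Reading the balance then costs one snapshot lookup plus at most `every` events, regardless of how long the account's history is.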
dustingetz 5 years ago

immutability and snapshotting - like git
uncle_d 5 years ago

You can cache that result and update it with each new event.
anentropic 5 years ago

that's where the Aggregates come in: in the Event Sourcing model you write to the Event Log and read from Aggregates

jfbaro 5 years ago

I really wish Datomic will go Open Source some day! It is such a great product... a lot of innovation would come from a community driven Datomic.

amirouche 5 years ago

I am working on a Datomic clone, watch this https://github.com/amirouche/asyncio-foundationdb

quantum_state 5 years ago

It seems to me the content of an event should respect causality: if an event simply encapsulates the time and context in which something happens, without any info about how it is going to be interpreted semantically in the future by the event processing components, one would be able to resolve any of the issues. The design needs to respect separation of concerns at the info level.

truth_seeker 5 years ago

The DynamoDB (storage layer) commercial plans for reads and writes are a deal breaker for me.

ScyllaDB is a much stronger alternative. It is supported but not officially in Datomic Cloud. It will make Redis/Memcached almost redundant for most applications in the Datomic cache layer, as it has very good read and write latencies.