Ask HN: Is there server-side software that we are missing in 2018?
It seems like there are so many choices in each category that there's nothing left to do.
I mean things like RDBMS, NoSQL databases, time-series databases, key-value stores, message queue, web servers.
I remember many years ago if you were building something you'd notice the missing solutions and tools because there were things you couldn't do easily (like lightweight application-level caching before redis/memcached popularity).
Nowadays it seems like there's nothing missing.
I don't think anyone has solved the issues raised in Out of the Tar Pit in the data storage space. We still spend a ton of effort, perhaps more than ever before, wrangling incidental complexity. In 2018, I should be able to define a simple relational schema in 30 seconds and start using it with effectively zero constraints around data access patterns, transaction volume, and data scale, and with no design decisions needing to be made around these bits of incidental complexity. There should be one gigantic knob I can turn to spend more money to increase more resource consumption of the system, and it should be damned cheap.
We are so far from that reality that it seems obvious there is still a ton of work to do. You can see bits and pieces of solutions but if your product asks me to define indexes, tune queries, determine sharding keys, write against computer-centric (vs human-centric) APIs, cannot deal with change easily, requires babysitting, or falls over after hitting some incidental tipping point relative to the resources available to a single machine, you've not hit it.
I have not used Google Spanner but at 30k/ft it seems like the closest thing out there to the idealized case -- but being closed source and centralized it is not really a "solved problem" imho.
AWS just released AppSync. You can define a relational GraphQL schema and hook up a number of sources to feed data into it. Then AppSync will automatically turn your schema into a GraphQL API where clients can make queries, mutations, and subscribe to data changes with no constraints regarding data access patterns. Subscriptions scale without any work from you. All of these operations use the relational schema, which is self-documenting. You can change the schema any time you want.
Using the GraphQL API requires no knowledge of the backend data store. Therefore, clients and consumers of your API do not need to know about shards, indices, SQL queries, etc.
If you have an existing DynamoDB table, you can automatically generate a GraphQL schema/API from it too.
https://aws.amazon.com/appsync/
disclaimer: i work on aws appsync
This looks awesome, except that it's still in preview. Seems a bit early to bet on it - any idea when there will be assurances that it will continue to be supported long term?
This looks pretty nice.
This leads me to one of my constant bugbears.
Why is RDBMS tooling so completely shite?
I have an IDE that can index several million lines of code and accurately apply timely, context-dependent, intelligent code completion -- yet I have yet to find a good way to debug a MySQL stored procedure (I hate them; I inherited them; except for very simple use cases they are slowly being replaced).
Personally, I would like something like sqlite that you can sync like git. Though it probably should enforce its schema more than sqlite does.
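One toy sketch of the "sync like git" idea: content-address the database the way git addresses objects, so two replicas can compare fingerprints before transferring anything. This is purely illustrative, standard library only -- not a real sync tool:

```python
import hashlib
import sqlite3

def db_fingerprint(conn):
    """Content-address a SQLite database, git-style: hash its logical dump
    so two replicas with identical content compare equal regardless of
    file-level layout."""
    dump = "\n".join(conn.iterdump()).encode()
    return hashlib.sha256(dump).hexdigest()

a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
for conn in (a, b):
    conn.execute("CREATE TABLE frames (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("INSERT INTO frames VALUES (1, 'x')")
    conn.commit()

# Identical content -> identical fingerprint, so a sync step could be skipped.
print(db_fingerprint(a) == db_fingerprint(b))  # True
```

A real version would need per-row history and merge semantics, which is where it stops being a weekend project.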
I would use it for replicated / distributed storage for data frames (e.g. R or Pandas), which is somewhat related to my other request [1]
The difference here is that there is structure to the files.
[1] https://news.ycombinator.com/item?id=16394048
I totally agree.
I'm toying with the idea of building a relational-centric language (and maybe storage) because I think the same.
I want creating a table/relation to be as easy as:
in memory. Other things such as triggers, PKs, FKs, views, etc. obscure the simplicity of the relational model and make people say weird things like "relational databases don't scale" or "they are too inflexible". I think there are a LOT of easy additions that could enrich the model and make it more useful. For example, it's fine to say:
It's totally OK to nest relations inside relations (this alone could do wonders for ORMs :) ).
> I should be able to define a simple relational schema in 30 seconds and start using it with effectively zero constraints around data access patterns, transaction volume, and data scale, and with no design decisions needing to be made around these bits of incidental complexity
Oh, this is also something that needs a better way!
RDBMSs are too coupled. I remember asking whether I could ditch SQLite's SQL parsing and call the storage layer directly (so I could put my own query engine on top), and the reaction was as if I were nuts!
I think it's possible to build an RDBMS that is semi-pluggable, where you can swap parts of the engine in "user land". If someone wants to create a new kind of index, for example, it must be as easy as writing:
and plugging it into the engine. I imagine something like a Flask/Django-esque framework where it's possible to have custom fields, validations, middleware, etc. on top of a core storage layer and a default implementation. So, instead of going with Redis or something else, I could build my "redis-like" API INSIDE the engine and get the advantages of locality, integrity, etc.
i.e. see the RDBMS as an API backend.
This could be modeled (instead of MVC as in common web frameworks) as CQRS or something similar.
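For what it's worth, here's a tiny hypothetical sketch of that kind of in-memory relation in Python, nesting included. Every name is made up; it's just to show how small the core of the relational model is:

```python
# A relation as a tuple of rows; a field's value may itself be a relation.
def relation(*rows):
    return tuple(dict(r) for r in rows)

def select(rel, pred):
    """Keep rows matching the predicate."""
    return tuple(r for r in rel if pred(r))

def project(rel, *cols):
    """Keep only the named columns."""
    return tuple({c: r[c] for c in cols} for r in rel)

orders = relation(
    {"id": 1, "lines": relation({"sku": "a", "qty": 2})},  # nested relation
    {"id": 2, "lines": relation()},
)
print(project(select(orders, lambda r: r["id"] == 1), "id"))  # ({'id': 1},)
```

Triggers, indexes, and the like would then be layers on top of these primitives rather than baked into the core.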
Wasn't that what Btrieve was doing back in the 90s? Or was it PervasiveSQL on top of Btrieve engine you could access directly?..
Doesn't this exist? Isn't it called hiring consultants / an engineering team?
This has always been my biggest complaint with Cassandra. It performs and works fantastically if you have the right usage patterns, but if you don't fit into that pattern then you need to mold your usage so that it will fit, or risk the consequences. Using Cassandra effectively is pretty much impossible without knowledge of its internals and consideration of what effect your usage patterns will have on those internals, which IMO kind of defeats the purpose of a database (keep my data, give it back when I ask).
I want a distributed (not cloud) file system that handles many files and WAN latency (i.e. not HDFS). It might be a cross between Git and BitTorrent.
It feels like deb repos, PyPI, NPM, CPAN, CRAN, etc. should be put in there, with the addition of binaries for popular architectures. And probably Docker-like images, although I think if they are not opaque blobs, it would be better for rsync-like differential compression (which Git implicitly provides).
There will be some small files and some big files. I want it to be like Git so I can clone locally, not just go to the cloud. The way that Git is trivial to set up and clone through SSH is nice too.
It probably has to have a notion of "user", like a local file system. (This makes the problem a lot harder; git doesn't really have permissions.)
BTW Julia's package manager just used git, but I watched a talk that said this ended up being a really bad idea, especially on Windows.
As far as I understand IPFS has some of these properties. Has anybody used it? Could I use it for the package repository use case?
I don't know much about it, but Project Atomic sounds similar too: https://www.projectatomic.io/
Any other projects that seem like a close fit?
BTW I think this would also be useful in the data center, as some companies like Twitter apparently use BitTorrent in data centers to start large jobs quickly (i.e. replicate the same 500 MB binary to 1000 machines).
A few random thoughts:
IPFS sounds a lot like what you want. Tahoe LAFS plays in this space a bit
The newer distributed file systems I've seen don't like permissions. They like cryptographically-backed capabilities. You have the permission to read the file because you have the ability to decrypt, through the key. (Of course, key management is easy </sarc>).
Some user-facing distributed filesystem talks drift into FUSE (or similar) territory. A TahoeLAFS dev talks about how users probably don't actually know what they want: https://plus.google.com/108313527900507320366/posts/ZrgdgLhV... (QUIBBLES: REAL FILESYSTEM VS. STORAGE APP section). This is probably less relevant for a package manager.
The first time I read about BitTorrent based deployment was from Facebook.
Thanks for the link! I've heard about Tahoe LAFS, but like a lot of these projects (Upspin, IPFS), I don't actually know anyone who uses them!
Basically you want S3, which has BitTorrent BTW. S3 has multiple widely used command line tools that support syncing (cloning, pushing). There are multiple heavily used S3-like systems, open and closed source, some are S3 API compatible. Examples include Ceph and OpenStack Swift.
The reason people use cloud services instead of running these themselves is because the cloud storage fee is usually less than paying someone to run a distributed system.
Right, but I can run my own git "server" with no problem. That's why I use that analogy.
Plenty of people run their own BitTorrent trackers too.
In my mind, "cloud" open source means: you need a team of experts to run it. git and BitTorrent are different -- they are designed for you to run it yourself.
Also, the model is to "sync" and then "read/write", as with git (and BitTorrent). Not just read/write remotely. So maybe I should call it a "replicated file system" rather than a distributed one.
I guess the main difference with git is that it should handle large binaries / many files, you shouldn't have to clone the entire repo, and maybe users/permissions.
It's not a solution to all problems but Minio (https://minio.io) will provide S3-compatible data storage on your own server(s).
I’m sure it misses some part of what you’re looking for but there’s an old project on github called DrFTPD that does at least part of what you’re looking for, in a relatively archaic manner. It’s designed for a distributed FTPD across WAN links. It also has mirroring / striping, user accounts, etc. I’m sure you could find some tool to help it look local if needed.
Upspin? https://upspin.io/
Yes I watched a talk on it... it seems interesting for sure. It relies on FUSE I believe, whereas I might want an explicit "sync".
Still, I'm definitely interested in whether I could use it to upload say 1 TB of source code, and maybe 5 TB of binaries, and have people sync efficiently (and partially).
Syncthing is pretty nice. It's not a filesystem, and it doesn't deal with conflicts particularly well, but it's as close to a cross-platform, distributed solution that "just works" as I've been able to find.
I feel like a major repo issue has yet to rear its head. We got a sample of what it might feel like with the recent npm namespace fiasco, albeit a different problem than scale and accessibility.
IPFS does sound like it might be close to what you are looking for, just based off my little experience playing with it. I want to try to leverage a private deployment for dataset sharing for analytics, etc.
https://www.resilio.com/ (formerly bitorrent-sync)?
I'm a big believer in logging, and I find it hard to believe that collecting and monitoring logs needs to have a server with 32GB of RAM and massive Java and ElasticSearch back-ends like ELK, Splunk, Graylog, or similar monitoring software.
I've been searching for something self-hosted that can run in my server's spare capacity and monitor the OS and application logs with a simple searching interface (just a web interface to grep would be handy) and come up short. If I can't host it on the cheapest DO plan, it's out of my budget. I really don't want to have to build it myself but I'm just about at that point.
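The "web interface to grep" really is this small at its core. A rough Python sketch of the search half, with made-up sample log lines -- the point is that none of this needs a JVM heap:

```python
import re

def grep_logs(lines, pattern):
    """No index, no cluster: a regex scan returning (line_number, line)
    pairs, which is all a minimal log-search UI has to wrap."""
    rx = re.compile(pattern)
    return [(i, ln) for i, ln in enumerate(lines, 1) if rx.search(ln)]

log = [
    "Feb 12 10:01:02 app[123]: request ok",
    "Feb 12 10:01:03 app[123]: ERROR timeout talking to db",
    "Feb 12 10:01:04 app[123]: request ok",
]
print(grep_logs(log, r"ERROR"))
```

Wrap that in a tiny HTTP handler over rotated files and you have the cheapest-DO-plan version; the big suites earn their RAM on indexing, retention, and aggregation, not on this part.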
Sounds like you want something like syslog-ng paired with Nagios.
The "big" solutions require some hardware because they do a lot more than merely collecting your logs.
I used to do something similar (but a bit less fancy), with logs processed by Heka + bash scripts using jq. Heka has now been phased out in favor of hindsight[0], but I haven't personally tried that one out yet.
[0]: https://github.com/mozilla-services/hindsight
oklog might be for you? https://github.com/oklog/oklog
It’s still new, but the architecture is sound and it worked well for me on a small single-node deployment (about 30 clients sending logs).
It looks promising. Now if it could integrate easily with syslog-ng/rsyslog, that would be perfect.
Nagios is not-so-easy to configure, but probably will get you most of the way there.
Nagios doesn't do log monitoring out of the box though. Unless I've missed that? It definitely doesn't aggregate logs.
Nagios has check_log but that is not great for memory, I believe it loads the entire log file into memory to take a diff to only check on the entries missed.
There is check_logfile which has a similar name but is a different library, I wrote about how to set this up a while ago: https://medium.com/luma-consulting/how-to-install-check-logf...
It's not super obvious how this plugin works unfortunately. Not the craziest thing in the world to configure but certainly not easy starting from zero!
Machine learning model management and serving service.
I haven't seen a decent open source framework that makes it easy to package a trained model with preprocessing and postprocessing steps and deploy it behind an API.
Adding performance tracking and model validation on top of that would be great.
Then there are things like queuing/batching and autoscaling.
Closest thing that comes to mind right now is TensorFlow Serving and their k8s stuff (which it looks like they just renamed to tf-operator and moved to the kubeflow org https://github.com/kubeflow/tf-operator)
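As a rough illustration of "package a trained model with preprocessing and postprocessing": the single callable below is what you would serialize and put behind an API. All names are invented and the "model" is a stand-in, not a real framework's interface:

```python
class ModelBundle:
    """Hypothetical minimal bundle: pre -> model -> post as one callable,
    so the serving layer never needs to know the steps individually."""
    def __init__(self, pre, model, post):
        self.pre, self.model, self.post = pre, model, post

    def __call__(self, raw):
        return self.post(self.model(self.pre(raw)))

bundle = ModelBundle(
    pre=lambda s: [float(x) for x in s.split(",")],
    model=lambda xs: sum(xs),          # stand-in for a trained model
    post=lambda y: {"score": y},
)
print(bundle("1,2,3"))  # {'score': 6.0}
```

The hard parts the comment asks for -- versioning, validation, batching, autoscaling -- all live around this object, which is exactly why a shared open source framework would help.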
The model development pipeline is still a far cry from the maturity of the software development pipeline, and will have to get there in order to reduce the hands-on heavy lifting required for model development and deployment.
Along with packaging a trained model are things like: Snapshot/versioning of the training and test data used to create the model, versioning of the model, storing versioned models in a model registry, auto-deploying models from the registry to target environments, telemetry from deployed models.
Closest I've found is https://github.com/mitdbg/modeldb, and I've spoken to the woman leading the effort. They still have data versioning as an open question, and don't see the need. But there are training set modification, results RCA, and other use cases that drive the need to catalog training/test data with the model that results.
It'll get there. Just a question of when and how.
This is basically all we do (Disclaimer: This is mainly meant for bigger companies, not startups): https://skymind.ai/platform - reach out if you have any questions. Mail in profile.
One thing I'll say is "just k8s" isn't realistic. You need a lot more than that.
It should run within k8s but should also be minimalistic. Not a lot of platforms provide that. A lot of our docs on the internals are still being built out yet, but we provide everything ranging from an offline install of anaconda to managed connectivity gateway to hadoop and spark clusters.
We also have built in model serving and experiment tracking (which in reality is just a relational database with a rest api automatically integrated in to the platform) - if you're interested in learning more please reach out.
I'm still waiting for a rock-solid scalable open source graph database in the mold of the Freebase database engine or the amazing graph database that Facebook have built for themselves. I'm very excited about dgraph as an option here but I think it's still an area that is very open for new entrants.
This would be really useful. Now there are Fuseki and Janus[1]. Years ago there was 4store [2]. There is also gStore in development [3]. They all require a lot of help. Would be nice if somebody could pick up and help one of these.
[1] https://jena.apache.org/index.html http://janusgraph.org [2] https://github.com/4store/4store [3] https://github.com/Caesar11/gStore
Adding Cayley to the mix: https://github.com/cayleygraph/cayley
An open-source platform for self-hosting function-as-a-service - something that provides the tooling for easily saying "deploy this function, auto scale it, route HTTP traffic to it, now atomically replace it with this new version" without having to lock yourself in to Google/AWS/Azure.
OpenWhisk is a bit clunky, but it does exist! https://openwhisk.apache.org/
That looks like exactly what I'm talking about: https://github.com/apache/incubator-openwhisk/blob/master/do...
The space is still early enough that there is a lot of value in competing options
https://github.com/openfaas/faas
I'm sure I've seen announcements for others pop up on HN too.
There are a lot of these underway at this point. I'm allocated to one (Project Riff). We also like to hear news from our nearest neighbours: Fn, Kubeless, Fission and OpenFaaS.
A lot of what FaaSes give you is a basically a PaaS, with autoscale to zero and preconfigured event sources. If you don't need those -- actually need them -- then you can get away with a bog ordinary PaaS for the time being.
There are a few. flynn.io does this (it's a bit like Heroku, using git push as a deployment action and a web UI to manage things). Caveat: I haven't used it...
Some of the biggest advancements in recent years have come less from technological breakthroughs, and more from improvements in designs and abstractions. Tools like Ruby on Rails, GraphQL, and even AWS, while impressive technically, are breakthroughs because they improved developer efficiency. They also weren't "needed" until they were built, and have allowed many developers to work on broader parts of the stack.
Also, today's tech solves today's problems.
Even though we have Postgres and MySQL, they really don't seem well configured out of the box, and you've got to wade through a bunch of settings files and understand what vacuuming is.
I think we need a self-tuning RDBMS. It could watch and adapt to usage patterns and available resources.
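A toy sketch of the "watch usage patterns" half: count which columns keep showing up in WHERE clauses and suggest an index past a threshold. Purely illustrative -- a real self-tuning database would weigh cost models and write amplification, not raw counts:

```python
from collections import Counter

class IndexAdvisor:
    """Observe predicate columns from the query stream; suggest an index
    once a column has been filtered on often enough."""
    def __init__(self, threshold=3):
        self.seen = Counter()
        self.threshold = threshold

    def observe(self, where_columns):
        self.seen.update(where_columns)

    def suggestions(self):
        return [c for c, n in self.seen.items() if n >= self.threshold]

advisor = IndexAdvisor()
for _ in range(3):
    advisor.observe(["user_id"])
advisor.observe(["created_at"])
print(advisor.suggestions())  # ['user_id']
```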
Isn't that what aws aurora and google cloud spanner offer?
There might be something exciting to build to help implement ludicrously fast Google-style autocomplete / typeahead Search. I've tried using MySQL, elasticsearch, PostgreSQL with trigram indices... they can be made to work, but I've never felt that I'm anywhere near the quality of whatever it is Google are doing here.
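For reference, the core data structure behind fast prefix suggestions is small -- a hypothetical in-memory trie sketch. The hard parts Google actually solves (ranking, typo tolerance, personalization) are exactly what's missing here:

```python
class Trie:
    """Prefix tree: walk the prefix, then collect words beneath it."""
    def __init__(self):
        self.children, self.word = {}, None

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.word = word

    def suggest(self, prefix, limit=5):
        node = self
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        out, stack = [], [node]
        while stack and len(out) < limit:
            n = stack.pop()
            if n.word:
                out.append(n.word)
            stack.extend(n.children.values())
        return out

t = Trie()
for w in ["postgres", "posture", "poster", "python"]:
    t.insert(w)
print(sorted(t.suggest("post")))  # ['poster', 'postgres', 'posture']
```

The "ludicrously fast" part is less about this structure and more about keeping it hot in memory close to the user, which is where a dedicated piece of server-side software could help.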
You need to go deeper and use AnalyzingSuggester from Lucene. It is as fast as it gets. Also, do not forget about the web part of the equation. Using HTTP/2 helps, as well as disabling buffering all the way through.
Have you looked at Algolia?
I would say anything in the video / 3D / imaging service platform space still leaves a lot to be desired.
Anything that potentially touches FFMPEG basically
I can name a few examples that I still think need improvements
- Online gif editors
- Video editing / clipping
- Background image editing / online photoshop equivalents
- Better alternatives than lucidpress / adobeIndesign for catalog page / brochure creation
- Machine learning / deepfake online tool, this is all driven client side mostly
- Pretty much anything client-side that's not yet server-side is open game, I would say
- PDF markup tools could be better for online-based services, especially for architectural design
- CAD-based online programs for retails so customers can DIY build their own warehouse or layout schemas is lacking
- Better online RDBMS. Currently there's just Airtable, and it's lacking some core features like referential integrity
- Integrating spaced-repetition learning into most education-based services (lynda.com, Pluralsight, etc.)
- Managed ecommerce cart / hosting services. It's a well-understood problem that should technically be easy for a client to do.
Again, I would say almost every profitable service in video / image editing / 3D touches some form of FFmpeg, and the opportunity is definitely still out there. There's such a huge untapped market in combining all of these services in one package.
I would add on-the-fly video transcoding. FFMPEG would need some efficient context state serialization to bring resumability/seekability and at the same time to stay low on resources.
I should be able to copy and paste a server configuration and create an identical copy of my server. Right now, the only solutions I see are generating images (too big and cumbersome) or writing scripts (too complex).
I would like to be able to type in a single command and replicate all the packages, services and configuration present in a particular server to a new target system.
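The heart of such a tool would be a manifest diff. A toy Python sketch, assuming (unrealistically) that installed packages are the only state -- real servers also carry services, config files, and data:

```python
def converge(current, desired):
    """Diff two package manifests and emit the steps needed to make
    `current` match `desired` -- the core of a 'copy this server' command."""
    to_install = sorted(set(desired) - set(current))
    to_remove = sorted(set(current) - set(desired))
    return [f"install {p}" for p in to_install] + [f"remove {p}" for p in to_remove]

source = {"nginx", "postgresql", "redis"}   # the server being copied
target = {"nginx"}                          # the fresh machine
print(converge(target, source))  # ['install postgresql', 'install redis']
```

The reason this stays hard in practice is that the "desired" side is never fully enumerable on a mutable OS, which is the gap declarative systems try to close.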
You can get a fair way down that path with Ansible if you are very careful, but it still requires too much work; modern operating systems and packaging were not built for idempotent rollbacks.
We try to treat pets like cattle then wonder why they bite us (to reverse the usual refrain).
I agree though, it should be declarative but everything shits everything else all over the filesystem.
I am trying to make daptin just like you explained.
https://github.com/daptin/daptin
A JSON config plus backing database (mysql/postgres/sqlite) defines your complete environment.
Check out NixOS and NixOps
https://nixos.org/
https://nixos.org/nixops/
Software that does it: puppet, chef, salt, ansible, ... You can have chef as a service in AWS via OpsWorks. Alternatively almost every cloud provider supports cloud-config at boot time.
I think there's a general problem that the incentives are for building big tools, so small problems like this are handled by everybody writing their own ad-hoc scripts.
Maybe you will find Nix useful.
Doesn't Puppet(.com) do something like this?
A workload-aware data distribution proxy.
Let me explain. A lot of people talk about multi-cloud these days. Either AWS and Azure, or AWS and their own datacenter, or whatever.
While it's really easy to send compute jobs to one cloud or the other based on which one is best suited for the job, the big issue is that your compute job will need data. Right now your choices are to have a copy of all your data in all the clouds (a very expensive proposition since you'd be constantly shipping updates across very expensive outbound connections) or to have all your data in one place and have the compute jobs reach out across the internet to get it (also expensive and now there is latency for every job too).
I want a proxy that is smart enough to say "Workload X is usually done on AWS, and requires data points Y and Z, so make sure Y and Z are always up to date on AWS, but lazy update Y and Z to GCE in batches through the cheapest direct connect possible".
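A toy sketch of the placement decision such a proxy would make. The policy table, cloud names, and dataset names are all invented -- the interesting (unsolved) part is learning this table automatically instead of declaring it:

```python
# Hypothetical policy: which cloud usually runs each workload, and which
# datasets that workload needs. The proxy keeps those datasets eagerly
# replicated where they're used and batch-updates them everywhere else.
PLACEMENT = {
    "render": ("aws", {"Y", "Z"}),
    "train": ("gcp", {"Y"}),
}

ALL_CLOUDS = {"aws", "gcp"}

def replication_plan(dataset):
    hot = {cloud for cloud, needs in PLACEMENT.values() if dataset in needs}
    return {"eager": sorted(hot), "lazy_batch": sorted(ALL_CLOUDS - hot)}

print(replication_plan("Z"))  # {'eager': ['aws'], 'lazy_batch': ['gcp']}
```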
One project that is aiming in this direction: http://seaclouds-project.eu/
That's an interesting start! Scanning briefly it looks like I still have to tell it what data and workloads go where -- it isn't figuring it out automatically. Which is the holy grail I'm looking for.
Given the sheer number of services on AWS, I would like an expert system that would query me for requirements and then suggest a set of possible services to use together to solve my problem.
Meanwhile you can use this to get a quick overview of the services offered. Amazon should put this up on their AWS main page.
AWS in Plain English https://www.expeditedssl.com/aws-in-plain-english
Serving machine learning models in production is still something that appears not to have an obvious correct solution.
Hi fellow YC alumni! See the pitch here: https://news.ycombinator.com/item?id=16399326
We'll be supporting PMML and the like as well. The goal is to hit the simple things rather than perpetuating the latest hype like the AutoML stuff people are going on about currently. If you'd be interested, would be happy to have a conversation to go over what we're trying to do. We hope to just provide a platform neutral tool for building and deploying models similar to sagemaker (but cloud agnostic)
I think what I've been missing, and trying to do a better job at, is understanding the awesome power of existing and mature solutions. Things like postgres's LISTEN/NOTIFY and so on.
10 years ago, I learned about the idea of elastic infrastructure. Services with load balancers in front of them, hosts that come and go easily, a structured way to communicate between services.
At the time, that was nginx + manual autoscaling, then it was ELBs and autoscaling groups, now it's kubernetes and containers, maybe hosted. It's still not there.
I'm excited about the software that makes that operable at scale. It seems like a service mesh is a good idea. It seems like mutual security between services is a good idea. It seems like storing routing configuration in a separate control plane that is executed in a data plane like Envoy is a good idea.
There's a bag of software at CNCF that's loosely organized around this, but I don't think the "just deploy some code, it can scale and you can have tons of services doing that with good visibility and operability" is quite there. I'm really excited about Envoy, but I don't think there's a good control plane for it. I'm part of a company that's working on a commercial implementation (turbinelabs.io), and Istio is in a similar space.
There's still work to be done!
I suspect this niche will emerge this year.
> It seems like there are so many choices in each category that there's nothing left to do.
Log storage and search for structured logs (e.g. JSON or CEE, not merely stringblobs). We have paid solutions (Splunk, Loggly, Papertrail), and then we have Elasticsearch, which gets worse in this use scenario with every release.
Message stream processing engine that doesn't require restarting to add a query or data sink, so you could build monitoring system around it. In fact, a monitoring system designed to allow you to easily add your custom processing or data sink.
Infrastructure inventory that can be both filled by hand and kept updated by machine and that can be queried from script or browsed by human. For that it would be useful to have a good topic maps engine, which is another missing thing.
OS updates manager that can handle more than just Red Hat/CentOS (Red Hat Satellite or Spacewalk) or just Ubuntu (Canonical Landscape), and while at it, one that doesn't try to be underdeveloped configuration management tool (like CFEngine or Puppet) and underdeveloped deployment tool (like Ansible), but can cooperate with them.
And there's much more where these came from.
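On the stream-processing item, the restart-free property is mostly about making routes a runtime-mutable data structure rather than compiled-in topology. A minimal illustrative sketch:

```python
class StreamEngine:
    """Sketch of a stream processor where queries and sinks can be added
    while it runs -- no restart, which is the property asked for above."""
    def __init__(self):
        self.routes = []  # (predicate, sink) pairs

    def add_route(self, predicate, sink):
        self.routes.append((predicate, sink))  # hot-add at any time

    def publish(self, event):
        for pred, sink in self.routes:
            if pred(event):
                sink(event)

engine = StreamEngine()
alerts = []
engine.publish({"level": "error"})  # no route registered yet: dropped
engine.add_route(lambda e: e["level"] == "error", alerts.append)
engine.publish({"level": "error"})
print(alerts)  # [{'level': 'error'}]
```

The real engineering is in making that `routes` mutation safe and replayable across a cluster, which is presumably why nobody ships it casually.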
I really want to simplify writing backend logic by implementing everything as functions that react to events and produce new ones. This looks very much like the Actor Model (Akka) plus Redux's reducers: we have just a bunch of state-changing rules, and they can be executed reliably and with decent performance.
Something very simple (for example for Build Server):
Build Failed (Event) -> Assign responsible user for a crash -> Send Notification -> Deliver Notification via user's configured notification systems (email, push, sms..)
This is Event Sourcing, but event sourcing works well only for one part of the platform, the write side; the read side is sometimes too hard to implement. Sometimes there are problems with eventual consistency: you actually need to wait while some of the reducers in the chain process this event, and from that point everything becomes too slow to develop and you basically start to reinvent the wheel -- this is de facto just a database engine. Meh.
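The build-failure chain above, sketched as plain Python functions that react to an event and emit an enriched one. All names are invented; a real system would make each step durable and retryable:

```python
def assign_owner(event):
    """Build Failed -> attach a responsible user."""
    if event["type"] == "build_failed":
        return {**event, "owner": "alice"}
    return event

def notify(event):
    """Owner assigned -> mark the notification as dispatched."""
    if "owner" in event:
        return {**event, "notified": True}
    return event

def run_chain(event, steps):
    for step in steps:
        event = step(event)
    return event

result = run_chain({"type": "build_failed", "commit": "abc123"},
                   [assign_owner, notify])
print(result["owner"], result["notified"])  # alice True
```

Everything hard about the idea lives outside this sketch: persistence, ordering, and the read-side consistency problem the comment describes.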
I'm a one-eyed Concourse fan, which implements a separation of state and logic that allows designs like this. I've been pitching pretty heavily the concept of creating or converging to parity with the FaaS I'm assigned to work on.
So: maybe. We might pull this off.
I'm a front-end developer.
I'd like an OSS solution that will allow me to deploy an arbitrary server-side service - be it a Ruby, Python application or even a package like OpenVPN - to some cloud infrastructure, easily.
This solution should spin up a hardened OS distro (CIS-compliant, maybe?), provision it with my arbitrary services (using Ansible or Chef or something), and deploy it to AWS or some cloud infrastructure for me (using Terraform or something).
All these component pieces exist, but nothing ties all of them together for an easy deploy for a front-end developer like me.
(And, I know things like Heroku and CodeDeploy exist, but I dislike lock-in and they nearly universally come with their own restrictions, like lack of support for server-side Swift applications or custom services like OpenVPN or git-annex.)
EDIT - I'm strongly considering taking some time off to write this soon, so get in touch if this is something you're interested in! Contributing or using!
There's an overlap with what BOSH does. Starts with a stemcell, compiles your packages against it, spins up whole VMs configured with those packages.
It also adds monitoring for both VMs and processes.
Don't underestimate what you get from a PaaS like Heroku, though. If you're able to stick to 12-factor apps, the lockin is pretty mild -- you should be able to hoist your skirts and move to Cloud Foundry or even OpenShift without too much pain.
Disclosure: I work for Pivotal, we work on BOSH and Cloud Foundry. We compete with Red Hat and Heroku.
We're actually about to release a product that does exactly this! We're in private beta right now, but will have something available to the public soon. If you're interested, check out www.nucleus.codes. And if you want to join the beta, email us at hi@nucleus.codes.
A schemaless and automated middleware for scripting languages (js, php, lua...).
The tool inspects the source code and wraps calls across processes (no IDL).
I wrote a (beginning of) a description here: http://dpt.slasheva.com/project-ideas.html#middleware
A CMS backend to which I can sew on any template based frontend I want without having to follow the strict workflow set by the backend.
The backend would just serve content when asked (JSON prolly), the frontend could be anything you desired.
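That backend can be almost nothing. A toy sketch of a headless content store serving JSON -- content, slugs, and shape are all made up:

```python
import json

# The backend only stores structured content and hands it over as JSON;
# any frontend (templates, SPA, native app) decides how to render it.
CONTENT = {"about": {"title": "About us", "body": "Hello."}}

def serve(slug):
    page = CONTENT.get(slug)
    return json.dumps(page) if page else json.dumps({"error": "not found"})

print(serve("about"))
```

The value of a real product here would be the editing UI and content modeling on top, with exactly this kind of neutral JSON contract at the bottom.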
The area of automated testing and developer tools in general is hugely underestimated. Tools that help you write, format, and verify code, automatically detect issues from stacktraces or other sources like tcpdumps, smart fuzzers, etc. The market is huge.
I have had a personal project in this area for 5 years, https://goreplay.org, and I am investigating ways to automatically find issues in web applications. The project gets traction, but I do not see any competition so far. There are a lot of reasons for this: in such under-researched areas, being a pioneer requires a different mindset from both the project owner and the end user. And this is really hard to market.
I believe the problem has now shifted to a slightly higher level and especially when working in a microservice/containerized world. For example, given the number of specialized services you mentioned, we see more applications with polyglot persistence stacks underneath. If they form a part of the same application, how do you collectively manage them? What does it mean to take a consistent snapshot? These and a bunch of other similar questions need to be answered.
There is some work we are doing (https://kanister.io) to help with these issues but there are a lot of emerging solutions in this space. Look at CNCF's landscape for some more detail.
I feel like we're in a consolidation phase. We have a ton of tools/software and the big developments have been in how we use those tools aka devops. I'd put things like kubernetes/docker in this category, but there's obviously huge room for growth around this tech.
I do think there's room for new tools tho, but by definition, they're on the cutting edge and not especially visible if you're not looking. Things like stream processing are already incredibly useful and only getting better. Machine learning could be in this list too. It might not be a 1000% improvement like caching, but that's part of a maturing industry.
> Nowadays it seems like there's nothing missing.
There's lots of things missing. There's just nothing missing in established categories. Why would there be?
Start your own category, create software for it, then convince everyone that the category is important.
Great advice! I'd venture that managing large files is an unsolved problem. It's a hack in most version control systems, and uploading/downloading files from a host, even S3, is a slow, serial process. Same for checksumming. Network speeds have more than caught up, and large files are a frequent process bottleneck. Something that makes it easy to manage and consume large files could be a big deal. It probably would require a new application protocol, maybe even a new filesystem similar to XFS.
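For a sense of what "parallel instead of serial" could look like for checksumming, here's a hedged stdlib-only sketch that hashes a large file in fixed-size chunks across threads and folds the chunk digests into one value — reminiscent of, but not byte-compatible with, S3's multipart ETag scheme. hashlib releases the GIL on large buffers, so the threads genuinely overlap:

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4 * 1024 * 1024  # 4 MiB, roughly multipart-upload granularity

def chunk_digest(path: str, offset: int, size: int) -> bytes:
    """Hash one independent slice of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.sha256(f.read(size)).digest()

def parallel_checksum(path: str, workers: int = 8) -> str:
    """Checksum a large file as independent chunks, hashed in parallel,
    then combine the per-chunk digests into one stable final hash."""
    total = os.path.getsize(path)
    offsets = range(0, total, CHUNK)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(lambda off: chunk_digest(path, off, CHUNK), offsets)
    combined = hashlib.sha256()
    for d in digests:
        combined.update(d)
    return combined.hexdigest()

# Demo on a throwaway 9 MiB file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(9 * 1024 * 1024))
digest = parallel_checksum(tmp.name)
print(digest)
```

The same chunking gives you resumable transfers for free: each chunk can be uploaded, verified, and retried independently.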
Or maybe a file should stay where it is and the processing logic itself should be deployed there. If file parts are distributed, processing could be suspended and migrated to the place where the next piece is stored. Something similar is done with Hadoop and HDFS.
This. If you want to create something, create something that simplifies things for us. Don't just create another tool to go on the tool belt.
I remember when databases were only for highly paid experts and needed a lot of customization and configuration. I think we need a similar development for AI and ML to make them accessible to the average developer.
I wish we had more statically typed web backend frameworks as usable as rails or phoenix. I have used Elm a lot recently and it rocks. I wish I had something as incredible for the server side.
I do too! Seems like the play framework with java and the sails framework with typescript are a few existing options. I'd love to see something rails-ish for go.
.net core?
I miss a decent self-hosted photo/video library.
The missing piece?
It is a new project, it enables your server to use all other front end and back end technologies at the same time! That is what makes it so fantastic!
You get all front end JavaScript frameworks, and all back end server technologies, all for one low price. Just install this new component into your project. It will download the other few gigabytes and hook it into your project.
1. Static website generators for non-blogs
2. Half-decent wrapper libraries for tools like FFmpeg in NPM
3. Plug-and-play game server with stat tracking that can be easily used for any game
4. More versatile compression formats for media; the ability to serve 360/720/1080 from the same file?
1. It's hard to give constructive help without more information. Why not write a script that compiles a bunch of pages into html? As the other comment said, there's also stuff like jekyll. Maybe you could provide some use-cases?
2. Explain your use-cases? Why not call ffmpeg directly? You'll need to familiarize yourself with the original library functionality anyway, no?
3. No opinion. Not particularly interested in games. Might be hard to generalize, maybe?
4. Versatile in what sense? What are your complaints with existing media formats? I think licensing is an issue, but stuff like VP9 and Opus seem to take care of that matter. I'm not an expert, but I think both MP4 and MKV can already hold an unlimited number of media streams.
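On point 1, the "write a script that compiles a bunch of pages into html" suggestion really can be tiny for a non-blog site. Here's a stdlib-only sketch (the layout, the title-on-first-line convention, and all names are invented for the example) that wraps HTML fragments in a shared layout:

```python
import pathlib
import tempfile
from string import Template

LAYOUT = Template("""<!doctype html>
<html><head><title>$title</title></head>
<body>$body</body></html>""")

def build(content_dir, out_dir) -> int:
    """Compile every *.html fragment under content_dir into a full page.

    The first line of each fragment is treated as the page title; the
    rest is the body. Returns the number of pages written.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(pathlib.Path(content_dir).glob("*.html")):
        title, _, body = src.read_text().partition("\n")
        (out / src.name).write_text(LAYOUT.substitute(title=title, body=body))
        count += 1
    return count

# Demo with a throwaway content directory
root = pathlib.Path(tempfile.mkdtemp())
(root / "content").mkdir()
(root / "content" / "about.html").write_text("About us\n<p>Hello.</p>")
n = build(root / "content", root / "site")
print(n)  # 1 page written
```

Everything beyond this — templates per section, asset pipelines, data files — is where tools like Jekyll earn their keep, but the core loop is just "read fragment, fill template, write file".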
>Static Website generators for non-blogs
We use Jekyll for more than one site which is not a blog, works great for us so far. What features do you think are missing that you wish it had?
(4) can already be done with MKV... technically. You actually end up with 4 alternate video streams in the same MKV, which you remux for clients that can't handle it.
I don't have great suggestions for you but I think software needs are ever evolving. As more of the world comes online, there are more data requirements, newer business requirements, etc. We would need even more scalable systems on various dimensions.
There are two ways to "fill a gap": either you create a gap that needs filling or you make an existing solution better. You don't see anything missing because you aren't looking at things from this perspective.
Security could probably use quite a bit more OSS; a lot of it is proprietary software. There are plenty of penetration-testing tools, but not much software for actually securing systems.
ACID databases that can scale easily on multiple machines still don’t exist.
https://cloud.google.com/spanner/ and the open-source https://www.cockroachlabs.com/product/cockroachdb/ both provide scalable ACID transactions; however, both introduce added latency compared to traditional databases.
Static website generators.
This might be a joke but there's plenty of room for a good static wordpress alternative that focuses on bloggers instead of developers.
Netlify CMS?
I would argue content management systems for static website generators like cloudcannon.com could be better.
We already have jekyll and hugo
And Hakyll [1]
[1] https://jaspervdj.be/hakyll/
Before storing data we should have good do-not-track / privacy-respecting software for the data we store.
I want a way to use AWS Lambda/GCF but with cluster level network locality.
Interesting. Do you mean for functions to call each other (so that it’s sub-ms) or for some other reason?
Yeah, basically low latency versions of FaaS.
A plugin to handle all of GDPR requirements?
It feels rather like we have too much software.
An embarrassment of riches in the categories that are served does not preclude the possibility that there is a need that isn't well served at all yet.
I haven't found a good authorization service (fine-grained roles, ACLs, etc.). I always find myself having to build it for every new project/company.
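For the simple cases, the core of what gets rebuilt each time is small. Here's a toy in-memory sketch (a hypothetical API, not any existing service) combining role-based permissions with per-resource ACL exceptions — what a real service adds on top is policy storage, delegation, auditing, and performance:

```python
class Authz:
    """Toy RBAC: roles map to sets of permissions, and resource-level
    ACLs can grant extra permissions to specific users."""

    def __init__(self):
        self.roles = {}       # role -> set of permissions
        self.user_roles = {}  # user -> set of roles
        self.acls = {}        # (resource, user) -> set of permissions

    def grant_role(self, role, *perms):
        self.roles.setdefault(role, set()).update(perms)

    def assign(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def allow(self, resource, user, *perms):
        self.acls.setdefault((resource, user), set()).update(perms)

    def check(self, user, perm, resource=None):
        # Role grants apply everywhere; ACL grants apply per resource.
        for role in self.user_roles.get(user, ()):
            if perm in self.roles.get(role, ()):
                return True
        return perm in self.acls.get((resource, user), set())

az = Authz()
az.grant_role("editor", "read", "write")
az.assign("alice", "editor")
az.allow("doc-42", "bob", "read")          # one-off exception for bob
print(az.check("alice", "write"))          # True, via role
print(az.check("bob", "read", "doc-42"))   # True, via ACL
print(az.check("bob", "write", "doc-42"))  # False
```

The reason it keeps getting rebuilt is probably that the hard 20% (hierarchical resources, attribute-based rules, multi-tenancy) differs at every company, so the easy 80% never quite generalizes.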