Ask HN: Is there server-side software that we are missing in 2018?

79 points by borplk 6 years ago

It seems like there are so many choices in each category that there's nothing left to do.

I mean things like RDBMSs, NoSQL databases, time-series databases, key-value stores, message queues, web servers.

I remember many years ago if you were building something you'd notice the missing solutions and tools because there were things you couldn't do easily (like lightweight application-level caching before redis/memcached popularity).

Nowadays it seems like there's nothing missing.

gfodor 6 years ago

I don't think anyone has solved the issues raised in Out of the Tar Pit in the data storage space. We still spend a ton of effort, perhaps more than ever before, wrangling incidental complexity. In 2018, I should be able to define a simple relational schema in 30 seconds and start using it with effectively zero constraints around data access patterns, transaction volume, and data scale, and with no design decisions needing to be made around these bits of incidental complexity. There should be one gigantic knob I can turn to spend more money to increase more resource consumption of the system, and it should be damned cheap.

We are so far from that reality that it seems obvious there is still a ton of work to do. You can see bits and pieces of solutions but if your product asks me to define indexes, tune queries, determine sharding keys, write against computer-centric (vs human-centric) APIs, cannot deal with change easily, requires babysitting, or falls over after hitting some incidental tipping point relative to the resources available to a single machine, you've not hit it.

I have not used Google Spanner, but from 30,000 ft it seems like the closest thing out there to the idealized case -- but being closed source and centralized it is not really a "solved problem" imho.

  • TheHam 6 years ago

    AWS just released AppSync. You can define a relational GraphQL schema and hook it up to a number of sources to feed data to it. Then, AppSync will automatically turn your schema into a GraphQL API where clients can make queries, mutations, and subscriptions to data changes with no constraints regarding data access patterns. Subscriptions scale without any work from you. All of these operations use the relational schema, which is self-documenting. You can change the schema any time you want.

    Using the graphQL API requires no knowledge of the backend data store. Therefore, clients and consumers of your API do not need to know about shards, indices, SQL queries, etc.

    If you have an existing DynamoDB table, you can automatically generate a GraphQL schema/API from it too.

    https://aws.amazon.com/appsync/

    disclaimer: i work on aws appsync

    • davidjnelson 6 years ago

      This looks awesome, except that it's still in preview. Seems a bit early to bet on it - any idea when there will be assurances that it will continue to be supported long term?

    • marsanyi 6 years ago

      This looks pretty nice.

  • noir_lord 6 years ago

    This leads me to one of my constant bugbears.

    Why is RDBMS tooling so completely shite?

    I have an IDE that can index several million lines of code and apply accurate, timely, context-dependent code completion - yet I have yet to find a good way to debug a MySQL stored procedure (I hate them; I inherited them, and except for very simple use cases they are getting slowly replaced).

  • chubot 6 years ago

    Personally, I would like something like sqlite that you can sync like git. Though it probably should enforce its schema more than sqlite does.

    I would use it for replicated / distributed storage for data frames (e.g. R or Pandas), which is somewhat related to my other request [1]

    The difference here is that there is structure to the files.

    [1] https://news.ycombinator.com/item?id=16394048

  • mamcx 6 years ago

    I totally agree.

    I'm toying with the idea of building a relation-centric language (and maybe storage) because I think the same.

    I want creating a table/relation to be as easy as:

       customers = [id:int, name:str; 1 "jhon"; 2 "doe"]
    
    in memory. Other things such as triggers, PKs, FKs, views, etc. obscure the simplicity of the relational model and make people say weird stuff like "relational databases don't scale" or "they're too inflexible".

    I think there exist a LOT of easy ways to enrich the model and make it more useful. For example, it's OK to say:

       inv = [id:int, lines:Lines; 1 [id:1, qty:1, price:$10];  2 [id:1, qty:3, price:$40];]
    
    It's totally OK to nest relations within relations (this alone could work wonders for ORMs :)).
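
    A minimal Python sketch of that idea, with relations as plain values and a nested relation sitting in a column (the names and data here are illustrative, not any real engine's API):

```python
# A relation as a list of rows; the "lines" column holds a nested relation.
inv = [
    {"id": 1, "lines": [{"id": 1, "qty": 1, "price": 10}]},
    {"id": 2, "lines": [{"id": 1, "qty": 3, "price": 40}]},
]

def total(invoice):
    """Aggregate directly over the nested relation: no JOIN, no ORM mapping."""
    return sum(line["qty"] * line["price"] for line in invoice["lines"])

print([total(row) for row in inv])  # → [10, 120]
```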

  • mamcx 6 years ago

    > I should be able to define a simple relational schema in 30 seconds and start using it with effectively zero constraints around data access patterns, transaction volume, and data scale, and with no design decisions needing to be made around these bits of incidental complexity

    Oh, this is also something that needs a better way!

    RDBMSs are too tightly coupled. I remember asking whether I could ditch sqlite's SQL parsing and call the storage layer directly (so I could put my own query engine on top), and people acted like I was nuts!

    I think it's possible to build an RDBMS that is semi-pluggable, so you can swap parts of the engine in "user land". If someone wants to create a new kind of index, for example, it should be as easy as writing:

       fun CoolIndex.get(key) -> Value
    
    and plugging it into the engine. Think of something like a Flask/Django-esque framework where it's possible to have custom fields, validations, middleware, etc. on top of a core storage layer and a default implementation.

    So, instead of going with redis or something else, I could build my "redis-like" API INSIDE the engine and get the advantages of locality, integrity, etc.

    ie: See a RDBMS as an API Backend.

    This could be modeled (instead of MVC as in common web frameworks) as CQRS or something similar.
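
    To make the semi-pluggable idea concrete, here is a hedged Python sketch: a user-land index object plugs into a core storage layer. All class and method names here are invented for illustration, not any existing engine's API.

```python
class HashIndex:
    """A user-supplied index: anything with put(key, row_id) / get(key)."""
    def __init__(self):
        self._map = {}

    def put(self, key, row_id):
        self._map.setdefault(key, set()).add(row_id)

    def get(self, key):
        return self._map.get(key, set())

class Engine:
    """Core storage layer that delegates lookups to pluggable indexes."""
    def __init__(self):
        self._rows = {}      # row_id -> row
        self._indexes = {}   # column -> index object

    def add_index(self, column, index):
        self._indexes[column] = index

    def insert(self, row_id, row):
        self._rows[row_id] = row
        for column, index in self._indexes.items():
            index.put(row[column], row_id)

    def lookup(self, column, key):
        return [self._rows[r] for r in self._indexes[column].get(key)]

engine = Engine()
engine.add_index("name", HashIndex())  # a custom "CoolIndex" could plug in here
engine.insert(1, {"id": 1, "name": "john"})
print(engine.lookup("name", "john"))   # → [{'id': 1, 'name': 'john'}]
```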

    • Fins 6 years ago

      Wasn't that what Btrieve was doing back in the 90s? Or was it PervasiveSQL on top of Btrieve engine you could access directly?..

  • ariwilson 6 years ago

    Doesn't this exist? Isn't it called hiring consultants / an engineering team?

  • foobarchu 6 years ago

    This has always been my biggest complaint with Cassandra. It performs and works fantastically if you have the right usage patterns, but if you don't fit that pattern you need to mold your usage so that it fits, or risk the consequences. Using Cassandra effectively is pretty much impossible without knowledge of its internals and consideration of what effect your usage patterns will have on those internals, which IMO kind of defeats the purpose of a database (keep my data, give it back when I ask).

chubot 6 years ago

I want a distributed (not cloud) file system that handles many files and WAN latency (i.e. not HDFS). It might be a cross between Git and BitTorrent.

It feels like deb repos, PyPI, NPM, CPAN, CRAN, etc. should be put in there, with the addition of binaries for popular architectures. And probably Docker-like images, although I think if they are not opaque blobs, it would be better for rsync-like differential compression (which Git implicitly provides).

There will be some small files and some big files. I want it to be like Git so I can clone locally, not just go to the cloud. The way that Git is trivial to set up and clone through SSH is nice too.

It probably has to have a notion of "user", like a local file system. (This makes the problem a lot harder; git doesn't really have permissions.)

BTW Julia's package manager just used git, but I watched a talk that said this ended up being a really bad idea, especially on Windows.

As far as I understand IPFS has some of these properties. Has anybody used it? Could I use it for the package repository use case?

I don't know much about it, but Project Atomic sounds similar too: https://www.projectatomic.io/

Any other projects that seem like a close fit?

BTW I think this would also be useful in the data center, as some companies like Twitter apparently use BitTorrent in data centers to start large jobs quickly (i.e. replicate the same 500 MB binary to 1000 machines).

  • burkemw3 6 years ago

    A few random thoughts:

    IPFS sounds a lot like what you want. Tahoe LAFS plays in this space a bit

    The newer distributed file systems I've seen don't like permissions. They like cryptographically-backed capabilities. You have the permission to read the file because you have the ability to decrypt, through the key. (Of course, key management is easy </sarc>).

    Some user-facing distributed filesystem talks drift into FUSE (or similar) territory. A TahoeLAFS dev talks about how users probably don't actually know what they want: https://plus.google.com/108313527900507320366/posts/ZrgdgLhV... (QUIBBLES: REAL FILESYSTEM VS. STORAGE APP section). This is probably less relevant for a package manager.

    The first time I read about BitTorrent based deployment was from Facebook.

    • chubot 6 years ago

      Thanks for the link! I've heard about Tahoe LAFS, but like a lot of these projects (Upspin, IPFS), I don't actually know anyone who uses them!

  • antoncohen 6 years ago

    Basically you want S3, which has BitTorrent BTW. S3 has multiple widely used command line tools that support syncing (cloning, pushing). There are multiple heavily used S3-like systems, open and closed source, some are S3 API compatible. Examples include Ceph and OpenStack Swift.

    The reason people use cloud services instead of running these themselves is because the cloud storage fee is usually less than paying someone to run a distributed system.

    • chubot 6 years ago

      Right, but I can run my own git "server" with no problem. That's why I use that analogy.

      Plenty of people run their own BitTorrent trackers too.

      In my mind, "cloud" open source means: you need a team of experts to run it. git and BitTorrent are different -- they are designed for you to run it yourself.

      Also, the model is to "sync" and then "read/write", as with git (and BitTorrent). Not just read/write remotely. So maybe I should call it a "replicated file system" rather than a distributed one.

      I guess the main difference with git is that it should handle large binaries / many files, you shouldn't have to clone the entire repo, and maybe users/permissions.

      • stephenr 6 years ago

        It's not a solution to all problems, but Minio (https://minio.io) will provide S3-compatible data storage on your own server(s).

  • mhacct 6 years ago

    I’m sure it misses some part of what you’re looking for, but there’s an old project on GitHub called DrFTPD that does at least part of it, in a relatively archaic manner. It’s designed for a distributed FTPD across WAN links. It also has mirroring / striping, user accounts, etc. I’m sure you could find some tool to help it look local if needed.

  • jimmy1 6 years ago

    • chubot 6 years ago

      Yes I watched a talk on it... it seems interesting for sure. It relies on FUSE I believe, whereas I might want an explicit "sync".

      Still, I'm definitely interested in whether I could use it to upload say 1 TB of source code, and maybe 5 TB of binaries, and have people sync efficiently (and partially).

  • zukzuk 6 years ago

    Syncthing is pretty nice. It's not a filesystem, and it doesn't deal with conflicts particularly well, but it's as close to a cross-platform, distributed solution that "just works" as I've been able to find.

  • _eht 6 years ago

    I feel like a major repo issue has yet to rear its head. We got a sample of what it might feel like with the recent NPM namespace fiasco, albeit a different problem than scale and accessibility.

  • dsnuh 6 years ago

    IPFS does sound like it might be close to what you are looking for, just based off my little experience playing with it. I want to try to leverage a private deployment for dataset sharing for analytics, etc.

freehunter 6 years ago

I'm a big believer in logging, and I find it hard to believe that collecting and monitoring logs needs a server with 32GB of RAM and a massive Java and ElasticSearch back-end like ELK, Splunk, Graylog, or similar monitoring software.

I've been searching for something self-hosted that can run in my server's spare capacity and monitor the OS and application logs with a simple searching interface (just a web interface to grep would be handy), and have come up short. If I can't host it on the cheapest DO plan, it's out of my budget. I really don't want to have to build it myself but I'm just about at that point.
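The core of such a tool really is small. As a hedged sketch of the "web interface to grep" idea, here is just the filtering part a thin web UI would wrap (the log lines are made-up sample data):

```python
import re

def grep_logs(lines, pattern):
    """Return (line_number, line) pairs whose text matches a regex."""
    rx = re.compile(pattern)
    return [(n, line) for n, line in enumerate(lines, 1) if rx.search(line)]

logs = [
    "2018-02-16 10:00:01 app INFO started",
    "2018-02-16 10:00:02 app ERROR db timeout",
    "2018-02-16 10:00:03 app INFO ok",
]
print(grep_logs(logs, r"ERROR"))
```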

  • mseebach 6 years ago

    Sounds like you want something like syslog-ng paired with Nagios.

    The "big" solutions require some hardware because they do a lot more than merely collecting your logs.

  • hobofan 6 years ago

    I used to do something similar (but a bit less fancy), with logs processed by Heka + bash scripts using jq. Heka has now been phased out in favor of hindsight[0], but I haven't personally tried that one out yet.

    [0]: https://github.com/mozilla-services/hindsight

  • Artemis2 6 years ago

    oklog might be for you? https://github.com/oklog/oklog

    It’s still new, but the architecture is sound and it worked well for me on a small single-node deployment (about 30 clients sending logs).

    • Blindedwino 6 years ago

      It looks promising. Now if it could integrate easily with syslog-ng/rsyslog, that would be perfect.

  • odonnellryan 6 years ago

    Nagios is not-so-easy to configure, but probably will get you most of the way there.

    • Blindedwino 6 years ago

      Nagios doesn't do log monitoring out of the box though. Unless I've missed that? It definitely doesn't aggregate logs.

      • odonnellryan 6 years ago

        Nagios has check_log, but that is not great for memory; I believe it loads the entire log file into memory and takes a diff to check only the entries it missed.

        There is check_logfile which has a similar name but is a different library, I wrote about how to set this up a while ago: https://medium.com/luma-consulting/how-to-install-check-logf...

        It's not super obvious how this plugin works unfortunately. Not the craziest thing in the world to configure but certainly not easy starting from zero!

m_ke 6 years ago

Machine learning model management and serving service.

I haven't seen a decent open source framework that makes it easy to package a trained model with preprocessing and postprocessing steps and deploy it behind an API.

Adding performance tracking and model validation on top of that would be great.

Then there are things like queuing/batching and autoscaling.

Closest thing that comes to mind right now is TensorFlow Serving and their k8s stuff (which it looks like they just renamed to tf-operator and moved to the kubeflow org: https://github.com/kubeflow/tf-operator)
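As a sketch of the missing piece: one could imagine packaging pre/post-processing with the model as a single callable that a serving layer deploys behind an API. This is an illustrative toy with invented names, not any existing framework:

```python
class ServedModel:
    """Bundle preprocessing, inference, and postprocessing into one pipeline."""
    def __init__(self, preprocess, predict, postprocess):
        self.steps = [preprocess, predict, postprocess]

    def __call__(self, raw_input):
        out = raw_input
        for step in self.steps:
            out = step(out)
        return out

# Dummy "model" that doubles its (parsed) input.
model = ServedModel(
    preprocess=lambda s: float(s),       # e.g. parse the request body
    predict=lambda x: x * 2,             # stand-in for a trained model
    postprocess=lambda y: {"score": y},  # shape the API response
)
print(model("21"))  # → {'score': 42.0}
```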

  • mpeter88 6 years ago

    The model development pipeline is still a far cry from the maturity of the software development pipeline, and will have to get there in order to reduce the hands-on heavy lifting required for model development and deployment.

    Along with packaging a trained model are things like: Snapshot/versioning of the training and test data used to create the model, versioning of the model, storing versioned models in a model registry, auto-deploying models from the registry to target environments, telemetry from deployed models.

    Closest I've found is https://github.com/mitdbg/modeldb, and I've spoken to the woman leading the effort. They still have data versioning as an open question, and don't see the need. But there are training set modification, results RCA, and other use cases that drive the need to catalog training/test data with the model that results.

    It'll get there. Just a question of when and how.

  • agibsonccc 6 years ago

    This is basically all we do (Disclaimer: This is mainly meant for bigger companies, not startups): https://skymind.ai/platform - reach out if you have any questions. Mail in profile.

    One thing I'll say is "just k8s" isn't realistic. You need a lot more than that.

    It should run within k8s but should also be minimalistic. Not a lot of platforms provide that. A lot of our docs on the internals are still being built out, but we provide everything ranging from an offline install of anaconda to managed connectivity gateways to hadoop and spark clusters.

    We also have built in model serving and experiment tracking (which in reality is just a relational database with a rest api automatically integrated in to the platform) - if you're interested in learning more please reach out.

simonw 6 years ago

I'm still waiting for a rock-solid scalable open source graph database in the mold of the Freebase database engine or the amazing graph database that Facebook have built for themselves. I'm very excited about dgraph as an option here but I think it's still an area that is very open for new entrants.

simonw 6 years ago

An open-source platform for self-hosting function-as-a-service - something that provides the tooling for easily saying "deploy this function, auto scale it, route HTTP traffic to it, now atomically replace it with this new version" without having to lock yourself in to Google/AWS/Azure.

  • jacques_chester 6 years ago

    There are a lot of these underway at this point. I'm allocated to one (Project Riff). We also like to hear news from our nearest neighbours: Fn, Kubeless, Fission and OpenFaaS.

    A lot of what FaaSes give you is basically a PaaS, with autoscale to zero and preconfigured event sources. If you don't need those -- actually need them -- then you can get away with a bog-ordinary PaaS for the time being.

  • owyn 6 years ago

    There are a few. flynn.io does this (it's a bit like Heroku, using git push as a deployment action and a web UI to manage things). Caveat: I haven't used it...

jastr 6 years ago

Some of the biggest advancements in recent years have come less from technological breakthroughs, and more from improvements in designs and abstractions. Tools like Ruby on Rails, GraphQL, and even AWS, while impressive technically, are breakthroughs because they improved developer efficiency. They also weren't "needed" until they were built, and have allowed many developers to work on broader parts of the stack.

Also, today's tech solves today's problems.

mrfusion 6 years ago

Even though we have Postgres and MySQL they really don’t seem well configured out of the box and you’ve got to wade through a bunch of settings files and understand what vacuuming is.

I think we need a self-tuning RDBMS. It could watch and adapt to usage patterns and available resources.

  • davidjnelson 6 years ago

    Isn't that what aws aurora and google cloud spanner offer?

simonw 6 years ago

There might be something exciting to build to help implement ludicrously fast Google-style autocomplete / typeahead Search. I've tried using MySQL, elasticsearch, PostgreSQL with trigram indices... they can be made to work, but I've never felt that I'm anywhere near the quality of whatever it is Google are doing here.
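The core primitive is cheap, though; here is a hedged sketch of prefix lookup over a sorted vocabulary using Python's bisect (the vocabulary is toy data, and the ranking/fuzziness is where the real Google-level difficulty lies):

```python
import bisect

# Sorted vocabulary; in practice this would be your indexed terms.
vocab = sorted(["postgres", "postgresql", "postfix", "python", "pytorch"])

def autocomplete(prefix, limit=5):
    """Binary-search the sorted list for all entries starting with prefix."""
    lo = bisect.bisect_left(vocab, prefix)
    hi = bisect.bisect_right(vocab, prefix + "\uffff")  # sentinel past prefix
    return vocab[lo:hi][:limit]

print(autocomplete("post"))  # → ['postfix', 'postgres', 'postgresql']
```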

  • lowry 6 years ago

    You need to go deeper and use AnalyzingSuggester from Lucene. It is as fast as it gets. Also, do not forget about the web part of the equation. Using HTTP/2 helps, as well as disabling buffering all the way through.

  • btbright 6 years ago

    Have you looked at Algolia?

Kagerjay 6 years ago

I would say anything in the video / 3D / imaging service platform space always leaves a lot to be desired.

Anything that potentially touches FFMPEG basically

I can name a few examples that I still think need improvements

- Online gif editors

- Video editing / clipping

- Background image editing / online photoshop equivalents

- Better alternatives than lucidpress / adobeIndesign for catalog page / brochure creation

- Machine learning / deepfake online tool, this is all driven client side mostly

- Pretty much anything client-side that's not yet server-side is open game, I would say

- PDF markup tools could be better for online-based services, especially for architectural design

- CAD-based online programs for retail, so customers can DIY build their own warehouse or layout schemas, are lacking

- Better online RDBMSs. Currently there's just Airtable, and it's lacking some core features like referential integrity

- Integrating spaced-repetition learning into most education-based services (lynda.com, pluralsight, etc.)

- Managed ecommerce cart / hosting services. It's a well-understood problem that should technically be easy for a client to do.

Again, I would say almost every profitable service touches some form of FFMPEG; video / image editing / 3D is definitely still out there. There's a huge untapped market in combining all of these services into one package.

  • imhoguy 6 years ago

    I would add on-the-fly video transcoding. FFMPEG would need some efficient context-state serialization to bring resumability/seekability while staying low on resources.

takinola 6 years ago

I should be able to copy and paste a server configuration and create an identical copy of my server. Right now, the only solutions I see are generating images (too big and cumbersome) or writing scripts (too complex).

I would like to be able to type in a single command and replicate all the packages, services and configuration present in a particular server to a new target system.
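A hedged sketch of the core idea behind such a command: compute the diff between a desired server state and the target's actual state, so only the difference needs applying (the state dicts here are invented toy data, not any real tool's format):

```python
# Desired state captured from the source server; actual state of the target.
desired = {"packages": {"nginx", "postgresql"}, "services": {"nginx"}}
actual  = {"packages": {"nginx"}, "services": set()}

def plan(desired, actual):
    """Return what must be added to each category to converge the target."""
    return {k: sorted(desired[k] - actual[k]) for k in desired}

print(plan(desired, actual))  # → {'packages': ['postgresql'], 'services': ['nginx']}
```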

  • noir_lord 6 years ago

    You can get a fair way down that path with ansible if you are very careful, but it still requires too much work; modern operating systems and packaging were not built for idempotent rollbacks.

    We try to treat pets like cattle then wonder why they bite us (to reverse the usual refrain).

    I agree though; it should be declarative, but everything shits everything else all over the filesystem.

  • artpar 6 years ago

    I am trying to make daptin do just what you described.

    https://github.com/daptin/daptin

    A JSON config plus backing database (mysql/postgres/sqlite) defines your complete environment.

  • viraptor 6 years ago

    Software that does it: puppet, chef, salt, ansible, ... You can have chef as a service in AWS via OpsWorks. Alternatively almost every cloud provider supports cloud-config at boot time.

  • sjellis 6 years ago

    I think there's a general problem that the incentives are for building big tools, so small problems like this are handled by everybody writing their own ad-hoc scripts.

  • rs86 6 years ago

    Maybe you will find Nix useful.

  • hathathat 6 years ago

    Doesn't Puppet(.com) do something like this?

jedberg 6 years ago

A workload-aware data distribution proxy.

Let me explain. A lot of people talk about multi-cloud these days. Either AWS and Azure, or AWS and their own datacenter, or whatever.

While it's really easy to send compute jobs to one cloud or the other based on which one is best suited for the job, the big issue is that your compute job will need data. Right now your choices are to have a copy of all your data in all the clouds (a very expensive proposition since you'd be constantly shipping updates across very expensive outbound connections) or to have all your data in one place and have the compute jobs reach out across the internet to get it (also expensive and now there is latency for every job too).

I want a proxy that is smart enough to say "Workload X is usually done on AWS, and requires data points Y and Z, so make sure Y and Z are always up to date on AWS, but lazy update Y and Z to GCE in batches through the cheapest direct connect possible".
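As a toy sketch of the "workload-aware" part (all names invented): track where each workload usually runs, and keep its data eagerly replicated there while everything else can be lazily batched.

```python
from collections import Counter

class PlacementProxy:
    """Learn from history which cloud should hold a workload's data hot."""
    def __init__(self):
        self.history = {}  # workload -> Counter of clouds it ran on

    def record_run(self, workload, cloud):
        self.history.setdefault(workload, Counter())[cloud] += 1

    def eager_target(self, workload):
        """The cloud this workload usually runs on, or None if unseen."""
        counts = self.history.get(workload)
        return counts.most_common(1)[0][0] if counts else None

proxy = PlacementProxy()
proxy.record_run("X", "aws")
proxy.record_run("X", "aws")
proxy.record_run("X", "gce")
print(proxy.eager_target("X"))  # → aws
```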

  • matteuan 6 years ago

    One project that is aiming in this direction: http://seaclouds-project.eu/

    • jedberg 6 years ago

      That's an interesting start! Scanning briefly it looks like I still have to tell it what data and workloads go where -- it isn't figuring it out automatically. Which is the holy grail I'm looking for.

tmaly 6 years ago

Given the sheer number of services on AWS, I would like an expert system that would query me for requirements and then suggest a set of possible services to use together to solve my problem.

simonw 6 years ago

Serving machine learning models in production is still something that appears not to have an obvious correct solution.

  • agibsonccc 6 years ago

    Hi fellow YC alumni! See the pitch here: https://news.ycombinator.com/item?id=16399326

    We'll be supporting PMML and the like as well. The goal is to hit the simple things rather than perpetuating the latest hype like the AutoML stuff people are going on about currently. If you'd be interested, would be happy to have a conversation to go over what we're trying to do. We hope to just provide a platform neutral tool for building and deploying models similar to sagemaker (but cloud agnostic)

ioddly 6 years ago

I think what I've been missing, and trying to do a better job at, is understanding the awesome power of existing and mature solutions. Things like postgres's LISTEN/NOTIFY and so on.

trjordan 6 years ago

10 years ago, I learned about the idea of elastic infrastructure. Services with load balancers in front of them, hosts that come and go easily, a structured way to communicate between services.

At the time, that was nginx + manual autoscaling, then it was ELBs and autoscaling groups, now it's kubernetes and containers, maybe hosted. It's still not there.

I'm excited about the software that makes that operable at scale. It seems like a service mesh is a good idea. It seems like mutual security between services is a good idea. It seems like storing routing configuration in a separate control plane that is executed in a data plane like Envoy is a good idea.

There's a bag of software at CNCF that's loosely organized around this, but I don't think the "just deploy some code, it can scale and you can have tons of services doing that with good visibility and operability" is quite there. I'm really excited about Envoy, but I don't think there's a good control plane for it. I'm part of a company that's working on a commercial implementation (turbinelabs.io), and Istio is in a similar space.

There's still work to be done!

dozzie 6 years ago

> It seems like there are so many choices in each category that there's nothing left to do.

Log storage and search for structured logs (e.g. JSON or CEE, not merely stringblobs). We have paid solutions (Splunk, Loggly, Papertrail), and then we have Elasticsearch, which gets worse in this use scenario with every release.

Message stream processing engine that doesn't require restarting to add a query or data sink, so you could build monitoring system around it. In fact, a monitoring system designed to allow you to easily add your custom processing or data sink.
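A minimal sketch of that restart-free property: queries as predicates and sinks as callables that can be attached while the processor is running (all names here are illustrative, not a real engine's API):

```python
class StreamProcessor:
    """Route events to sinks; routes can be hot-added, no restart needed."""
    def __init__(self):
        self.routes = []  # (predicate, sink) pairs

    def add_route(self, predicate, sink):
        self.routes.append((predicate, sink))  # attach at runtime

    def process(self, event):
        for predicate, sink in self.routes:
            if predicate(event):
                sink(event)

errors = []
proc = StreamProcessor()
proc.add_route(lambda e: e["level"] == "error", errors.append)
proc.process({"level": "info", "msg": "ok"})
proc.process({"level": "error", "msg": "disk full"})
print(errors)  # → [{'level': 'error', 'msg': 'disk full'}]
```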

Infrastructure inventory that can be both filled by hand and kept updated by machine and that can be queried from script or browsed by human. For that it would be useful to have a good topic maps engine, which is another missing thing.

OS updates manager that can handle more than just Red Hat/CentOS (Red Hat Satellite or Spacewalk) or just Ubuntu (Canonical Landscape), and while at it, one that doesn't try to be underdeveloped configuration management tool (like CFEngine or Puppet) and underdeveloped deployment tool (like Ansible), but can cooperate with them.

And there's much more where these came from.

ex3ndr 6 years ago

I really want to simplify writing backend logic and implement everything as functions that react to events and produce new ones. This looks a lot like the Actor Model (Akka) plus Redux's reducers: we have just a bunch of state-changing rules, and they can be executed reliably and with decent performance.

Something very simple (for example for Build Server):

Build Failed (Event) -> Assign responsible user for a crash -> Send Notification -> Deliver Notification via user's configured notification systems (email, push, sms..)

This is Event Sourcing, but event sourcing works well only for one part of the platform, the write side; the read side is sometimes too hard to implement. Sometimes there are problems with eventual consistency: you actually need to wait while some of the reducers in the chain process the event, and from that point everything becomes too slow to develop, and you basically start reinventing the wheel; this is just a database engine de facto. Meh.
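The write side of that Build Failed chain can be sketched as handlers that consume one event and emit follow-up events (a toy illustration, not Akka or any real framework; the handler names and event fields are invented):

```python
def assign_responsible(event):
    """Build Failed -> Assign a responsible user for the crash."""
    if event["type"] == "build_failed":
        return [{"type": "assigned", "user": "alice", "build": event["build"]}]
    return []

def send_notification(event):
    """Assigned -> Deliver a notification to that user."""
    if event["type"] == "assigned":
        return [{"type": "notify", "to": event["user"]}]
    return []

def run(initial_events, handlers):
    """Process the event queue, letting handlers append follow-up events."""
    log, queue = [], list(initial_events)
    while queue:
        event = queue.pop(0)
        log.append(event)
        for handler in handlers:
            queue.extend(handler(event))
    return log

log = run([{"type": "build_failed", "build": 7}],
          [assign_responsible, send_notification])
print([e["type"] for e in log])  # → ['build_failed', 'assigned', 'notify']
```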

  • jacques_chester 6 years ago

    I'm a one-eyed Concourse fan, which implements a separation of state and logic that allows designs like this. I've been pitching pretty heavily the concept of creating or converging to parity with the FaaS I'm assigned to work on.

    So: maybe. We might pull this off.

joshavant 6 years ago

I'm a front-end developer.

I'd like an OSS solution that will allow me to deploy an arbitrary server-side service - be it a Ruby, Python application or even a package like OpenVPN - to some cloud infrastructure, easily.

This solution should spin up a hardened OS distro (CIS-compliant, maybe?), provision it with my arbitrary services (using Ansible or Chef or something), and deploy it to AWS or some cloud infrastructure for me (using Terraform or something).

All these component pieces exist, but nothing ties all of them together for an easy deploy for a front-end developer like me.

(And, I know things like Heroku and CodeDeploy exist, but I dislike lock-in and they nearly universally come with their own restrictions, like lack of support for server-side Swift applications or custom services like OpenVPN or git-annex.)

EDIT - I'm strongly considering taking some time off to write this soon, so get in touch if this is something you're interested in! Contributing or using!

  • jacques_chester 6 years ago

    There's an overlap with what BOSH does. Starts with a stemcell, compiles your packages against it, spins up whole VMs configured with those packages.

    It also adds monitoring for both VMs and processes.

    Don't underestimate what you get from a PaaS like Heroku, though. If you're able to stick to 12-factor apps, the lockin is pretty mild -- you should be able to hoist your skirts and move to Cloud Foundry or even OpenShift without too much pain.

    Disclosure: I work for Pivotal, we work on BOSH and Cloud Foundry. We compete with Red Hat and Heroku.

  • rloewenherz 6 years ago

    We're actually about to release a product that does exactly this! We're in private beta right now, but will have something available to the public soon. If you're interested, check out www.nucleus.codes. And if you want to join the beta, email us at hi@nucleus.codes.

slake 6 years ago

A CMS backend onto which I can sew any template-based frontend I want, without having to follow the strict workflow set by the backend.

The backend would just serve content when asked (JSON prolly), the frontend could be anything you desired.

LeonidBugaev 6 years ago

The area of automated testing and developer tools in general is hugely underestimated: tools that help you write, format, and verify code, automatically detect issues from stacktraces or any other sources like tcpdumps, smart fuzzers, etc. The market is huge.

I have had a personal project in this area for 5 years, https://goreplay.org, and am investigating ways to automatically find issues in web applications. The project gets traction, but I do not see any competition so far. And there are a lot of reasons for this: in such under-researched areas, being a pioneer requires a different mindset from both the project owner and the end user. And this is really hard to market.

ntolia 6 years ago

I believe the problem has now shifted to a slightly higher level and especially when working in a microservice/containerized world. For example, given the number of specialized services you mentioned, we see more applications with polyglot persistence stacks underneath. If they form a part of the same application, how do you collectively manage them? What does it mean to take a consistent snapshot? These and a bunch of other similar questions need to be answered.

There is some work we are doing (https://kanister.io) to help with these issues but there are a lot of emerging solutions in this space. Look at CNCF's landscape for some more detail.
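The "consistent snapshot" question can be made concrete with a toy coordinator: every store must be quiesced before any snapshot is cut, and thawed even if something fails partway. Kanister expresses this kind of sequencing declaratively in blueprints; the `Store` interface below is purely hypothetical.

```python
class Store:
    """Hypothetical datastore handle in a polyglot persistence stack."""
    def __init__(self, name):
        self.name = name
        self.frozen = False
    def freeze(self):
        # e.g. FLUSH TABLES WITH READ LOCK, fsfreeze, BGSAVE barrier...
        self.frozen = True
    def snapshot(self):
        assert self.frozen, "snapshot of an unfrozen store is not consistent"
        return f"{self.name}-snap"
    def thaw(self):
        self.frozen = False

def consistent_snapshot(stores):
    snaps, frozen = [], []
    try:
        for s in stores:        # freeze *all* stores first...
            s.freeze()
            frozen.append(s)
        for s in stores:        # ...then cut every snapshot inside that window
            snaps.append(s.snapshot())
    finally:
        for s in frozen:        # always release the write locks
            s.thaw()
    return snaps
```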

stocktech 6 years ago

I feel like we're in a consolidation phase. We have a ton of tools/software and the big developments have been in how we use those tools aka devops. I'd put things like kubernetes/docker in this category, but there's obviously huge room for growth around this tech.

I do think there's room for new tools, though, but by definition they're on the cutting edge and not especially visible if you're not looking. Things like stream processing are already incredibly useful and only getting better. Machine learning could be on this list too. It might not be a 1000% improvement like caching was, but that's part of a maturing industry.

maratd 6 years ago

> Nowadays it seems like there's nothing missing.

There are lots of things missing. There's just nothing missing in the established categories. Why would there be?

Start your own category, create software for it, then convince everyone that the category is important.

  • thesmallestcat 6 years ago

    Great advice! I'd venture that managing large files is an unsolved problem. It's a hack in most version control systems, and uploading/downloading files from a host, even S3, is a slow, serial process. Same for checksumming. Network speeds have more than caught up, and large files are a frequent process bottleneck. Something that makes it easy to manage and consume large files could be a big deal. It probably would require a new application protocol, maybe even a new filesystem similar to XFS.
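    The parallel idea is easy to sketch: split the file into fixed-size chunks and digest them concurrently (the way S3 multipart uploads treat parts) instead of making one serial pass. The chunk size below is arbitrary.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 8 * 1024 * 1024  # 8 MiB; an arbitrary chunk size for illustration

def chunk_digests(path, chunk_size=CHUNK, workers=4):
    """Digest fixed-size chunks of a large file concurrently, not serially."""
    def digest(offset):
        # each worker opens its own handle so seeks don't race
        with open(path, "rb") as f:
            f.seek(offset)
            return hashlib.sha256(f.read(chunk_size)).hexdigest()

    size = os.path.getsize(path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, range(0, max(size, 1), chunk_size)))
```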

    • imhoguy 6 years ago

      Or maybe the file should stay where it is and the processing logic itself should be deployed there. If the file's parts are distributed, processing could be suspended and migrated to the place where the next piece is stored. Something similar is done with Hadoop and HDFS.
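      A toy version of that "ship the code to the data" idea (node names and chunk layout invented): the job is dispatched to whichever node holds the next chunk, rather than pulling chunks across the network.

```python
# Which node owns which chunk index of a large file; invented for illustration.
NODES = {
    "node-a": {0: b"hello ", 2: b"again"},
    "node-b": {1: b"world "},
}

def run_on_chunks(process, num_chunks):
    """'Migrate' the processing function to the node holding each chunk."""
    results = []
    for idx in range(num_chunks):
        for node, chunks in NODES.items():
            if idx in chunks:
                # in a real system this call would execute *on* that node
                results.append(process(node, chunks[idx]))
                break
    return results
```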

  • tboyd47 6 years ago

    This. If you want to create something, create something that simplifies things for us. Don't just create another tool to go on the tool belt.

maxxxxx 6 years ago

I remember when databases were only for highly paid experts and needed a lot of customization and configuration. I think we need a similar development for AI and ML to make them accessible to the average developer.

rs86 6 years ago

I wish we had more statically typed web backend frameworks as usable as rails or phoenix. I have used Elm a lot recently and it rocks. I wish I had something as incredible for the server side.

  • davidjnelson 6 years ago

    I do too! It seems like the Play framework with Java and the Sails framework with TypeScript are a few existing options. I'd love to see something Rails-ish for Go.

lowry 6 years ago

I miss a decent self-hosted photo/video library.

DannyB2 6 years ago

The missing piece?

It is a new project, it enables your server to use all other front end and back end technologies at the same time! That is what makes it so fantastic!

You get all front end JavaScript frameworks, and all back end server technologies, all for one low price. Just install this new component into your project. It will download the other few gigabytes and hook it into your project.

CryoLogic 6 years ago

1. Static website generators for non-blogs

2. Half-decent wrapper libraries for tools like FFmpeg on NPM

3. A plug-and-play game server with stat tracking that can easily be used for any game

4. More versatile compression formats for media; the ability to serve 360/720/1080 from the same file?

  • TheAceOfHearts 6 years ago

    1. It's hard to give constructive help without more information. Why not write a script that compiles a bunch of pages into html? As the other comment said, there's also stuff like jekyll. Maybe you could provide some use-cases?

    2. Explain your use-cases? Why not call ffmpeg directly? You'll need to familiarize yourself with the original library functionality anyway, no?

    3. No opinion. Not particularly interested in games. Might be hard to generalize, maybe?

    4. Versatile in what sense? What are your complaints with existing media formats? I think licensing is an issue, but stuff like VP9 and Opus seem to take care of that matter. I'm not an expert, but I think both MP4 and MKV can already hold an unlimited number of media files.

  • revicon 6 years ago

    >Static Website generators for non-blogs

    We use Jekyll for more than one non-blog site, and it works great for us so far. What features do you think are missing that you wish it had?

  • freeone3000 6 years ago

    (4) can already be done with MKV... technically. You actually end up with 4 alternate video streams in the same MKV, which you remux for clients that can't handle it.
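    For anyone wanting to script that remux, a sketch of building the ffmpeg invocation (filenames and the chosen stream index are examples): `-c copy` remuxes without re-encoding, so each client gets only the video stream it can handle.

```python
import subprocess

def remux_cmd(src, dst, video_stream=0):
    """Build an ffmpeg command copying one video stream plus all audio."""
    return [
        "ffmpeg", "-i", src,
        "-map", f"0:v:{video_stream}",  # the one resolution this client wants
        "-map", "0:a?",                 # all audio streams, if any exist
        "-c", "copy",                   # remux only; no re-encoding
        dst,
    ]

def remux(src, dst, video_stream=0):
    # shells out to ffmpeg; raises CalledProcessError if the remux fails
    subprocess.run(remux_cmd(src, dst, video_stream), check=True)
```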

pmohan6 6 years ago

I don't have great suggestions for you but I think software needs are ever evolving. As more of the world comes online, there are more data requirements, newer business requirements, etc. We would need even more scalable systems on various dimensions.

biggodoggo 6 years ago

There are two ways to "fill a gap": either you create a gap that needs filling, or you make an existing solution better. You don't see anything missing because you aren't looking at things from this perspective.

slake 6 years ago

Security probably needs quite a bit more OSS. There's a lot of proprietary software: plenty of penetration-testing tools, but not many tools for actually securing software.

thesmallestcat 6 years ago

Static website generators.

  • m_ke 6 years ago

    This might be a joke, but there's plenty of room for a good static WordPress alternative that focuses on bloggers instead of developers.

    • pzk1 6 years ago

      Netlify CMS?

  • Kagerjay 6 years ago

    I would argue that content management systems for static site generators, like cloudcannon.com, could be better.

    We already have Jekyll and Hugo.

sphix0r 6 years ago

Before storing data, we should have good do-not-track / privacy-respecting software for the data we store.

deepnotderp 6 years ago

I want a way to use AWS Lambda/GCF but with cluster level network locality.

  • boulos 6 years ago

    Interesting. Do you mean for functions to call each other (so that it’s sub-ms) or for some other reason?

    • deepnotderp 6 years ago

      Yeah, basically low latency versions of FaaS.

slake 6 years ago

A plugin to handle all of GDPR requirements?

tboyd47 6 years ago

It feels rather like we have too much software.

  • dspillett 6 years ago

    An embarrassment of riches in the categories that are served does not preclude the possibility that there is a need that isn't well served at all yet.

djswartz 6 years ago

I haven't found a good authorization service (fine-grained roles, ACLs, etc.). I always find myself having to build it for every new project/company.
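A sketch of the core check such a service would expose, with made-up role and permission names: roles grant actions, resources carry ACLs naming roles, and an authorization decision is the intersection of the two.

```python
# Illustrative role -> permission mapping; a real service would store this.
ROLE_PERMS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}

def allowed(user_roles, resource_acl, action):
    """True if any of the user's roles is on the resource's ACL *and*
    grants the requested action."""
    for role in user_roles:
        if role in resource_acl and action in ROLE_PERMS.get(role, set()):
            return True
    return False
```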