# Use Cases and Ideas - MSUCSIS/Inference GitHub Wiki

What should the library do? What do we want to do with it?

## Core Library

### Optimize testing

### Composable parsers
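One way to read "composable parsers" is as small per-type recognizers combined by a generic combinator. A minimal Python sketch (all names here are illustrative, not the library's API):

```python
# Each parser takes a string and returns an inferred type name, or None on failure.

def int_parser(text):
    try:
        int(text)
        return "int"
    except ValueError:
        return None

def float_parser(text):
    try:
        float(text)
        return "float"
    except ValueError:
        return None

def bool_parser(text):
    return "bool" if text.lower() in ("true", "false") else None

def alt(*parsers):
    """Compose parsers: try each in order, return the first success."""
    def combined(text):
        for p in parsers:
            result = p(text)
            if result is not None:
                return result
        return None
    return combined

# Most-specific-first composition: booleans and ints before floats, strings last.
infer = alt(bool_parser, int_parser, float_parser, lambda t: "string")
```

Ordering matters here: trying the most specific parsers first means `"42"` infers as `int` rather than `float` or `string`.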

## Web Stuff

### Web Application

We might want to create a front end that uses the library for demo purposes. Initially, we might simply let the user type in some values, paste in a document, or upload a few files that we can process and provide a result for. Later we might extend this with more advanced UI support so the web application can serve as a workspace of sorts.

### Web Service

We may also want to support larger jobs by creating a public API that can be accessed programmatically, so that the web-based code can be used for more realistic use cases. We would need to think about cost: we most likely want this on a free tier of AWS, Heroku, or a similar provider, so we may need to restrict access or limit usage.

## Batch Use

I think some use cases will involve processing documents in batches and using the results for some purpose like generating interface definitions, performing data exploration, etc.
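As a rough illustration of batch processing, the sketch below aggregates the set of types observed for each field across a batch of documents (`summarize_batch` and the simplified `infer_type` are hypothetical, not the library's API):

```python
from collections import defaultdict

def infer_type(value):
    # Simplified per-value inference; the real library would be much richer.
    return type(value).__name__

def summarize_batch(documents):
    """Collect the set of observed types for each field across a batch."""
    summary = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            summary[field].add(infer_type(value))
    return dict(summary)

docs = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": "3", "name": "c"}]
summary = summarize_batch(docs)
# summary["id"] == {"int", "str"} -- mixed types surface immediately
```

A summary like this is the natural input for the code- and schema-generation ideas below.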

### Generate Code

We might want to take the results and generate actual source code that can be used to store the data from documents. We could also generate compiled class files, etc., and include code that uses xstream or some other library to read in the documents as well. The user could then provide documents and get back a library that they can pass future documents into, receiving objects that represent the data. There are many libraries that do this already, but they do it in a very conservative, generic manner since they don't know the actual data types: you might get a generic map of key/value pairs with types like string, number, and boolean, with no attempt to be precise about the types.
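A toy version of this idea: emitting precisely typed, Java-like source from a map of inferred field types (the `generate_class` helper and its output format are purely illustrative):

```python
def generate_class(name, fields):
    """Emit Java-like source for a class with precisely typed fields.

    `fields` maps field name -> inferred type name, e.g. the output of
    a batch analysis step.
    """
    lines = [f"public class {name} {{"]
    for field, ftype in fields.items():
        lines.append(f"    private {ftype} {field};")
    lines.append("}")
    return "\n".join(lines)

src = generate_class("Order", {"id": "long", "total": "double", "paid": "boolean"})
```

The point of the sketch is the contrast with generic approaches: the generated fields are `long`, `double`, and `boolean`, not entries in an untyped map.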

### Generate Schemas

We may want to analyze documents and then generate schemas for the documents that can be used to verify future documents.
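For example, a sketch that turns inferred field types into a JSON Schema usable for validating future documents (assuming a flat field-to-type mapping; real documents would need nested objects and arrays):

```python
def to_json_schema(field_types):
    """Build a JSON Schema from inferred field types.

    `field_types` is assumed to map field name -> Python type name;
    a real version would handle nesting and optional fields.
    """
    type_map = {"int": "integer", "float": "number", "str": "string", "bool": "boolean"}
    return {
        "type": "object",
        "properties": {
            field: {"type": type_map.get(t, "string")}
            for field, t in field_types.items()
        },
        "required": sorted(field_types),
    }

schema = to_json_schema({"id": "int", "name": "str"})
```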

### Thrift

We might want to generate Thrift specifications.
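A hedged sketch of what that might look like: emitting a Thrift struct from inferred field types, with field ids assigned in order (the `to_thrift_struct` helper is illustrative, and the type names assume Thrift base types like `i64` and `string`):

```python
def to_thrift_struct(name, fields):
    """Emit a Thrift struct definition from inferred field types."""
    lines = [f"struct {name} {{"]
    for i, (field, ftype) in enumerate(fields.items(), start=1):
        lines.append(f"  {i}: {ftype} {field}")
    lines.append("}")
    return "\n".join(lines)

idl = to_thrift_struct("User", {"id": "i64", "name": "string"})
```

A real generator would also need stable field-id assignment across runs, since Thrift field ids are part of the wire contract.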

## Online Uses

### Stream Processing Monitoring

Users could continually monitor documents and data types and be notified if the input types start changing. A change could indicate an error in the creation of the documents, or that the users have misunderstood the real document structure and should review their pipeline.
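A minimal sketch of that idea: remember the first type inferred for each field and flag any later document that disagrees (the class and method names are hypothetical):

```python
class TypeDriftMonitor:
    """Track the inferred type of each field over a stream of documents
    and report fields whose type differs from what was first observed."""

    def __init__(self):
        self.expected = {}

    def observe(self, doc):
        alerts = []
        for field, value in doc.items():
            seen = type(value).__name__
            if field not in self.expected:
                self.expected[field] = seen  # first observation sets the baseline
            elif self.expected[field] != seen:
                alerts.append((field, self.expected[field], seen))
        return alerts

monitor = TypeDriftMonitor()
monitor.observe({"price": 9.99})
alerts = monitor.observe({"price": "9.99"})  # type drifted: float -> str
```

A production version would presumably use frequency thresholds rather than alerting on a single deviating document.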

## Beyond Inference

We may want to consider how this library could serve as a piece of a bigger plan around data integration problems, and plan accordingly.

## Other

Would it be good to implement some random data generators for each type as well?
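If we did, a possible sketch is a simple mapping from type names to generator functions, usable for producing test documents that match an inferred schema (all names here are illustrative):

```python
import random
import string

# Hypothetical per-type generators; each takes a random.Random instance
# so that generation can be seeded and reproduced.
GENERATORS = {
    "int": lambda rng: rng.randint(-1000, 1000),
    "float": lambda rng: rng.uniform(-1000.0, 1000.0),
    "bool": lambda rng: rng.choice([True, False]),
    "str": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)),
}

def random_document(field_types, seed=None):
    """Generate one random document matching a {field: type_name} schema."""
    rng = random.Random(seed)
    return {field: GENERATORS[t](rng) for field, t in field_types.items()}

doc = random_document({"id": "int", "name": "str", "active": "bool"}, seed=1)
```

This would pair naturally with schema generation: infer a schema from real documents, then generate synthetic ones for testing.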