TypeSpec to Java - Azure/autorest.java GitHub Wiki

Requirement

DPG 2.0 requires TypeSpec as input, if the service would like to generate models.

Part of the reason is that TypeSpec supports versioning, which is hard to support from OpenAPI, or from OpenAPI generated from TypeSpec.

TypeSpec

GitHub

Resources:

The data-plane specs in TypeSpec would be used for validation during development, until we have the first real TypeSpec for an SDK release.

Pipeline from TypeSpec to Java code

The AutoRest CLI currently does not support a pipeline from TypeSpec to the code generator.

TypeSpec Java is integrated as a plugin to the TypeSpec compiler.


Plan (2022/07/08)

Pipeline for TypeSpec-Java

The Java.emitter and the JAR of the code generator are packed into a single NPM package.

  1. Java.emitter first communicates with the TypeSpec compiler/rest/versioning libraries, to generate a code-model.yaml for the code generator.
  2. Java.emitter then executes the JAR, with the necessary information.
  3. The JAR of the code generator parses the code-model.yaml and generates the Java code.

AutoRest

flowchart LR
  Swagger-->m4
  m4-->preprocessor
  preprocessor-->javagen
  javagen-->postprocessor
  postprocessor-->Java.SDK
  preprocessor-->androidgen
  androidgen-->Android.SDK

TypeSpec

flowchart LR
  TypeSpec-->Java.emitter-- yaml -->preprocessor
  subgraph JAR
  preprocessor-->javagen
  end
  javagen-->Java.SDK

Source:

IPC

The code-model.yaml is compatible with the output of the current Modeler Four. It will be enhanced for TypeSpec features.

Candidates of enhancement:

  • Summary on each type, operation, property
  • Namespace on type (if different from the global namespace)
  • Versioning information (addedOn, removedOn, renamedFrom, madeOptional)

Code generator

preprocessor and javagen are packaged together in one JAR to form the code generator. postprocessor is temporarily left out, but it can be included without much effort.

Logs are written to stdout, which is connected to Java.emitter.

Files are written directly to the file system.


Detail

Namespace

language:
  default:
    name: Confidential Ledger Service
    description: ''
    namespace: Azure.Security.ConfidentialLedger
  java:
    namespace: com.azure.security.confidentialledger

The code generator will do further processing, such as replacing Azure.Core.Operation.Error with com.azure.core.models.ResponseError.

Literal Type

A literal type (StringLiteralType, NumericLiteralType, BooleanLiteralType) maps to a Constant.

A union (UnionType) of literal types maps to an Enum.
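
For instance, a union of string literals like "celsius" | "fahrenheit" could be generated as a fixed Java enum, roughly as below (the TemperatureUnit name and values are hypothetical, for illustration only):

import com.fasterxml.jackson.annotation.JsonValue;

public enum TemperatureUnit {
    CELSIUS("celsius"),
    FAHRENHEIT("fahrenheit");

    private final String value;

    TemperatureUnit(String value) {
        this.value = value;
    }

    // serialized as the literal string value
    @JsonValue
    @Override
    public String toString() {
        return this.value;
    }
}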

Enum

An Enum (EnumType) maps to ExpandableStringEnum.

An Enum with the @fixed decorator maps to a fixed Java enum.

string with @knownValues

It maps to ExpandableStringEnum.
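
A sketch of the ExpandableStringEnum pattern from azure-core, assuming a hypothetical Color type with the known value "red":

import com.azure.core.util.ExpandableStringEnum;

import java.util.Collection;

public final class Color extends ExpandableStringEnum<Color> {
    // known value from the spec
    public static final Color RED = fromString("red");

    // creates or gets an instance; values unknown to the SDK are still accepted
    public static Color fromString(String name) {
        return fromString(name, Color.class);
    }

    public static Collection<Color> values() {
        return values(Color.class);
    }
}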

Nullable

A union of int64 | null maps to Long (object), while a plain int64 in a model maps to long (primitive).

This difference only applies to Java primitive data types. There is no difference for Java object data types, as they are always nullable.
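
For illustration, the difference would show up in a generated model roughly like this (model and property names are hypothetical):

public final class Sample {
    // int64 | null maps to the nullable object type
    private Long nullableCount;

    // int64 maps to the primitive type
    private long count;

    public Long getNullableCount() {
        return this.nullableCount;
    }

    public long getCount() {
        return this.count;
    }
}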

Nullable could be handled differently in the Patch model for "application/merge-patch+json".

Optional and default

foo?: string = "bar" maps to an optional parameter in the API, or an optional property in the model.

The default value is for the service (when the parameter or property is not provided, the service takes that value); the SDK does not use it.

Union

Union is supported as input.

input: string | string[] maps to the following classes:

public abstract class InputModelBase {
    protected InputModelBase()
}

@Immutable
public final class StringInputModel extends InputModelBase {
    public StringInputModel(String value)
    @JsonValue public String getValue()
}

@Immutable
public final class StringListInputModel extends InputModelBase {
    public StringListInputModel(List<String> value)
    @JsonValue public List<String> getValue()
}
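
A hypothetical usage of these classes (the client and method names are made up for illustration):

// pass a single string
client.operation(new StringInputModel("foo"));

// pass a list of strings
client.operation(new StringListInputModel(Arrays.asList("foo", "bar")));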

@visibility

If a property has the @visibility decorator but no input context in it, the property is read-only.
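
For example, a read-only property would be generated with a getter but no setter and no constructor parameter, roughly as below (model and property names are hypothetical):

@Immutable
public final class Widget {
    // read-only: populated from the service response, never set by the client
    private String providerId;

    public String getProviderId() {
        return this.providerId;
    }
}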

@server

If there are no parameters, the SDK uses the url of the @server as the host, similar to host in OpenAPI.

If there are parameters, the SDK takes the parameters to populate the host (the url would then be a template, e.g. https://{region}.foo.com), similar to x-ms-parameterized-host.

If there is no @server, the SDK falls back to a single {endpoint} parameter as the host.

All these parameters are treated as client parameters.

Multiple @server declarations (on different namespaces) are supported. Different servers would have to be on different clients.
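
For instance, with a parameterized host like https://{region}.foo.com, the generated builder would expose the parameter as a client-level option, roughly as below (builder and method names are hypothetical):

FooClient client = new FooClientBuilder()
        // client parameter from the @server url template
        .region("westus")
        .buildClient();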

@versioned

Multiple api-versions map to multiple enum values in the ServiceVersion class. The last api-version is treated as the latest.

public enum FooServiceVersion implements ServiceVersion {
    V2022_06_01_PREVIEW("2022-06-01-preview"),
    V2022_12_01_PREVIEW("2022-12-01-preview");
}

One can use the service-name emitter option to change the name of the class.

Different versions for different clients are supported as a preview feature. This would result in one ServiceVersion class per client.
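
The generated builder would typically allow selecting the service version, defaulting to the latest, e.g. (the builder name is hypothetical):

FooClient client = new FooClientBuilder()
        .serviceVersion(FooServiceVersion.V2022_06_01_PREVIEW)
        .buildClient();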

Pageable operation

The service is recommended to use operation templates like ResourceList<> from @azure-tools/typespec-azure-core.

Method signature:

PagedFlux<BinaryData> list(...)

PagedIterable<BinaryData> list(...)
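
PagedIterable implements Iterable, so a typical usage would be (the list signature with RequestOptions is an assumption for a protocol method):

PagedIterable<BinaryData> items = client.list(new RequestOptions());
for (BinaryData item : items) {
    System.out.println(item);
}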

Authentication

@useAuth(OAuth2Auth<[AuthFlow]> | ApiKeyAuth<ApiKeyLocation.header, "x-ms-api-key">)
namespace ...;

model AuthFlow {
    type: OAuth2FlowType.clientCredentials;
    tokenUrl: "https://api.example.com/oauth2/token";
    refreshUrl: "https://api.example.com/oauth2/refresh";
    scopes: [
        "https://api.example.com/.default"
    ]
}

Only OAuth2 (with scopes) and ApiKey (in header) are supported. They produce the TokenCredentialTrait and AzureKeyCredentialTrait traits in the builder, respectively.
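
In the generated builder, these traits translate to credential methods, roughly as below (the builder name is hypothetical; DefaultAzureCredentialBuilder is from azure-identity, AzureKeyCredential from azure-core):

// OAuth2 -> TokenCredentialTrait
FooClient oauthClient = new FooClientBuilder()
        .credential(new DefaultAzureCredentialBuilder().build())
        .buildClient();

// ApiKey in header -> AzureKeyCredentialTrait
FooClient keyClient = new FooClientBuilder()
        .credential(new AzureKeyCredential("<api-key>"))
        .buildClient();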

Round-trip model in PUT

A PUT method is usually defined as ResourceCreateOrReplace<>, for example:

op createOrUpdate is ResourceCreateOrReplace<Project>;

The model of the request body is ResourceCreateOrReplaceModel<TResource>, which passes through multiple templates/decorators. Hence, its definition is no longer the same as TResource.

The SDK is still required to use the same model for the request body and the response body, for example:

Project createOrUpdate(String projectName, Project project);

Model in JSON Merge Patch

In design.

Convenience API is not generated for JSON Merge Patch.

Long-running operation (design changes expected)

The service is recommended to use operation templates like LongRunningResourceCreateOrReplace<> from @azure-tools/typespec-azure-core.

At present, the emitter recognizes the @pollingOperation decorator on the operation (for now, also the @pollingLocation decorator on response headers).

Method signature:

PollerFlux<BinaryData, BinaryData> beginCreateOrUpdate(...)

SyncPoller<BinaryData, BinaryData> beginCreateOrUpdate(...)

The convenience API takes the response type of the @pollingOperation API as the poll response type, and the response type of the @finalOperation API as the final result type. If there is no @finalOperation, it deduces the final result type from the response type of the LRO API itself (actually the activation API), which could be incorrect.
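
A typical usage of the sync poller (parameter names are hypothetical):

SyncPoller<BinaryData, BinaryData> poller = client.beginCreateOrUpdate(projectName, project, new RequestOptions());
poller.waitForCompletion();
BinaryData finalResult = poller.getFinalResult();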

Exception

The SDK uses exception classes from azure-core, e.g. HttpResponseException, ClientAuthenticationException, ResourceNotFoundException, ResourceModifiedException.

TypeSpec is not yet able to specify whether a particular status code is expected or not. Therefore, at present, any status code equal to or larger than 400 is treated as unexpected.
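
For example, a caller could handle an unexpected status code like this (the operation name is hypothetical):

try {
    client.getWidget("id", new RequestOptions());
} catch (ResourceNotFoundException e) {
    // 404
    int statusCode = e.getResponse().getStatusCode();
} catch (HttpResponseException e) {
    // any other unexpected status code (>= 400)
    int statusCode = e.getResponse().getStatusCode();
}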

Convenience API and models

The service uses the @convenientAPI decorator from @azure-tools/typespec-client-generator-core.

The operation would then have:

convenienceApi:
  language:
    default:
      name: <convenience-api-name>

And all related models (objects and enums) would be annotated with usage:

usage:
  - convenience-api

Only models having convenience-api in usage are generated as Java files.

A model used as the response body of a pageable operation is generated in the implementation/models package, as the class does not need to be accessed by the user.

Emitter options

Options for cadl-java can be specified in tspconfig.yaml.

For instance:

emit:
  - "@azure-tools/cadl-java"
options:
  "@azure-tools/cadl-java":
    emitter-output-dir: "{project-root}/azure-ai-language-authoring"
    namespace: "com.azure.ai.language.authoring"
    service-name: "Authoring"
    partial-update: false
    service-versions:
      - "2022-05-15-preview"
    namer: false
    generate-samples: true
    generate-tests: true

A few dev options are reserved for developers:

    dev-options:
      generate-code-model: true

Client and operation group

The service uses the @client and @operationGroup decorators from @azure-tools/typespec-client-generator-core.


Discussion (2022/06/30)

IPC between Node.js and Java

As the CADL compiler is Node.js, and the code generator is Java, some kind of IPC is required.

Candidates (brainstorm):

  • IPC supported by CADL package
  • A daemon service for IPC (e.g. Codegen calls getAllRoutes via a REST API, the daemon makes the same call to the CADL compiler, then sends the response to Codegen as JSON)
  • Compile both to binary, e.g. WebAssembly or GraalVM
  • Java runs JavaScript engine, e.g. J2V8

Proof of concept (ClientModel as IPC)

A standard flow, without much advanced tech stack, would be (this is what Python does):

flowchart LR
  CADL.compiler-->Java.emitter-- yaml -->Codegen-->Java

The yaml is the intermediate data for communication between the CADL compiler and the code generator. It is in the format of the internal ClientModel of the code generator.

The Java.emitter is a TypeScript library that interacts with the CADL compiler and outputs the yaml.

Sample:

Design and improvements (brainstorm):

  • Limit the amount of Java.emitter code that is in TypeScript, as we are Java developers. But it might still need to cover what we had in the preprocess module and the mapper package of the javagen module.
  • Should we use YAML or JSON? The difference is that snakeyaml in Java is not easy to use, but YAML supports anchors and references natively.
  • Should we directly aim for ClientModel, or for some data format more aligned with the CodeModel from Modeler Four?
  • Should we generate only the essential part of the ClientModel, and let the code generator fill in the rest? E.g. only include ProxyMethod in the YAML and generate ClientMethod from it; only include ServiceClient in the YAML and generate ClientBuilder from it.
  • One difficulty is that the classes initialized by snakeyaml are not compatible with the existing Builder pattern. In the PoC, the workaround is many additional setter methods.

Current state:

  1. The Builder pattern (and the immutability of basic ClientModel objects) is a major source of incompatibility with YAML.
  2. The Singleton pattern (e.g. a single ClassType.UserDefinedModel as the IType for a single model) and multiple references (e.g. ProxyMethod referenced from both Proxy and ClientMethod) are a major source of incompatibility with JSON, which does not support anchors and references (see *ref_ in the YAML).
  3. Duplication (e.g. lots of ClientMethods for a single request in an operation) is a manageable issue.
  4. Some code in Mapper would need to be re-written, either in TypeScript, or in Java but based on ClientModel.

Complication of losing CodeModel

The CodeModel from Modeler Four is much easier to analyze and manipulate than what we have now in ClientModel. For example, management-plane does lots of analysis and modification based on the CodeModel.

On the contrary, ClientModel has more duplication in its data representation. E.g. data about a model could be in IType, ClientModel, and maybe in other classes that hold a reference to the model.

A few DPG features, like selectively generating models for operations, would require analyzing the operation and the models used in its parameters and responses, and then the hierarchy/references of the models. We would either put that logic in TypeScript, or make ClientModel easier to analyze.

Continue to use CodeModel as IPC

Another direction to explore, with standard flow, is to let Java.emitter output a simplified version of CodeModel.

One advantage in development is that this almost completely de-couples the work on TypeScript from the work on Java. Work on TS would focus on generating a correct CodeModel from CADL, and work on Java would focus on consuming data from existing swagger for downstream development, then switching to CADL when the emitter is completed and tested.

In the long term, a language-agnostic domain-specific data format (such as the CodeModel and its evolution) helps the developer think about what essential information we need to pass from CADL to the code generator, not about what the Java code needs. For example, when encountering the removedOn decorator, we might be tempted to jump in and think about method overloads or model de-serialization in Java.

An apparent drawback is that CADL is already language-agnostic, and there is no better representation than CADL itself. However, if we need data exchange between TypeScript and Java, the CodeModel might still be the right compromise between what we are familiar with (and know to work) and what is optimal (as we cannot output CADL itself).

Another drawback is that having another abstraction layer could have some cost on the speed of design and implementation. If we need to support a new feature from CADL, we have to think about how to represent it in a language-agnostic way, and then how to transform it into the ClientModel that is best for Java code.

Make an RPC to CADL compiler

Another thought is to make the Java.emitter a daemon providing RPC for Codegen (this one is likely not going to fit in the August schedule).

When the code generator calls getAllRoutes (possibly routed to /localhost/getAllRoutes), the emitter in TypeScript would in turn call getAllRoutes from @cadl-lang/rest and reply with the response as JSON.

This way, the code generator works almost directly with the CADL compiler, and the JSON in the response serves as the intermediate data. There is no need for any other language-agnostic domain-specific data format.

There is a lot to verify on this approach.

  • Are the responses of all CADL APIs representable in JSON?
  • How does the code generator handle the raw JSON? Do we still use a model in Java to de-serialize it? Does it affect the feasibility of evolution if CADL decides to change the response data?

Current state:

  • The response of getAllRoutes cannot be serialized to JSON, due to circular references.