XSD.exe and Reuse of Generated Classes

What is Xsd.exe?
It's a small utility bundled with the .NET Framework SDK, typically somewhere in the NETFX 4.8 Tools folder. I usually just open the Visual Studio Developer Command Prompt and, magically, it's in my %PATH%.

What's it for?
It is a tool that lets you generate C#/VB.NET classes from an XML schema, for use with XmlSerializer, typically to help read/write XML from/to business objects. Or you can point it at some XML and have it infer the schema (and you really shouldn't - well, maybe to get a first draft, but otherwise don't).
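To make the read/write part concrete, here is a minimal sketch of an XmlSerializer round trip. The Document class, its members, and the urn:example:docs namespace are hypothetical, hand-written stand-ins for what xsd.exe might generate; only XmlSerializer and its attributes come from the framework.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for an xsd.exe-generated class.
[XmlRoot("Document", Namespace = "urn:example:docs")]
public class Document
{
    [XmlElement("Title")]
    public string Title { get; set; }

    [XmlAttribute("version")]
    public int Version { get; set; }
}

public static class XmlRoundTrip
{
    // Business object -> XML text
    public static string ToXml(Document doc)
    {
        var serializer = new XmlSerializer(typeof(Document));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, doc);
            return writer.ToString();
        }
    }

    // XML text -> business object
    public static Document FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Document));
        using (var reader = new StringReader(xml))
        {
            return (Document)serializer.Deserialize(reader);
        }
    }
}
```

Calling XmlRoundTrip.FromXml(XmlRoundTrip.ToXml(doc)) should give back an object with the same Title and Version.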

But that sounds great! Why would you not love this?
And I'll tell you. If you've ever tried referencing a SOAP/WCF service from Visual Studio, you have been given the option to reuse existing types in a DLL. This means that if you're using multiple services from the same site, you can have one Document class (or whatever the domain is) instead of one Document class per service. Granted, there will be some fiddling to make it work, but it does work, and that matters. Xsd.exe doesn't support that.

I have 80 document types, give or take, that share imports/includes of schemas from a shared domain. Our application uses about 20 of these document types, and running xsd.exe on each of those 20 produces about 5-10 identical class definitions per domain object - and that's a problem.

Another thing is the built-in modularity features of XML Schema. You can xsd:include another schema only if the two schemas share the same target namespace. If they don't, you have to xsd:import the other namespace instead. Xsd.exe resolves included schemas but not imported ones (read: I couldn't make it work), and only if they're on your local disk; http(s) schemaLocations are not supported. I found a blog post describing how to get around this little issue, which was very handy. As it happened, I had all the referenced schemas on disk, but since you have to name the imported schemas explicitly on the xsd.exe command line, I modified that solution to build xsd.exe command lines that suit my needs.

Handling duplicate class declarations: For this I used Microsoft.CodeAnalysis (Roslyn). You parse the generated code into a CSharpSyntaxTree, and from there you can extract whatever you need from the xsd.exe-generated code. With this, I built a simple tool to:

  1. Build an xsd.exe expression per root document xsd and run them.
  2. Parse all the generated code using Microsoft.CodeAnalysis and build unique lists of 
    1. ClassDeclarations
    2. EnumDeclarations
  3. Use said lists to write new generated classes.cs and enums.cs and delete all other generated code.

Show me the code! Well, no. It's a bit domain-specific right now, but you can have a few bits and pieces:

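// Recursively walk a schema's imports/includes and collect the imported
// schemas in the files collection (dependencies first, importer last).
// `files` and `dir` are fields of the enclosing class.
private void Do(bool import, string file)
{
    // Already collected: nothing to do, and no need to read the file again
    if (files.Contains(file))
        return;

    var contents = File.ReadAllText(file);
    var regex = @"(import|include).*?schemaLocation=""(.*?)""";

    // For every import/include: (is it an import?, absolute path of the schema)
    var toProcess = Regex.Matches(contents, regex)
        .OfType<Match>()
        .Select(m => (inc: m.Groups[1].Value, sl: m.Groups[2].Value))
        .Select(m => (inc: m.inc == "import", path: Path.GetFullPath(Path.Combine(dir, m.sl))));

    foreach (var nested in toProcess)
        Do(nested.inc, nested.path);

    // Only imported schemas have to be named on the xsd.exe command line
    if (import)
        files.Add(file);
}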

And for each root document, you reverse the files list and build an expression like "xsd.exe /c file1.xsd file2.xsd file3.xsd /namespace:foo /o:outputDir" (note that /o: names an output directory; xsd.exe derives the .cs file name from the schema file names), then hand it to a process runner such as System.Diagnostics.Process.
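As a rough sketch of that step, the helper below builds the argument string and starts xsd.exe. XsdRunner, BuildArguments and Run are names made up for this example, and it assumes xsd.exe is on the PATH (as in a Developer Command Prompt). It passes a directory to /o:, which is how the xsd.exe documentation describes the /out switch.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper, not part of the original tool.
public static class XsdRunner
{
    // Quote every schema path so paths containing spaces survive,
    // then append the switches used in the post.
    public static string BuildArguments(
        IEnumerable<string> schemaFiles, string codeNamespace, string outputDir)
    {
        var quoted = schemaFiles.Select(f => $"\"{f}\"");
        return $"/c {string.Join(" ", quoted)} /namespace:{codeNamespace} /o:\"{outputDir}\"";
    }

    // Starts xsd.exe, waits for it to finish, and returns its exit code.
    // Assumption: xsd.exe can be found on the PATH.
    public static int Run(string arguments)
    {
        var startInfo = new ProcessStartInfo("xsd.exe", arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Usage would be something like XsdRunner.Run(XsdRunner.BuildArguments(files, "foo", outputDir)), once per root document. An output directory ending in a backslash would need extra care, because the backslash would escape the closing quote.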

With that out of the way, process all the generated .cs files: 

// Read generated source, extract namespace, classes and enums
// and store them uniquely in appropriate collections
private void Disambuiginator(string sourceFile)
{
    var contents = File.ReadAllText(sourceFile);
    var syntaxTree = (CSharpSyntaxTree)CSharpSyntaxTree.ParseText(contents);
    var root = syntaxTree.GetCompilationUnitRoot();
    var classes = root.DescendantNodes().OfType<ClassDeclarationSyntax>().ToArray();
    var enums = root.DescendantNodes().OfType<EnumDeclarationSyntax>().ToArray();

    if (theNamespace == null)
    {
        // only one namespace ever. Stored in static variable.
        // Strip classes and enums in a single call: RemoveNodes returns a new
        // tree, so a second, chained call would no longer find the original nodes.
        var ns = root.DescendantNodes().OfType<NamespaceDeclarationSyntax>().Single();
        theNamespace = ns.RemoveNodes(
            classes.Cast<SyntaxNode>().Concat(enums),
            SyntaxRemoveOptions.KeepNoTrivia);
    }

    foreach (var @class in classes)
        AddToGeneratedClasses(@class.Identifier.ValueText, @class);
    foreach (var @enum in enums)
        AddToGeneratedEnums(@enum);
}
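AddToGeneratedClasses and AddToGeneratedEnums are not shown in this post. A minimal sketch of what they could look like, assuming "unique" means the first declaration seen per type name wins. The GeneratedTypes class and its Conflicts list are inventions for this example; the list records names whose later declarations differ from the first one, so they don't disappear silently. It needs the Microsoft.CodeAnalysis.CSharp NuGet package.

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical container for the de-duplicated declarations.
public class GeneratedTypes
{
    private readonly Dictionary<string, ClassDeclarationSyntax> classes =
        new Dictionary<string, ClassDeclarationSyntax>();
    private readonly Dictionary<string, EnumDeclarationSyntax> enums =
        new Dictionary<string, EnumDeclarationSyntax>();

    // Names that showed up more than once with differing declarations.
    public List<string> Conflicts { get; } = new List<string>();

    public IEnumerable<ClassDeclarationSyntax> Classes => classes.Values;
    public IEnumerable<EnumDeclarationSyntax> Enums => enums.Values;

    public void AddToGeneratedClasses(string name, ClassDeclarationSyntax declaration)
    {
        if (!classes.TryGetValue(name, out var existing))
            classes.Add(name, declaration);   // first time we see this name
        else if (!SyntaxFactory.AreEquivalent(existing, declaration, false))
            Conflicts.Add(name);              // same name, different declaration
    }

    public void AddToGeneratedEnums(EnumDeclarationSyntax declaration)
    {
        var name = declaration.Identifier.ValueText;
        if (!enums.TryGetValue(name, out var existing))
            enums.Add(name, declaration);
        else if (!SyntaxFactory.AreEquivalent(existing, declaration, false))
            Conflicts.Add(name);
    }
}
```

If Conflicts ends up non-empty, two schemas produced differing types under the same name, and that is probably worth a look before the duplicates are thrown away.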

And finally, to write each class to its own file:

private void CreateClassFile(string path, string className, ClassDeclarationSyntax cds)
{
    var unit = SyntaxFactory.CompilationUnit();
    var ns = theNamespace.AddMembers(cds);
    unit = unit.AddMembers(ns);
    File.WriteAllText(Path.Combine(path, $"{className}.cs"), unit.ToFullString());
}

and similarly for enums. That's it. Remember to store the program for the next batch of schema changes!
