DTD and XSD – The XML side you didn’t know about

Don’t you wonder why most job openings for experienced software developers list XML as a requirement? “Why put XML as a requirement for an experienced developer if it’s just XML?” That’s the moment you realize there is something about XML you don’t know yet.

My name is MJ. I’m no XML expert. I’m just a man with enough patience to help you with that particular question up there.

Welcome.

XML

XML stands for Extensible Markup Language. It’s a document encoding format that is both human-readable and machine-readable.

If you are a Java programmer, there is no way you don’t know XML. Remember pom.xml, web.xml, beans.xml, build.xml, log4j.xml? These, my friend, are XML documents.

Let’s create a sample XML document that we will use to review the basics of XML.
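A minimal sketch of such a sample document: one school element containing two student elements, one of which has a job child element. The attribute names and all the values are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Root element: the single top-level element of the document -->
<school name="Example School">
  <!-- An element with an attribute and text content -->
  <student id="1">Ana</student>
  <!-- An element with a child element -->
  <student id="2">
    <job>Developer</job>
  </student>
</school>
```

Counting school, the two student elements and job gives the 4 elements discussed in the next section.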

Elements and Tags

“An element is a logical document component that either begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag”. This sentence was pulled from Wikipedia. I couldn’t resist its simplicity, so I decided to quote it.

Our sample XML document has one element with the tag name school and two elements with the tag name student, and one of the students has a child element with the tag name job. That makes 4 elements in total. You got this, right? I will assume you said “yes” and move on.

The extensibility of XML lies in the fact that you can create your own tags and mark up your documents with them as you wish.

XML documents are intended to be parsed by machines and, at the same time, understood by humans. But how do we make sure the machine is reading a document with elements it knows how to process? XML allows us to create document contracts that define a limited set of tags and the attributes allowed for each tag. These contracts even allow us to restrict the values of those tags and attributes.

There are two ways we can define such XML document contracts: XSD and DTD. The two fellows that brought us to this article. Here they come!

DTD

I want you to assume that every time I say “schema” I could just as well be saying “language”. Why? Because they are the same thing. A language has a set of rules and constraints to be observed, the same way a schema has. It’s common in the software industry to use the term “XML language” to refer to an XML schema.

DTD stands for Document Type Definition. It’s a language that can be used to define XML schemas. DTD came out before XSD, and its syntax has few things in common with XML.

Let’s take a DTD schema document as an example. Our DTD schema defines one “note” element which has 4 child elements: “to”, “from”, “heading” and “body”.

What does an XML document following this DTD schema look like? It is a note element marked up with those four children. Pay attention to the DOCTYPE declaration of such a document: that is how we make reference to the DTD schema document.

If, for example, we add a new element to the document or rename one of the existing ones, say by introducing “sender” and “origin” elements, we no longer have an XML document that conforms to our DTD schema. The schema never mentioned a “sender” element, nor an “origin” element. If we were to validate this document against the DTD schema, it wouldn’t pass.
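As a rough sketch of the note example described above, here is a conforming document with its DTD. To keep everything in one file, the DTD is written as an internal subset inside the DOCTYPE, which is an assumption of this sketch. The article itself references a separate DTD file, and the file name in the comment, like all the text values, is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- With an external DTD file the DOCTYPE would instead look like:
     <!DOCTYPE note SYSTEM "note.dtd">  (hypothetical file name) -->
<!DOCTYPE note [
  <!-- note must contain exactly these four children, in this order -->
  <!ELEMENT note (to, from, heading, body)>
  <!-- each child holds plain text only -->
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Reader</to>
  <from>MJ</from>
  <heading>Reminder</heading>
  <body>Read about DTD and XSD this weekend.</body>
</note>
```

And a sketch of a document that is still well-formed XML but no longer valid against the same DTD. Which element was renamed and which was added is an assumption here: “from” is renamed to “sender”, and “origin” is new.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Reader</to>
  <!-- "sender" is not declared in the DTD, and "from" is now missing -->
  <sender>MJ</sender>
  <!-- "origin" is not declared in the DTD either -->
  <origin>Blog</origin>
  <heading>Reminder</heading>
  <body>Read about DTD and XSD this weekend.</body>
</note>
```

A validating parser should reject the second document, while a non-validating parser will still read it without complaint.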

XSD

XSD stands for XML Schema Definition. It’s an XML language used to define XML schemas. Yes, you heard it right: XML language.

What do I mean by XML language? An XSD schema is itself an XML document, which makes it easier to learn and use than DTD.

XSD brought namespace support to schemas, along with a very important OOP concept: type inheritance. Being XML-based, it provides a more human-readable syntax, and it allows types to be reused through namespaces.
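To give a feel for type inheritance, here is a small sketch in which one type extends another. The type and element names (MemberType, StudentType, name, course, student) are hypothetical and not taken from the article's own listings. The xsd prefix is explained in the namespace sections further down.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Base type: anything that has a name -->
  <xsd:complexType name="MemberType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Derived type: inherits "name" from MemberType and adds "course" -->
  <xsd:complexType name="StudentType">
    <xsd:complexContent>
      <xsd:extension base="MemberType">
        <xsd:sequence>
          <xsd:element name="course" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- An element of the derived type must contain name, then course -->
  <xsd:element name="student" type="StudentType"/>

</xsd:schema>
```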

Namespaces – the need

An XML document is composed of elements that may have attributes (properties). XSD allows you to restrict the format of a property value, the minimum number of times an element must be present in the document, the maximum number of times it may be present, and the list of possibilities goes on. XSD defines restrictions at the element level.

XML schemas might become huge, and we may wish to have multiple elements with the same name. This brings up the need to group elements and attributes into some kind of package, so that elements and attributes with the same name can live in different packages. Remember Java? The reason why packages exist? It’s the same reason namespaces exist. Remember C#? The reason namespaces exist there is the same reason XSD supports namespaces.

Since XSD restrictions are defined at the element and attribute level, we may wish to reuse elements and attributes we declared in previous XSD schema files. We should be able to import them and use them as we wish, the same way we do in Java, C# or anything else that implements some kind of type packaging or namespacing.

This is enough for you to understand that XSD solves a real problem by supporting namespaces.

Now, let’s understand the way XSD implements namespaces. But first you should know what a namespace value looks like: namespaces are values in URI format, e.g. http://www.w3.org/TR/html4/. That wasn’t a hyperlink for you to follow. That was a namespace example. A namespace doesn’t have to be a link that points to an XSD file.
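The name clash that namespaces solve can be sketched with a single document that uses two different “table” elements. The catalog root element, the child element names and the second namespace URI (http://example.com/furniture) are assumptions made for illustration. The prefix syntax used here is covered in the sections that follow.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:h="http://www.w3.org/TR/html4/"
         xmlns:f="http://example.com/furniture">
  <!-- "table" from the h namespace: rows and cells -->
  <h:table>
    <h:tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </h:tr>
  </h:table>
  <!-- "table" from the f namespace: a piece of furniture -->
  <f:table>
    <f:name>Coffee table</f:name>
    <f:width>80</f:width>
  </f:table>
</catalog>
```

Both elements are called table, yet a parser can tell them apart because each one belongs to a different namespace, much like two classes with the same name in different Java packages.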

Target namespaces

The target namespace is the namespace we want our XML elements to be published to.

The target namespace of an XSD schema document is defined on its root element, which is the schema element. Let’s now write an almost empty XSD schema document: just a schema element that carries a targetNamespace attribute and an xmlns attribute.

I believe you found the schema element in the schema example, and you have also seen the namespace to which we will be publishing our XML elements. But you have also seen the xmlns attribute, and you might be wondering: what is that supposed to mean?
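The almost empty schema described above might look like the following sketch. The target namespace URI (http://example.com/school) is a made-up value. As noted earlier, it does not need to point to anything real.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xmlns: makes the XML Schema namespace the default one, so "schema"
     needs no prefix.
     targetNamespace: the namespace our own declarations are published to. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.com/school">
  <!-- type and element declarations would go here -->
</schema>
```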

Namespace importation

If we can define a target namespace, then we should be able to import the elements of our namespace later, right? That is the point of having a namespace; it’s not there just because it’s a beautiful attribute. Now, the question is: what counts as importing an XML schema, and how is it done? Importing a namespace means one of two things:

  1. Import all the elements of a specific namespace to the default namespace
  2. Define a prefix to be used when referring to elements of a specific namespace

What we have done back there is import all the elements of the http://www.w3.org/2001/XMLSchema namespace into the default namespace. The schema element is part of that namespace, and we are not using any prefix to access it.

What if we wanted to define a prefix for the http://www.w3.org/2001/XMLSchema namespace? We just need to add the prefix right after the xmlns attribute name: xmlns:xsd. Now we can only access elements of the http://www.w3.org/2001/XMLSchema namespace using the xsd prefix.

Just so you know: we can have as many xmlns:prefix declarations in our schema element as we wish, as long as they do not share the same prefix name. Any XML element can hold an XML namespace importation, in which case the elements of the imported namespace are only available on the declaring element and inside its body.
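A sketch of the same almost empty schema, this time with a prefix instead of a default namespace. The target namespace URI is again a made-up value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xmlns:xsd binds the prefix "xsd" to the XML Schema namespace, so
     every element of that namespace must now be written as xsd:name -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.com/school">
  <!-- declarations would go here, e.g. xsd:element, xsd:complexType -->
</xsd:schema>
```

Writing a plain schema element here would no longer work, because without a default namespace declaration the unprefixed name belongs to no namespace at all.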

XSD schema example

Let’s look at a basic XSD schema document just to see how easy XSD is.

Our XSD schema defines one data type, which is PersonType. The PersonType has two child elements: one of type string and another of type integer. Both of these elements are mandatory because they both have minOccurs equal to 1. The maxOccurs property defines the maximum number of times the element may appear in a valid XML document.

Our XSD schema also defines a root element of type PersonType. This means that the constraints declared for PersonType will be used to validate our XML root element.

In case you are not sure what a root element is: it’s the top-level XML element. Every well-formed XML document has exactly one. The root element of an HTML4 document is html. I guess you understand it now.

The name our XSD schema defines for our root element is “Person”. A valid XML document conforming to our XSD schema is therefore a Person element holding the two child elements. That is it. There is nothing special here. To make such a document break our XSD constraints, you just have to replace the value of the age tag with a non-numeric string value.
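A sketch of a schema matching this description. The name of the string child element (name) and the maxOccurs values are assumptions. PersonType, Person and age come from the text. The sketch leaves out a target namespace to keep the matching document as plain as possible.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Data type: a sequence of two mandatory child elements -->
  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string" minOccurs="1" maxOccurs="1"/>
      <xsd:element name="age" type="xsd:integer" minOccurs="1" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Root element declaration: Person is validated against PersonType -->
  <xsd:element name="Person" type="PersonType"/>

</xsd:schema>
```

And a document that should be valid against it, assuming the schema is saved next to it under the hypothetical file name person.xsd:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xsi:noNamespaceSchemaLocation hints where the schema file lives -->
<Person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="person.xsd">
  <name>MJ</name>
  <age>30</age>
</Person>
```

Changing the age value to something like thirty should make validation fail, since that text is not a valid integer.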

How to test an XSD without having to run into trouble

You can simply follow this link and have fun writing some XSD schemas and testing them, or you can just open IntelliJ IDEA, create a new project, create two files, one with the .xsd extension and another with the .xml extension, and start having fun. Another option is to look for other tools; there are plenty, I’m sure. Just dig, and they will pop up.

Elements and Data Types

If you pay attention to our XSD schema, you will notice that there are two declarations inside the schema element: complexType and element. What you need to understand is that we have declared a data type named PersonType, and we have also declared an element named Person of type PersonType. XML documents are composed merely of elements, while XSD documents are composed of data types and elements. Elements must always define their data type. That doesn’t mean every XSD document must have data type declarations, no. You may use the data types that are already declared in the http://www.w3.org/2001/XMLSchema namespace, for example string and integer.

For now, this is all I had planned for this trip. Thank you for passing by. It’s always a pleasure.

If you are interested in getting your hands on the artifacts of this article (XML, DTD and XSD code), please get them from a GitHub repository by clicking here.
