XML Entities by example #5634
|
The build failures are expected. They will be fixed by doc-base PR 307. I opened this early to allow more time for comments. |
|
We should also update docbookcs.xml, or these entities will not be loaded. Can you add those, or would you be more comfortable if I did them? |
|
I will update |
|
And |
|
The |
|
It's a bit of a whacky setup though because of the doc-base dependency so I understand the confusion.
|
|
I'm reading these like this:
<entities>
<!-- These are files from doc-en... -->
<file>contributors.ent</file>
<file>extensions.ent</file>
<!-- ... so anything outside doc-en needs a relative path -->
<directory>../doc-base/entities/</directory>
<file>../doc-base/temp/file-entities.ent</file>
</entities>
Is this correct? As in, the current directory is the path of the first /project/directory? Also, would this:
<entities>
<directory>.</directory>
<directory>../doc-base/</directory>
</entities>
not do the same as listing all files in each directory? |
|
It does do the same thing. It just scans more files. But since we will have .ent files in the references folder (per ext), it's fine with me for doc-base. Though I would prefer that the doc-base paths stay more specific, as they were. WDYT? |
Specific paths for doc-base are a very nice thing; that was only an example.
Now this got me worried. There will be new .ent files in the entities and reference folders. But these are neither DocBook XML files nor concatenated DTD entities. They are normal XML files, yes, but with a lot of DocBook fragments. These new .ent files can be detected because they have a <?xml header. But many new .xml files in the entities folder are at most well-balanced text, without a <?xml header. That is, they are for the most part invalid DocBook XML, and many are in fact not even valid XML. How will docbookcs react to them? For now, having only specific paths for DTD entities will suffice, but the new entities folder may need some form of exclusion. Something like a Thinking ahead, if docbookcs is a local project, I have some bold suggestions:
In fact, I'm very curious about how the sniffer loads the actual XML files with undefined entities, as I cannot get any bundled PHP XML parser to read those. I will try to study how docbookcs does that. But if docbookcs is not a local project, I think I will need to change XML Entities to use a different extension for XML fragments in the entities/ folder. |
|
I think we would have to do some testing outside of this PR. docbook-cs is a project we control. Perhaps we can do something like:
<entities>
<directory type="dtd" />
<directory type="xml" />
</entities> |
That would be very nice, and would map directly onto what the manual sources already do. For now this is buried inside the DTD entities, and XML Entities will make it visible as individual XML fragment files. About these fragment files in entities/, I could specify that they are .xml files without any XML declaration, so all files in entities/ (.ent and .xml) would be special in some way, if this makes things easier on your side. Two side things:
|
This is a tour of the XML Entities project, showing how it works by example. The final objective is to remove all DTD entities in favor of "XML Entities", implemented in the last year.
Entity deletion
The first example is the removal of an unused entity from doc-en. In particular, the file doc-en/contributors.ent has had two empty DTD entities for a long time. But simply removing these entities would break the build of the manual, as they are referenced in a file in another repository (in this case, doc-base/manual.xml). These cases of interdependence between repositories are common, most of all between doc-base and the translations, and this makes the evolution of the doc-en manual harder than it needs to be.
So, to remove these unused entities from doc-en without breaking all translations, the removed entities are "moved" to doc-en/entities/entities-remove.ent, where they remain declared and usable, but are explicitly marked as deleted for all translations. This way, doc-en does not need to maintain unused entities, and the translations do not need to keep translating them.
Transformation of a DTD Entity into an XML Entity
As an example of the main objective of the XML Entities project, I moved two entities from doc-en/extensions.ent to the new format.
First, the text of the extcat.intro entity was transformed from:
to this:
Mind the clearer syntax. In particular, mind the removal of the two namespace declarations, which are indeed mandatory in the DTD format in some obscure cases, but are optional and unnecessary in the XML format. These new files are fully valid XML, down to namespace declarations, as they can hold textual entities, well-balanced text and also multi-rooted "XML" fragments. So not only are they easier to process with normal XML code, they are also easier to edit, as any XML error will be detected by IDEs, something that does not happen with DTD entities.
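As a rough illustration of the last point, here is a minimal Python sketch of the kind of well-formedness check that any XML tool can run on an XML-format entity file. The file layout shown (an `entities` root with `entity` children) and the entity name are assumptions made up for this example; the real format is the one defined in doc-base.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical entity file contents, for illustration only.
GOOD = """<?xml version="1.0" encoding="utf-8"?>
<entities>
  <entity name="example.intro"><para>Some <emphasis>text</emphasis>.</para></entity>
</entities>"""

# The same content with a misspelled closing tag.
BAD = GOOD.replace("</emphasis>", "</emphasys>")


def check_well_formed(xml_text: str) -> Optional[str]:
    """Return None if xml_text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(check_well_formed(GOOD))  # no error reported
    print(check_well_formed(BAD))   # the parser's message about the bad tag
```

The same mistake inside a DTD entity declaration is just text inside a quoted string, so it only surfaces later, when the entity is expanded during a build.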
The extcat.alphabetical entity demonstrates support for text-only entities, and is duplicated in the example to demonstrate another aspect of the XML Entities project: more detailed reporting of entity collisions and of under- and over-translated entities.
Detailed reporting
After the XML Entities project is fully merged (see php/doc-base#307), merging this PR will start generating a new line in doc-base/configure.php runs. Something like this:
Running text-entities.php directly, with one language, will then report:
And running text-entities.php directly, with two languages, will then report:
So it will be possible to detect, from one place, all duplicated entities, all duplicated translations, and all redefinitions of entities that should not occur.
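The idea of detecting collisions from one place can be sketched in a few lines of Python. This is not the logic of text-entities.php; the function name, the input shape (pairs of source file and entity name) and the sample names are all hypothetical.

```python
from collections import defaultdict


def find_collisions(declarations):
    """Return entity names declared more than once, with the files declaring them.

    `declarations` is an iterable of (source_file, entity_name) pairs.
    """
    sources_by_name = defaultdict(list)
    for source, name in declarations:
        sources_by_name[name].append(source)
    # Keep only names that were seen in more than one declaration.
    return {name: srcs for name, srcs in sources_by_name.items() if len(srcs) > 1}


if __name__ == "__main__":
    # Made-up declarations; the paths and names are placeholders.
    sample = [
        ("en/entities/a.ent", "example.one"),
        ("en/entities/b.ent", "example.two"),
        ("en/reference/foo/entities.ent", "example.one"),  # declared a second time
    ]
    for name, srcs in find_collisions(sample).items():
        print(f"{name}: declared in {', '.join(srcs)}")
```

Because every declaration passes through a single function, duplicates are found regardless of which file or folder they come from, which is the property the consolidated report relies on.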
Big entities as individual files
The new doc-lang/entities/ path is not limited to .ent files. Any name.xml file placed here will be compiled as an individual entity, where the name of the entity maps to the file name (minus the .xml), and the contents of the entity map to the contents of the file.
No more gigantic and hard to edit entities in language-snippets.ent. These can be moved here.
Per extension entities
Finally, another feature of XML Entities is the ability to have per extension .ent files, inside each doc-lang/reference/ dir, so language-snippets.ent can also be split. These per extension entity files are processed in one place, and will be reported in consolidated form, as above.
The plan
If there are no hard objections, wait for the merging of doc-base PR 307, then merge this.
Then, start new PRs converting all .ent files of doc-en (minus language-snippets.ent). While language-snippets.ent is manually edited into per extension entity files, it will shrink until it is deleted, and finally, all manually edited DTD files can be erased from the manual.
And... that's it.
Comments and reviews welcome. I plan to leave this open for at least two weeks before merging.