Published 2005-06-10 12:58:22

I love good documentation, but do not love writing or maintaining it. I'm not talking about the nice docbook manuals in PEAR or PHP, but good old reference manuals for C APIs.

As anyone who has worked with gtk or gnome will tell you, the semi-automated API docs that are generated for things like gtk and gnome-db are a godsend. They frequently make the difference between three hours' work to code something up and a few days. This was one of the reasons why writing DBDO was not too complex.

However, the downside is that DBDO interacts with two APIs, gnome-db and PHP. On one side, there are detailed API documents, along with a highly structured design (gobject). On the other is an organically grown API, which has evolved relatively undocumented (except for a few articles and the extension guide in the PHP manual).

Now that DBDO has reached a point where it implements the basic functionality, it has become quite clear that building it (or more specifically the libraries that it depends on) is extremely complex. And while it is incredibly featureful and easy to code against, its adoption is always going to be affected by this barrier to entry (including my enthusiasm to set it up on my clients' boxes). So as a parallel effort I've been exploring using PDO as the backend for DBDO.

The Itch
My work on these projects is usually restricted to a couple of evenings a week, as it's not exactly fee paying, and I'm often busy fighting bugs or other workload during the day. It had become quite clear in doing this that the complexity of both the PHP and PDO APIs, along with the lack of documentation, was leading to a situation where I was spending at least half of my time looking through lxr.php.net, working out which bit of the API I needed to use for each task.

So in a fit of frustration, I started looking at both documenting and simplifying the APIs (initially of PDO, and then wandering off to consider PHP).

Documentation generation
Looking at the first of these two problems, documentation, it's clear that gnome/gtk's way of documenting APIs produces very effective results. It's quite easy to look at any gtk project, understand the underlying ideas, and locate the methods that are likely to be the best match, just by browsing through the documents (although images of the widgets would frequently be nice).

Taking gnome-db as an example, it uses 'gtk-doc' to parse the .h files (using perl), reading a few tags. It then merges this with docbook templates, which have placeholders for the API details (like the synopsis), and uses a docbook tool to actually render the result to HTML (or other formats as required).

While this works really well, it brings back the very thing I started off complaining about: the need to actually 'love and care' for the generation of API docs. While it's a great idea, the reality is that most of the people capable of documenting the internals of PDO or PHP would much rather be doing far more interesting things, especially if they are not getting paid for it.

It's also pretty clear that a majority of users really only need the HTML output these days. While the other formats are nice, C API documents are not exactly masterpieces which are flying off the shelf of your local bookshop. So the value of using docbook is questionable in relation to the time and effort required to deliver a solution like this.

Simplicity of API doc parsers
The gtk-doc toolkit starts off as a very simple set of perl scripts to parse a .h file. However, like javadoc and phpdocumentor, it soon runs into the problem that parsing structured information from @tags can be both complex and cumbersome. For something as industrial as documenting C APIs, I began to wonder if ruling out the majority of this complexity was perhaps a good idea. A 'let's get down to basics' approach seems like it would be more suited to the situation.

To this end, I tried out having @blocks that only had a key and a value. The value was just text, and was never intended to be processed in any depth. I came up with a simple comment block to be prefixed to the definition of a function/struct/enum.
/**
* @function the_name_of_the_function
* this is a function.....
*/

/**
* @enum the_name_of_the_enum
* this is an enum.....
*/

/**
* @struct the_name_of_the_Struct
* some comments..
*/

These would only be applied to a piece of code that was in need of documentation, keeping to the idea of minimal impact, high return.

It did not take too much to use a line by line parser to find these blocks, and store the data, along with the following definition, ready to be rendered later.
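As a rough illustration of how little is needed, here is a minimal C sketch of such a line-by-line scan. It is not the actual parser used for these docs; the `doc_block` struct, the `next_doc_block` function and the buffer sizes are all assumptions made up for this example, and it only handles blocks laid out like the ones above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one parsed comment block (names are illustrative). */
struct doc_block {
    char tag[32];          /* key, e.g. "function", "enum", "struct" */
    char name[128];        /* value: the identifier after the tag */
    char text[512];        /* free-text comment lines, stored unprocessed */
    char definition[256];  /* first non-blank line after the block */
};

/* Skip leading whitespace and the '*' that decorates comment lines. */
static const char *skip_decoration(const char *p)
{
    while (*p == ' ' || *p == '\t') p++;
    if (*p == '*' && p[1] != '/') p++;
    while (*p == ' ' || *p == '\t') p++;
    return p;
}

/* Append one comment line to the stored text, separated by a space. */
static void append_text(struct doc_block *b, const char *p)
{
    size_t used = strlen(b->text);
    if (used > 0 && used < sizeof b->text - 1) {
        b->text[used++] = ' ';
        b->text[used] = '\0';
    }
    strncat(b->text, p, sizeof b->text - used - 1);
}

/* Read the next doc block from fp. Returns 1 if one was found, 0 at end of file. */
int next_doc_block(FILE *fp, struct doc_block *out)
{
    char line[512];
    int inside = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';
        const char *p = skip_decoration(line);

        if (!inside) {
            /* Only a line starting with the doc-comment opener begins a block. */
            if (strncmp(p, "/**", 3) == 0) {
                memset(out, 0, sizeof *out);
                inside = 1;
            }
            continue;
        }
        if (strncmp(p, "*/", 2) == 0) {
            /* Block closed: keep the next non-blank line as the definition. */
            while (fgets(line, sizeof line, fp) != NULL) {
                line[strcspn(line, "\r\n")] = '\0';
                if (line[strspn(line, " \t")] != '\0') {
                    strncpy(out->definition, line, sizeof out->definition - 1);
                    break;
                }
            }
            return 1;
        }
        if (*p == '@' && out->tag[0] == '\0') {
            /* First @line gives the key and the value; nothing deeper is parsed. */
            if (sscanf(p + 1, "%31s %127s", out->tag, out->name) < 1)
                out->tag[0] = '\0';
        } else if (*p != '\0') {
            append_text(out, p);
        }
    }
    return 0;
}
```

Calling `next_doc_block` in a loop until it returns 0 yields one record per documented definition, which is all a renderer needs to start with.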

With the addition of a few more very simple tags that help to structure the flow of the resulting output, I was able to generate some simple documentation.
  • @page The title of the page or resulting page for the document... (followed by free text comments)
  • @class ClassName (an abstract name to group sets of functions together)
  • @include filename.h (to include another .h file to build up more complex documents)
Along with this, allowing some HTML tags within the body of the comment enables a bit of formatting, to make things a little clearer.

Documenting APIs can show their less elegant side
It did not take me long to realize, when documenting PDO's API, that although its design is pretty sensible, it was not really designed with exposing a public API in mind. The drivers usually provide a struct with function pointers, which, while being the classic way of doing this, is rather complex to document and illustrate in API docs.

To solve this, I ended up creating a functional wrapper around a lot of these pointer method calls, following the general pattern of gobject type classes.
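To make the pattern concrete, here is a minimal sketch of wrapping a function-pointer struct in a plain function that can carry a doc comment. None of these names come from PDO: `example_dbh`, `example_dbh_methods`, `example_dbh_exec` and the toy driver are all hypothetical, chosen only to show the shape of the idea.

```c
#include <stddef.h>
#include <string.h>

struct example_dbh;

/* Hypothetical driver method table: the classic struct of function pointers. */
struct example_dbh_methods {
    long (*exec)(struct example_dbh *dbh, const char *sql);
};

/* Hypothetical database handle that points at its driver's method table. */
struct example_dbh {
    const struct example_dbh_methods *methods;
    void *driver_data;
};

/**
* @function example_dbh_exec
* Runs a statement through whichever driver backs the handle.
* Returns the driver's result, or -1 if the handle or the method is missing.
*/
long example_dbh_exec(struct example_dbh *dbh, const char *sql)
{
    if (dbh == NULL || dbh->methods == NULL || dbh->methods->exec == NULL)
        return -1;
    return dbh->methods->exec(dbh, sql);
}

/* Toy driver used only for illustration: "executes" by returning the SQL length. */
static long toy_exec(struct example_dbh *dbh, const char *sql)
{
    (void)dbh;
    return (long)strlen(sql);
}

const struct example_dbh_methods toy_methods = { toy_exec };
```

The wrapper gives the doc parser a normal function definition to attach the comment to, and callers no longer need to know that a method table exists.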

The resulting document can be seen here (while the link still works..)

At present, the current concept still has a few problems, which are either solved already or should be solvable.
  • requires the reformatting of the definition so all the spaces line up in the rendered output. (I think this should be solvable with some simple parsing and padding of the definition.)
  • requires the correct ordering of elements (e.g. structs before the methods that use them). Again, @include and @class should solve a lot of this, along with doing some auto re-ordering (e.g. structs/enums before functions).
  • requires @class to be in a separate comment block and used before an @function comment. Again, this should be a simple fix, allowing you to specify which @class a @function belongs to.
  • requires you to actually define real functions for things like #define'd functions, usually with #if 0 wrapped around them. An @def tag should solve this, allowing you to comment the synopsis for a macro.
While PDO is at quite an early stage, and Wez is quite interested in the research, it looks like PDO may get a nice internal API, so you can write your own PHP extensions using databases easily. I wonder how complex it would be to introduce this as a comment standard for the rest of PHP, as I have already started playing with here.
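The last workaround in the list can be sketched like this. `EXAMPLE_MAX` is a made-up macro, not something from PDO or PHP; the point is only that the prototype inside `#if 0` is never compiled, but gives the line-by-line parser a function-shaped definition to pick up after the comment block.

```c
/* The real thing is a macro, so there is no function definition to document. */
#define EXAMPLE_MAX(a, b) ((a) > (b) ? (a) : (b))

#if 0
/**
* @function EXAMPLE_MAX
* Returns the larger of the two values.
*/
int EXAMPLE_MAX(int a, int b);
#endif
```

A dedicated @def tag would remove the need for the dummy prototype, since the synopsis could then be written directly in the comment.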



Comments

Doxygen?
Why not Doxygen?
#0 - R. Rajesh Jeba Anbiah on 2005-06-10 19:07:59
Doxygen
Why not Doxygen:
a) low impact
- it would involve asking too many developers who are already working a certain way to change (the idea with this is that the changes would not be noticeable).
b) low overhead
- theoretically, a PHP line-by-line parser would involve a lower threshold to set up and maintain.
#1 - Alan Knowles on 2005-06-10 21:33:35
Very cool!
Great! This stuff has been in need of some good documentation. Glad to see someone is working on it, it will be very useful. Seems like a great solution.
#2 - Matthew Fonda on 2005-06-11 03:22:53
