The world's most popular open source database
Giuseppe Maxia is a QA developer in MySQL community team.
A system analyst with 20 years of IT experience, he has worked as a database consultant and designer for several years.
He is a frequent speaker at open source events and he's the author of many articles.
He lives in Sardinia (Italy).
Submitting patches to Open Source code doesn't come naturally to everyone. It is not easy, since it requires coding skills that the average user does not have. Moreover, code patches require a fair amount of additional documentation, without which the patch itself is virtually useless. These notes will walk you through the requirements of a good patch for MySQL server and perhaps other projects.
Let's define some of the basics: A patch is a structured, computer-generated description of how to modify the source code of a software product. The most common patches fix bugs and introduce new features, and the most common program to generate such a description is "diff".
A patch does not exist in a vacuum. It must be integrated with the rest of the application, and thus it must not break existing code, and the result must pass all regression tests. But most important of all, a patch must be reviewed and approved by senior developers of the product before it is applied to a source tree. Then it must pass specific tests, to prove that the patch works as intended and that it does not clash with the rest of the application.
From the above requirements, let's consider two specific tasks and you should ask yourself what would you do if you were in charge of each:
To examine the code and approve it, you must know the purpose of the changes. If a patch does not include some description of such a purpose and the means to achieve it, the reviewer must try to determine the principles by translating from source code to plain language. You must admit that it does not sound like an easy or pleasant task.
Then you need to test the feature. Ask yourself, "Test it against what? What does this patch suppose to do?" Imagine a patch to implement a very simple function that accepts two values and returns their sum. In order to test it, you must know at least: The type of accepted values (integer, floating point, signed, unsigned), the lower and upper boundaries, the result type, the maximum value of the result, how the function should behave when casting values from different types, how should it behave with NULLs, if it accepts only literal values, or also variables and expressions ... one could go on, and this is for a simple feature. Testing a feature without some detailed specifications is not just difficult, it could be impossible.
Now you know of some of the difficulties of those who accept patches. In order for us to accept patches, we think that it's reasonable to require:
If you have been following my explanation so far, the reasons for all these items should be clear. Even so, one can imagine some readers grumbling and muttering, "You want me to submit what?"
So let's review the reasoning. Suppose you submit only code, saying "It's a patch to implement function XYZ, which everybody agrees is going to be useful." Let's say that a senior developer assigned to review your patch agrees in principle that function XYZ is a great feature indeed. However, which attributes of the SQL standards is your function XYZ going to implement? Does it integrate function JKL, which currently implements a subset of the standards, or does it replace it altogether? How will your patch affect existing routines? Is it using existing structures or creating new ones? If you are creating new ones, is there a justifiable reason?
Given these questions, you understand that a few lines of documentation can save a bunch of time to the reviewer; reviewer time is extremely valuable, because only a minority of our coders are entitled to perform a review. If the review is going to take too much of their time, for which the company pays, your patch is not going to be evaluated at all. Right -- not delayed, but skipped in favor of perhaps less interesting but better documented patches.
The same goes for testing. The reviewer wants to see some testing to back your claim that the patch fits its purpose. Without testing, the review will be negative, and the patch is left behind, until someone rescues it by writing the missing pieces.
Then, if you think that these requirements are just a sadistic idea to make you work more, think again. What we ask for is essential to ensure the quality of our products.
We can be a little more flexible, and accept patches that have less than everything we need, but not much less. So, let's make a reasonable list of what would make a patch that busy developers would grudgingly accept and evaluate with some delay.
We reduced the high-level specs and omitted the low-level ones, not because they are always unnecessary, but because you may not need them if your patch is simple and follows the "coding guidelines":[http://forge.mysql.com/wiki/MySQL_Internals_Coding_Guidelines|MySQL]. However, make no mistake: if your patch does not include all the required items, it may be evaluated, but don't hold your breath. It won't have the highest priority.
If you need further justification for our requests, think about this:
You, as the author of the patch, are the most suitable person to fill in the required information. You have given the matter some thought, and it would be easiest for you to commit your line of thinking to a page of organized docs. Anyone else would take at least twice as long to do the same. As for the test case requirement, we would not consider a patch that has not been tested. If you tested it, it would be easy for you to pack your tests in a script and submit it together with the expected results. If you didn't do it, then ... why would we accept an untested feature?
We hope you understand that we're not trying to impose extraneous work. What we require is just a portion of what MySQL demands of our internal developers. For voluntary contributors we can't be as demanding, but we still mustn't spend too much time on single contributions.
Sometimes, when submitting a patch to fix a bug, the specifications can be reduced to the bare minimum of pointing to the wrong code and saying how to fix it. This has happened in the past, and we are flexible enough to recognize such a case. Apply some judgment, but don't assume anything. What is immediately obvious to you may not be so for somebody else. When in doubt, add full specifications.
You don't have to do everything alone. There is much you can borrow from the existing code. First of all, you can ask around, starting from the MySQL internals mailing list, to find out if somebody has already written the specification you are after. If you are planning to write a standard function, chances are that the specifications are already written down, and you need just to cut-and-paste the main part and add something of your own. If it is a feature that many people requested, there may be already a high-level description or even a specification for it, so if you volunteer to do that, MySQLers will gladly provide the missing docs. If all the above fails, remember that writing specifications is not rocket science -- look at the appendix for a clear explanation of what is expected.
Assuming that you have done everything by the rules, the only thing that's missing is that part of the documentation that goes to the user manual. Normally, for features developed in-house, the documentation team takes works from the High Level Specification to produce the final docs.
For patches coming from the community, though, it would be really great to have the documentation drafted by the feature author. Therefore, for a really perfect job, feel free to add the full documentation for your creation. If you are not comfortable with writing, you may set up a wiki page on MySQL Forge with a stub of the docs, and invite other contributors to fill in the gaps.
Definitions and explanations (from MySQL AB internal guidelines)
Thanks to Trudy Pelzer for writing it and for giving us permission to make it public.
The High-Level Description should contain a brief (fewer than 250 words) description of a feature. The purpose of this description is to give the company an understanding of the change you want to make to a product, along with the rationale for making it. Use clear, fairly non-technical language; think of this description as something that could be put in a marketing brochure or executive summary.
Although the H-L-D should be brief, don't make it too brief. Your description should be more than a mere repetition of the title -- it should state the problem you expect the feature to solve. If you want your idea to be implemented, provide the rationale for doing so: Mention any relevant bugs, "Worklog":[http://forge.mysql.com/worklog/] entries, emails (with quotation or with URL for looking up in the mail archive). If the task is in a common category -- e.g. "Full-Text" -- then mention that category.
The High Level Specification begins by restating the problem from the HLD, possibly in more technical detail, then providing a solution.
The H-L-S is a precise requirement specification that describes how the feature will work; it describes the implementation requirements on an abstract level, without binding them to specific code. Thus, it should contain a fairly technical description of the feature that the specification proposes to implement. The purpose of this description is three-fold:
For example, the specification may include:
Some of the above -- especially the implementation suggestion -- will not be applicable to every feature; check with the internals mailing list if you think this is the case. But every specification should provide as much information as possible as part of the specification. Think of the information you would need to code the feature without having to do any more research on how the feature should work; if the entry provides all of the required information, then the specification is complete.
The Low-Level Design portion of a specification is a technical, low-level description of how the feature will be implemented; it maps the objects of the HLS into particular classes, routines and methods, disclosing the code you will change and how you will change it. Thus, while the design may contain only a minimal description, it should include technical details such as:
The purpose of this section is to provide the Implementer, Architecture Reviewer, and the Code Reviewers sufficient information to approve the design and to write/review the code. The intention is that the developer assigned to implement the feature will write the LLD. Think of it as a memory aid for coding, as well as a description that allows others to (a) decide whether there are major flaws in your design and (b) quickly find and review all areas your code has impacted.
Read and post comments on this article in the MySQL Forums. There are currently 2 comments.