Luca's meaningless thoughts  

Berlin  

by Leandro Lucarella on 2011- 06- 16 13:19 (updated on 2011- 06- 16 13:19)
tagged berlin, cdgc, d, dgc, en, self, sociomantic labs, work - with 7 comment(s)

http://llucax.com.ar/blog/posts/2011/06/16-berlin.png

I'm moving to Berlin on October to work at Sociomantic Labs. They use D + Tango with the GC I did as my thesis so I probably get the chance to improve the GC further!

Jetz, soll ich Deutsch lernen!

CDGC merged into Tango  

by Leandro Lucarella on 2011- 01- 28 19:49 (updated on 2011- 01- 28 19:49)
tagged cdgc, d, dgc, en, gc, merge, patch, tango - with 1 comment(s)

Yai! Finally my CDGC patches has been applied to Tango [1] [2] [3]. CDGC will not be the default Tango GC for now, because it needs some real testing first (and fixing a race when using weak references). So, please, please, do try it, is as simple as compiling from the sources adding a new option to bob: -g=cdgc and then manually installing Tango.

Please, don't forget to report any bugs or problems.

Thanks!

Engineer  

by Leandro Lucarella on 2010- 12- 08 18:33 (updated on 2010- 12- 08 18:33)
tagged cdgc, d, dgc, en, engineer, gc, self - with 5 comment(s)

Finally, I defended my thesis last Monday and now I'm officially (well, not really, the diploma takes about a year to be emitted) an Ingeniero en Informática (something like a Informatics Engineer). I hope I can get some free time now to polish the rough edges of the collector (fix the weakrefs for example) so it can be finally merged into Tango.

Día D  

by Leandro Lucarella on 2010- 11- 30 19:31 (updated on 2010- 11- 30 19:31)
tagged cdgc, dgc, es, self, tesis - with 3 comment(s)

Finalmente, luego de más de una década de carrera (no voy a decir estudio porque de estudio fue bastante menos :), me recibo. Defiendo la tesis el lunes 6 a las 19:30 horas en el aula 402 de FIUBA y dejo el planito solo porque a alguna gente le resultó gracioso (por lo que me siento muy insultado).

Por si algún extra-terrestre está interesado en el tema, les dejo el abstract:

El manejo de memoria es un problema recurrente en los lenguajes de programación; dada su complejidad es muy propenso a errores y las consecuencias de dichos errores pueden ser muy graves. La recolección de basura es el área de investigación que trata las técnicas de manejo automático de memoria. D es un lenguaje de programación compilado, con tipado estático y multi-paradigma que combina el poder de lenguajes de programación de bajo nivel, como C, con la facilidad de los de alto nivel, como Python o Java. D provee recolección de basura pero ha sido frecuentemente criticada por sus varias falencias. Dadas las particularidades del lenguaje, plantea un desafío casi único en cuanto al diseño de un recolector.

Este trabajo hace un recorrido por el estado del arte en recolección de basura teniendo en cuenta los requerimientos de D; analiza la implementación del recolector actual y propone mejoras con el objetivo principal de minimizar los tiempos de pausa. Finalmente se construye un banco de pruebas para verificar los resultados, que muestran una disminución de hasta 200 veces en el tiempo de pausa del recolector y de hasta 3 veces en el tiempo total de ejecución.

CDGC experimental branch in Druntime  

by Leandro Lucarella on 2010- 11- 09 15:51 (updated on 2010- 11- 09 15:51)
tagged cdgc, d, dgc, druntime, en, sean kelly - with 0 comment(s)

Sean Kelly just created a new experimental branch in Druntime with CDGC as the GC for D2. The new branch is completely untested though, so only people wanting to help testing should try it out (which will be very appreciated).

CDGC Tango integration  

by Leandro Lucarella on 2010- 10- 20 23:34 (updated on 2010- 10- 20 23:34)
tagged cdgc, d, dgc, en, integration, patch, tango - with 0 comment(s)

I've posted a small set of patches to integrate CDGC into Tango. If you want to try them out, just follow the simple 3 steps described in the ticket.

Please, let me know how it went if you do.

Trying CDGC HOWTO  

by Leandro Lucarella on 2010- 10- 10 16:28 (updated on 2010- 10- 10 16:28)
tagged cdgc, d, dgc, dmd, en, gc, howto, makefile, patch, tango - with 0 comment(s)

Here are some details on how to try CDGC, as it needs a very particular setup, specially due to DMD not having precise heap scanning integrated yet.

Here are the steps (in some kind of literate scripting, you can copy&paste to a console ;)

# You probably want to do all this mess in some subdirectory :)
mkdir cdgc-test
cd cdgc-test

# First, checkout the repositories.
git clone git://git.llucax.com.ar/software/dgc/cdgc.git
# If you have problems with git:// URLs, try HTTP:
# git clone http://git.llucax.com.ar/r/software/dgc/cdgc.git
svn co http://svn.dsource.org/projects/tango/tags/releases/0.99.9 tango

# DMD doesn't care much (as usual) about tags, so you have to use -r to
# checkout the 1.063 revision (you might be good with the latest revision
# too).
svn co -r613 http://svn.dsource.org/projects/dmd/branches/dmd-1.x dmd

# Now we have to do some patching, let's start with Tango (only patch 3 is
# *really* necessary, but the others won't hurt).
cd tango
for p in 0001-Fixes-to-be-able-to-parse-the-code-with-Dil.patch \
         0002-Use-the-mutexattr-when-initializing-the-mutex.patch \
         0003-Add-precise-heap-scanning-support.patch \
         0004-Use-the-right-attributes-when-appending-to-an-empty-.patch
do
   wget -O- "http://llucax.com.ar/blog/posts/2010/10/10-trying-cdgc-howto/$p" |
         patch -p1
done
cd ..

# Now let's go to DMD
cd dmd
p=0001-Create-pointer-map-bitmask-to-allow-precise-heap-sca.patch
wget -O- "http://llucax.com.ar/blog/posts/2010/10/10-trying-cdgc-howto/$p" |
      patch -p1

# Since we are in the DMD repo, let's compile it (you may want to add -jN if
# you have N CPUs to speed up things a little).
make -C src -f linux.mak
cd ..

# Good, now we have to wire Tango and CDGC together, just create a symbolic
# link:
cd tango
ln -s ../../../../../cdgc/rt/gc/cdgc tango/core/rt/gc/

# Since I don't know very well the Tango build system, I did a Makefile of my
# own to compile it, so just grab it and compile Tango with it. It will use
# the DMD you just compiled and will compile CDGC by default (you can change
# it via the GC Make variable, for example: make GC=basic to compile Tango
# with the basic GC). The library will be written to obj/libtango-$GC.a, so
# you can have both CDGB and the basic collector easily at hand):
wget http://llucax.com.ar/blog/posts/2010/10/10-trying-cdgc-howto/Makefile
make # Again add -jN if you have N CPUs to make a little faster

# Now all you need now is a decent dmd.conf to put it all together:
cd ..
echo "[Environment]" > dmd/src/dmd.conf
echo -n "DFLAGS=-I$PWD/tango -L-L$PWD/tango/obj " >> dmd/src/dmd.conf
echo -n "-defaultlib=tango-cdgc " >> dmd/src/dmd.conf
echo "-debuglib=tango-cdgc -version=Tango" >> dmd/src/dmd.conf

# Finally, try a Hello World:
cat <<EOT > hello.d
import  tango.io.Console;

void main()
{
   Cout("Hello, World").newline;
}
EOT
dmd/src/dmd -run hello.d

# If you don't trust me and you want to be completely sure you have CDGC
# running, try the collect_stats_file option to generate a log of the
# collections:
D_GC_OPTS=collect_stats_file=log dmd/src/dmd -run hello.d
cat log

Done!

If you want to make this DMD the default, just add dmd/src to the PATH environment variable or do a proper installation ;)

Let me know if you hit any problem...

CDGC done  

by Leandro Lucarella on 2010- 09- 28 12:16 (updated on 2010- 09- 28 12:16)
tagged cdgc, d, dgc, done, en, gc - with 0 comment(s)

I'm sorry about the quick and uninformative post, but I've been almost 2 weeks without Internet and I have to finish the first complete draft of my thesis in a little more than a week, so I don't have much time to write here.

The thing is, to avoid the nasty effect of memory usage being too high for certain programs when using eager allocation, I've made the GC minimize the heap more often. Even when some test are still a little slower with CDGC, but that's only for tests that only stress the GC without doing any actual work, so I think it's OK, in that cases the extra overhead of being concurrent is bigger than the gain (which is inexistent, because there is nothing to do in parallel with the collector).

Finally, I've implemented early collection, which didn't proved very useful, and tried to keep a better occupancy factor of the heap with the new min_free option, without much success either (it looks like the real winner was eager allocation).

I'm sorry I don't have time to show you some graphs this time. Of course the work is not really finished, there are plenty of things to be done still, but I think the GC have come to a point where it can be really useful, and I have to finish my thesis :)

After I'm done, I hope I can work on integrating the GC in Tango and/or Druntime (where there is already a first approach done by Sean Kelly).

Truly concurrent GC using eager allocation  

by Leandro Lucarella on 2010- 09- 10 00:01 (updated on 2010- 09- 10 00:01)
tagged cdgc, concurrent, d, dgc, eager allocation, en, fork - with 3 comment(s)

Finally, I got the first version of CDGC with truly concurrent garbage collection, in the sense that all the threads of the mutator (the program itself) can run in parallel with the collector (well, only the mark phase to be honest :).

You might want to read a previous post about CDGC where I achieved some sort of concurrency by making only the stop-the-world time very short, but the thread that triggered the collection (and any other thread needing any GC service) had to wait until the collection finishes. The thread that triggered the collection needed to wait for the collection to finish to fulfill the memory allocation request (it was triggered because the memory was exhausted), while any other thread needing any GC service needed to acquire the global GC lock (damn global GC lock!).

To avoid this issue, I took a simple approach that I call eager allocation, consisting on spawn the mark phase concurrently but allocating a new memory pool to be able to fulfill the memory request instantly. Doing so, not only the thread that triggered the collection can keep going without waiting the collection to finish, the global GC lock is released and any other thread can use any GC service, and even allocate more memory, since a new pool was allocated.

If the memory is exhausted again before the collection finishes, a new pool is allocated, so everything can keep running. The obvious (bad) consequence of this is potential memory bloat. Since the memory usage is minimized from time to time, this effect should not be too harmful though, but let's see the results, there are plenty of things to analyze from them (a lot not even related to concurrency).

First, a couple of comments about the plots:

  • Times of Dil are multiplied by a factor of 0.1 in all the plots, times of rnddata are too, but only in the pause time and stop-the-world plots. This is only to make the plots more readable.
  • The unreadable labels rotated 45 degrees say: stw, fork and ea. Those stand for Stop-the-world (the basic collector), fork only (concurrent but without eager allocation) and eager allocation respectively. You can click on the images to see a little more readable SVG version.
  • The plots are for one CPU-only because using more CPUs doesn't change much (for these plots).
  • The times were taken from a single run, unlike the total run time plots I usually post. Since a single run have multiple collections, the information about min, max, average and standard deviation still applies for the single run.
  • Stop-the-world time is the time no mutator thread can run. This is not related to the global GC lock, is time the threads are really really paused (this is even necessary for the forking GC to take a snapshot of threads CPU registers and stacks). So, the time no mutator thread can do any useful work might be much bigger than this time, because the GC lock. This time is what I call Pause time. The maximum pause time is probably the most important variable for a GC that tries to minimize pauses, like this one. Is the maximum time a program will stay totally unresponsive (important for a server, a GUI application, a game or any interactive application).
Stop-the-world time for 1 CPU

The stop-the-world time is reduced so much that you can hardly see the times of the fork and ea configuration. It's reduced in all tests by a big margin, except for mcore and the bigarr. For the former it was even increased a little, for the later it was reduced but very little (but only for the ea* configuration, so it might be a bad measure). This is really measuring the Linux fork() time. When the program manages so little data that the mark phase itself is so fast that's faster than a fork(), this is what happens. The good news is, the pause times are small enough for those cases, so no harm is done (except from adding a little more total run time to the program).

Note the Dil maximum stop-the-world time, it's 0.2 seconds, looks pretty big, uh? Well, now remember that this time was multiplied by 0.1, the real maximum stop-the-world for Dil is 2 seconds, and remember this is the minimum amount of time the program is unresponsive! Thank god it's not an interactive application :)

Time to take a look to the real pause time:

Pause time for 1 CPU

OK, this is a little more confusing... The only strong pattern is that pause time is not changed (much) between the swt and fork configurations. This seems to make sense, as both configurations must wait for the whole collection to finish (I really don't know what's happening with the bh test).

For most tests (7), the pause time is much smaller for the ea configuration, 3 tests have much bigger times for it, one is bigger but similar (again mcore) and then is the weird case of bh. The 7 tests where the time is reduced are the ones that seems to make sense, that's what I was looking for, so let's see what's happening with the remaining 3, and for that, let's take a look at the amount of memory the program is using, to see if the memory bloat of allocating extra pools is significant.

Test Maximum heap size (MB)
Program stw ea ea/stw
dil 216 250 1.16
rnddata 181 181 1
voronoi 16 30 1.88
tree 7 114 16.3
bh 80 80 1
mcore 30 38 1.27
bisort 30 30 1
bigarr 11 223 20.3
em3d 63 63 1
sbtree 11 122 11.1
tsp 63 63 1
split 39 39 1

See any relations between the plot and the table? I do. It looks like some programs are not being able to minimize the memory usage, and because of that, the sweep phase (which still have to run in a mutator thread, taking the global GC lock) is taking ages. An easy to try approach is to trigger the minimization of the memory usage not only at when big objects are allocated (like it is now), but that could lead to more mmap()/munmap()s than necessary. And there still problems with pools that are kept alive because a very small object is still alive, which is not solved by this.

So I think a more long term solution would be to introduce what I call early collection too. Meaning, trigger a collection before the memory is exhausted. That would be the next step in the CDGC.

Finally, let's take a look at the total run time of the test programs using the basic GC and CDGC with concurrent marking and eager allocation. This time, let's see what happens with 2 CPUs (and 25 runs):

Total run time for 2 CPUs (25 runs)

Wow! It looks like this is getting really juicy (with exceptions, as usual :)! Dil time is reduced to about 1/3, voronoi is reduced to 1/10!!! Split and mcore have both their time considerably reduced, but that's because another small optimization (unrelated to what we are seeing today), so forget about those two. Same for rnddata, which is reduced because of precise heap scanning. But other tests increased its runtime, most notably bigarr takes almost double the time. Looking at the maximum heap size table, one can find some answers for this too. Another ugly side of early allocation.

For completeness, let's see what happens with the number of collections triggered during the program's life. Here is the previous table with this new data added:

Test Maximum heap size (MB) Number of collections
Program stw ea ea/stw stw ea ea/stw
dil 216 250 1.16 62 50 0.81
rnddata 181 181 1 28 28 1
voronoi 16 30 1.88 79 14 0.18
tree 7 114 16.3 204 32 0.16
bh 80 80 1 27 27 1
mcore 30 38 1.27 18 14 0.78
bisort 30 30 1 10 10 1
bigarr 11 223 20.3 305 40 0.13
em3d 63 63 1 14 14 1
sbtree 11 122 11.1 110 33 0.3
tsp 63 63 1 14 14 1
split 39 39 1 7 7 1

See how the number of collections is practically reduced proportionally to the increase of the heap size. When the increase in size explodes, even when the number of collections is greatly reduced, the sweep time take over and the total run time is increased. Specially in those tests where the program is almost only using the GC (as in sbtree and bigarr). That's why I like the most Dil and voronoi as key tests, they do quite a lot of real work beside asking for memory or using other GC services.

This confirms that the performance gain is not strictly related to the added concurrency, but because of a nice (finally! :) side-effect of eager allocation: removing some pressure from the GC by increasing the heap size a little (Dil gets 3x boost in run time for as little as 1.16x of memory usage; voronoi gets 10x at the expense of almost doubling the heap, I think both are good trade-offs). This shows another weak point of the GC, sometimes the HEAP is way too tight, triggering a lot of collections, which leads to a lot of GC run time overhead. Nothing is done right now to keep a good heap occupancy ratio.

But is there any real speed (in total run time terms) improvement because of the added concurrency? Let's see the run time for 1 CPU:

Total run time for 1 CPU (25 runs)

It looks like there is, specially for my two favourite tests: both Dil and voronoi get a 30% speed boost! That's not bad, not bad at all...

If you want to try it, the repository has been updated with this last changes :). If you do, please let me know how it went.

Recursive vs. iterative marking  

by Leandro Lucarella on 2010- 08- 29 21:54 (updated on 2010- 08- 29 21:54)
tagged benchmark, cdgc, d, dgc, en, gc, iterative, mark, performance, recursive - with 0 comment(s)

After a small (but important) step towards making the D GC truly concurrent (which is my main goal), I've been exploring the possibility of making the mark phase recursive instead of iterative (as it currently is).

The motivation is that the iterative algorithm makes several passes through the entire heap (it doesn't need to do the full job on each pass, it processes only the newly reachable nodes found in the previous iteration, but to look for that new reachable node it does have to iterate over the entire heap). The number of passes is the same as the connectivity graph depth, the best case is where all the heap is reachable through the root set, and the worse is when the heap is a single linked list. The recursive algorithm, on the other hand, needs only a single pass but, of course, it has the problem of potentially consuming a lot of stack space (again, the recurse depth is the same as the connectivity graph depth), so it's not paradise either.

To see how much of a problem is the recurse depth in reality, first I've implemented a fully recursive algorithm, and I found it is a real problem, since I had segmentation faults because the (8MiB by default in Linux) stack overflows. So I've implemented an hybrid approach, setting a (configurable) maximum recurse depth for the marking phase. If the maximum depth is reached, the recursion is stopped and nodes that should be scanned deeply than that are queued to scanned in the next iteration.

Here are some results showing how the total run time is affected by the maximum recursion depth:

http://llucax.com.ar/blog/posts/2010/08/29-recursive-dil.png http://llucax.com.ar/blog/posts/2010/08/29-recursive-voronoi.png

The red dot is how the pure iterative algorithm currently performs (it's placed arbitrarily in the plot, as the X-axis doesn't make sense for it).

The results are not very conclusive. Even when the hybrid approach performs better for both Dil and Voronoi when the maximum depth is bigger than 75, the better depth is program specific. Both have its worse case when depth is 0, which makes sense, because is paying the extra complexity of the hybrid algorithm with using its power. As soon as we leave the 0 depth, a big drop is seen, for Voronoi big enough to outperform the purely iterative algorithm, but not for Dil, which matches it near 60 and clearly outperforms it at 100.

As usual, Voronoi challenges all logic, as the best depth is 31 (it was a consistent result among several runs). Between 20 and 50 there is not much variation (except for the magic number 31) but when going beyond that, it worsen slowly but constantly as the depth is increased.

Note that the plots might make the performance improvement look a little bigger than it really is. The best case scenario the gain is 7.5% for Voronoi and 3% for Dil (which is probably better measure for the real world). If I had to choose a default, I'll probably go with 100 because is where both get a performance gain and is still a small enough number to ensure no segmentation faults due to stack exhaustion is caused (only) by the recursiveness of the mark phase (I guess a value of 1000 would be reasonable too, but I'm a little scared of causing inexplicable, magical, mystery segfaults to users). Anyway, for a value of 100, the performance gain is about 1% and 3.5% for Dil and Voronoi respectively.

So I'm not really sure if I should merge this change or not. In the best case scenarios (which requires a work from the user to search for the better depth for its program), the performance gain is not exactly huge and for a reasonable default value is so little that I'm not convinced the extra complexity of the change (because it makes the marking algorithm a little more complex) worth it.

Feel free to leave your opinion (I would even appreciate it if you do :).

CDGC first breath  

by Leandro Lucarella on 2010- 08- 22 23:03 (updated on 2010- 08- 22 23:03)
tagged cdgc, concurrent, d, dgc, en, fork, gc, pause time, stop-the-world - with 7 comment(s)

I'm glad to announce that now, for the first time, CDGC means Concurrent D Garbage Collector, as I have my first (extremely raw and unoptimized) version of the concurrent GC running. And I have to say, I'm very excited and happy with the results from the very small benchmark I did.

The stop-the-world (pause) time was reduced by 2 orders of magnitude for the average, the standard deviation and, probably the more important, the maximum (these are the results for a single run, measuring the pause time for all the collections in that run). This is good news for people needing (soft) real-time in D, even when using the GC. Where the standard D GC have a pause time of 100ms, the CDGC have a pause time of 1ms.

The total run-time of the program was increased a little though, but not as much as the pause time was reduced. Only a 12% performance loss was measured, but this is just the first raw unoptimized version of the CDGC.

All this was measured with the voronoi benchmark, with -n 30000. Here are some plots:

http://llucax.com.ar/blog/posts/2010/08/22-cdgc-stw-pause-time.png http://llucax.com.ar/blog/posts/2010/08/22-cdgc-stw-run-time.png

Please note that the GC still has a global lock, so if 2 threads needs to allocate while the collection is running, both will be paused anyways (I have a couple of ideas on how to try to avoid that).

The idea about how to make the GC concurrent is based on the paper Nonintrusive Cloning Garbage Collector with Stock Operating System Support. I'm particularly excited by the results because the reduction of the pause time in the original paper were less than 1 order of magnitude better than their stop-the-world collector, so the preliminary results of the CDGC are much better than I expected.

TypeInfo, static data and the GC  

by Leandro Lucarella on 2010- 08- 15 21:39 (updated on 2010- 08- 15 21:39)
tagged cdgc, conservative, d, dgc, en, gc, precise, static data, typeinfo - with 0 comment(s)

The D compiler doesn't provide any information on the static data that the GC must scan, so the runtime/GC have to use OS-dependant tricks to get that information.

Right now, in Linux, the GC gets the static data to scan from the libc's variables __data_start and _end, from which are not much information floating around except for some e-mail from Hans Boehm to the binutils mainling list.

There is a lot of stuff in the static data that doesn't need to be scanned, most notably the TypeInfo, which is a great portion of the static data. C libraries static data, for example, would be scanned too, when it makes no sense to do so.

I noticed CDGC has more than double the static data the basic GC has, just because of TypeInfo (I use about 5 or so more types, one of them is a template, which makes the bloat bigger).

The voronoi test goes from 21KB to 26KB of static data when using CDGC.

It would be nice if the compiler could group all the static that must really be scanned (programs static variables) together and make its limits available to the GC. It would be even nicer to leave static variables that have no pointers out of that group, and even much more nicer to create a pointer map like the one in the patch for precise scanning to allow precise heap scanning. Then only the scan should be scanned in full conservative mode.

I reported a bug with this issue so it doesn't get lost.

Memory allocation patterns  

by Leandro Lucarella on 2010- 08- 14 03:28 (updated on 2010- 08- 14 13:09)
tagged allocation, benchmark, cdgc, d, dgc, dil, en, gc, memory, pattern - with 0 comment(s)

--> -->
 
 
<class 'docutils.utils.SystemMessage'>
Python 2.7.3:
Thu May 23 21:55:29 2013

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.

 /srv/llucax.com.ar/www/blog/blog.cgi in ()
   1468 if os.environ.has_key('GATEWAY_INTERFACE'):
   1469         i = datetime.datetime.now()
=> 1470         handle_cgi()
   1471         f = datetime.datetime.now()
   1472         print '<!-- render time: %s -->' % (f-i)
handle_cgi = <function handle_cgi>
 /srv/llucax.com.ar/www/blog/blog.cgi in handle_cgi()
   1390                 if not year and not month and not day and not tags:
   1391                         articles = articles[:index_articles]
=> 1392                 render_html(articles, db, year)
   1393 
   1394 
global render_html = <function render_html>, articles = [<__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, ...], db = <__main__.ArticleDB object>, year = 0
 /srv/llucax.com.ar/www/blog/blog.cgi in render_html(articles=[<__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, <__main__.Article object>, ...], db=<__main__.ArticleDB object>, actyear=0, show_comments=False, redirect=None, form_data=None)
   1156         for a in articles:
   1157                 print template.get_article_header(a)
=> 1158                 print a.to_html()
   1159                 print template.get_article_footer(a)
   1160                 if show_comments:
a = <__main__.Article object>, a.to_html = <bound method Article.to_html of <__main__.Article object>>
 /srv/llucax.com.ar/www/blog/blog.cgi in to_html(self=<__main__.Article object>)
    973                         </object>
    974                         </div>''', rst)
=>  975                 return rst_to_html(rst)
    976 
    977         def to_vars(self):
global rst_to_html = <function decorate>, rst = '.. note::\n\n Tango__ 0.99.9 has a bug__ in its ...und to experiment.\n\n\n.. vim: set et sw=3 sts=3 :\n'
 /srv/llucax.com.ar/www/blog/blog.cgi in decorate(*args=('.. note::\n\n Tango__ 0.99.9 has a bug__ in its ...und to experiment.\n\n\n.. vim: set et sw=3 sts=3 :\n',), **kwargs={})
    488                         s = open(cache_file).read()
    489                 except:
=>  490                         s = f(*args, **kwargs)
    491                         open(cache_file, 'w').write(s)
    492                 return s
global s = '/blog/blog.cgi', f = <function rst_to_html>, args = ('.. note::\n\n Tango__ 0.99.9 has a bug__ in its ...und to experiment.\n\n\n.. vim: set et sw=3 sts=3 :\n',), kwargs = {}
 /srv/llucax.com.ar/www/blog/blog.cgi in rst_to_html(rst='.. note::\n\n Tango__ 0.99.9 has a bug__ in its ...und to experiment.\n\n\n.. vim: set et sw=3 sts=3 :\n', secure=True)
    507         }
    508         parts = publish_parts(rst, settings_overrides = settings,
=>  509                                 writer_name = "html")
    510         return parts['body'].encode('utf8')
    511 
writer_name undefined
 /usr/lib/python2.7/dist-packages/docutils/core.py in publish_parts(source='.. note::\n\n Tango__ 0.99.9 has a bug__ in its ...und to experiment.\n\n\n.. vim: set et sw=3 sts=3 :\n', source_path=None, source_class=<class docutils.io.StringInput>, destination_path=None, reader=None, reader_name='standalone', parser=None, parser_name='restructuredtext', writer=None, writer_name='html', settings=None, settings_spec=None, settings_overrides={'file_insertion_enabled': True, 'halt_level': 1, 'input_encoding': 'utf8', 'output_encoding': 'utf8', 'raw_enabled': True, 'traceback': 1}, config_section=None, enable_exit_status=None)
    433         settings_overrides=settings_overrides,
    434         config_section=config_section,
=>  435         enable_exit_status=enable_exit_status)
    436     return pub.writer.parts
    437 
enable_exit_status = None
 /usr/lib/python2.7/dist-packages/docutils/core.py in publish_programmatically(source_class=<class docutils.io.StringInput>, source='.. note::\n\n Tango__ 0.99.9 has a bug__ in its ...und to experiment.\n\n\n.. vim: set et sw=3 sts=3 :\n', source_path=None, destination_class=<class docutils.io.StringOutput>, destination=None, destination_path=None, reader=None, reader_name='standalone', parser=None, parser_name='restructuredtext', writer=None, writer_name='html', settings=None, settings_spec=None, settings_overrides={'file_insertion_enabled': True, 'halt_level': 1, 'input_encoding': 'utf8', 'output_encoding': 'utf8', 'raw_enabled': True, 'traceback': 1}, config_section=None, enable_exit_status=None)
    646         settings_spec, settings_overrides, config_section)
    647     pub.set_source(source, source_path)
    648     pub.set_destination(destination, destination_path)
=>  649     output = pub.publish(enable_exit_status=enable_exit_status)
    650     return output, pub
output undefined, pub = <docutils.core.Publisher instance>, pub.publish = <bound method Publisher.publish of <docutils.core.Publisher instance>>, enable_exit_status = None
 /usr/lib/python2.7/dist-packages/docutils/core.py in publish(self=<docutils.core.Publisher instance>, argv=None, usage=None, description=None, settings_spec=None, settings_overrides=None, config_section=None, enable_exit_status=None)
    209             self.set_io()
    210             self.document = self.reader.read(self.source, self.parser,
=>  211                                              self.settings)
    212             self.apply_transforms()
    213             output = self.writer.write(self.document, self.destination)
self = <docutils.core.Publisher instance>, self.settings = <Values at 0xa685bec: {'attribution': 'dash', 'f... 'math_output': 'MathML', 'dump_settings': None}>
 /usr/lib/python2.7/dist-packages/docutils/readers/__init__.py in read(self=<docutils.readers.standalone.Reader instance>, source=docutils.io.StringInput: source='.. note::\n\n ...m: set et sw=3 sts=3 :\n', source_path='<string>', parser=<docutils.parsers.rst.Parser instance>, settings=<Values at 0xa685bec: {'attribution': 'dash', 'f... 'math_output': 'MathML', 'dump_settings': None}>)
     67         self.settings = settings
     68         self.input = self.source.read()
=>   69         self.parse()
     70         return self.document
     71 
self = <docutils.readers.standalone.Reader instance>, self.parse = <bound method Reader.parse of <docutils.readers.standalone.Reader instance>>
 /usr/lib/python2.7/dist-packages/docutils/readers/__init__.py in parse(self=<docutils.readers.standalone.Reader instance>)
     73         """Parse `self.input` into a document tree."""
     74         self.document = document = self.new_document()
=>   75         self.parser.parse(self.input, document)
     76         document.current_source = document.current_line = None
     77 
self = <docutils.readers.standalone.Reader instance>, self.parser = <docutils.parsers.rst.Parser instance>, self.parser.parse = <bound method Parser.parse of <docutils.parsers.rst.Parser instance>>, self.input = u'.. note::\n\n Tango__ 0.99.9 has a bug__ in its ...und to experiment.\n\n\n.. vim: set et sw=3 sts=3 :\n', document = <document: <note...><target...><target...><admonition...><target... ...>
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/__init__.py in parse(self=<docutils.parsers.rst.Parser instance>, inputstring=u'.. note::\n\n Tango__ 0.99.9 has a bug__ in its ...und to experiment.\n\n\n.. vim: set et sw=3 sts=3 :\n', document=<document: <note...><target...><target...><admonition...><target... ...>)
    155               inputstring, tab_width=document.settings.tab_width,
    156               convert_whitespace=1)
=>  157         self.statemachine.run(inputlines, document, inliner=self.inliner)
    158         self.finish_parse()
    159 
self = <docutils.parsers.rst.Parser instance>, self.statemachine = <docutils.parsers.rst.states.RSTStateMachine instance>, self.statemachine.run = <bound method RSTStateMachine.run of <docutils.parsers.rst.states.RSTStateMachine instance>>, inputlines = [u'.. note::', u'', u' Tango__ 0.99.9 has a bug__ in its runtime, which sometimes makes the GC scan', u' memory that should not be scanned. It only affects Dil and Voronoi programs,', u' but in a significant way. The tests in this post are done using a patched', u' runtime, with the bug fixed.', u'', u'__ http://www.dsource.org/projects/tango', u'__ http://www.dsource.org/projects/tango/ticket/1971', u'', u'', u'.. admonition:: Update', u'', u' The results for the unpublished programs are now available__. You can find', u' the graphic results, the detailed summary and the source code for all the', u' programs (except dil, which can be downloaded from its home site).', u'', u'__ http://llucax.com.ar/blog/posts/2010/08/14-memory-allocation/', u'', u'', ...], document = <document: <note...><target...><target...><admonition...><target... ...>, inliner undefined, self.inliner = None
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py in run(self=<docutils.parsers.rst.states.RSTStateMachine instance>, input_lines=[u'.. note::', u'', u' Tango__ 0.99.9 has a bug__ in its runtime, which sometimes makes the GC scan', u' memory that should not be scanned. It only affects Dil and Voronoi programs,', u' but in a significant way. The tests in this post are done using a patched', u' runtime, with the bug fixed.', u'', u'__ http://www.dsource.org/projects/tango', u'__ http://www.dsource.org/projects/tango/ticket/1971', u'', u'', u'.. admonition:: Update', u'', u' The results for the unpublished programs are now available__. You can find', u' the graphic results, the detailed summary and the source code for all the', u' programs (except dil, which can be downloaded from its home site).', u'', u'__ http://llucax.com.ar/blog/posts/2010/08/14-memory-allocation/', u'', u'', ...], document=<document: <note...><target...><target...><admonition...><target... ...>, input_offset=0, match_titles=1, inliner=<docutils.parsers.rst.states.Inliner instance>)
    168         self.node = document
    169         results = StateMachineWS.run(self, input_lines, input_offset,
=>  170                                      input_source=document['source'])
    171         assert results == [], 'RSTStateMachine.run() results should be empty!'
    172         self.node = self.memo = None    # remove unneeded references
input_source undefined, document = <document: <note...><target...><target...><admonition...><target... ...>
 /usr/lib/python2.7/dist-packages/docutils/statemachine.py in run(self=<docutils.parsers.rst.states.RSTStateMachine instance>, input_lines=[u'.. note::', u'', u' Tango__ 0.99.9 has a bug__ in its runtime, which sometimes makes the GC scan', u' memory that should not be scanned. It only affects Dil and Voronoi programs,', u' but in a significant way. The tests in this post are done using a patched', u' runtime, with the bug fixed.', u'', u'__ http://www.dsource.org/projects/tango', u'__ http://www.dsource.org/projects/tango/ticket/1971', u'', u'', u'.. admonition:: Update', u'', u' The results for the unpublished programs are now available__. You can find', u' the graphic results, the detailed summary and the source code for all the', u' programs (except dil, which can be downloaded from its home site).', u'', u'__ http://llucax.com.ar/blog/posts/2010/08/14-memory-allocation/', u'', u'', ...], input_offset=0, context=[], input_source='<string>', initial_state=None)
    235                                 % (source, offset, self.line))
    236                         context, next_state, result = self.check_line(
=>  237                             context, state, transitions)
    238                     except EOFError:
    239                         if self.debug:
context = [], state = <docutils.parsers.rst.states.Text instance>, transitions = None
 /usr/lib/python2.7/dist-packages/docutils/statemachine.py in check_line(self=<docutils.parsers.rst.states.RSTStateMachine instance>, context=[], state=<docutils.parsers.rst.states.Text instance>, transitions=['blank', 'indent', 'underline', 'text'])
    456                           '"%s" in state "%s".'
    457                           % (name, state.__class__.__name__))
=>  458                 return method(match, context, next_state)
    459         else:
    460             if self.debug:
method = <bound method Text.underline of <docutils.parsers.rst.states.Text instance>>, match = <_sre.SRE_Match object>, context = [], next_state = 'Body'
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py in underline(self=<docutils.parsers.rst.states.Text instance>, match=<_sre.SRE_Match object>, context=[], next_state='Body')
   2707         style = underline[0]
   2708         context[:] = []
=> 2709         self.section(title, source, style, lineno - 1, messages)
   2710         return [], next_state, []
   2711 
self = <docutils.parsers.rst.states.Text instance>, self.section = <bound method Text.section of <docutils.parsers.rst.states.Text instance>>, title = u'bh allocation pattern', source = u'bh allocation pattern\n---------------------', style = u'-', lineno = 64, messages = []
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py in section(self=<docutils.parsers.rst.states.Text instance>, title=u'bh allocation pattern', source=u'bh allocation pattern\n---------------------', style=u'-', lineno=63, messages=[])
    327         """Check for a valid subsection and create one if it checks out."""
    328         if self.check_subsection(source, style, lineno):
=>  329             self.new_subsection(title, lineno, messages)
    330 
    331     def check_subsection(self, source, style, lineno):
self = <docutils.parsers.rst.states.Text instance>, self.new_subsection = <bound method Text.new_subsection of <docutils.parsers.rst.states.Text instance>>, title = u'bh allocation pattern', lineno = 63, messages = []
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py in new_subsection(self=<docutils.parsers.rst.states.Text instance>, title=u'bh allocation pattern', lineno=63, messages=[])
    396         newabsoffset = self.nested_parse(
    397               self.state_machine.input_lines[offset:], input_offset=absoffset,
=>  398               node=section_node, match_titles=1)
    399         self.goto_line(newabsoffset)
    400         if memo.section_level <= mylevel: # can't handle next section?
node undefined, section_node = <section "bh allocation pattern": <title...><paragraph...><target...><target...><target... ...>, match_titles undefined
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py in nested_parse(self=<docutils.parsers.rst.states.Text instance>, block=StringList([u'', u'This is a translation__ by `L...g>', 287), ('<string>', 288), ('<string>', 289)]), input_offset=64, node=<section "bh allocation pattern": <title...><paragraph...><target...><target...><target... ...>, match_titles=1, state_machine_class=<class docutils.parsers.rst.states.NestedStateMachine>, state_machine_kwargs={'initial_state': 'Body', 'state_classes': (<class docutils.parsers.rst.states.Body>, <class docutils.parsers.rst.states.BulletList>, <class docutils.parsers.rst.states.DefinitionList>, <class docutils.parsers.rst.states.EnumeratedList>, <class docutils.parsers.rst.states.FieldList>, <class docutils.parsers.rst.states.OptionList>, <class docutils.parsers.rst.states.LineBlock>, <class docutils.parsers.rst.states.ExtensionOptions>, <class docutils.parsers.rst.states.Explicit>, <class docutils.parsers.rst.states.Text>, <class docutils.parsers.rst.states.Definition>, <class docutils.parsers.rst.states.Line>, <class docutils.parsers.rst.states.SubstitutionDef>, <class docutils.parsers.rst.states.RFC2822Body>, <class docutils.parsers.rst.states.RFC2822List>)})
    282                                                 **state_machine_kwargs)
    283         state_machine.run(block, input_offset, memo=self.memo,
=>  284                           node=node, match_titles=match_titles)
    285         if use_default == 2:
    286             self.nested_sm_cache.append(state_machine)
node = <section "bh allocation pattern": <title...><paragraph...><target...><target...><target... ...>, match_titles = 1
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py in run(self=<docutils.parsers.rst.states.NestedStateMachine instance>, input_lines=StringList([u'', u'This is a translation__ by `L...g>', 287), ('<string>', 288), ('<string>', 289)]), input_offset=64, memo=<docutils.parsers.rst.states.Struct instance>, node=<section "bh allocation pattern": <title...><paragraph...><target...><target...><target... ...>, match_titles=1)
    193         self.language = memo.language
    194         self.node = node
=>  195         results = StateMachineWS.run(self, input_lines, input_offset)
    196         assert results == [], ('NestedStateMachine.run() results should be '
    197                                'empty!')
results undefined, global StateMachineWS = <class docutils.statemachine.StateMachineWS>, StateMachineWS.run = <unbound method StateMachineWS.run>, self = <docutils.parsers.rst.states.NestedStateMachine instance>, input_lines = StringList([u'', u'This is a translation__ by `L...g>', 287), ('<string>', 288), ('<string>', 289)]), input_offset = 64
 /usr/lib/python2.7/dist-packages/docutils/statemachine.py in run(self=<docutils.parsers.rst.states.NestedStateMachine instance>, input_lines=StringList([u'', u'This is a translation__ by `L...g>', 287), ('<string>', 288), ('<string>', 289)]), input_offset=64, context=[], input_source=None, initial_state=None)
    235                                 % (source, offset, self.line))
    236                         context, next_state, result = self.check_line(
=>  237                             context, state, transitions)
    238                     except EOFError:
    239                         if self.debug:
context = [], state = <docutils.parsers.rst.states.Body instance>, transitions = None
 /usr/lib/python2.7/dist-packages/docutils/statemachine.py in check_line(self=<docutils.parsers.rst.states.NestedStateMachine instance>, context=[], state=<docutils.parsers.rst.states.Body instance>, transitions=['blank', 'indent', 'bullet', 'enumerator', 'field_marker', 'option_marker', 'doctest', 'line_block', 'grid_table_top', 'simple_table_top', 'explicit_markup', 'anonymous', 'line', 'text'])
    456                           '"%s" in state "%s".'
    457                           % (name, state.__class__.__name__))
=>  458                 return method(match, context, next_state)
    459         else:
    460             if self.debug:
method = <bound method Body.explicit_markup of <docutils.parsers.rst.states.Body instance>>, match = <_sre.SRE_Match object>, context = [], next_state = 'Body'
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py in explicit_markup(self=<docutils.parsers.rst.states.Body instance>, match=<_sre.SRE_Match object>, context=[], next_state='Body')
   2281     def explicit_markup(self, match, context, next_state):
   2282         """Footnotes, hyperlink targets, directives, comments."""
=> 2283         nodelist, blank_finish = self.explicit_construct(match)
   2284         self.parent += nodelist
   2285         self.explicit_list(blank_finish)
nodelist undefined, blank_finish undefined, self = <docutils.parsers.rst.states.Body instance>, self.explicit_construct = <bound method Body.explicit_construct of <docutils.parsers.rst.states.Body instance>>, match = <_sre.SRE_Match object>
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py in explicit_construct(self=<docutils.parsers.rst.states.Body instance>, match=<_sre.SRE_Match object>)
   2293             if expmatch:
   2294                 try:
=> 2295                     return method(self, expmatch)
   2296                 except MarkupError, error: # never reached?
   2297                     message = ' '.join(error.args)
method = <function directive>, self = <docutils.parsers.rst.states.Body instance>, expmatch = <_sre.SRE_Match object>
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py in directive(self=<docutils.parsers.rst.states.Body instance>, match=<_sre.SRE_Match object>, **option_presets={})
   2034         if directive_class:
   2035             return self.run_directive(
=> 2036                 directive_class, match, type_name, option_presets)
   2037         else:
   2038             return self.unknown_directive(type_name)
directive_class = <class 'docutils.parsers.rst.directives.body.Container'>, match = <_sre.SRE_Match object>, type_name = u'container', option_presets = {}
 /usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py in run_directive(self=<docutils.parsers.rst.states.Body instance>, directive=<class 'docutils.parsers.rst.directives.body.Container'>, match=<_sre.SRE_Match object>, type_name=u'container', option_presets={})
   2079                                                    ' '.join(detail.args)),
   2080                 nodes.literal_block(block_text, block_text),
=> 2081                 source=src, line=srcline)
   2082             return [error], blank_finish
   2083         directive_instance = directive(
source undefined, src = '<string>', line undefined, srcline = 78
 /usr/lib/python2.7/dist-packages/docutils/utils.py in error(self=<docutils.utils.Reporter instance>, *args=(u'Error in "container" directive:\nunknown option: "align".', <literal_block: <#text: '.. container:: ...'>>), **kwargs={'line': 78, 'source': '<string>'})
    222         output will contain errors.
    223         """
=>  224         return self.system_message(self.ERROR_LEVEL, *args, **kwargs)
    225 
    226     def severe(self, *args, **kwargs):
self = <docutils.utils.Reporter instance>, self.system_message = <bound method Reporter.system_message of <docutils.utils.Reporter instance>>, self.ERROR_LEVEL = 3, args = (u'Error in "container" directive:\nunknown option: "align".', <literal_block: <#text: '.. container:: ...'>>), kwargs = {'line': 78, 'source': '<string>'}
 /usr/lib/python2.7/dist-packages/docutils/utils.py in system_message(self=<docutils.utils.Reporter instance>, level=3, message=u'Error in "container" directive:\nunknown option: "align".', *children=(<literal_block: <#text: '.. container:: ...'>>,), **kwargs={'line': 78, 'source': '<string>'})
    188             self.stream.write(msg.astext() + '\n')
    189         if level >= self.halt_level:
=>  190             raise SystemMessage(msg, level)
    191         if level > self.DEBUG_LEVEL or self.debug_flag:
    192             self.notify_observers(msg)
global SystemMessage = <class 'docutils.utils.SystemMessage'>, msg = <system_message: <paragraph...><literal_block...>>, level = 3

<class 'docutils.utils.SystemMessage'>: <string>:78: (ERROR/3) Error in "container" directive: unknown option: "align". .. container:: :align: center .. image:: http://llucax.com.ar/blog/posts/2010/08/14-memory-allocation/bh.rq.tot.png .. image:: http://llucax.com.ar/blog/posts/2010/08/14-memory-allocation/bh.rq.bin.png .. image:: http://llucax.com.ar/blog/posts/2010/08/14-memory-allocation/bh.ws.tot.png .. image:: http://llucax.com.ar/blog/posts/2010/08/14-memory-allocation/bh.ws.bin.png
      args = (u'<string>:78: (ERROR/3) Error in "container" dire...posts/2010/08/14-memory-allocation/bh.ws.bin.png\n',)
      level = 3
      message = u'<string>:78: (ERROR/3) Error in "container" dire...posts/2010/08/14-memory-allocation/bh.ws.bin.png\n'