Drawing Diagonal Lines with CSS

July 31, 2008

CSS lets you specify so many bizarre graphical properties that I couldn’t believe it doesn’t support drawing simple lines. Well, it doesn’t. And googling for how to draw a diagonal line was not effective: the closest result I got was how to draw a trapezoid.

So how do you draw lines? The first idea that crossed my mind was stretching images of diagonal lines (which this site suggests as well, after suggesting to implement Bresenham 🙂 ). However, all the resulting lines have popping pixels, missing pixels, and a general unevenness.

So, I present a new way. It’s purely CSS, involves little code, renders quickly, and can be used for almost any diagonal line.

Of course it has its problems as well, and I will discuss them later.

Drawing Diagonal Lines

Diagonals Example: demonstration of my diagonals technique

Lines can be drawn by improving on the trapezoid technique: place a “white triangle” (a trapezoid with zero width) at a slight offset on top of a “solid triangle” to erase its body.

Here’s how the HTML and CSS look (make sure you understand how the trapezoid is done before you continue reading, so you’ll understand the rest):

<style type="text/css">
  .line {
    position:absolute; /* doesn't have to be absolute */
    z-index: -1;         /* places the line behind other elements */
  }

  .line div {
    position:absolute;
    left: 0px; top: 0px;
    border-left-color: transparent;
    border-style: solid;
  }
</style>
<!-- a line from A to B -->
<div class="line" style="left:50px; top:40px; /* point A */">
<div style="
    border-width: 0px 0px 150px 100px;        /* point B (minus A) */
    border-bottom-color: red;                 /* line color */
  "></div>
<div style="
    left: 2px;                                /* line width */
    border-width: 0px 0px 150px 100px;        /* point B (minus A) */
    border-bottom-color: white;               /* background color */
  "></div>
</div>

Explanation

The outer div contains the line. Position it as you wish (I chose absolute positioning, and this is probably what you want too; left and top set the coordinates of the beginning of the line).
Putting it in the background (using a negative z-index) is very important, or the erasing-triangle will erase your content as well.

Inside it are two inner divs. The first one is the solid-triangle. Its border-width determines the end-point of the line (that is, its angle and length). Its border-bottom-color is the color of the line.

The second div is the eraser-triangle. Its border-width must match the first div’s, unless you want your line to have a varying width (which practically turns it back into a triangle). The left property sets the width of the line, by deciding the offset of the erasing-triangle. This is not as simple as it sounds: it specifies the horizontal width, so the more the line leans to the right, the thinner it will appear; it is important to increase left proportionally to maintain the desired line width. Another important issue is the erasing color (that is, the border-bottom-color): it should match the background.
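To be concrete about the proportional correction: for a line spanning dx pixels across and dy pixels down, a horizontal offset of h gives a perpendicular thickness of roughly h * dy / sqrt(dx^2 + dy^2), so the offset has to be scaled up accordingly. Here is a small Python sketch of the calculation (an illustrative helper, nothing more):

import math

def left_offset(dx, dy, thickness):
    """Horizontal 'left' offset (in px) for the eraser-triangle that yields the
    desired perpendicular line thickness, for a line dx px across and dy px down."""
    return thickness * math.hypot(dx, dy) / dy

# For the 100x150 example above, a 2px-thick line needs a left offset of about 2.4px:
# left_offset(100, 150, 2)  ->  2.404...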

Suggestions

This only demonstrates a rising line, but it can just as easily be used for a falling line, by using right borders and a negative left for the erasing triangle.

To save the trouble of calculations, a simple JavaScript function can be used to draw these lines.
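Here is a sketch of such a helper in Python (the real thing would presumably be JavaScript); it generates the two-triangle markup for a rising line, assuming the stylesheet above is in place. The function name, the defaults and the width correction are my own, not part of the technique itself:

import math

def diagonal_line_html(left, top, dx, dy, thickness=2, color="red", background="white"):
    """Markup for a rising line inside a dx-by-dy box anchored at (left, top),
    following the two-triangle pattern above. Falling lines (right borders and a
    negative offset) are left out for brevity."""
    offset = thickness * math.hypot(dx, dy) / dy   # keep the perpendicular width constant
    return (
        f'<div class="line" style="left:{left}px; top:{top}px;">'
        f'<div style="border-width:0 0 {dy}px {dx}px; border-bottom-color:{color};"></div>'
        f'<div style="left:{offset:.1f}px; border-width:0 0 {dy}px {dx}px; '
        f'border-bottom-color:{background};"></div>'
        f'</div>'
    )

# diagonal_line_html(50, 40, 100, 150) produces markup much like the example above.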

Limitations

Nothing is perfect, and this technique is not even close to being so.

  • Background must be solid – I can’t think of a way to make the lines work on top of an image background.
  • Lines cannot intersect – this is basically the same as the previous point, but worth mentioning.
  • Horizontal and vertical lines need a separate case – they’re easy to draw, but handling them separately is less pretty (for the programmer).
  • Must distinguish between rising and falling lines – again, less pretty.
  • Lines have to be declared in a specific order: lower lines should appear later in the HTML (unless you specify their z-index manually).

Conclusion (just for fun)

I presented here a novel method for drawing lines using CSS. It has certain strengths and weaknesses. In general, I think it is very useful.

Enjoy!

Diagonals Example 2: another demonstration


Open-Source Issues

July 25, 2008

Putting parsing aside for a bit, I wish to rant about some issues with open-source which have been bothering me for a while now.

What drives a person to work on an open-source project is usually the strong wish to contribute back to the community and be part of it. It’s somewhat of a moral decision. So very often, perhaps more often than not, a volunteer would decide that he wants to join a project before he even knows which project it’d be.

And so in selecting a project, some of the choices he faces are these:

  • Should he join an old and stable project — in which he will have to spend some time learning the existing code base, and will have to work very hard to be significant — or should he join a new project — in which everyone starts from scratch, and in which he can more easily leave his mark? Perhaps he should even create a new project, where he’d have the strongest say?
  • Should he join an infrastructure project which is meant for programmers, or should he join a GUI-based project that could appeal to anyone — in which he has much higher chances of fame and recognition?
  • Should he join an innovative project that has to convince its users that they need it, or should he join a project with proven demand (such as IM) — which is more likely to be used and is easier to advertise?

There are many other choices an open-source contributor faces in selecting a project, but I’ve chosen these three because I think that the decisions commonly made for them, while understandable, are damaging to the world of open-source software.

This is a strong claim, I know, and so I’ll spend the rest of this post trying to establish it.

Common Decisions

I cannot prove that the common decisions are what I claim they are; I have no statistics to use. It is just my impression from my own experience with open-source.

Without proof, I can only try and persuade you by using examples as evidence. While presenting these examples I’ll try to explain why I think they are so damaging to open-source.

Effects – Example 1: Attack of the Clones

Clones are possibly the most common pattern in open-source. For every purpose, need and audience there are many projects, and most of them are not different enough to technologically justify being separate projects. They do mostly the same thing but don’t share libraries, design or knowledge. A quick look at this page demonstrates it easily: can you really tell all of these parsers apart? I can’t. They could probably all be combined into two or three projects with some additional modules and options. And this is just an easy example. There are dozens of IMs and bittorrent clients, and hundreds of text editors and code editors with the same feature list and only a slightly different GUI. I see a new CMS announced every week, and that’s just for Python. The clones have gotten beyond the point of healthy variety. There are just too many.

Why is this so bad? Isn’t competition beneficial for product quality?

Yes, competition and variety are important, but open-source is not a capitalistic system: the work force is very limited (and even more limited in work hours) and is not necessarily driven by profit, and the goals of all competitors are fairly well-defined and mostly similar. So the capitalistic model doesn’t necessarily apply (and let’s not forget, open-source is communism 🙂 ).

Open-source acts like it’s got good programmers to spare, and it doesn’t. If we took all the developers of bittorrent clients (for example), removed them from their project and divided them between only three bittorrent clients, we would have significantly better clients – in stability, features, support, etc. Variety would come as options within the projects.

Effects – Example 2: Are there really 30 ways to do it?

With similar goals should come cooperation. Differences are important, but similarities should be exploited to reduce the amount of code written in each project (more code = more time required and more bugs).

There are so many IMs, and each one implements its protocols on its own. The same goes for bittorrent clients, as this table demonstrates (remember: different features == different implementations). Think of all the wasted time that could have been used to make two or three good, stable and feature-rich bittorrent libraries. GUIs could then use these libraries and be as varied as they want. The amount of time saved would be incredible, without any loss of variety, and with a gain in quality.

Remember: Every time you write a library for programmers, you are cooperating.

But who wants to write tedious libraries and APIs? GUIs are so much more fun. *Click*

I’m picking on IMs and bittorrent clients because it’s easy. But this applies to most categories of open-source projects that I know.

Effects – Innovation

I cannot give an example to demonstrate that there is hardly any innovation; I can only note that I hardly run into any. Most new open-source projects I see have been done before, and in a fairly similar way. Lists of features in projects seem almost identical to their competitors’ lists. Sometimes I’ll find out a project has a little innovative feature — nothing radical — and it will brag about it disproportionately.

But I do know that clones must hurt innovation. The evolutionary process depends on a large variety of “mistakes” for it to choose from. When most of the projects are the same as their “predecessors”, natural selection has little to select from. We need odd projects that nobody thinks they need. Perhaps only one would succeed out of a hundred, but having true innovation might be worth the sacrifice.

Perhaps innovation is out there. If it is, please let me know.

But these are natural trends! What can we possibly do?

Suppose I convinced you that these are actual problems in open-source. Is there anything to do about them?

Well, obviously we cannot tell volunteers which projects to join and how to run their projects. We can’t force projects to cooperate. But we can try and persuade them. We can show them our reasoning and ask them politely.

Mature projects could try to be more friendly to newcomers. They can give them more freedom to create forks or optional modules. They can actively encourage ideas and involvement (even when not directly beneficial). They can make their code base more accessible to new readers.

If you have any more ideas I’d be happy to hear them, but basically this is all in the hands of the masses. If you think my ideas are correct and want to see the trends shift, spread the word. (And if you think I’m incorrect, please let me know.)

Open-Source Programmers Of The World – UNITE!


Python Parsing Ep. II: Antlr Issues

July 17, 2008

Continuing that post.

Before I go on, I wish to make two corrections to the code provided in the last post:

  1. I think that the grammar for COMMENT is incorrect (in the original too): it requires comments to always end with a newline. I changed that requirement to be optional, and now it handles end-of-file more gracefully. For your consideration.
  2. The code for INDENT/DEDENT cannot work, because ANTLR’s Emit cannot be called twice for the same symbol: only the last call registers. That is fixable by overriding Emit and NextToken (I found this code somewhere on the internet; I cannot remember where). And so, here is my code for the lexer:
    @lexer::members {
    int implicitLineJoiningLevel = 0;
    int startPos = 0;
    // Indentation stack; starts with a single 0 that is never popped off.
    System.Collections.Generic.List<int> IndentStack = new System.Collections.Generic.List<int>() { 0 };

    // Queue of pending tokens, so that a single lexer rule can emit several tokens.
    System.Collections.Generic.List<IToken> tokens = new System.Collections.Generic.List<IToken>();

    public override void Emit(IToken token) {
        this.token = token;
        token.Line = this.tokenStartLine;
        token.CharPositionInLine = this.tokenStartCharPositionInLine;
        tokens.Add(token);
    }

    public override IToken NextToken() {
        base.NextToken();               // fills the queue via Emit()
        if ( tokens.Count == 0 ) {
            return Token.EOF_TOKEN;
        }
        Token t = (Token)tokens[0];
        tokens.RemoveAt(0);
        return t;
    }
    }
    
    

Now that we’re done with the lexing…

We now have a working lexer, which takes Python code as input and outputs tokens. So now all we have to do is feed the formal grammar into ANTLR and our work is done, right?

Well, ANTLR’s tree API is very unfriendly; I spent hours figuring it out and I still don’t really get it. And since the parser just throws a sequence of tokens at you, I had no obvious way of constructing a hierarchy out of the code.
But even if I could somehow untangle the mysteries of ANTLR’s trees, I had a bigger problem, which is only relevant if you want to display the code (which I did): the parser’s output pays no regard to the whitespace or comments that were originally there. I had to find each symbol’s start and end in the original text and copy the text from there. But symbols didn’t end where they “ought to”: since comments were removed before reaching the parser, symbols didn’t include them even in their position markers, so:

class A:
    hello = "world"
    # just some class

… would leave the comment out of the scope of the class, which is bad. It would be a lot easier to just write an interpreter…

But I conjured a solution to both of my problems at once: I would insert markers — fake symbols — at the start and end of all definitions. I took the chance to insert other markers that would ease my life. So my class definition looked something like this:

classdef:
	{ParserEmit($classdef.start, $classdef.stop, ParserToken.TokenType.ClassBegin, "");}
	CLASS classname
	{ParserEmit($classname.start, $classname.stop, ParserToken.TokenType.Name, $classname.text);}
	(LPAREN testlist? RPAREN
		{ParserEmit($testlist.start, $testlist.stop, ParserToken.TokenType.Args, $testlist.text);}
	)? colon
	(suite {ParserEmit($suite.start, $suite.stop, ParserToken.TokenType.ClassEnd, "");}
	| NEWLINE
		{ParserEmit($colon.start, $colon.stop, ParserToken.TokenType.Error, "Empty Class Def");
		 ParserEmit($colon.start, $colon.stop, ParserToken.TokenType.ClassEnd, "");
		}
	)
	;

(If you are familiar with ANTLR you’ll notice that I’m using the ‘Emit’ function “incorrectly”. I will explain soon enough)

Observant readers might notice that I’ve only delayed the problem – the resulting stream of fake symbols is itself a production of a formal grammar, and needs to be parsed in order to construct a hierarchy. True, but the result is a lot simpler than Python code: I am now dealing with symbols (=classes) and not with text (=strings).
However, this was not my only problem. As I said earlier, ‘Emit’ is only designed to be called once for each symbol. I didn’t like Emit very much anyway, so I created my own emit function (I called it ‘ParserEmit’). It simply saves everything it receives in a list, which I can later iterate over in my code. Just for the record, here’s how it looks:

public System.Collections.Generic.List<ParserToken> parser_toks =
	new System.Collections.Generic.List<ParserToken>();

void ParserEmit(IToken start, IToken stop, ParserToken.TokenType t, string text)
{
	ParserToken tok = new ParserToken() { type=t, start=start, stop=stop, text=text };
	parser_toks.Add(tok);
}

(ParserToken is a class I made up. You can figure out what it looks like)

Now I had a list of symbols telling me exactly where each class/function starts and ends.
The next post will be Ep. III, and the last, where I will describe how I turned those symbols into a tree, rant about text-obsession in parsers, and provide my complete ANTLR grammar for Python (which has too many changes/additions to describe each one individually).
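To give a taste of why the marker stream is easier to handle than raw text, here is a minimal sketch of folding it into a tree with a stack. This is illustrative only, not the actual Ep. III code; it assumes each token exposes .type (treated here as a plain string) and .text:

def build_tree(parser_toks):
    # Fold the flat stream of Begin/End markers into a nested structure.
    # ClassBegin opens a node, ClassEnd closes it, and everything in
    # between (Name, Args, Error, ...) attaches to the innermost open node.
    root = {"type": "Module", "children": []}
    stack = [root]
    for tok in parser_toks:
        if tok.type == "ClassBegin":
            node = {"type": "Class", "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif tok.type == "ClassEnd":
            stack.pop()
        else:
            stack[-1][tok.type] = tok.text
    return root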


Python Parsing Ep. I: Lexing Python (with ANTLR)

July 12, 2008

For a while now, I’ve been writing a code editor for Python. I know there are quite enough of these, but I wanted something a bit different. I decided to write it in C#, and I’m glad I did – C#’s GUI is really easy and fun to write, probably the best GUI-writing experience I’ve had since VB.

So, at some point in the project I made the horrible mistake of attempting to parse Python. Now, had I been trying to parse C, or maybe Perl, or actually almost any other language (except for C++, which is a parser’s worst nightmare), it would’ve been a piece of cake. But no, I had to try Python. And so, it began.

The Parser
Unlike for some other languages, deciding on a parser generator for C# was easy: There’s ANTLR, and there’s the rest. No competition. (Later I would find out about Grammatica, which is pretty good. More on that some other time).

ANTLR’s website provides a variety of grammars for the lazy programmer, and among them a grammar for Python. Thinking that I had evaded real work, I tried to compile it:

F:\antlr_play> Antlr3.Tool.exe Python.g
ANTLR Parser Generator  Version 3.0 (May 17, 2007)  1989-2007
warning(105): Python.g:289:17: no lexer rule corresponding to
                               token: INDENT
warning(105): Python.g:289:32: no lexer rule corresponding to
                               token: DEDENT
... (some more warnings about how the grammar is not
     deterministic) ...

Ouch. Looks like someone did not implement INDENT and DEDENT in the lexer.
It turns out that this grammar was written for Java, which has some kind of special pre-processor (or post-processor?) for Python (called PythonTokenStream.java) to produce the right indentation tokens.

C# does not have one.

Googling produced no results that could help me, and so I set out to write the lexer grammar myself.

Lexing Python
Now, a few things about lexing Python.

  1. Python’s indentation cannot be solved with a DFA. (I’m still perplexed at whether it can even be solved with a context-free grammar).
  2. PyPy produced an interesting post about lexing Python (they intend to solve it by post-processing the lexer output)
  3. CPython’s tokenizer is written in C. It’s ad-hoc, hand-written, and complex. It is the only official implementation of Python lexing that I know of.

But the best part is that I wasn’t aware of any of these points when I started writing my lexer grammar.

What I did find was the state-machine algorithm for the indentation, found in Python’s official documentation:

Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line’s indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.

Or in Python pseudo-code:

stack = [0]
for line in code_lines:
    indent_level = get_indent_level( line )
    if stack[-1] < indent_level:
        stack.append( indent_level )
        emit_token( INDENT )
    else:
        while stack[-1] > indent_level:
            stack.pop()
            if stack[-1] < indent_level:
                raise IndentationError()
            emit_token( DEDENT )

    process_line( line )

for dummy in stack[1:]:
    emit_token( DEDENT )

Now, this is clean and pretty. That is because it's a post-processor, and not part of the lexer. But when you deal with indentation while tokenizing, you have to take into account Python's quirks: line joinings, both explicit (backslash) and implicit (parentheses and friends). Also, indents and dedents are not counted for empty lines, so you have to take whitespace into account only if it is followed by text (using lookahead). And of course the whitespace must be thrown away after processing it (or your parser grammar will be awful).
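For instance, here is the sort of input the tokenizer has to survive (an illustrative example); none of the continuation lines or the blank lines may produce INDENT/DEDENT tokens:

total = 1 + \
        2               # explicit (backslash) line joining

values = [10,
          20,           # implicit joining inside brackets
          30]

def f():
    return total        # only genuine leading whitespace like this counts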

So what person in his right mind wouldn't do it as post-processing? Well, me, and anyone else who doesn't know better. But, despite the complications, I managed to get the grammar to work, and I would like to share it with you. Only four disclaimers:
  1. It's written for C#, but with very little tweaking it should work for Java just the same (not that you need it).
  2. It's kinda ugly. Admittedly, it's not my best piece of code. But ugly code for an ugly task. Anyway, at least it's working. Oh, and see 3.
  3. I've only tested it on a few Python files, and it's not impossible that it would fail on some eccentric piece of code.
  4. I had to paste it because WordPress doesn't seem to allow uploading non-media files, and it doesn't fit the width of the page. Just copy-paste it if you want to read it properly.

With that in mind, here's the relevant part changed in the lexer grammar (again, on top of this grammar: http://www.antlr.org/grammar/1200715779785/Python.g):



/** Treat a sequence of blank lines as a single blank line.  If
 *  nested within a (..), {..}, or [..], then ignore newlines.
 *  If the first newline starts in column one, they are to be ignored.
 *
 *  Frank Wierzbicki added: Also ignore FORMFEEDS (\u000C).
 */
NEWLINE
@init {
    int spaces = 0;
}
    :   ((('\u000C')?('\r')? '\n' ) | '\t' | ' ' )* (('\u000C')?('\r')? '\n' )
		leading_space = (' '  { spaces++; } | '\t' { spaces += 8; spaces -= (spaces % 8); } )*
        {
			if ( tokenStartCharPositionInLine==0 || implicitLineJoiningLevel>0 )
				//$channel=HIDDEN;
				Emit(new ClassicToken(NEWLINE,this.Text, HIDDEN));
			else {
				Emit(new ClassicToken(NEWLINE,this.Text));
				//newLine = true;
			}

			if (implicitLineJoiningLevel==0)
			{
				if ( spaces > IndentStack[IndentStack.Count - 1] ) {
					IndentStack.Add(spaces);
					Emit(new ClassicToken(INDENT,">"));

				}
				else
				{
					while (spaces < IndentStack[IndentStack.Count - 1]) {
						IndentStack.RemoveAt(IndentStack.Count - 1);
						Emit(new ClassicToken(DEDENT,"<"));
					}
				}

			}

        }
    ;

WS  :    {tokenStartCharPositionInLine>0}?=> (' '|'\t'|'\u000C')+ {$channel=HIDDEN;}
    ;

/** Grab everything before a real symbol.  Then if newline, kill it
 *  as this is a blank line.  If whitespace followed by comment, kill it
 *  as it's a comment on a line by itself.
 *
 *  Ignore leading whitespace when nested in [..], (..), {..}.
 */
LEADING_WS
@init {
    int spaces = 0;
}
    :   {tokenStartCharPositionInLine==0}?=>
        (   {implicitLineJoiningLevel>0}? ( ' ' | '\t' )+ {$channel=HIDDEN;}
           |    (     ' '  { spaces++; }
                |     '\t' { spaces += 8; spaces -= (spaces % 8); }
                )+
            {
            // make a string of n spaces where n is column number - 1
            char[] indentation = new char[spaces];
            for (int i=0; i<spaces; i++) {
                indentation[i] = ' ';
            }
            String s = new String(indentation);
            Emit(new ClassicToken(LEADING_WS, s));

            }
            // kill trailing newline if present and then ignore
            ( ('\r')? '\n' {if (token!=null) token.Channel = HIDDEN; else $channel=HIDDEN;})*
           // {token.Channel = 99; }
        )
    ;

/** Comments not on line by themselves are turned into newlines.

    b = a # end of line comment

    or

    a = [1, # weird
         2]

    This rule is invoked directly by nextToken when the comment is in
    first column or when comment is on end of nonwhitespace line.

    Only match \n here if we didn't start on left edge; let NEWLINE return that.
    Kill the newlines if we live on a line by ourselves

    Consume any leading whitespace if it starts on left edge.
 */
COMMENT
@init {
    $channel=HIDDEN;
}
    :    {tokenStartCharPositionInLine==0}?=> (' '|'\t')* '#' (~'\n')* '\n'+
    |    {tokenStartCharPositionInLine>0}?=> '#' (~'\n')* // let NEWLINE handle \n unless char pos==0 for '#'
    ;

So, this is it so far. I hope it was useful.
The next part of the saga will discuss ANTLR specifically, explain why using it to parse Python can also be complicated sometimes, and of course provide a solution.


Zen, Art and Programming

July 7, 2008

The main advantage of having no readership is that I have very few restrictions on what to write about.
(the main disadvantage is obvious)

So in this post, I would like to tell a short zen story –

Notice the excessive and redundant outlining (very blunt around the eyes), and that nothing is quite right. I would be embarrassed to display this in public, but everyone’s first steps were (or still are) awkward.

Art
In my first drawing lesson (sketching, to be precise) my teacher gave me the task of copying a grayscale painting in charcoal (charcoal is easy to erase and is good for beginners). The painting featured the face of a girl. As most people would, I started by outlining her face and hair, positioning her nose, shaping her eyes, and went on to shading the paper to match the tones of the painting. I had already spent an hour and a half working on the drawing, yet naturally it was disfigured. The teacher did not seem to care much about this. Instead he remarked that in the painting the eyes are not outlined, nor is the face. He also remarked on the incorrect differences in tone. Then he said “this drawing is stuck”, took a small cloth and smeared the drawing, erasing most details and leaving only the general shape and tone. An hour and a half of work was ruined, and I was in shock. But I kept an open mind and listened to his instructions. Here they are, partly in my own words and commentary, but in the same spirit:
1. Avoid detail. Anything you can’t see when squinting is not important.
2. Your sight is biased. To be correct you have to compare elements to other elements (elements being location, size, color, etc.).
3. Observe. Spending time on understanding what you are drawing (the relationships between its elements) will save you time when drawing.
He had said all this before, but it only made sense at that point. And so I resumed drawing, trying to keep to these rules. I was amazed by two things: how quickly I managed to reconstruct the painting, and how convincing it looked without any real detail. And so, I learned several things:
1. Your internal model is harder to construct than the drawing. Spend more time on observing, and you will spend less time on the drawing.
2. Detail distracts you from the more important problems.
3. Keep detail to the end. If you start from it, you will never get it right.

Programming
If you remember the title, you must be asking yourself what all this has to do with programming. Well, as I returned home from the lesson I had some time to contemplate, and I remembered a time when a friend and I had to teach someone how to program. This was at work, so that someone was committed. Among the teachings, we conjured up an exercise which was supposed to bring the “student” to the right spirit. The right spirit, or as we called it, the “programming zen”, was very important to us. The exercise was this: the student was given a programming problem which he had to solve. After making sure that he had completed the program correctly and that the code was of high quality, we ordered him to erase the program (and every copy of it) from his computer.
This seemingly cruel exercise was meant to teach the following lesson: code is not important. What’s important is your knowledge and your understanding of it. Once you’ve solved a problem once, you can solve it again with little effort.

Zen
This was what I remembered, and I realized that I had now been taught the very thing I had tried to teach someone else some time ago, and that it applies to both drawing and programming:

Your internal model is harder to construct than the output, and is also more important. Spend more time understanding the problem, and you will spend less time on implementing it.

I believe the other tips and lessons mentioned here apply to programming as well, but those are stories for another time.