Interruption At Work Created A Mishap

2009-08-30 by monzool

INTERRUPTIONS AT WORK are a frequent occurrence, but generally it’s not a big problem. This time, however, the unavoidable loss of focus on what I was doing gave me an unpleasant surprise.

I was adding some new functionality and had just written the following:

switch (state) {
    case ConfigureTask:
        configuration.length = 10;
        break;
}

Next I added a line to specify the configuration data on index zero. With the intention of doing this for the remaining nine data indexes, I copy-pasted the first line and incremented the index.

switch (state) {
    case ConfigureTask:
        configuration.data[0] = 
        configuration.data[1] = 
        configuration.length = 10;
        break;
}

But this was at the exact moment a colleague asked a question. To figure out the answer I had to browse around in the same file I was just editing. Not finding the complete answer there, the hunt led on to opening a bunch of other files. Eventually the situation evolved into a discussion at the whiteboard.

Now, even though the above code is incomplete, it compiles to perfectly valid code! Line breaks mean nothing to the compiler, so the three lines are read as one chained assignment ending at the first semicolon. In effect the code sets configuration.data[0], configuration.data[1] and configuration.length to 10. Naturally this was never the desired behavior for that code…

Later, returning to my workstation, I had completely forgotten about the unfinished implementation I had been working on. In my mind it was already done, and I proceeded to other things that would eventually allow me to run some basic tests of the new implementation. The nature of the code is to delegate a state-dependent amount of black-box data to a task. The receiving task is found by peeking into the first byte of the black-box data (configuration.data[0]). Unfortunately ‘10’ is a perfect match for the first task to be configured. So when unit-testing, at first everything seemed to be okay.

Later some strange behavior appeared, for which I could find no good reason. Eventually, to my great dismay, I found the fault.

This kind of logic error can be extremely difficult to find, and I’ve learned my lesson: if you have to leave in the middle of writing some source code, be sure to quickly add some non-code that will not compile.

Posted in C++, Entertainment, Lua, Personal, Programming | No Comments »

Upgraded From WordPress 2.2.2 to WordPress 2.7

2009-02-08 by monzool

WORDPRESS 2.7 UPGRADE from WordPress 2.2.2 done with no problems.
I was somewhat worried that my old theme would break on the upgrade, but it appears to be working. Well, I got a little issue with the Wp-Syntax plugin. I upgraded the plugin and now my GeSHi style overrides do not work anymore. I guess that is the kind of thing one discovers when upgrading infrequently.

I’m looking forward to working with this new WordPress version and exploring its added features.

Posted in Personal, Programming, Web | No Comments »

Numbers To Strings And Back Again – Standard C++ vs. Boost

2008-05-06 by monzool

CONVERTING NUMBERS TO strings, or the opposite, converting strings to numbers, is an operation that is far from as trivial as one would expect from such an obvious task, at least when it comes to C++ programming using the standard libraries. The conversion can be performed by the stringstream classes (istringstream and ostringstream) in the standard library. When searching Google for the C++ way of converting between numbers and strings, the stringstream classes appear not to be that well known, and especially their number and string conversion features seem to be unknown to many.

The stringstream offers a wide range of ways to manipulate stream data, although when used for e.g. specially formatted textual output, the implementation tends to be somewhat more cumbersome than with the old printf family.

The example below takes a few more lines than doing e.g. an atoi or snprintf kind of operation, but depending on the situation, simple conversion scenarios do not require many lines of code.

Standard Input / Output Streams Library

The main function is extracted here, just not to obfuscate the picture of the actual converting. Note that stringstream is defined in the <sstream> header.

#include <iostream>
#include <sstream>  // stringstream
#include <string>   // std::string
 
// Prototypes
void Std_StringToInteger();
void Std_IntegerToString();
 
int main(int argc, char *argv[])
{
  std::cout << "Std_StringToInteger:" << std::endl;
  Std_StringToInteger();
  std::cout << "Std_IntegerToString:" << std::endl;
  Std_IntegerToString();
}

The function below handles conversion from strings to integers. First a simple conversion is done, followed by an example of testing whether the conversion succeeded. Finally it shows how to enable exceptions on conversion errors.

void Std_StringToInteger()
{
  std::string str = "1976";
  int val;
 
  // Load stringstream with text to convert
  std::istringstream is(str);
  // Convert by streaming to integer
  is >> val;
  std::cout << "  Val: " << val << std::endl;
 
  // Clear stream for another input
  is.clear();
 
  // Load stream with a non numeric convertible data
  is.str("Monzool.net");
  is >> val;
 
  // Test if conversion failed
  if (is.fail())
    std::cout << "  Conversion failed!" << std::endl;
 
  // Enable exceptions on conversion errors
  try
  {
    // Set failures to be thrown as exceptions. failbit is still set from
    // the failed conversion above, so this call itself throws right away.
    is.exceptions(std::istringstream::eofbit  |
                  std::istringstream::failbit |
                  std::istringstream::badbit);
  }
  catch(std::istringstream::failure& e)
  {
    std::cout << "  Exception: " << e.what() << std::endl;
    std::cout << "  Conversion failed!" << std::endl;
  }
}

As the name stringstream indicates, input and output are done by streaming. If you are not quite confident about the stream directions, think of how the cout and cin objects are used. Using stringstream is no different.

The last function converts from numbers to strings.

void Std_IntegerToString()
{
  int val = 1976;
 
  // Create empty stringstream for number to convert
  std::ostringstream os("");
  // Convert by streaming integer
  os << val;
  std::cout << "  Str: " <<  os.str() << std::endl;
}

Boost lexical_cast

To put it simply: when it comes to libraries for converting between numbers and strings, the Boost library smokes its standard C++ library counterpart.

The conversion features of Boost are located in the lexical_cast library and are made available by including the lexical_cast.hpp file (most Boost libraries are implemented in header files and can be used by including the appropriate hpp file).

#include <iostream>
#include <boost/lexical_cast.hpp>
 
// Prototypes
void Boost_StringToInteger();
void Boost_IntegerToString();
 
int main(int argc, char *argv[])
{
  std::cout << "Boost_StringToInteger:" << std::endl;
  Boost_StringToInteger();
  std::cout << "Boost_IntegerToString:" << std::endl;
  Boost_IntegerToString();
}

Instead of using streaming functionality, Boost has chosen a much more obvious concept. Boost has added the functionality of simply casting between numbers and strings. Casting functions are already a familiar concept in C++, like casting between data types using static_cast or manipulating const’ness with const_cast.

The lexical_cast template function makes converting from string to integer trivial. The example below also shows how to handle conversion errors by exception handling.

void Boost_StringToInteger()
{
  std::string str = "1976";
  // Cast string to integer
  int val = boost::lexical_cast<int>(str);
  std::cout << "  Val: " << val << std::endl;
 
  // Load string with non numeric convertible data
  str = "Monzool.net";
  try
  {
    // Non-convertible values throw exceptions
    val = boost::lexical_cast<int>(str);
  }
  catch (boost::bad_lexical_cast &e)
  {
    std::cout << "  Exception: " << e.what() << std::endl;
    std::cout << "  Conversion failed!" << std::endl;
  }
}

Converting the other way from integer to string is just as trivial.

void Boost_IntegerToString()
{
  int val = 1976;
  // Cast integer to string
  std::string str = boost::lexical_cast<std::string>(val);
  std::cout << "  Str: " << str << std::endl;
}

When it comes to simple conversion between numbers and strings, Boost is far superior in simplicity. Note, however, that the design goals have been very different for the two libraries. The C++ Standard Input/Output Streams Library has been designed for flexibility. And flexible it is indeed, but sadly this has the side effect of complicating its usage even for obvious tasks that ought to be trivial to perform.

Posted in C++, Programming | No Comments »

Why Lambda?

2008-03-19 by monzool

I HAVE BEEN reading up on Python programming lately (more on that in a later post). I’ve now been introduced to anonymous functions. In Python, anonymous functions are available using the lambda keyword. Anonymous functions are great, but I think the Lua syntax for anonymous functions is superior to the syntax adopted in Python.


A normal function, in Python, is defined using the def keyword along with a function name.

>>> def f1(x, y):
...     return x + y
... 
>>> f1(1, 2)
3

In Python anonymous functions are created by a lambda expression.

>>> f2 = lambda x, y: x + y
>>> f2(1, 2)
3

Like anonymous functions, normal Python functions are first class objects and can be assigned to other variables.

>>> f = f1
>>> f(1, 2)
3

However, direct assignment of a function definition is not possible.

>>> f = def f3(x, y):
  File "<stdin>", line 1
    f = def f3(x, y):
           ^
SyntaxError: invalid syntax

This last example resembles the concept of the anonymous function syntax chosen in Lua. First a look at how a normal function is defined in Lua. It’s not that different from the Python version.

> function f1(x, y)
>>   return x + y
>> end
> print( f1(1, 2) )
3

As in Python, functions are first class objects in Lua and thus also support aliasing.

> f = f1
> print( f(1, 2) )
3

The syntax for anonymous functions in Lua does not differ much from how normal functions are defined: the function name is simply omitted (hence anonymous). In the examples below the function definition is also wrapped in parentheses, although Lua does not require them.

> f2 = (function(x, y)
>>   return x + y 
>> end)
> print( f2(1, 2) )
3
> -- Or as one-liner if preferred
> f2 = (function(x, y) return x + y end)
> print( f2(1, 2) )
3

In Lua a function is a function and defined as such, anonymous or not. I think this approach is more elegant than using a dedicated lambda keyword.
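One practical difference behind that preference, as far as I can tell: a Python lambda body is limited to a single expression, so anything needing statements falls back to a named def, whereas the Lua form can hold a full function body. A minimal sketch, assuming Python 3 (make_adder, add_short and add_checked are hypothetical names made up for illustration):

```python
# Sketch only: all names here are hypothetical, chosen for illustration.
def make_adder(offset):
    # A lambda body must be a single expression...
    add_short = lambda x: x + offset

    # ...so a body with statements needs a locally named def instead.
    def add_checked(x):
        if x < 0:
            raise ValueError("negative input")
        return x + offset

    return add_short, add_checked


add_short, add_checked = make_adder(10)
print(add_short(1), add_checked(1))
```

Both returned objects are ordinary functions; only the second one could carry the if statement.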

Posted in Lua, Programming, Python | 1 Comment »

What GUI Toolkit To Use?

2008-02-29 by monzool

GUI PROGRAMMING IS not something I’ve done in quite a while. At work I do embedded programming and that’s mainly also what I’ve been doing for my own personal projects. Except for some small utility applications I really haven’t done large GUI projects since MFC 6.0 was cool (if such a time ever was) ]:->.

An absolute requirement is that the end result must be multi-platform capable (Linux, BSD, Mac OS X and Windows). Plenty of frameworks and toolkits exist that fulfill that requirement, but I find that the Kde/Qt constellation is the most exciting and complete toolkit around, especially given the multi-platform perspective introduced by Kde 4. Although Kde 4 is not quite stable yet, I think the choice is wise in a long-term perspective.

I primarily do C/C++ programming (and a bit of Lua scripting), but I really would like to extend my horizon (or more precisely, rise above n00b level) in other programming languages like C# and Python. Given that I have much to learn about Kde, GUIs and what else is hot in the desktop programming world, the option of C# is not a mandatory requirement. I could settle on a C++ and Python solution.

Mixing two completely different kinds of languages (static and dynamic) requires either good binding layers or Mono. The Kde Project provides a large suite of bindings in the KdeBindings package. The README contains a concise description of the project contents:

This package contains:
* working:
  * korundum: KDE bindings for ruby
  * qtruby: Qt bindings for Ruby
  * smoke: Language independent library for Qt and KDE bindings. Used by QtRuby
    and PerlQt.
  * kalyptus: a header parser and bindings generator for Qt/KDE. Used for
    Smoke, Java, C# and KSVG bindings generation at present.
  * ruby/krossruby and python/krosspython which are plugins for the kdelibs/kross
    scripting framework to provide scripting with python+ruby to applications.
  * PyKDE: KDE bindings for python, requires PyQt from riverbankcomputing.co.uk
  * Qyoto: Qt bindings for C#
  * Kimono: KDE bindings for C#

The Mono project seems to be somewhat controversial. A lot of writing has been going on lately on Mono vs. Novell/Microsoft vs. freedom. Anders Hejlsberg and his team have created both clever and interesting stuff in the .NET architecture like C#, CTS (Common Type System) and CLR (Common Language Runtime), but I can’t appreciate embracing other closed proprietary technologies from the .NET portfolio when other alternatives exist in the FOSS community. I think Robert Devi summed it up nicely in the Osnews.com comments (though I don’t agree with Robert’s personal rantings on Mono speed/memory, Amarok etc.).

As far as I can tell, the above observations give me the following constructs:

  1. C++ + Kross + PyQt + PyKde(*1).
  2. Mono + C# + limited managed C++ + Qyoto/Kimono + IronPython.

*1: PyKde is not released for Kde 4 at present time.

It’s a difficult decision whether to choose the one or the other construct. Because I have already done much C++ coding (and shot a foot off more than once) and not much C# coding, I lean mostly towards the Mono solution. Unfortunately this could potentially force me to use MonoDevelop. Tried it eight months ago and tried it again yesterday; it’s still the single most unstable piece of software I have ever used :-(. Hope it’s not the case that it only works on OpenSuse or Suse. That would not be freedom. Anyways, selecting C# would mean that the money I spent on the book Professional C#, 3rd Edition won’t go to waste.

Posted in Personal, Programming, Software | No Comments »

Incomprehensive Hexdump Man Page

2008-02-18 by monzool

THE HEXDUMP MAN page, I find, is not the best-written example of an application manual. I recently had the task of finding the addresses of the filename occurrences generated when a bunch of files were written to an uncompressed jffs2 partition. Normally I’ve been sticking to the simple hexdump -C <device> use, but grepping for filenames in the output is not workable because of the line breaks: a name that is split across two 16-byte rows never matches.

$ hexdump -C /dev/mtd0  | grep count
00092df0  63 6f 75 6e 74 31 32 2e  64 61 74 ff 19 85 e0 02  |count12.dat.....|
000bfe80  00 00 0e 0b 08 63 6f 75  6e 74 31 33 2e 64 61 74  |.....count13.dat|
000c7f10  63 6f 75 6e 74 30 39 2e  64 61 74 ff 19 85 e0 02  |count09.dat.....|
000f9a80  ff 40 62 1d f9 72 7e e3  63 6f 75 6e 74 31 31 2e  |.@b..r~.count11.|
000ffb90  63 6f 75 6e 74 30 39 2e  64 61 74 e0 02 00 00 00  |count09.dat.....|
000ffd80  0a 00 00 00 0b 0b 08 63  6f 75 6e 74 31 30 2e 64  |.......count10.d|
00115e20  bd 6e 58 e6 63 6f 75 6e  74 30 37 2e 64 61 74 ff  |.nX.count07.dat.|
0012ebe0  63 6f 75 6e 74 30 38 2e  64 61 74 ff 19 85 e0 02  |count08.dat.....|
0013fcc0  00 08 0b 08 63 6f 75 6e  74 30 37 2e 64 61 74 e0  |....count07.dat.|
0013feb0  01 00 00 00 08 00 00 00  09 0b 08 63 6f 75 6e 74  |...........count|
0014af40  fa ce 22 36 63 6f 75 6e  74 30 34 2e 64 61 74 ff  |.."6count04.dat.|
00163d00  63 6f 75 6e 74 30 35 2e  64 61 74 ff 19 85 e0 02  |count05.dat.....|
0017fbc0  00 00 00 05 0b 08 63 6f  75 6e 74 30 34 2e 64 61  |......count04.da|
0017ffb0  00 07 0b 08 63 6f 75 6e  74 30 36 2e 64 61 74 e0  |....count06.dat.|
00180070  16 b6 2c e3 32 2e ad 46  63 6f 75 6e 74 30 31 2e  |..,.2..Fcount01.|
00198e30  75 8e d7 96 63 6f 75 6e  74 30 32 2e 64 61 74 ff  |u...count02.dat.|
001b1bf0  63 6f 75 6e 74 30 33 2e  64 61 74 ff 19 85 e0 02  |count03.dat.....|
001bfb00  0b 08 63 6f 75 6e 74 30  31 2e 64 61 74 e0 02 00  |..count01.dat...|
001bfcf0  00 00 02 00 00 00 03 0b  08 63 6f 75 6e 74 30 32  |.........count02|
001bfef0  63 6f 75 6e 74 30 33 2e  64 61 74 e0 02 00 00 00  |count03.dat.....|
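The reason grep comes up short can be sketched in a few lines. This is only an illustration, assuming Python 3 and made-up sample data (not the real flash contents): a name that starts 12 bytes into a 16-byte row is cut in two, so a row-by-row search never sees the whole word.

```python
# Hypothetical sample data: "count13.dat" placed 12 bytes into a 16-byte row,
# so the dump splits it as "coun" / "t13.dat" across two rows.
data = b"\xff" * 12 + b"count13.dat" + b"\xff" * 9


def ascii_rows(buf, width=16):
    """Return the text column of each dump row (non-printable bytes as '.')."""
    rows = []
    for start in range(0, len(buf), width):
        chunk = buf[start:start + width]
        rows.append("".join(chr(b) if 32 <= b < 127 else "." for b in chunk))
    return rows


rows = ascii_rows(data)
# Searching row by row (what grep does on the dump output) finds nothing...
row_hits = [r for r in rows if "count" in r]
# ...while searching the raw bytes finds the name and its offset.
raw_offset = data.find(b"count")

print(rows)
print(len(row_hits), raw_offset)
```

Searching the raw data instead of the formatted rows sidesteps the problem.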

Wanting hexdump to produce output more suitable for searching, I read the hexdump man page, where it is evident that hexdump provides flexible output formatting.

     -e format_string
             Specify a format string to be used for displaying data.

The short description is elaborated in a later section:

   Formats
     A format string contains any number of format units, separated by white-
     space.  A format unit contains up to three items: an iteration count, a
     byte count, and a format.

Okay, three parameters, two of which are optional. The non-optional format specifier must be double quoted.

     The format is required and must be surrounded by double quote (" ")
     marks. It is interpreted as a fprintf-style format string (see
     fprintf(3)) ...

Okay. Not so hard. I know fprintf syntax. So what configuration am I optionally skipping? The first parameter is iteration count.

     The iteration count is an optional positive integer, which defaults to
     one.  Each format is applied iteration count times.

So, what does the iteration count actually do? Repeat the same printout x number of times? That of course would be a daft thing to do. Not being a native English speaker, I made sure that iteration did not have some second meaning unknown to me. Dictionary.com defines:

it·er·a·tion

  1. the act of repeating; a repetition.
  2. Mathematics.
    a. Also called successive approximation. a problem-solving or computational method in which a succession of approximations, each building on the one preceding, is used to achieve a desired degree of accuracy.
    b. an instance of the use of this method.
  3. Computers. a repetition of a statement or statements in a program.

Hmm, still not exactly clear on what iteration does. I’d better experiment to figure it out. Next optional parameter defines a byte count.

     The byte count is an optional positive integer.  If specified it defines
     the number of bytes to be interpreted by each iteration of the format.

Huh? Byte count of what again? Does this relate to the amount of "%c"’s and what-not I put in the mandatory part? Experimentation will tell. The final details on the optional parameters are how to apply them.

     If an iteration count and/or a byte count is specified, a single slash
     must be placed after the iteration count and/or before the byte count to
     disambiguate them.

That would be iterations or iterations/byte_count told in many words forming an obscure sentence?

Well, feeling armed for some basic hexdump formatting, I proceeded to do some experiments.

$ hexdump -e "0x%08x" /dev/mtd0
hexdump: bad format {0x%08x}

What?! I took another look at the examples provided by the man page.

     Display the input in perusal format:

           "%06.6_ao "  12/1 "%3_u "
           "\\t\\t" "%_p "
           "\\n"

Hmm, and I write all three lines how? Or is it three examples? I tried the top line from the example. It worked, although of course not giving me the desired output format. Because of the nature of the input data, the generated output actually didn’t make much sense, but now a little wiser I continued experimenting.

$ hexdump -e 8/1 "0x%08x" /dev/mtd0
hexdump: bad format {8/1}

Hmm, perhaps double quoting is required before the “optional” parameters?

-sh-2.05b# hexdump -e "" 8/1 "0x%08x" /dev/mtd0
Segmentation fault

WTF!? Near having a fury-induced head explosion, I resorted to Google. Seems that I’m not the only one having a hard time decoding the ‘-e’ description (even though the kind poster states that reading the man page is understanding hexdump). The apparent proper syntax is:

$ hexdump -e ' [iterations]/[byte_count] "[format string]" '

This was not the exact syntax mentioned in the man page, but I tried.

$ hexdump -e '6/1 "0x%08x, "' -e '"\\n"' /dev/mtd0

Hurrah, it worked. Having this figured out, the only thing left was to find out the exact functionality of the iteration and byte_count parameters. I wasn’t fully enlightened by the output, so a few more tests should reveal the purpose of them both.

$ hexdump -e '6/1 "0x%08x, "' -e '"\\n"' /dev/mtd0
0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff,
0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff,

Hmm, six columns…

$ hexdump -e '4/1 "0x%08x, "' -e '"\\n"' /dev/mtd0
0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff,
0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff,
0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff,
0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff,

Aha, so… iterations equals columns. Still not figured out the byte_count parameter though.

$ hexdump -e '6/2 "0x%08x, "' -e '"\\n"' /dev/mtd0
0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff,
0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff,

Perhaps the partially empty (erased) flash section was not the best example to learn from, so I created a file repeating the byte values 00 to 09.

$ hexdump -e '6/1 "0x%08x, "' -e '"\\n"' count.hex
0x00000000, 0x00000001, 0x00000002, 0x00000003, 0x00000004, 0x00000005,
0x00000006, 0x00000007, 0x00000008, 0x00000009, 0x00000000, 0x00000001,
0x00000002, 0x00000003, 0x00000004, 0x00000005, 0x00000006, 0x00000007,
0x00000008, 0x00000009, 0x00000000, 0x00000001, 0x00000002, 0x00000003,
$ hexdump -e '6/2 "0x%08x, "' -e '"\\n"' count.hex
0x00000100, 0x00000302, 0x00000504, 0x00000706, 0x00000908, 0x00000100,
0x00000302, 0x00000504, 0x00000706, 0x00000908, 0x00000100, 0x00000302,
0x00000504, 0x00000706, 0x00000908, 0x00000100, 0x00000302, 0x00000504,
0x00000706, 0x00000908, 0x00000100, 0x00000302, 0x00000504, 0x00000706,

Aha, guess that(?) would fit the byte count description…
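My reading of the two parameters can be mimicked in a few lines. This is a rough sketch in Python 3, not how hexdump is actually implemented: format_unit is a hypothetical helper, and it assumes the bytes are combined in little-endian order, which is what the output above suggests for this machine.

```python
def format_unit(data, iterations, byte_count, fmt="0x%08x, "):
    """Mimic one format unit: per output line, apply fmt `iterations` times,
    each time consuming `byte_count` bytes (assumed little-endian)."""
    lines = []
    step = iterations * byte_count
    for start in range(0, len(data), step):
        block = data[start:start + step]
        fields = []
        for i in range(0, len(block), byte_count):
            value = int.from_bytes(block[i:i + byte_count], "little")
            fields.append(fmt % value)
        lines.append("".join(fields).rstrip())
    return lines


# Same idea as count.hex in the text: the byte values 00..09 repeated.
data = bytes(range(10)) * 3

for line in format_unit(data, 6, 1)[:2]:
    print(line)
for line in format_unit(data, 6, 2)[:2]:
    print(line)
```

So the iteration count sets how many fields go on a line, and the byte count sets how many input bytes each field swallows.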

Having finally decoded the man page I set out to find a proper output format. After some unsuccessful attempts, I googled for a hint to a solution. Eventually I found some indications that, to get the desired formatting, I should utilize some of the non-fprintf formatting options provided by hexdump. More man page decoding? No fucking way! Enough of this shit!

Having wasted precious work time, I abandoned hexdump and put together a little Lua script that would do the hex dumping and format the output to fit my requirements.

#!lua
--[[  Hex dump utility
      usage:   lua xdex.lua pattern file
 
      example: lua xdex.lua "count%d%d.dat" file.dat
--]]
 
local debug = false
 
-- http://lua-users.org/wiki/LuaPrintf
printf = function(s,...)
           return io.write( s:format(...) )
         end -- function
 
local f = assert(io.open(arg[2], "rb"))
local data = f:read("*all")
 
--
-- Locate offsets of all pattern matching items
--
local offset_begin, offset_end = 0, 0
local items = {}
local index = 1
 
repeat
  offset_begin, offset_end = string.find( data, arg[1], offset_begin+1 )
  if offset_begin == nil then
    break
  end
  items[index] = { offset_begin, offset_end }
  index = index+1
  if debug then printf("%08xh - %08xh\n", offset_begin, offset_end) end
until ( offset_begin == nil )
items[index] = { nil, nil } -- Terminate
 
--
--    Hexdump alike printing of results
--    (Inspired from test/xd.lua in lua5.1 distribution)
index = 1
local offset = 0
while true do
  local s = string.sub( data, offset+1, offset+16 )
  if s == nil or items[index][1] == nil then
    return
  end
 
  if (offset+16) >= items[index][1] then
    io.write( string.format("%08x  ", offset) )
    string.gsub( s,"(.)",
        function (c) io.write( string.format("%02x ",string.byte(c)) ) end )
    io.write( string.rep(" ", 3*(16-string.len(s))) )
    io.write( " ", string.gsub(s,"%c","."), "\n" )
 
    if (offset+16) >= items[index][2] then
      index = index+1
    end
  end
  offset=offset+16
end

The output from the above script correctly finds 26 occurrences of the input pattern, where the original grepping of the hexdump output would only discover 20.

$ xdex.lua "count%d%d*[.]dat" /dev/mtd0
00092df0  63 6f 75 6e 74 31 32 2e 64 61 74 ff 19 85 e0 02  count12.datÿ.à.
000abba0  0b 08 00 00 03 f0 42 2c 83 b2 2d 83 63 6f 75 6e  .....ðB,²-coun
000abbb0  74 31 33 2e 64 61 74 ff 19 85 e0 02 00 00 10 44  t13.datÿ.à....D
000bfc80  00 00 00 01 00 00 00 0c 00 00 00 0d 0b 08 63 6f  ..............co
000bfc90  75 6e 74 31 32 2e 64 61 74 e0 02 00 00 00 0d 00  unt12.datà......
000bfe80  00 00 0e 0b 08 63 6f 75 6e 74 31 33 2e 64 61 74  .....count13.dat
000c7f10  63 6f 75 6e 74 30 39 2e 64 61 74 ff 19 85 e0 02  count09.datÿ.à.
000e0cc0  0b 08 00 00 61 f7 3b 03 c4 12 57 53 63 6f 75 6e  ....a÷;.Ä.WScoun
000e0cd0  74 31 30 2e 64 61 74 ff 19 85 e0 02 00 00 10 44  t10.datÿ.à....D
000f9a80  ff 40 62 1d f9 72 7e e3 63 6f 75 6e 74 31 31 2e  ÿ@b.ùr~ãcount11.
000f9a90  64 61 74 ff 19 85 e0 02 00 00 10 44 ee 2d 30 6f  datÿ.à....Dî-0o
000ffb90  63 6f 75 6e 74 30 39 2e 64 61 74 e0 02 00 00 00  count09.datà....
000ffd80  0a 00 00 00 0b 0b 08 63 6f 75 6e 74 31 30 2e 64  .......count10.d
000ffd90  61 74 e0 02 00 00 00 0b 00 00 00 02 00 02 0c d8  atà............Ø
000fff70  00 00 00 01 00 00 00 0b 00 00 00 0c 0b 08 63 6f  ..............co
000fff80  75 6e 74 31 31 2e 64 61 74 e0 02 00 00 00 0c 00  unt11.datà......
00115e20  bd 6e 58 e6 63 6f 75 6e 74 30 37 2e 64 61 74 ff  ½nXæcount07.datÿ
0012ebe0  63 6f 75 6e 74 30 38 2e 64 61 74 ff 19 85 e0 02  count08.datÿ.à.
0013fcc0  00 08 0b 08 63 6f 75 6e 74 30 37 2e 64 61 74 e0  ....count07.datà
0013feb0  01 00 00 00 08 00 00 00 09 0b 08 63 6f 75 6e 74  ...........count
0013fec0  30 38 2e 64 61 74 e0 02 00 00 00 09 00 00 00 02  08.datà.........
0014af40  fa ce 22 36 63 6f 75 6e 74 30 34 2e 64 61 74 ff  úÎ"6count04.datÿ
00163d00  63 6f 75 6e 74 30 35 2e 64 61 74 ff 19 85 e0 02  count05.datÿ.à.
0017cab0  0b 08 00 00 b7 fd 21 e2 80 0e 71 56 63 6f 75 6e  ....·ý!â.qVcoun
0017cac0  74 30 36 2e 64 61 74 ff 19 85 e0 02 00 00 10 44  t06.datÿ.à....D
0017fbc0  00 00 00 05 0b 08 63 6f 75 6e 74 30 34 2e 64 61  ......count04.da
0017fbd0  74 e0 02 00 00 00 05 00 00 00 02 00 00 af 50 00  tà...........¯P.
0017fdb0  00 00 01 00 00 00 05 00 00 00 06 0b 08 63 6f 75  .............cou
0017fdc0  6e 74 30 35 2e 64 61 74 e0 02 00 00 00 06 00 00  nt05.datà.......
0017ffb0  00 07 0b 08 63 6f 75 6e 74 30 36 2e 64 61 74 e0  ....count06.datà
00180070  16 b6 2c e3 32 2e ad 46 63 6f 75 6e 74 30 31 2e  .¶,ã2.­Fcount01.
00180080  64 61 74 ff 19 85 e0 02 00 00 10 44 ee 2d 30 6f  datÿ.à....Dî-0o
00198e30  75 8e d7 96 63 6f 75 6e 74 30 32 2e 64 61 74 ff  u×count02.datÿ
001b1bf0  63 6f 75 6e 74 30 33 2e 64 61 74 ff 19 85 e0 02  count03.datÿ.à.
001bfb00  0b 08 63 6f 75 6e 74 30 31 2e 64 61 74 e0 02 00  ..count01.datà..
001bfcf0  00 00 02 00 00 00 03 0b 08 63 6f 75 6e 74 30 32  .........count02
001bfd00  2e 64 61 74 e0 02 00 00 00 03 00 00 00 02 00 01  .datà...........
001bfef0  63 6f 75 6e 74 30 33 2e 64 61 74 e0 02 00 00 00  count03.datà....

Thank you Lua, thou truly are a light in the darkness.

Posted in Lua, Programming, Rant, Software | 3 Comments »

C++ Function Hidden, Not Overloaded Nor Overridden

2008-02-12 by monzool

THE C++ INHERITANCE model can be unintuitive sometimes, or perhaps more correctly, it’s easy to get tricked by C++ in some circumstances.

With an existing code base the need sometimes comes up for an often-used class with a few additional features. So as not to mess with any of the existing code, a new class is created deriving from the original class. Now functionality can be extended without breaking any existing code using the original class. Sweet. But care is advised, or one might inadvertently step on a landmine.

The example below is a snippet from a boat class that provides a member for setting the speed of the boat. Positive values indicate forward sailing while negative values indicate backward sailing. Given the task of producing a super fast race boat, a new class FastBoat is derived so that unrealistically high speeds can be set.

#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>

class Boat
{
  public:
          virtual ~Boat() {;}
          virtual void Speed(int speed)
          {
            std::cout << "Speed of boat: " << speed << std::endl;
          }
};
 
class FastBoat: public Boat
{
  public:
          virtual ~FastBoat() {;}
 
          virtual void Speed(unsigned int speed)
          {
            std::cout << "Speed of fast boat: " << speed << std::endl;
          }
};
 
 
int main(int argc, char *argv[])
{
  std::cout << "Maneuvring the boat\n" << std::endl;
 
  const unsigned int ForwardKnots  = 22;
  const int          BackwardKnots = -3;
 
  FastBoat fastBoat;
  fastBoat.Speed(ForwardKnots);
  fastBoat.Speed(BackwardKnots);
 
  return EXIT_SUCCESS;
}

Glancing at the code one might be convinced that all is fine and dandy. Setting unsigned speeds on a FastBoat would trigger the FastBoat function, while signed speeds would propagate to the base Boat function.


This, however, is the output produced by the example code:

Speed of fast boat: 22
Speed of fast boat: 4294967293

Wanting to reverse the boat at a mere 3 knots, the boat is sent forward at cartoon-fast speed. So what just happened? Well, the Boat::Speed function was not called, and instead the BackwardKnots value was implicitly converted to unsigned int to fit the FastBoat::Speed function. This is because a function declared in a derived class hides all base class functions of the same name, so overload resolution does not cross inheritance boundaries, at least not by default. For the above code to work as intended, the hidden function from the base class must be brought into scope.

class FastBoat: public Boat
{
  public:
          virtual ~FastBoat() {;}
          using Boat::Speed;    // Bring Boat::Speed function into scope
          virtual void Speed(unsigned int speed)
          {
            std::cout << "Speed of fast boat: " << speed << std::endl;
          }
};

Daring another attempt to test the program, the result now is as intended.

Speed of fast boat: 22
Speed of boat: -3

The using declaration has brought the base class function into the scope of the derived class, so it is no longer hidden: the signed speed now reaches Boat::Speed, while the unsigned speed still goes to FastBoat::Speed.

It’s an easy mistake to make, and that’s surely why other languages (e.g. D and C#) have introduced keywords for explicitly specifying the intended action. If using GCC, compiling with the -Woverloaded-virtual option is recommended for catching this kind of mistake.

Posted in C++, Programming | No Comments »

Help On Python Regular Expressions

2007-10-15 by monzool

REGULAR EXPRESSIONS ARE a powerful friend, but the friendship doesn’t come easy. Regular expressions can be somewhat baffling to get a grasp on, but when finally understood, the possibilities are almost endless.

When developing the search expression used in HTML Parsing With Beautiful Soup I realized that my regular expression knowledge had gotten a bit rusty. Fortunately I had doubled up on luck: 1) It was a Python program, hence the Python shell was available. 2) I found David Mertz’s book Text Processing in Python.

The Python shell makes it easy to experiment with and tweak regular expressions on the fly, but the downside is that it’s not easy to visually evaluate the outcome of your current expression. David’s book helped twofold. It has extensive theory on Python regular expression syntax, but most superhero-like is the small function it provides, which makes it possible to see the outcome of an evaluated expression.

# Credits: David Mertz
def re_show(pat, s):
    print re.compile( pat, re.M ).sub( "{\g<0>}", s.rstrip() ), '\n'

Using regular expressions in Python requires importing the regular expression library.

import re

If using the Python shell, just enter the same at the shell prompt. The function by David Mertz can also be typed directly into the shell:

>>> import re
>>> def re_show(pat, s):
...    print re.compile( pat, re.M ).sub( "{\g<0>}", s.rstrip() ), '\n'
>>>

The re_show wrapper prints the source text and emphasizes each match of the expression by enclosing it in a ‘{’ and ‘}’ pair.

Next is creation of some example text on which to experiment.

>>> s = 'if (Hulk.color != "green"): print "Grey Hulk"'

Now the experiments can begin. The following searches for everything from the first ‘(’ to the last ‘)’.

>>> re_show(r'\(.*\)', s)

Result:

if {(Hulk.color != "green")}: print "Grey Hulk"
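
The re_show function above relies on the Python 2 print statement. A minimal sketch of the same idea for Python 3 follows. It is an adaptation, not taken from the book, and the helper name mark_matches is made up for this example. Splitting the substitution into a function that returns the marked string makes the result easy to both print and compare:

```python
import re

def mark_matches(pat, s):
    """Return s with every match of pat wrapped in '{' and '}'."""
    # r"{\g<0>}" re-inserts the whole match (group 0) between braces
    return re.compile(pat, re.M).sub(r"{\g<0>}", s.rstrip())

def re_show(pat, s):
    # Python 3 variant of the wrapper: print() is a function here
    print(mark_matches(pat, s), "\n")

s = 'if (Hulk.color != "green"): print "Grey Hulk"'
re_show(r'\(.*\)', s)
```

The replacement template is written as a raw string so that the backslash in \g<0> reaches the re module untouched.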

Another example could be a case-insensitive match on the colors of Hulk.

>>> re_show(r'(?i)green|(?i)grey', s)

Result:

if (Hulk.color != "{green}"): print "{Grey} Hulk"
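
A note for readers on Python 3: repeating the inline (?i) flag in the middle of a pattern, as done above, was accepted by the Python of the time, but recent Python 3 releases only allow such a global flag at the very start of the expression. As a small sketch (using re.sub directly instead of re_show to keep it self-contained), these are two ways to express the same case-insensitive search:

```python
import re

s = 'if (Hulk.color != "green"): print "Grey Hulk"'

# Option 1: a single inline flag at the very start covers both alternatives
inline = re.sub(r'(?i)green|grey', r'{\g<0>}', s)

# Option 2: pass the flag as an argument and keep the pattern itself plain
flagged = re.sub(r'green|grey', r'{\g<0>}', s, flags=re.IGNORECASE)

print(inline)
print(flagged)
```

Both variants mark "green" and "Grey" in the sample text. The second is usually easier to read once a pattern grows.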

This is just a minuscule introduction to the powers of regular expressions. If you’re into regular expressions in Python, I highly recommend buying the book, or donating and reading it online.

Posted in Programming, Python, Regular Expressions | 1 Comment »

HTML Parsing With Beautiful Soup

2007-10-15 by monzool

BEAUTIFUL SOUP IS an HTML/XML parser written in Python. Beautiful Soup excels as an easy to use parser that requires no knowledge of actual parsing theory and techniques. And thanks to the excellent documentation with many code examples, it is easy to fabricate some working code very quickly.

On Debian, Beautiful Soup can be installed via apt-get / aptitude:
aptitude install python-beautifulsoup

The example below extracts the hit counter from this very page. Note that this is perhaps not the best example in the world (the only parse value used is the “footer” section), but it does exemplify how easily data can be extracted from an HTML page when utilizing the Beautiful Soup parser.

#!/usr/bin/env python
# coding=utf-8
 
from BeautifulSoup import BeautifulSoup          # For processing HTML
import urllib2                                   # URL tools
import re                                        # Regular expressions
 
def FindHits(proxyUrl):
    # URL to HTML parse
    url = 'http://monzool.net/blog/index.php'
 
    if len(proxyUrl) > 0:
        # Proxy set up
        proxy = urllib2.ProxyHandler( {'http': proxyUrl} )
 
        # Create an URL opener utilizing proxy
        opener = urllib2.build_opener( proxy )
        urllib2.install_opener( opener )
 
        # Acquire data from URL
        request = urllib2.Request( url )
        response = urllib2.urlopen( request )
    else:
        # Acquire data from URL
        response = urllib2.urlopen( url )
 
    # Extract data as HTML data
    html = response.read()
 
    # Parse HTML data
    soup = BeautifulSoup( html )
 
    # Search requested page for <div> section with id="footer"
    # (The result is returned in unicode)
    footer = soup.findAll( 'div', id="footer" )
 
    # Hint: on this site, it is known that only a single "footer" section
    # exists, and that the hit counter resides in that same section
 
    # Search for the phrase "Hits=<some number>"
    pattern = re.compile( r'Hits=.*[0-9]' )
    items = re.findall( pattern, str(footer[0]) )
 
    # Print result
    print items[0]        # -> "Hits=<count>"
 
 
if __name__ == "__main__":
    print "Processing..."
    FindHits("")          # Supply proxy if required. 
                          # FindHits("http://<proxyname>:<port>")

Explanation: If connecting to the internet through a proxy, some additional setup of urllib2 must be done. Although urllib2 does provide some automatic proxy configuration detection, the configuration is made explicitly here.
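
For readers on Python 3, where urllib2 has been merged into urllib.request, the explicit proxy setup could be sketched roughly as below. The function name make_opener and the proxy address are placeholders invented for the example, and the actual download is left commented out so nothing touches the network:

```python
import urllib.request

def make_opener(proxy_url=""):
    """Build a URL opener, routing HTTP through proxy_url when one is given."""
    if proxy_url:
        # Explicit proxy configuration, mirroring the urllib2 code above
        proxy = urllib.request.ProxyHandler({'http': proxy_url})
        return urllib.request.build_opener(proxy)
    # No explicit proxy: fall back to the default handlers
    return urllib.request.build_opener()

opener = make_opener("http://proxy.example:8080")   # hypothetical proxy address
# html = opener.open('http://monzool.net/blog/index.php').read()  # network call, left commented out
```

Returning the opener instead of installing it globally keeps the proxy choice local to the caller. urllib.request.install_opener(opener) would give the same global effect as in the listing above.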

Once the URL is opened, the HTML is fed to the Beautiful Soup parser. Thereafter the findAll method is used for finding the HTML div section identified as “footer” (<div id="footer">). As noted, no further parsing is done, as this page contains only one footer section, but otherwise Beautiful Soup provides functions like findAllNext and findNextSiblings to iterate through the parse tree. (Beautiful Soup is unicode aware, but that is not used in this example, so the found section is converted to a plain string with str() before it is passed to re.findall.)

The resulting output from the search is the hit counter extracted from this page.
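
One thing to be aware of is that the pattern Hits=.*[0-9] is greedy: it runs on to the last digit on the line, which is only correct when nothing with digits follows the counter. The sketch below uses a made-up footer string (the real markup of this page is not shown here) to compare it with the stricter Hits=\d+:

```python
import re

# Hypothetical footer markup; the real page layout is not shown in the post
footer = '<div id="footer">Hits=1234 | Page 2 of 7</div>'

# Greedy: '.*' swallows as much as possible before the final digit
greedy = re.findall(r'Hits=.*[0-9]', footer)

# Strict: only the digits directly following 'Hits='
strict = re.findall(r'Hits=\d+', footer)

print(greedy[0])   # Hits=1234 | Page 2 of 7
print(strict[0])   # Hits=1234
```

When the counter is the last thing in the footer both patterns give the same result, so the original expression works fine for that case.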

Posted in Programming, Python | 1 Comment »

C++ Constructor/Destructor Call Order

2007-09-19 by monzool

WHEN BEGINNING C++ programming, people often have trouble remembering the correct construction and destruction call order, in my experience. Personally I conquered this using a common memory technique: creating a story.

Here are two stories that might help in remembering. The stories might seem quite long, but don’t fear, it’s not necessary to remember the stories word by word. Btw, the stories do not hold 100%, but they are good enough… at least for me ;-)

Story #1: The Skyscraper

The skyscraper story is exemplified from this simple class structure:

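#include <iostream>
using std::cout;
using std::endl;

class Base
{
  public:   // Constructors must be accessible; class members are private by default
    Base() { cout << "Base" << endl; }
};
 
class Derived : public Base
{
  public:
    Derived() { cout << "Derived" << endl; }
};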

A couple of prerequisites are necessary:

1) Imagine the UML class diagram of the above turned upside down.

[Figure: Class diagram turned upside down]

2) Imagine that each class in the upside down diagram represents a floor in a skyscraper.

[Figure: Class diagram is a building]

The Base is the foundation and all Derived objects are floors that build upon that foundation.

[Figure: Construction, with an arrow marked "Start constructing here"]

So to construct a skyscraper the building process must be Base first, then Derived next, as buildings (usually) are built from the bottom up.

When destructing, the order is the same as when dismantling a building: top to bottom.

[Figure: Destruction, with an arrow marked "Start destructing here"]

So what about the special case of destruction when the object is polymorphic?

When allocated like this, it looks like a building with only a base level, even though we know it’s a building two storeys high.

Base* pBase = new Derived;

The problem here is that the demolition team only has access to the base level, and when destroying the building, disaster will happen…

delete pBase;

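As the building is two storeys high, it will collapse when the supporting foundation is removed first (an allegory for the bad situation where the Derived destructor is never run because only the Base part is destroyed; formally, deleting a derived object through a base pointer without a virtual destructor is undefined behavior).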

So how to fix this situation? You provide the demolition team with an elevator. The “elevator” is a special demolition model called virtual.

class Base
{
  public:
    Base() { cout << "Base" << endl; }
    virtual ~Base() { cout << "~Base" << endl; }  // Virtual destructor
};

As the building is equipped with an elevator, the demolition team can ascend to the top of the building, begin the destruction from top to bottom, and get everything removed properly.

Story #2: File Manipulation

The second story relates the base/derived situation to file contents manipulation.

A file must be opened before it can be closed, and if opened it must be closed again at some point. Thus it makes sense to create a class that opens the file in the constructor, and closes the file again in the destructor.

#include <iostream>
using std::cout;
using std::endl;

class FileAccess
{
  public:
    FileAccess() { cout << "Open file..." << endl; }
    ~FileAccess() { cout << "Close file..." << endl; }
};

Read and write operations are similar functionality (both transfer data, but in opposite directions), and thus it makes sense to collect them in one class. As the goal is to modify the contents of a file, the reading of the existing file content can be placed in the constructor, and the writing of the modified content in the destructor.

class FileManipulate
{
  public:
    FileManipulate() { cout << "Read from file..." << endl; }
    ~FileManipulate() { cout << "Write to file..." << endl; }
 
    // Content manipulation functions follow here...
};

Two classes are now at hand. One that opens and closes a file, and one that reads the contents of the file and writes it back to the file.

A prerequisite of reading from a file or writing to a file is that the file is open. Therefore the basic but essential functionality of opening and closing is made the base class (FileAccess). The more advanced and flexible functionality of reading and writing is then made in the derived class (FileManipulate).

class Base /* FileAccess */
{
  public:
    Base() { cout << "Open file..." << endl; }
    ~Base() { cout << "Close file..." << endl; }
};
 
class Derived /* FileManipulate */ : public Base /* FileAccess */
{
  public:
    Derived() { cout << "Read from file..." << endl; }
    ~Derived() { cout << "Write to file..." << endl; }
};

Thus every time an object of the derived class is created for modifying a file’s contents, it automagically also inherits the capabilities of opening and closing files. And as a file cannot be read from before it is opened, it can be remembered that the base constructor must be called before the derived constructor. Equally, a file cannot be written to after it is closed, thus it can be remembered that the derived destructor is called before the base destructor.

Posted in C++, Programming | No Comments »
