Friday, December 25, 2009

Burrows–Wheeler transform

Authors: Michael Burrows and David Wheeler
As some of you might noticed, I haven't posted in quite some time, this is because I'm working on a secret compression algorithm, but I'm willing to share some knowledge about how compression works and what I've implemented or discovered.
I'm writing this post about Burrows–Wheeler transform(a.K.a. block sorting) because it really really really shaked me about how it works.
This block sorting is done before compression so that a lot of repeating bytes/characters in data will get one-after-another, other methods are also used after BWT, like Move-To-Front(which I'm saving for another post... sorry).
More information about history and authors can be found on Wikipedia.
The idea is that you take some data(let's say 256 bytes) encode it and your output should be 257 bytes, why?
input = 256 bytes -encode-> output = 256 bytes + 1 byte for index.
As you read this post you will understand that the more data you encode the bigger the index is but the more memory you need(but not necessarily).
BWT Encoding
Definition: take the data that you wish to encode, create a matrix with first row being the data, for each byte/character in data add a new row with first byte/character to be the second character in precedent row until you have a square matrix(number of columns = number of rows), i.e.
data: "abcde"
constructing the matrix:
abcde <-- the data !! bcdea <-- shifted row from original data(precedent) cdeab <-- shifted row from precedent row deabc eabcd -------- stop! no. columns = no. rows!! -------- Now sort the first column of the matrix in alphabetical order(along with it's rows), in the above example there's no need for sorting! now search in the matrix for the row(first row = ZERO as index) that matches your original data and that row number will tell you the index with which you can reconstruct the original data, in the above example the INDEX = 0 (ZERO). The output is the last column of the matrix "eabcd" and index = 0.
Decoding
The extraordinary thing about this encoding is the decoding, it's out of the box!
So we know the last column of the matrix and the index, we know that first column of the matrix is the encoded data in alphabetical order.
WARNING!! In order to decode you do not need to reconstruct the entire matrix, you just need to sort the encoded data in alphabetical order and put the sorted column next to the encoded one and you get:
a _ _ _ e
b _ _ _ a
c _ _ _ b
d _ _ _ c
e _ _ _ d
Now assign indexes to the left column starting from row zero = 0 and increment it by 1 for each row and you get:
0 a _ _ _ e
1 b _ _ _ a
2 c _ _ _ b
3 d _ _ _ c
4 e _ _ _ d
at this point you just need to assign indexes to the last column so that the index of char "a" from first column will be assigned to the char "a" from last column like so:
0 a _ _ _ e 4
1 b _ _ _ a 0
2 c _ _ _ b 1
3 d _ _ _ c 2
4 e _ _ _ d 3
Now we know:
- first column
- last column
- the index
- the first and last char of the original data "a" and "e"
Decoding is done by starting from last column where the char's index is equal to the index you received along with encoded data.
So index = 0(which is "a", we already know the first char) we go along the row to the first column and we find out that the 2nd char is "b"(1 b <-- a 0), now we know the 2nd char and the index = 1, we go to the last column and where index is 1 and go along the row from right to left and we get the 3rd char "c" along with a new index which is 2, again we go to the last column where index = 3 and go along the row from right to left and find out that the 4th char is "d", again we get a new index which is 3, go to the last column find the the row where index is 3, go along it to the left and find the new index which is not important because we already know the last char but just for the sake of the theory we got the last char which is "e", voia! you have decoded the message, of course the algorithm can be improved by using indexes directly, but first you must understand how it works! Now a real example of the power of BWT. Let's presume you wish to encode "i can what you can", create the matrix:

i can what you can <-- original data can what you cani can what you cani an what you cani c n what you cani ca what you cani can what you cani can hat you cani can w at you cani can wh t you cani can wha you cani can what you cani can what ou cani can what y u cani can what yo cani can what you cani can what you ani can what you c ni can what you ca





After creating the matrix, you have to sort it by first column of each row, in order of appearance of each character that is repeating and the sort result is
can what you cani
 what you cani can
 you cani can what
 cani can what you
an what you cani c
at you cani can wh
ani can what you c
can what you cani 
cani can what you 
hat you cani can w
i can what you can
n what you cani ca
ni can what you ca
ou cani can what y
t you cani can wha
u cani can what yo
what you cani can 
you cani can what

the output is " ntuchc wnaayao "(last two chars of out put is two spaces, copy-paste it" and index = 10(counting first row as 0).
Problem, reconstruct the initial data based on the small example presented above with matrix sorted, index known and everything in front of you, to help you even more I make the indexes also here there are:
0  |  can what you cani | 10
1  |  what you cani can | 11
2  |  you cani can what | 14
3  |  cani can what you | 15
4  | an what you cani c | 7
5  | at you cani can wh | 9
6  | ani can what you c | 8
7  | can what you cani  | 0
8  | cani can what you  | 1
9  | hat you cani can w | 16
10 | i can what you can | 12
11 | n what you cani ca | 4
12 | ni can what you ca | 5
13 | ou cani can what y | 17
14 | t you cani can wha | 6
15 | u cani can what yo | 13
16 | what you cani can  | 2
17 | you cani can what  | 3

Have fun and drop me a comment if you like the post or the sort.

Wednesday, December 16, 2009

Understanding arrays

One of the most hard-to-understand thing for a beginner in programming(from my point of view) is the array.
If you're a "nub" in programming and arrays are making fun of you, then you should continue reading as I walk you to the bright side of arrays.
Consider a glass of milk as a element of a array and 2 glasses of milk as a array of milk glass, in Delphi code you can define as
// define a new type as record called TMilkGlass
type TMilkGlass = record
                // some info about the milk glass
                Height,
                Quantity : integer;
     end;

// define the array of TMilkGlass defined above
type TArrayOfMilkGlass = array of TMilkGlass;
When you define a array of some type, you tell Delphi's compiler how much memory to allocate when the array size is increased(when you add a new element) or how much memory to free when array size is decreased(when you delete a element).
A way simpler example of the above example would be:
- the milk glass

- the array of milk glass

Seeing the above array of milk glasses, we can observe that we have 6 milk glasses we can define a array like so
type TArrayOfMilkGlass = array[0..5] of TMilkGlass;
When you define a new array of type and you DO NOT mention a range like in above example [0..5] this type of array is called "dynamic array", a array with range i.e. [0..5] is called "static array".
At this time your eyes should be opened(if not, keep trying, don't give up!), but remember:
- dynamic arrays: when you define a dynamic array you must set the length of the array before accessing a element, otherwise you will get a Access Violator :), use SetLength - in order to set the length of the array, Length - to get the length of the array, Low - to get the first element, High - to get the last element.
example of fooling around with a dynamic array
var x : array of integer;
begin
     // allocate space in array for 50 integers
     SetLength(x, 50);
     // get the length of array
     ShowMessage(IntToStr(Length(x)));
     // show as message the first element from array
     ShowMessage(IntToStr(x[Low(x)]));
     // show as message the last element from array
     ShowMessage(IntToStr(x[High(x)]));
     // show as message the starting index of array
     // keeping in consideration that x could have
     // been defined as:
     // x: array[-5..20] of integer;
     ShowMessage(IntToStr(Low(x)));
     // show as message the ending index of array
     ShowMessage(IntToStr(High(x)));
     // free allocated space for x
     SetLength(x, 0);
end;
Note: that arrays that contain objects|classes, needs you to handle the creation and freeing of the objects, array will NOT create or free objects by itself.
- static arrays: Note: that you cannot set the length of a static array at runtime! this type of array is mostly used when you need to define a list of TYPE at design time that would hold some kind of data with a fixed size.
If you have any questions, don't hesitate to ask.
See also all post with array label.

Sunday, December 13, 2009

Bitmap flip and mirror

Yesterday I and a friend of mine needed a bitmap flip(horizontally) algorithm that should be extremely fast. As most of you might know that ScanLine is the fastest approach in manipulating bitmaps, we browsed the net for a already cooked one so we don't waste time.
Well we found some good some bad but at the end we came to the conclusion that we should invest 10-15 min. to create a faster one. The algorithm that my friend created is in procedure FlipHorizontal, I modified the code to be more convenient but the actual definition was procedure FlipBitmap(var thisBitmap: TBitmap).
Anyways here is a class helper which gives 4 handy procedures to a TBitmap.
Note: in order to use the bellow code you must copy-paste all code in the unit you wish to use the flips or mirrors OR add uDGBitmap to uses clause and access directly like BitmapVariable.FlipVertical|FlipHorizontal|MirrorVertical|MirrorHorizontal
unit uDGBitmap;

interface

uses SysUtils,
     Windows,
     Graphics;

// bitmap max width = 32768 pixels
const CMAX_BITMAP_WIDTH = 1024 * 32;

// define a static array of TRGBTriple
type TArrayOfPixel = array[0..CMAX_BITMAP_WIDTH -1] of TRGBTriple;
     PArrayOfPixel = ^TArrayOfPixel;

type TDGBitmap = class Helper for TBitmap
     public
           // define public procedures
           procedure FlipHorizontal;
           procedure FlipVertical;
           procedure MirrorHorizontal;
           procedure MirrorVertical;
     end;

implementation

{ TDGBitmap }

procedure TDGBitmap.FlipHorizontal;
var y, x : Integer;
    Row : PArrayOfPixel;
    tmp : TRGBTriple;
begin
     for y := 0 to Self.Height -1 do begin
         // scan each line starting from top
         Row := Self.ScanLine[y];
         // from left side of the bitmap to half it's size
         for x := 0 to Self.Width div 2 do begin
             // replace left side with right side of the bitmap
             // this is the magical flip
             tmp := Row[x];
             Row[x] := Row[Self.Width -x -1];
             Row[Self.Width -x -1] := tmp;
         end;// for x := 0 to Self.Width div 2 do begin
     end;// for y := 0 to Self.Height -1 do begin
end;

procedure TDGBitmap.FlipVertical;
var i, y : integer;
    RowSrc, RowDest : PArrayOfPixel;
    // temporary bitmap on which we draw the
    // flipped bitmap
    tmpBitmap : TBitmap;
begin
     // create a temporary bitmap
     tmpBitmap := TBitmap.Create;
     // assign self to it
     tmpBitmap.Assign(Self);
     // scan each line starting from top
     for y := 0 to Self.Height -1 do begin
         // RowSrc is current line from self
         RowSrc := Self.ScanLine[y];;
         // RowDest is current line from tmpBitmap
         RowDest := tmpBitmap.ScanLine[Self.Height -1 -y];
         // copy each pixel from RowSrc to RowDest
         for i := 0 to Self.Width -1 do
             RowDest[i] := RowSrc[i];
     end;// for y := 0 to Self.Height -1 do begin
     // copy all data from tmpBitmap to self
     Self.Assign(tmpBitmap);
     // free allocated memory
     FreeAndNil(tmpBitmap);
end;

procedure TDGBitmap.MirrorHorizontal;
var tmpBmp : TBitmap;
    Left : integer;
begin
     // create a temporary bitmap
     tmpBmp := TBitmap.Create;
     // assign data from self
     tmpBmp.Assign(Self);
     // flip it horizontally
     tmpBmp.FlipHorizontal;
     Left := Self.Width;
     // increase the width of bitmap
     // by two
     Self.Width := Self.Width * 2;
     // draw the temporary bitmap
     // with x = Left and y = 0
     Self.Canvas.Draw(Left, 0, tmpBmp);
     // free allocated memory
     FreeAndNil(tmpBmp);
end;

procedure TDGBitmap.MirrorVertical;
var tmpBmp : TBitmap;
    Top : integer;
begin
     // create a temporary bitmap
     tmpBmp := TBitmap.Create;
     // assign data from self
     tmpBmp.Assign(Self);
     // flip it vertically
     tmpBmp.FlipVertical;
     Top := Self.Height;
     // increase the height of bitmap
     // by two
     Self.Height := Self.Height * 2;
     // draw the temporary bitmap
     // with x = 0 and y = Top
     Self.Canvas.Draw(0, Top, tmpBmp);
     // free allocated memory
     FreeAndNil(tmpBmp);
end;

end.

Tuesday, December 8, 2009

Turbo parser example

Here is a example of counting tokens from a file with Turbo parser.
Create a new VCL forms application, drop a TOpenDialog(rename it to OpenDialog) and a TButton on the form, double-click Button1 and copy-paste this code:
procedure TForm1.Button1Click(Sender: TObject);
// token record that holds the type and count
type TTokenRecord = record
                  Token : TurboTokenType;
                  Count : Integer;
     end;

// array of tokens
type TTokenRecordArray = array of TTokenRecord;

var Parser : TurboParser;
    Token : TurboTokenType;
    TokenArr : TTokenRecordArray;

      // this procedure tries to find "Token" type in
      // TokenArr and increment the count.
      // if "Token" type does not exist in array then
      // it creates a entry for it by increasing the
      // number of elements in "TokenArr"
      procedure CountThisToken;
      var index : Integer;
      begin
           for index := Low(TokenArr) to High(TokenArr) do
               if TokenArr[index].Token = Token then begin
                  Inc(TokenArr[index].Count);
                  Exit;
               end;// if TokenArr[i].Token = Token then begin
           SetLength(TokenArr, Length(TokenArr) +1);
           TokenArr[High(TokenArr)].Token := Token;
           TokenArr[High(TokenArr)].Count := 1;
      end;// procedure CountThisToken;

      // this procedure loops through all elements
      // in "TokenArr" and creates a formated string
      // Token "NAME" Count = VALUE + LineBreak
      // Token "NAME" Count = VALUE ...
      procedure ShowCounters;
      var s : string;
          i : integer;
      begin
           s := EmptyStr;
           for i := Low(TokenArr) to High(TokenArr) do
               s := s + Format('Token "%s" Count = %d', [
                 TurboTokenToString(TokenArr[i].Token), TokenArr[i].Count]) + #$D#$A;
           ShowMessage(s);
      end;// procedure ShowCounters;

begin
     if OpenDialog.Execute then begin
        // create a instance of TurboParser class and
        // assign the filename
        Parser := TurboParser.Create(OpenDialog.FileName);
        // start a loop
        while True do begin
              // grab next token
              Token := Parser.GetNextToken;
              // if reached EOF(End-Of-File) or internal error
              // then exit loop
              if Token in [ttEndOfFile, ttInternalError] then
                 Break;
              // find and increment the count of this token
              CountThisToken;
        end;// while True do begin
        // show a formated message with all counters
        ShowCounters;
        // free allocated memory to parser
        FreeAndNil(Parser);
        // set the length of "TokenArr" to zero
        SetLength(TokenArr, 0);
     end;// if OpenDialog.Execute then begin
end;
This is a demo file which on which I've tested the above code(wrote in 7-8 minutes, yes it's that easy to use TurboParser -- which can be easily modified and/or extended)
application TurboParserExample
{
 include
 {
  "filename.extension",
  "otherfilename.extension"
 }

 function main
 {

 }
}
and here is the message I got
---------------------------
Project1
---------------------------
Token "application" Count = 1
Token "" Count = 6
Token "{" Count = 3
Token "include" Count = 1
Token "" Count = 5
Token "." Count = 2
Token "}" Count = 3
Token "function" Count = 1

---------------------------
OK   
---------------------------

Monday, December 7, 2009

Delphi Distiller v1.82 released





New release of Delphi Distiller, currently on version 1.82.
Updates(source http://www.liteapplications.com/)
> Automatically select page for currently installed Delphi version (this feature was present in previous versions but disappeared in v1.80 by mistake).

> Undo button in packages and experts manager allows to discard changes in case that you messed up your selections.

> "Run Delphi 2010" button between "Start Weareg2" and "Stop Weareg2". This makes it easier to call the three of them in a row, since Weareg2 doesn't need to be active after D2010 has started. The "Run Delphi 2010" button saves any pending changes automatically.

MD5 hash: 3ed88cbbf6f5ca8f1f54a777ce79cc6c

Turbo parser

Oh yeah! finally finished a working parser for Turbo, I'm not 100% sure it's bug-free but I hope so :)
The parser is made out of 3 units:
- uTurboParser -- TurboParser class(based on TJvInterpreterParser)
- uTurboParserConst -- constants and variables that TurboParser needs
- uTurboUtils -- this is a work in progress
and here are comes the source code starting with
uTurboParser.pas
(*
  uTurboParser -- is part of Turbo project
  Copyright© Dorin Duminica

  Usage of this file is permited for comercial and non-comercial programs.

  NOTE: if you optimize some of the bellow functions please let me know
*)

unit uTurboParser;

interface

uses
    SysUtils,
    Classes,
    uTurboUtils,
    uTurboParserConst;

type TurboParser = class
     private
            FSource, FPCPos : PChar;
            SourceString : string;
            function GetPos: Integer;
            procedure SetPos(Value: Integer);
            procedure RaiseParserError(const Msg: string; const Args: array of TVarRec);
     public
           TokenValue : string;
     public
           procedure LoadFromFile(const FileName: string);
           function GetNextTokenString: string;
           function GetNextToken: TurboTokenType;
           function TokenAsBoolean: Boolean;
           function TokenAsChar: Char;
           function TokenAsDouble: Double;
           function TokenAsInteger: Integer;
           function TokenAsString: string;
     public
           property Position: Integer
                    read GetPos write SetPos;
     public
           constructor Create(const FileName: string = '');
           destructor Destroy; override;
     end;

implementation uses Windows, Dialogs;

{ TurboParser }

constructor TurboParser.Create(const FileName: string);
begin
     LoadFromFile(FileName);
end;

destructor TurboParser.Destroy;
begin
     SourceString := EmptyStr;
end;

function TurboParser.GetNextTokenString: string;
var
  P, F: PChar;
  F1: PChar;
  I: Integer;
  PrevPoint: Boolean;

  procedure Skip;
  begin
       case P[0] of
            '(':
                if P[1] = '*' then begin
                   F := P + 2;
                   while True do begin
                         F := StrScan(F, '*');
                         if F = nil then
                            RaiseParserError(SBadRemark, [P - PChar(FSource)]);
                         if F[1] = ')' then begin
                            Inc(F);
                            Break;
                         end;// if F[1] = ')' then begin
                         Inc(F);
                   end;// while True do begin
                   P := F + 1;
                end;// if P[1] = '*' then begin
            '*':
                if (P[1] = ')') then
                   RaiseParserError(SBadRemark, [P - PChar(FSource)]);
            '/':
                if (P[1] = '/') then
                   while not CharInSet(P[0], TURBO_LF_CR_NULL) do
                         Inc(P)
                else
                    if (P[1] = '*') then begin
                       F := P + 2;
                       while True do begin
                             F := StrScan(F, '*');
                             if F = nil then
                                RaiseParserError(SBadRemark, [P - PChar(FSource)]);
                             if F[1] = '/' then begin
                                Inc(F);
                                Break;
                             end;// if F[1] = '/' then begin
                             Inc(F);
                       end;// while True do begin
                       P := F + 1;
                    end;// if (P[1] = '*') then begin
       end;// case P[0] of
       while CharInSet(P[0], TURBO_SPACE_LF_CR_TAB) do
             Inc(P);
  end;// procedure Skip;

begin
     { New Token }
     F := FPCPos;
     P := FPCPos;
     PrevPoint := false;
     if (P > PChar(FSource)) and (P[-1] = '.') then
        PrevPoint := true;
     { Firstly skip spaces and remarks }
     repeat
           F1 := P;
           Skip;
     until F1 = P;
     F := P;
     if CharInSet(P[0], TURBO_IDFIRSTSYMBOLS) or PrevPoint and IsCharAlpha(P[0]) then begin
        { token }
        while CharInSet(P[0], TURBO_IDSYMBOLS) or PrevPoint and IsCharAlpha(P[0]) do
              Inc(P);
        SetString(Result, F, P - F);
     end else
         if CharInSet(P[0], TURBO_CONSTSYMBOLS10) then begin
            { number }
            while CharInSet(P[0], TURBO_CONSTSYMBOLS10) or (P[0] = '.') do begin
                  if (P[0] = '.') and (P[1] = '.') then
                     Break;
                  Inc(P);
            end;// while CharInSet(P[0], TURBO_CONSTSYMBOLS10) or (P[0] = '.') do begin
            SetString(Result, F, P - F);
         end else
             if ((P[0] = '$') and CharInSet(P[1], TURBO_CONSTSYMBOLS)) then begin
                { hex number }
                Inc(P);
                while CharInSet(P[0], TURBO_CONSTSYMBOLS) do
                      Inc(P);
                SetString(Result, F, P - F);
             end else
             if P[0] = '''' then begin
                { string constant }
                Inc(P);
                while not CharInSet(P[0], TURBO_LF_CR_NULL) do begin
                      if P[0] = '''' then
                         if P[1] = '''' then
                            Inc(P)
                         else
                             Break;
                      Inc(P);
                end;// while not CharInSet(P[0], TURBO_LF_CR_NULL) do begin
                Inc(P);
                SetString(Result, F, P - F);
                I := 2;
                while I < Length(Result) - 1 do begin
                      if Result[I] = '''' then
                         Delete(Result, I, 1);
                      Inc(I);
                end;// while I < Length(Result) - 1 do begin
             end else
                 if ((P[0] = '#') and CharInSet(P[1], TURBO_CONSTSYMBOLS10)) then begin
                    { Char constant }
                    Inc(P);
                    while CharInSet(P[0], TURBO_CONSTSYMBOLS10) do
                          Inc(P);
                    SetString(Result, F + 1, P - F - 1);
                    Result := '''' + Chr(StrToInt(Result)) + '''';
                 end else
                     if CharInSet(P[0], ['>', '=', '<', '.']) then begin
                        if (P[0] = '.') and (P[1] = '.') then begin
                           Result := '..';
                           Inc(P, 2);
                        end else
                            if (P[0] = '>') and (P[1] = '=') then begin
                               Result := '>=';
                               Inc(P, 2);
                            end else
                                if (P[0] = '<') and (P[1] = '=') then begin
                                   Result := '<=';
                                   Inc(P, 2);
                                end else
                                    if (P[0] = '<') and (P[1] = '>') then begin
                                       Result := '<>';
                                       Inc(P, 2);
                                    end else begin
                                        Result := P[0];
                                        Inc(P);
                                    end;
                     end else
                         if P[0] = #0 then
                            Result := EmptyStr
                         else begin
                              Result := P[0];
                              Inc(P);
                         end;
     FPCPos := P;
end;

function TurboParser.GetNextToken: TurboTokenType;
begin
     TokenValue := GetNextTokenString;
     Result := StringToTurboToken(TokenValue);
end;

function TurboParser.GetPos: Integer;
begin
     Result := FPCPos - PChar(FSource);
end;

procedure TurboParser.LoadFromFile(const FileName: string);
begin
     SourceString := FileToString(FileName);
     FSource := PChar(SourceString);
     FPCPos := PChar(FSource);
end;

procedure TurboParser.RaiseParserError(const Msg: string; const Args: array of TVarRec);
begin
     raise Exception.CreateFmt(Msg, Args);
end;

procedure TurboParser.SetPos(Value: Integer);
begin
     FPCPos := PChar(FSource) + Value;
end;

function TurboParser.TokenAsBoolean: Boolean;
begin
     Result := (StringToTurboToken(TokenValue) = ttTrue);
end;

function TurboParser.TokenAsChar: Char;
begin
     StringIsChar(TokenValue, Result);
end;

function TurboParser.TokenAsDouble: Double;
begin
     StringIsDouble(TokenValue, Result);
end;

function TurboParser.TokenAsInteger: Integer;
begin
     StringIsInteger(TokenValue, Result);
end;

function TurboParser.TokenAsString: string;
begin
     StringIsString(TokenValue, Result);
end;

end.
uTurboParserConst.pas
(*
  uTurboParserConst -- is part of Turbo project
  Copyright© Dorin Duminica

  Usage of this file is permited for comercial and non-comercial programs.

  NOTE: if you optimize some of the bellow functions please let me know
*)

unit uTurboParserConst;

interface

type TurboHash = type Integer;

type TurboTokenType =
     (
                 ttEndOfFile,
                 ttInternalError,
                 ttUnknown,
                 ttIdentifier,
                 ttApplication,
                 ttConsole,
                 ttLibrary,
                 ttInclude,
                 ttSemicolumn,
                 ttResource,
                 ttIn,
                 ttConst,
                 ttEnum,
                 ttVar,
                 ttClass,
                 ttLB,
                 ttRB,
                 ttLS,
                 ttRS,
                 ttPoint,
                 ttFunction,
                 ttBegin,
                 ttEnd,
                 ttReturn,
                 ttInitialization,
                 ttFinalization,
                 ttIf,
                 ttElse,
                 ttWhile,
                 ttDo,
                 ttSwitch,
                 ttColumn,
                 ttDefault,
                 ttTry,
                 ttExcept,
                 ttFinally,
                 ttLabel,
                 ttGoTo,
                 ttFactorial,
                 ttNot,
                 ttPower,
                 ttAnd,
                 ttOr,
                 ttXor,
                 ttShl,
                 ttShr,
                 ttMod,
                 ttIntDiv,
                 ttPercent,
                 ttMul,
                 ttDiv,
                 ttPlus,
                 ttMinus,
                 ttEqual,
                 ttNotEqual,
                 ttGreater,
                 ttLess,
                 ttGreaterOrEqual,
                 ttLessOrEqual,
                 ttAssign,
                 ttAssignMul,
                 ttAssignDiv,
                 ttAssignPlus,
                 ttAssignMinus,
                 ttTrue,
                 ttFalse,
                 ttVInteger,
                 ttVChar,
                 ttVString,
                 ttVDouble
     );

const szTurboTokenType = sizeof(TurboTokenType);

type TurboTokenRec = record
                  TokenType: TurboTokenType;
                  Value: string;
                  Hash: TurboHash;
     end;

const CTURBO_KEYWORD_COUNT = 61;

// this is a map of all available tokens in Turbo initializaed with
// hash = 0, because in the initialization section of unit
// each token will be hashed
var TurboTokens: array[0..CTURBO_KEYWORD_COUNT -1] of TurboTokenRec =
    (
     (TokenType: ttApplication    ; Value: 'application';    Hash: 0),
     (TokenType: ttConsole        ; Value: 'console';        Hash: 0),
     (TokenType: ttLibrary        ; Value: 'Library';        Hash: 0),
     (TokenType: ttInclude        ; Value: 'include';        Hash: 0),
     (TokenType: ttSemicolumn     ; Value: ';';              Hash: 0),
     (TokenType: ttResource       ; Value: 'resource';       Hash: 0),
     (TokenType: ttIn             ; Value: 'in';             Hash: 0),
     (TokenType: ttConst          ; Value: 'const';          Hash: 0),
     (TokenType: ttEnum           ; Value: 'enum';           Hash: 0),
     (TokenType: ttVar            ; Value: 'var';            Hash: 0),
     (TokenType: ttClass          ; Value: 'class';          Hash: 0),
     (TokenType: ttLB             ; Value: '(';              Hash: 0),
     (TokenType: ttRB             ; Value: ')';              Hash: 0),
     (TokenType: ttLS             ; Value: '[';              Hash: 0),
     (TokenType: ttRS             ; Value: ']';              Hash: 0),
     (TokenType: ttPoint          ; Value: '.';              Hash: 0),
     (TokenType: ttFunction       ; Value: 'function';       Hash: 0),
     (TokenType: ttBegin          ; Value: '{';              Hash: 0),
     (TokenType: ttEnd            ; Value: '}';              Hash: 0),
     (TokenType: ttReturn         ; Value: 'return';         Hash: 0),
     (TokenType: ttInitialization ; Value: 'initialization'; Hash: 0),
     (TokenType: ttFinalization   ; Value: 'finalization';   Hash: 0),
     (TokenType: ttIf             ; Value: 'if';             Hash: 0),
     (TokenType: ttElse           ; Value: 'else';           Hash: 0),
     (TokenType: ttWhile          ; Value: 'while';          Hash: 0),
     (TokenType: ttDo             ; Value: 'do';             Hash: 0),
     (TokenType: ttSwitch         ; Value: 'switch';         Hash: 0),
     (TokenType: ttColumn         ; Value: ':';              Hash: 0),
     (TokenType: ttDefault        ; Value: 'default';        Hash: 0),
     (TokenType: ttTry            ; Value: 'try';            Hash: 0),
     (TokenType: ttExcept         ; Value: 'except';         Hash: 0),
     (TokenType: ttFinally        ; Value: 'finally';        Hash: 0),
     (TokenType: ttLabel          ; Value: 'label';          Hash: 0),
     (TokenType: ttGoTo           ; Value: 'goto';           Hash: 0),
     (TokenType: ttFactorial      ; Value: '!';              Hash: 0),
     (TokenType: ttNot            ; Value: 'not';            Hash: 0),
     (TokenType: ttAnd            ; Value: 'and';            Hash: 0),
     (TokenType: ttOr             ; Value: 'or';             Hash: 0),
     (TokenType: ttXor            ; Value: 'xor';            Hash: 0),
     (TokenType: ttShl            ; Value: 'shl';            Hash: 0),
     (TokenType: ttShr            ; Value: 'shr';            Hash: 0),
     (TokenType: ttMod            ; Value: 'mod';            Hash: 0),
     (TokenType: ttIntDiv         ; Value: 'div';            Hash: 0),
     (TokenType: ttPercent        ; Value: '%';              Hash: 0),
     (TokenType: ttMul            ; Value: '*';              Hash: 0),
     (TokenType: ttDiv            ; Value: '/';              Hash: 0),
     (TokenType: ttPlus           ; Value: '+';              Hash: 0),
     (TokenType: ttMinus          ; Value: '-';              Hash: 0),
     (TokenType: ttEqual          ; Value: '=';              Hash: 0),
     (TokenType: ttNotEqual       ; Value: '<>';             Hash: 0),
     (TokenType: ttGreater        ; Value: '>';              Hash: 0),
     (TokenType: ttLess           ; Value: '<';              Hash: 0),
     (TokenType: ttGreaterOrEqual ; Value: '>=';             Hash: 0),
     (TokenType: ttLessOrEqual    ; Value: '<=';             Hash: 0),
     (TokenType: ttAssign         ; Value: ':=';             Hash: 0),
     (TokenType: ttAssignMul      ; Value: '*=';             Hash: 0),
     (TokenType: ttAssignDiv      ; Value: '/=';             Hash: 0),
     (TokenType: ttAssignPlus     ; Value: '+=';             Hash: 0),
     (TokenType: ttAssignMinus    ; Value: '-=';             Hash: 0),
     (TokenType: ttTrue           ; Value: 'true';           Hash: 0),
     (TokenType: ttFalse          ; Value: 'false';          Hash: 0)
    );

const LowTurboTokens = Low(TurboTokens);
      HighTurboTokens = High(TurboTokens);

const TURBO_UNDERSCORE      = ['_'];
      TURBO_DIGITS          = ['0'..'9'];
      TURBO_DOUBLE_CHARS    = TURBO_DIGITS + ['.'];
      TURBO_A_TO_F          = ['A'..'F', 'a'..'f'];
      TURBO_A_TO_Z          = ['A'..'Z', 'a'..'z'];
      TURBO_HEX             = TURBO_DIGITS + TURBO_A_TO_F;
      TURBO_IDSYMBOLS       = TURBO_UNDERSCORE + TURBO_DIGITS + TURBO_A_TO_Z;
      TURBO_IDFIRSTSYMBOLS  = TURBO_UNDERSCORE + TURBO_A_TO_Z;
      TURBO_CONSTSYMBOLS    = TURBO_DIGITS + TURBO_A_TO_F;
      TURBO_CONSTSYMBOLS10  = TURBO_DIGITS;
      TURBO_SEPARATORS      = ['(', ')', ',', '.', ';'];
      TURBO_CR              = #$D;
      TURBO_LF              = #$A;
      TURBO_TAB             = #9;
      TURBO_NULL            = #0;
      TURBO_SPACE           = ' ';
      TURBO_CRLF            = TURBO_CR + TURBO_LF;
      TURBO_LF_CR_TAB       = [TURBO_LF, TURBO_CR, TURBO_TAB];
      TURBO_SPACE_LF_CR_TAB = [TURBO_SPACE, TURBO_LF, TURBO_CR, TURBO_TAB];
      TURBO_LF_CR_NULL      = [TURBO_LF, TURBO_CR, TURBO_NULL];

resourcestring
              SBadRemark = 'Bad remark at position: %d';

implementation uses uTurboUtils;


procedure HashTokens;
var i : integer;
    s : string;
begin
     for i := LowTurboTokens to HighTurboTokens do begin
         s := TurboTokens[i].Value;
         TurboTokens[i].Hash := TurboHashString(s);
     end;// for i := LowTurboTokens to HighTurboTokens do begin
end;

initialization
              HashTokens;

end.
uTurboUtils.pas
(*
  uTurboUtils -- is part of Turbo project
  Copyright© Dorin Duminica

  Usage of this file is permited for comercial and non-comercial programs.

  NOTE: if you optimize some of the bellow functions please let me know
*)

unit uTurboUtils;

interface

uses SysUtils, uTurboParserConst;

function TurboHashString(const s: string): TurboHash;
function FileToString(const FileName: string): string;
function FileToStringA(const FileName: string): AnsiString;
function FileToStringU(const FileName: string): string;
function TurboTokenToString(thisToken: TurboTokenType): string;
function StringToTurboToken(const s: string): TurboTokenType;
function StringIsHex(const s: string): Boolean; overload;
function StringIsHex(const s: string; var Value: Integer): Boolean; overload;
function StringIsChar(const s: string): Boolean; overload;
function StringIsChar(const s: string; var Value: Char): Boolean; overload;
function StringIsString(const s: string): Boolean; overload;
function StringIsString(const s: string; var Value: string): Boolean; overload;
function StringIsInteger(const s: string): Boolean; overload;
function StringIsInteger(const s: string; var Value: Integer): Boolean; overload;
function StringIsDouble(const s: string): Boolean; overload;
function StringIsDouble(const s: string; var Value: Double): Boolean; overload;
function IsFileUnicode(const FileName: string): Boolean;
function IsFileUTF8(const FileName: string): Boolean;

implementation uses Classes, Windows;

function TurboHashString(const s: string): TurboHash;
var i : integer;
    c : Char;
begin
     Result := 0;
     for i := 1 to Length(s) do begin
         c := s[i];
         c := UpCase(c);
         Result := ((Result shl 7) or (Result shr 25)) + Ord(c);
     end;// for i := 1 to Length(s) do begin
end;

function FileToString(const FileName: string): string;
begin
     Result := EmptyStr;
     if not FileExists(FileName) then
        Exit;
     if IsFileUnicode(FileName) or IsFileUTF8(FileName) then
        Result := FileToStringU(FileName)
     else
         Result := String(FileToStringA(FileName));
end;

function FileToStringA(const FileName: string): AnsiString;
var FileStream : TFileStream;
begin
     Result := AnsiString(EmptyStr);
     if not FileExists(FileName) then
        Exit;
     FileStream := TFileStream.Create(FileName, fmOpenRead);
     SetLength(Result, FileStream.Size);
     FileStream.Read(Pointer(Result)^, FileStream.Size);
     FreeAndNil(FileStream);
end;

function FileToStringU(const FileName: string): string;
var FileStream : TFileStream;
begin
     Result := EmptyStr;
     if not FileExists(FileName) then
        Exit;
     FileStream := TFileStream.Create(FileName, fmOpenRead);
     SetLength(Result, FileStream.Size);
     FileStream.Read(Pointer(Result)^, FileStream.Size);
     FreeAndNil(FileStream);
end;

function TurboTokenToString(thisToken: TurboTokenType): string;
var i : integer;
begin
     Result := EmptyStr;
     for i := LowTurboTokens to HighTurboTokens do
         if TurboTokens[i].TokenType = thisToken then begin
            Result := TurboTokens[i].Value;
            Exit;
         end;// if TurboTokens[i].TokenType = thisToken then begin
end;

function StringToTurboToken(const s: string): TurboTokenType;
var i : integer;
    h : TurboHash;
begin
     if s = EmptyStr then begin
        Result := ttEndOfFile;
        Exit;
     end;// if s = EmptyStr then begin
     Result := ttUnknown;
     h := TurboHashString(s);
     for i := LowTurboTokens to HighTurboTokens do
         if TurboTokens[i].Hash = h then begin
            Result := TurboTokens[i].TokenType;
            Exit;
         end;// if TurboTokens[i].Hash = h then begin
     if StringIsHex(s) or StringIsInteger(s) then begin
        Result := ttVInteger;
        Exit;
     end;// if StringIsHex(s) or StringIsInteger(s) then begin
     if StringIsString(s) then begin
        Result := ttVString;
        Exit;
     end;// if StringIsString(s) then begin
     if StringIsChar(s) then begin
        Result := ttVChar;
        Exit;
     end;// if StringIsChar(s) then begin
     if StringIsDouble(s) then begin
        Result := ttVDouble;
        Exit;
     end;// if StringIsDouble(s) then begin
end;

function StringIsHex(const s: string): Boolean;
var i : integer;
begin
     Result := False;
     if Length(s) > 1 then begin
        if s[1] = '$' then
           for i := 2 to Length(s) do
               if not CharInSet(s[i], TURBO_HEX) then
                  Exit;
     end else
         Exit;
     Result := True;
end;

function StringIsHex(const s: string; var Value: Integer): Boolean;
begin
     Value := 0;
     Result := StringIsHex(s);
     if Result then
        Value := StrToInt(s);
end;

function StringIsChar(const s: string): Boolean;
var c : char;
begin
     Result := StringIsChar(s, c);
end;

function StringIsChar(const s: string; var Value: Char): Boolean;
const CHAR_ORDINAL = '#';
      CHAR_HEX = '#$';
var i, p, len : integer;
begin
     Result := True;
     len := Length(s);
     p := Pos(CHAR_HEX, s);
     if p = 1 then begin
        // in this case we have something like #$HEX_VALUE
        // check if remaining chars are hex
        for i := 3 to Len do
            if not CharInSet(s[i], TURBO_HEX) then
               Exit;
        Value := Char(StrToInt(Copy(s, 2, Length(s) -1)));
        Exit;
     end else begin
         // in this case we have something like #ORDINAL_VALUE
         // check if remaining chars are digits
         for i := 2 to Len do
             if not CharInSet(s[i], TURBO_DIGITS) then
                Exit;
         p := Pos(CHAR_ORDINAL, s);
         if p = 1 then begin
            Value := Char(StrToInt(Copy(s, 2, Length(s) -1)));
            Exit;
         end;// if p = 1 then begin
     end;// if p = 1 then begin
     Result := False;
end;

function StringIsString(const s: string): Boolean;
begin
     Result := (Length(s) >= 2) and ((s[1] = '"') and (s[Length(s)] = '"'));
end;

function StringIsString(const s: string; var Value: string): Boolean;
begin
     Result := StringIsString(s);
     if Result then
        Value := Copy(s, 2, Length(s) -1);
end;

function StringIsInteger(const s: string): Boolean;
var i : integer;
begin
     Result := False;
     for i := 1 to Length(s) do
         if not CharInSet(s[i], TURBO_DIGITS) then
            Exit;
     Result := True;
end;

function StringIsInteger(const s: string; var Value: Integer): Boolean;
begin
     Result := StringIsInteger(s);
     if Result then
        Value := StrToInt(s);
end;

function StringIsDouble(const s: string): Boolean;
var d : Double;
begin
     Result := StringIsDouble(s, d);
end;

function StringIsDouble(const s: string; var Value: Double): Boolean;
var i, PointCount : integer;
begin
     Result := False;
     PointCount := 0;
     for i := 1 to Length(s) do
         case s[i] of
              '.': Inc(PointCount);
              '0'..'9': ;
              else
                  Exit;
         end;// case s[i] of
     if PointCount <= 1 then begin
        Value := StrToFloat(s);
        Result := True;
        Exit;
     end;// if PointCount <= 1 then begin
end;

function IsFileUnicode(const FileName: string): Boolean;
var hFile : THandle;
    Buffer: Word;
    BytesRead: DWORD;
begin
     Result := False;
     hFile := FileOpen(FileName, FILE_SHARE_READ);
     if hFile <> INVALID_HANDLE_VALUE then
        try
           Buffer := 0;
           SetFilePointer(hFile, 0, nil, FILE_BEGIN);
           Result := ReadFile(hFile, Buffer, SizeOf(Buffer), BytesRead, nil) and
                     (BytesRead>=SizeOf(Buffer)) and
                     ((Buffer=$FEFF) or (Buffer=$FFFE));
        finally
               CloseHandle(hFile);
        end;// try...
end;

function IsFileUTF8(const FileName: string): Boolean;
var hFile : THandle;
    Buffer: packed array[0..2] of byte;
    BytesRead: DWORD;
begin
     Result := False;
     hFile := FileOpen(FileName, FILE_SHARE_READ);
     if hFile <> INVALID_HANDLE_VALUE then
        try
           FillChar(Buffer, SizeOf(Buffer), 0);
           SetFilePointer(hFile, 0, nil, FILE_BEGIN);
           Result := ReadFile(hFile, Buffer, SizeOf(Buffer), BytesRead, nil) and
                     (BytesRead>=SizeOf(Buffer)) and
                     ((Buffer[0]=$EF) and (Buffer[1]=$BB) and (Buffer[2]=$BF));
        finally
               CloseHandle(hFile);
        end;
end;

end.
Turbo roadmap
[X] complete grammar
[X] dynamic parser which can be easily extended in the future
[-] expression evaluator
[-] compiler
[-] first program(GUI and console)
[-] Turbo IDE(which can be extended using Turbo)

Note that anyone can join the project if he/she has:
- medium Delphi knowledge
- the will to learn new things

Thursday, December 3, 2009

Turbo grammar

Surprise, surprise!
Do you guess what Santa will post today? Yes! that's right, the grammar for Turbo!
Abstract definition of a application|console|library in Turbo

application|console|library IDENTIFIER_NAME
{
        // use external libraries
        using LIBRARY_NAME1;
        using LIBRARY_NAME2;

        // include files as resource
        resource RESOURCE_NAME in "[PATH\]FILENAME.EXTENSION";

        // define constants
        const CONST_NAME = CONST_VALUE;

        const CONST_ARRAY_NAME = [VALUE_LIST];

        // enumeration
        enum {ENUM_NAME[, ENUM_NAME_LIST}

        // define variables
        var VAR_NAME[, VAR_NAME_LIST];

        // define class
        class CLASS_NAME ( BASE_CLASS_NAME )
        {
                // define class constants
                const CONST_NAME = CONST_VALUE;

                // define class variables
                var VAR_NAME[, VAR_NAME_LIST];

                // define class functions
                function FUNCTION_NAME( PARAM[, PARAM_NAME_LIST)
                {
                        // define function constants
                        const CONST_NAME = CONST_VALUE;

                        // define function variables
                        var VAR_NAME[, VAR_NAME_LIST];

                        return RETURN_VALUE;
                }
        }

        // define application's MAIN function, reserved word!
        // this will be executed only in application|console mode
        // for library use "initialization" simulate "main"
        function main
        {
                // do something
        }
}

But wait! that's some JavaScript(ECMA Script) combined with C#?!
Yup! and a bit of Delphi(Pascal) if you continue reading.
After searching the net for hours I came to the conclusion that C-like languages are more welcomed by masses.

Here is how statements will be defined in Turbo
STATEMENTS
-- IF
        if (CONDITION)
                STATEMENT;

        if (CONDITION)
        {
                STATEMENT_LIST;
        }
        else
        {
                STATEMENT_LIST;
        }


-- FOR
        for (IDENTIFIER = VALUE; CONDITION; UPDATE_EXPRESSION)
                STATEMENT;

        for (IDENTIFIER = VALUE; CONDITION; UPDATE_EXPRESSION)
        {
                STATEMENT_LIST;
        }


-- WHILE
        while (CONDITION)
                STATEMENT;

        while (CONDITION)
        {
                STATEMENT_LIST;
        }


-- DO
        do
        {
                STATEMENT_LIST;
        }
        while (CONDITION);


-- SWITCH
        switch (IDENTIFIER|VALUE)
        {
                CASE_VALUE: STATEMENT;

                default: STATEMENT;
        }

        switch (IDENTIFIER|VALUE)
        {
                CASE_VALUE:
                {
                        STATEMENT_LIST;
                }

                default:
                {
                        STATEMENT;
                }
        }


-- TRY
        try
        {
                STATEMENT_LIST;
        }

        try
        {
                STATEMENT_LIST;
        }
        except
        {
                DO_ON_EXCEPTION;
        }

        try
        {
                STATEMENT_LIST;
        }
        finally
        {
                DO_FINALLY;
        }

        try
        {
                STATEMENT_LIST;
        }
        except
        {
                DO_ON_EXCEPTION;
        }
        finally
        {
                DO_FINALLY;
        }
But wait, there's more! initialization and finalization of application|console|library
INITIALIZATION/FINALIZATION
executes a statement or statement list at the initialization or
finalization of the application|console|library.

ex:

initialization
        STATEMENT;

initialization
{
        STATEMENT_LIST;
}

finalization
        STATEMENT;

finalization
{
        STATEMENT_LIST;
}
Even if you love labels or not, there will be goto's!
LABELS
using labels is possible by writing "label" keyword before the label name
label LABEL_NAME:

use "goto LABEL_NAME;" to jump to a label
How about embeding binary or text files?
RESOURCES
you can embed external files(binary files, text files, etc.) by using
the following syntax

resource RESOURCE_NAME in "[PATH\]FILENAME.EXTENSION";

RESOURCE_NAME -- will be used to access resource
Here they come, who? the operators!
OPERATORS
!       factorial, retrieves the factorial value of IDENTIFIER
ex:     a!

^       power, BASE^EXPONENT
ex:     a ^ b

not     negate
ex:     not a, negate value of a

and     logial AND
ex:     a and b, a and b is true

or      logical OR
ex:     a or b, a or b is true

xor     exclusive OR
ex:     a xor b

shl     bitwise shift left
ex:     a shl b, shift a's bits to the left by b

shr     bitwise shift right
ex:     a shr b, shift a's bits to the right by b

mod     modulo
ex:     a mod b, the remain of dividing a by b

div     integer division
ex:     a div b, divide a by b and return a integer value

%       percent left of right
ex:     a % b is equal to a * b / 100

*       multiply
ex:     a * b

/       divide
ex:     a / b

+       add
ex:     a + b

-       subtract
ex:     a - b

=       equal
ex:     a = b, check equality

>       greater than
ex:     a > b

<       less than
ex:     a < b

>=      greater or equal
ex:     a >= b

<=      less or equal
ex:     a <= b

:=      assignment
ex:     a := b

*=      assignment of multiplication
ex:     a *= b, same as a := a * b

/=      assignment of division
ex:     a /= b, same as a := a / b

+=      assignment by addition
ex:     a += b, same as a := a + b;

-=      assignment by subtraction
ex:     a -= b, same as a := a - b;

IDENTIFIER++    increment IDENTIFIER's VALUE
IDENTIFIER--    decrement IDENTIFIER's VALUE
And who can forget about the good old comments?
COMMENTS
-- single line
        // comment

-- multiline
        /* comment line 1
           comment line 2
                ...
           comment line n */
STRINGS
-- strings are between double quotes
"this is a simple string"
CHARACTERS
characters can be represented as single "a" or
#128 -- 128 is it's ordinal value
characters can be concatenated with strings
ex: "this is a simple string" + #128 = "this is a simple string €"
But where are the numbers?
NUMBERS
-- integer 1, 2, 3
-- hex $FFAACC
-- float 123.456
Any comments?

Wednesday, December 2, 2009

Turbo project start

After reading comments on Turbo's grammar I've decided that I will stick with a grammar similar to Pascal even though it's a bit too verbose -- new programmers can take advantage of this and embrace the world of computer programming, therefore I will start working on a parser for Turbo, if anyone is willing to contribute to the project please let me know via e-mail or post a comment.
From now on I will post the progress of Turbo.
Turbo roadmap
[-] complete grammar
[-] dynamic parser which can be easily extended in the future
[-] expression evaluator
[-] compiler
[-] first program(GUI and console)
[-] Turbo IDE(which can be extended using Turbo)

Blogroll(General programming and Delphi feeds)