1. fast as possible
2. work on ANSI and Unicode version of Delphi
3. must be able to search both case sensitive and insensitive
4. must be able to search from a given position in a string
5. it's code must be as easier to read as possible
Of course, the first thing I did(like anyone would do) was to search the Internet using Google Almighty for a Boyer-Moore implementation(preferably in Delphi) that would meet at least half of the requirements listed above, guess what? I did not found one that would NOT give a error.
Next step: „back to the drawing board¯ -- „If you want something done right, you must do it yourself¯ or something like that.
Here's with what I came up:
function FindString(const Value, Pattern: string;
const CaseSensitive: Boolean = True;
const StartPos: Integer = 1): integer;
var
Index: Integer;
jIndex: Integer;
kIndex: Integer;
LLenPattern: Integer;
LLenValue: Integer;
LSkipTable: array[Char] of Integer;
LFound: Boolean;
LChar: Char;
function __SameChar: Boolean;
begin
if CaseSensitive then
Result := (Value[Index] = Pattern[jIndex])
else
Result := (CompareText(Value[Index], Pattern[jIndex]) = 0);
end; // function __SameChar: Boolean;
begin
LFound := False;
Result := 0;
LLenPattern := Length(Pattern);
if LLenPattern = 0 then begin
Result := 1;
LFound := True;
end; // if LLenPattern = 0 then begin
for LChar := Low(Char) to High(Char) do
LSkipTable[LChar] := LLenPattern;
if CaseSensitive then begin
for kIndex := 1 to LLenPattern -1 do
LSkipTable[Pattern[kIndex]] := LLenPattern -kIndex;
end else begin
for kIndex := 1 to LLenPattern -1 do
LSkipTable[Windows.CharLower(@Pattern[kIndex])^] := LLenPattern -kIndex;
end; // if CaseSensitive then begin
kIndex := LLenPattern + (StartPos -1);
LLenValue := Length(Value);
while (NOT LFound) and (kIndex <= LLenValue) do begin
Index := kIndex;
jIndex := LLenPattern;
while (jIndex >= 1) do begin
if __SameChar then begin
jIndex := jIndex -1;
Index := Index -1;
end else
jIndex := -1;
if jIndex = 0 then begin
Result := Index +1;
LFound := True;
end; // if jIndex = 0 then begin
kIndex := kIndex + LSkipTable[Value[kIndex]];
end; // while (jIndex >= 1) do begin
end; // while (NOT LFound) and (kIndex <= LLenValue) do begin
end;
If you find any problems(or improvements) in the algorithm please let me know.
Dorin,
ReplyDeleteWould you consider extending the TurboHashedStringList to duplicate the TStringList.Objects property?
Hi SportsGuy,
ReplyDeleteuntil now, no one asked for that and I had no need for that yet.
I'm pretty tied up for the moment and time is pretty limited, but you could implement that very easily, because each entry in the list hash also a pointer that can point to a object therefore extending the TurboHashedStringList class would be simple as 1, 2, 3.
Bottom line I will extend it as soon as I get some free time.
Dorin,
ReplyDeleteHere is what I added. I used the TStringList code in Classes.pas as my model.
First, the private setter and getter:
procedure PutObject(Index: Integer; AObject: TObject);
function GetObject(Index: Integer): TObject;
Then, the property declaration itself:
property Objects[Index: Integer]:TObject read GetObject write PutObject;
Finally, the setter and getter implementation:
function TurboHashedStringList.GetObject(Index: Integer): TObject;
begin
if (Index < 0) or (Index >= self.Count) then result:=nil;
Result:=PStringRec(FList[Index]).ObjectRef;
end;
procedure TurboHashedStringList.PutObject(Index: Integer; AObject: TObject);
begin
if (Index < 0) or (Index >= self.Count) then exit;
PStringRec(FList[Index]).ObjectRef:=AObject;
end;
Does that look about right?
Thank you for investing a few mins on this, I will update the class immediately and create a new post for it.
ReplyDeleteI thing tho'
PStringRec(FList[Index]).ObjectRef:=AObject;
should be replaced with
PStringRec(FList[Index])^.ObjectRef:=AObject;
It's better to dereference the pointer even tho' the compiler is smart enough to do it for you.
UpCase should not be used since it works *only* for ASCII characters!
ReplyDelete@Anonymous yes... maybe LoCase should be used since it's a Windows routine function LoCase(ch: PWideChar): PWideChar; but this is just an example.
ReplyDeleteI was just looking for a b-m delphi version that handled unicode
ReplyDeleteThis looked promising, but you do know that it doesn't work ?
Test app, 1 button, 1 memo
Button click code here
procedure TForm1.btn1Click(Sender: TObject);
const
NumItems = 150000;
var
x, y, z: Integer;
t1: Int64;
FindStr, DataStr: string; s:string;
begin
DataStr:='';
z:= (2* (NumItems div 3) );
// build a string like
// "0 asdshfty","1 jgyrkdsl" etc
for x := 0 to NumItems - 2 do
begin
s:= '"'+IntToStr(x)+ ' ';
for y := 0 to 7 do
s:=s+char( Ord('A')+ random(25) );
DataStr :=datastr+ s+'",';
if x = z then
FindStr :=s
end;
s:= '"'+IntToStr(NumItems-1)+ ' ';
for y := 0 to 7 do
s:=s+char( Ord('A')+ random(25) );
DataStr :=datastr + s +'"';
x:= pos(FindStr,DataStr);
memo1.Lines.Add(IntToStr(x));
y:= FindString( DataStr ,FindStr,1,true);
memo1.Lines.Add(IntToStr(x));
if x=y then
memo1.lines.add('OK')
else memo1.lines.add('Fail');
end;
did you test it? This don't working (D2010), always ret -1
ReplyDeleteI've rewrote the search function and done few test cases, unfortunately I do not have enough time to test more cases, so If you find bugs let me know.
ReplyDeleteHi, very interesting article, I'm wondering if that could be modified to search a portion of a .bmp image inside a bigger one. Is that doable ?
ReplyDeleteThanks in advance
Yes it is, but be more clear on what you wish to do, do you wish for example to search for a portion of a bitmap inside other bitmap with exact color intensity? if so you can do it with scanline method from TBitmap class.
ReplyDeleteBut if you wish to do the search more abstract then you should be more specific on your expected results, tolerance, etc. let me know with precise specifications.
Sorry for the delay, I had lost my bookmarks :\
ReplyDeleteYes indeed, I wanted to search for a portion of a bitmap inside another one with exact color matching.
That would be great
Regards
Mike
No worries, searching inside bitmaps is a bit tricky, because a bitmap can be flipped, mirrored, etc. therefore please give more details of what you wish to achieve exactly and if it can have impact on a larger scale(meaning that more people can benefit from the search algorithm) I will spend some time on this.
ReplyDeleteIf you have some time, maybe you can add wildcard support. Actually I still having found that one that works with >D2009.
ReplyDelete