- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Did you consider that you need to read more than the first three words?
Admin
You don't count the numbers after the decimal. (sorry had to be said)
Admin
Me too. But they've offered to raise it to 6 1/2 if I promise to never program in any language ever again.
--Rank
Admin
I know nothing of how perl's regex's are implemented, but my guess is that, if substring matches are somehow optimized, they achieve something more like O( log(n) ). Better than O(n), certainly not as great as O(1).
O(log(n)) is what most good search algorithms that work on sorted data get. But if a cache is somehow involved, performance could potentially be better than that (or probably significantly worse under the right conditions).
Also, for speed of search to be that good, the cache must use huge amounts of memory (proportional to the data being searched). Imagine the size of this cache on a 4GB file. If you are not on an awesome computer, you're probably going to run the cache into your swap file, at which point you can say goodbye to any kind of performance gain the cache offers.
But again, I don't know how perl does it so I am only speculating here.
Admin
I just had to go through hours of SarbOx audit trailing for accidentally re-saving a program I had opened to see what exactly it was doing...Opened it, looked at it, fixed the formatting on one complex part so it was easier to read, then, LIKE A MORON, I saved instead of cancelling.
Every piece of code is monitored. When that one piece changed, it got flagged on our "needs documentation" list, and I had to hunt up someone who could authorise me to make a change on that program, post facto, so I could get documentation for the auditors.
That's the modern business world. Don't expect to be able to modify code to yer hearts content.
Admin
Your analogy sucks.
Think of it as fixing a key structural support in a building that's 200 stories high. Would you rather some schmuck maintenance guy just randomly saw it, and decided to fix it, knowing NOTHING about the true nature of the problem? Or would you rather that there was a lot of consultation and preparation and thought that went into the problem, and a good team assmbled to fix it, and the building evacuated, etc.
Emergencies are emergencies, and sometimes you've got to violate procedure in an emergency. But that sort of thing almost never happens, and when it does, generally you're not doing programming when it does.
If someone working under me took it upon themselves to fix a problem in such a way that they shut down the system, or broke some critical app, or introduced a MATH ERROR, I would come down on them so hard that it'd be ten years before they could look at a keyboard without puking in fear. Then I'd fire them. Even if nothing broke, and I found out they had made a programming logic change without authorisation, there would be hell to pay. Then I would fire them. That is not the way you work in the professional world.
Admin
No. A general substring search can never be faster than O(n), except if you parallelize it - you can do it in O(1) time with O(n) CPUs, of course.
This is easy to prove: consider the regexp /A/, i.e. matching any string that contains an A anywhere. Consider a String of length n consisting entirely of B's which may or may not have a single A thrown in at a random location. If you can, at the lowest level, only examine one character at a time (which is the case with a single-CPU computer), then you have to look at every single character to be certain there is no A, and if there is one you will find it, on average, after n/2 comparisons.
Search algorithms require, as you said, sorted (or otherwise specially ordered) data. A string is generally not sorted or ordered in a special way.
Admin
When we all agree what the right thing is and none of us ever make mistakes.
Admin
Then this is an even bigger WTF! All the code will need to know about this -1, and most of them will have to account for it. IF this -1 is needed in one place, it will be obvious in the other places that it is needed and why. As a programmer working on the code where it is needed I will be wondering why they didn't do it, and how it can possibly work. When I discover it in the stored procedure I will go WTF, you are making the sp less generic by doing something for me that I can do easily.
Admin
This is, of course, only true for the special "pixie dust edition" of Perl that emptyset uses. The license requires you to give your firstborn to the sidhe and put a saucer of milk on your doorstep every night.
Admin
Actually, that's exactly the way you work in my professional world :). My professional world is run by cowboys. Everyone does their own thing, we've never even heard of code reviews, and testing? Sure, we do that, because the company I contract for requires it. But they have no guidelines for it except that it be done; so long as a developer says, "yeah, I tested it," it gets moved to production.
As harsh as you come off in your post, I think I'd like to work in someplace like you describe. I have this awful feeling in the back of my mind that I'm learning a lot of really bad habits where I'm at.
Admin
On a good system, there is no such thing (nor can there ever be) a "clearly defined business-logic layer." It's a fallacy to believe that you can encompass all business logic in once place. If you try, you will create a horrendously ugly, un-maintainable system that will be full of nasty bugs. I’ve been there, seen it tried, and watched it fail.
Business logic needs to be placed at all layers of the system, often times in duplicate, triplicate, or (painfully) more. The key is placing it thoughtfully in places that will be easy to find and easy to maintain.
Stored procedures are a right step in this direction; they provide a limited interface with strong cohesion. This is what you want. A "who knows what's accessing what" type of cohesion that inline-SQL provides is a recipie for disaster.
Admin
Good Practices (everything from Source Control to Change Management to Quality Assurance) make better software cheaper. You should take the opportunity to sell this fact (we have 40-some years of studies on this) to management and take it upon yourself to work to implmenet said practices. I'd like to say, "think of how much better your company will be." But, screw that -- think of the ROR*!
* Like ROI, ROR is Return on Resume
Admin
<FONT face="Courier New" size=2>i've described it before, and have written a very successful implementation of it in FiPL.</FONT>
<FONT face="Courier New" size=2>your basic regex has to look at every character, providing no added benefit to the loop/parse method. the way to achieve O(1) performance is by 'caching'. when a substring to find in a string is provided as input, the regex 'caches' all the possible substrings in the string. it is then a trivial matter to determine that the substring exists, with a constant number of characters. simply poll a few cleverly chosen characters in the string and viola! there you have it. now, i hear what you're saying. yes, an adversary may construct a string that can defeat this algorithm. that's why you poll a constant number of well-chosen characters and a constant number of randomly chosen characters. comparing which characters are found and their position in the strings against the 'cached' values produces an answer in O(1).</FONT>
<FONT face="Courier New" size=2>granted, somebody mentioned 4gb strings. the good news is that the algorithm takes things like that into account. after a certain string length, you can't neccessarily garuntee constant time performance. so, you do the next best thing: approximation. by varying the constant number of random queries into the string, you can better approximate finding the substring. perl programmers know this. having a fast but 65% reliable method of finding substrings in constant time is pretty good. as my complexity teacher once put it, "you can find out anything with a 51% approximate solution" - of course, he was talking in the more theoretical sense, but it still applies.</FONT>
Admin
<FONT face="Courier New" size=2>you don't need O(n) CPUs for my solution. i was able to program this algorithm in FiPL, and you can too.</FONT>
Admin
I work for a medium insurance company. It would take 5 minutes to make the change and rebuild. Then I need to fill out a change control form, meet with QA (hope I could keep them to 1 hour), create the implementation schedule, and set up the change in 3 test environments, obtain sign-off from the testers, and finally coordinate the deployment with the IT staff.
Admin
And here I thought that saying "it's trivial" as a method of proof was a joke...
Your are spewing bullshit. O(1) for a general substring search is impossible. I provided proof of that a few postings above. Caching and randomization are completely irrelevant. And that's true for many common cases, not freak accidents as with a randomized quicksort.
As slamb wrote, clever substring searches can make the search independant from the length of the needle and return very fast with negative results for large needles. Maybe that's what you mean with O(1), but what everyone else meant is the length of the haystack, and there, anything faster than O(n) is impossible.
Admin
If that is way you work, then there are two possibilities. First, your management may be idiots. But the alternative is that there is relatively little penalty for mistakes in production. Right now I'm working for a lot of financial services customers, helping them generate reporting for their prime brokerage customers. If a report is wrong, it may cause a billion dollar hedge fund to make the wrong trades. The penalty for a mistake is high, and so their are a lot of controls.
Previously, however, I worked on an internal financial system for a large professional services firm. I've made fifty million dollar mistakes. The penalty, was a call from the CFO, a 10 minute code fix, and re-running a batch process that evening. That environment doesn't require excessive controls.
Systems should be divided into three categories based on the impact of mistakes: those that cause minor inconvenience, those that cause significant loss of money, and those that cause loss of life. The controls around a system should be based on this.
Admin
You must be one heck of a programmer. Switch to a real programming language, and you'll earn 7 figures.
Admin
<FONT face="Courier New" size=2>he probably meant six figures in yen.</FONT>
Admin
<FONT face=Arial size=2>Hmm.. I don't think you actually understand what O(x) means...</FONT>
<FONT face=Arial size=2>
</FONT><FONT face=Arial size=2>And now I'm sure of it. If your algorithim is O(1), then it runs in constant time regardless of the input. Period. Otherwise the algorithim is not O(1).</FONT>
<FONT face=Arial size=2>Implementation does not affect the O(x) expression of the algorithim. I suggest you read up on Big-O before talking about things you clearly do not understand.</FONT>
<FONT face=Arial size=2></FONT>
Admin
<FONT face="Courier New" size=2>look. if you 'cache' all possible substrings of a string, you can preform the lookup in constant time. the substrings have a well-defined order on them.</FONT>
<FONT face="Courier New" size=2>no, a generalized O(1) substring search is possible, and that's what perl uses. if it wasn't, nobody would program in perl because then it wouldn't be faster than other languages.</FONT>
Admin
Oh, and brazzy is correct. A general substring search cannot be done more efficently than O(n). Not under any circumstances, and no matter what your implementation is. If you do not understand why, then you really do not fully grasp the concept of Big-O algorithim measurement in the first place.
Admin
<FONT face="Courier New" size=2>i understand it quite well.</FONT>
<FONT face=Arial size=2><FONT face="Times New Roman" size=3>
</FONT><FONT face="Courier New" size=2>well, it's an issue with the 'caching' algorithm. generating all possible substrings in constant time is still a work in progress. only approximate constant time solutions exist.</FONT>
<FONT face=Arial size=2><FONT face="Times New Roman" size=3>
</FONT><FONT face="Courier New" size=2>i would suggest any perl programmer to do the same.</FONT>
Admin
Sounds like empty set is talking about O(1) with respect to the size of the regex pattern. Everyone else seems to be talking about O(n) where n = length of the string to be searched.
Admin
The number of all possible substrings of a string can be expressed as n!. Searching through them, if well indexed, can be done in log n time. Therefore your method can be, at best, O(log (n!)), which is still greater than O(1).
Perl is not faster than other languages. Why would it be? Algorithim measurement is not language dependant.
Admin
<FONT face="Courier New" size=2>no.</FONT>
Admin
<FONT face=Arial size=2>Doesn't really matter how long it takes to generate those substrings. Searching through them cannot be O(1) either. At best that search of any kind can only be O(log n), where n is the number of substrings you're searching through. </FONT>
<FONT face=Arial size=2>And since there's n! possible substrings of a string of length n, the total time for your substring search algorithim is O(log n!). N</FONT><FONT face=Arial size=2>ow, this might be faster in terms of seconds for your specific case, but the Big-O representation of it is not O(1) no matter how you munge it about.</FONT>
Admin
<FONT face="Courier New" size=2>as a great man once said, O(log(n)) <= 64. QED.</FONT>
Admin
If this guy was all that great, he probably would have tried to make sense. Comparing an O(x) expression to a real number doesn't make sense in any world I know of (including this one).
Admin
It's not very often I want to be that message-board jackass who flames people by laughing at them without explanation, but...
HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA
Ok, sorry about that.
Admin
Oh, now I get it. In you're own little world, O(1) = O(log(n)) but you took three days and paragraphs of posts to tell us that. Nice...
Admin
It makes sense -- think about how big N has to be for log(N) to be more than 64. He's making a point about how slowly an O(log n)-time cost increases.
Nevertheless, you can't just throw out stuff like "O(1) substring matching" and then back off to "O(log n), but that's not really very bad!" when people call you on it.
Admin
Yeah, but 64 what? What is his measurement in? Comparing O(anything) to "64" doesn't make any sense at all. Measuring actual things in terms of numbers is not what Big-O does. So comparing it to anything measured in numbers doesn't really make any sense. I mean, saying O(log n) doesn't even specify the base for "log", for example.
Admin
So umm... given a string of length-n how in the world can you find out if it contains the letter 'a' in O(1) time? Are you talking worst case scenario or simply mean case?
Admin
Quite apart from the amount of storage it would require (which is O(n!*n)) and the amount of time it would take to generate all that substrings (which is proportional to the storage). You could probably cut that down massively by reusing sub-substrings, but it's still going to be at least O(n^2).
As I said: he's spewing bullshit, probably intentionally so. Let's just stop taking him serious just because he makes the bullshit sound vaguely coherent in this thread.
Admin
<FONT face="Courier New" size=2>"caching".</FONT>
Admin
emptyhead, why don't you post your algorithm, to show us how it is O(1).
There had better not be any loops in it.
You're not even allowed to count the length of the string.
Don't forget to include the substring caching!
Judging by the other posts,
the caching is O(n!) and the searching is O(log(n)) at best, so your algorithm would then be:
(big surprise here:) O(log(n)) + O(n!)
Try not to be so stupid.
Admin
Ok, more accurately, it doesn't make SENSE but it does make a POINT.
Admin
In fact, O(log (n!)) is O(n).
Admin
The number of substrings of an n-character string is not n!, it is n+(n-1)+...+2+1 = n(n+1)/2 (there are n 1 character substrings, n-1 2 character substrings, and so on up to 2 n-1 character substrings and 1 n character "substring" (i.e. the entrie string).) So the search can be, at best, O(log n2), which is better than O(log n!), but still worse than O(1)
Admin
Ah, I understand! emptyset calls his bag of magic pixie dust a "cache". Sprinkling it over the data to execute an O(n) algorithm in O(1) is therefore "caching".
Admin
Read Alex's post.
Admin
Doh! You're correct, sir.
Admin
<FONT face="Courier New" size=2>i can't post my algorithm because it was developed in FiPL. you have to interpret the aftermath of the scales in order to see if the substring was found or not:</FONT>
<FONT face="Courier New" size=2>S1 is flip-flop of BLACK and WHITE;
BlackEvent is incident when S1 becomes BLACK;
Cycle1 is aftermath of BlackEvent as follows [
S50 is haphazard; /* change s50’s color */
]
WhiteEvent is incident when S1 becomes WHITE;
Cycle2 is aftermath of WhiteEvent as follows [
S50 is haphazard;
</FONT><FONT face="Courier New" size=2>]
TomatoEvent is incident when S50 becomes TOMATO;
EndProgram is aftermath of TomatoEvent as follows [
thanks for all the fish; /* stop the FVM */
]</FONT>
Admin
Putting aside the fact that it's not a factorial as I thought, do you still think this is true? If it was O(log (2^n)), then I'd agree with you. But n! grows much faster than an exponential function like 2^n. So in fact, O(log (n!)) is not the same as O(n).
Admin
Umm.. That just changes the state of scale 50 randomly until it becomes TOMATO. It uses S1 as a BLACK/WHITE timer.
Don't act like using esoteric languages wins you any points here. I can deal with the Fish as well as anybody.
Oh, and posting bits of code off of Wikipedia is just lame, man.
Admin
<FONT face="Courier New" size=2>what can i say? i was pressed for time. time is money. i'm a very busy person.</FONT>
Admin
Uh...yeah. Right.
Admin
For small values of n, log(n!) is approximately linear (but, of course, big O notation doesn't really care about small values of n). For large n, a better approximation is ln(n!) ~= n ln(n) - n, which is O(n log(n)) (http://en.wikipedia.org/wiki/Factorial#Logarithm_of_the_factorial)