Random line from a file
From Amcleod@VERT to All on Tue Aug 21 05:39:34 2001

The question arises periodically -- how to select a line randomly from a text
file? If the file were filled with fixed-length records you could compute the
offset of a random line and FSEEK/FREAD the appropriate line. Alas, text files
are annoyingly variable in line-length. Or you could read all the lines into
an array (simulated on disk if needed) and select randomly from the array.
Ugh! So here is my method-of-choice, which reads through the entire file from
top to bottom, and randomly picks one line out of the file:

    #----------------------------------------------------#
    # randomly select one random-length line from a file #
    #----------------------------------------------------#

    !INCLUDE FILE_IO.INC

    STR randomfile inbuffer winner
    INT handle line_count rval cval

    #--------------------#
    # set up some values #
    #--------------------#
    SET randomfile "c:\\sbbs\\exec\\oneliner.txt"
    SET line_count 0
    SET winner "Default Value -- in case routine fails"

    #---------------#
    # open the file #
    #---------------#
    FOPEN handle O_RDONLY randomfile
    IF_FALSE
        PRINTF "Error opening %s\n" randomfile
        RETURN
    END_IF

    #----------------------------------------#
    # loop through entire file, line by line #
    #----------------------------------------#
    :Loop
    # check for End-of-File
    FEOF handle
    IF_TRUE
        # we're done!
        GOTO Were_Done
    END_IF

    # read line -- watch for read failure
    FREAD_LINE handle inbuffer
    IF_FALSE
        # we're done!
        GOTO Were_Done
    END_IF
    ADD line_count 1

    # compute random value and comparison value
    RANDOM rval 1000000
    SET cval 1000000
    DIV cval line_count

    # is this line a winner, replacing previous winner?
    COMPARE rval cval
    IF_LESS
        # we have a winner
        COPY winner inbuffer
    END_IF

    # check remaining lines
    GOTO Loop

    #------------------------------------------------------------------------#
    # we should have a winner by this point, and can do with it what we want #
    #------------------------------------------------------------------------#
    :Were_Done
    # close the file now that we're finished reading it
    FCLOSE handle
    PRINT winner

It's quite clever how it works (no, I didn't think of it first!). It reads
each line and determines whether that line would have been selected _assuming_
there are no more lines in the file. If it _does_ find another line, it
determines whether it would have been chosen over whatever line had _already_
been chosen. Like this:

It reads the first line. Possibly the ONLY line. So this line must be
selected 100% of the time. Hence Line #1 is selected randomly 1/1 of the time.
Then it reads the second (and possibly last) line. If there are only two lines
in the file, then there is a 1/2 chance that line #2 will be selected. Should
a random comparison show that the 1/2 chance is met, line #2 replaces what was
previously selected (IOW line #1). Now it reads line #3. This line is 1/3
likely to be selected. If it is, it replaces the previously selected line
(either #1 or #2). Line #4 replaces the previously selected line 1/4 of the
time. Line #5 replaces the previous winner 1/5 of the time... and so on.
(With three lines, for example, line #1 survives line #2 half the time and
line #3 two-thirds of the time, so it ends up the winner 1/2 x 2/3 = 1/3 of
the time, the same as line #3.)

The only thing tricky is that since we don't have FP values, we have to
generate integer random numbers and scale the count accordingly. (You can't
have values of 1/4 = 0.25 to compare with random numbers between 0 and 1, so
you have to use a scaling factor -- 1,000,000 in this case -- and compare
1000000/4 with a random number between 0 and 1000000.)

Caution -- this routine will select blank lines as well as any other, so watch
out for blank lines PARTICULARLY at the end of the file. Or check for blank
lines immediately after reading and "GOTO Loop" immediately, without adjusting
the counter, if you find one. Skip comments the same way.

You could also select "cookies" which consist of more than one line using a
method similar to this.
You'd have to separate groups of lines somehow (say, with a
blank line), accumulate all non-blank lines as a possible winner, and when you
discover a blank, THEN you increment your counter, make your random check, and
if successful replace the previous winning lines with the new set of lines you
have accumulated.

I've run this algorithm in a loop 250,000 times with a file containing 25
different lines, and each line is selected 10,000 times +/- 1% which is good
enough for government work.
---
 ■ Synchronet ■ Vertrauen ■ Home of Synchronet ■ [vert/cvs/bbs].synchro.net
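
For anyone who wants to try the idea outside of Baja, here is a rough Python
sketch of the same single-pass selection, including the blank-line skip and
the multi-line "cookie" variant described above. It is a translation under
assumptions, not the original routine: the names pick_line, pick_cookie, SCALE,
skip_blank and rng are made up for illustration, and it keeps the integer
scaling trick only to mirror the Baja code (Python has floats, so it doesn't
strictly need it).

```python
import random
from itertools import chain

SCALE = 1_000_000  # same scaling factor as the Baja routine (assumed name)


def pick_line(lines, rng=random, skip_blank=True):
    """Pick one line in a single pass; returns None if nothing qualifies."""
    winner = None
    line_count = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if skip_blank and not line.strip():
            continue  # blank lines are skipped without touching the counter
        line_count += 1
        rval = rng.randrange(SCALE)   # integer in 0 .. SCALE-1
        cval = SCALE // line_count    # integer stand-in for 1/line_count
        if rval < cval:               # true about 1/line_count of the time
            winner = line             # this line replaces the previous winner
    return winner


def pick_cookie(lines, rng=random):
    """Pick one blank-line-separated group of lines; returns a list or None."""
    winner = None
    count = 0
    group = []
    # The extra "" at the end acts as a final blank line so that the last
    # group is considered even if the input doesn't end with one.
    for raw in chain(lines, [""]):
        line = raw.rstrip("\r\n")
        if line.strip():
            group.append(line)        # still accumulating the current cookie
            continue
        if group:                     # a blank line closes a non-empty group
            count += 1
            if rng.randrange(SCALE) < SCALE // count:
                winner = group
            group = []
    return winner
```

Both functions take any iterable of lines, so an open file works directly,
e.g. pick_line(open("oneliner.txt")) with whatever file name you use. Pass
skip_blank=False to get the original behaviour, where blank lines can win.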