Pick-set
Pick-set is a program that finds random subsets of a given list of
files. Each subset is constrained by a given target size and an
allowable percentage underrun.
The current version is 0.3. See the Changelog for what has changed
since the last revision.
Pick-set was created because I own a Rio 500 MP3 player and I wanted a
random selection of music from my MP3 collection that didn't waste too
much of the Rio's precious (expensive!) memory.
There are no doubt other uses, but this is what it was created for.
Usage Notes
I have several thousand MP3 files (I do not support piracy, as I have all
the originals, so don't ask, you can't have any of them!). These are
some notes I have made while using pick-set.
- I provide a script to upload to my Rio. There is no reason this
couldn't be extended for other tools and players (with minimal shell
script knowledge).
- I use Linux 2.2.18, which has the newer USB support and the rio500
kernel module built in, and the latest (0.7) version of the mp3 tools
(see http://rio500.sourceforge.net/).
- Previously the Linux 2.2.14pre20+usb back-port and the (then current)
CVS version of the mp3 tools were used. The fillrio script still has
support for these older tools.
- In operation, thousands of files take fractions of a second to stat
and add to the unselected list at startup, so speed should not be an
issue (if this is wrong for you, please mail me details).
- The exclusion list only removes the first occurrence of a file, so
if you have a file 10 times in the input set, only the first reference
is removed and the set used for selection will still have 9 references
to the file.
- Setting the fidget value higher generally gets a better result
more often, but takes longer.
- When using weightings, if you have, say, 5000 files, a file stands a
one in five thousand chance of being picked. If the file has a 50%
weighting you lengthen this to one in ten thousand, and those are
pretty slim odds. My point? Be conservative with weightings unless you
really hate a track (why keep it then?), as with lots of tracks they
don't come round often anyway.
- When selecting a set for my Rio 500, that's 63MB of data, and one
percent of that is a lot of space. Hence you need to be careful when
specifying your acceptable limit (I usually use 99.9%). This is why the
range was extended to tenths of a percent in the first place.
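
Two of the notes above, the first-occurrence exclusion rule and the weighting odds, can be checked with a few lines of Python. This is only an illustrative sketch and not pick-set's own code: the function names `exclude_first` and `pick_odds` are made up for this example, and the odds calculation simply assumes that each draw picks uniformly from the list and then accepts the file with a probability equal to its weighting.

```python
from fractions import Fraction

def exclude_first(entries, excluded):
    """Remove only the first occurrence of each excluded name."""
    remaining = list(entries)
    for name in excluded:
        if name in remaining:
            remaining.remove(name)  # list.remove drops the first match only
    return remaining

def pick_odds(file_count, weight_percent):
    """Chance that a single draw picks one given file and accepts it."""
    return Fraction(1, file_count) * Fraction(weight_percent, 100)

# A file listed 10 times and excluded once still has 9 references left.
entries = ["song.mp3"] * 10 + ["other.mp3"]
print(exclude_first(entries, ["song.mp3"]).count("song.mp3"))  # prints 9

# 5000 files: full weighting versus a 50% weighting.
print(pick_odds(5000, 100), pick_odds(5000, 50))  # prints 1/5000 1/10000
```

If duplicates in the input list are unwanted, removing them before running pick-set avoids the leftover references.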
Basic Setup
- Unpack the source tar.
- Type make. This is currently known to work on Linux and Solaris 2.6 (with gcc).
- For Rio500 operation:
  - Install pick-set in a bin directory if you want.
  - Edit the fillrio script as required.
  - Run fillrio.
Basic Operation
command usage
pick-set -s <size> [options] listfile...
Option Switches and their explanations
Switch | Name | Default | Description |
v | Version | | Prints the version information |
h | Help | | Displays brief command usage help |
i | Input from stdin | Off | Reads the list input from stdin |
w | Enable weighting | Off | Turns on weighting percentage checking |
e <filename> | Exclude list | | File with list of entries to exclude |
s <size> | Target size | | Output target size |
p <10ths percent> | Target percentage | 99.0% | Percentage of the target size that is acceptable, specified in tenths of a percent to allow for greater flexibility with large file sizes |
f <number> | Fidget max | 100 | Maximum number of times the selection loops are repeated. See theory of operation for a more detailed explanation of this option |
d <level> | Debug | Off - 0 | Turns debug output on |
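
To get a feel for the `p` switch, the arithmetic below shows how a value in tenths of a percent translates into a minimum acceptable size. It is a sketch under stated assumptions, not pick-set's own code: the helper name `minimum_acceptable` is hypothetical, the result is rounded down to a whole byte, and 1MB is taken as 1,048,576 bytes. The 63MB target is the Rio 500 figure from the usage notes.

```python
def minimum_acceptable(target_bytes, tenths_of_percent):
    """Smallest total size that still counts as close enough to the target."""
    # Assumption: integer arithmetic, rounded down.
    return target_bytes * tenths_of_percent // 1000

MB = 1024 * 1024   # assumption: 1MB = 1,048,576 bytes
target = 63 * MB   # Rio 500 figure from the usage notes

for tenths in (990, 999):
    slack = target - minimum_acceptable(target, tenths)
    print(f"{tenths / 10:.1f}% leaves up to {slack} bytes unused")
# prints:
# 99.0% leaves up to 660603 bytes unused
# 99.9% leaves up to 66061 bytes unused
```

Under these assumptions the default of 99.0% can leave roughly 645KB empty on a 63MB target, while 99.9% cuts that to roughly 65KB.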
Known Problems & Omissions
- Acceptable limit setting needs looking at (either 1000/percent or a size
setting)
- Weighting system should be improved (allow for wildcard names to set
weights - say for whole albums or artists).
- Need to develop a program/script to generate weightings based
on number of tracks by artist name.
- Build environment is primitive (maybe autoconf?)
Theory of operation
- The aim is to produce a random selection of files constrained by a
given total size and an allowable percentage underrun. Note this is not
a best-fit algorithm! It is intended to pick a non-optimal set of
results; otherwise the output set would be the same every time for a
given input set!
- There are two lists, the "selected" list and the "unselected" list.
These are the basic data structures the program uses.
- At program start, input is taken from the specified list file(s).
Each line of the list file is read and the file it names is examined:
if readable, it is placed in the unselected list; otherwise it is
discarded (with an error message indicating why the file was
unsuitable). If the weighting command line switch is set, a weighting
for each file may be supplied as a percentage value at the end of each
line (default is 100%).
- If required, an "exclusion" list of entries is then scanned. This is
primarily so that a given set can be removed from the unselected list
before the selection process begins, which allows previously selected
sets to be removed from consideration on a subsequent program run.
- A loop then makes pseudo-random selections from the unselected
list and moves them to the selected list until the total size of the
selected files exceeds the specified target size. Weightings are taken
into consideration when a selection is made by performing a second
random test that generates a number between 1 and 100: if this number
lies below the specified weighting, the item is added to the selected
list; otherwise it is omitted.
- A second loop then removes pseudo-random entries from the selected
list until the total selected file size falls below the target size.
- These two loops are repeated until the total selected size after
the removal stage is within a given percentage of the target size, or
until they have been repeated a given number of times (the fidget
value).
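
The steps above can be sketched in Python roughly as follows. This is an illustration of the described approach, not a transcription of pick-set's source. The function name `pick_set`, its parameters and the `(name, size, weight)` tuple layout are all hypothetical. The sketch also makes several assumptions the text does not settle: entries removed in the second loop go back to the unselected list, an entry that fails the weighting test stays available for later draws, a random number equal to the weighting counts as a pass (so 100% always passes), and weightings of zero are rejected so the first loop cannot spin forever.

```python
import random

def pick_set(files, target, tenths=990, fidget=100, rng=None):
    """Two-loop random selection (illustrative sketch).

    files  -- list of (name, size_in_bytes, weight_percent) tuples
    target -- target total size in bytes
    tenths -- acceptable share of the target, in tenths of a percent
    fidget -- maximum number of add/remove rounds
    Returns (selected_entries, total_size).
    """
    rng = rng or random.Random()
    if any(not 1 <= weight <= 100 for _, _, weight in files):
        raise ValueError("weights must be between 1 and 100")
    minimum = target * tenths // 1000   # assumption: rounded down
    unselected = list(files)
    selected = []
    total = 0
    for _ in range(fidget):
        # Loop 1: move random entries across until the target is exceeded.
        while unselected and total <= target:
            i = rng.randrange(len(unselected))
            # Second random test: the weighting check (assumed inclusive).
            if rng.randint(1, 100) <= unselected[i][2]:
                entry = unselected.pop(i)
                selected.append(entry)
                total += entry[1]
        # Loop 2: remove random entries until the total fits the target.
        while selected and total > target:
            entry = selected.pop(rng.randrange(len(selected)))
            unselected.append(entry)    # assumption: returned to the pool
            total -= entry[1]
        if total >= minimum:            # within the acceptable underrun
            break
    return selected, total

# Made-up library: 200 tracks of 2 to 6 million bytes, all fully weighted.
rng = random.Random(1)
library = [(f"track{n:02d}.mp3", rng.randint(2_000_000, 6_000_000), 100)
           for n in range(200)]
chosen, size = pick_set(library, 63 * 1024 * 1024, tenths=999, rng=rng)
print(len(chosen), size)
```

Because every round starts from wherever the previous round left off, a higher fidget value simply gives the random add and remove steps more chances to land inside the acceptable range, which matches the usage note that higher values get a better result more often but take longer.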
Licensing
See the licence file for more details.
If you have any problems with this web page, mail Vincent Sanders.
© Vincent Sanders
$Id: index.html,v 1.1 2002/12/14 23:53:10 vince Exp $