Archive for the 'Progs/Tools/Libs' Category

Finding top-N items in a stream

Wednesday, July 4th, 2007

How do you (approximately) generate a top-N items list without counting the occurrences of every distinct item? Two interesting papers I found on the topic: http://citeseer.ist.psu.edu/charikar02finding.html and http://citeseer.ist.psu.edu/jin03dynamically.html. I also found somebody’s seminar PowerPoint presentation explaining it.
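To give a flavor of how this can work with a fixed number of counters, here is a minimal Python sketch of the classic Misra-Gries "frequent items" summary. Note that this is not the algorithm from either paper above, just a simpler technique in the same spirit; the function names and the parameter k are my own choices for illustration.

```python
def misra_gries(stream, k):
    """Summarize a stream using at most k - 1 counters.

    Any item that occurs more than len(stream) / k times is guaranteed
    to survive in the summary; the stored counts are underestimates.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement everything, drop counters at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def top_n(stream, n, k):
    """Return up to n candidate heavy hitters, most frequent first."""
    summary = misra_gries(stream, k)
    return sorted(summary, key=summary.get, reverse=True)[:n]


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "b"]
    print(top_n(stream, n=2, k=4))
```

The memory use depends only on k, not on the number of distinct items. The price is that the counts are approximate, so a second pass (or some slack in k) is needed if exact frequencies matter.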

Adding custom firewall rules in OSX

Tuesday, May 1st, 2007

Having used Linux extensively before, I found the GUI configuration of the OSX firewall somewhat lacking. In particular, I wanted to limit outgoing access to some IP addresses (but I can imagine you may want to play with other things as well).

I found that I could buy Flying Buttress, which should allow me to do this, but I really don’t need a graphical ipfw frontend, especially one I’d have to pay for ;-) All I needed was to write some ipfw rules and make them persistent.

Here’s what I did:

 mkdir /Library/StartupItems/CustomIPFWRules
 cd !$

Created a file called StartupParameters.plist containing:

{
  Description     = "Custom Tadek's IPFW Rules";
  Provides        = ("CustomIPFWRules");
  Uses            = ("Network");
}

Created a file called CustomIPFWRules (the name has to match the directory name) containing a simple shell script:

#!/bin/sh

. /etc/rc.common

case "$1" in
        start)

        ConsoleMessage "applying tadek's ipfw rules"
        ipfw add 2045 deny tcp from any to "ip_I_want_to_block" out
        ;;
esac

exit 0

Voila!

BTW: a useful link on playing with Firewall in OSX.

Recovering deleted photos - my experiences

Thursday, January 4th, 2007
  • (playing with the camera) Cool. You can change the format of the CF card… Whoops…
  • My CF card got corrupted.
  • The photos I thought had already been uploaded to the gallery were deleted from both the CF card and the hard disk. Moreover, I had already taken 50 photos on the CF card after reformatting it.

Time and time again, I realize how precious my photos are only after they are gone. This last time it was my PhD defense photos… Ouch. Fortunately, they are gone, but not forever ;-)

You will find several tools for photo recovery on Google, but most of them are either commercial or give you only a “free preview”. Thinking about it, it’s a very good business model: people are very willing to swipe their card once they can see that they will get their photos back. I should have written one as well ;-) Fortunately, there are also free ones, two of which I tried myself.

PC Inspector Smart Recovery is an excellent, easy-to-use Windows application for photo recovery that does an amazing job. It is also completely free, but I think it is fair to reward the author with a PayPal donation.

PhotoRec is a cross-platform program for photo recovery. In fact, it runs on Dos/Windows/Linux/BSD/Solaris/MacOSX, which is really impressive. It also supports a whole range of filesystems, including FAT/NTFS/Ext2/3/HFS+. It has a simple (n)curses-based interface, which is a bit of a disappointment in the age of cool animated GUIs, but it’s relatively easy to use and has an impressive range of configuration options. And it works on Linux and my Mac. I no longer need Windows! Similarly, the program is really free (including the source) and you can reward the author with a donation.

Both tools worked smoothly for me and (unlike some commercial ones I tried, which ran out of memory) managed to recover almost one thousand photos from my 4GB CF card, including the defense photos I was after ;-) I don’t know which of the tools is better and I recommend both (although I would lean slightly towards PhotoRec as I don’t have Windows anymore). Also, if you don’t get what you’re looking for, you can try both of them (as a rule all image/disk recovery tools are read-only so you can try all of them many times with no risk).

Two friends: GeoWebStats and GeoBroStats - visualizing Apache and Bro logs with Google Maps

Tuesday, January 2nd, 2007

One of my pet (a.k.a. procrastination) projects has been to visualize my server logs using Google Maps. In fact, this has been my ‘procrastination hub’ giving me excuses to work on a variety of pet projects, including:

  • playing with Bro and packaging Bro for Debian
  • playing with Apache logs and importing them into a relational database
  • playing with Bro logs and importing them into a relational database
  • learning Python and Javascript
  • playing with Google Maps
  • writing a web application to visualize the collected logs on Google maps
  • creating a webpage documenting all the above.

As procrastination projects, they are by definition never complete. I do have something working now, and you can see it in action (works best in a decent browser, but should show something in IE as well).

GeoWebStats

Visualizing Apache logs on a webpage. Here are three links (it might take a while to load them for the first time, so please be patient):

The script is quite customizable (for example, you can specify the regular expressions you want to filter on, or group entries), but for security reasons those demo links are locked.

GeoBroStats

Similarly to GeoWebStats, GeoBroStats visualizes raw TCP/UDP connections based on Bro connection summaries (this might also take a while to load):

The script is also quite customizable, but for security reasons those demo links are locked.

Let me know what you think about it. I know that the user interface is very crude and needs some work. I have also almost finished the GeoWebStats website, but knowing me, it will take a while ;-)

Polish keyboard on OSX - a rant

Tuesday, January 2nd, 2007

I recently had to write some Polish text on my MacBook Pro and discovered that the Polish keyboard is messed up. In fact, coming from the PC world, I’ve always thought Mac keyboards are messed up (e.g., the lack of Home/End and PageUp/PageDown, which can be simulated by some weird and application-dependent two- or three-key combinations; an almost completely useless Enter/Rename key; an annoying Eject key which, pressed accidentally, generates an eject sound regardless of whether you have something in your drive or not), but this time I got annoyed.

To give a bit of background: in Poland we use nine additional letters, namely ęóąśłżźćń (and their uppercase counterparts), and historically the typewriter keyboard had them allocated on the right side (where the brackets and quotes are). Now, unless you’re a typist, this is not very useful (especially if you need the braces and quotes more often), so we have two Polish keyboard mappings: a typist’s keyboard and a programmer’s keyboard (with Polish letters generated with Alt+<Latin letter>). As we have two z-derivatives, ż and ź, one of them is Alt+z (the more common ż) and the other is Alt+x (the less common ź).
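For the non-Polish readers, the programmer's mapping described above can be written down as a small lookup table. This is only an illustrative Python sketch: the table reflects the usual PC layout, while the function names and the "~" notation for "hold Alt for the next key" are made up for this example.

```python
# Programmer's layout: Alt + Latin letter -> Polish letter.
PROGRAMMERS_LAYOUT = {
    "a": "ą", "c": "ć", "e": "ę", "l": "ł", "n": "ń",
    "o": "ó", "s": "ś", "z": "ż", "x": "ź",
}


def alt(key):
    """Return the character produced by Alt+<key> (Shift gives uppercase)."""
    polish = PROGRAMMERS_LAYOUT.get(key.lower())
    if polish is None:
        return key  # no Polish letter on this key
    return polish.upper() if key.isupper() else polish


def type_polish(text):
    """Expand '~k' (hypothetical notation for Alt+k) into Polish letters."""
    out = []
    chars = iter(text)
    for ch in chars:
        out.append(alt(next(chars, "")) if ch == "~" else ch)
    return "".join(out)


if __name__ == "__main__":
    print(type_polish("~z~o~lty"))
```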

Playing with my Mac, I discovered that ż and ź are swapped. I am not sure if there’s any rationale for it (apparently it was ok in OS9 and only changed in OSX). Maybe it’s easier to press Alt+x (which then gives the more commonly used character), in particular since on a PC it’s the right Alt, not the left one (in fact, if I were trying to make it more ergonomic, I would remap the right Enter to Alt), but I found it confusing. To get a feeling for what it’s like, imagine Apple replacing the Control key with Enter, or PageUp with an eject button. Whoops… they already did that. Imagine something else then ;-)

Doing a bit of research, I discovered I am not the only one annoyed by it. Somebody made a correct programmer’s keyboard, which can be downloaded from here. There are two versions: one replacing a system file and one installing a local keyboard for a user. I took the latter approach and it works great!

Bro IDS - Debian Package

Tuesday, January 2nd, 2007

I’ve been using bro for quite a while on my server and consider it a great IDS. Actually, I’ve been using it mostly as a network analysis tool (connection summaries, tracking HTTP connections, analyzing headers, etc.) rather than as an IDS itself, but I still think it’s great.

What has been bothering me most is that my cleanly installed server with a proper package manager (I’m running Debian and I am very happy about it, regardless of what some friends of mine say) is running a service installed in my home directory, in a screen session. In fact, as the server’s uptime is on average half a year, it’s not such a big problem, but it really bothered me ;-)

Almost half a year ago, I started Bro’s ‘Debianization’ process as one of my many procrastination projects (a.k.a. pet projects), but I haven’t been very active (maybe now that I have defended my thesis I don’t need to procrastinate so much? :-)). Now, during the Christmas break, I finally managed to (almost) finish it!

The whole job turned out to be more difficult than I’d thought, but it works now. Here’s the proof:


tadekp@plum:~$ apt-cache show bro
Package: bro
Version: 1.1d-1
Priority: optional
Section: net
Maintainer: Tadeusz Pietraszek <tadek@pietraszek.org>
Depends: libc6 (>= 2.3.2.ds1-21), libgcc1 (>= 1:3.4.1-3), libncurses5 (>= 5.4-1), libpcap0.7, libssl0.9.7, libstdc++5 (>= 1:3.3.4-1), c-shell
Architecture: i386
Filename: ./bro_1.1d-1_i386.deb
Size: 3061038
Installed-Size: 8916
MD5sum: 880901a64a7fc44766e4645f445799a6
Description: Network Intrusion Detection System (NIDS)
 Bro is an open-source, Unix-based Network Intrusion Detection System (NIDS)
 that passively monitors network traffic and looks for suspicious traffic.
 .
 Bro detects intrusions by comparing network traffic against a customizable
 set of rules describing events that are deemed troublesome. These rules
 might describe specific attacks (including those defined by signatures)
 or unusual activities (e.g., certain hosts connecting to certain services
 or patterns of failed connection attempts).
 .
 Bro uses a specialized policy language that allows a site to tailor Bro's
 operation, both as site policies evolve and as new attacks are discovered.
 If Bro detects something of interest, it can be instructed to either generate
 a log entry, alert the operator in real-time, execute an operating system
 command (e.g., to terminate a connection or block a malicious host
 on-the-fly). In addition, Bro's detailed log files can be particularly
 useful for forensics.

tadekp@plum:~$


tadekp@plum:~$ /etc/init.d/bro status
Bro is running (pid: 2859)
Autorestart: ON
Running since: Mon Jan  1 16:11:37 CET 2007
Bro Version: 1.1d
Active log suffix: plum.07-01-01_16.11.33
tadekp@plum:~$ 

The package is in alpha stage now and I still get a few lintian errors (for example, the man page is missing), but otherwise it is ok (even including the init.d scripts and checkpointing). If you’re interested in trying it out, please let me know.

Gallery2 plugin - displaying googlemaps with GPS coordinates from EXIF

Thursday, August 31st, 2006

After resuming work on my geotagging script (see this post), I decided to do something useful with it. We’re using gallery2 to store our photos, together with a googlemap plugin, but found the plugin useful only for displaying a single pointer per album (see here). For a more fine-grained selection we needed something else.

Therefore, I decided to write my own plugin (yeah, there are already two out there, why not write my own? ;-)) and also learn the Gallery2 API. The idea is to display a google map at the bottom of each photo, showing exactly where the photo was taken. Yes, it is photo-specific and there’s only one pointer on the map. I nonetheless find it very useful.

Here’s a sample output (you can also admire the beautiful scenery of Melchsee ;-)). The plugin adds a new “block” in the template (therefore can be configured using a standard block management tool in Gallery2).

The position of the current photo is always in the middle (although you can move the map around, change the map type, zoom in and out etc.). The changes you make are stored as session cookies, and preserved between consecutive photo loads. Also, the whole panel can be hidden to speed up load and only shown on demand (show map|hide map).

Any comments? suggestions? ideas?

The plugin is currently in alpha stage; I will release it in a week or two (I want to create a webpage for it as well). In the meantime, if you’re interested in trying it out, drop me a line ;-)

BTW: I also found out that when iPhoto edits a photo, it converts the Exif data from Intel to Motorola byte order (little endian -> big endian). There was a bug in exifer, used in gallery, which corrupted the tags. The patch is only two lines long and can be found here (I also emailed the author):

--- gps.inc.orig        2006-08-31 10:25:27.000000000 +0200
+++ gps.inc     2006-08-31 10:36:37.000000000 +0200
@@ -116,13 +116,24 @@
                        $minutes = GPSRational(substr($data,16,16),$intel);
                        $hour = GPSRational(substr($data,32,16),$intel);

+                       /* now we need a hack, since the whole data has been flipped in :103
+                        * the order here is sec:min:hour. However, in the motorola mode the data
+                        * has not been flipped and the order is h:m:s. This breaks compatibility
+                        * with Motorola exif. (Tadek) */
+                       if($intel==1) $data = $hour+$minutes/60+$seconds/3600;
+                       else
+                               $data = $seconds+$minutes/60+$hour/3600;
                } else if($tag=="0007") { //Time
                        $seconds = GPSRational(substr($data,0,16),$intel);
                        $minutes = GPSRational(substr($data,16,16),$intel);
                        $hour = GPSRational(substr($data,32,16),$intel);

+                       /* I guess the same HACK as above. Tadek */

+                       if ($intel==1) $data = $hour.":".$minutes.":".$seconds;
+                       else
+                               $data = $seconds.":".$minutes.":".$hour;
                } else {
                        if($bottom!=0) $data=$top/$bottom;
                        else if($top==0) $data = 0;
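To show what is going on underneath, here is a small Python sketch of decoding an Exif GPS coordinate in both byte orders. It is not exifer's code and makes one simplifying assumption: the input is the raw 24 bytes of the tag value (three RATIONALs, each a pair of unsigned 32-bit integers, in the order degrees, minutes, seconds). The function names are mine.

```python
import struct


def read_rationals(raw, little_endian):
    """Decode three Exif RATIONALs (numerator/denominator uint32 pairs)."""
    fmt = ("<" if little_endian else ">") + "6I"
    numbers = struct.unpack(fmt, raw[:24])
    # Guard against a zero denominator, which some cameras write.
    return [
        numbers[i] / numbers[i + 1] if numbers[i + 1] else 0.0
        for i in range(0, 6, 2)
    ]


def to_decimal_degrees(raw, little_endian):
    """Convert degrees/minutes/seconds to a single decimal value."""
    degrees, minutes, seconds = read_rationals(raw, little_endian)
    return degrees + minutes / 60 + seconds / 3600


if __name__ == "__main__":
    values = (46, 1, 46, 1, 30, 1)             # 46 deg 46' 30"
    intel = struct.pack("<6I", *values)        # little endian ("II")
    motorola = struct.pack(">6I", *values)     # big endian ("MM")
    print(to_decimal_degrees(intel, little_endian=True))
    print(to_decimal_degrees(motorola, little_endian=False))
```

The point is that the byte order should only affect how each integer is read, never the order of the three fields, which is what the patched code has to compensate for.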

GeoTagging in EXIF (resumed)

Thursday, August 31st, 2006

Almost a year after I had done some initial experiments with geotagging my images, I decided to pursue this idea further. I also realized that the approach “It’s simple, let’s write it in Perl” is good, but “Let’s see if somebody hasn’t done it yet” is even better ;-)

First, I recall I had some problems with reading GPS data and hacked gpstransfer to do this. In the meantime, I found out that pygarmin is a nice, working interface to Garmin GPSs. More recently I discovered a much better and more versatile tool: gpsbabel. Like the tower of Babel, it really does speak all the languages (including a plurality of GPS and very exotic formats; see here). The user interface is a bit weird and not all types of information (e.g. waypoints, tracks) are supported in all formats. Sadly, if the format is not supported, the program still transfers all the data (which takes a while) and then prints nothing, just to confuse you ;-)

After a bit of experimenting I discovered a magic formula:

gpsbabel -t -i garmin -f /dev/ttyS0 -o psitrex -F <track_file>

“psitrex” stands for the KuDaTa PsiTrex format; as exotic as it sounds, it is just a comma-separated format. I tried a couple of others (actually all of them in my version, 1.2.7) and the only ones I found useful were:

  • psitrex -> an easy to parse CSV format
  • gpx -> produces results in XML
  • nmea -> looks ok, but does not show the start and end of each segment, which is useful for determining whether to interpolate data or not.

I also revisited my geotagging script. First, as Diego pointed out, there’s a nice tool on the Mac to do this: iPhototoGoogleEarth, which unfortunately doesn’t work on Intel Macs yet. For geotagging it uses GPSPhotoLinker, which in turn uses gpsbabel. So in the end we use the same backend tool.

I made my geotagging script a bit more useful by adding a bunch of options, debugging it, etc. Now that I actually use it on my photos, I had to make sure that it really works ;-)

Here it is:

$ ./geotag.pl 
ERROR: Need track file to proceed - use -t <trackfile>
Usage:
  ./geotag.pl -t <track_file> [-b] [-f] [-s <seconds>] [-a <seconds>] [-n <seconds>] [-z <tz>] files_to_process...


  Where:
  -b -> keep backup,
  -f -> force, overwrites an existing EXIF tag (or removes it)
  -s <seconds> -> time shift (in case of time discrepancies)
  -a <seconds> -> approximate non-continuous segements in the tracklog (default 300s)
  -n <seconds> -> don't approximate but take the closest segment (default 10800s)
  -z <tz> -> use the follwoing timezone for your camera (default current timezone)

  <track_file> can be best created using gpsbabel (http://www.gpsbabel.org):
   (e.g. Serial Garmin GPS: gpsbabel -t -i garmin -f /dev/ttyS0 -o psitrex -F <track file>)    
   at ./geotag.pl line 288.
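The core idea behind the -a and -n options can be sketched in a few lines of Python. This is not the Perl script itself, only my reading of the option descriptions above: interpolate linearly between the two track points surrounding the photo's timestamp when they are at most max_gap seconds apart, otherwise fall back to the nearest point if it is within max_nearest seconds. The function name, the parameter names and the (timestamp, lat, lon) tuple layout are assumptions made for this example.

```python
from bisect import bisect_left


def locate(track, t, max_gap=300, max_nearest=10800):
    """Estimate (lat, lon) at time t from a track sorted by timestamp.

    track is a list of (timestamp, lat, lon) tuples, timestamps in seconds.
    Returns None when no track point is close enough.
    """
    times = [point[0] for point in track]
    i = bisect_left(times, t)
    before = track[i - 1] if i > 0 else None
    after = track[i] if i < len(track) else None

    # Both neighbours exist and belong to one continuous segment: interpolate.
    if before and after and after[0] - before[0] <= max_gap:
        span = after[0] - before[0]
        f = (t - before[0]) / span if span else 0.0
        return (before[1] + f * (after[1] - before[1]),
                before[2] + f * (after[2] - before[2]))

    # Otherwise take the closest point, if it is not too far away in time.
    candidates = [p for p in (before, after) if p is not None]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda p: abs(p[0] - t))
    if abs(nearest[0] - t) <= max_nearest:
        return nearest[1], nearest[2]
    return None


if __name__ == "__main__":
    track = [(0, 46.0, 8.0), (100, 47.0, 9.0), (1000, 48.0, 10.0)]
    print(locate(track, 50))     # inside a continuous segment
    print(locate(track, 900))    # inside a gap: nearest point is used
```

A time shift (as with -s) or a timezone correction (as with -z) would simply be applied to t before the lookup.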

Now what do I do with the script? Well… I wrote a gallery2 plugin to display google maps. See this post.

Pattern-based file renaming

Monday, April 3rd, 2006

Ever wanted to do bulk operations on files, similar to xargs, but much more flexible? For example:

  • rename all .jpeg files to .jpg?
  • remove a prefix from many file names?
  • add a suffix/extension?
  • remove a prefix/suffix/extension?

Here’s a script I wrote:

#!/bin/bash
#
# Pattern-based file rename
# (c)2006 by Tadeusz Pietraszek
#
# Usage:
#   ./mv-pattern -i a *.txt             <- delete all 'a's in file names
#   ./mv-pattern -i jpeg -o jpg *.jpeg  <- rename all jpegs to jpg
#   ./mv-pattern -o .txt                <- add an extension
#   ./mv-pattern -i .txt                <- remove an extension
#
# The default command is "echo" (a dry run); pass -c mv to actually rename.

if [ $# -eq 0 ]
then
  echo "Usage: `basename $0` [-i <in_pattern>] [-o <out_pattern>] [-c <command>] files"
  exit -1
fi

INPATTERN=""
OUTPATTERN=""
COMMAND="echo"

while getopts "i:o:c:" Option
do
  case $Option in
    i ) INPATTERN=$OPTARG;;
    o ) OUTPATTERN=$OPTARG;;
    c ) COMMAND=$OPTARG;;
    * ) echo "Unimplemented option chosen. Has to be one of -i -o -c"; exit -1;;
  esac
done

if [ -z "$INPATTERN" ]; then
  echo "No input pattern. Are you sure it's what you want?"
  exit -1
fi

# Decrements the argument pointer so it points to next argument.
shift $(($OPTIND - 1))

echo "in: $INPATTERN, out: $OUTPATTERN"

# rename (file names are quoted so that names with spaces survive)
for FILE in "$@"; do
  if [ -f "$FILE" ]; then
    NEWFILE=`echo "$FILE" | sed -re "s/(.*)$INPATTERN(.*)/\1$OUTPATTERN\2/"`
    if [ "$FILE" != "$NEWFILE" ]; then
      $COMMAND "$FILE" "$NEWFILE"
    fi
  fi
done

exit 0