 2012/05/17
|
Last update 2009/02/17

 The Labs - Design & Functionality For The Net
- Summary
- Introduction
- Implementation
- Download
- ChangeLog
- Usage
- Remote Access, SQL vs Proprietary Protocol
- TODO
- Further Ideas
- Limitations
- Further Readings
| Universal Annotation System (UAS) 1. Summary
|
This package solves the problem of attaching metadata to files, in particular
where such functionality isn't available in the existing setup, either at the filesystem level or system-wide (e.g. in distributed heterogeneous systems).
The metadata is stored separately, in my proof-of-concept implementation in a SQLite db, and tied to the file via a hash code (SHA2).
In addition to the hash code, the URI (Uniform Resource Identifier) is also used to refer to the metadata: if a file's content has changed and a new hash code has been calculated, but the filename and thereby the URI stay the same, the metadata is inherited.
The metadata consists of field and data pairs, such as
- comment = 'this is my comment', or
- rate = '3 (5)', or
- tags = 'MacOS-X, FreeBSD, Filesystem'
The metadata is freely definable.
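As a rough illustration of the idea (not the actual Perl code of the package), the following Python sketch derives an identifier from file content and attaches free-form field/data pairs to it. It assumes SHA512 with an unpadded base64 digest, as mentioned in the ChangeLog for 0.010; the names compute_uid, annotate and annotations are hypothetical.

```python
import base64
import hashlib

def compute_uid(content):
    """Return the SHA512 digest of the content as unpadded base64 text."""
    digest = hashlib.sha512(content).digest()
    # Assumption: the '=' padding is stripped, which yields 86 characters.
    return base64.b64encode(digest).decode("ascii").rstrip("=")

# Hypothetical in-memory store: uid -> {field: data}
annotations = {}

def annotate(content, field, data):
    """Attach one field/data pair to the content's uid and return the uid."""
    uid = compute_uid(content)
    annotations.setdefault(uid, {})[field] = data
    return uid

uid = annotate(b"this is a test\n", "comment", "this is my comment")
annotate(b"this is a test\n", "tags", "MacOS-X, FreeBSD, Filesystem")
print(uid, annotations[uid])
```

Because the key is computed from the content alone, the same annotations are found wherever a byte-identical copy of the file turns up.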
| Universal Annotation System (UAS) 2. Introduction
|
I came up with the idea when I wanted to annotate web pages as well as local files with metadata, be it:
- commenting
- rating
- tagging
- preview image
- linking
- and so forth
and to use one system for all of this. Most systems tie such data to the URI (Uniform Resource Identifier); for example http://the-labs.com/ is one, a simple link.
Also, file://usr/home/user/Documents... is another URI, a local one that is valid only on my local machine.
When I discovered Dolphin in KDE-4.1, with its ability to comment, rate and tag, I was excited.
Then I realized it only preserves the metadata as long as I do everything with Dolphin itself:
if I move the data somewhere else or rename the file by any other means, the metadata is lost.
That isn't universal enough for me, because I like to move files around and have files recognized based
on content, not on the URI.
Since I had been pondering this problem for quite some time, I decided to write a small package which
implements a "Universal Annotation System" (UAS) as a proof-of-concept:
- identify files with a hash (SHA2), which makes them trackable and identifiable independent of media and location; this becomes the UID (Unique Identifier)
- optional: the local UAS client tracks a file based on UID and URI (this behaviour is now the default)
  - when the content changes but the URI remains the same, the UID is updated and the metadata is carried over (reattached or inherited)
  - when a file is renamed (new URI) but the UID remains the same, the URI is updated
- ability to store metadata on internet-based UAS-server
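The tracking rules listed above can be sketched as follows. This is a simplified Python illustration, not the logic of the Perl programs themselves: it assumes one record per file and ignores the case of several URIs sharing one UID. The function name reconcile and the record layout are hypothetical.

```python
def reconcile(records, uri, uid):
    """Find or create the record for a file seen at `uri` with hash `uid`.

    `records` is a list of dicts with the keys 'uid', 'uri' and 'meta'.
    """
    by_uri = next((r for r in records if r["uri"] == uri), None)
    by_uid = next((r for r in records if r["uid"] == uid), None)

    if by_uri is not None:
        if by_uri["uid"] != uid:
            # Same URI, new content: update the UID, metadata is carried over.
            by_uri["uid"] = uid
        return by_uri
    if by_uid is not None:
        # Same content, new URI (renamed or moved): update the URI.
        by_uid["uri"] = uri
        return by_uid

    # Unknown URI and unknown content: start a fresh record.
    record = {"uid": uid, "uri": uri, "meta": {}}
    records.append(record)
    return record

records = []
rec = reconcile(records, "ghost:/tmp/test.txt", "uid-1")
rec["meta"]["comment"] = "Hello"
reconcile(records, "ghost:/tmp/test.txt", "uid-2")   # content edited
reconcile(records, "ghost:/tmp/test4.txt", "uid-2")  # file renamed
print(records)
```

If both the URI and the content change between two calls, neither lookup matches and a fresh record is created, so the old metadata stays behind on the stale record.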
| Universal Annotation System (UAS) 3. Implementation
|
In order to implement the idea, two or three SQL tables are used:
- annotation: uid, uri, hash, size, filename, ctime, mtime
  - uid: Universal ID, for now the same as the SHA2 hash
  - uri: full path of the file
  - hash: SHA2 hash (base64 digest)
  - size: filesize in bytes
  - filename: basename of file (without path)
  - ctime: creation time
  - mtime: modification time
- meta: uid, field, data, mtime
  - uid: Universal ID
  - field: field name (e.g. 'comment')
  - data: field data (e.g. 'this is a test')
  - mtime: modification time
- hashcache: uri, hash, mtime
I use Perl to implement this proof-of-concept, with DBD::SQLite as the main storage facility and SHA2 via Digest::SHA as the hash.
For the sake of simplicity the UID is the same as the SHA2 hash.
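The table layout described above might look roughly like this. The sketch uses Python's sqlite3 module instead of Perl with DBD::SQLite; the column types and the helper names open_db, set_field and get_meta are assumptions and not taken from the package. Deleting a field when its data is empty follows the ChangeLog entry for 0.007.

```python
import sqlite3

# Column names follow the table description; the types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotation (
    uid TEXT, uri TEXT, hash TEXT, size INTEGER,
    filename TEXT, ctime REAL, mtime REAL
);
CREATE TABLE IF NOT EXISTS meta (
    uid TEXT, field TEXT, data TEXT, mtime REAL
);
CREATE TABLE IF NOT EXISTS hashcache (
    uri TEXT, hash TEXT, mtime REAL
);
"""

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def set_field(db, uid, field, data, now):
    """Store one field for a uid; an empty string deletes the field."""
    db.execute("DELETE FROM meta WHERE uid = ? AND field = ?", (uid, field))
    if data != "":
        db.execute(
            "INSERT INTO meta (uid, field, data, mtime) VALUES (?, ?, ?, ?)",
            (uid, field, data, now),
        )
    db.commit()

def get_meta(db, uid):
    """Return all fields of a uid as a dict."""
    rows = db.execute("SELECT field, data FROM meta WHERE uid = ?", (uid,))
    return dict(rows.fetchall())

db = open_db()
set_field(db, "some-uid", "comment", "Hello", 0.0)
print(get_meta(db, "some-uid"))
```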
The package consists of the following programs and files:
- uas: command-line interface (single host, or distributed as a client of uasd)
- uasc / uasd: client/server interface (distributed content); uasc is more or less obsolete, as uas supports this functionality as well (since 0.013)
- uasrc: resource file which resides in ~/.UAS/ and is used by uas, uasc and uasd
- uasfm (planned): graphical filemanager to enter annotations
- uasfs (planned): FUSE-based filesystem which makes annotations available as files
SQL Tables, and Application Interaction (Illustration)
| Universal Annotation System (UAS) 4. Download
|
Latest version:
UAS-0.014.tar.gz (95.3 KB), GPLv2 licensed
Requirements

| |
sqlite3
perl-5.x
DBD::SQLite
Digest::SHA
Time::HiRes
File::ExtAttr (only for ./xattr for now)
Install perl and sqlite3 with your OS-dependent package system, and
install the Perl modules via
perl -MCPAN -e 'install DBD::SQLite'
perl -MCPAN -e 'install Digest::SHA'
perl -MCPAN -e 'install Time::HiRes'
perl -MCPAN -e 'install File::ExtAttr'
|
| Universal Annotation System (UAS) 5. ChangeLog
|
2009/06/06: 0.014
- code clean up for 'uas', 'uasc' and 'uasd'
2009/02/15: 0.013
- 'uas' and 'uasc' support -uri <uri>
- 'uas' supports '-s <server>'; unlike 'uasc' it sends direct SQL queries to 'uasd' in this case (more verbose),
which makes 'uasc' obsolete, but for now it's still included in the distribution
- more statistics for 'uas' and 'uasc' to compare traffic
2009/02/04: 0.012
- small bug fixes in 'uas', 'uasc' and 'uasd', for better support of http based URIs (see USAGE)
- more detailed -v output for 'uasc/uasd'
2009/02/03: 0.011
- 'uas' and 'uasc' support -w (watching) switch, checks every 30 secs if files
with annotations are changed (in content) so to reattach annotations
(only works with <1000 files really)
- documented uasrc included
- NEW 'xattr' util to use like 'uas', uses setxattr and getxattr (requires File::ExtAttr)
- FreeBSD-6.0 or later: works out of the box
- Linux/ext3: mount -o user_xattr
- option '-l' list other namespaces beside 'user'
- will use set/getxattr as fallback on next version of UAS
- NEW 'uasfm', perl/glade/gtk2 filemanager: not yet functional
2009/02/01: 0.010
- using SHA2: SHA512 with base64 digest (dropped outdated SHA1 with hex digest)
(please delete existing ~/.UAS/uas.db and ~/.uas-hashcache.db)
- 'uas' and 'uasd' fixed to allow filenames have quotes (') as well
- prettier -dump for 'uas' and 'uasd/uasc'
2009/01/31: 0.009
- 'uasc' also supports multiple URI with same UID now
- 'uas' and 'uasc' now have -stat and -dump option
2009/01/31: 0.008
- 'uas' tracks files with same content but multiple URIs:
'uasc/uasd' doesn't support this yet (may do so later)
- multiple entries of same UID/Hash, but different URIs, but same META
- when one changes content (new UID/Hash) META is inherited
- more details for 'uas' and 'uasc' with '-l' switch
- uri has now the hostname included
- NEW location for UAS files: ~/.UAS/ uas.db and uasrc
- NEW SQL layout for uas.db (please delete existing ~/.uas.db or uas.db)
- uasrc support:
uas: override host: <host>
uasc: override
server: <server>
port: <port>
uasd: override port: <port>
- very rudimentary 'uasfs', where 'test.txt.meta' is hidden meta data
of 'test.txt', very experimental
2009/01/28: 0.007
- default is '-followuri', and use '-nofollowuri' to disable this;
consequence: 'uas' and 'uasc' by default reattach metadata to files whose
content has changed but whose URI has not; for files whose URI/filename has
changed but whose UID (hash) has not, the URI is updated - let's see if
this is suitable
- 'uas' and 'uasd' delete meta field if field-data is ''
2009/01/28: 0.006
- 'uasc': has now local hash-cache (per user) to speed up lookup
- 'uasc': supports '-followuri' in order to reattach meta to changed file
2009/01/28: 0.005
- 'uas' has '-followuri' switch, which follow URI and if necessary
reattaches the metadata to new file (file content changed, new hash)
- 'uasc' and 'uasd' don't have this functionality yet
2009/01/27: 0.004
- 'uasc': UAS client, connect to uasd
- 'uasd': UAS server via port 10101
2009/01/27: 0.003
- more useable, fixed a few bugs (like finding uas.db correctly)
2009/01/27: 0.002
- 'uas' command-line interface behaves a bit like 'ls'
2009/01/27: 0.001
- first crude implementation
| Universal Annotation System (UAS) 6. Usage
|
% more test.txt
this is a test
% ./uas
test.txt
% ./uas -l
-rw-r--r-- 1001 0 15 2009/01/27 12:49:12 test.txt
% ./uas -comment "Hello" test.txt
% ./uas test.txt
test.txt
comment: "Hello"
% ./uas -l test.txt
-rw-r--r-- 1001 0 15 2009/01/27 12:49:12 test.txt
uid: fVdo1Htrwn3E+n6XMs+i3lBsomKidJyxCJI+Xd3/3oQrv+5suNaS+0OsoPEpRsUhzOJjOIeRTKH5aJhHjRCtPw
uri: ghost:/mnt/ata1/Home/kiwi/Projects/UniversalAnnotationSystem/test.txt
filename: test.txt
size: 15
ctime: 2009/01/27 19:13:39
mtime: 2009/01/27 19:13:39
comment: "Hello" (2009/01/27 19:13:39)
% ./uas -help
USAGE: uas [-l] [-help] [-v] [-dump] [-<field> <content>]
[--<field>=<content>] [file1] .. [fileN]
% ./uasd &
% ./uas -s localhost test.txt
test.txt
comment: "Hello"
% ./uas -s localhost -l test.txt
-rw-r--r-- 1001 0 15 2009/01/27 12:49:12 test.txt
uid: fVdo1Htrwn3E+n6XMs+i3lBsomKidJyxCJI+Xd3/3oQrv+5suNaS+0OsoPEpRsUhzOJjOIeRTKH5aJhHjRCtPw
uri: ghost:/mnt/ata1/Home/kiwi/Projects/UniversalAnnotationSystem/test.txt
filename: test.txt
size: 15
ctime: 2009/01/27 19:13:39
mtime: 2009/01/27 19:13:39
comment: "Hello"
% scp test.txt spirit:
% ssh spirit
% echo "you require to install UAS also on the remote machine (spirit)"
% ./uas -s ghost test.txt
test.txt
comment: "Hello"
% cat > ~/.UAS/uasrc
server: ghost
^C
% ./uas -v test.txt
connecting ghost:10101
test.txt
CLIENT: SIZE 15
hash of test.txt
CLIENT: META fVdo1Htrwn3E+n6XMs+i3lBsomKidJyxCJI+Xd3/3oQrv+5suNaS+0OsoPEpRsUhzOJjOIeRTKH5aJhHjRCtPw
comment: "Hello"
% ./uas -l test.txt
uid: fVdo1Htrwn3E+n6XMs+i3lBsomKidJyxCJI+Xd3/3oQrv+5suNaS+0OsoPEpRsUhzOJjOIeRTKH5aJhHjRCtPw
uri: spirit:/usr/home/kiwi/test.txt
filename: test.txt
size: 15
ctime: 2009/01/27 19:13:39
mtime: 2009/01/27 19:13:39
comment: "Hello"
% exit
% echo "back on the local machine (ghost)"
% echo "note: -s <server> or ~/.UAS/uasrc with 'server: <server>' line makes uas use that server"
% ./uas -rate "3 (5)" test.txt
% ./uas -l test.txt
-rw-r--r-- 1001 0 15 2009/01/27 12:49:12 test.txt
uid: fVdo1Htrwn3E+n6XMs+i3lBsomKidJyxCJI+Xd3/3oQrv+5suNaS+0OsoPEpRsUhzOJjOIeRTKH5aJhHjRCtPw
uri: ghost:/mnt/ata1/Home/kiwi/Projects/UniversalAnnotationSystem/test.txt
filename: test.txt
size: 15
ctime: 2009/01/27 19:13:39
mtime: 2009/01/27 23:23:38
comment: "Hello" (2009/01/27 19:13:39)
rate: "3 (5)" (2009/01/27 23:23:38)
% cat >> test.txt
changing the content ...
^C
% ./uas -l test.txt
test.txt has changed, reattach meta data
-rw-r--r-- 1001 0 36 2009/02/03 10:10:19 test.txt
test.txt content has changed, reattach meta data
uid: RfBJcwNETDV26CDgDAxShOl/kBswuhU7JTsRBjLFg02mUxfvWOAUS1fPAUBpv+PY0vbbBS22tvFAuqPcfF2o7A
uri: ghost:/mnt/ata1/Home/kiwi/Projects/UniversalAnnotationSystem/test.txt
filename: test.txt
size: 36
ctime: 2009/01/27 19:13:39
mtime: 2009/01/28 09:55:15
comment: "Hello" (2009/01/27 19:13:39)
rate: "3 (5)" (2009/01/27 23:23:38)
% mv test.txt test4.txt
% ./uas -l test4.txt
-rw-r--r-- 1001 0 36 2009/02/03 10:10:19 test.txt
uid: RfBJcwNETDV26CDgDAxShOl/kBswuhU7JTsRBjLFg02mUxfvWOAUS1fPAUBpv+PY0vbbBS22tvFAuqPcfF2o7A
uri: ghost:/mnt/ata1/Home/kiwi/Projects/UniversalAnnotationSystem/test.txt
filename: test4.txt
size: 36
ctime: 2009/01/27 19:13:39
mtime: 2009/01/28 09:55:15
comment: "Hello" (2009/01/27 19:13:39)
rate: "3 (5)" (2009/01/27 23:23:38)
% ./uas -comment "annotating a http uri" http://the-labs.com/
% ./uas http://the-labs.com/
comment: "annotating a http uri"
% ./uas -l http://the-labs.com/
uid: xUgEa81hHeALlpWXpXH052XtnlaS+aGeaOAXSHYnc5qST4xrxZysNbQzJNQc1KQDrcfrmxw+2SkLDiIcWV49WQ
uri: http://the-labs.com/
filename:
size: 24458
ctime: 2009/02/04 16:25:57
mtime: 2009/02/04 16:25:57
comment: "annotating a http uri"
% ./uas -comment 'UAS version 0.013' http://the-labs.com/UniversalAnnotationSystem/UAS-0.013.tar.gz
% wget -q http://the-labs.com/UniversalAnnotationSystem/UAS-0.013.tar.gz
% ./uas ./UAS-0.013.tar.gz
comment: "UAS version 0.013"
| Universal Annotation System (UAS) 7. Remote Access, SQL vs Proprietary Protocol
|
Since 0.013 uas also supports remote access via -s <server>, but uas sends SQL directly,
whereas uasc uses a proprietary protocol to reduce the traffic to a minimum.
Here is the traffic comparison:
% uas -s server -v test.txt
% uasc -s server -v test.txt

                          uas   uasc   compression
socket.receive.bytes:     660    266           40%
socket.receive.requests:    5      2           40%
socket.send.bytes:        691    104           15%
socket.send.requests:       5      2           40%
So uasc generates only 15% to 40% of the traffic of uas, which queries with SQL directly.
For now I keep both uas and uasc as part of the distribution.
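The percentages are simply the uasc figure divided by the corresponding uas figure. A small Python check of that arithmetic, using the counters reported above (the variable names are arbitrary):

```python
# Counters as reported by 'uas -s server -v' and 'uasc -s server -v'.
uas = {
    "socket.receive.bytes": 660,
    "socket.receive.requests": 5,
    "socket.send.bytes": 691,
    "socket.send.requests": 5,
}
uasc = {
    "socket.receive.bytes": 266,
    "socket.receive.requests": 2,
    "socket.send.bytes": 104,
    "socket.send.requests": 2,
}

# uasc traffic as a percentage of uas traffic, rounded to whole percent.
ratios = {key: round(100 * uasc[key] / uas[key]) for key in uas}

for key, percent in ratios.items():
    print(f"{key}: {percent}%")
```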
| Universal Annotation System (UAS) 8. TODO
|
A few things I may implement:
- DONE
uasd: daemon running on a dedicated port which allows network connections (over the internet or an intranet)
- DONE
hash-cache for uas (as part of the main database), for uasd (for compatibility with uas), and for uasc (for local lookup of already calculated hashes)
- mysql integration: running via mysql so concurrent annotations are possible (with SQLite only one uas can do this at a time since it's a file-based db)
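The hash-cache mentioned above could work roughly as in the following Python sketch: a hash is only recalculated when the modification time of the file differs from the cached one. The in-memory dict stands in for the hashcache table (uri, hash, mtime); the function names are hypothetical and the actual Perl implementation may differ.

```python
import base64
import hashlib
import os

def file_hash(path):
    """SHA512 of the file content as unpadded base64 text."""
    sha = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            sha.update(chunk)
    return base64.b64encode(sha.digest()).decode("ascii").rstrip("=")

def cached_hash(cache, path):
    """Return the hash of `path`, reusing the cached value if mtime matches.

    `cache` maps a path to a (hash, mtime) tuple.
    """
    mtime = os.stat(path).st_mtime
    entry = cache.get(path)
    if entry is not None and entry[1] == mtime:
        return entry[0]  # file not modified since the hash was cached
    digest = file_hash(path)
    cache[path] = (digest, mtime)
    return digest
```

The trade-off is that a change which leaves the modification time untouched goes unnoticed until the cache entry is discarded.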
| Universal Annotation System (UAS) 9. Further Ideas
|
A couple of ideas on how to develop UAS further:
What is Data or a Document

In order to deepen the concept of metadata, one needs to reflect on and discuss the nature of data itself.
Currently UAS defines data based on its content, i.e. two files with identical content are the same document. This is not really
true though, because two files with identical content may have a different meaning depending on the context (e.g. the path):
one file may be the original and another the backup. The backup may vanish, whereas it is important to keep the original.
Another, more real-life example: all files of zero bytes length have the same HASH and are identical, yet one file may be the start of a movie capture,
another an empty file which is just a switch or flag for a program.
In its current implementation UAS makes no distinction whether it sees a copy, a backup or the original; it tracks the different locations
though, even if they all have the same HASH. So, right now, they all have the same UID, because the UID is also the SHA2 hash as implemented.
This breaks the logic in a way; it would be better to track all documents with a unique ID:
- whenever a new file appears, with unique location/URI, it gets a new UUID
- metadata are tracked based on URI, HASH and/or UUID
For example, tying metadata to:
- URI: metadata is context sensitive
  - fallback: if a document with the same HASH is found, the metadata is carried over
- HASH: metadata is content sensitive
  - fallback: if several documents have the same content but different URIs, those are tracked individually (this is what UAS does currently)
- UUID: metadata is tied to the file itself, whereby the UUID should be stored in, or as close as possible to, the file itself (e.g. as an extended file attribute); independent of context/location and content.
There is the Universally Unique Identifier (UUID) standard, which could be adopted as well, yet it still uses SHA1 as of 2009/02.
|
Creating UUID

Currently the UID of a document is also its hash code, but it would be more consistent to define a UID or UUID which is unique all over the world.
One approach would be to create the UUID like this:
random + unixtime (incl. fractions of a second), and then base64- or hex-encode the number, e.g.
UAS.AB32u4i8dqj38fjaGJoe.72jsGh7jsHja
Such a UUID would be universal: every document (even those with identical hashes) would have a truly unique identifier.
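One possible reading of this recipe, sketched in Python: a random part and a timestamp part, each base64-encoded and joined with dots. The sizes (15 random bytes, 9 bytes for the time in microseconds) are assumptions chosen so the result has the same shape as the example above; the function name new_uuid and the use of the URL-safe base64 alphabet are likewise assumptions.

```python
import base64
import secrets
import time

def _encode(raw):
    """URL-safe base64 without padding (contains no '.' characters)."""
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

def new_uuid(prefix="UAS"):
    """Build an identifier of the form PREFIX.<random>.<time>."""
    random_part = secrets.token_bytes(15)      # 15 bytes -> 20 characters
    microseconds = time.time_ns() // 1000      # unixtime incl. fractions
    time_part = microseconds.to_bytes(9, "big")  # 9 bytes -> 12 characters
    return f"{prefix}.{_encode(random_part)}.{_encode(time_part)}"

print(new_uuid())
```

Unlike the hash-based UID, such an identifier says nothing about the content, so it would have to be stored with or next to the file to be of any use.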
|
Filemanager

The filemanager, like Dolphin under KDE, stores comment, rating and tags via the UID (SHA2 hash). If a file with the same URI is changed (e.g. edited),
a new UID is created (after a waiting time of 10 secs or so) and the metadata is reattached or inherited (the follow-uri option).
Once a file is moved between servers, or to other locations out of sight of the filemanager,
it can still be recognized via UAS.
|
Global/Network Scope

A web-based XML service, e.g. as an Apache HTTP module or mod_perl handler(), which allows lookups would enable a joint pool of annotations,
so that the annotations of all documents within a company intranet are retrievable. Additionally,
if a lookup comes with the URI as well, all documents can easily be tracked.
uasd does this partially, yet it uses a proprietary client/server protocol, a fairly simple one though.
|
Filesystem Integration

A further idea is a FUSE filesystem which overlays the metadata via unionfs and makes it available as [file].meta: opendir() does not list the .meta files, yet open() does respond to .meta.
% ls
test.txt
% more test.txt.meta
comment: Hello
Since 0.008 a very experimental (read-only) uasfs is included; use it like this:
% mkdir test
% ./uasfs test/ &
% cd test
% cat > test.txt
this is a test
^C
The test/ directory is actually mapped into /tmp/fusetest-*.
In order to test uasfs, create any file with the same content in the real filesystem (either rewrite the content or simply copy it),
and then use uas to give it some metadata; then access the uasfs-mounted directory and do
% cat test.txt.meta
and it will display the metadata.
|
Extended File Attributes

UAS actually provides something similar to "Extended File Attributes" (aka xattr), which are available on some platforms.
With Perl they are accessible via File::ExtAttr; the program xattr, part of the UAS package, gives you an interface
like that of uas:
% ./xattr -comment 'this is a test' test.txt
% ./xattr test.comment
comment: "this is a test"
Should the comment not appear, then your system doesn't support xattr; e.g. an ext3 partition needs to be mounted with -o user_xattr,
whereas under FreeBSD user xattr works right away. A later version of UAS will use xattr where available, to store an id like uasid='hash' and connect the annotations this way.
For MacOS-X users: xattr is a system utility there, with different command-line arguments than UAS's xattr; find out more using man xattr.
Either way, ./xattr (from UAS) and MacOS-X's xattr are compatible in the sense that they read and write the same file attributes.
|
| Universal Annotation System (UAS) 10. Limitations
|
UAS does not cope well when both the URI changes (renaming or relocating) and the content changes (new hash) without UAS
stepping in between to catch the URI or hash change; if both happen without UAS catching it, the annotations (metadata) are disconnected and lost.
With UAS-0.011 and later, uas and uasc have the -w switch, which makes them check every 30 seconds (or set watch_sleep: 60 in ~/.UAS/uasrc), but this only covers
content changes to existing files with annotations registered with UAS. Checking files with metadata for changes this way isn't very efficient: there can be hundreds of thousands of files, and then it won't work anymore.
So UAS is good for attaching metadata in a controlled environment, where UAS is invoked whenever data is moved around or content is altered; in such a case the metadata remains tightly connected.
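To illustrate why polling scales poorly, here is a rough Python sketch of what one watch pass might do: every annotated file is rehashed on every pass. It is not the code behind the -w switch; the names file_uid, watch_pass and watch as well as the record layout are hypothetical, and only the 30 second default is taken from the text above.

```python
import base64
import hashlib
import time

def file_uid(path):
    """SHA512 of the file content as unpadded base64 text."""
    sha = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            sha.update(chunk)
    return base64.b64encode(sha.digest()).decode("ascii").rstrip("=")

def watch_pass(records):
    """Rehash every annotated file once and reattach changed ones.

    `records` maps a path to {'uid': ..., 'meta': {...}}.
    Returns the list of paths whose content has changed.
    """
    changed = []
    for path, record in records.items():
        try:
            uid = file_uid(path)
        except OSError:
            # File moved or deleted: polling cannot tell where it went.
            continue
        if uid != record["uid"]:
            record["uid"] = uid  # metadata stays attached to the record
            changed.append(path)
    return changed

def watch(records, watch_sleep=30):
    """Run watch passes forever, sleeping between them."""
    while True:
        watch_pass(records)
        time.sleep(watch_sleep)
```

Each pass reads every annotated file in full, so the cost grows with the total amount of annotated data rather than with the number of changes.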
| Universal Annotation System (UAS) 11. Further Readings
|

All Rights Reserved - (C) 1997 - 2009 by The Labs.Com |