Category Archives: Data

RSS adverts go mainstream

Google has moved RSS adverts into a wider beta, and Robert Scoble has been considering the benefits or otherwise of them. And he ranks types of feeds from worst (Headline only, with ads) to best (Full text with no ads).

Deciding whether or not to put adverts in your RSS (and indeed if your feed has all your text or just the partial text) is, I think, a matter of what you’re trying to do with your content. To bring it to total black and white, are you trying to make money, or get your ideas out?

Reality, of course, is shades of grey. For one thing, if you go the total black option (headlines only, ads in the feed, and presumably more ads on the site — since that’s the only reason you’d want to provide only headlines in the feed) then unless your content is pretty damn compelling, you’ll get no readers (at least not from feeds, and this is increasingly the way people consume their web sites), and thus no money, and your content goes nowhere.

Other end of the scale (full text in feeds, no ads anywhere) is okay, as long as you don’t get snowed under by readers, and end up paying so much in bandwidth that you can’t afford it anymore. Not likely these days, but theoretically possible, especially if your content is multimedia.

For most of us, I suspect, the balance is somewhere closer to white than black.

Proximity sense travel cards are vital; processes support falible memory

I lost my train ticket the other day. My monthly. A hundred bucks worth. I recalled that I’d validated it on the bus to get home (because the bus was there; I don’t wait for it if it’s not there – the timing’s a little vauge and I’m not that adverse to exercise). I remembered left in my back pocket along with a bus timetable. And I knew it was lost, because I have processes to deal with a decaying memory. I lock the car with the car keys now, because the car can be locked without them and I know that I can and have left the keys in the car; so locking it with the key means I can’t do that. I knew that I’d only recently walked in the door, and that I’d only been in a limited number of places. I knew that there was only one place it should have been, where I leave all my pcoket stuff – phone, wallet, MP3 player, keys, coins, ID lanyard and travel ticket. And it wasn’t there. Because I was in the process of trying to put it there. But the other stuff was. It wasn’t in any of my pockets.

I concluded that the only remaining explaination is that I had dropped it, which seemed ludicrous. How could that have happened? It was in my pocket! I retraced my steps back to the bus stop, and halfway there I found the bus pass. Another hundred metres and I found the ticket. During the walk home it had worked its way out, sliding up against the bus timetable and onto the footpath.

Now, the reason I had it in my back pocket was because it was a Friday, and on Fridays its casual day at work and as such my shirt didn’t have a pocket in it. So, there was process failure there, but it was to be expected. Little I can do about casual day.

I’ve had scares like this in the past. The reason I keep my ticket in my pocket is because I need it easily accessible, for feeding into the barriers to let me in and out of the train stations. There are most secure locations I can keep it, but they are less accessible. So I’ve left it in the pocket of the previous day’s shirt and not realised until I’ve arrived at the train station.

But the crux of the matter, the reason this is a GeekRant article, is because if the damn ticket was proximity detect I could keep it in my wallet or on my ID lanyard and never lose it and also have it ready to validate at a moment’s notice. The lanyard would be best, because then I couldn’t get to work without taking my lanyard with me, which would remove another thing I could forget and would inconvience me. And this is all the more important now that I’m lugging a thousand buck yearly ticket around with me. It’s not like it can’t be done either – all the validating machines have proximity sense detectors on them. At least the yearly tickets are plastic and will survive a trip through the washing machine.

Stupid MetCard.

Access-style Autonumber fields in Oracle

One of the things I like about MS Access is the AutoNumber field type. And SQL Server has its Identity field type. Very handy for primary keys. But from time to time I convert applications into a more heavy-duty database, such as Oracle, where to do the same thing, you need to select from a sequence first. Here’s how to do an Autonumber-style field without mucking about with the sequence.

You do, of course, need the sequence, to track unique IDs.

CREATE SEQUENCE [Schema].[SequenceName] START WITH [Number] INCREMENT BY [Number] MINVALUE [Number] NOCACHE NOCYCLE;

NoCache will avoid skipping values due to Oracle creating any numbers in advance. NoCycle avoids the number repeating itself. I generally name my sequence after the field it’s being used for, with an _SEQ suffix.

Putting the sequence number into the field is done by a trigger:

CREATE TRIGGER [TriggerName]
before insert on [TableName]
for each row
begin
select [SequenceName].nextval into :new.[ColumnName] from dual;
end;

Easy!

There is a catch. While in Access or SQL Server there are easy ways to find out the value of the AutoNumber/Identity field in the row you just added (through @@IDENTITY in SQL Server or in Jet 4.0 or later), this isn’t so easy in Oracle.

You can query the column’s MAX value or the sequence CURRVAL afterwards — but this isn’t reliable if yours is not the only process inserting data. Would it work if you placed your code in a transaction? I’m not convinced.

So the trigger method should only be used when you don’t need to know the number entered straight afterwards. If you need to know, then forget the trigger — just select your Sequence value first, then use that in your insert(s).

Putting crap in database fields

As someone who administers a number of databases, I’d say to anyone that you need a damn good reason to put non-standard values into a field.*

Today’s curse is sorting out the “Counties” field, which is supposed to be used to show British counties. To add insult to injury, people have to select from a drop-down menu, and if they try to type in a value that doesn’t exist, they are specifically asked if they mean to add that value (methinks the database is not locked down quite tightly enough, but I can’t control that bit at the moment).

People, let me tell you, from even basic general and geographical knowledge: Austria, Australia, BD20 0DX, Bahrain, Brentwood, Brighton, Jacksonville Canada, NW10 0HD… none of these are British counties, nor were they ever, nor will they ever likely be.

* That good reason doesn’t exist. The same goes for towns, titles, gender, and a whole host of other fields with clearly defined standards.

Creating unique reference IDs in Excel

I often need to create unique IDs in an Excel spreadsheet for importing into our database system. Doing this by hand is time-consuming and error-prone, so here is a method I’ve found which saves a lot of time and effort.

Let’s say I want to create an ID in the format “IMPORTA-1234”, where 1234 is a unique number from 1 to the total number of records in the spreadsheet. However, I would also like them neatly formatted (purist that I am), so that record 5 is shown as IMPORTA-0005 not IMPORTA-5, record 124 is shown as IMPORTA-0124 not IMPORTA-124, and so on.

1. Create a blank column in your spreadsheet, if one doesn’t already exist, for the ID. Call it something like ImportID or UniqueID.

2. Enter the following formula into the first data row (probably row 2 of the spreadsheet):
=CONCATENATE("IMPORTA-",LEFT("0000",4-LEN(ROW(A2)-1)),ROW(A2)-1)
(where A2 is the reference of the cell the formula is in)

3. Now copy and paste this into each row – being a relative formula ensures that each ROW references the correct cell.

What does this do? Let’s look at each part of the formula in turn:

CONCATENATE – This creates one string comprising the three distinct elements needed to form a unique ID.

Element 1 is the static “IMPORTA-” string.
Element 2 is the appropriate number of ‘0’ characters to pad out shorter numbers.
Element 3 is (row number minus one) which gives a unique number based on the record’s position in the spreadsheet (and accounting for a header row)

To ensure that the correct number of zeroes are used as padding, we use the LEFT function to grab a portion of the string “0000”. We use the ROW(cell reference)-1 function to get the unique number, and we use 4-LEN(ROW(cell reference)-1) to work out how many characters long that number is.

Let’s walk through a couple of examples to show how this works:

For row number 25, which is record 24:

First part of string is “IMPORTA-“
We now use LEFT(“0000”,4-LEN(ROW(A25)-1)).
The length of ROW(A25)-1 is 2 (24 is two characters long)
The (4-2) left most characters of “0000” are “00”
Final part of the string is the (row number – 1) = 24

Output ID = IMPORTA-0024

For row number 164, which is record 163:

First part of string is “IMPORTA-“
We now use LEFT(“0000”,4-LEN(ROW(A164)-1)).
The length of ROW(A164)-1 is 3 (163 is three characters long)
The (4-3) left most character of “0000” is “0”
Final part of the string is the (row number – 1) = 163

Output ID = IMPORTA-0163

and so on.

Once the spreadsheet is complete (no more records are to be inserted or deleted) it may be a good idea to ‘lock down’ the IDs by highlighing all the ID cells, copying them into the clipboard, and then doing a ‘Paste Special’ specifying to paste the actual values. This removes the formulae and makes the IDs static, but still unique of course.

To adapt for longer or shorter numbers, change the string “0000” and the number that the row length is subtracted from (in this case 4). For records from 001 to 999, use "000" and 3-LEN(ROW(reference); for records from 00001 to 99999, use "00000" and 5-LEN(ROW(reference), and so on.

My favourite formats

Daniel’s favourite file formats for distributing multimedia to friends, on the web, that kind of thing. I know a lot of hardcore geeks will know all this stuff (and disagree with some of it), but I was asked recently about it, so here goes.

For pictures containing large blocks of colour, including most screendumps out of Windows… PNG. GIF used to do the job for this, but PNG supports more than 256 colours, and is now in widespread use. Even Windows XP Paint manages to handle it, though older versions don’t.

For pictures with lots of variance in colour, such as most photos, it’s gotta be… JPEG. Try out the various compression settings to see what works. You can often shrink files down a surprising amount with little in the way of visible detail loss. Of course, keep a pristine copy if it’s a picture you’ll need to work on in the future. Windows XP Paint can do JPEGs… uhh, kind of… no control over compression.

For sound, the cross-platform standard now is MP3. A nice balance of quality and file size. Playback software is standard on any computer of the last 5 years or so (and if not already installed ‘cos it’s some ancient beast, is freely available), and unlike WMA it’s supported on pretty much every portable digital music player ever made. (I’m with Cameron – Microsoft should retire WMA. They won’t, of course.)

Movies? MPEG-1. Don’t get me wrong, I love DivX. DivX good. And in comparison, MPEG-1 is not the most efficient of formats, but even ancient creaking Windows NT Media Player supports it. No mucking about with making people download extra codecs or special players – any setup can handle it. No RealPlayer or Quicktime nagware. No Windows-only WMV that the poor Mac and Linux people can’t handle.

And if you use a decent encoder such as TmpGenC (free!), you can do a fair bit of tweaking so you get a watchable movie that’s not too humungous. Fiddle the bitrates, try a variable bitrate, reduce the frames per second, crop unused parts of frame, use a efficient sound compression. Takes a bit of practice, but worthwhile in my opinion. Hopefully in the near future, more efficient versions such as MPEG-2 and MPEG-4 will become standard on all computers. But in the meantime, I’m sticking with MPEG-1.

Agree? Disagree? Comment?

Introducing the message exchange

The company I work for is called eVision, and their main product is called MessageXchange. (See what happens to your spelling when you’re looking to find a good .com address?)

It’s basically a B2B message broker… messages go in, messages go out, and in the middle they get conditionally routed and transformed. The upshot is you can set up to hook up a bunch of systems that use completely different types of message… one system’s PurchaseOrderCreate in a fixed-length FTP’d batch file can happily go along as another system’s PurchOrdReq XML HTTP message.

The clever bit is in the monitoring, letting you see what’s pumping through the system at any time. And the fact that the whole shebang is configurable through a web interface.

It started out as a software package… well, not exactly a package, not in the MS Office sense, but a system you’d plonk on a Windows server or two and away you go. I haven’t been directly involved, but over the past year they’ve rejigged it as an ASP… a paid hosted service, that is, so that if you don’t want to run it on your own boxes, you pay to access it on fully maintained servers instead.

At the same time they’ve expanded its reportoire to cover a lot of the new and emerging XML standards such as ebXML and RosettaNet. As well as some of the more ancient, creaking standards like EDI.

Handy stuff. The whole B2B area must surely grow, it’s a no-brainer for reducing the cost of commerce. Will be an interesting area to watch. In fact I might get one of the guys to expand a bit on some of these topics…

MSXML HTTP Post: Access Denied error

In MSXML 4 SP2 (and later, I assume in advance) if you try to send Post data using the ServerXMLHTTP40 object to a site that’s in the Internet Zone, you get an Access Denied error.

This is another of those things that had me banging my head in frustration until I eventually solved the problem. Contrary to what you may first think, it’s not a bug.

It’s actually upgraded security in this release: it uses the IE settings, and if you try to send unencrypted Post data by HTTP to Internet zone sites, you run into trouble. Details are at the end of the readme for the SP2 release and in KB 820882, but the workaround given does not work in Windows 2000 because the MMC Snapin referred to is only in Windows XP.

Apparently now there’s a Windows 2000 hotfix you can get, but as with all hotfixes, it involves mucking about ringing up Microsoft PSS to get it. (And when it says “Applies to Microsoft Windows 2000 Standard Edition”… what is that, exactly? Maybe they mean Professional?)

You can also get into IE and change the zone settings, but it has to be the same user that runs your process. If the process is some kind of robot, it’s not always possible to do this.

Eventually I dug around in some MSKB articles and eventually found article 182569 that talked about how to change the relevant settings via the registry.

To tell Windows to ignore user-specific settings, and always use the zone setting you are about to define, create or edit the following Registry key:

HKEY_LOCAL_MACHINE; Software; Policies; Microsoft; Windows; CurrentVersion; Internet Settings; Security_HKLM_only (DWord value) = 1

The alternative would be to change the following setting for each user that will try and do the HTTP post, eg in HKCU/HKU instead of HKLM.

Okay, so to tell it to allow unencrypted HTTP Post data into the Internet zone:

HKEY_LOCAL_MACHINE; Software; Microsoft; Windows; CurrentVersion; Internet Settings; Zones; 3; 1601 (DWord value) = 0

See MSKB 182569 for more details.

File listings from zips

Need to get a file list out of a zip file? Winzip is a fine product, but don’t muck about with their recommended method, setting up a text file printer driver to print a list to, then having to chop out the paper-style headings and linefeeds.

No… Instead go to Info-Zip and grab their command-line zip package.

unzip -l filespec.zip [Optional filespec if you don't want them all] > filelist.txt

Easy.

PNG! PNG!

JPEG artefacts I’ve been discovering just how great PNG is for screendumps. It’s not lossy, yet it compresses particularly well for screendumps off Windows… even when there’s those gradient title bars that have become fashionable over the last couple of years.

For instance, the Path Not Found dump out of Windows explorer the other day:

PNG 22,140 bytes
JPG (saved out of Photopaint at compression 100 out of 255, noticeable loss of quality in the letters – See right) 76,824 bytes
GIF (converted to 256 colours, so some colour lost) 22,345 bytes
GIF (converted to 16 colours, so LOTS of colour lost) 17,285 bytes

Okay, so the 256 colour GIF is only marginally bigger, but to produce it you have to fiddle the colours, and of course it uses the proprietary LZW algorithm for which Unisys once would have wanted all our souls. PNG is just a save, no having to even think about it. Coolness.

GeekRant.org Primer: GPG encryption in five minutes or your money back

A while back I was given the task of setting up encryption for sending files around. Ooh. Sounds tricky, I thought. I’ve seen PGP signatures on privacy freaks’ e-mail for years now, but it all seemed a bit like black magic. I had no idea how it worked.

I went looking, and it turns out it’s not really particularly difficult to figure out or get working. But I had to wade through a few hefty (in web terms) manuals to find all the info I needed. I never really found a web page which detailed the basics in an easy to digest format. This could be that page.

PGP and GPG

PGP is Pretty Good Privacy, invented by Phil Zimmerman and now run by the PGP Corporation. It’s the defacto standard for this kind of stuff. It’s fairly secure, and has the added benefit of compressing text quite well. PGP sell a number of solutions, but if you’re wondering about freebies, then…

GPG is Gnu Privacy Guard, which is the free implementation of (most of) PGP. It lives here: www.gnupg.org.

How to use it

Encryption of this type is all about keys. If you haven’t grasped the key concept before, here it is in brief: a recipient has a public and a private key. The public key is given to anybody. Senders encrypt stuff using the public key. Only the recipient has the private key, and uses this to decrypt stuff. Obviously if communication is two-way, you need multiple public and private keys. Okay? Easy.
Continue reading →

Missing the bloat

A colleague was pasting a picture into his Powerpoint presentation. Some kind of diagram, and unfortunately he didn’t have the original document it came from, so no matter which Paste Special option he tried, it came through as a bitmap. Saved it to disk, e-mailed it to someone else, and wondered why it took so long.

Then he saw the size. It had blown out from a couple of hundred K to over 4Mb.

So he tried zipping it. WinZip took it down to, believe it or not, 80Kb.

No wonder people complain about Microsoft bloatware… sometimes it’s not just the apps, it’s the way they store stuff as well.