• Zygo (unregistered) in reply to DaveK
    DaveK:
    Most serious, professional, high-security cryptography software zeroes out its temporary memory after use.

    Yes, but such software does the zeroing once, which is sufficient to prevent attacks based on software. Doing the zeroing more often hurts performance and provides no extra security. If power to the device is killed before the zeroing code is executed then the zeroing doesn't get done. Someone who tosses your freshly stolen powered-off laptop RAM into a cooler full of dry ice while they whisk it off to their forensics lab to apply their nefarious technologies to it is not going to be inconvenienced. ;-)

    If the key is stored in a constant address in RAM then there is the possibility of it burning into the chips. Better start allocating key memory using a wear-levelling algorithm to select RAM addresses.

    Mind you, if the attacker is smart, they steal your laptop from you while it's powered on with the keys in memory, and just halt the CPU without cutting power to the machine (how that's done is left as an exercise for the WTF readers ;-). The motherboard logic will preserve the contents of RAM nicely while they connect the machine to a power source in their getaway vehicle, then later when they connect a logic analyzer to the DRAM bus. After that they just dump the RAM contents and fish out the key at their leisure.
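    Coming back to the zeroing point: a minimal sketch of the "zero once after use" idea looks something like this (my own illustration, not from any particular product; use_key is a placeholder, and the mlock call is just POSIX's way of keeping the buffer out of swap):

    #include <string.h>
    #include <sys/mman.h>

    #define KEY_LEN 32

    void with_key(void (*use_key)(const unsigned char *, size_t))
    {
        unsigned char key[KEY_LEN];

        mlock(key, sizeof key);        /* keep the key out of swap */
        /* ... fill in the key and use it ... */
        use_key(key, sizeof key);

        memset(key, 0, sizeof key);    /* one pass is plenty against software
                                          attacks; see the SecureZeroMemory
                                          discussion further down for keeping
                                          the compiler from eliding this */
        munlock(key, sizeof key);
    }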

  • JavaSucks (unregistered) in reply to Zygo
    Zygo:
    So...what you're saying is...that Quicksilver used O() notation correctly?
    No, he's saying that he's about to flunk out of his freshman "intro to CS" course. Gotta love Java schools.
  • Tibit (unregistered) in reply to pscs
    pscs:
    clively:
    If you don't believe me, just call a data recovery company. Tell them you formatted the drive and see if they can recover the data. They will tell you yes, for a fee.

    That's because formatting doesn't write over anything (except control tables (eg MFT etc on NTFS))

    Think about it from first principles here.

    It IS possible to see previous states of the data on a disk. It's very hard though.

    However, as far as I know HDD bits aren't in a FIFO queue. So, if you're looking at vague traces of data on a magnetic platter, you can't tell if that bit that looks like it might have been a '1' was a '1' at the same TIME as the bit that's next to it looks like it might have been a '0'. Any CRC will have been trashed as well (probably more so than the rest of the data), so there's no way of checking that what you think you've recovered is what the data originally was.

    IOW, if a byte on a disk was first 01010101 then 01001001 then 00010101 then 00000000

    At the end, with your ultrasensitive detector you will probably see traces of '1's in these positions 01011101

    The 'decay' of the magnetic coding would have to be so precisely uniform to be able to tell what a combination of bits were at a particular moment in time that it's just infeasible except on 24 or CSI..

    It's not as easy, but the results can be in fact stunning.

    Here's what one can do, in a nutshell (assuming that the drive as submitted is operational).

    First, the drive's channels (all of them) are fully characterized. Like so:

    Usually one in the business will know how a drive chews the data before writing it - drives don't write the raw data; they write it after it passes through a scrambling algorithm.

    So, knowing what bits get written for which data, one characterizes how bits really look on the medium due to temperature, variations in the electronics, and in the medium itself. All of this gets converted into statistical measures, so that for any bit you will subsequently try to recover you'll have a good idea as to how it would have looked when written over a clean medium. You also characterize how the medium behaves when overwritten multiple times - how the old values decay, etc. Again, it all parametrizes a statistical model of the whole drive system. A lot of theory (and highly paid experts) goes into this phase. Each company will have (one or more) in-house developed analysis systems based on the theory and experience so far.
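    To make the "statistical measures" bit concrete, here's a toy illustration (mine, not any lab's actual code) of the simplest decision it boils down to: given a measured remnant flux value and two Gaussian models characterized beforehand, pick whichever hypothesis about the old bit explains the measurement better. Real systems model whole tracks and overwrite histories, not single samples.

    #include <math.h>

    struct bit_model { double mean, sigma; };   /* characterized per drive/zone */

    static double gauss_logpdf(double x, struct bit_model m)
    {
        double z = (x - m.mean) / m.sigma;
        return -0.5 * z * z - log(m.sigma);
    }

    /* Returns 1 if "the old bit was 1" explains the measured flux better
       than "the old bit was 0", 0 otherwise. */
    int guess_old_bit(double flux, struct bit_model was0, struct bit_model was1)
    {
        return gauss_logpdf(flux, was1) > gauss_logpdf(flux, was0);
    }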

    What happens then is that you read the drive in multiple fashions. First of all, you do a read using the drive's read system (duh): heads, read amplifiers and descramblers, servo loops. Everything is digitized in realtime, and generates usually 20-100:1 data expansion (after losslessly compressing!). A one gigabyte drive will blow up to 0.1 terabyte as a rule of thumb. This is done just so that the "obvious" is kept on record: how the drive electronics performed on the data.

    At this point you're free to run the drive via its regular IDE/SCSI controller; your digitized raw signals must demodulate to the same data that the controller feeds you. At this point you had better know everything there is to know about how that drive implements scrambling, error correction, its own data structures for management (thermal, servo, reallocation, flux gains), etc. I.e. everything that you have read so far must be fully explainable.

    Then some test writes are performed in the "clean" areas where previous data is known not to be valuable. The first cylinder of the disk is a place which gets written to quite rarely: boot record, partition table, etc - pretty dormant places. Again, everything that leaves or enters the drive's electronics is digitized. This is done so that the drive's write channels can be characterized, including the servo trends of course. Depending on a given lab's modus operandi, and whether the platters are to be preserved, you either have to give up any subsequent writing at this point, or feel free to write after you return the hopefully intact platters from the scanners (below).

    This finishes characterization of the drive's subsystem. At this point you rip the platters out and put them on a scanning machine. You typically do a non-contact scan using something like spin-polarized SEM. Then you do a "contact" scan on a (typically in-house-made) machine which uses custom MR heads to get another scan of the drive's platter. Quite often the scans will be repeated after some processing of data shows that this or that can be tweaked.

    At this point you may have a few terabytes of compressed data for every gigabyte on the drive, and have spent thousands of dollars worth of amortization alone on your equipment. This is where the data goes to processing, which is necessarily mostly automated. Hand-holding is done to align the track models with actual scans, such that the analyzers can simulate various "heads" that scan the "tracks". This initial alignment is really about accounting for each drive's uniqueness and is pretty non-critical: once the analysis system locks in on the magnetic patterns, it's supposed to know what to do next. Remember: the amount of data is vast and it's wholly infeasible to do much by hand at this point.

    Depending on how good the data is after this analysis, you may need to improve on the resolution of "old" data.

    This is where you use another machine to very, very carefully remove a tiny layer of magnetic material from the platter. I don't know if some electrochemical methods exist for that. Sometimes you will do local annealing to try and kill off the masking effect of strong domains; this is done, again, using a custom, one-off tool which will very rapidly and locally heat the domain. This is of course an automated process: the platter's most magnetized domains are annealed under feedback from a scanner (say spin-SEM, or whatever your favourite is). Sometimes the whole platter will be annealed under a "selective" process where annealing strength and depth depend on the data (or whichever trend is the most favourable at any given part of the platter). The analysis tools will quite often resort to (dis)proving statistical hypotheses in order to "peel" each layer of data, and as such they may try to "push" the remnants even in the wrong way.

    In this whole process, one must factor in tons of common "disk scrubbing" tools, some of which write very repeatable patterns. This is where some knowledge about what the "customer" could have potentially done to the drive helps. But in general, the whole process doesn't care diddly squat about what's actually on the drive. For every "location", a history of data is provided, "annotated" by lots of statistical data such as to further guide recovery based on data interpretation.

    Data interpretation is all about what the data means: if a given "layer" of data can be somehow known (i.e. the oftentimes noisy result of analysis can be cleaned up), it's easier to get clean(er) underlying data. This is where you'll actually use software that knows what filesystems are, etc. But at this point the drive is either back to the customer, or has been already pulverized. It goes without saying (I hope) that your systems include, among others, a pretty much complete database of every published binary (or otherwise) file of every common (and less common) software package out there [this really applies to commercial stuff, less to the blow-yer-bandwidth-to-pieces plethora of free software].

    I.e. if someone had some good data on the drive, and then, say, a defragmentation tool puts some Adobe Photoshop help file or whatnot over the spot, you had better know what the noise-free version of the file is. Implementation of such a database is non-trivial in itself (an understatement). The task is this: given a bunch of potentially noisy 512-byte (or whatnot) sectors, find what OSes/applications were installed on that drive, and which file each sector corresponds to. This allows "staticalizing" a lot of noisy data, which helps in looking "under it", or, for long-time-resident data which is not moved much by defragmenters, even classifying a part of the medium as COTS. For any particular customer, if they can provide you with some files/emails the user has been playing with, it helps, as your COTS span grows beyond the mere minimum.

    I have really spared you the details, which are actually mind-boggling. The infrastructure needed to do such analysis is, um, not for the faint of heart, and whether gov't or private, you need lots of funding to have a lab like that. And lots of talent, on all levels, from techs to the ivory tower theoreticians.

    From what I know, the whole process to this point takes tens of kilobucks and that's when the lab runs at full steam. Meaning that you churn a drive a day, and that's really pushing it. Understandably enough, a very non-WTFy IT infrastructure is crucial in processing all this data. Bandwidth-wise, it's all comparable to running a particle detector in an accelerator. Do understand that you'll want to run all the equipment non stop, preferably on customer units.

    When the equipment is not otherwise down for maintenance, you'll probably want to scan known good hard drives that have hit the market, to populate your characterization database (both for electronics and for the medium itself).

    Craaazy indeed.

  • (cs)
    to burn away any remains of the key from memory.

    Burn, burn the volatile memory!

    else it is possible to extract the key with some technology.

    Dude, if a squad of gamers storms your datacenter armed with assault weapons and proceeds to memory-dump your servers, you are going to worry about other things than what is or isn't stored how well on memory.

    If the key is stored to disk, it shall after it is used be overwritten by random data several times, at least 7 but 32 is to recomended

    Wow. Is this a game or a nuclear warhead arsenal? Military standards require 7 randomizations as far as I know...

  • d000hg.wordpress.com/ (unregistered)

    Games are notorious for being badly designed and poorly coded - I thought WTF didn't normally cover games?

  • (cs) in reply to Tibit
    Tibit:
    It's not as easy, but the results can be in fact stunning.

    I'm interested - which data recovery companies claim to be able to do this?

    ALL the ones I've looked at say 'once the data has been overwritten we can't get it back'. Most of them specialise in recovering data from failed drives (motors failed, heads crashed, fried electronics etc) where they will extract the platters in a clean room and re-assemble them with new mechanisms and electronics. They may use amplifiers to recover data from damaged platters, but none say they can recover overwritten data.

    Given that all the stuff I've read about data recovery (except Gutmann's paper, which was written 10 years ago, and other documents quoting him) says it can't be done with any degree of success, I'm very sceptical. E.g. people say you MIGHT be able to recover, say, 40-60% of the BITS of data - which is useless if you think about it. If you know what the data was beforehand, this might be useful as confirmation, but otherwise, how do you know what

    0??1?11? ?10?0??0 ??000??0

    says?

    MFM drives were different because they were so imprecise that the write heads weren't accurately positioned and bits of data wouldn't always be overwritten on later writes. Nowadays, to get the high data densities, they're much more precise.

  • Beerguy (unregistered) in reply to Tibit

    Tibit:
    It's not as easy, but the results can be in fact stunning.

    Here's what one can do......

    As someone said earlier: degauss, shred, then burn.

    What you are describing would only be available to very large corporations or governments. I don't think that the 1337 crowd has access to those types of resources.

    A normal data shredder program should be enough for 99.99% of users. I have had to use a data recovery company in the past to recover some files for a client, and they could not recover overwritten files.

    CAPTCHA - atari

  • Andrei S. (unregistered)

    Looks like trying to brainwash the RAM. What about the L2 cache, did they forget about it? :D

  • tibit (unregistered) in reply to pscs
    pscs:
    Tibit:
    It's not as easy, but the results can be in fact stunning.

    I'm interested - which data recovery companies claim to be able to do this?

    Hint: The one(s) that do don't have a website, don't need to advertise, and aren't used by the corporate world either (save for a few friends) :)

  • tibit (unregistered) in reply to pscs
    pscs:
    Tibit:
    It's not as easy, but the results can be in fact stunning.

    otherwise, how do you know what

    0??1?11? ?10?0??0 ??000??0

    says?

    I wouldn't know, but the people who do this stuff know enough to figure it out well enough to be sufficiently well paid. I presume that a typical "data recovery" company is worth (throwing in all of its revenues from inception) less than some of the machines used in the labs I described :)

  • Cloak (unregistered) in reply to java.lang.WTFException
    java.lang.WTFException:
    what I find amazing is that the comment would seem to suggest totally different behaviour from the actual code (even if you correct the for condition)

    the comment would suggest the memory getting zeroed for a few seconds, while the code just runs through the array once (if we fix the for loop). Also the comment would seem to suggest that if you don't overwrite a memory cell at least X times, its previous contents can still be recovered, which is not true; for hard disks, maybe, but for RAM, not at all

    I would say RAM and HD work essentially the same way, through magnetisation. And some magnetisation could then remain on RAM modules too. Or do they use different material for that? Anyway, it's ridiculous how far they push their "security" stuff.

    consequat? maybe consequatsch

  • Ace (unregistered) in reply to Jimmy
    Jimmy:
    Who here would buy a table with only three legs or a car without a possibility to fill it up?

    Actually, I have a three-legged table (by design). And, I would like a car that I didn't have to refuel.

  • Happy Dude (unregistered) in reply to Martin Gomel
    Martin Gomel:
    Dear reader:

    My name is Martin Gomel, and I am the lead developer for "Some Technology". If you are interested in purchasing "Some Technology" for elite hacking into game server code, please send a certified check for $1000 to:

    c/o Marin Gomel Some Technology Enterprises 245 Thisg'uy Wil Lbeliev Eanyt Hing PO Box 1200 Nigeria

    All softwares are garentee virusus free, or some monies back.

    Greetings, friends. Do you wish to look as happy as me? Well, you've got the power inside you right now. So use it and send one dollar to

    Happy Dude 742 Evergreen Terrace Springfield.

    Don't delay. Eternal happiness is just a dollar away.

  • Happy Dude (unregistered) in reply to Jimmy
    Jimmy:
    Ahh so that should help to explain why games are so fast...oh wait, maybe more likely to explain why games are filled with bugs.

    I do understand that game developers want their product to reach the shelf as quickly as possible, but seriously. Who here would buy a table with only three legs or a car without a possibility to fill it up with gas (or whatever fuel you are using)?

    That's more or less what game developers are sending to their customers; no wonder people are annoyed at them. It has happened at times that people I know buy a game and can't get it running, but by downloading a pirated version they have no trouble whatsoever.

    CAPTCHA: pirates...yeah I was just writing about that...

    I would; three-legged tables don't wobble like four-legged tables do.

  • Cloak (unregistered) in reply to Christopher
    Christopher:
    Martin Gomel:
    Dear reader:

    My name is Martin Gomel, and I am the lead developer for "Some Technology". If you are interested in purchasing "Some Technology" for elite hacking into game server code, please send a certified check for $1000 to:

    c/o Marin Gomel Some Technology Enterprises 245 Thisg'uy Wil Lbeliev Eanyt Hing PO Box 1200 Nigeria

    All softwares are garentee virusus free, or some monies back.

    Before anyone thinks this is a real post, try sounding out the street address...

    Blitzmerker

  • Arioch (unregistered) in reply to foo
    foo:
    I'm going to encrypt my password multiple times in DES and then spray paint it on the side of this bridge and post a photo of the bridge on my homepage. They will never break my security system because we use encryption. So I can feel safe putting my password on the side of that bridge.

    To increase security in a password just encrypt it multiple times in a loop like this one:

    level = 3;
    password = "plain text";
    for (i = 0; i > level; i++) {
        password = cryptDES(password);
    }

    Look, now no one will be able to break my encryption because the DES is applied "level" number of times. This is super efficient because when I test this it runs blindingly fast even if I set level to something ridiculously large like five billion. It's amazingly efficient.

    I'm using level 2048 encryption now. It doesn't slow anything down. In fact everything runs just as fast as if I never encrypted anything. I know it's secure because it's encrypted. So I can put all my biggest secrets on the sides of bridges in bright red spray paint.

    because the DES is applied "level" number of times ...with the very same key! That does not give you more security while you keep the key the same for all your O(const) passes of the loop, you moron :-)))

    PS: thinking about something like "level := High(i) - real_level + 1;" :-)
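    For reference, a corrected version of the loop from the parody above would look like the sketch below (cryptDES here is the parody's own placeholder, not a real library call). Even fixed, repeating DES with one key buys you almost nothing; real multi-pass designs such as Triple DES use independent keys per pass.

    int level = 3;
    const char *password = "plain text";
    for (int i = 0; i < level; i++) {    /* '<', so the loop actually runs */
        password = cryptDES(password);   /* same key every pass: ~no extra security */
    }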

  • (cs) in reply to Zygo
    Zygo:
    DaveK:
    Most serious, professional, high-security cryptography software zeroes out its temporary memory after use.
    Yes, but such software does the zeroing once, which is sufficient to prevent attacks based on software. Doing the zeroing more often hurts performance and provides no extra security.
    Well, I completely agree. Mind you, the code sample we're discussing here doesn't zero it more than once. Matter of fact, it doesn't even do it once, but if the loop condition typo was fixed it would do it only once; there's no hint of an outer loop around what we were shown.
    Zygo:
    If the key is stored in a constant address in RAM then there is the possibility of it burning into the chips. Better start allocating key memory using a wear-levelling algorithm to select RAM addresses.
    PGP Corp. implemented a clever and simple solution for this in PGPdisk: they store the key at a constant address, sure enough (and that memory is locked out from paging by a kernel-mode device driver), but to avoid any hint of the danger of burn-in they flip all the bits in the key at regular (short) intervals. Each RAM cell spends the same amount of time at zero and one, equalling out any cumulative charge carrier accumulation or depletion.
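    A minimal sketch of that bit-flipping idea (my own illustration of the described behaviour, not PGP's actual code; the names and the timer hookup are made up): keep a flag saying whether the stored copy is currently inverted, flip every byte on a timer, and undo the inversion whenever the key is read out.

    #include <stdint.h>
    #include <stddef.h>

    #define KEY_LEN 32

    static uint8_t stored_key[KEY_LEN];   /* lives at a fixed, non-pageable address */
    static int     inverted;              /* non-zero if stored_key currently holds ~key */

    /* Called from a periodic timer: flip every bit in place. */
    void flip_key(void)
    {
        for (size_t i = 0; i < KEY_LEN; i++)
            stored_key[i] ^= 0xFF;
        inverted ^= 1;
    }

    /* Recover the real key into a caller-supplied buffer. */
    void read_key(uint8_t out[KEY_LEN])
    {
        for (size_t i = 0; i < KEY_LEN; i++)
            out[i] = inverted ? (uint8_t)(stored_key[i] ^ 0xFF) : stored_key[i];
    }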
  • (cs) in reply to Cloak
    Cloak:
    Christopher:
    Martin Gomel:
    Dear reader:

    My name is Martin Gomel, and I am the lead developer for "Some Technology". If you are interested in purchasing "Some Technology" for elite hacking into game server code, please send a certified check for $1000 to:

    c/o Marin Gomel Some Technology Enterprises 245 Thisg'uy Wil Lbeliev Eanyt Hing PO Box 1200 Nigeria

    All softwares are garentee virusus free, or some monies back.

    Before anyone thinks this is a real post, try sounding out the street address...

    Blitzmerker

    Gesundheit!

  • Teh Optermizar (unregistered) in reply to Zygo
    Zygo:
    Teh Optermizar:
    mikecd:
    lgrave:
    That is wrong. As you can see there is an 'n':
    for (int i = 0; i < 4; i++) {
    Thank you for bringing this to my attention. I'm going to spearhead a project here to change all of our iteration counters from int to short.

    Tsk tsk... such a typical knee-jerk reaction... think about what you are doing, man, throwing away the ability to do large numbers of iterations by using short...

    Might I suggest using a float... you get the best of both worlds, neatly avoiding the O(n) issue, and also allowing for a truly massive number of iterations ;)

    Except that you can't stop iterating once you hit 16 million or so.

    No, it's better to do this:

    // trwtf.h
    typedef int imt;
    
    // trwtf.c
    #include "trwtf.h"
    // The "n" above doesn't count because it's in preprocessor code, not in the algorithm.
    
    for (imt i = 0; i < 4; ++i) {...}
    

    Pish posh... don't bother me with trifling details such as floating-point error!

    I'm telling you man, floats FTW!!1!eleven!

    And just think, if you start chucking around some SSE, you could increment 4 loop counters at once! zomg! That's, like, O(1/4)!!
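    For the curious, here's a quick way to see the failure mode Zygo is alluding to (my own demo, nothing from the thread): once a single-precision counter reaches 2^24 = 16,777,216, adding 1 no longer changes it, so a float-counted loop with a larger bound never terminates.

    #include <stdio.h>

    int main(void)
    {
        float i = 16777215.0f;        /* 2^24 - 1 */
        printf("%.1f\n", i + 1.0f);   /* 16777216.0 */
        printf("%.1f\n", i + 2.0f);   /* still 16777216.0: the +1 step is lost at 2^24 */

        /* So "for (float i = 0; i < 20000000.0f; i++)" spins forever
           once i reaches 16777216. */
        return 0;
    }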

  • lantastik (unregistered) in reply to Teh Optermizar

    More people have whipped out their ePenises in this comment thread than any other I have ever seen before...and that is saying a lot for WTF.

  • Boxer (unregistered) in reply to mrprogguy
    mrprogguy:

    Dear DaveK,

    You really should read the entire post before you go shooting your mouth off at some guy who's writing a fairly funny parody of a 419 scam attempt.

    Most serious, professional posters do. Apparently you zeroed out your temporary memory before starting your reply.

    Hmm... where do I sign up to become one of those "professional posters"? It would be lovely to make a living just by posting comments.

    Though given the ratio of the time I have spent working to the time I've spent just reading these forums, I possibly am already paid to post :D

  • Boxer (unregistered) in reply to DaveK
    DaveK:
    mrprogguy:
    Most serious, professional posters do. Apparently you zeroed out your temporary memory before starting your reply.

    Bwaaaaah! "Professional posters"? Is that the current euphemism for "people who spend all day on the 'net because they have no life"? Sure, I may misinterpret or misunderstand things or just be plain wrong sometimes. But I'd really rather be any/and/or/every kind of wrong forever than fall under the impression that being a "professional poster" was somehow a worthwhile way of life...

    Damn... I am not professional enough, it seems... I replied before reading the rest of the posts...
  • ludus (unregistered) in reply to Brian
    Brian:
    From Wikipedia: http://en.wikipedia.org/wiki/Computer_forensics

    "RAM can be analyzed for prior content after power loss. Although as production methods become cleaner the impurities used to indicate a particular cell's charge prior to power loss are becoming less common. Data held statically in an area of RAM for long periods of time are more likely to be detectable using these methods. The likelihood of such recovery increases as the originally applied voltages, operating temperatures and duration of data storage increases. Holding unpowered RAM below − 60 °C will help preserve the residual data by an order of magnitude, thus improving the chances of successful recovery."

    RAM and hard disks can both absolutely be examined even after powering down, and there are plenty of tools that would allow someone to examine RAM contents while powered up.

    You have to put this in the context of a cracker that makes a living doing basically nothing but breaking copyright protections on software. He/she can probably get the hardware required to do these things.

    OTOH, if the hardware is compromised, you may as well throw all hope of security out the window from the get-go. Not much will be secure then. And even if it's a case where someone managed to get access via a network, a cracker having access to the machine can find out way more from other things than what may have been in the RAM a second ago. Such as grabbing the whole damn server software and analyzing it back home. ;)

  • Stefan W. (unregistered)

    Haha, indeed very funny!

    How many layers do you recover that way? What's the cost for, let's say, 1 MB? How long will it take to find all layers of a 1 MB section?

    Even in Gutmann's time, restoring drives didn't happen. Just a few bytes were recovered.

    However - a clever idiot would write random patterns multiple times (7 times or 32 times, depending on his idiocy) to the disk before using it. Later on he only needs to overwrite it once per file, because the secret memory of the disk is already filled with garbage. :)

  • john (unregistered) in reply to clively
    clively:
    The only real way to make sure the data is not recoverable is to physically destroy the drive. However, you can make it near impossible by filling the drive with random data over multiple passes; 32 is the NSA accepted number.

    If you don't believe me, just call a data recovery company. Tell them you formatted the drive and see if they can recover the data. They will tell you yes, for a fee. About the only time they can't recover all of the data is if a drive head crashed into the platters; but they can usually get some of it depending on the physical damage.

    You're dead wrong. I've personally done exactly that.

    I work in IT security, and a co-worker attended a forensics class along with some FBI members. They got talking about data recovery etc, and said even they do not possess technology to retrieve data once overwritten even just once.

    I'd heard that pro shops could recover it using magnetic force scanning tunneling microscopy (STM), and we bet a lunch on whether or not they'd be able to. I thought they would.

    So, after about 30 calls to leading data recovery experts, NOT ONE could recover data from a drive overwritten with zeros even just once, regardless of how much I wanted to pay them.

    So, in summary, you're wrong, like I was.

  • 10gauge (unregistered)

    Actually most optimizing compilers would strip that crap out. Using SecureZeroMemory() or:

    #pragma optimize("", off)
    memset(data, 0, sizeof(data));
    #pragma optimize("", on)

    would be a better choice. Not sure why clearing the memory multiple times would make a difference; it looks retarded to me.
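    For what it's worth, a rough portable alternative (my own sketch; SecureZeroMemory is Windows-specific and C11's memset_s is optional): calling memset through a volatile function pointer keeps most compilers from throwing the zeroing away as a dead store, though it is not an absolute guarantee.

    #include <string.h>

    /* A volatile function pointer the optimizer can't see through, so the
       memset below isn't eliminated as a dead store. */
    static void *(*const volatile secure_memset)(void *, int, size_t) = memset;

    void zero_key(void *buf, size_t len)
    {
        secure_memset(buf, 0, len);
    }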

  • Kaenneth (unregistered)

    If it were possible to recover data 32 writes deep on a hard drive platter, wouldn't you think some storage companies would use those extra 'layers' to store more user data?

    As for encrypting IP addresses, with the envelope principle, if you encrypt everything, then it's not obvious which pieces are important.

  • Alex Girgoriev (unregistered) in reply to Tibit

    [quote user="Tibit] It's not as easy, but the results can be in fact stunning.

    Here's what one can do, in a nutshell (assuming that the drive as submitted is operational).

    First, drive's channels (all of them) are fully characterized. Like so:

    Usually one in the business will know how a drive chews the data before writing it - drives don't write the raw, they write the data after it passes through a scrambling algorithm.

    So, knowing what bits get written for which data, one characterizes how bits really look on the medium due to temperature, variations in the electronics, and in the medium itself. All of this gets converted into statistical measures, so that for any bit you will subsequently try to recover you'll have a good idea as to how it would have looked when written over a clean medium. You also characterize how the medium behaves when overwritten multiple times - how the old values decay, etc. Again, it all parametrizes a statistical model of the whole drive system. A lot of theory (and highly paid experts) goes into this phase. Each company will have (one or more) in-house developed analysis systems based on the theory and experience so far.

    What happens then is that you read the drive in multiple fashions. First of all, you do a read using drive's read system (duh): heads, read amplifiers and descramblers, servo loops. Everything is digitized in realtime, and generates usually 20-100:1 data expansion (after losslessly compressing!). A one gigabyte drive will blow up to 0.1 terabyte as a rule of thumb. This is done just so that the "obvious" are kept on record: how drive electronics performed on the data.

    At this point you're free to run the drive via it's regular IDE/SCSI controller, your digitized raw signals must demodulate to the same data that the controller feeds you. At this point you better knew everything there's to know about how that drive implements scrambling, error correction, its own data structures for management (thermal, servo, reallocation, flux gains) etc. I.e. everything that you have read so far must be fully explainable.

    Then some test writes are performed in the "clean" areas where previous data is known not to be valuable. First cylinder of the disk is a place which gets written to quite rarely: boot record, partition table, etc - pretty dormant places. Again, everything that leaves or enters drive's electronics is digitized. This is done so that drive's write channels can be characterized, including the servo trends of course. Depending on given lab's modus operandi, and whether the platters are to be preserved, you either have to give up any subsequent writing at this point, or feel free to write after you return hopefully intact platters from the scanners (below).

    This finishes characterization of the drive's subsystem. At this point you rip the platters out and put them on a scanning machine. You typically do a non-contact scan using something like spin-polarized SEM. Then you do a "contact" scan on a (typically in-house-made) machine which uses custom MR heads to get another scan of the drive's platter. Quite often the scans will be repeated after some processing of data shows that this or that can be tweaked.

    At this point you may have a few terabytes of compressed data per every gigabyte on the drive, and have spent thousands of dollars worth of amortization alone on your equipment. This is where the data goes to processing, which is necessarily mostly automated. Hand-holding is done to align the track models with actual scans, such that the analyzers can simulate various "heads" that scan the "tracks". This initial alignment is really about accounting for each drive's uniqeness and is pretty non-critical: one the analysis system locks in on the magnetic patterns, it's supposed to know what to do next. Remember: the amound of data is vast and it's wholly infeasible to do much by hand at this point.

    Depending on how good the data is after this analysis, you may need to improve on the resolution of "old" data.

    This is where you use another machine to very very carefully remove a very tiny layer of magnetic material from the platter. I don't know if some electrochemical methods exist for that. Sometimes you will do local annealing to try and kill off the masking effect of strong domains, this is done again using a custom, one-off tool which will very rapidly and locally heat the domain. This is of course an automated process: the platter's most magnetized domains are annealed under feedback from a scanner (say spin-sem or what's your favourite). Sometimes whole platter will be annealed under a "selective" process where annealing strength and depth depends on the data (or whichever trend is the most favourable at any given part of the platter). The analysis tools will quite often resort to (dis)proving statistical hypotheses in order to "peel" each layer of data, and as such they may try to "push" the remnants even in the wrong way.

    In this whole process, one must factor in tons of common "disk scrubbing" tools, some of which write very repeatable patterns. This is where some knowledge about what the "customer" could have potentially done to the drive helps. But in general, the whole process doesn't care diddly squat about what's actually on the drive. For every "location", a history of data is provided, "annotated" by lots of statistical data such as to further guide recovery based on data interpretation.

    Data interpretation is all about what the data means: if a given "layer" of data can be somehow known (i.e. the oftentimes noisy result of analysis can be cleaned up), it's easier to get clean(er) underlying data. This is where you'll actually use software that knows what filesystems are, etc. But at this point the drive is either back to the customer, or has been already pulverized. It goes without saying (I hope) that your systems include, among others, a pretty much complete database of every published binary (or otherwise) file of every common (and less common) software package out there [this really applies to commercial stuff, less to the blow-yer-bandwidth-to-pieces plethora of free software].

    I.e. if someone had some good data on the drive, and then say a defragmentation tool puts some Adobe Photoshop build whatnot help file over the spot, you better knew what the noise-free version of the file is. Implementation of such database is non trivial in itself (an understatement). The task is this: given a bunch potentially noisy 512 (or whatnot) byte sectors, find what OSes/applications were installed on that drive, and which file each sector corresponds to. This allows "staticalizing" a lot of noisy data, which helps in looking "under it", or for long-time-resident data which is not moved much be defragmenters, even classifying a part of the medium as COTS. For any particular customer, if they can provide you with some files/emails the user has been playing wiht, it helps as your COTS span grows beyond the mere minimum.

    I have really spared the details, which are actually mind-boggling. The infrastructure needed to do such analysis is, um, not for the faint of heart, and whether gov't or private, you need lots of funding to have a lab like that. And lots of talent, on all levels, from techs to the ivory tower theoreticians (?).

    From what I know, the whole process to this point takes tens of kilobucks and that's when the lab runs at full steam. Meaning that you churn a drive a day, and that's really pushing it. Understandably enough, a very non-WTFy IT infrastructure is crucial in processing all this data. Bandwidth-wise, it's all comparable to running a particle detector in an accelerator. Do understand that you'll want to run all the equipment non stop, preferably on customer units.

    When the equipment is not otherwise down for maintenance, you'll probably want to scan known good hard drives that have hit the market, to populate your characterization database (both for electronics and for the medium itself).

    Craaazy indeed.[/quote]

    OK, consider this:

    A modern hard drive has a track pitch of less than 100 nm, a bit length of less than 20 nm, and a magnetic coating thinner than 20 nm. It doesn't use your dad's MFM or RLL, nor a binder-based coating. Read pulses of separate fluxes are superimposed. To decode data, the analogue signal is matched against the expected response of various patterns, using approaches similar to analog modems. This is called PRML - Partial Response, Maximum Likelihood. The bit error rate is pretty high, and to manage that, it uses quite sophisticated Reed-Solomon codes. That said, it's quite a miracle that such tiny bits can be read at all at such high speed. If you simply overwrite the original data with just anything, 1) it will go well below random noise, and 2) you won't be able to subtract the new data, because you cannot be sure what to subtract.

    Regarding data retention in DRAM: if it loses power, it will keep its charge longer than it would under power without refresh. That said, I observed a device which kept much of its DRAM contents while unplugged from USB power for more than 20 seconds. That caused problems for its power-cycle detection in the firmware and host software. The software was not sure if it needed to reload the firmware. RAM burn-in, of course, is just steer manure.

  • Andrew (unregistered)

    Who's out in space intercepting and jamming the radio signals that the CPU gave off as it processed the encrypted data?

  • (cs) in reply to Happy Dude
    Happy Dude:
    Martin Gomel:
    Dear reader:

    My name is Martin Gomel, and I am the lead developer for "Some Technology". If you are interested in purchasing "Some Technology" for elite hacking into game server code, please send a certified check for $1000 to:

    c/o Marin Gomel Some Technology Enterprises 245 Thisg'uy Wil Lbeliev Eanyt Hing PO Box 1200 Nigeria

    All softwares are garentee virusus free, or some monies back.

    Greetings, friends. Do you wish to look as happy as me? Well, you've got the power inside you right now. So use it and send one dollar to

    Happy Dude 742 Evergreen Terrace Springfield.

    Don't delay. Eternal happiness is just a dollar away.

    One dollar for eternal happiness. Mmmm... I'd be happier with the dollar.

  • Babbage (unregistered) in reply to Zygo

    Err.....

    A couple of things here... First of all, something I haven't seen mentioned here yet: no optimizing compiler is going to look at code like that (even IF it did work) and allow it to stay equivalent to how it's been presented by the coder. If the compiler sees the net result will always be the same, to hell with your code, it'll forgive you for being human and build the binary according to the best-form shortcut.

    Overwriting memory is pointless. Memory doesn't work that way. Now, if the page in memory that the key resides in happens to make it to swap at some point, that's different, but we're talking about two mosquitoes colliding in the Grand Canyon at that point.

    A true WTF.

  • (cs)

    I disagree that these routines are blazingly fast.

    They are still consuming precious instructions to do the initial compare even though it will always evaluate to false and the loop will not execute :)

  • disco (unregistered) in reply to bdew
    bdew:
    Nutmeg Programmer:
    To begin with, most people today are running in a virtual machine

    Just imagine that: Neo takes the red pill and escapes the Matrix just to find that it is running in a VM and he's now in the host matrix 0.o

    Haha! I think that's what happened in the third movie and I defy anyone to repudiate this claim because I don't think anyone was really that sure at the time of production.

