Disks Not Configured in FreeNAS 9.1 Release

Status
Not open for further replies.

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok, it sounds like Fln & VT have a fairly good idea of what they are doing. The really important thing is to make sure you are using physical RDMs and not virtual ones. Virtual is indeed the kiss of death. Physical ones are fine, and there is no reason to fear for your data if your drives are passing in unique serial #s, which I saw someone verify was the case for at least one of the installs. The only real danger is that VMware might remove or break the function in 6.0 or beyond, since it's not exactly a supported method in their eyes; RDM was designed for passing through LUNs from Fibre Channel SANs. See below for more comments.
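For anyone who wants to sanity-check that on their own box, here's a rough way to do it from inside the FreeNAS guest (the da0/da1/da2 device names are just examples; substitute whatever your VM actually shows):

  # List the disks the guest actually sees
  camcontrol devlist

  # Print the serial number each virtual disk reports.
  # Every disk should show a DIFFERENT serial; if several report the same
  # (or an empty) serial, the RDMs are not passing the drives through properly.
  for d in da0 da1 da2; do
    echo -n "$d: "
    camcontrol inquiry $d -S
  done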

So go find the threads where people had a RAIDZ2, one disk failed, so they replaced it, and suddenly 3 other disks are "CORRUPT" and their zpool is unmountable. Because that's exactly what has happened to quite a few people that have tried to do RDM and then showed up in the forum to tell us mods and the VM gods that we're morons because it works great. This is the principal reason why RDMs are so strongly advised against. It sounds great, and it works great, until it doesn't.

Also, it looks like the root cause has been tracked down to a known FreeBSD bug. I'd suggest whoever wants to champion this issue go ahead and register at http://support.freenas.org and create a ticket with the important details posted in this thread. Several of the iX guys are also FreeBSD developers and can possibly poke whoever is responsible for the broken code. I'd then monitor FreeBSD and watch for the bug fix; it would be nice if they'd fix it before 9.2 ships (which is very soon). If you can find when they patch FreeBSD and what the MFC # is for the patch, you can probably then get the FreeNAS guys to pull it in for the next FreeNAS release. In the meantime, stick to the 8.3.x series until this gets straightened out. Also, please use LSI SCSI or SAS (I'd probably try SAS, though SCSI might be a safer choice) as your controller type; BusLogic might work, but that's also looking for trouble down the road.

*cough*, those people that I mentioned above that had RAIDZ2s that failed... they were using 8.x... So don't tell me that after this 9.x bug is fixed all will be better. Remember, some of us mods read virtually every thread created, so we see what people do wrong regularly and what people do right. And let me tell you from personal experience, those people that lose data really stick in your mind. We always want to know whether it was stupidity (like a 10-drive striped zpool), neglect (turned it on and never looked at it again), or an actual issue that needs to be fixed.

It's been well hashed out now that RDM isn't the best way to fly but some days it is the only way to fly.

Yeah, if you plan to fly with your gas tank already on "E", then sure, go ahead. But what goes up will come down... in a ball of fire.

That said, I've successfully run a production server using RDM physical disks (it just holds my backups, so I'm not losing sleep at night - but the backups do put some real stress on the VM & FreeNAS during the backup session). It wasn't my first choice, but I had no other choice. The server had an Adaptec RAID controller that I could pass through, but it had issues and it was not in my control to replace it with a desired LSI card. Using an Adaptec controller with FreeBSD is probably worse than RDM pass-through if you love your data (if you don't believe me, read the driver source code for the Adaptec controller). Eventually I was able to build myself an Adaptec driver from source that worked OK and I switched from RDM to using that, just because I feared someday VMware would nuke RDM - that, and I like being able to bring up my ESXi boxes from a fresh install without spending a bunch of time in the CLI building RDM files to bring my SAN online.

Well, good for you. And unfortunately, you're that person I mentioned above who shows up, says it's worked great for so long, but you'll eventually lose everything. And then you'll be back in this thread advocating against it like most RDM-ers have done. You'll go back and edit your posts to make sure everyone understands that it just doesn't work, and your advice and recommendation will have done a 180.

Notice that STILL none of the VM gods in the forum have posted... But good luck. Gonna unsubscribe from this thread now. I get pissed just listening to the ignorance of some people and how they really think they know better than the guys that have been here since 8.0.
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
Notice that STILL none of the VM gods in the forum have posted... But good luck. Gonna unsubscribe from this thread now. I get pissed just listening to the ignorance of some people and how they really think they know better than the guys that have been here since 8.0.

So I might not be a VM god, but I really doubt there is anyone else on this forum that has pushed as many TBs of data or as many disk transactions through FreeNAS in VMs as I have. That doesn't make me better than any of them; I just happen to be in a position where I can clock lots of data moving through various FreeNAS SANs for one reason or another.

Now back to RDMs and losing all your data: your point about RAIDZ2 arrays going corrupt doesn't mention whether we are talking about physical or virtual RDMs, and there is a bit of a difference between the two. Also, it might simply indicate the need to develop some documentation on how to swap disks in this situation. The first thing that comes to mind is that RDMs would not be hot-swappable, and I'm guessing folks probably tried to pull the failed drive and insert a new one anyway. I'd think the correct procedure would be: power down the VM, remove the RDM from the VM, remove the drive from the physical hardware, insert the new drive, create a new RDM mapping for the new disk, attach it to the VM, and power it up. I'd be truly stunned if this procedure corrupted a pool. I'd also suspect that the zpool was corrupted before the drive swap and the swap just brought the corruption to light. I bet many of these ZFS pool deaths were not because the pool was in a VM and would have happened in the physical world just the same. Folks who just power off VMs probably run physical NAS boxes without UPSes too.
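For what it's worth, a rough sketch of the ZFS side of that swap from the FreeNAS shell - the pool name "tank" and the gptid label are placeholders, and the GUI's volume status / replace screen does the same thing underneath:

  # Identify the faulted member before touching anything
  zpool status tank

  # Take the failed disk offline (the label here is a placeholder)
  zpool offline tank gptid/xxxxxxxx-failed-disk

  # ... power down the VM, remove the old RDM, swap the physical drive,
  # ... create/attach an RDM for the new disk, power the VM back up ...

  # Resilver onto the new disk and watch the progress
  zpool replace tank gptid/xxxxxxxx-failed-disk da3
  zpool status tank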

Finally, the word is: VM or physical, one should set up and run regular ZFS scrubs to detect and repair corruption, use a UPS, and have automated monitoring and shutdown configured.
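In FreeNAS the GUI covers all of that (scheduled scrubs, SMART tests, email alerts, the UPS service), but the bare-bones shell equivalent is roughly this (the pool name "tank" is a placeholder):

  # Kick off a scrub manually (or schedule one every week or two)
  zpool scrub tank

  # One-line health check, handy in a monitoring script:
  # prints "all pools are healthy" when nothing is wrong
  zpool status -x

  # Review the result/progress of the last scrub
  zpool status tank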
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I don't know about that. We have someone that builds ESXi servers for a living. So he probably has you beat on TB of data in VMs. :P

One guy that was here had something like 5+ years of ESXi experience. He gives training lectures on ESXi implementation, and he couldn't understand why RDMs seem to break. He traced the RDM back to the serial number of the disk and did the replacement, but still ended up with multiple "corrupt" disks in his RAIDZ2. He really didn't understand where things went wrong, only that he couldn't possibly have done anything to "damage" the other drives.

One thing that someone said a while back (not sure if I'm getting it all right, though) is that RDM doesn't map correctly with local storage regardless of how you do it. The implementation isn't quite correct, and disk reads and writes to/from disk X can suddenly be redirected to disk Y without warning, with the VM running. Then all you have done is trash more disks. It was hypothesized that some people have seen that issue, but there is little way to prove it.
 

FlynnVT

Dabbler
Joined
Aug 12, 2013
Messages
36
The only reason I posted was that I was hoping to provide some knowledge that was more than "just don't do that... it's stupid". But instead I got an attitude over it, hence I chose to leave the thread.

Gonna unsubscribe from this thread now. I get pissed just listening to the ignorance of some people and how they really think they know better than the guys that have been here since 8.0.

@cyberjock: I'll keep this short out of respect for your helpful posts elsewhere and aspects of the role you fulfil on the Forum. On the last page you mentioned that you once operated a nuclear reactor? You'll understand the function of a Moderator in that context. In stark contrast, your routine here seems almost designed to make things flare up. There's no need for that kind of behaviour here.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Nice one with the moderator. ;) If only Chernobyl understood moderators a little better...

Anyway, anyone that shows up and tells you that you're doing something that's not going to work out right, despite the overwhelming body of evidence supporting his side, is still going to be taken as trying to "flare things up". People don't like being told "no". And IT people REALLY don't like being told "you can't do that". That's one reason why I avoided working in IT. Too many people will gladly swear up and down that something can be done and that all the warnings, error messages, etc. should be ignored, until they figure out for themselves the technological reasoning behind them. I don't have the technological reasoning to provide; nobody does. We just know that when it goes bad, it goes bad without warning and without a chance of recovery. You don't know how many threads I've read with the exact same discussion about ESXi. I used to sit and watch, and I did for about 8 months. But I couldn't sit on the sidelines forever, because I got tired of seeing so many people lose data. I hate those kinds of threads. Threads from people who lost their data using ESXi with RDM used to be a weekly thing.

Not sure how much forum reading you do, but I've cut back on my postings by a boatload recently. With the release of 9.1, the number of ignorant people that don't want to read the manual, read the stickies, or even accept some of the realities that others have dealt with for more than 2 years is amazing. And I thought it was pretty bad pre-9.1. If I had known that 9.1 would bring so many ignorant people to FreeNAS, I would have quit the forum months ago. It may be time for me to "retire". It's sad to see where things are going and to know that it's not for the better. I know I'm not the only moderator that's cut back lately - 2 others already have; I'm just a little slow and wasn't willing to accept reality.
 

Thomas

Dabbler
Joined
Jun 30, 2013
Messages
29
Thank you Cyberjock! And to all you guys: I got on board the VM+FreeNAS Express before there were obvious warnings everywhere, so please spare me the "blame it on yourself" talk. :) Reading this thread I got some ideas though:

- FlynnVT: do you have any more information on VT-d? My hardware should be VT-d capable and I might use it to access the disks without RDM, thus maybe recovering my data...
- I am on 8.3.1 now, I'll try with 9.1. Who knows...
- I might run FreeNAS live besides ESXi (4.1) and try to access the disks.

UPDATE: Is there a "Live" option anyway? I can't seem to find it at first glance.
UPDATE2: All options failed. Looking forward to your input.. :(
 

budmannxx

Contributor
Joined
Sep 7, 2011
Messages
120
We need a new section under Help & Support called "Virtualization" with cyberjock's and the manual's warnings as stickies at the top. Then the VM people could have their own place to discuss issues while the other sections wouldn't be affected.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I wasn't even going to respond to your comment (nothing personal). Virtualizing is a very hot topic because people don't take "it's stupid to do this" for an answer, and it only adds more tension to the forums. But here's the unedited, no-sugar-coating answer...

I actually proposed this idea about a month ago. The consensus among the few mods that discussed it was that it just wasn't a good idea.

Here's the reasons why:

1. The FreeNAS forums provide instructions for running in a VM, but they don't specify a setup for production use. The guidance says something like "for testing, experimenting, and learning". Everyone seems to assume that because there's a section of the manual devoted to using FreeNAS in a VM, it MUST be safe to use in a VM. (See how they jumped to the conclusion? Keep reading...)
2. As a partial extension from #1, if a section of the forum was created that was for virtualization questions, then everyone would also make the logical jump that it MUST be safe to use in a VM.
3. When things go wrong with virtualization, it's never been the fault of FreeNAS. At least, it hasn't been in the 18 months or so I've been on the forums. It usually has to do with the hypervisor that was used (both type 1 and type 2). As such, hypervisor support isn't something that should be provided in these forums; you really should be going back to your hypervisor's developers for support. After all, FreeNAS is just FreeBSD with a pretty UI. This forum isn't the place to discuss virtualization issues; the hypervisor vendor's forum is - they're the ones that will have to implement the fix. Just about any issue you have with FreeNAS is almost certainly also a problem with FreeBSD, and there's nothing that FreeNAS does that you couldn't do yourself from the command line. So why wouldn't you take your issue to the hypervisor's forum and get your solution there?
4. Lots of people that claim to have years of virtualization experience have lost their data due to virtualizing. They have no clue what went wrong and no idea how (or if) it can be fixed. It really is a gamble. People have done everything right and still lost data with no explanation or understanding of what went wrong. So should we really be inviting FreeNAS users to virtualize when we know it isn't something that should be done by the vast majority of users? Not to mention that everyone who shows up in this forum always thinks they know better and can handle it, because everyone is always smarter than everyone else. The reality is that most people here are average (that is, after all, the definition of average). And the average person should not be attempting to virtualize.
5. Virtualizing makes lots of promises. It really does. It promises to save you lots of $$$ on your car insurance... err, hardware and software; that you'll be able to make your system multipurpose for virtually nothing; that it won't rain when you want to grill steaks at the park; that it'll save your marriage, take your dog for a walk every day, and keep your data safe. Nobody wants to think about how ugly things can get, though. The reality is that virtualizing can cost you a lot more than just $$$: it will cost you your marriage when you lose your wedding pictures, it'll kick your dog repeatedly behind your back, and it won't keep your data safe. Do you know how many people even understand what type 1 and type 2 hypervisors are and what the difference is? Whatever percentage you guess, the real number is probably half that, at best. People really don't know what virtualizing is. But as soon as they see the promises, they want in! Ignorance is bliss.
6. At least 80% (and probably more like 95%) of all of the forum users do NOT back up their data. So after considering numbers 1 to 5, do we really want to give any hint that virtualizing is in any way recommended or endorsed?
7. Guess what happens when %NewbieFreeNASuser% loses their data because they were part of that 95%+? They show up in here complaining about their lost data, the fact that they had no idea that FreeNAS was so unreliable and untrustworthy, that the forum failed to help them recover their data, that life sucks, their wife wants a divorce now, their dog died, etc. Remember that FreeNAS is sold by iXsystems as TrueNAS. Whether people lost data because they did something less than smart doesn't matter; what does matter is that those are people who will tell all their friends how much FreeNAS sucks. Do we really need that? So many people create an account on the forum and have a single post, and that post is "I lost my data because FreeNAS sucks". We don't want it and don't need it in the forum - it adds zero value. If you want to be stupid and do things that cost you your data, it's totally your choice. But the least us mods can do is try not to give a false impression that VMs are completely safe.

Now, I realize there's a handful of people in the forum that can and do use FreeNAS in a VM. Yes, I am one of those people.. and I keep religious backups of my data too. And I have a problem that I'd LOVE to post on the forum and get answers for. (For my situation I built 2 identical systems, one has the issue and one doesn't!) The only reason I did virtualize is because I had one-on-one support with our virtualizing master, I keep backups, and it allows me to experiment with FreeNAS to provide better support for oddball questions in the forum(which I use regularly).

So how do you separate the people that do know what they are doing from the people that don't? Do we even need to stop and separate people into those groups? The short answer is that we do (see #7), but I can't tell RandomUser#1 from RandomUser#2 and tell you which one should and shouldn't virtualize. Notice that the manual doesn't really say anything about virtualizing FreeNAS in production; it points you to the forum. That was a smart choice and a deliberate one.

Right now, the discussion is about getting rid of some categories instead of adding more. There are so many already, and with all the cross-posting people do, the going off-topic from the original post, the posting in the wrong section, etc., it's just an "all you can eat buffet" now.

So there you have it, plain and simple. It's one of those things in life that sounds great in theory but really doesn't work out in practice. As unfortunate as it sounds, it's the plain and simple truth. Maybe someday, when things work better with virtualizing, it will change. But I'm not counting on that, because ZFS was designed to have total control of the hardware, and you are breaking that expectation anytime you virtualize.
 

budmannxx

Contributor
Joined
Sep 7, 2011
Messages
120
No worries, man. I was hoping to have a place for the VM crowd (which I'm definitely not a part of, at all) to go so they'd stop filling up the other sections. My idea was to put the warnings at the top (including something about it not being supported in production, that people shouldn't expect answers from the experts, etc.) and then let them have at it off to the side.

I doubt we'll ever get rid of cross-posting (or complaining when people don't get answers), but the solution to VM posts in any other section could be to just move the post back to the correct place (the VM section) and be done with it.

I understand it's not as simple as "just add a new section" though. Thanks for the insight into the thought process you and the other mods had.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No problem. I didn't realize how much I wrote until I hit submit. Haha.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Hey look... another week and another RDM issue where the data just disappeared. Looks similar to the previous thread I linked to...

http://forums.freenas.org/threads/lost-zfs-pool-after-upgrading-to-9-1-1.14629/

Is this sinking in yet?

FlynnVT, you can probably help this guy out the same as the last one, assuming it's the same issue. On the surface it appears to be (no partition table, etc.), but as always, no guarantees.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
On a similar note - and since I'm going to continue to document every RDM disk loss I see in the forum whenever I have the time to find this thread - I had someone PM me with a data loss issue.

They had a sudden power loss (no UPS) while using RDM in a single-disk environment. After bootup the pool was fine, except that 2 datasets were unmountable. When mounting the pool you'd get CHKSUM errors under zpool status and an error during bootup to the effect of "Dataset Input/Output error". Note that the server had never had any problems until the sudden power loss (which isn't supposed to trash ZFS with its transactional design, but can and does because ESXi lies about sync writes). Anyway, after 15 hours of TeamViewer the data has been written off as lost. They are now considering going to a data recovery expert to see how likely they are to get their files back and how much it will cost. Currently it looks like the cheapest end is going to be $4k, with potential 5-digit figures if it's not a simple error in the file system's metadata (that was straight from the phone call).
 

Letni

Explorer
Joined
Jan 22, 2012
Messages
63
Guys, the problem is with using "unsupported" RDMs. Most people are using AHCI controllers and have to follow the CLI guide to create VMDK pointer files (by hand, in the CLI) that point back to the physical disks hooked up to their ESXi box. This is ultimately where everyone ends up, as they find the RDM option is GREYED OUT in ESX(i) because it is not supported on that type of controller (onboard SATA/ATA).
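For reference, that CLI hack boils down to something like this on the ESXi host; the device ID, datastore, and file names below are placeholders (-z creates a physical-compatibility RDM, while -r would create the virtual-compatibility kind this thread warns against):

  # On the ESXi host (SSH / tech support mode): find the local disk's identifier
  ls -l /vmfs/devices/disks/

  # Create a physical-compatibility RDM pointer file on a datastore
  vmkfstools -z /vmfs/devices/disks/t10.ATA_____EXAMPLE_DISK_ID \
      /vmfs/volumes/datastore1/freenas/freenas_disk1_rdm.vmdk

  # The resulting .vmdk is then added to the FreeNAS VM as an "existing" disk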

If people here were smart, they would realize that actual SAS HBAs (supported in ESXi as SCSI, not ATA/AHCI; natively support true RDMs, not the CLI hack; you must enable the rdmfilter setting in the ESX advanced options) can be had on eBay for as little as $10.00.

http://www.ebay.com/itm/LSI-Logic-H...sk_Controllers_RAID_Cards&hash=item461188bbd0

It makes even more sense to do this because you can then use the HBA to create an ESXi-supported, RAID-1-protected boot device for ESXi that also holds a VMFS datastore for your FreeNAS VM - something you can't do while booting ESXi from USB.

I still agree that, more than likely, folks are NOT following the most common-sense approach to replacing disks when a drive failure actually occurs, but I'm not going to rule out the possibility that ESXi has issues with these "unsupported RDMs" on ATA/AHCI controllers. Using a hardware controller (even if it is just to hand raw disks to FreeNAS) simply makes too much sense, especially with all this debate about members using virtualized FreeNAS!

And just an FYI, I had a similar problem here last night upgrading my 8.3.1 VM to 9.1.1. I had 2 of the 5 disks in my RAIDZ1 configuration on the onboard AHCI controller, and as soon as I upgraded to 9.1.1 I ran into this SAME issue of 0-size disks on the 2 RDMs attached to the onboard controller (couldn't import my pool). I fiddled around and eventually put all 5 disks on my PCIe SAS HBA, and upon reboot reconfigured the FreeNAS VM settings so it saw those 2 missing disks as normal supported RDMs. Booted the VM and the pool came back online no problem (ran a scrub overnight with no issues), all using supported RDMs on a hardware controller on 9.1.1.
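If anyone else hits the same "couldn't import my pool" symptom after shuffling disks around, it's worth checking what ZFS can actually see before assuming anything is lost. In FreeNAS the GUI auto-import is the proper route, but the underlying commands are roughly this (the pool name "tank" is a placeholder):

  # List pools ZFS can detect that aren't currently imported,
  # along with the state of each member disk
  zpool import

  # If the pool shows up with its members ONLINE, import it
  # (add -f only if it complains the pool was last used by another system)
  zpool import tank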

Go cheap LSI controllers on eBay!

my 2 cents!
 

SnarlingFox

Cadet
Joined
Sep 10, 2013
Messages
1
Hi folks, I'm in the same situation here, attempting to run 9.1 on ESXi 5.1 U1. I realise now this is probably not the wisest of ideas.

Before I continue, @cyberjock thank you for taking the time to explain everything thus far. Your patience is most abundant and greatly appreciated.

As I'm setting up a home server, I was hoping not to run FreeNAS bare-metal as I could ideally be using other resources on that machine. I've now completely dismissed virtualised FreeNAS with RDM and will look towards getting a system that supports VT-d.

PS. I've read through this thread now and one thing seems to stand out: disk syncing. Is this the primary cause of so many corrupted zpools? If so, has anyone tried setting RDMs to "Independent - Persistent" to see if that makes a difference?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
ZFS is transactional, so it shouldn't be corruptible in the fashion that it is. Go figure. :P

As for the setting you mentioned, no. And to be honest, I'd hate to be the guy who tests that setting. It would mean you used RDM, which is just asking for trouble anyway. Aside from the data loss, there are other "a-ha" reasons not to do RDM: in most cases you lose all SMART functions, you lose all SMART testing, and read and write errors are not passed to the VM.
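An easy way to see how much of that is lost on any given setup is just to ask the disk from inside the guest; with RDM these typically fail or come back mostly empty, while with VT-d pass-through of a real controller they work (da0 is an example device name):

  # Does the guest actually get SMART data from this disk?
  smartctl -i /dev/da0          # identity, model, serial
  smartctl -H /dev/da0          # overall health assessment
  smartctl -t short /dev/da0    # try to start a short self-test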
 

reks

Dabbler
Joined
Dec 28, 2012
Messages
23
Hello,
People, all you need to do is change from physical to virtual mapping, of course with lsilogic. At the moment there isn't another solution. It worked for me, but it is VERY dangerous!
 

FlynnVT

Dabbler
Joined
Aug 12, 2013
Messages
36
Thanks Paul! I'm not sure if FreeBSD-9.2-RC2 is available yet, but I just tried the latest 9.2 snapshot (PRERELEASE-amd64-20130811-r254222).

Unfortunately it gives the same result: the discs are enumerated, and the model and serial # read OK, but with a 0-byte media size and 0 heads/sectors/sectorsize. The other symptoms are the same, e.g. "dd if=<dsk> of=/dev/null" gives "Device not configured". Hopefully that CAM change commit you saw simply didn't make it into PR-20130811 and the 9.2 kernel will still fix this. I'll try again soon...
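For anyone else trying to reproduce this, the symptom is easy to confirm from the FreeBSD/FreeNAS console; da1 is just an example device:

  # The disk enumerates and reports a model and serial...
  camcontrol devlist
  camcontrol inquiry da1

  # ...but the media size comes back as 0 bytes
  diskinfo -v /dev/da1

  # ...and any real I/O fails with "Device not configured"
  dd if=/dev/da1 of=/dev/null bs=1m count=1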

Interestingly - though not that it will have any actual effect - changing the guest adapter from LSI Parallel to LSI SAS makes the indicated transfer speed jump from 6.6 to 300 MB/s on both FreeBSD 8 and 9.

Finally, the BusLogic adapter isn't supported for 64-bit guests; it gives an error message when attempting to start the machine. That rules out the second flawed workaround. 8.3.1 it is for now...

[Attached screenshot: ifpu.jpg - guest kernel log showing the enumerated disks]

I've been retrying this every so often as updates to ESXi and FreeBSD/FreeNAS have been released. While both FreeBSD 9.1 and 9.2 failed with ESXi 5.1 physical RDMs, I've just tried FreeBSD-10.0-RELEASE-amd64 and it enumerates everything OK.

(Most recently, FreeNAS-9.2.0-RELEASE-x64 failed with physical RDMs in ESXi 5.1.0 build 1312873)

The new kernel log is essentially the same as the screenshot above, but now shows the correct non-zero size for the HD204 drives. It appears that FreeBSD 10 has rolled back whatever quirk was introduced in 9, and everything works again as it did in 8.
 

FlynnVT

Dabbler
Joined
Aug 12, 2013
Messages
36
Just tried some other combinations:
1) latest FreeNAS-9.2.1-BETA-x64 on latest ESXi 5.1.0 build 1483097: fail with 0 byte disc size
2) FreeBSD-10.0-RELEASE-amd64 also on latest ESXi 5.1.0 build 1483097: pass
 