Wednesday, March 21, 2012

Image Data - Where should it be stored

Been pondering the idea of putting 43 - 56K Pdf documents into a SQL DB but everything I've read only goes as far as to explain how the data is stored or manipulated.

I need to establish whether there are real performance issues or gains that will be experienced by doing this.

If indeed, performance would be a big issue, would storing the files in a Folder on another drive outside the Database be all that much faster (Let windows handle the fetch and carry instead of SQL Server).

The plan at the moment is to store all the image data in a separate data file or multiple data files. Using partitioned views is also a consideration.

Number of records per day (pdf's) would be somewhere around a 1000.You also need to think through the archiving requirements/possibilities. No matter where you end up storing it, no drive will be big enough aft certain period of time.

If Image field vs. physical file are your only choices, I'd go with the first one. Your documents will be safer there. Consider SQLLiteSpeed as your backup tool, so that you can still perform your backups successfully when the database size grows to a considerable number.|||My vote...

Store them on the server and store the path in the database...keep each as separate files...

MOO|||Originally posted by rdjabarov
You also need to think through the archiving requirements/possibilities. No matter where you end up storing it, no drive will be big enough aft certain period of time.

If Image field vs. physical file are your only choices, I'd go with the first one. Your documents will be safer there. Consider SQLLiteSpeed as your backup tool, so that you can still perform your backups successfully when the database size grows to a considerable number.

Agreed that doc's might be safer inside DB but the real issue is where long term performance will be the best. In or out?

Archiving or even removal of the pdf's after a set time is definitely a must and SQL Litespeed would be the only option once the DB gets bigger.

Maybe I should just put a few hundred thousand of these things into a test DB and see how it goes and then do the same with the Filesystem Option.

Thanks|||Originally posted by Brett Kaiser
My vote...

Store them on the server and store the path in the database...keep each as separate files...

MOO

We've used this approach on another project and so far there haven't been any complaints or problems. Data size for the images on that one is at around 11 Gigs.

Thanks|||Brett,

I highly respect your opinion, but have to force myself to disagree with you this time.

Having docs stored on the file system, while having their paths stored in a table(-s) not simplifies but rather complicates your application architecture. As per performance, - it will not be any different whether you have the actual document stored in the image field, or its path in a varchar/nvarchar field. In fact, in case you choose the latter, your performance WILL be worse because of the mere presence of that nvarchar field. Keep in mind, that field may have to be as big as 4000 in case with nvarchar (which will translate to the same size as varchar(8000)) because while the file name cannot exceed 128 characters, the total path may and possibly will.

Also, I don't think anybody would do any mass operation on that table that would involve the Image field, so, again, the performance will actually be better.|||I bow to your graciousness...

Now to be fair, I've played with porting images in and out, and found it to be painful (MOO to that)

Worked with a Contenet mangament app once and they stored all their images outside the db

Then there's Bills Articel

http://www.sqlteam.com/item.asp?ItemID=986

And probably many more...

EDIT:

And a more balanced opinion

http://www.aspfaqs.com/aspfaqs/ShowFAQ.asp?FAQID=172|||1000 PDF a day?!!!!! That's a fair number. Where I work, we image all the companies paper. SQL tables get large very fast, as we image several thousand pieces of paper a day. My whole point to this, if you can afford it, why not look into either a small, introductory imaging system (Kodax/kofax come to mind) or a small SAN?

Our setup consists of 5 scanning stations, 3 quality control stations and 2 release stations. Part of the business consists of scanning pages, creating pdf's and putting those pdf's on line, to be viewed over the intranet. The process is fast and easy. We keep pointers to scanned images in approx 100-125 SQL tables (Each business unit gets it's own folder. Images are stored as bin files and written to optical media on a continious basis.) We have approx 100-125 users that view documents from a web application.

Originally posted by glyve
Been pondering the idea of putting 43 - 56K Pdf documents into a SQL DB but everything I've read only goes as far as to explain how the data is stored or manipulated.

I need to establish whether there are real performance issues or gains that will be experienced by doing this.

If indeed, performance would be a big issue, would storing the files in a Folder on another drive outside the Database be all that much faster (Let windows handle the fetch and carry instead of SQL Server).

The plan at the moment is to store all the image data in a separate data file or multiple data files. Using partitioned views is also a consideration.

Number of records per day (pdf's) would be somewhere around a 1000.|||Hmm. Let's see what that recipe was...add one can gasoline...light match..

I am actually a bit split on this idea. I can afford that position, since I have not had to deal with any of these systems to a large extent.

Microsoft (in their finite wisdom) believes that files should be kept in the database. At least when it comes to Sharepoint Team Services. They claim it is easier to maintain one copy of the document, and have twnety server serve it up, than it is to have twenty webservers with the same document on it. This overlooks one small point, of course. Just how much network traffic is there on that one poor network card, if twenty servers are all clamoring for this one document? Also, Sharepoint is geared more for editing documents, so they do have a point there. Easier to maintain one copy of each living document, than try to keep a bunch of other servers up to date on even 1000 documents.

Now, are these documents of yours static? Then you can drop a pile of network traffic, because the problem of keeping a server farm up to date becomes trivial (or left as an excercise for the interested reader).

This is all fine and well, of course, if you are "swinging for the fence" (hey, baseball season starts in 47 days). If you are planning on only one web server and one DB server, you can go either way. Depending on your expected amount of traffic, scalability requirements, size of the disk on the web server, colour of your bosses toupee, etc..

I do not know of any reason why you would need (or want) to keep paths as nvarchar. Also, I have never seen a path of 4,000 characters. I assume rdjabarov has been bit bad by one of these, though. Main thing to keep in mind at the design time will probably be how in heck are you going to organize all of these documents?

One last thing to note. Microsoft put out the "TerraServer" way back in 98 or so. It was a terabyte database running on SQL 7.0. Naturally, it was all image data. I have no idea how much maintenance it required, but it was running everytime I went to see the website...at least, so they said ;-)

Opinions expressed above may not reflect those of the owners/operators of this website, or indeed anything coming close to being rational.|||I indeed may sound very pationate about the whole idea, but I not only experimented with both methods, but also had to switch from file-based to database-based mid-way through development (half of the app was already in prod) And it was done not on 2K or 7.0, but in 6.5, with VB client and RDO as a "data truck." And we too were storing images, TIFF scans of HCFA-1500 and UB-92 claims, along with all the values coming from them into corresponding tables. The volume exceeded 1000 total claims a day, but the membership was growing rapidly (rigfht up untill the company went broke, and not because of database-based image storage :))

Advantages: no need to search on image field, and no affect on performance; If your sysadmin decides to move folders around (except for data and log-related folders) you are not affected; if the whole server goes CAPUT and the only thing that is left in your hands is a last-night tape backup, - you lost NOTHING!!! Contrary to "path or URL" as suggested by one of the articles mentioned by Brett (cheers ;)), where not only you need a db backup, but also the folder(-s) backup in order to claim you're back in business.|||I feel your pain. Only database corruption I ever faced (outside of a disk controller going screwy) was in image data in SQL 6.5. Although, the fool application was attempting to use the image data as an encrypted jumble of all sorts of order information, so it was constantly updating the image data.

1000 claims per day does not sound too bad, though. For an 8 hour day, that comes out to 125 per hour...carry the one...2 per minute or so. Naturally, you would have gusts in there. Those users are never so polite as to put these things in in an orderly and uniform rate, eh? How did the application perform?|||Thanks People. Life seems a bit less obscure now.

BTW the documents will all be read only.|||It performed OK, the images were retrieved only when there was a dispute with a provider, or for case management purposes. Since the entire claim information was stored into appropriate table structure, there was never a need to even touch the original image, or do anything else with it. Purging was designed for a 2-year window back, after which the images were placed on microfilm, and the data was stored in an aggregate form of lag, MLR, Encounter summary, and other analytical reports.|||Originally posted by MCrowley
This is all fine and well, of course, if you are "swinging for the fence" (hey, baseball season starts in 47 days). .

Gotta be a sox fan....

They count the days you see......

Thank god for their football team, or the entire state would be in therapy...

:D

(Sorry Mcrowley, had to get a zinger in their)

And as far as the topic goes...I've only seen one system deliver content from a blob...and that was home grown and on Oracle (major internal differences).

Content management, Kodak imagining systems ( now weren't those platters expensive!), Self service apps, Peoplesodt (has the ability to store images)...and every implementation stored them on the server...

As to the backup question, sure it's not in the db, just another part of the nightly back up process..

Hey WHATEVER works for you...|||That is quite all right, Brett. I have had some fun at others' expense here, as well ;-).

Besides, THIS year, is THE YEAR...just like last year........and the year before that.........and uh, well, the year before that... ;-)|||Originally posted by MCrowley
That is quite all right, Brett. I have had some fun at others' expense here, as well ;-).

Besides, THIS year, is THE YEAR...just like last year........and the year before that.........and uh, well, the year before that... ;-)

LOL

Can you say A-ROD?|||Never store any image or pdf in your database just the linking name to where it is on ya server.

Simple and no hassle todo!|||...here we go again...OK, I give up. As a matter of fact the concept that I described earlier helped us pass TDI audit without a hickup, which would have occurred, and in fact did occur with other Texas plans at that time. They DID store original documents as files, and they could not prove that information there stayed intact. The only thing they could do was to see "who modified the file last?" That answer was not enough for TDI, and those plans had to acquire additional, and at that time very expensive add-ons to their OS to prevent external, non-application-initiated access to those documents.

So there you have it, do as you please ;)|||Originally posted by rdjabarov
So there you have it, do as you please ;)

It's like the Indiana Jones movie...

"Go ahead, blow up the ark...Do as you wish...."

and remeber...we are merley passing through History...but SQL server...is History...(well maybe someday)

:D

(And if Oracle has anything to say about it....)

:eek:|||LOL

Can you say A-ROD?

I can say it...I just don't want to ;-)

And the obligatory thread related answer type comment:

It depends.

Cons:
single point of failure.
concentrate network traffic on one server
potentially HUGE database
others?

Pros:
Security
Data is from a single source
No worries about synchronizing paths vs files
others?

No comments:

Post a Comment