Article: Inverted tables: an alternative to relational structures [Archive]

رسول_57

Thursday 12 Dey 1392, 10:44 AM

Inverted tables: an alternative to relational structures

Submitted by John Watson on Sun, 2013-09-08 02:52

The inverted table format can deliver fast and flexible query capabilities, but is not widely used. ADABAS is probably the most successful implementation, but how often do you see that nowadays? What follows is a description of how to implement inverted structures within a relational database. All code was run on Oracle Database 12c, release 12.1.0.1.
Consider this table and a few rows that describe the contents of my larder:

create table food(id number,capacity varchar2(10),container varchar2(10),item varchar2(10));
insert into food values(1,'large','bag','potatoes');
insert into food values(2,'small','box','carrots');
insert into food values(3,'medium','tin','peas');
insert into food values(4,'large','box','potatoes');
insert into food values(5,'small','tin','carrots');
insert into food values(6,'medium','bag','peas');
insert into food values(7,'large','tin','potatoes');
insert into food values(8,'small','bag','carrots');
insert into food values(9,'medium','box','peas');

The queries I run against the table might be "how many large boxes have I?" or "give me all the potatoes, I don't care about how they are packed". The idea is that I do not know in advance what columns I will be using in my predicate: it could be any combination. This is a common issue in a data warehouse.
So how do I index the table to satisfy any possible query? Two obvious possibilities:
First, build an index on each column, and the optimizer can perform an index_combine operation on whatever columns happen to be listed in the predicate. But that means indexing every column - and the table might have hundreds of columns. No way can I do that.
Second, build a concatenated index across all the columns: in effect, use an index-organized table (IOT). That will give me range scan access if any of the predicated columns are in the leading edge of the index key, followed by filtering on the rest of the predicate. Or if the predicate does not include the leading column(s), I can get skip scan access and filter. But this is pretty useless, too: performance will diverge wildly depending on the predicate.
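To make the two rejected options concrete, here is a minimal sketch of each against the FOOD table above. The index names and the FOOD_IOT table are hypothetical, introduced only for illustration; they are not part of the original article.

```sql
-- Option 1: one B-tree index per column (hypothetical index names).
create index food_capacity_i on food(capacity);
create index food_container_i on food(container);
create index food_item_i on food(item);

-- The INDEX_COMBINE hint asks the optimizer to consider combining
-- the indexes on whichever columns the predicate names.
select /*+ index_combine(food) */ *
from food
where capacity='large' and container='box';

-- Option 2: every column in the key of an index-organized table
-- (hypothetical table name), so the whole row lives in the index.
create table food_iot(
  capacity varchar2(10),
  container varchar2(10),
  item varchar2(10),
  id number,
  constraint food_iot_pk primary key(capacity,container,item,id)
) organization index;
insert into food_iot select capacity,container,item,id from food;

-- Leading key column in the predicate: a candidate for a range scan,
-- with the ITEM condition applied as a filter.
select * from food_iot where capacity='large' and item='potatoes';

-- Leading key column absent: at best a skip scan, then filter.
select * from food_iot where item='potatoes';
```

With three columns either option is manageable; the objections in the text apply once the table has hundreds of columns.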
The answer is to invert the table:

create table inverted(colname varchar2(10),colvalue varchar2(10),id number);
insert into inverted select 'capacity',capacity,id from food;
insert into inverted select 'container',container,id from food;
insert into inverted select 'item',item,id from food;

Now just one index on each table can satisfy all queries:

create index food_i on food(id);
create index inverted_i on inverted(colname,colvalue);

To retrieve all the large boxes:

http://upload7.ir/images/98582657317013657952.jpg
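
The query itself survives only in the linked image. As a sketch of what such a query could look like (an assumption based on the FOOD and INVERTED tables above, not necessarily the author's exact statement), each predicate becomes a lookup on INVERTED and the lookups are intersected:

```sql
-- Each branch can be resolved through INVERTED_I on (colname,colvalue);
-- INTERSECT keeps only the ids that satisfy both conditions.
select * from food where id in (
  select id from inverted where colname='capacity' and colvalue='large'
  intersect
  select id from inverted where colname='container' and colvalue='box'
);
```

With the nine sample rows, only id 4 is both large and a box.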

Or all the potatoes:

http://upload7.ir/images/99262160008705216812.jpg
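
Again, the original statement is only in the linked image. A single-predicate query might be sketched as follows (an assumption, not the author's exact code):

```sql
-- One lookup on INVERTED, then fetch the matching rows from FOOD by id.
select * from food where id in (
  select id from inverted where colname='item' and colvalue='potatoes'
);
```

With the sample data this returns the rows with ids 1, 4 and 7.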

Of course, consideration needs to be given to handling more complex boolean expressions; maintaining the inversion is going to take resources; and a query generator has to construct the inversion code and re-write the queries. But in principle, this structure can deliver indexed access for unpredictable predicates on any number of any columns, with no separate filtering operation. Can you do that with a normalized star schema? I don't think so.
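One possible way to handle other boolean operators, sketched here as an assumption rather than as the author's design, is to map them onto set operators: OR becomes UNION and AND NOT becomes MINUS, in the same way that AND maps onto INTERSECT.

```sql
-- capacity = 'large' OR container = 'box'
select * from food where id in (
  select id from inverted where colname='capacity' and colvalue='large'
  union
  select id from inverted where colname='container' and colvalue='box'
);

-- item = 'potatoes' AND NOT container = 'tin'
select * from food where id in (
  select id from inverted where colname='item' and colvalue='potatoes'
  minus
  select id from inverted where colname='container' and colvalue='tin'
);
```

A query generator would have to build these set expressions from the user's predicate, which is the re-writing cost mentioned above.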
I hope this little thought experiment has stimulated the little grey cells, and made the point that relational structures are not always optimal for all problems.
--
John Watson
Oracle Certified Master DBA
http://skillbuilders.com

Inverted tables: an alternative to relational structures

The inverted table format can give us fast and flexible query capabilities, but it is rarely used today. The ADABAS database is probably the most successful implementation of the idea so far, but how often do we come across it in everyday work? Below we describe how to implement inverted structures in a relational database. All code was run on Oracle Database 12c, release 12.1.0.1.

Consider the following table and a few rows that represent our data:

create table food(id number,capacity varchar2(10),container varchar2(10),item varchar2(10));
insert into food values(1,'large','bag','potatoes');
insert into food values(2,'small','box','carrots');
insert into food values(3,'medium','tin','peas');
insert into food values(4,'large','box','potatoes');
insert into food values(5,'small','tin','carrots');
insert into food values(6,'medium','bag','peas');
insert into food values(7,'large','tin','potatoes');
insert into food values(8,'small','bag','carrots');
insert into food values(9,'medium','box','peas');

The queries we might run against this table could be "how many large boxes do we have?" or "give me all the potatoes, I don't care how they are packed". The real difficulty is that we do not know in advance which columns our queries will use: it could be any combination of them. This is a common problem in data warehousing.

So how should we index the table to answer any such query? There are two obvious approaches:
The first is to build an index on each column and let the optimizer combine the indexes to answer each query. This is not a good approach, because the table may have a very large number of columns, and the overhead of indexing all of them would be very high.

The second is to build one concatenated index across all the columns, in effect using an IOT (index-organized table), and hope that each query can read the relevant part of that index. But search performance then varies widely depending on which columns appear in the predicate, so this approach is of little use either.

The only option left is to invert the table:

create table inverted(colname varchar2(10),colvalue varchar2(10),id number);
insert into inverted select 'capacity',capacity,id from food;
insert into inverted select 'container',container,id from food;
insert into inverted select 'item',item,id from food;

Now just one index on each table can answer all our queries:

create index food_i on food(id);
create index inverted_i on inverted(colname,colvalue);

To find all the large boxes, we can use the following statement:

http://upload7.ir/images/98582657317013657952.jpg

Or to find all the potatoes:

http://upload7.ir/images/99262160008705216812.jpg

Of course, more complex boolean expressions over the inverted table need careful handling, maintaining the inversion consumes system resources, and a query generator has to produce the inversion code and rewrite the queries. Even so, this approach gives us indexed access on any combination of columns, with no separate filtering operation. Can you do that with a normalized star schema? I don't think so.

I hope this simple experiment gets our grey cells working and shows that relational structures are not always the best answer to every problem.

John Watson, Oracle Certified Master (OCM) DBA