Paper: Calvin: Fast Distributed Transactions for Partitioned Database Systems

Posted on May 23, 2013 at 11:31 pm by Todd Hoff | Comments Off

Distributed transactions are costly because they use agreement protocols. Calvin says, surprisingly, that using a deterministic database allows you to avoid the use of agreement protocols. The approach is to use a deterministic transaction layer that does all the hard work before locks are acquired and transaction execution begins.
Overview:
Many distributed storage systems achieve high data access throughput via partitioning and replication, each system with its own advantages and tradeoffs. In order to achieve high scalability, however, today’s systems generally reduce transactional support, disallowing single transactions from spanning multiple partitions. Calvin is a practical transaction scheduling and data replication layer that uses a deterministic ordering guarantee to significantly reduce the normally prohibitive contention costs associated with distributed transactions. Unlike previous deterministic database system prototypes, Calvin supports disk-based storage, scales near-linearly on a cluster of commodity machines, and has no single point of failure. By replicating transaction inputs rather than effects, Calvin is also able to support multiple consistency levels—including Paxos based strong consistency across geographically distant replicas—at no cost to transactional throughput.
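
To make "replicating transaction inputs rather than effects" a little more concrete, here is a minimal single-process sketch in Java. It is not Calvin's implementation: the class and method names (DeterministicReplaySketch, Transfer, replay) are hypothetical, and the "transaction" is just a deterministic transfer between two accounts. It only illustrates that replicas which apply the same ordered input log with deterministic logic end up in the same state, with no agreement needed on the outcome of each transaction.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; none of these names come from Calvin itself.
final class DeterministicReplaySketch {

    // One transaction input: move `amount` from account `from` to account `to`.
    static final class Transfer {
        final String from;
        final String to;
        final long amount;

        Transfer(String from, String to, long amount) {
            this.from = from;
            this.to = to;
            this.amount = amount;
        }
    }

    // Applies the input log strictly in order. The logic is deterministic:
    // a transfer is skipped when the source balance is too small.
    static Map<String, Long> replay(Map<String, Long> initial, List<Transfer> log) {
        Map<String, Long> state = new TreeMap<>(initial);
        for (Transfer t : log) {
            long balance = state.getOrDefault(t.from, 0L);
            if (balance >= t.amount) {
                state.put(t.from, balance - t.amount);
                state.merge(t.to, t.amount, Long::sum);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, Long> initial = Map.of("a", 100L, "b", 0L);
        List<Transfer> log = List.of(
                new Transfer("a", "b", 70),
                new Transfer("a", "b", 50),   // skipped: only 30 left in "a"
                new Transfer("b", "a", 20));

        // Two "replicas" receive the same inputs in the same order.
        Map<String, Long> replica1 = replay(initial, log);
        Map<String, Long> replica2 = replay(initial, log);

        System.out.println(replica1);                  // {a=50, b=50}
        System.out.println(replica1.equals(replica2)); // true
    }
}
```

The design point is that only the log order has to be agreed on up front; once it is fixed, each replica can execute independently.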

If you are interested, Daniel Abadi gives a very accessible overview of Calvin in “If all these new DBMS technologies are so scalable, why are Oracle and DB2 still on top of TPC-C? A roadmap to end their dominance.”

Visit Oracle Linux and Oracle VM at Solutions Linux – France

Posted on May 23, 2013 at 10:54 pm by Zeynep Koch | Comments Off

Linux Solutions Libres & Open Source is the 15th edition of the One & Only Event in France around Linux solutions & Open Source solutions.

With 220 exhibitors/partners, including SUSE and Microsoft, and more than 5,700 attendees, it’s the number one event in France dedicated to Linux solutions and to the Linux community. It will take place at the CNIT in the La Défense area (Paris, France) on the 28th & 29th of May 2013.

There will be 5 keynotes, 10 roundtables, 48 technical sessions and much more. 

This event will be your opportunity to meet the Oracle Linux and Oracle VM French team and ask them questions. They will be presenting and talking in sessions to the French Linux & Open Source community.

The Oracle Linux/VM booth will be No. E40 in the exhibition area.

Come and meet us there!

Learn more and register for the event

Dynamic Sampling – 2

Posted on May 23, 2013 at 8:46 pm by Jonathan Lewis | Comments Off

I’ve written about dynamic sampling in the past, but here’s a little wrinkle that’s easy to miss. How do you get the optimizer to work out the correct cardinality for a query like (the table creation statement follows the query):

select	count(*)
from	t1
where	n1 = n2
;

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	mod(rownum, 1000)	n1,
	mod(rownum, 1000)	n2
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6 ; 

If you’re running 11g and can change the code there are a couple of easy options – adding a virtual column, or applying extended stats and then modifying the SQL accordingly would be appropriate.

-- Virtual Column

alter table t1
add (
	n3	generated always as ( case n1 when n2 then 1 end) virtual
)
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns n3 size 1')

-- Extended Stats

begin
	dbms_output.put_line(
		dbms_stats.create_extended_stats(
			ownname		=> user,
			tabname		=> 'T1',
			extension	=> '(case n1  when n2 then 1 else null end)'
		)
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		block_sample 	 => true,
		method_opt 	 => 'for columns (case n1  when n2 then 1 else null end) size 1'
	);
end;
/

select	count(*)
from	t1
where	(case n1 when n2 then 1 else null end)= 1
;

If you can’t change the SQL statement, there’s always the option for bypassing the problem by fixing a suitable execution plan with an SQL Baseline, of course. Alternatively, if you can think of the right hint you could create an “SQL Patch” for the statement – but what hint might be appropriate ? I’ll answer that question in a minute.

Here’s another option, though: get Oracle to use dynamic sampling. (You probably guessed that from the title of the post.) So which level would you use to make this work ? Left to its own devices, Oracle would calculate the selectivity of the predicate n1 = n2 as the smaller of the two separate predicates “n1 = unknown” and “n2 = unknown”. So you might hope that level 3 (Oracle is “guessing”) or level 4 (more than one predicate on a single table) might be appropriate. It’s the latter that works. If you execute “alter session set optimizer_dynamic_sampling=4;” before executing this query, Oracle will sample the table before optimising.
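
As a worked example of why the default estimate goes wrong for this table, assume there are no histograms and no nulls on n1 and n2, so that the selectivity of “column = unknown” is simply 1/num_distinct. Both columns are mod(rownum, 1000), so each has 1,000 distinct values across the 1,000,000 rows:

```latex
\begin{aligned}
\text{sel}(n1 = n2) &= \min\left(\tfrac{1}{1000},\ \tfrac{1}{1000}\right) = \tfrac{1}{1000} \\
\text{estimated rows} &= 10^{6} \times \tfrac{1}{1000} = 1000 \\
\text{actual rows} &= 10^{6} \quad (\text{n1 and n2 are equal in every row})
\end{aligned}
```

Under those assumptions the unsampled estimate is out by a factor of 1,000, which is the gap the sample is there to close.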

The method works, but can you apply it ? Possibly not, if you’re not allowed to inject any extra SQL anywhere – after all, you probably don’t want to set the parameter at the system level (spfile or init.ora) because it may affect lots of other queries – introducing more work because of the sample, and then risking unexpected changes in execution plans. Setting the parameter for a session is often no better. And this brings me back to the SQL Patch approach – if you don’t want to create a baseline for the query then perhaps a patch with the hint /*+ opt_param('optimizer_dynamic_sampling' 4) */ will do the trick. Don’t forget all the doubling of single quotes that you’ll need, though (this is the code fragment I used):

begin
	sys.dbms_sqldiag_internal.i_create_patch(
		sql_text	=>
'
select
	count(*)
from	t1
where	n1 = n2
',
		hint_text	=> 'opt_param(''optimizer_dynamic_sampling'' 4)'

	);
end;
/

For more analysis and commentary on the SQL Patch mechanism, you might like to read Dominic Brooks’ mini-series.


Using Paoding (庖丁解牛) on Hadoop

Posted on May 23, 2013 at 1:51 pm by Data & Architecture DBA | Comments Off

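Background
On Taobao, whenever a seller lists an item, or a member lists a second-hand item, one important step is choosing the category the item belongs to. Here is an example of the second-hand listing page:

For members who are not very familiar with Taobao, choosing this category is bound to be a bit of a headache. I wondered whether, given Taobao's massive volume of item data, it would be possible to work out automatically which category an item probably belongs to from the title the user types in.

Technical preparation
Hadoop: to process Taobao's item title data
Paoding: to segment the titles into words
Searching Google for how to use Paoding on Hadoop, I found many people asking this question but no article that solves it comprehensively, which is why I am writing this one.

Feasibility check
To check quickly whether the goal was feasible, I first split titles into single characters, without Paoding, analysed the titles of all items currently on sale on Taobao, and built a library from them. I then tested it with some randomly chosen Taobao item titles and found that predicting the category from the item title is feasible.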
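
The single-character approach can be sketched roughly as follows. This is a minimal illustration, not the code used for the experiment: the class and method names are hypothetical, the "library" is just an in-memory map from each character to per-category counts, and the scoring rule (sum the counts of every character in the title) is an assumption. The sample titles are ASCII only to keep the sketch portable; the same code handles Chinese titles because it iterates over code points.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: names and scoring rule are assumptions.
final class TitleCategorySketch {

    // character (code point) -> category -> number of occurrences in training titles
    private final Map<Integer, Map<String, Integer>> counts = new HashMap<>();

    // Adds one known (title, category) pair to the library, one character at a time.
    void train(String title, String category) {
        title.codePoints()
             .filter(cp -> !Character.isWhitespace(cp))
             .forEach(cp -> counts
                     .computeIfAbsent(cp, k -> new HashMap<>())
                     .merge(category, 1, Integer::sum));
    }

    // Returns the category with the highest summed character count,
    // or null when no character of the title has been seen before.
    String predict(String title) {
        Map<String, Integer> scores = new HashMap<>();
        title.codePoints().forEach(cp -> {
            Map<String, Integer> perCategory = counts.get(cp);
            if (perCategory != null) {
                perCategory.forEach((cat, n) -> scores.merge(cat, n, Integer::sum));
            }
        });
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Integer> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TitleCategorySketch model = new TitleCategorySketch();
        model.train("used apple phone", "phones");
        model.train("new smart phone", "phones");
        model.train("used mountain bike", "bikes");
        System.out.println(model.predict("apple phone case"));
    }
}
```

On Hadoop the counting step would be spread across map and reduce tasks, but the scoring idea stays the same.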

Using Paoding on Hadoop
Download the Paoding source package, which also includes the dictionary:
# Non-members may check out a read-only working copy anonymously over HTTP.
svn checkout http://paoding.googlecode.com/svn/trunk/ paoding-read-only
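Paoding has a dictionary, dic. It is a directory containing many files, and how to use it in a Hadoop environment is a problem.

Hadoop has a facility called DistributedCache that can distribute configuration data to the machines that actually run the map and reduce tasks. For DistributedCache, see the article:

In your own map or reduce class, the following method is where it can be read:
public void configure(JobConf conf) {
     // the data in DistributedCache can be read here
}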

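Steps:
1. Pack the dictionary directory as dic.zip or dic.tar (either works) and upload it to HDFS.
2. Add the following code to the main function of your MapReduce Java program:
    DistributedCache.addCacheArchive(new URI("/group/test/danchen/hive/category/dic.tar"), conf);
   Note: when an archive is added with addCacheArchive, the Hadoop framework automatically unpacks it into a directory on the machines that run the tasks.
3. In the configure(JobConf conf) method of the map class, read it like this:

Path[] localArchives;
Path real_hadoop_dic_home = null;
try {
    // Local paths of the archives that DistributedCache unpacked on this node
    localArchives = DistributedCache.getLocalCacheArchives(conf);
    for (int i = 0; localArchives != null && i < localArchives.length; i++) {
        String fileName = localArchives[i].toUri().toString();
        // Pick the archive that holds the dictionary
        if (fileName.indexOf("dic.tar") > -1) {
            real_hadoop_dic_home = localArchives[i];
            break;
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}

At first I thought this gave me the real dictionary directory, but it did not. Only after printing every directory and file under the location of dic.tar did I find that the real directory after Hadoop unpacks the archive is:
  String hadoop_dic_home = real_hadoop_dic_home.toUri().toString()+"/"+"dic";
  Printed out, hadoop_dic_home looks like this (I could not find any article online describing this directory layout):
  /disk4/mapred/local/taskTracker/archive/-1294301082447095161/hdpnn/group/test/danchen/hive/category/dic.tar/dic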

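With the dic directory problem on Hadoop solved, the next question is how to pass this real directory to Paoding, which needs a small change to the configuration and the code.
Open the configuration file
paoding\paoding-analysis\src\paoding-dic-home.properties
and write paoding.dic.home=/tmp/dic. This setting is a dummy and the path is not a real one; if it is left unset, Paoding will read the environment variable instead.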

Modify the source code:
paoding-analysis\src\net\paoding\analysis\knife\PaodingMaker.java
Add a definition:
public static String hadoop_dic_home = "";

Modify the function private static void setDicHomeProperties(Properties p):
// Normalize dicHome and set it on the properties object
dicHome = hadoop_dic_home;
dicHome = dicHome.replace('\\', '/');
if (!dicHome.endsWith("/")) {
    dicHome = dicHome + "/";
}
p.setProperty(Constants.DIC_HOME, dicHome); // write to the properties

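Go back up to the paoding-analysis directory and run build.bat to repackage; this produces a new paoding-analysis.jar.

Bundle this modified paoding-analysis.jar, together with lucene-core-3.6.2.jar, into your own project jar.
The code to initialize the Paoding dictionary follows; hadoop_dic_home is the path obtained dynamically above: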
PaodingMaker.hadoop_dic_home = hadoop_dic_home;
analyzer =  new PaodingAnalyzer(); 

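The process is a bit of a hassle; I wonder whether there is a better way?

Results
Previously the segmentation produced single characters only. With Paoding there are now both single characters and some words. Looking at the output data, it is better than before. New words can also keep being added to the dictionary later, which makes it somewhat easier to maintain.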

Strategy: Stop Using Linked-Lists

Posted on May 22, 2013 at 11:30 pm by Todd Hoff | Comments Off

What data structure is more sacred than the linked list? If we get rid of it, what silly interview questions would we use instead? But not using linked lists is exactly what Aater Suleman recommends in Should you ever use Linked-Lists?

In The Secret To 10 Million Concurrent Connections one of the important strategies is not scribbling data all over memory via pointers, because following pointers increases cache misses, which reduces performance. And there’s nothing more iconic of pointers than the linked list.
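
As a rough illustration of the pointer-chasing point, the Java sketch below sums the same values stored contiguously in an int[] and in a java.util.LinkedList, where every element is a separately allocated node reached through a reference. The class name and the element count are arbitrary choices for the example, and the timing is a crude System.nanoTime measurement rather than a proper benchmark, so treat any numbers it prints as indicative only.

```java
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch only: a crude comparison, not a rigorous benchmark.
final class TraversalSketch {

    // Walks a contiguous block of memory.
    static long sumArray(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Walks whatever List it is given; with a LinkedList each step follows a node reference.
    static long sumList(List<Integer> values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] array = new int[n];
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            array[i] = i;
            linked.add(i);
        }

        long t0 = System.nanoTime();
        long arraySum = sumArray(array);
        long t1 = System.nanoTime();
        long listSum = sumList(linked);
        long t2 = System.nanoTime();

        System.out.println("same result: " + (arraySum == listSum));
        System.out.println("array traversal (ns):       " + (t1 - t0));
        System.out.println("linked-list traversal (ns): " + (t2 - t1));
    }
}
```

Both traversals do the same arithmetic; any difference in the timings comes from how the elements are laid out and reached in memory.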

Here are Aater's reasons to be anti-linked-list:

gather_plan_statistics – 2

Posted on May 22, 2013 at 8:40 pm by Jonathan Lewis | Comments Off

Some time ago – actually a few years ago – I wrote a note about the hint /*+ gather_plan_statistics */ making some informal comments about the implementation and relevant hidden parameters. I’ve recently discovered a couple of notes from Alexander Anokhin describing the feature in far more detail and describing some of the misleading side effects of the implementation. There are two parts (so far): part 1 and part 2.

 


Learn to use Oracle VM’s Application Driven Architecture

Posted on May 22, 2013 at 12:51 am by Antoinette O'Sullivan | Comments Off

Oracle's server virtualization products are designed to optimize efficiency and performance. You can get the most from your x86 and SPARC systems by taking the relevant training, available as live events run by experienced instructors.

In the 3-day Oracle VM Administration: Oracle VM Server for x86 course you learn to:

  • Plan a virtual solution.
  • Install the Oracle VM Server and the Oracle VM Manager software.
  • Configure network resources to provide isolation and redundancy.
  • Add SAN and NFS to provision storage for the virtual environment.
  • Create server pools and repositories to support application workloads.
  • Speed up virtual machine deployment with templates and assemblies.
  • Use virtual machine high availability.
  • Use server pool policies to maximize the performance of your server workloads.

Live-Virtual Events: Take this training from your own desk; choose from a selection of events on the schedule to suit different time zones.

In-Class Events: Travel to an education center to attend an event. Below is a selection of events already on the schedule:

 Location                     | Date              | Delivery Language
 Zagreb, Croatia              | 11 November 2013  | Croatian
 Prague, Czech Republic       | 21 October 2013   | Czech
 Bordeaux, France             | 18 September 2013 | French
 Paris, France                | 10 July 2013      | French
 Strasbourg, France           | 11 September 2013 | French
 Dusseldorf, Germany          | 24 June 2013      | German
 Munich, Germany              | 28 October 2013   | German
 Budapest, Hungary            | 9 September 2013  | Hungarian
 Riga, Latvia                 | 30 September 2013 | Latvian
 Warsaw, Poland               | 27 May 2013       | Polish
 Bucharest, Romania           | 17 June 2013      | English
 Madrid, Spain                | 12 August 2013    | Spanish
 Istanbul, Turkey             | 17 June 2013      | Turkish
 Tokyo, Japan                 | 3 June 2013       | Japanese
 Singapore                    | 29 May 2013       | English
 Canberra, Australia          | 4 November 2013   | English
 Melbourne, Australia         | 12 June 2013      | English
 Perth, Australia             | 17 July 2013      | English
 Sydney, Australia            | 8 July 2013       | English
 Buenos Aires, Argentina      | 2 October 2013    | Spanish
 Santiago, Chile              | 27 May 2013       | Spanish
 Lima, Peru                   | 29 May 2013       | Spanish
 Roseville, MN, United States | 28 May 2013       | English
 Reston, VA, United States    | 31 July 2013      | English

In the 2-day Oracle VM Server for SPARC: Installation and Configuration course you learn to: 

  • Properly design, install, implement and administer this virtual environment.
  • Create and assign resources to domains where business applications are deployed.
  • Use the migration feature, which increases availability and adds flexibility to the data center virtual environment.

Live-Virtual Events: Take this training from your own desk; choose from a selection of events on the schedule to suit different time zones.

In-Class Events: Travel to an education center to attend an event. Below is a selection of events already on the schedule:

 Location                      | Date              | Delivery Language
 Prague, Czech Republic        | 9 September 2013  | Czech
 Paris, France                 | 7 October 2013    | French
 Hamburg, Germany              | 22 May 2013       | German
 Stuttgart, Germany            | 28 October 2013   | German
 Budapest, Hungary             | 12 September 2013 | Hungarian
 Bucharest, Romania            | 8 July 2013       | English
 Madrid, Spain                 | 20 June 2013      | Spanish
 Istanbul, Turkey              | 30 September 2013 | Turkish
 Brisbane, Australia           | 24 July 2013      | English
 Canberra, Australia           | 3 June 2013       | English
 Melbourne, Australia          | 30 October 2013   | English
 Perth, Australia              | 15 July 2013      | English
 Sydney, Australia             | 25 September 2013 | English
 Sacramento, CA, United States | 1 July 2013       | English
 San Jose, CA, United States   | 1 July 2013       | English

Targetbase gets deeper insight and 40x faster query response with Oracle Exadata

Posted on May 21, 2013 at 11:20 pm by margaret hamburger | Comments Off

This is a great example of the evolution happening in Oracle Data Warehousing environments as customers consolidate databases into a single source of truth for their businesses with Oracle Exadata. Not only was Targetbase able to integrate large volumes of social, retail, and point-of-sale (POS) data, such as Web blogs, to drive richer and more actionable insights for their clients; they also enabled real-time analytics scoring and consumer-specific text mining on large volumes of marketing-related data, while achieving some great performance gains with up to 40x faster queries. Learn how Oracle Database compression features helped reduce their storage costs by 30%.

Targetbase Analyzes Marketing Data up to 40x Faster, Helps Clients Launch Targeted Marketing Campaigns Earlier

Back Home …

Posted on May 21, 2013 at 5:47 pm by Mike Dietrich | Comments Off

Now finally back from my short trip to Johannesburg, South Africa: flying out Monday overnight, returning Friday morning after another overnight flight (all Eco). But thanks to Turkish Airlines, even the short stop-over in Istanbul was worth it, as the seating comfort in Turkish's economy class is far better than Lufthansa's, as is the food and especially the entertainment system, the service ... almost everything. And luckily the Turkish Airlines employees didn't go on strike on Thursday :-)

But after running 6 internal Oracle Database 12c: Upgrade, Migrate and Consolidate workshops in the US and in EMEA, I've learned a lot from our more than 180 participants doing the full hands-on part of the workshop.

I'm really looking forward to the release date :-) 

Need a good reason to upgrade to Oracle Database? I’ll give you seventy five!

Posted on May 21, 2013 at 2:52 am by margaret hamburger | Comments Off

A few times a year, I'm asked to update the Oracle Database Reference Booklet. Each time, the number of customers benefiting from upgrading to the Oracle Database grows. Read for yourself how an airline cut analysis time for travel partner bookings from 10 hours to 10 minutes, helping staff develop better informed partner sales and marketing strategies. Or how a bank completed their database upgrade without disruption to their day-to-day operations and minimized system downtime for more than 2 million customers, who expect around-the-clock services. Or how a manufacturing company saved an expected US$609,000 on storage hardware costs. These are just a few of the customers who improved performance, provided better service for their customers and saved money with the Oracle Database.

Need a good reason to upgrade to the Oracle Database? Look no further.

Oracle Database References booklet




FreeDBA.net is proudly powered by WordPress and the SubtleFlux theme.

Copyright © FreeDBA.net


