Update#1: Database 12C Release 2 allows you to modify parameters of an external table on the fly.
Update#2: Database 18C allows you to create inline External Tables so you don’t have to define them in advance.
External tables are an useful feature which allows us to read flat-files directly using SQL just like normal Oracle tables. They offer an uncomplicated way of moving data in and out of a database making ETL processes much easier. Once an external table is created, loading data from the flat-file to the database is as simple as writing a INSERT INTO…SELECT…FROM external_table.
Example 1: A simple data load:
1. Create the directory, MY_DATA which will contain the data file.
SQL> create or replace directory MY_DATA as '/home/oracle/data';
Directory created.
SQL>
2. The data file, emp.csv contains the employee number, employee name, job, manager number, salary, commission and department number in the following format:
7369,SMITH,CLERK,7902,800,,20
7499,ALLEN,SALESMAN,7698,1600,300,30
7521,WARD,SALESMAN,7698,1250,500,30
7566,JONES,MANAGER,7839,2975,,20
7654,MARTIN,SALESMAN,7698,1250,1400,30
7698,BLAKE,MANAGER,7839,2850,,30
7782,CLARK,MANAGER,7839,2450,,10
7788,SCOTT,ANALYST,7566,3000,,20
7839,KING,PRESIDENT,,5000,,10
7844,TURNER,SALESMAN,7698,1500,0,30
7876,ADAMS,CLERK,7788,1100,,20
7900,JAMES,CLERK,7698,950,,30
7902,FORD,ANALYST,7566,3000,,20
7934,MILLER,CLERK,7782,1300,,10
3. Create the external table, emp_external.
CREATE TABLE emp_external
(
EMPNO NUMBER(4),
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4),
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO NUMBER(2)
)
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
DEFAULT DIRECTORY my_data
ACCESS PARAMETERS
(RECORDS DELIMITED BY NEWLINE
LOGFILE my_data:'data.log'
BADFILE my_data:'data.bad'
DISCARDFILE my_data:'data.disc'
fields terminated by ','
MISSING FIELD VALUES ARE NULL
)
LOCATION (my_data:'emp.csv')
);
The DEFAULT DIRECTORY points to the directory which contains the data file. LOGFILE, BADFILE and DISCARDFILE are the files which will be created when the data is loaded into the database. LOCATION indicates the name of the data file in the directory. For a complete description of the clauses refer to the Oracle documentation. One important point to remember is that comments must be placed before any access parameters. If you include comments in the access parameter sections, Oracle will throw an error when you query the external table but not when you are creating it.
Update: Oracle supports wilcards for the LOCATION clause from 12C Release 1 onwards. So you can use * to match multiple characters or ? to match a single character.
LOCATION (my_data:'*emp*.csv')
LOCATION (my_data:'*emp?.csv')
4. You can now access the data in the file by querying the external table.
SQL> select * from emp_external;
EMPNO ENAME JOB MGR SAL COMM DEPTNO
---------- ---------- --------- ---------- ---------- ---------- ----------
7369 SMITH CLERK 7902 800 20
7499 ALLEN SALESMAN 7698 1600 300 30
7521 WARD SALESMAN 7698 1250 500 30
7566 JONES MANAGER 7839 2975 20
7654 MARTIN SALESMAN 7698 1250 1400 30
7698 BLAKE MANAGER 7839 2850 30
7782 CLARK MANAGER 7839 2450 10
7788 SCOTT ANALYST 7566 3000 20
7839 KING PRESIDENT 5000 10
7844 TURNER SALESMAN 7698 1500 0 30
7876 ADAMS CLERK 7788 1100 20
7900 JAMES CLERK 7698 950 30
7902 FORD ANALYST 7566 3000 20
7934 MILLER CLERK 7782 1300 10
14 rows selected.
SQL>
Example 2: Skip specific fields/columns, load dates
Consider the following data set which contains an additional hire date field in addition to those in the previous example. Also observe the name of the first employee which contains a comma and is surrounded by double-quotes:
7369,"S,SMITH",CLERK,7902,DEC-17-1980,800,,20
7499,ALLEN,SALESMAN,7698,FEB-20-1981,1600,300,30
7521,WARD,SALESMAN,7698,FEB-22-1981,1250,500,30
7566,JONES,MANAGER,7839,APR-02-1981,2975,,20
7654,MARTIN,SALESMAN,7698,SEP-28-1981,1250,1400,30
7698,BLAKE,MANAGER,7839,MAY-01-1981,2850,,30
7782,CLARK,MANAGER,7839,JUN-09-1981,2450,,10
7788,SCOTT,ANALYST,7566,APR-19-1987,3000,,20
7839,KING,PRESIDENT,,NOV-17-1981,5000,,10
7844,TURNER,SALESMAN,7698,SEP-08-1981,1500,0,30
7876,ADAMS,CLERK,7788,MAY-23-1987,1100,,20
7900,JAMES,CLERK,7698,DEC-03-1981,950,,30
7902,FORD,ANALYST,7566,DEC-03-1981,3000,,20
7934,MILLER,CLERK,7782,JAN-23-1982,1300,,10
While loading the data, we want to skip the JOB field.
SQL> drop table emp_external;
Table dropped
SQL> CREATE TABLE emp_external
2 (
3 EMPNO NUMBER(4),
4 ENAME VARCHAR2(10),
5 MGR NUMBER(4),
6 HIREDATE DATE,
7 SAL NUMBER(7,2),
8 COMM NUMBER(7,2),
9 DEPTNO NUMBER(2)
10 )
11 ORGANIZATION EXTERNAL
12 (TYPE ORACLE_LOADER
13 DEFAULT DIRECTORY my_data
14 ACCESS PARAMETERS
15 (RECORDS DELIMITED BY NEWLINE
16 LOGFILE my_data:'data.log'
17 BADFILE my_data:'data.bad'
18 DISCARDFILE my_data:'data.disc'
19 fields terminated by ',' OPTIONALLY ENCLOSED BY '"'
20 MISSING FIELD VALUES ARE NULL
21 ( EMPNO,
22 ENAME,
23 JOB,
24 MGR,
25 HIREDATE DATE "MON-DD-YYYY",
26 SAL,
27 COMM,
28 DEPTNO
29 )
30 )
31 LOCATION (my_data:'emp.csv')
32 );
Table created
SQL> select * from emp_external;
EMPNO ENAME MGR HIREDATE SAL COMM DEPTNO
----- ---------- ----- ----------- --------- --------- ------
7369 S,SMITH 7902 12/17/1980 800.00 20
7499 ALLEN 7698 2/20/1981 1600.00 300.00 30
7521 WARD 7698 2/22/1981 1250.00 500.00 30
7566 JONES 7839 4/2/1981 2975.00 20
7654 MARTIN 7698 9/28/1981 1250.00 1400.00 30
7698 BLAKE 7839 5/1/1981 2850.00 30
7782 CLARK 7839 6/9/1981 2450.00 10
7788 SCOTT 7566 4/19/1987 3000.00 20
7839 KING 11/17/1981 5000.00 10
7844 TURNER 7698 9/8/1981 1500.00 0.00 30
7876 ADAMS 7788 5/23/1987 1100.00 20
7900 JAMES 7698 12/3/1981 950.00 30
7902 FORD 7566 12/3/1981 3000.00 20
7934 MILLER 7782 1/23/1982 1300.00 10
14 rows selected
SQL>
Notice that in this case, while creating the external table we had to specify the fields in the data file. This is required for two reasons:
1. The ORACLE_LOADER access driver by default considers all fields to be of CHAR(255) type. In order to process the date field, we need to specify the data type and the date format.
2. If the field names are not specified in the data file, the fields are inserted into the table in the order in which they appear in the file. In case they are specified, the field names in the data file are mapped to the column names in the external table. To skip a specific field, all we need to do is to exclude it from the external table definition
The employee name field in the first record is taken care of by including the parameter OPTIONALLY ENCLOSED BY.
Example 3: Skip records/rows
The first n records in a file can be skipped by specifying SKIP n. Records can also be skipped conditionally by using the parameter LOAD WHEN. For this example, we will use the data file from Example#2. We want to skip the first record and the record with employee number 7499.
SQL> drop table emp_external;
Table dropped
SQL> CREATE TABLE emp_external
2 (
3 EMPNO NUMBER(4),
4 ENAME VARCHAR2(10),
5 MGR NUMBER(4),
6 HIREDATE DATE,
7 SAL NUMBER(7,2),
8 COMM NUMBER(7,2),
9 DEPTNO NUMBER(2)
10 )
11 ORGANIZATION EXTERNAL
12 (TYPE ORACLE_LOADER
13 DEFAULT DIRECTORY my_data
14 ACCESS PARAMETERS
15 (RECORDS DELIMITED BY NEWLINE
16 SKIP 1
17 LOAD WHEN EMPNO!='7499'
18 LOGFILE my_data:'data.log'
19 BADFILE my_data:'data.bad'
20 DISCARDFILE my_data:'data.disc'
21 fields terminated by ',' OPTIONALLY ENCLOSED BY '"'
22 MISSING FIELD VALUES ARE NULL
23 ( EMPNO,
24 ENAME,
25 JOB,
26 MGR,
27 HIREDATE DATE "MON-DD-YYYY",
28 SAL,
29 COMM,
30 DEPTNO
31 )
32 )
33 LOCATION (my_data:'emp.csv')
34 );
Table created
SQL> select * from emp_external;
EMPNO ENAME MGR HIREDATE SAL COMM DEPTNO
----- ---------- ----- ----------- --------- --------- ------
7521 WARD 7698 2/22/1981 1250.00 500.00 30
7566 JONES 7839 4/2/1981 2975.00 20
7654 MARTIN 7698 9/28/1981 1250.00 1400.00 30
7698 BLAKE 7839 5/1/1981 2850.00 30
7782 CLARK 7839 6/9/1981 2450.00 10
7788 SCOTT 7566 4/19/1987 3000.00 20
7839 KING 11/17/1981 5000.00 10
7844 TURNER 7698 9/8/1981 1500.00 0.00 30
7876 ADAMS 7788 5/23/1987 1100.00 20
7900 JAMES 7698 12/3/1981 950.00 30
7902 FORD 7566 12/3/1981 3000.00 20
7934 MILLER 7782 1/23/1982 1300.00 10
12 rows selected
SQL>
SKIP 1 skips the first record while LOAD WHEN EMPNO!=’7499′ skips the second record in the data file.
Example 4: Using the preprocessor feature in 11gR2
The preprocessor feature allows users to preprocess the input data before it is sent to the access driver. The preprocessor feature can be used to execute system commands, user-generated binaries or user-supplied scripts before the data is processed by the access driver. This means that users can modify the data file and its contents before it is loaded into the database in a single step. However, the preprocessor programs must write to the standard output for the feature to work since it reads the input from the standard output.
For this example, we will consider the compressed file, emp.csv.gz as the data file. The contents of this file is same as in Example#2. In real-life scenarios, the input data file is generally provided in a zipped format which needs to be unzipped first before the data is loaded. This example will unzip the file and load it in a single step by using the preprocessor feature.
SQL> create or replace directory MY_BIN as '/home/oracle/bin';
Directory created.
SQL> drop table emp_external;
Table dropped
SQL>
SQL> CREATE TABLE emp_external
2 (
3 EMPNO NUMBER(4),
4 ENAME VARCHAR2(10),
5 MGR NUMBER(4),
6 HIREDATE DATE,
7 SAL NUMBER(7,2),
8 COMM NUMBER(7,2),
9 DEPTNO NUMBER(2)
10 )
11 ORGANIZATION EXTERNAL
12 (TYPE ORACLE_LOADER
13 DEFAULT DIRECTORY my_data
14 ACCESS PARAMETERS
15 (RECORDS DELIMITED BY NEWLINE
16 LOAD WHEN EMPNO!='7369'
17 LOGFILE my_data:'data.log'
18 BADFILE my_data:'data.bad'
19 DISCARDFILE my_data:'data.disc'
20 PREPROCESSOR my_bin: 'zcat'
21 fields terminated by ',' OPTIONALLY ENCLOSED BY '"'
22 MISSING FIELD VALUES ARE NULL
23 ( EMPNO,
24 ENAME,
25 JOB,
26 MGR,
27 HIREDATE DATE "MON-DD-YYYY",
28 SAL,
29 COMM,
30 DEPTNO
31 )
32 )
33 LOCATION (my_data:'emp.csv.gz')
34 );
Table created
SQL> select * from emp_external;
EMPNO ENAME MGR HIREDATE SAL COMM DEPTNO
----- ---------- ----- ----------- --------- --------- ------
7499 ALLEN 7698 2/20/1981 1600.00 300.00 30
7521 WARD 7698 2/22/1981 1250.00 500.00 30
7566 JONES 7839 4/2/1981 2975.00 20
7654 MARTIN 7698 9/28/1981 1250.00 1400.00 30
7698 BLAKE 7839 5/1/1981 2850.00 30
7782 CLARK 7839 6/9/1981 2450.00 10
7788 SCOTT 7566 4/19/1987 3000.00 20
7839 KING 11/17/1981 5000.00 10
7844 TURNER 7698 9/8/1981 1500.00 0.00 30
7876 ADAMS 7788 5/23/1987 1100.00 20
7900 JAMES 7698 12/3/1981 950.00 30
7902 FORD 7566 12/3/1981 3000.00 20
7934 MILLER 7782 1/23/1982 1300.00 10
13 rows selected
SQL>
A new directory, MY_BIN has been created which contains the executable called by the preprocessor. The PREPROCESSOR parameter points to the executable, zcat that we want to invoke before the loading process starts. Creating separate directories for preprocessor executables is not mandatory but is considered a best practice. The preprocessor feature does not support passing of arguments to the executable. For example, specifying gunzip -c instead of zcat(both of which are functionally similar) for the PREPROCESSOR parameter would have resulted in an error while accessing the external table. The solution lies in using a shell script, etst_script.sh and using it as the preprocessor executable.
[oracle@db11gr2 bin]$ cat test_script.sh
#!/bin/sh
/home/oracle/bin/gunzip -c $1
[oracle@db11gr2 bin]$
SQL> drop table emp_external;
Table dropped
SQL>
SQL> CREATE TABLE emp_external
2 (
3 EMPNO NUMBER(4),
4 ENAME VARCHAR2(10),
5 MGR NUMBER(4),
6 HIREDATE DATE,
7 SAL NUMBER(7,2),
8 COMM NUMBER(7,2),
9 DEPTNO NUMBER(2)
10 )
11 ORGANIZATION EXTERNAL
12 (TYPE ORACLE_LOADER
13 DEFAULT DIRECTORY my_data
14 ACCESS PARAMETERS
15 (RECORDS DELIMITED BY NEWLINE
16 LOAD WHEN EMPNO!='7369'
17 LOGFILE my_data:'data.log'
18 BADFILE my_data:'data.bad'
19 DISCARDFILE my_data:'data.disc'
20 PREPROCESSOR my_bin: 'test_script.sh'
21 fields terminated by ',' OPTIONALLY ENCLOSED BY '"'
22 MISSING FIELD VALUES ARE NULL
23 ( EMPNO,
24 ENAME,
25 JOB,
26 MGR,
27 HIREDATE DATE "MON-DD-YYYY",
28 SAL,
29 COMM,
30 DEPTNO
31 )
32 )
33 LOCATION (my_data:'emp.csv.gz')
34 );
Table created
SQL> select * from emp_external;
EMPNO ENAME MGR HIREDATE SAL COMM DEPTNO
----- ---------- ----- ----------- --------- --------- ------
7499 ALLEN 7698 2/20/1981 1600.00 300.00 30
7521 WARD 7698 2/22/1981 1250.00 500.00 30
7566 JONES 7839 4/2/1981 2975.00 20
7654 MARTIN 7698 9/28/1981 1250.00 1400.00 30
7698 BLAKE 7839 5/1/1981 2850.00 30
7782 CLARK 7839 6/9/1981 2450.00 10
7788 SCOTT 7566 4/19/1987 3000.00 20
7839 KING 11/17/1981 5000.00 10
7844 TURNER 7698 9/8/1981 1500.00 0.00 30
7876 ADAMS 7788 5/23/1987 1100.00 20
7900 JAMES 7698 12/3/1981 950.00 30
7902 FORD 7566 12/3/1981 3000.00 20
7934 MILLER 7782 1/23/1982 1300.00 10
13 rows selected
SQL>
Please note the following points regarding preprocessor shell scripts:
1. The full path name must be specified for system commands (for example, gunzip)
2. The data file listed in the external table LOCATION clause should be referred to by $1.
External tables and their features can also be put to other uses such as unloading data to files, reading the alert log and listing OS files using SQL.