Test driving Rhino.ETL
Here is the first test:
[Test] public void EvaluatingScript_WithConnection_WillAddDataSourceToContext() { EtlConfigurationContext configurationContext = EtlContextBuilder.FromFile(@"Connections\connection_only.retl"); Assert.AreEqual(3, configurationContext.Connections.Count, "should have three connections"); }
There is quite a bit of information just in this test, we introduced the EtlConfigurationContext class, decided that we will create it from a factory, and that we have something that is called a connection. Another decision made was the “retl” extension (Rhino ETL), but that is a side benefit.
The source for this is:
Connection( "Northwind", ConnectionType: SqlConnection, ConnectionString: "Data Source=localhost;Initial Catalog=Northwind; Integrated Security=SSPI;", ConcurrentConnections: 5 ) Connection( "SouthSand", ConnectionType: OracleConnection, ConnectionStringName: "SouthSand" ) Connection( "StrangeOne", ConnectionType: OracleConnection, ConnectionStringGenerator: { System.Environment.GetEnvironmentVariable("MyEnvVar") } )
You may have wondered about the last one, what does this do? Well, it allows you to do runtime evaluation of something, in this case, it get the value from an env-var, but that has a lot of potential. Here it a test that demonstrate the capabilities:
[Test] public void DataSources_ConnectionStringGenerator_CanUseEvnrionmentVariables() { Environment.SetEnvironmentVariable("MyEnvVar","MyExpectedValue"); Assert.AreEqual( "MyExpectedValue", configurationContext.Connections["StrangeOne"].ConnectionString ); Environment.SetEnvironmentVariable("MyEnvVar", "2"); Assert.AreEqual( "2", configurationContext.Connections["StrangeOne"].ConnectionString ); }
Comments
Can't wait... whens it going in the repo?
Looks sweet, is that so you can redirect sources & targets at run time?
I haven't really dug into it, but I was scanning the ActiveWarehouse Ruby project the other day which has it's own way of dong things. Might be good for inspiration!
It uses control files to orchestrate sources, transforms etc much like your own does. See example:
I'm working on an ETL project right now, which was started by someone else as a bespoke VB.NET solution. I asked him if he's considered using any ETL tools such as SSIS or Kettle etc, but he didn't feel that they gave him enough control. I think a lot of developers feel this way. I only used SSIS briefly, but found that it encouraged far too much mouse clicking, and that annoys me because it's not easily automated. Well, it probably is, but I didn't hang around long enough to find out how!
Anyway, I won't rant on. I'I look forward to giving the Ayende flavour of ETL a spin!
Oh, forgot to ask. Will you be including any "common" data cleaning features in your ETL solution? In fact, do you even think that they have a place in such a library?
Common ones I've written recently are things like...
Remove commas from numbers
Trim and convert empty string to null
Reformat UK postcodes
Make title case
Remove blank rows
Remove repeated column headers in data
Derive title from name and drop into column 'n'
Unpivot repeated groups onto new rows, Unpivot( startCol, colsPerGroup, numberOfGroups)
Format dates
Remove illegal dates
You get the idea :-)
There are many other open source ETL tools that you might want to have a look at first, for instance:
http://kettle.pentaho.org/
http://www.cloveretl.org/clover-etl/
http://www.ketl.org/
http://www.enhydra.org/tech/octopus/index.html
http://www.glassfishwiki.org/jbiwiki/Wiki.jsp?page=ETLSE
http://scriptella.javaforge.com/
Some of these are quite powerful. Not as fun as writing your own tool/DSL of course, but great if you want to abandon SSIS quickly. :-)
Comment preview