Rhino.ETL: Full Package Syntax
Okay, here is the full package syntax that I have now, which is enough to express quite a bit. I am now getting started on the engine itself, and I am going to try the message passing architecture for now, since it is much more flexible.
connection(
    "NorthwindConnection",
    ConnectionType: SqlConnection,
    ConnectionString: "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI;"
)

source Northwind, Connection = "NorthwindConnection":
    Command: "SELECT * FROM Orders WHERE RequiredDate BETWEEN @LastUpdate AND @CurrentDate"
    Parameters:
        @LastUpdate = date.Today.AddDays(-1)
        @CurrentDate = ExecuteScalar("NorthwindConnection", "SELECT MAX(RequiredDate) FROM Orders")

transform ToLowerCase:
    for column in Parameters.Columns:
        Row[column] = Row[column].ToLower() if Row[column] isa string

destination Northwind, Connection = "NorthwindConnection":
    Command: """
        INSERT INTO [Orders_Copy]
        (
            [CustomerID], [EmployeeID], [OrderDate], [RequiredDate], [ShippedDate], [ShipVia],
            [Freight], [ShipName], [ShipAddress], [ShipCity], [ShipRegion], [ShipPostalCode],
            [ShipCountry]
        )
        VALUES
        (
            @CustomerID, @EmployeeID, @OrderDate, @RequiredDate, @ShippedDate, @ShipVia, @Freight,
            @ShipName, @ShipAddress, @ShipCity, @ShipRegion, @ShipPostalCode, @ShipCountry
        )
    """

pipeline CopyOrders:
    Sources.Northwind >> ToLowerCase(Columns: ['ShipCity', 'ShipRegion'])
    ToLowerCase >> Destinations.Northwind
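To make the message passing idea a bit more concrete, here is a rough sketch in Boo-like code. This is not actual Rhino.ETL engine code, and the names (LowerCaseStage, NextStage, Receive) are invented for illustration; the point is only that each stage receives a row, works on it, and forwards it to whatever comes next, so stages stay independent of each other:

import System.Collections

class LowerCaseStage:

    public Columns as (string)
    public NextStage as LowerCaseStage    # a real engine would use a common stage interface

    def Receive(row as Hashtable):
        # work on the row message...
        for column in Columns:
            row[column] = cast(string, row[column]).ToLower() if row[column] isa string
        # ...then forward it down the pipeline
        return if NextStage is null
        NextStage.Receive(row)

Under that model, the >> lines in the pipeline block above mostly reduce to deciding which stage forwards its rows to which, which is what makes the approach flexible.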
Comments
All looks good to me :)
I realise you're doing this to solve your own problems, not mine, but some other thoughts I've had on this are:
1) Mappings
It's common to set up a mapping for each source, which indicates how source columns map to target columns. Had you considered making this an explicit concept in your DSL?
map SkandiaFileFormatToDestinationsTable:
    'Customer ID' => 'customer_id'
    'Order ID' => 'order_id'
    'First Name' => 'forename'

pipeline CopyOrders:
    Sources.SkandiaFile >> Destinations.ValuationsTable using map SkandiaFileFormatToDestinationsTable
2) Back to the idea of having a stock library of standard cleaning functions. You could just allow a file include?

pipeline ImportOrdersFromPegasus:
    include "StandardTransforms.boo"
    Sources.Orders >> ToTitleCase(Columns: ['Title', 'Forename', 'Surname']) >> Destinations.Orders with map PegasusOrdersToSunOrders
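As an illustration, such a StandardTransforms.boo might hold transforms written in the same style as the ToLowerCase example above. This is only a guess at what a stock ToTitleCase could look like; the System.Globalization call is one plausible way to spell it, not something the current DSL is known to provide:

transform ToTitleCase:
    titleCaser = System.Globalization.CultureInfo.CurrentCulture.TextInfo
    for column in Parameters.Columns:
        Row[column] = titleCaser.ToTitleCase(Row[column]) if Row[column] isa string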
Food for thought, anyway!
I am looking forward to seeing what you can do with this :-)
=> is not a valid Boo operator, so I used >>, but other than that, isn't your map basically a transform of column names?
transform SkandiaFileFormatToDestinationsTable:
    Row.customer_id = Row["Customer ID"]
I will probably not allow a file include, but rather a project system, which will allow better reuse (including shared DLLs).
Um. WOW is all I can say.
That is a very nice, short and concise language.
I have to admit that I've used C# instead of SSIS in cases where I can get away with it, but for my next case I'll be looking at these bits.
Looking good. I'm coming from a non-ORM background, so I have issues with the literal SQL strings. Can't you do anything about those?
Well, what would you have instead?
This is an ETL project; I don't see any better way to handle this, frankly.