Powershell: import complex PRICAT csv file
Background
A while ago I was asked to help with importing tyre data from several different manufacturers into a local PIM system.
For exchanging data electronically the tyre industry have adopted the EDIFACT subset EANCOM PRICAT for the trade, particularly for the tyre trade by the term "EDIWheel". EDIWheel process begin with an electronic product and price catalogue (PRICAT) then orders (ORDERS), confirmation of orders (ORDRSP), delivery dispatch (DESADV) and electronic invoice (INVOIC).
The EDI documents I had to process was PRICAT and invoice. PRICAT is basically a CSV file with some extra information on the first rows of the file.
Delimiter in the file was ";". The extra rows in the csv file prevented me from simply reading the file with the built-in powershell function import-csv.
EDIWheel PRICAT exists in two different versions, B2 (2008) or B4 (2016). In my case the format will be the older B2 format.
Since the only script language available, due to the IT-policy, was Windows Powershell I had to parse the PRICAT with Powershell and store the data in the PIM system
B2 or B4?
First step was to agree on what version of PRICAT to use. Some of the manufacturers offered only B2 so we decided to go with that. They are both structured the same way but B4 have additional parameters and extra values for some of the B2 parameters. They are both a CSV with extra information on the top two lines. EDIWheel foundation call this top information "HDS".
HDS looks like this
The documentation is copyrighted and can be obtained from the Ediweel web site. Read more Ediwheel.net
Description of HDS in below table
Parameter |
Status |
Format |
Length |
Example |
Comment |
Record type |
Mandatory |
AN* |
3 |
HDS |
|
Document ID |
Mandatory |
AN |
17 |
DED_BRV_010607_1 |
Pricelist/Catalogue Identification |
Date |
Mandatory |
AN |
12 |
20070601 |
Date or Date and
Time of Pricelist/Catalogue creation, FORMAT="YYYYMMDD" or
"YYYYMMDDHHMM" |
Currency |
Optional |
AN |
3 |
EUR |
|
Supplier ID |
Mandatory |
AN |
13 |
4399901585232 |
Code to identify
the supplier defined by third parties (ILN, DUNS, ...) or customer. |
Receiver ID |
Optional |
AN |
13 |
142145 |
Customer Number |
Country code |
Mandatory |
AN |
3 |
DE |
Country code, ISO
3166, e.g. DE, GB |
EDIWheel version |
Mandatory |
AN |
35 |
B2 |
|
Catalogue type |
Optional |
AN |
1 |
P |
If blank = full catalogue.
P= partial catalogue |
* AlphaNumeric
Step 1: Read the HDS from the file with powershell
There is a built in function in powershell to import a CSV, called import-CSV but the HDS record will make powershell refuse to import the file as CSV. However there is other built in functions to read from files. After a lot of Googling and trial and error i found away to only load the second row with the built in function Get-Content.
1 2 3 4 5 6 | # Load the CSV-file specified in $filename. # Two first rows contain file info. Read only row 2 [-1]. # Store in $header variable and then split into array and store in $data $header = (Get-Content -Path $filename -First 2)[-1] $data = $header.Split(';') |
From $data we can now read the parameters we need in our system. In my case that is the manufacturer, currency and country.
I also had to translate $SUPPLIER_ID to the internal ID of the manufacturer but I will not show how as that is company internal.
1 2 3 4 5 6 7 8 | #Currency $currency = $data[3] #Supplier_ID as the ID of the manufacturer $SUPPLIER_ID= $data[4] #Country $country = $data[6] |
Step 2: Read the POH rows from the file with powershell
From row 3 an onwards the file is a standard CSV files so I first tried Import-CSV function but I couldn't get the result I wanted so I tried with Get-Content instead which has built in function for skipping rows when reading the file.
After reading the file from row 3 and onwards I used ConvertFrom-CSV to be able to parse the data with powershell.
First I declare which columns to read, I did not need all the information in the PRICAT file. The parameter $desiredColumns is an array with the column names in the CSV file.
1 2 3 4 5 | #Column headers is row 3 $desiredColumns = 'POS','SUPPLIER_CODE','DESCRIPTION_1','WEIGHT','BRAND_TEXT','PRODUCT_TYPE','VEHICLE_TYPE','NET_VALUE','NV_VALID_FROM','RECYCLING_FEE','NOISE_PERFORMANCE','NOISE_CLASS_TYPE','ROLLING_RESISTANCE','WET_GRIP', 'EC_VEHICLE_CLASS', 'EU_DIRECTIVE_NUMBER' #Load the PRICAT file, skip first two rows and only store columns we need $pricat = Get-Content -Path $filename | Select-Object -Skip 2 | Out-String | ConvertFrom-Csv -Delimiter (';') | Select $desiredColumns |
Now it is only a matter of looping through each $row of the loaded $pricat and insert it into the database. In this example I am using Invoke-Sqlcmd that comes with the MS SQL tools. In the real case I had to store it in a old PIM database using a text based flat file format.
1 2 3 4 5 6 7 8 9 10 11 | #Loop through all rows foreach($row in $pricat) { $SQL="INSERT INTO tyre_info (POS,supplier_id,country,currency,description,weight,brand,product_type,vehicle_type,net_price,price_valid_from, rec_fee,noise,noise_type,rol_resistance,wet_grip,EC_veh_class,EU_directive)" $SQL+=" VALUES($row.POS,$SUPPLIER_ID,$country,$currency,$row.DESCRIPTION,$row.WEIGHT,$row.BRAND_TEXT,$row.PRODUCT_TYPE,$row.VEHICLE_TYPE,$row.NET_VALUE,$row.NV_VALID_FROM,$row.RECYCLING_FEE,$row.NOISE_PERFORMANCE,$row.NOISE_CLASS_TYPE,$row.ROLLING_RECISTANCE,$row.WET_GRIP,$row.EC_VEHICLE_CLASS,$row.EU_DIRECTIVE_NUMBER)" #Execute the SQL using Invoke-SqlCmd from MS SQL tools Invoke-Sqlcmd -Query $SQL -ServerInstance "MyServer\Instance" } |
Comments
Post a Comment