Data Obfuscation


Data Obfuscation

Post  JimShaw on Wed May 18, 2011 6:46 am

Our Information Security department are unhappy about use of production data in test environments.

However, the view in the ETL team is that we would be extremely uncomfortable going to production without having tested against a sample of live data. ("The Data Warehouse Lifecycle Toolkit", Second Edition, pp. 545 and 548, seems to agree with this view.)

Does anybody have experience of obfuscating production data and using this for test purposes? Any hints, tips or experiences which could be shared would be useful.

Thanks

Jim

JimShaw

Posts : 2
Join date : 2009-12-09
Location : Edinburgh, Scotland


Re: Data Obfuscation

Post  Sealeopard on Wed May 18, 2011 10:02 am

It depends entirely on the type of data. The problem with obfuscating data is that you need to change it in such a way that you do not lose the inherent characteristics of the data. For example, if you work with balances and transaction amounts, then your obfuscated data still needs to tie out between the balances and transactions. Additionally, if there are inter-relationships between datasets, e.g. events, then these must be preserved as well.
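To illustrate the tie-out point, here is a minimal Python sketch of one common approach, keyed hashing, in which the same identifier always maps to the same token, so joins between datasets survive and untouched amounts still reconcile. The field names, sample rows and `SECRET_KEY` are all made up for illustration and are not from anyone's actual system.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; in practice it would be kept outside the test environment.
SECRET_KEY = b"example-only-secret"

def pseudonymise(value, key=SECRET_KEY):
    """Return a stable token: the same input always yields the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "CUST_" + digest[:12]

def mask_rows(rows, field):
    """Copy each row, replacing only `field` with its token."""
    return [{**row, field: pseudonymise(row[field])} for row in rows]

def totals_by(rows, key_field, amount_field):
    """Sum amount_field per distinct value of key_field."""
    sums = defaultdict(int)
    for row in rows:
        sums[row[key_field]] += row[amount_field]
    return dict(sums)

# Made-up sample data; amounts are integers (pence) to avoid float rounding.
balances = [
    {"customer": "Alice Smith", "balance": 1500},
    {"customer": "Bob Jones", "balance": -250},
]
transactions = [
    {"customer": "Alice Smith", "amount": 1000},
    {"customer": "Alice Smith", "amount": 500},
    {"customer": "Bob Jones", "amount": -250},
]

masked_balances = mask_rows(balances, "customer")
masked_transactions = mask_rows(transactions, "customer")

# The masked datasets still join on the token, and balances still tie out.
ties_out = (
    totals_by(masked_transactions, "customer", "amount")
    == totals_by(masked_balances, "customer", "balance")
)
print(ties_out)
```

Because only the identifying field is replaced and the amounts are left alone, the reconciliation check behaves exactly as it would on the unmasked rows.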

In our environment, we ultimately got our IS/Compliance/Audit groups to agree that we use full complements of production data in our DEV/STG environments. This has the advantage that code is developed under real-world data conditions (warts and all), and it also allows for real performance testing.

Sealeopard

Posts : 4
Join date : 2011-05-17


Re: Data Obfuscation

Post  Mike Honey on Wed May 18, 2011 8:36 pm

Hi Jim,

I agree that use of production data for testing DW/BI applications is standard practice.

In my experience it is very difficult to establish and maintain a rigorous data obfuscation routine. Consider that as your source systems evolve (especially when schemas change), you will need to completely refresh your test data and re-obfuscate.

So if you have to go down this path, try to minimise the scope of the obfuscation. Typically it only really needs to target people or organisation names, which hopefully only occur in a handful of columns in your DW.

Good luck!
Mike
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com

Re: Data Obfuscation

Post  ngalemmo on Wed May 18, 2011 9:39 pm

I agree with Mike. Usually security issues only involve the ability to identify persons or organizations from the data. So obfuscation is often a simple matter of blanking or dummying names, addresses, and government identifiers, such as social security numbers. You should not need to obfuscate business keys, amounts, status codes or anything else of importance to testing.
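As a rough sketch of that kind of narrow masking, assuming a row arrives as a plain Python dict: only the identifying columns are overwritten with dummy values, while business keys, amounts and status codes pass through unchanged. The column names and dummy values below are hypothetical.

```python
# Hypothetical identifying columns and the dummy value written to each.
DUMMY_VALUES = {
    "name": "Test Person",
    "address": "1 Test Street",
    "ssn": "000-00-0000",
}

def mask_record(record, dummy_values=DUMMY_VALUES):
    """Return a copy of record with identifying columns dummied out."""
    return {
        column: dummy_values[column] if column in dummy_values else value
        for column, value in record.items()
    }

# Made-up sample row.
row = {
    "customer_key": "C-10042",
    "name": "Alice Smith",
    "address": "12 High Street",
    "ssn": "123-45-6789",
    "amount": 1500,
    "status_code": "A",
}

masked = mask_record(row)
print(masked)
```

Keeping the list of masked columns in one small mapping makes it easier to review with the security team and to update when a source schema changes.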

And if they give you a hard time about business keys, you could argue that the only people who could tie a business key to a person or organization are those who have access to the production system (since such identification would not exist in test or QA). Such people already have access to that data, so they should be trusted enough.
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Re: Data Obfuscation

Post  JimShaw on Wed May 25, 2011 8:39 am

Thanks for your replies to my question, which confirm my own thinking on this.

It is very valuable to get this kind of external validation. This will be helpful evidence in future debate.

Jim


JimShaw

Posts : 2
Join date : 2009-12-09
Location : Edinburgh, Scotland


Re: Data Obfuscation

Post  BoxesAndLines on Fri Jun 03, 2011 11:29 am

BTW, if your ETL team is using Informatica, they have an option that will subset and obfuscate production data.
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA


Re: Data Obfuscation

Post  data_cook on Sun Jun 23, 2013 4:39 am

I've recently been looking into the same issue. I suppose the only real difference is that I have had experience in this area, building and masking test data subsets for application development. For that particular task I used Data Masker.

However, in the Data Warehouse environment I suggested that we go down the path of data discovery, mask only the critical information, and do that as part of the ETL. For the ETL we used Talend community edition. In the end it was all about managing the risk from an organisational perspective.
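For anyone wondering what a first pass at data discovery might look like, here is a small Python sketch that flags columns whose names suggest personal data. It is only an illustration: the pattern list, table names and column names are assumptions, and it does not reflect how Data Masker or Talend work. A real exercise would also need to look at the data itself, not just the column names.

```python
import re

# Hypothetical name fragments that hint at personally identifying columns.
SENSITIVE_PATTERN = re.compile(r"(name|address|email|phone|ssn|birth)", re.IGNORECASE)

def find_sensitive_columns(schema):
    """Return (table, column) pairs whose column name matches the pattern.

    schema is a dict mapping table name -> list of column names.
    """
    return [
        (table, column)
        for table, columns in schema.items()
        for column in columns
        if SENSITIVE_PATTERN.search(column)
    ]

# Made-up warehouse schema.
schema = {
    "dim_customer": ["customer_key", "customer_name", "email_address", "status_code"],
    "fact_sales": ["customer_key", "sale_amount"],
}

candidates = find_sensitive_columns(schema)
for table, column in candidates:
    print(f"{table}.{column}")
```

The resulting list is a starting point for agreeing with the business which columns are critical enough to mask in the ETL.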

If you're inclined, you can check out some more info at http://www.datakitchen.com.au

data_cook

Posts : 1
Join date : 2013-06-23
