Suyati Technologies

CSV/TSV/TAB to DataTable

by Team Suyati, July 18, 2012

Summary: Troubled by slow extraction of enormous data files in CSV/TSV/TAB format? A software developer at Suyati Technologies has come up with a simple answer that prevents data loss while reducing extraction time. According to the team: "We find that extraction using our string-handling approach on a file containing more than 5,000 records is four times faster than the Excel Interop approach. The same code logic works for CSV/TSV/TAB extraction and has been tested with more than 300 files." Read on to find out how they did it!
In most B2B systems, huge amounts of data are transferred, usually in plain-text format. For example, CSV (comma-separated values), TSV (tab-separated values), and TAB (tab-separated) files are commonly generated by e-commerce partners and sent to multiple business endpoints for analytics and reporting.
The most popular way of handling these file formats is to use Excel Interop. The problem we faced with Excel Interop is that it takes a huge amount of time to process a file with more than 5,000 records, and the service or app that houses the extraction logic ultimately times out.
Another way to tackle extraction of data from CSV/TSV/TAB is to use an OleDbDataAdapter with specific delimiters in the connection string. It works, but the data does not always come back in the expected format.
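For context, the Jet text driver behind the OleDb approach takes its delimiter and header settings from a schema.ini file placed in the same folder as the data file. A typical entry for a tab-separated file might look like this (the file name and values are illustrative; ColNameHeader, Format, and MaxScanRows are the driver's documented keys):

```ini
[Test.tsv]
ColNameHeader=True
Format=TabDelimited
MaxScanRows=0
```

Even with schema.ini in place, the driver infers column types by scanning rows, which is one reason values such as numbers with leading zeros can come back in an unexpected format.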
What is needed is a technique that prevents data loss and performs better. Working on the assumption that complex problems can have simple solutions, we built a solution from scratch that uses only string handling in .NET and simple arithmetic to extract the data into a DataTable.

Code Extraction Logic

The plain-text data is read using a StreamReader, and the data starting from HeaderLength is taken as finalData for the extraction logic. Since we know the delimiter for a CSV/TSV/TAB file, we use that delimiter to split the data into individual values. The column headers are taken from index 0 of the finalData string up to the first line break, since the row data begins after the line break that follows the last column.
From that point, data is read row by row using line breaks. We also handle the scenario where a value itself contains a comma in a CSV file whose delimiter is a comma.
Using this column and row information, we construct a DataTable holding the data extracted from the CSV/TSV/TAB file.
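The steps above can be sketched in a few lines. This is a simplified illustration, not the downloadable library code: it assumes a single-character delimiter, does not handle commas embedded in quoted CSV values, and treats headerLength as the number of rows to skip before the header row.

```csharp
using System;
using System.Data;
using System.IO;

// Minimal sketch of the string-handling idea: read the file line by
// line, split each line on the known delimiter, and load everything
// into a string-typed DataTable.
static DataTable ExtractDelimited(string path, char delimiter, int headerLength)
{
    var table = new DataTable();
    using (var reader = new StreamReader(path))
    {
        // Skip leading rows that appear before the header row.
        for (int i = 0; i < headerLength; i++) reader.ReadLine();

        // The next line holds the column headers.
        string headerLine = reader.ReadLine();
        if (headerLine == null) return table;
        foreach (string column in headerLine.Split(delimiter))
            table.Columns.Add(column.Trim(), typeof(string));

        // Every remaining non-empty line is one data row.
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;
            string[] fields = line.Split(delimiter);
            DataRow row = table.NewRow();
            for (int c = 0; c < table.Columns.Count && c < fields.Length; c++)
                row[c] = fields[c];
            table.Rows.Add(row);
        }
    }
    return table;
}
```

Because every column is typed as string, values such as numbers with leading zeros survive intact; type inference is exactly what trips up the Jet/OleDb route.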
To use the code for CSV extraction, call the extraction method as follows.



 
// For CSV extraction
Extraction operations = new Extraction();
FileInfo fileInfo = new FileInfo(@"D:\Test\Test.csv");
int HeaderLength = 0;
operations.EndLineDelimiter = Constants.CSVEndLine;
operations.FileDelimiter = Constants.CSVDelimiter;
DataTable CSVData = operations.ExtractDataFromDelimitedFiles(fileInfo, HeaderLength);


To use the code for TSV/TAB extraction, call the extraction method as follows.



 
// For TSV extraction
Extraction operations = new Extraction();
FileInfo fileInfo = new FileInfo(@"D:\Dev\Test\Test.tsv");
int HeaderLength = 0;
operations.EndLineDelimiter = Constants.TSVEndLine;
operations.FileDelimiter = Constants.TSVDelimiter;
DataTable TSVData = operations.ExtractDataFromDelimitedFiles(fileInfo, HeaderLength);


Here, the most important thing to note is the HeaderLength variable, which must be set to the row number at which the data starts in the sheet. The row number can be read off directly in Excel (in our example file, HeaderLength = 5).

From our performance analysis, we found that extraction using our string-handling approach on a file containing more than 5,000 records is four times faster than the Excel Interop approach. The same code logic works for CSV/TSV/TAB extraction and has been tested with more than 300 files.
The projects are Visual Studio 2008 project types.
Download core logic source code (class library project)
Download sample application that uses our DLL (console application)
Download DLL



4 comments

Kiran July 18, 2012 - 10:42 pm

I tried a few TSV files with the sample application and it works at lightning speed. The code looks pretty complex, but hats off for deriving it with just string-handling techniques. I think this is definitely a great parser.

Reply
Trina Michael July 18, 2012 - 11:07 pm

I had code (using JET instead of ODBC) and ran into a serious drawback. I have a field that contains numbers, but I use CSV format instead of Excel because I NEED to keep the leading zeros. When I use JET, it reads that field as a double and drops the leading zeros.
This is a great relief. I am happy to see my leading zeros still intact after reading my CSV. 🙂

Reply
Wendy Miloy July 18, 2012 - 11:37 pm

I have a .CSV file from which I read the data and put it into a dataset; the dataset then undergoes various actions, and I end up putting the data into a SQL Server 2005 database. The following is the code I am using to access the .CSV file and extract the data:
cnnCSVConnection = New OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path + ";Extended Properties=Text;")
Dim cmdCSVCommand As OleDb.OleDbCommand = New OleDb.OleDbCommand("SELECT * FROM " & fileName, cnnCSVConnection)
Dim adpCSV As OleDb.OleDbDataAdapter = New OleDb.OleDbDataAdapter(cmdCSVCommand)
cnnCSVConnection.Open()
adpCSV.Fill(dstCSVFile, "MemberData")
The problem is that the SELECT command doesn't always return all the data in the .CSV file. The current problem I am having is that the first column is being returned as blank. I have attached the .csv file for people to have a look at (note: the file had to be uploaded as a .txt file). The problem only seems to occur when there is a limited amount of data in the .csv file. If I submit a file with 5 or 133 records, no problems occur.
What am I doing wrong? I would be glad to get a VB.NET version for CSV data extraction.
Thanks
Wendy

Reply
Abhishek July 20, 2012 - 9:13 am

Thank you all for your responses.
Regarding Wendy’s question: it often happens with OleDb that the data gets garbled during extraction, or the connection refuses to open if the file has some problem. That is why a generic parser built on pure string handling is apt for extracting data from plain-text files.
Regarding the VB.NET version of this code, we suggest trying an online C#.NET-to-VB.NET code converter.
Do let us know your comments. Thank you guys 🙂

Reply



© 2021 Suyati Technologies

