

Extracting Elevation Data from PDF Files (Contours, Spot Levels, etc.)

#1
(This post was last modified: 01-30-2017, 10:51 AM by Ted Woods. Edit Reason: Spelling correction )

I have been looking into the process of extracting elevation data from PDF files into CAD data for use in Kubla's take-off module.

A number of issues come up immediately, because PDF files are not designed to store technical metadata, so technical complications often arise during the process. It should not be attempted with the expectation of perfect results every time.

However, it is potentially a huge time saver. I have identified the following as items we could extract:

  • Continuous Lines (Outlines, Break Lines, Contour Lines): Perhaps the easiest element to extract from a PDF is a continuous line (i.e. one that is not dashed or dotted). These can be loaded into Kubla Cubed as contour lines, break lines or outlines. It is important to realise that the PDF lines contain no elevation information, so the elevations will need to be corrected, either by adjusting the polylines' Z property in a CAD program or by adjusting the elevation feature's level in Kubla Cubed.

  • Dashed Lines (Outlines, Break Lines, Contour Lines): Dashed lines often cause problems, as they usually get converted into a separate polyline for each dash. Joining these dash segments back into polylines by hand in CAD is so time-consuming that it makes the whole process counterproductive. The more advanced conversion tools have special functionality for this and can often extract a dashed line as a single polyline.

  • Crossed Lines (Points): There is no concept of a CAD 'point' in a PDF file, so point levels are usually marked with crosses. Kubla Cubed can extract points from crosses in CAD files, so the positions of point levels can often be recovered. Again, the points extracted from crossed lines contain no elevation, but Kubla Cubed can read elevations from nearby text. If the text is also successfully extracted from the PDF, both the position and the elevation of each point can be recovered.


  • Text (Point Elevations): The CAD importer in Kubla Cubed can match points with nearby text that may contain elevation information. This means that if we can extract both the text and the point crosses from the PDF file, we can import the points with their elevations into Kubla Cubed. A frequent problem is that some programs save text into PDF files as collections of lines rather than as text entities (usually when a non-TrueType font has been used). Again, some of the more advanced conversion software can try to handle this and convert the polylines representing the text back into CAD text elements.
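
The point-level workflow in the last two bullets can be sketched in a few lines. This is a minimal illustration, not Kubla Cubed's importer: it assumes the line segments and text items have already been pulled out of the PDF into plain tuples, and the helper names and the thresholds `max_len`, `tol` and `max_dist` are hypothetical values in drawing units.

```python
import math
from dataclasses import dataclass

# Assumed representation: a segment is (x1, y1, x2, y2) in drawing units.


@dataclass
class TextItem:
    x: float    # insertion point of the text, drawing units
    y: float
    value: str  # the text string as extracted from the PDF


def segment_intersection(a, b):
    """Return the point where segments a and b cross, or None if they do not."""
    x1, y1, x2, y2 = a
    x3, y3, x4, y4 = b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None  # parallel or collinear
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def midpoint(s):
    return ((s[0] + s[2]) / 2, (s[1] + s[3]) / 2)


def find_crosses(segments, max_len=2.0, tol=0.25):
    """Treat two short segments that cross near both midpoints as one point marker."""
    short = [s for s in segments if math.hypot(s[2] - s[0], s[3] - s[1]) <= max_len]
    points, used = [], set()
    for i in range(len(short)):
        if i in used:
            continue
        for j in range(i + 1, len(short)):
            if j in used:
                continue
            p = segment_intersection(short[i], short[j])
            if p is not None and all(
                math.dist(p, midpoint(s)) <= tol for s in (short[i], short[j])
            ):
                points.append(p)
                used.update((i, j))
                break
    return points


def attach_elevations(points, texts, max_dist=5.0):
    """Give each point the value of the nearest numeric text within max_dist (else None)."""
    result = []
    for px, py in points:
        best_z, best_d = None, max_dist
        for t in texts:
            d = math.hypot(t.x - px, t.y - py)
            if d > best_d:
                continue
            try:
                z = float(t.value)
            except ValueError:
                continue  # not a level, e.g. a street name
            best_z, best_d = z, d
        result.append((px, py, best_z))
    return result


segments = [(9.0, 10.0, 11.0, 10.0), (10.0, 9.0, 10.0, 11.0), (0.0, 0.0, 50.0, 0.0)]
texts = [TextItem(11.5, 11.0, "23.45"), TextItem(40.0, 5.0, "Site Boundary")]
print(attach_elevations(find_crosses(segments), texts))  # [(10.0, 10.0, 23.45)]
```

A real site plan also contains hatching, arrowheads and lettering drawn as short strokes, so thresholds like these would need tuning for each drawing, and text stored as polylines (as described in the last bullet) would not be found at all.
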

So far I have been experimenting with extracting just contour lines (ones that are not dashed) and have had mixed results using the free program Inkscape. It seems that if the PDF is too complex, the DXF export often fails. However, I have heard of better software for this, such as:

Able2Extract Standard
www.investintech.com

Able2Extract Pro
www.investintech.com

PDF2DWG
www.dotsoft.com

Print2CAD
http://www.backtocad.com/


Does anyone else have experience with these? To me it seems that Print2CAD (also known as Back2CAD) is the market leader in this area. They run regular seminars, and the software can do things like convert dashed lines into CAD polylines and extract text using Optical Character Recognition (OCR).

#2

It has been my understanding that vectors contain information that tells the software which vector is next in line when importing. One of the problems I see is that the direction in which the engineer drew a line, or the line type they used, can have a large impact on importing linework. I have seen Ghostscript, VectorDraw, and VeryPDF used too.
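
When that ordering information is missing or unreliable (for example when every dash has become its own path), one fallback is to reconnect pieces purely by how close their end points are, reversing a piece if it was drawn the other way. The sketch below is a minimal greedy version of that idea; it is not how Back2CAD or any of the other tools work, the pieces are assumed to be lists of (x, y) vertices already extracted from the PDF, and `gap` is a hypothetical tolerance that should be just larger than the gap between dashes.

```python
import math


def join_pieces(pieces, gap=1.0):
    """Greedily chain pieces (lists of (x, y) vertices) into polylines.

    A piece is appended or prepended when one of its ends lies within `gap`
    of an end of the polyline being built; it is reversed when it was drawn
    in the opposite direction.
    """
    remaining = [list(p) for p in pieces]
    polylines = []
    while remaining:
        line = remaining.pop(0)
        extended = True
        while extended:
            extended = False
            for i, piece in enumerate(remaining):
                head, tail = line[0], line[-1]
                if math.dist(tail, piece[0]) <= gap:
                    line = line + piece        # same direction, after
                elif math.dist(tail, piece[-1]) <= gap:
                    line = line + piece[::-1]  # reversed, after
                elif math.dist(head, piece[-1]) <= gap:
                    line = piece + line        # same direction, before
                elif math.dist(head, piece[0]) <= gap:
                    line = piece[::-1] + line  # reversed, before
                else:
                    continue
                remaining.pop(i)  # safe: we stop iterating immediately
                extended = True
                break
        polylines.append(line)
    return polylines


dashes = [[(0, 0), (1, 0)], [(3, 0), (4, 0)], [(2.5, 0), (1.5, 0)], [(0, 10), (1, 10)]]
print(join_pieces(dashes))
# [[(0, 0), (1, 0), (1.5, 0), (2.5, 0), (3, 0), (4, 0)], [(0, 10), (1, 10)]]
```

Because it is greedy, it can join the wrong neighbours wherever two lines run closer together than `gap`, so the result still needs checking.
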

#3

(06-09-2017, 03:11 PM)Digger662 Wrote: I have seen Ghostscript, VectorDraw, and VeryPDF used too.

Hi Digger662 

Welcome to the forums and thanks for your input. Of the three packages you mentioned, I had only heard of Ghostscript, which is interesting because I think it is free to download and use. I will download it and try it on some site plans to see how effective it is. So far I have been using Inkscape, which is really hit and miss: with complicated data it just does not export to DXF at all.

However, I think we would not be able to use it directly in Kubla Cubed without paying, because the licence forbids commercial distribution.

I agree with you about the problems caused by the way the engineer has defined the lines. PDF files were never intended to be used this way, so it is really difficult to build a consistent workflow for getting the data in without the user having a lot of technical understanding.

A bit later on I am going to try to publish a blog post about how to convert PDF to DXF, which will hopefully help users with some CAD expertise.

However, we are going to be creating automatic line extraction tools in Kubla Cubed in the long term, so hopefully all this complexity won't be necessary.

#4

(06-12-2017, 09:34 AM)Ted Woods Wrote: However, we are going to be creating automatic line extraction tools in Kubla Cubed in the long term, so hopefully all this complexity won't be necessary.

I think being able to select the contour vectors from the PDF, import them with a zero elevation, and then select the contours one by one to change their elevation would be a significant improvement on the current workflow of tracing contours. Maybe also allow individual vectors to be deleted, broken or trimmed if they are imported improperly.
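
The workflow suggested here (import with a zero elevation, then edit contours one at a time) can be sketched with a very small data model. The `Contour` class and the functions below are hypothetical illustrations, not Kubla Cubed's API; contours are assumed to be 2D vertex lists, and trimming against another line is left out.

```python
from dataclasses import dataclass


@dataclass
class Contour:
    points: list             # [(x, y), ...] as extracted from the PDF
    elevation: float = 0.0   # PDF vectors carry no Z value, so start at zero


def set_elevation(contours, index, z):
    """Assign the real level to one contour after the user has picked it."""
    contours[index].elevation = z


def split_contour(contours, index, vertex):
    """Break a contour at an interior vertex; both halves share that vertex."""
    c = contours[index]
    if not 0 < vertex < len(c.points) - 1:
        raise ValueError("split vertex must be an interior vertex")
    first = Contour(c.points[: vertex + 1], c.elevation)
    second = Contour(c.points[vertex:], c.elevation)
    contours[index : index + 1] = [first, second]


def delete_contour(contours, index):
    """Remove a vector that was imported by mistake (e.g. a road edge)."""
    del contours[index]


contours = [Contour([(0, 0), (1, 0), (2, 0), (3, 0)]), Contour([(0, 1), (3, 1)])]
split_contour(contours, 0, 2)
set_elevation(contours, 2, 101.5)
delete_contour(contours, 1)
print([(c.points, c.elevation) for c in contours])
# [([(0, 0), (1, 0), (2, 0)], 0.0), ([(0, 1), (3, 1)], 101.5)]
```
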

#5

(09-18-2018, 09:43 PM)AggieBQ86 Wrote: I think being able to select the contour vectors from the PDF, import them with a zero elevation, and then select the contours one by one to change their elevation would be a significant improvement on the current workflow of tracing contours. Maybe also allow individual vectors to be deleted, broken or trimmed if they are imported improperly.

Hi AggieBQ86

Yes, that is what we planned to do: effectively, allow the user to pick vectors out of the site plan to use as contour lines and then enter the elevation. It is quite tricky in some ways, though. PDF files were never designed to store CAD data; they are a print format, so there are a number of complications.
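
Picking a vector out of the site plan is essentially a hit test: find the imported line closest to where the user clicked. A minimal sketch, assuming the vectors are lists of (x, y) vertices and the click has already been converted to drawing coordinates; the function names and the `tolerance` parameter are hypothetical, not Kubla Cubed's implementation.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.dist(p, a)  # degenerate segment
    # Project p onto the infinite line, then clamp to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def pick_polyline(polylines, click, tolerance):
    """Index of the polyline nearest to `click`, or None if none is within `tolerance`."""
    best_index, best_dist = None, tolerance
    for i, pts in enumerate(polylines):
        for a, b in zip(pts, pts[1:]):
            d = point_segment_distance(click, a, b)
            if d <= best_dist:
                best_index, best_dist = i, d
    return best_index


plan = [[(0, 0), (10, 0)], [(0, 5), (5, 6), (10, 5)]]
print(pick_polyline(plan, (3, 1), tolerance=2))    # 0
print(pick_polyline(plan, (20, 20), tolerance=2))  # None
```
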

Have you tried the other techniques I mentioned above? It would be worth experimenting with Inkscape to see whether you can convert the PDF into a CAD file (there are tutorials online). Then scale it, delete all the data apart from the contour lines, and import it into Kubla Cubed.
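
For the "scale it" step: PDF coordinates are by default measured in points of 1/72 inch on the printed sheet, so converting them to site metres needs the drawing scale. The sketch below assumes the converter has kept those point units (whether it does depends on the tool and its export settings) and that the plan states a scale such as 1:500; the function names are hypothetical. Calibrating from a dimension of known length is shown as an alternative, which is safer if the sheet was resized when the PDF was made.

```python
import math

MM_PER_POINT = 25.4 / 72  # default PDF user-space unit is 1/72 inch


def metres_per_point(scale_denominator):
    """Site metres represented by one PDF point on a plan drawn at 1:scale_denominator."""
    return MM_PER_POINT * scale_denominator / 1000.0


def metres_per_unit_from_known_length(p1, p2, real_length_m):
    """Calibrate from two drawing points whose real-world separation is known."""
    return real_length_m / math.dist(p1, p2)


def scale_points(points, factor, origin=(0.0, 0.0)):
    """Shift so `origin` becomes (0, 0), then scale every vertex by `factor`."""
    ox, oy = origin
    return [((x - ox) * factor, (y - oy) * factor) for x, y in points]


print(round(metres_per_point(500), 5))  # 0.17639, i.e. about 0.176 m on site per point
factor = metres_per_unit_from_known_length((0, 0), (200, 0), 50)
print(scale_points([(100, 200)], factor))  # [(25.0, 50.0)]
```
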

I have had limited success with this method. The last version of Inkscape I used seemed to load the PDF files OK but then crashed when converting; they might have fixed that by now, though. It is worth getting the latest version and giving it a go.

Products like Back2CAD claim to be able to extract contour lines and even turn dashed lines into solid polylines. It is not free, but if you do a lot of take-offs it might be worth a look. I have had one report of it working for a user, but of course there were no elevation details, so you would have to add those manually in either CAD or Kubla Cubed.

Let us know how you get on.

#6

(09-18-2018, 09:43 PM)AggieBQ86 Wrote: I think being able to select the contour vectors from the PDF, import them with a zero elevation, and then select the contours one by one to change their elevation would be a significant improvement on the current workflow of tracing contours. Maybe also allow individual vectors to be deleted, broken or trimmed if they are imported improperly.

Hi AggieBQ86, 

Just a quick update: the latest release, Kubla Cubed 2021, has now been launched!

Thank you for your PDF vector extraction suggestions. The following have now been implemented:
- Contour PDF vector extraction (importing as 0.00 values) 
- Join Tool 
- Split Tool 
- Set Multiple Elevations (SME) Tool 

There are many other updates too; you can see them in our video "What's New in 2021?"

If you are a Subscription Licence Holder, you can upgrade today by following the instructions that appear when you open Kubla Cubed on your desktop.

Please keep your new feature suggestions coming!

Kate

#7
(This post was last modified: 03-21-2023, 01:44 PM by kbkhan.)

Hey there! I see you've been doing some interesting work with extracting elevation data from PDF files. Although I'm not familiar with the software you mentioned, it's great that you're experimenting with different options to get the best results. It's true that PDF files can be tricky when it comes to technical metadata, so it's important to have the right tools to handle the job. Have you considered trying out Smart Engines SDK for OCR? It might be able to help you with extracting text from PDF files for elevation data. Anyway, keep up the great work, and don't hesitate to ask for help or advice on the forum!

#8

I am not expert enough to solve this problem for you, but I really hope someone here can guide you!

#9

(03-18-2023, 02:46 PM)kbkhan Wrote: Have you considered trying out Smart Engines SDK for OCR? It might be able to help you with extracting text from PDF files for elevation data.

Hi. Thanks for the tip. We are still some way off investigating Optical Character Recognition at this stage, but when we get to it we'll take a look.



