0

I'm having some trouble with OCR extraction

The script is supposed to pull a link to an image from a cell in a google sheet and attempts OCR extraction using it, but the generated OCR file keeps getting the google login page instead of the image I need (And I'm logged in...)

Any thoughts on why this could be happening?

As an aside, if I wanted to put the OCR Files into their own folder how would I go about this?

Edit: Shared code

var extract_sheet = spreadsheet.getSheetByName("Extract Data");
var sheet = SpreadsheetApp.setActiveSheet(extract_sheet);
var startRow = 2;  // First row of data to process
var numRows = 150;   // Number of rows to process

function doExtract() {
  var dataRange = sheet.getRange(startRow, 1, numRows, 5)
  var data = dataRange.getValues();
  for (var i = 0; i < data.length; ++i) {
    var row = data[i];
    var file_link = row[4];
    var extracted_data = row[6];
    var ocr_file_link = row[7];

    var valueURL = sheet.getRange(startRow + i, 4).getValue();

    var valueURLlength = valueURL.length;
    if (valueURLlength != 0) {
      var image = UrlFetchApp.fetch(valueURL).getBlob();

      var file = {
        title: 'OCR File',
        mimeType: 'image/png'
      };

      // OCR is supported for PDF and image formats
      file = Drive.Files.insert(file, image, {ocr: true});
      var doc = DocumentApp.openByUrl(file.embedLink);
      var body = doc.getBody().getText();
      //Get link Doc that Generated
      sheet.getRange(startRow + Number(i), 6).setValue(file.embedLink);
      //Get Content of Doc that Generated
      sheet.getRange(startRow + Number(i), 7).setValue(body);
    }
  }
}

Here's how we solved this issue

Turns out I didn't need to leave google drive at all so I was able to use the Drive File ID instead of a FetchURL.

So I replaced:

var image = UrlFetchApp.fetch(valueURL).getBlob();

with var image = DriveApp.getFileById(file_ID).getBlob();

And added a var file_ID = row[1]; after var row = data[i];

Hopefully this helps other people who stop by with a similar problem! Thank you.

Justin
  • 1
  • 1
  • can you share code please? – JSmith Sep 07 '18 at 15:42
  • 1
    @JSmith Added the code, thank ya. – Justin Sep 07 '18 at 16:29
  • `UrlFetchApp` doesn't automatically say it's you. You need to provide an authorization header using an OAuth token (that your script can generate) in order to say who is making the request - otherwise it's an anonymous request from a Google IP. – tehhowch Sep 07 '18 at 16:34
  • @tehhowch I thought I was authorizing the application to use my data/credentials when I got the Authorization/permissions popup I hit 'Allow' on the first time I ran the script? – Justin Sep 07 '18 at 17:23
  • That authorized the code in the script project. It doesn't authorize external resources, such as a remote URL. `UrlFetchApp` doesn't send cookies or other parameters either, unless you explicitly tell it to do so. So persisting a log-in state and navigating through URL redirects, etc. will be a learning experience. – tehhowch Sep 07 '18 at 17:58
  • @tehhowch Oh that makes sense! I've googled around for adding authorization to the sheet, but I haven't seen anything that works so far. `var USERNAME = PropertiesService.getScriptProperties().getProperty('username'); var PASSWORD = PropertiesService.getScriptProperties().getProperty('password'); var url = PropertiesService.getScriptProperties().getProperty('url'); var headers = { "Authorization" : "Basic " + Utilities.base64Encode(USERNAME + ':' + PASSWORD) }; var params = { "method":"GET", "headers":headers }; var reponse = UrlFetchApp.fetch(url, params);` – Justin Sep 07 '18 at 19:24
  • Character limits are harsh haha - But that's one of the things I tried with no luck. (I did add the script properties, still nothing) – Justin Sep 07 '18 at 19:26
  • Consider using https://developers.google.com/apps-script/reference/script/script-app#getOAuthToken() – tehhowch Sep 07 '18 at 19:29
  • 1
    @Justin What's the link, you're accessing? – TheMaster Sep 07 '18 at 22:42
  • @I'-'I https://docs.google.com/uc?id=1y7PKaFsaZU7ko4-SxqG8reL1HVC2ACNe would be one of them (A test file) - It likely won't work for you - Part of the issue I realize now is the folder's locked down to specific users, as it should probably stay as the actual files are for work. I could just leave the folder wide open I guess but if I can figure out the OAuth/login method It'd be better for sure. Thanks for your help on this so far guys. – Justin Sep 08 '18 at 02:11
  • @tehhowch tehhowch That link didn't do anything for me unforunately. Spent a good while fiddling with it. Thank you though! :) – Justin Sep 08 '18 at 02:11
  • @Justin If the file is already in Drive... Why are you Urlfetching and putting it back in Drive? – TheMaster Sep 08 '18 at 08:42
  • https://stackoverflow.com/a/36059696 use `DriveApp` to getBlob – TheMaster Sep 08 '18 at 08:51
  • @I'-'I ...Oh wow. You know I feel prettyyy silly now. Let's call this resolved. Thank you so much. – Justin Sep 09 '18 at 02:24
  • @Justin If you worked it out,Post your code with explanation as a answer. It'll help others. – TheMaster Sep 09 '18 at 08:46
  • @justin You should self-answer, rather than edit your question. – tehhowch Sep 10 '18 at 02:21
  • Here's how we solved this issue Turns out I didn't need to leave google drive at all so I was able to use the Drive File ID instead of a FetchURL. So I replaced: `var image = UrlFetchApp.fetch(valueURL).getBlob();` with `var image = DriveApp.getFileById(file_ID).getBlob();` And added a `var file_ID = row[1];` after `var row = data[i];` Hopefully this helps other people who stop by with a similar problem! Thank you. – Justin Sep 16 '18 at 21:18

0 Answers0