


TLExtractor is a Python script that extracts data from tldraw pages, saves it as JSON, and processes images. It supports both the standard and Custom Submission Template, with features such as depth-first search, async execution, multi-threading, and multi-processing.

It is mainly used to extract data for specific students: students submit their projects using a template, and the script then uses that template to extract the necessary data.




  1. Depth-first search: traverses the tree-like page data, descending to the deepest nodes first and working back up.

  2. Async programming: creates one coroutine per page and runs them all concurrently with minimal overhead.

  3. Multi-processing: processes multiple images in true parallel.

  4. Playwright: website scraper.

  5. Multi-threading: runs a loading thread for each page to simulate a loading screen.

  6. Python: the main programming language.
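
As a rough illustration of how the first two items could fit together, here is a minimal sketch, not TLExtractor's actual code. The node layout (`text`, `children`), the function names, and the sample page are all assumptions made for this example, and `asyncio.sleep(0)` stands in for the real page loading.

```python
import asyncio


def dfs_collect(node):
    """Post-order depth-first traversal: children first, then the node itself."""
    texts = []
    for child in node.get("children", []):
        texts.extend(dfs_collect(child))
    texts.append(node["text"])
    return texts


async def extract_page(name, tree):
    # Placeholder for real I/O such as loading the page in a browser.
    await asyncio.sleep(0)
    return {"page": name, "texts": dfs_collect(tree)}


async def extract_all(pages):
    # One coroutine per page, all scheduled concurrently on one event loop.
    return await asyncio.gather(
        *(extract_page(name, tree) for name, tree in pages.items())
    )


# Hypothetical page tree, invented for this sketch.
SAMPLE_PAGES = {
    "benchmark 01": {
        "text": "frame",
        "children": [
            {"text": "title", "children": [{"text": "date"}]},
            {"text": "students"},
        ],
    },
}

results = asyncio.run(extract_all(SAMPLE_PAGES))
print(results)
```

The deepest node (`date`) is emitted before its parent (`title`), and the root (`frame`) comes last, which matches the "deepest first, working up" order described in item 1.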


Example

#=> JSON Data Structure
{
     "project title": "CORE STUDIO 02-24-TEST",
     "data": [
         {
             "page": "benchmark 01",
             "date": "DUE 26 MAY (SUNDAY) 2359",
             "description": "First iteration of site in blender/rhino",
             "students": [
                 "person1",
                 "person2",
                 "person3"
             ]
         }
     ]
}
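
A short sketch of how this structure might be consumed from Python. The helper name `students_by_page` is hypothetical and is not part of TLExtractor. The sample string mirrors the structure above.

```python
import json

# Sample document mirroring the JSON structure shown above.
RAW = """
{
    "project title": "CORE STUDIO 02-24-TEST",
    "data": [
        {
            "page": "benchmark 01",
            "date": "DUE 26 MAY (SUNDAY) 2359",
            "description": "First iteration of site in blender/rhino",
            "students": ["person1", "person2", "person3"]
        }
    ]
}
"""


def students_by_page(document):
    """Map each page name to its list of students (hypothetical helper)."""
    return {entry["page"]: entry["students"] for entry in document["data"]}


document = json.loads(RAW)
summary = students_by_page(document)
print(document["project title"], summary)
```

Note that `json.loads` rejects trailing commas, so the list of students must not end with a comma after the last name.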


Check it out on GitHub/Tlextractor for more information.