[Question]: Preserving hyperlinks when converting PDF to markdown or json #20308

### Question Validation

- [x] I have searched both the documentation and discord for an answer.

### Question

Hi! I am trying to extract the citations from a series of PDFs. I need to convert each original PDF into either markdown or json, but in both output formats the hyperlinks are not preserved. Please let me know if you have any suggestions!

This is my current code for parsing the PDFs: 

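```
import os

from llama_parse import LlamaParse

# LLAMA_CLOUD_API_KEY, folder_path, and output_folder are assumed to be
# defined earlier in the script.
all_markdown = []

# The parser settings are identical for every file, so build it once.
parser = LlamaParse(
    api_key=LLAMA_CLOUD_API_KEY,
    result_type="markdown",
    annotate_links=True,  # Python boolean; lowercase `true` raises NameError
)

for filename in os.listdir(folder_path):
    # Skip anything that is not a PDF.
    if not filename.lower().endswith(".pdf"):
        continue

    file_path = os.path.join(folder_path, filename)

    try:
        docs = parser.load_data(file_path)
        md_text = docs[0].text

        # Write the markdown next to the other outputs as <name>.md.
        base_name = os.path.splitext(filename)[0]
        md_filename = base_name + ".md"
        md_path = os.path.join(output_folder, md_filename)

        with open(md_path, "w", encoding="utf-8") as f:
            f.write(md_text)

        print(f"Saved markdown: {md_path}")
        all_markdown.append({"filename": base_name, "markdown": md_text})

    except Exception as e:
        print(f"Error processing {file_path}: {e}")
```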
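To make the symptom easier to reproduce, here is a minimal sketch of how the saved markdown could be checked for surviving links. It uses only the standard library; the helper name `find_links` and both regular expressions are illustrative assumptions, not part of LlamaParse, and the patterns only cover `http`/`https` targets.

```python
import re

# Inline markdown links of the form [label](url). A "[" preceded by "!" is an
# image, so it is not reported as an inline link (its URL can still be picked
# up by the bare-URL pattern below).
INLINE_LINK = re.compile(r"(?<!!)\[([^\]]+)\]\((https?://[^\s)]+)\)")
# URLs that appear in the text without markdown link syntax.
BARE_URL = re.compile(r"https?://[^\s)>\]]+")


def find_links(md_text):
    """Return (label, url) pairs found in md_text; label is None for bare URLs."""
    links = [(m.group(1), m.group(2)) for m in INLINE_LINK.finditer(md_text)]
    seen = {url for _, url in links}
    for m in BARE_URL.finditer(md_text):
        # Drop sentence punctuation that trails the URL.
        url = m.group(0).rstrip(".,;")
        if url not in seen:
            seen.add(url)
            links.append((None, url))
    return links
```

Calling `find_links(md_text)` on each parsed document and printing the length of the result would show whether any hyperlinks made it into the output for that file.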
